Hadoop environment setup
Environment:
Ubuntu 16.04
Host information:
Hostname    IP address       Role
muhe221     10.121.63.240    NameNode, ResourceManager
caoming     10.121.63.215    DataNode, NodeManager
1. Do the following on both machines:
sudo useradd hadoop -m
sudo passwd hadoop    # set the password to hadoop here
$ su hadoop    # switch to the hadoop account
Change the hostname:
Two files need to be edited:
/etc/hostname
caoming
/etc/hosts
127.0.1.1 caoming    # this line will be commented out later
Restart networking after the change:
sudo /etc/init.d/networking restart
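On Ubuntu 16.04 the same change can also be made with hostnamectl instead of editing /etc/hostname by hand; a minimal sketch:
sudo hostnamectl set-hostname caoming    # writes /etc/hostname for you
hostnamectl status                       # confirm the new name (remember /etc/hosts still needs editing)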
Install JDK 1.8 (omitted....)
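The JDK step is omitted above; a minimal sketch, assuming the OpenJDK 8 packages from the Ubuntu repositories are acceptable (the rest of this guide uses an Oracle JDK unpacked at /home/muhe221/soft/jdk1.8.0_121, so adjust JAVA_HOME accordingly):
sudo apt-get install openjdk-8-jdk
java -version    # should report version 1.8.x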
$ ssh-keygen
Password-less SSH access needs to be set up between all nodes.
Here, the id_rsa.pub public keys from all nodes are collected into a single authorized_keys file, which is then copied to the ~/.ssh directory on every node (one way to do this is sketched after the listing below).
hadoop@muhe221:~/.ssh$ ls -al
total 24
drwx------ 2 hadoop hadoop 4096 Nov 18 01:34 .
drwxr-xr-x 8 hadoop hadoop 4096 Nov 18 03:59 ..
-rw-r--r-- 1 hadoop hadoop  794 Nov 18 01:34 authorized_keys
-rw------- 1 hadoop hadoop 1679 Nov 18 01:34 id_rsa
-rw-r--r-- 1 hadoop hadoop  397 Nov 18 01:34 id_rsa.pub
-rw-r--r-- 1 hadoop hadoop  888 Nov 18 03:52 known_hosts
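A minimal sketch of the key distribution described above, using ssh-copy-id (run this on every node after ssh-keygen, so that each node's key ends up in every node's authorized_keys):
ssh-copy-id hadoop@muhe221
ssh-copy-id hadoop@caoming
# Verify that password-less login now works
ssh hadoop@caoming hostname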
Set the node name mappings; run on all nodes:
$ sudo vi /etc/hosts
#127.0.1.1 muhe221    # comment this line out, otherwise muhe221 resolves to the loopback address
# the two lines below are the newly added mappings
10.121.63.240 muhe221
10.121.63.215 caoming
Turn off the firewall
hadoop@muhe221:~/.ssh$ sudo ufw status
[sudo] password for hadoop:
Status: inactive    # Ubuntu's firewall is disabled by default
Install related software
sudo apt-get install pdsh
Set pdsh's default rcmd type as follows:
sudo vi /etc/pdsh/rcmd_default    # create /etc/pdsh/rcmd_default and put ssh in it
ssh
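A quick way to confirm that pdsh now goes through ssh (the host names below are the ones mapped in /etc/hosts):
pdsh -w muhe221,caoming hostname    # should print each node's hostname
# Alternatively, the rcmd type can be set per shell instead of via /etc/pdsh/rcmd_default:
export PDSH_RCMD_TYPE=ssh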
2. Install and configure Hadoop
Download URL:
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
Extract the archive to hadoop@muhe221:~/hadoop-3.2.0
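A minimal sketch of fetching and unpacking the release, assuming the direct archive.apache.org link rather than the mirror chooser above:
cd ~
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
tar -xzf hadoop-3.2.0.tar.gz    # produces ~/hadoop-3.2.0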
hadoop@muhe221:~/hadoop-3.2.0$ vi ./etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/muhe221/soft/jdk1.8.0_121    # adjust this to your actual JDK path
Configure hadoop-3.2.0/etc/hadoop/workers    # hadoop-3.2.0/sbin/start-all.sh uses this list of host names to start every node in turn
hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi workers
muhe221
caoming
hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://10.121.63.240:9000</value>
  </property>
</configuration>
The /home/hadoop/hadoop/tmp directory must be created on all nodes.
hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop/name</value>
  </property>
</configuration>
The /home/hadoop/hadoop/name directory must be created on all nodes.
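A minimal sketch of creating both directories on every node, either by hand or via the pdsh installed earlier:
# On each node individually:
mkdir -p /home/hadoop/hadoop/tmp /home/hadoop/hadoop/name
# Or from muhe221 for all nodes at once:
pdsh -w muhe221,caoming 'mkdir -p /home/hadoop/hadoop/tmp /home/hadoop/hadoop/name'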
hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
  </property>
</configuration>
hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
The corresponding directories need to be created on muhe221:
hadoop@muhe221:~/hadoop$ ls -al
total 16
drwxrwxr-x 4 hadoop hadoop 4096 Nov 18 03:26 .
drwxr-xr-x 8 hadoop hadoop 4096 Nov 18 04:48 ..
drwxrwxr-x 3 hadoop hadoop 4096 Nov 18 04:00 name
drwxrwxr-x 4 hadoop hadoop 4096 Nov 18 04:00 tmp
Set up the other machine:
hadoop@caoming:~$ scp -r hadoop@muhe221:/home/hadoop/hadoop-3.2.0 .
# this copies the whole ~/hadoop-3.2.0 tree from muhe221, including the etc/hadoop configuration files, so nothing needs to be created by hand
Adjust the JAVA_HOME setting in ~/hadoop-3.2.0/etc/hadoop/hadoop-env.sh.
It is unclear whether the name and tmp directories also need to be created here (they were not created on this node, and no errors have appeared so far).
In addition, configure ~/.bashrc on each host:
export HADOOP_HOME=~/hadoop-3.2.0
export JAVA_HOME=/home/muhe221/soft/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
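After editing ~/.bashrc, reload it and check that the hadoop command is found on the PATH:
source ~/.bashrc
hadoop version    # should print Hadoop 3.2.0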
3. Start Hadoop
On the master node muhe221, run the following command to format the HDFS filesystem:
hadoop@muhe221:~/hadoop-3.2.0/bin$ ./hadoop namenode -format    # skipping this step leaves the NameNode unable to start; in Hadoop 3.x, ./hdfs namenode -format is the preferred form
......
14/08/21 04:51:27 INFO common.Storage: Storage directory /data/hadoop/name has been successfully formatted.
......
As long as "successfully formatted" appears in the output, the format succeeded.
A current folder with related files will now appear under hadoop@muhe221:~/hadoop/name.
Start the daemons:
hadoop@muhe221:~/hadoop-3.2.0/sbin$ ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [muhe221]
Starting datanodes
Starting secondary namenodes [muhe221]
Starting resourcemanager
resourcemanager is running as process 19304. Stop it first.
Starting nodemanagers
If http://10.121.63.240:8088 opens in a browser and shows the cluster status, the installation succeeded.
hadoop@muhe221:~/hadoop-3.2.0/sbin$ hdfs dfsadmin -report    # check usage of each node
Configured Capacity: 975964258304 (908.94 GB)
Present Capacity: 401822752768 (374.23 GB)
DFS Remaining: 401822703616 (374.23 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0
Erasure Coded Block Groups:
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.121.63.215:9866 (caoming)
Hostname: caoming
Decommission Status : Normal
Configured Capacity: 487982129152 (454.47 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 230488723456 (214.66 GB)
DFS Remaining: 232681713664 (216.70 GB)
DFS Used%: 0.00%
DFS Remaining%: 47.68%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Nov 20 00:44:37 CST 2016
Last Block Report: Sun Nov 20 00:35:19 CST 2016
Num of Blocks: 0

Name: 10.121.63.240:9866 (muhe221)
Hostname: muhe221
Decommission Status : Normal
Configured Capacity: 487982129152 (454.47 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 294029447168 (273.84 GB)
DFS Remaining: 169140989952 (157.52 GB)
DFS Used%: 0.00%
DFS Remaining%: 34.66%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 20 00:44:36 CST 2016
Last Block Report: Sun Nov 20 00:35:18 CST 2016
Num of Blocks: 0
Check the running processes on the NameNode node:
hadoop@muhe221:~/hadoop$ jps
12146 SecondaryNameNode
12531 NodeManager
11924 DataNode
12924 Jps
12382 ResourceManager
11758 NameNode
Check the running processes on the DataNode node:
hadoop@caoming:~/hadoop$ jps
21108 DataNode
21367 Jps
21242 NodeManager
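As an extra check beyond the web UI and jps, the bundled MapReduce example can be run; a minimal sketch using the example jar shipped with Hadoop 3.2.0:
cd ~/hadoop-3.2.0
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 2 10
# a successful run ends by printing an estimated value of Pi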
Stop the daemons:
hadoop@muhe221:~/hadoop-3.2.0/sbin$ ./stop-all.sh
Checking status through the web UIs
ResourceManager:
The default port for viewing all of the cluster's applications is 8088:
http://10.121.63.240:8088
NameNode:
The default NameNode web UI port is 9870 in Hadoop 3.x (it was 50070 in Hadoop 2.x):
http://10.121.63.240:9870 (Version: hadoop-3.2.0)
http://10.121.63.240:50070 (Version: hadoop-2.7.7)
NodeManager (running containers):
http://10.121.63.240:8042/node/allContainers
Q1: hadoop is not in the sudoers file. This incident will be reported
sudo vi /etc/sudoers
Find the line root ALL=(ALL) ALL and add hadoop ALL=(ALL) ALL below it (note: hadoop is the regular user's username).
Q2: muhe221: rcmd: socket: Permission denied
This usually means pdsh is falling back to rsh; make sure /etc/pdsh/rcmd_default contains ssh as described above. Also check that the related directories (tmp, name) have been created.
Q3: Live datanodes is 1, or jps shows that the DataNode is not started
hadoop-3.2.0/etc/hadoop/workers was not configured, so only the NameNode machine was started.
Configure the node list in hadoop-3.2.0/etc/hadoop/workers.
