Hadoop environment setup

Environment:
Ubuntu 16.04

Host information:
Hostname    IP address       Role
muhe221     10.121.63.240    NameNode, ResourceManager
caoming     10.121.63.215    DataNode, NodeManager

1. Perform the following steps on both machines:
sudo useradd hadoop -m
sudo passwd hadoop    # the password is set to hadoop here

$ su hadoop    # switch to the hadoop user

Change the hostname:
Two files need to be modified:
/etc/hostname
caoming
/etc/hosts
127.0.1.1 caoming    # this line will be commented out later
After the change, restart networking:
sudo /etc/init.d/networking restart
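On Ubuntu 16.04 the hostname can also be changed with hostnamectl instead of editing /etc/hostname by hand (an alternative sketch, not part of the original notes):
$ sudo hostnamectl set-hostname caoming    # sets the static hostname in one step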

Install JDK 1.8 (omitted...)

$ ssh-keygen
Passwordless SSH access must be set up between all nodes.
Here the id_rsa.pub public keys of all nodes are collected into a single authorized_keys file, which is then copied into the ~/.ssh directory on every node.
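Instead of concatenating authorized_keys by hand, ssh-copy-id can distribute the key; a minimal sketch (run on each node, once for every other node):
hadoop@muhe221:~$ ssh-copy-id hadoop@caoming    # appends id_rsa.pub to caoming's authorized_keys
hadoop@muhe221:~$ ssh caoming hostname          # should print "caoming" without asking for a password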
hadoop@muhe221:~/.ssh$ ls -al
total 24
drwx------ 2 hadoop hadoop 4096 11月 18 01:34 .
drwxr-xr-x 8 hadoop hadoop 4096 11月 18 03:59 ..
-rw-r--r-- 1 hadoop hadoop 794 11月 18 01:34 authorized_keys
-rw------- 1 hadoop hadoop 1679 11月 18 01:34 id_rsa
-rw-r--r-- 1 hadoop hadoop 397 11月 18 01:34 id_rsa.pub
-rw-r--r-- 1 hadoop hadoop 888 11月 18 03:52 known_hosts

Set up hostname resolution; run on all nodes:
$ sudo vi /etc/hosts
#127.0.1.1 muhe221    # comment out this line, otherwise muhe221 would resolve to 127.0.1.1
# the following two lines are newly added mappings
10.121.63.240 muhe221
10.121.63.215 caoming

Make sure the firewall is disabled:
hadoop@muhe221:~/.ssh$ sudo ufw status
[sudo] password for hadoop:
Status: inactive    # the firewall is disabled by default on Ubuntu

Install the related software:
sudo apt-get install pdsh
Set pdsh's default rcmd to ssh:
sudo vi /etc/pdsh/rcmd_default    # create /etc/pdsh/rcmd_default and put "ssh" in it
ssh
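Equivalently, the file can be created in a single command (same effect as the edit above):
$ echo ssh | sudo tee /etc/pdsh/rcmd_default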

2. Install and configure Hadoop
Download:
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz

Extract the archive to hadoop@muhe221:~/hadoop-3.2.0
hadoop@muhe221:~/hadoop-3.2.0$ vi ./etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/muhe221/soft/jdk1.8.0_121    # adjust to the actual JDK location

Configure hadoop-3.2.0/etc/hadoop/workers    # hadoop-3.2.0/sbin/start-all.sh starts the worker daemons on every host listed in this file

hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi workers
muhe221
caoming

hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.121.63.240:9000</value>
    </property>
</configuration>

The /home/hadoop/hadoop/tmp directory must be created on all nodes, e.g. with the pdsh sketch below.
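A minimal sketch using the pdsh installed earlier to create the directory on both hosts at once (assumes passwordless ssh is already working):
hadoop@muhe221:~$ pdsh -w muhe221,caoming 'mkdir -p /home/hadoop/hadoop/tmp'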

hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/hadoop/name</value>
    </property>
</configuration>

The /home/hadoop/hadoop/name directory must also be created on all nodes.

hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>

hadoop@muhe221:~/hadoop-3.2.0/etc/hadoop$ vi yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
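One property that multi-node guides often add to yarn-site.xml, but which is not part of the configuration above, is yarn.resourcemanager.hostname; without it, NodeManagers on other hosts may try to reach the ResourceManager at its default address. A hedged sketch:

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>muhe221</value>
    </property>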

The corresponding directories need to be created on muhe221:
hadoop@muhe221:~/hadoop$ ls -al
total 16
drwxrwxr-x 4 hadoop hadoop 4096 11月 18 03:26 .
drwxr-xr-x 8 hadoop hadoop 4096 11月 18 04:48 ..
drwxrwxr-x 3 hadoop hadoop 4096 11月 18 04:00 name
drwxrwxr-x 4 hadoop hadoop 4096 11月 18 04:00 tmp

Set up the other machine:
# copy the whole hadoop-3.2.0 tree (including etc/hadoop) from muhe221 into the hadoop user's home directory on caoming
hadoop@caoming:~$ scp -r hadoop@muhe221:/home/hadoop/hadoop-3.2.0 .
Adjust the JAVA_HOME setting in ~/hadoop-3.2.0/etc/hadoop/hadoop-env.sh.
It is unclear whether the name and tmp directories need to be created here (they were not created, and no errors have appeared so far).

Also configure ~/.bashrc on every host:

export HADOOP_HOME=~/hadoop-3.2.0

export JAVA_HOME=/home/muhe221/soft/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
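After editing, reload the shell configuration and check that the hadoop command resolves (a quick sanity check, not part of the original notes):
$ source ~/.bashrc
$ hadoop version    # should report Hadoop 3.2.0 if PATH and JAVA_HOME are set correctly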

3. Start Hadoop
Run the following command on the master node muhe221 to format the HDFS filesystem:
hadoop@muhe221:~/hadoop-3.2.0/bin$ ./hadoop namenode -format    # if this step is skipped, the NameNode will not start
......
14/08/21 04:51:27 INFO common.Storage: Storage directory /data/hadoop/name has been successfully formatted.
......
As long as "successfully formatted" appears in the output, the format succeeded.

A current folder containing the related files will appear under hadoop@muhe221:~/hadoop/name.
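A quick way to confirm the format took effect (not from the original notes; the path follows dfs.namenode.name.dir above):
hadoop@muhe221:~$ ls ~/hadoop/name/current    # VERSION, fsimage_* and seen_txid should be present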

Start the cluster:
hadoop@muhe221:~/hadoop-3.2.0/sbin$ ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [muhe221]
Starting datanodes
Starting secondary namenodes [muhe221]
Starting resourcemanager
resourcemanager is running as process 19304. Stop it first.
Starting nodemanagers

If http://10.121.63.240:8088 shows the cluster status in a browser, the installation succeeded.
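To further verify that MapReduce jobs run on YARN, the example jar bundled with Hadoop 3.2.0 can be used (a quick smoke test, not part of the original notes):
hadoop@muhe221:~/hadoop-3.2.0$ ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 2 10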

hadoop@muhe221:~/hadoop-3.2.0/sbin$ hdfs dfsadmin -report   # check the status and usage of each node
Configured Capacity: 975964258304 (908.94 GB)
Present Capacity: 401822752768 (374.23 GB)
DFS Remaining: 401822703616 (374.23 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 10.121.63.215:9866 (caoming)
Hostname: caoming
Decommission Status : Normal
Configured Capacity: 487982129152 (454.47 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 230488723456 (214.66 GB)
DFS Remaining: 232681713664 (216.70 GB)
DFS Used%: 0.00%
DFS Remaining%: 47.68%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 2
Last contact: Sun Nov 20 00:44:37 CST 2016
Last Block Report: Sun Nov 20 00:35:19 CST 2016
Num of Blocks: 0

Name: 10.121.63.240:9866 (muhe221)
Hostname: muhe221
Decommission Status : Normal
Configured Capacity: 487982129152 (454.47 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 294029447168 (273.84 GB)
DFS Remaining: 169140989952 (157.52 GB)
DFS Used%: 0.00%
DFS Remaining%: 34.66%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Nov 20 00:44:36 CST 2016
Last Block Report: Sun Nov 20 00:35:18 CST 2016
Num of Blocks: 

Check the running daemons on the NameNode:
hadoop@muhe221:~/hadoop$ jps
12146 SecondaryNameNode
12531 NodeManager
11924 DataNode
12924 Jps
12382 ResourceManager
11758 NameNode

Check the running daemons on the DataNode:
hadoop@caoming:~/hadoop$ jps
21108 DataNode
21367 Jps
21242 NodeManager

Stop the cluster:
hadoop@muhe221:~/hadoop-3.2.0/sbin$ ./stop-all.sh

View the running status (web UIs)

ResourceManager:
The default port for viewing all applications in the cluster is 8088:
http://10.121.63.240:8088

NameNode:
The default NameNode web UI port is 9870 in Hadoop 3.x (it was 50070 in Hadoop 2.x):
http://10.121.63.240:9870 (Version: hadoop-3.2.0)
http://10.121.63.240:50070 (Version: hadoop-2.7.7)

NodeManager containers:
http://10.121.63.240:8042/node/allContainers

 

Q1: hadoop is not in the sudoers file. This incident will be reported
sudo vi /etc/sudoers
Find the line "root ALL=(ALL) ALL" and add "hadoop ALL=(ALL) ALL" below it (hadoop is the regular user's name).
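A sketch of the relevant sudoers lines after the change (visudo is the safer way to edit this file, since it checks the syntax before saving):
root    ALL=(ALL) ALL
hadoop  ALL=(ALL) ALL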

Q2: muhe221: rcmd: socket: Permission denied
This usually means pdsh is falling back to rsh instead of ssh; make sure /etc/pdsh/rcmd_default contains ssh as described above. Also check that the relevant directories (tmp, name) have been created.

Q3: Live datanodes is 1, or jps shows that the DataNode is not running
hadoop-3.2.0/etc/hadoop/workers was not configured, so only the NameNode host was started.
Configure the node list in hadoop-3.2.0/etc/hadoop/workers.
