Environment:
OS: Ubuntu 14.04, Hadoop 1.1.2
Master: 192.168.0.221
Slave:  192.168.0.222
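The commands below refer to the two machines by the hostnames master and slave, so both hosts are assumed to carry matching entries in /etc/hosts (a sketch; the mapping follows the IPs above but is not shown in the original notes):
192.168.0.221   master   # assumed mapping, matches the Master IP above
192.168.0.222   slave    # assumed mapping, matches the Slave IP above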
Create the hadoop group and user on Ubuntu
Perform these steps on both hosts.
1. Add the hadoop user to the system
zhang@master:~$ sudo addgroup hadoop
zhang@master:~$ sudo adduser --ingroup hadoop hadoop
- At this point we have only added a hadoop user; it does not yet have administrator privileges. To grant the hadoop user those rights, open the /etc/sudoers file:
zhang@master:~$ sudo gedit /etc/sudoers
Below the line "root ALL=(ALL:ALL) ALL", add "hadoop ALL=(ALL:ALL) ALL".
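For reference, the relevant block of /etc/sudoers should look like this after the edit (the root line already exists; only the hadoop line is added):
root    ALL=(ALL:ALL) ALL
hadoop  ALL=(ALL:ALL) ALL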
Configure SSH, install Java and Hadoop
1. Install SSH on both hosts
1) Hadoop communicates over SSH, so install SSH first. Note that I switched from the zhang user to hadoop first.
zhang@master:~$ su - hadoop
Password:
hadoop@master:~$ sudo apt-get install openssh-server
Since my machine already had the latest version of SSH installed, this step actually did nothing.
2) With SSH installed, start the service. Once it is started, check that it is running correctly:
hadoop@master:~$ sudo /etc/init.d/ssh start
hadoop@master:~$ ps -e | grep ssh
  759 ?        00:00:00 sshd
 1691 ?        00:00:00 ssh-agent
12447 ?        00:00:00 ssh
12448 ?        00:00:00 sshd
12587 ?        00:00:00 sshd
hadoop@master:~$
3) SSH is a secure communication protocol; ssh-keygen can generate keys as either RSA or DSA (RSA by default). Since logging in normally requires a password, we set up passwordless login by generating a key pair on each node and distributing the public keys:
hadoop@master:~$ ssh-keygen -t rsa -P ""
hadoop@master:~$ ssh-copy-id -i slave
hadoop@master:~$ ssh-copy-id -i master
hadoop@slave:~$ ssh-keygen -t rsa -P ''
hadoop@slave:~$ ssh-copy-id -i slave
hadoop@slave:~$ ssh-copy-id -i master
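A quick way to confirm that passwordless login works is to open a session from master to slave; it should log in without prompting (this check is not part of the original notes, just a sanity test):
hadoop@master:~$ ssh slave   # should log straight in with no password prompt
hadoop@slave:~$ exit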
Install Java
Do this on both hosts; the procedure is described in my other article on deploying a hadoop 2.7.3 cluster on CentOS 6.5. Java is deployed under the hadoop user.
hadoop@master:~$ java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
hadoop@master:~$
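For completeness: if the JDK tarball was unpacked under the hadoop user's home directory as described, the per-user environment in ~/.bashrc would look roughly like this (a sketch; the path matches the JAVA_HOME set in hadoop-env.sh below):
export JAVA_HOME=/home/hadoop/jdk1.7.0_79   # assumed install location, same as hadoop-env.sh below
export PATH=$JAVA_HOME/bin:$PATH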
Install hadoop-1.1.2 on master
Extract hadoop into /usr/local/hadoop:
hadoop@master:/usr/local$ sudo tar xzf hadoop-1.1.2.tar.gz
hadoop@master:/usr/local$ sudo mv hadoop-1.1.2 /usr/local/hadoop
To make sure every subsequent operation runs as the hadoop user, change the owner of the hadoop directory to hadoop:
hadoop@master:/usr/local$ sudo chown -R hadoop:hadoop hadoop
Configure hadoop-env.sh
Log in as the hadoop user, change to /usr/local/hadoop, and open hadoop-env.sh in the conf directory. Find the line "#export JAVA_HOME=...", remove the leading "#", set it to this machine's JDK path, and add the following:
export JAVA_HOME=/home/hadoop/jdk1.7.0_79/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin
Then source the file so the environment variables take effect:
hadoop@master:/usr/local/hadoop$ source /usr/local/hadoop/conf/hadoop-env.sh
The Hadoop version can now be displayed:
hadoop@master:/usr/local/hadoop$ hadoop version
Hadoop 1.1.2
hadoop@master:/usr/local/hadoop$
Edit the Hadoop configuration
Three files need to be configured: core-site.xml, hdfs-site.xml, and mapred-site.xml, all in /usr/local/hadoop/conf.
hadoop@master:/usr/local/hadoop$ mkdir tmp data name
hadoop@master:/usr/local/hadoop/conf$ cat masters
192.168.0.221
This file holds the master node's IP, i.e. the NameNode.
hadoop@master:/usr/local/hadoop/conf$ cat slaves
192.168.0.222
This file holds the slave node IPs, i.e. the DataNodes. Both files can be written in one step each, as shown below.
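Here echo is just a convenience; editing with any editor works the same:
hadoop@master:/usr/local/hadoop/conf$ echo 192.168.0.221 > masters   # NameNode IP
hadoop@master:/usr/local/hadoop/conf$ echo 192.168.0.222 > slaves    # DataNode IP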
1. Edit the three files:
1). core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.0.221:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
2). hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/data</value>
  </property>
</configuration>
3). mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.0.221:9001</value>
  </property>
</configuration>
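Before starting the cluster, it is worth catching XML typos in the three files; xmllint, if installed, exits silently when they are well-formed:
hadoop@master:/usr/local/hadoop/conf$ xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml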
Copy hadoop to the slave host
Copy the hadoop directory to the corresponding path on the slave host, and fix the directory ownership once it is created:
hadoop@slave:/usr/local$ sudo mkdir hadoop
hadoop@slave:/usr/local$ sudo chown -R hadoop:hadoop hadoop
hadoop@master:/usr/local$ scp -r hadoop/* hadoop@slave:/usr/local/hadoop/
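A quick sanity check on the slave that the copy landed and is owned by the hadoop user (not part of the original notes):
hadoop@slave:/usr/local$ ls -ld hadoop hadoop/conf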
Start the Hadoop services
hadoop@master:/usr/local$ source hadoop/conf/hadoop-env.sh
hadoop@master:/usr/local$ cd hadoop/
hadoop@master:/usr/local/hadoop$ hadoop namenode -format
The following message shows that the HDFS filesystem was formatted successfully:
16/12/09 10:54:40 INFO common.Storage: Storage directory /usr/local/hadoop/name has been successfully formatted.
16/12/09 10:54:40 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.0.221
************************************************************/
hadoop@master:/usr/local/hadoop$
Start Hadoop
hadoop@master:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-namenode-master.out
192.168.0.222: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-datanode-slave.out
192.168.0.221: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-secondarynamenode-master.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-jobtracker-master.out
192.168.0.222: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-hadoop-tasktracker-slave.out
Process listings like the following indicate success:
hadoop@master:/usr/local/hadoop$ jps
21199 NameNode
21514 Jps
18565 JobTracker
18490 SecondaryNameNode
hadoop@master:/usr/local/hadoop$
hadoop@slave:/usr/local$ jps
9743 TaskTracker
9618 DataNode
12956 Jps
hadoop@slave:/usr/local$
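Besides jps, the HDFS side can also be checked from the command line; in Hadoop 1.x, dfsadmin -report prints the configured capacity and the list of live datanodes:
hadoop@master:/usr/local/hadoop$ hadoop dfsadmin -report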
Check the running status
All the setup is done and Hadoop is running. The services can now be checked through the web interfaces Hadoop provides for monitoring cluster health:
http://192.168.0.221:50030/ - Hadoop Map/Reduce administration (JobTracker)
http://192.168.0.222:50060/ - Hadoop TaskTracker status (the TaskTracker runs on the slave)
http://192.168.0.221:50070/ - Hadoop DFS (NameNode) status
Test WordCount
At this point the distributed installation of hadoop is complete, so run the bundled WordCount example to get a feel for the MapReduce workflow.
Note that the program runs on the DFS filesystem, and the files it creates are also created in DFS:
hadoop@master:/usr/local/hadoop$ hadoop dfs -mkdir input
hadoop@master:/usr/local/hadoop$
Copy the files under conf into input on DFS:
hadoop@master:/usr/local/hadoop$ hadoop dfs -copyFromLocal conf/* input
List the files:
hadoop@master:/usr/local/hadoop$ hadoop dfs -ls /user/hadoop/input
Found 17 items
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/configuration.xsl
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/core-site.xml
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/core-site.xml_bak
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/fair-scheduler.xml
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/hadoop-env.sh
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/hadoop-metrics2.properties
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/hadoop-policy.xml
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/hdfs-site.xml
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/log4j.properties
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/mapred-queue-acls.xml
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/mapred-site.xml
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/masters
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/slaves
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/ssl-client.xml.example
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/ssl-server.xml.example
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:02 /user/hadoop/input/taskcontroller.cfg
运行WordCount
hadoop@master:/usr/local/hadoop$ hadoop jar hadoop-examples-1.1.2.jar wordcount input output
16/12/09 11:25:12 INFO input.FileInputFormat: Total input paths to process : 16
16/12/09 11:25:12 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/12/09 11:25:12 WARN snappy.LoadSnappy: Snappy native library not loaded
16/12/09 11:25:12 INFO mapred.JobClient: Running job: job_201612091123_0001
16/12/09 11:25:13 INFO mapred.JobClient:  map 0% reduce 0%
16/12/09 11:25:34 INFO mapred.JobClient:  map 25% reduce 0%
16/12/09 11:25:39 INFO mapred.JobClient:  map 31% reduce 0%
16/12/09 11:25:40 INFO mapred.JobClient:  map 37% reduce 0%
16/12/09 11:25:44 INFO mapred.JobClient:  map 43% reduce 12%
16/12/09 11:25:45 INFO mapred.JobClient:  map 50% reduce 12%
16/12/09 11:25:50 INFO mapred.JobClient:  map 62% reduce 12%
16/12/09 11:25:53 INFO mapred.JobClient:  map 62% reduce 16%
16/12/09 11:26:11 INFO mapred.JobClient:  map 100% reduce 100%
16/12/09 11:26:13 INFO mapred.JobClient: Job complete: job_201612091123_0001
16/12/09 11:26:13 INFO mapred.JobClient: Counters: 29
16/12/09 11:26:13 INFO mapred.JobClient:   Job Counters
16/12/09 11:26:13 INFO mapred.JobClient:     Launched reduce tasks=1
16/12/09 11:26:13 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=95372
(remaining counters omitted)
Display the output:
hadoop@master:/usr/local/hadoop$ hadoop dfs -ls output/*
-rw-r--r--   1 hadoop supergroup          0 2016-12-09 11:26 /user/hadoop/output/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2016-12-09 11:25 /user/hadoop/output/_logs/history
-rw-r--r--   1 hadoop supergroup      15826 2016-12-09 11:26 /user/hadoop/output/part-r-00000
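To see the actual word counts, print the result file straight from DFS (dfs -get would fetch it to the local filesystem instead):
hadoop@master:/usr/local/hadoop$ hadoop dfs -cat output/part-r-00000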
When you are done with Hadoop, its daemons can be shut down with the stop-all.sh script.
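Run from the installation directory, it mirrors the start-all.sh invocation above:
hadoop@master:/usr/local/hadoop$ bin/stop-all.sh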