Hadoop Cluster Environment Setup
Physical machines
192.168.1.200 puroc-centos
192.168.1.201 puroc-centos2
Note: each physical machine needs at least 2 GB of RAM.
Deployment plan:
192.168.1.200 runs the NameNode, SecondaryNameNode, NodeManager, ResourceManager and DataNode
192.168.1.201 runs a DataNode and NodeManager
Step 1: Configure the physical machines
Set up SSH mutual trust
Once SSH trust is configured, the hosts can ssh to each other without entering a username or password; the full procedure is not covered here.
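As a minimal sketch (the temporary key path is just for demonstration; in practice the key lives at ~/.ssh/id_rsa, and the root user is an assumption), SSH trust boils down to generating a key pair and copying the public key to the other host:

```shell
# Generate an RSA key pair with no passphrase (demo path shown here)
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$KEYDIR/id_rsa" -q

# Push the public key to the other node; this asks for the password once,
# after which "ssh root@puroc-centos2" is passwordless. Repeat on both hosts.
# ssh-copy-id -i "$KEYDIR/id_rsa.pub" root@puroc-centos2
```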
/etc/hosts
# In a real cluster, comment out the "127.0.0.1 localhost" line;
# a pseudo-distributed setup needs it.
#127.0.0.1 localhost
192.168.1.200 puroc-centos
192.168.1.201 puroc-centos2
Restart the servers after modifying /etc/hosts.
Disable the firewall and keep it from starting at boot:
service iptables stop
chkconfig iptables off
Step 2: Download Hadoop
Download page: http://hadoop.apache.org/releases.html
I used version 2.7.1; upload hadoop-2.7.1.tar.gz to the servers.
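A sketch of fetching and unpacking the release (the archive.apache.org mirror path and the /root/pud/hadoop target directory are assumptions; any mirror from the releases page works):

```shell
HADOOP_VER=2.7.1
TARBALL="hadoop-${HADOOP_VER}.tar.gz"
# wget "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VER}/${TARBALL}"
# tar -xzf "$TARBALL" -C /root/pud/hadoop
echo "$TARBALL"   # the file to upload to both servers
```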
Step 3: Configure Hadoop
Do the configuration on 192.168.1.200 (the NameNode); the following files under etc/hadoop need to be modified.
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://puroc-centos:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/root/pud/hadoop/hadoop-2.7.1/tmp</value>
  </property>
  <property>
    <!-- 128 KB read/write buffer -->
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/root/pud/hadoop/hadoop-2.7.1/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/root/pud/hadoop/hadoop-2.7.1/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>puroc-centos:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
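It does no harm to create the directories referenced above before starting anything. A sketch (shown relative to the current directory for safety; substitute /root/pud/hadoop/hadoop-2.7.1 to match the values above):

```shell
# Must match hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir
HADOOP_HOME=${HADOOP_HOME:-$PWD/hadoop-2.7.1}
mkdir -p "$HADOOP_HOME/tmp" "$HADOOP_HOME/hdfs/name" "$HADOOP_HOME/hdfs/data"
```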
mapred-site.xml (if it does not exist, copy it from mapred-site.xml.template)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>puroc-centos:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>puroc-centos:19888</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>puroc-centos:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>puroc-centos:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>puroc-centos:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>puroc-centos:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>puroc-centos:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2000</value>
  </property>
</configuration>
hadoop-env.sh and yarn-env.sh
Set JAVA_HOME in both files.
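For example (the JDK path below is an assumption; point it at your actual install), the line in etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh becomes:

```shell
# Example JDK location; replace with e.g. the output of: readlink -f "$(which java)"
export JAVA_HOME=/usr/java/jdk1.7.0_79
```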
slaves
Add the hostname of every server that runs a DataNode to this file, as follows:
localhost
puroc-centos2
/etc/hosts
Add the IPs and hostnames of both servers to this file (as in Step 1).
Step 4: Copy Hadoop to the other servers
Copy the Hadoop directory configured above to every other server.
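A sketch using scp (the copy_cmd helper is purely illustrative, and the root user is an assumption; the directory layout must be identical on every node):

```shell
# Build the copy command for one node; print it here, then run it for real
HADOOP_HOME=/root/pud/hadoop/hadoop-2.7.1
copy_cmd() { echo "scp -r $HADOOP_HOME root@$1:$(dirname "$HADOOP_HOME")/"; }
copy_cmd puroc-centos2
```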
Step 5: Format the NameNode
Run the following on the NameNode server:
bin/hdfs namenode -format
Step 6: Start and stop Hadoop
Run the following commands on the NameNode:
# Run on the namenode; this also starts the daemons on all datanodes
sbin/start-all.sh
# Stop everything
sbin/stop-all.sh
# After starting, use the jps command to check which Java processes are running
Step 7: Monitoring
Two web UIs are available: the HDFS NameNode UI at http://puroc-centos:50070 (the Hadoop 2.x default port) and the YARN ResourceManager UI at http://puroc-centos:8088 (as configured in yarn-site.xml).
Step 8: Verification
Run the bundled wordcount MapReduce example to verify that the Hadoop environment works.
# Create the /test directory on HDFS
hdfs dfs -mkdir /test
# Upload README.txt to the /test directory
hdfs dfs -put README.txt /test
# Run the MapReduce job
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test output
A successful run produces output like the following:
[root@puroc-centos hadoop-2.7.1]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test output
15/10/23 23:23:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/23 23:23:35 INFO client.RMProxy: Connecting to ResourceManager at puroc-centos/192.168.1.200:8032
15/10/23 23:23:36 INFO input.FileInputFormat: Total input paths to process : 1
15/10/23 23:23:36 INFO mapreduce.JobSubmitter: number of splits:1
15/10/23 23:23:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445667722544_0001
15/10/23 23:23:37 INFO impl.YarnClientImpl: Submitted application application_1445667722544_0001
15/10/23 23:23:37 INFO mapreduce.Job: The url to track the job: http://puroc-centos:8088/proxy/application_1445667722544_0001/
15/10/23 23:23:37 INFO mapreduce.Job: Running job: job_1445667722544_0001
15/10/23 23:23:46 INFO mapreduce.Job: Job job_1445667722544_0001 running in uber mode : false
15/10/23 23:23:46 INFO mapreduce.Job:  map 0% reduce 0%
15/10/23 23:23:55 INFO mapreduce.Job:  map 100% reduce 0%
15/10/23 23:24:02 INFO mapreduce.Job:  map 100% reduce 100%
15/10/23 23:24:02 INFO mapreduce.Job: Job job_1445667722544_0001 completed successfully
15/10/23 23:24:02 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=1836
        FILE: Number of bytes written=234741
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1471
        HDFS: Number of bytes written=1306
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5468
        Total time spent by all reduces in occupied slots (ms)=4150
        Total time spent by all map tasks (ms)=5468
        Total time spent by all reduce tasks (ms)=4150
        Total vcore-seconds taken by all map tasks=5468
        Total vcore-seconds taken by all reduce tasks=4150
        Total megabyte-seconds taken by all map tasks=5599232
        Total megabyte-seconds taken by all reduce tasks=4249600
    Map-Reduce Framework
        Map input records=31
        Map output records=179
        Map output bytes=2055
        Map output materialized bytes=1836
        Input split bytes=105
        Combine input records=179
        Combine output records=131
        Reduce input groups=131
        Reduce shuffle bytes=1836
        Reduce input records=131
        Reduce output records=131
        Spilled Records=262
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=139
        CPU time spent (ms)=1230
        Physical memory (bytes) snapshot=304541696
        Virtual memory (bytes) snapshot=4119244800
        Total committed heap usage (bytes)=182194176
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1366
    File Output Format Counters
        Bytes Written=1306
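To see the actual word counts, read the reducer output from HDFS with `hdfs dfs -cat output/part-r-00000`. The same counting can be mimicked locally in plain shell as a quick sanity check of what wordcount computes (the sample text below is made up):

```shell
# wordcount in miniature: split into one word per line, sort, count duplicates
printf 'hello world\nhello hadoop\n' | tr ' ' '\n' | sort | uniq -c
```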
When running this MapReduce job, you may run into the following problems:
1. A connection exception
Check that all daemons started successfully and that the firewall is off.
2. The log hangs at map 0% reduce 0%
Check that all daemons started successfully and that /etc/hosts is configured correctly.
3. Any other exception
Check that all daemons started successfully, and inspect the logs on each node for errors or exceptions.