Deploying a Hadoop 2.7.3 Cluster on CentOS 6.5/7.2
Operating system: CentOS 6.5/7.2
Hadoop version: hadoop-2.7.3
JDK version: jdk-7u79-linux-x64.gz
Host list:
master IP: 192.168.0.251
slave IP: 192.168.0.253
Set up the hosts file; keep it identical on both hosts.
[root@master ~]# cat /etc/hosts
192.168.0.251 master
192.168.0.253 slave
Installing and configuring passwordless SSH login
Run the following on master and slave respectively (press Enter at each ssh-keygen prompt until it finishes):

[root@master ~]# ssh-keygen -t rsa
[root@master ~]# ssh-copy-id -i slave
[root@master ~]# ssh-copy-id -i master
[root@slave ~]# ssh-keygen -t rsa
[root@slave ~]# ssh-copy-id -i slave
[root@slave ~]# ssh-copy-id -i master
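To confirm that passwordless login works in both directions, a quick check is to run a remote command from each host; it should complete without prompting for a password:

ssh slave hostname     # run on master; prints the slave's hostname with no password prompt
ssh master hostname    # run on slave; same check in the other direction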
Installing and removing the JDK

Perform the same steps on both hosts.

Remove the existing JDK:

# Check the JDK installed by default with the operating system
rpm -qa | grep jdk
# If one is installed, remove it
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64

Install the JDK. Create a java directory under /home, download the JDK into it, and extract it under /home/java:

cd /home
mkdir java
cd java
tar -zxvf jdk-7u79-linux-x64.gz

Edit /etc/profile (vim /etc/profile) and append:

export JAVA_HOME=/home/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

Make /etc/profile take effect without restarting the operating system:

source /etc/profile

Check the Java installation:

[root@master java]# java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
[root@master java]#
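The exact openjdk package names vary between CentOS releases, so rather than typing each one, a small loop (a sketch, not part of the original steps) removes whatever rpm -qa | grep jdk finds:

# Remove every installed package whose name contains "jdk"
for pkg in $(rpm -qa | grep jdk); do
    rpm -e --nodeps "$pkg"
done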
Installing Hadoop 2.7.3 on the master

Download Hadoop 2.7.3:

mkdir /usr/hadoop
wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Extract hadoop-2.7.3.tar.gz under /usr/hadoop:

tar -zxvf hadoop-2.7.3.tar.gz

Create the name, data, and tmp directories:

[root@master hadoop]# pwd
/usr/hadoop
[root@master hadoop]# mkdir -p dfs/name
[root@master hadoop]# mkdir -p dfs/data
[root@master hadoop]# mkdir tmp

Configure the Hadoop environment variables by appending to /etc/profile on both hosts (vim /etc/profile), extending PATH with the Hadoop entries:

export PATH=$PATH:/usr/hadoop/hadoop-2.7.3/sbin:/usr/hadoop/hadoop-2.7.3/bin

Run source /etc/profile to make it take effect.
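Once /etc/profile has been sourced, the hadoop binary should resolve from anywhere; a quick sanity check of the PATH setup:

hadoop version    # the first line of output should read: Hadoop 2.7.3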
Editing the environment configuration files

All of the following operations are performed under /usr/hadoop/hadoop-2.7.3 (cd /usr/hadoop/hadoop-2.7.3).

Edit the etc/hadoop/hadoop-env.sh, yarn-env.sh, and mapred-env.sh files, adding a JAVA_HOME entry:

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/home/java/jdk1.7.0_79
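Since the same line goes into all three files, one way to apply it in one pass (a sketch; appending works because the last assignment in each script wins) is:

cd /usr/hadoop/hadoop-2.7.3
for f in etc/hadoop/hadoop-env.sh etc/hadoop/yarn-env.sh etc/hadoop/mapred-env.sh; do
    echo 'export JAVA_HOME=/home/java/jdk1.7.0_79' >> "$f"
done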
Editing the Hadoop configuration files

Edit the etc/hadoop/core-site.xml file:

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

Edit the etc/hadoop/hdfs-site.xml file:

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

Edit the etc/hadoop/yarn-site.xml file:

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

Edit the etc/hadoop/mapred-site.xml file. It does not exist by default, so create it by copying the template first:

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

Edit the etc/hadoop/slaves file and add the slave host:

[root@master hadoop]# cat slaves
slave
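All four files are plain XML, so a malformed edit is easy to catch before starting the cluster. Assuming the libxml2 package (which provides xmllint) is installed, a quick well-formedness check is:

xmllint --noout etc/hadoop/core-site.xml etc/hadoop/hdfs-site.xml etc/hadoop/yarn-site.xml etc/hadoop/mapred-site.xml
# No output means all four files parse cleanly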
Copying Hadoop to the slave host

Pack up the /usr/hadoop/hadoop-2.7.3 directory and copy it to the same path on the slave node so that the slave's environment and configuration stay identical to the master's:

scp -r /usr/hadoop root@slave:/usr/
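scp -r is the simplest route; for a large tree, an alternative sketch that compresses in transit is to pipe tar over ssh:

# Archive /usr/hadoop on the master and unpack it into /usr on the slave
tar -C /usr -czf - hadoop | ssh root@slave 'tar -C /usr -xzf -'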
Formatting the cluster

Run on the master:

[root@master hadoop]# hdfs namenode -format

If the output contains the word "successfully", the format worked. The message is easy to miss, so read the output carefully.
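Because that message scrolls past quickly, one way (a sketch) to capture it when formatting is to save the output to a file and grep for it afterwards:

hdfs namenode -format 2>&1 | tee /tmp/namenode-format.log
grep -i 'successfully formatted' /tmp/namenode-format.log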
Starting the cluster

Start the file services:

[root@master hadoop-2.7.3]# pwd
/usr/hadoop/hadoop-2.7.3
[root@master hadoop-2.7.3]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/hadoop/hadoop-2.7.3/logs/hadoop-root-namenode-master.out
slave: starting datanode, logging to /usr/hadoop/hadoop-2.7.3/logs/hadoop-root-datanode-slave.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/hadoop-2.7.3/logs/yarn-root-resourcemanager-master.out
slave: starting nodemanager, logging to /usr/hadoop/hadoop-2.7.3/logs/yarn-root-nodemanager-slave.out

This command starts the namenode-related processes on the master and the datanode-related processes on the slave.
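As the deprecation notice in the output says, the same startup can be done in two explicit steps, which keeps HDFS and YARN startup separate:

sbin/start-dfs.sh     # namenode, secondarynamenode, datanodes
sbin/start-yarn.sh    # resourcemanager, nodemanagers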
Verifying the cluster

Run jps on each host to check the services:

[root@master hadoop-2.7.3]# jps
24830 SecondaryNameNode
25252 Jps
24635 NameNode
24993 ResourceManager
[root@master hadoop-2.7.3]#

[root@slave hadoop-2.7.3]# jps
15844 Jps
15596 DataNode
15713 NodeManager
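Beyond jps, HDFS itself can report whether the datanode has registered with the namenode; a quick check (a sketch filtering the standard dfsadmin report) is:

hdfs dfsadmin -report | grep -E 'Live datanodes|Name:'
# Expect "Live datanodes (1):" and an entry for the slave's address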
Testing Hadoop

Create a folder in the user file system:

[root@master hadoop-2.7.3]# hdfs dfs -mkdir /input
[root@master hadoop-2.7.3]# hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2016-12-08 16:32 /input
drwx------   - root supergroup          0 2016-12-06 14:06 /tmp

Copy local files into /input on the distributed file system:

[root@master hadoop-2.7.3]# hdfs dfs -put etc/hadoop/* /input

Run the wordcount example:

[root@master hadoop-2.7.3]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
16/12/08 16:35:06 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.0.251:8032
16/12/08 16:35:06 INFO input.FileInputFormat: Total input paths to process : 31
16/12/08 16:35:06 INFO mapreduce.JobSubmitter: number of splits:31
16/12/08 16:35:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1481185761339_0001
16/12/08 16:35:07 INFO impl.YarnClientImpl: Submitted application application_1481185761339_0001
16/12/08 16:35:07 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1481185761339_0001/
16/12/08 16:35:07 INFO mapreduce.Job: Running job: job_1481185761339_0001
16/12/08 16:35:13 INFO mapreduce.Job: Job job_1481185761339_0001 running in uber mode : false
16/12/08 16:35:13 INFO mapreduce.Job:  map 0% reduce 0%
16/12/08 16:35:20 INFO mapreduce.Job:  map 19% reduce 0%
16/12/08 16:35:24 INFO mapreduce.Job:  map 23% reduce 0%
16/12/08 16:35:25 INFO mapreduce.Job:  map 26% reduce 0%
(omitted)
16/12/08 16:35:44 INFO mapreduce.Job:  map 94% reduce 30%
16/12/08 16:35:45 INFO mapreduce.Job:  map 97% reduce 30%
16/12/08 16:35:46 INFO mapreduce.Job:  map 100% reduce 30%
16/12/08 16:35:47 INFO mapreduce.Job:  map 100% reduce 100%
(omitted)
        CPU time spent (ms)=11580
        Physical memory (bytes) snapshot=8510996480
        Virtual memory (bytes) snapshot=28232007680
        Total committed heap usage (bytes)=6442450944
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=81346
    File Output Format Counters
        Bytes Written=37663
[root@master hadoop-2.7.3]#

List the output in the distributed file system:

[root@master hadoop-2.7.3]# hdfs dfs -ls /output
Found 2 items
-rw-r--r--   1 root supergroup          0 2016-12-08 16:35 /output/_SUCCESS
-rw-r--r--   1 root supergroup      37663 2016-12-08 16:35 /output/part-r-00000

View the result values:

[root@master hadoop-2.7.3]# hdfs dfs -cat /output/part-r-00000
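The full part-r-00000 listing is long; a convenient sketch for just the most frequent words pipes the output through sort (each line is a word followed by its count, so the second field is the count):

hdfs dfs -cat /output/part-r-00000 | sort -k2 -nr | head -n 10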
Web access pages

HDFS NameNode UI: http://192.168.0.251:50070/
YARN ResourceManager UI: http://192.168.0.251:8088
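Both pages can also be probed from the shell to confirm the daemons are listening (assuming curl is available):

curl -s -o /dev/null -w '%{http_code}\n' http://192.168.0.251:50070/
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.0.251:8088
# 200 means the web UI is up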