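Installing Hadoop
Hadoop installation modes
Hadoop can be installed in three modes: standalone mode, pseudo-distributed mode, and fully distributed mode.
Standalone mode is the default and uses the fewest resources; no configuration files need to be changed. Everything runs locally: it does not talk to other nodes, does not use the Hadoop file system, and does not start any daemons. It is mainly used for developing and debugging MapReduce applications.
Pseudo-distributed mode is a "single-node cluster": all daemons run on the same machine. Compared with standalone mode it gives you more to debug against, because you can inspect memory usage, HDFS input and output, and the interaction between daemons.
Fully distributed mode is a real distributed cluster configuration and is what production environments use.
While you are starting out, Xiaojiang recommends the pseudo-distributed mode: it is easy to debug and spares you the many problems of a distributed environment that would otherwise pull you away from the main line of study. Distributed cluster installation and maintenance are covered in the intermediate stage of this course; for now the goal is a single-node, pseudo-distributed Hadoop environment.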
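Preparing the installation environment
One computer. Hardware: 4 GB of memory or more is recommended (2 GB minimum), with at least 30 GB of free disk space. If your machine falls short of this, don't worry: Xiaojiang has a trick for that later on. For now, read on carefully.
Hadoop is developed and tested mainly on Linux, and this course runs it there. Xiaojiang knows that almost all of us study and work on Windows, so we either emulate a Linux environment on Windows with Cygwin or set up a Linux virtual machine (P.S. there is also a lazy shortcut, but so that everyone learns properly, Xiaojiang will save it for later). Skip the Cygwin approach; if we are going to learn this, let's do it properly and work with a Linux virtual machine.
1. First, install VMware Workstation on Windows. Xiaojiang won't go into the details here.
2. Install a Linux virtual machine in VMware; CentOS 6.5 is recommended.
3. Install JDK 1.7 or later on the Linux virtual machine.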
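Installation and configuration
1. Configure hosts. Assume the Linux machine's address is 192.168.1.123 and that it is reachable from Windows. As root, edit /etc/hosts and add the following entry:
- 192.168.1.123 single.hadoop.dajiangtai.com
Hadoop uses quite a few ports, so for this learning setup it is simplest to turn the firewall off and avoid unnecessary problems. In production, control access to the relevant ports instead.
- [root@single-hadoop-dajiangtai-com ~]# service iptables stop
- [root@single-hadoop-dajiangtai-com ~]# chkconfig iptables off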
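Before moving on, it can help to confirm that the hosts entry from step 1 is really in place, since a typo here tends to surface much later as a confusing connection error. The following is a minimal sketch; the helper name has_hosts_entry is made up for this illustration and is not part of Hadoop or CentOS.

```shell
# has_hosts_entry FILE IP HOSTNAME
# Hypothetical helper: succeeds if FILE has a non-comment line that maps
# IP to HOSTNAME (as the canonical name or as an alias).
has_hosts_entry() {
    awk -v ip="$2" -v host="$3" '
        $1 ~ /^#/ { next }                  # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == host) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Usage on the Linux machine:
#   has_hosts_entry /etc/hosts 192.168.1.123 single.hadoop.dajiangtai.com \
#       && echo "hosts entry found"
```

The function only reads the file, so it can be run as any user.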
2. Create a dedicated Hadoop user and group.
- [root@single-hadoop-dajiangtai-com ~]# groupadd hadoop # create the group
- [root@single-hadoop-dajiangtai-com ~]# useradd -g hadoop hadoop # create the hadoop user and add it to the hadoop group
- [root@single-hadoop-dajiangtai-com ~]# passwd hadoop # set the password
- # this tutorial sets the hadoop user's password to: dajiangtai
3. Download and unpack Hadoop 2.2.0. Xiaojiang puts the files under /usr/java/. Note: before downloading, check whether your Linux system is 64-bit or 32-bit (for example with uname -m) and download the matching build; picking the wrong one causes many problems later.
32-bit system:
- [root@single-hadoop-dajiangtai-com ~]# cd /usr/java/ # as root, download hadoop into /usr/java
- [root@single-hadoop-dajiangtai-com java]# wget http://hadoop.f.dajiangtai.com/hadoop2.2/hadoop-2.2.0.tar.gz # download hadoop-2.2.0.tar.gz
- [root@single-hadoop-dajiangtai-com java]# tar zxvf hadoop-2.2.0.tar.gz # unpack
- [root@single-hadoop-dajiangtai-com java]# chown -R hadoop:hadoop hadoop-2.2.0 # give the hadoop user ownership of hadoop-2.2.0
- [root@single-hadoop-dajiangtai-com java]# rm hadoop-2.2.0.tar.gz # delete the installation archive
64-bit system:
- [root@single-hadoop-dajiangtai-com ~]# cd /usr/java/
- [root@single-hadoop-dajiangtai-com java]# wget http://hadoop.f.dajiangtai.com/hadoop2.2/hadoop-2.2.0-x64.tar.gz # download hadoop-2.2.0-x64.tar.gz
- [root@single-hadoop-dajiangtai-com java]# tar zxvf hadoop-2.2.0-x64.tar.gz # unpack
- [root@single-hadoop-dajiangtai-com java]# chown -R hadoop:hadoop hadoop-2.2.0-x64 # give the hadoop user ownership of hadoop-2.2.0-x64
- [root@single-hadoop-dajiangtai-com java]# rm hadoop-2.2.0-x64.tar.gz # delete the installation archive
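The choice between the two archives in step 3 can be derived from the output of uname -m rather than picked by hand. This is a small sketch under the assumption that the machine is an x86 system; the function name hadoop_tarball_for_arch is invented for the illustration, and only the two archive names used in this tutorial are known to it.

```shell
# hadoop_tarball_for_arch ARCH
# Hypothetical helper: prints the archive name used in this tutorial for
# the given machine architecture string (as printed by `uname -m`).
hadoop_tarball_for_arch() {
    case "$1" in
        x86_64)
            echo "hadoop-2.2.0-x64.tar.gz" ;;
        i386|i486|i586|i686)
            echo "hadoop-2.2.0.tar.gz" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Usage:
#   hadoop_tarball_for_arch "$(uname -m)"
```

On a 64-bit CentOS install, uname -m prints x86_64, so the function prints hadoop-2.2.0-x64.tar.gz.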
4. Create the Hadoop data directories.
- [root@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]# mkdir -p /data/dfs/name
- [root@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]# mkdir -p /data/dfs/data
- [root@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]# mkdir -p /data/tmp
- [root@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]# chown -R hadoop:hadoop /data # give the hadoop user ownership of /data
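If the ownership change in step 4 is skipped or mistyped, the daemons will later fail to write their data. A quick check, run as the hadoop user, is sketched below; check_data_dirs is a hypothetical helper written for this tutorial, not a Hadoop command.

```shell
# check_data_dirs DIR...
# Hypothetical helper: prints every DIR that is missing or not writable by
# the current user, and returns non-zero if any such directory was found.
check_data_dirs() {
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
            echo "not usable: $dir"
            status=1
        fi
    done
    return "$status"
}

# Usage (as the hadoop user, not root, because root can write anywhere):
#   check_data_dirs /data/dfs/name /data/dfs/data /data/tmp
```

No output means all three directories exist and are writable.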
5. Edit the etc/hadoop/core-site.xml configuration file and add the following.
[hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ vi etc/hadoop/core-site.xml
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://single.hadoop.dajiangtai.com:9000</value>
- </property>
- <property>
- <name>io.file.buffer.size</name>
- <value>131072</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>file:/data/tmp</value>
- <description>Abase for other temporary directories.</description>
- </property>
- <property>
- <name>hadoop.proxyuser.hduser.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.hduser.groups</name>
- <value>*</value>
- </property>
- </configuration>
6. Edit the etc/hadoop/hdfs-site.xml configuration file and add the following.
[hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ vi etc/hadoop/hdfs-site.xml
- <?xml version="1.0" encoding="UTF-8"?>
- <configuration>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>/data/dfs/name</value>
- <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
- <final>true</final>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>/data/dfs/data</value>
- <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
- <final>true</final>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.permissions</name>
- <value>false</value>
- </property>
- </configuration>
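7. Edit the etc/hadoop/mapred-site.xml configuration file and add the following. (The Hadoop 2.2.0 archive ships only etc/hadoop/mapred-site.xml.template; if mapred-site.xml does not exist yet, copy the template to that name first.)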
[hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ vi etc/hadoop/mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- </configuration>
8. Edit the etc/hadoop/yarn-site.xml configuration file and add the following.
[hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ vi etc/hadoop/yarn-site.xml
- <?xml version="1.0"?>
- <configuration>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- </configuration>
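After editing the four configuration files in steps 5 to 8, it is easy to leave a typo in a property name or value. The sketch below reads a single property back out of a site file so it can be compared with what was intended. It is deliberately simple and assumes the layout used in this tutorial, with the name and value elements on separate lines; get_property is a made-up helper, not a Hadoop tool.

```shell
# get_property FILE NAME
# Hypothetical helper: prints the <value> that follows <name>NAME</name>
# in a Hadoop site file. Assumes <name> and <value> are on separate lines,
# as in the listings above; it is not a general XML parser.
get_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { want = 1; next }
        want && /<value>/ {
            sub(/^.*<value>/, "")       # drop everything up to <value>
            sub(/<\/value>.*$/, "")     # drop </value> and what follows
            print
            exit
        }
    ' "$1"
}

# Usage from the Hadoop installation directory:
#   get_property etc/hadoop/core-site.xml fs.defaultFS
#   get_property etc/hadoop/hdfs-site.xml dfs.replication
#   get_property etc/hadoop/mapred-site.xml mapreduce.framework.name
```

With the files from this tutorial, the three usage lines should print the values entered above for those properties.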
9. Edit etc/hadoop/slaves and add the following.
[hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ vi etc/hadoop/slaves
- single.hadoop.dajiangtai.com
10. Set the Hadoop environment variables. As root, create the file /etc/profile.d/hadoop.sh with the following content:
- [root@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]# vi /etc/profile.d/hadoop.sh
32-bit system:
- HADOOP_HOME=/usr/java/hadoop-2.2.0
- PATH=$HADOOP_HOME/bin:$PATH
- export HADOOP_HOME PATH
64-bit system:
- HADOOP_HOME=/usr/java/hadoop-2.2.0-x64
- PATH=$HADOOP_HOME/bin:$PATH
- export HADOOP_HOME PATH
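Scripts in /etc/profile.d are read when a login shell starts, so the new variables only appear after logging in again or sourcing the file by hand. The sketch below checks whether a directory is on PATH; path_has_dir is a hypothetical helper used only for this illustration.

```shell
# path_has_dir DIR
# Hypothetical helper: succeeds if DIR is one of the entries in $PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage, in the shell you will run Hadoop from:
#   . /etc/profile.d/hadoop.sh
#   path_has_dir "$HADOOP_HOME/bin" && echo "hadoop is on PATH"
```

Wrapping PATH in colons before matching avoids false positives from directories that merely share a prefix.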
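Configuring passwordless SSH
Hadoop's start scripts use SSH to log in to each node and launch its daemons. In a single-node deployment the same machine is both NameNode and DataNode, so the hadoop user needs passwordless SSH access to the local machine. Set it up as follows:
- [root@single-hadoop-dajiangtai-com ~]# su hadoop # switch to the hadoop user
- [hadoop@single-hadoop-dajiangtai-com root]$ cd # cd with no arguments goes to the home directory, /home/hadoop
- [hadoop@single-hadoop-dajiangtai-com ~]$ mkdir .ssh # create the .ssh directory
- [hadoop@single-hadoop-dajiangtai-com ~]$ ssh-keygen -t rsa # after entering this command, keep pressing Enter to accept the defaults
- [hadoop@single-hadoop-dajiangtai-com ~]$ cd .ssh # change into the .ssh directory
- [hadoop@single-hadoop-dajiangtai-com .ssh]$ cp id_rsa.pub authorized_keys # copy the generated id_rsa.pub to authorized_keys
- [hadoop@single-hadoop-dajiangtai-com .ssh]$ cd .. # go back to the home directory
- [hadoop@single-hadoop-dajiangtai-com ~]$ chmod 700 .ssh # the .ssh directory must have mode 700
- [hadoop@single-hadoop-dajiangtai-com ~]$ chmod 600 .ssh/* # the files inside .ssh get mode 600
- [hadoop@single-hadoop-dajiangtai-com ~]$ ssh single.hadoop.dajiangtai.com # on the first connection, answer yes to accept the host key; if you are then logged in without a password prompt, the setup works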
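When key-based login still prompts for a password, overly open permissions on .ssh or authorized_keys are a common cause. The sketch below checks the two modes set above. It assumes GNU stat (the -c option), which is what CentOS provides; mode_is and check_ssh_perms are helper names invented for this example.

```shell
# mode_is PATH MODE
# Succeeds if PATH has exactly the octal permission MODE (GNU stat assumed).
mode_is() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# check_ssh_perms SSH_DIR
# Hypothetical helper: verifies the modes this tutorial sets on the .ssh
# directory and on authorized_keys, and reports the first mismatch.
check_ssh_perms() {
    mode_is "$1" 700 || { echo "$1 should have mode 700"; return 1; }
    mode_is "$1/authorized_keys" 600 || {
        echo "$1/authorized_keys should have mode 600"
        return 1
    }
}

# Usage (as the hadoop user):
#   check_ssh_perms "$HOME/.ssh"
#   ssh -o BatchMode=yes single.hadoop.dajiangtai.com true \
#       && echo "passwordless login works"
```

BatchMode=yes makes ssh fail instead of prompting, so the second usage line is only meaningful after the host key has been accepted once by hand.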
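Test run
1. Format the file system and start Hadoop. (In Hadoop 2.x both commands below still work but print deprecation notices; the current equivalents are bin/hdfs namenode -format, and sbin/start-dfs.sh followed by sbin/start-yarn.sh.)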
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ bin/hadoop namenode -format
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ sbin/start-all.sh
2. Check that every daemon has started by running jps on the Linux command line. If startup succeeded, you will see the following processes:
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ jps
- 2336 Jps
- 1819 SecondaryNameNode
- 1561 NameNode
- 2052 NodeManager
- 1951 ResourceManager
- 1653 DataNode
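Comparing the jps listing against the expected daemons by eye gets tedious when a restart is needed several times. The following sketch reads jps output on standard input and prints the names of any of the five daemons shown above that are absent; missing_daemons is a hypothetical helper, and the list of expected names is taken from the listing in this step.

```shell
# missing_daemons
# Hypothetical helper: reads `jps` output on stdin and prints each expected
# Hadoop daemon that does not appear in it, one name per line.
missing_daemons() {
    expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"
    # Collect the process names (second column) into " name1 name2 ... ".
    seen=" $(awk '{ printf "%s ", $2 }')"
    for daemon in $expected; do
        case "$seen" in
            *" $daemon "*) ;;            # daemon is running
            *) echo "$daemon" ;;         # daemon is missing
        esac
    done
}

# Usage:
#   jps | missing_daemons
```

No output means all five daemons are running. A missing name is a pointer to which log file under the Hadoop logs directory to read first.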
3. From Windows, open http://192.168.1.123:50070 to reach the web UI and view the state of the NameNode, the cluster, and the file system.
4. Create a directory in HDFS by running bin/hadoop fs -mkdir /dajiangtai:
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ bin/hadoop fs -mkdir /dajiangtai
List the HDFS file system by running bin/hadoop fs -ls /; the result looks like this:
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ bin/hadoop fs -ls /
- Found 1 items
- drwxr-xr-x - hadoop supergroup 0 2015-01-16 23:39 /dajiangtai
5. Run the bundled wordcount program. First create a file named djt.txt in the hadoop user's home directory, /home/hadoop, with the following content:
- Hi this is Dajiangtai
- Dajiangtai is an IT study platform
- Hello hadoop
- Hello Dajiangtai
Upload djt.txt to the /dajiangtai directory in HDFS with the following command:
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ bin/hadoop fs -put /home/hadoop/djt.txt /dajiangtai
Check that the upload succeeded; the result looks like this:
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ bin/hadoop fs -ls /dajiangtai
- Found 1 items
- -rw-r--r-- 1 hadoop supergroup 87 2015-01-16 23:57 /dajiangtai/djt.txt
Now run the wordcount program; the result looks like this:
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /dajiangtai/djt.txt /dajiangtai/wordcount-out
- 15/01/17 00:09:15 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
- 15/01/17 00:09:16 INFO input.FileInputFormat: Total input paths to process : 1
- 15/01/17 00:09:16 INFO mapreduce.JobSubmitter: number of splits:1
- 15/01/17 00:09:16 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
- 15/01/17 00:09:16 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
- 15/01/17 00:09:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1421422334389_0001
- 15/01/17 00:09:17 INFO impl.YarnClientImpl: Submitted application application_1421422334389_0001 to ResourceManager at /0.0.0.0:8032
- 15/01/17 00:09:17 INFO mapreduce.Job: The url to track the job: http://single-hadoop-dajiangtai-com:8088/proxy/application_1421422334389_0001/
- 15/01/17 00:09:17 INFO mapreduce.Job: Running job: job_1421422334389_0001
- 15/01/17 00:09:26 INFO mapreduce.Job: Job job_1421422334389_0001 running in uber mode : false
- 15/01/17 00:09:26 INFO mapreduce.Job: map 0% reduce 0%
- 15/01/17 00:09:34 INFO mapreduce.Job: map 100% reduce 0%
- 15/01/17 00:09:46 INFO mapreduce.Job: map 100% reduce 100%
- 15/01/17 00:09:47 INFO mapreduce.Job: Job job_1421422334389_0001 completed successfully
- 15/01/17 00:09:47 INFO mapreduce.Job: Counters: 43
- File System Counters
- FILE: Number of bytes read=122
- FILE: Number of bytes written=158667
- FILE: Number of read operations=0
- FILE: Number of large read operations=0
- FILE: Number of write operations=0
- HDFS: Number of bytes read=211
- HDFS: Number of bytes written=76
- HDFS: Number of read operations=6
- HDFS: Number of large read operations=0
- HDFS: Number of write operations=2
- Job Counters
- Launched map tasks=1
- Launched reduce tasks=1
- Data-local map tasks=1
- Total time spent by all maps in occupied slots (ms)=6964
- Total time spent by all reduces in occupied slots (ms)=8222
- Map-Reduce Framework
- Map input records=4
- Map output records=14
- Map output bytes=143
- Map output materialized bytes=122
- Input split bytes=124
- Combine input records=14
- Combine output records=10
- Reduce input groups=10
- Reduce shuffle bytes=122
- Reduce input records=10
- Reduce output records=10
- Spilled Records=20
- Shuffled Maps =1
- Failed Shuffles=0
- Merged Map outputs=1
- GC time elapsed (ms)=130
- CPU time spent (ms)=1640
- Physical memory (bytes) snapshot=304041984
- Virtual memory (bytes) snapshot=1687367680
- Total committed heap usage (bytes)=136646656
- Shuffle Errors
- BAD_ID=0
- CONNECTION=0
- IO_ERROR=0
- WRONG_LENGTH=0
- WRONG_MAP=0
- WRONG_REDUCE=0
- File Input Format Counters
- Bytes Read=87
- File Output Format Counters
- Bytes Written=76
View the result of the run:
- [hadoop@single-hadoop-dajiangtai-com hadoop-2.2.0-x64]$ bin/hadoop fs -text /dajiangtai/wordcount-out/part-r-00000
- Dajiangtai 3
- Hello 2
- Hi 1
- IT 1
- an 1
- hadoop 1
- is 2
- platform 1
- study 1
- this 1
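To convince yourself that the job computed the right thing, the same counts can be reproduced locally with standard Unix tools and compared with the HDFS output. This is only a sketch of an equivalent computation, not how the bundled example is implemented; local_wordcount is a made-up helper. It splits on whitespace and sorts in byte order (LC_ALL=C), which puts uppercase words before lowercase ones, as in the listing above.

```shell
# local_wordcount FILE
# Hypothetical helper: prints "word<TAB>count" for every whitespace-separated
# word in FILE, sorted in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |     # one word per line
        grep -v '^$' |                  # drop empty lines
        LC_ALL=C sort |                 # byte-order sort, uppercase first
        uniq -c |                       # count identical adjacent words
        awk '{ print $2 "\t" $1 }'      # reorder to word, then count
}

# Usage:
#   local_wordcount /home/hadoop/djt.txt
```

For the djt.txt file from step 5 this prints the same ten words and counts as the part-r-00000 file shown above.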
If everything above worked, congratulations: your pseudo-distributed Hadoop environment is up and running!
