Hadoop Study Notes (3)
I. Setting up a fully distributed Hadoop cluster
192.168.80.128 hadoop.fengyue.com
192.168.80.129 hadoop.fengyue02.com 00:0c:29:b4:52:98
192.168.80.130 hadoop.fengyue03.com 00:50:56:2b:92:8e
1. On the clone hadoop.fengyue02.com:
    vim /etc/udev/rules.d/70-persistent-net.rules  delete the eth0 entry, rename the remaining NIC to eth0, and record its MAC address;
    vim /etc/sysconfig/network-scripts/ifcfg-eth0  set HWADDR to the recorded MAC address.
    Reboot the machine and configure a static IP (sketch below).
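A minimal ifcfg-eth0 sketch for hadoop.fengyue02.com. HWADDR and IPADDR come from the table above; the GATEWAY/DNS values are assumptions for a typical VMware NAT setup:
    DEVICE=eth0
    TYPE=Ethernet
    ONBOOT=yes
    BOOTPROTO=static
    HWADDR=00:0c:29:b4:52:98
    IPADDR=192.168.80.129
    NETMASK=255.255.255.0
    GATEWAY=192.168.80.2    # assumed
    DNS1=192.168.80.2       # assumed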
2. Change the hostname
    hostname hadoop.fengyue02.com; then make it permanent via HOSTNAME in /etc/sysconfig/network
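On CentOS 6 the persistent hostname lives in /etc/sysconfig/network; a minimal sketch:
    NETWORKING=yes
    HOSTNAME=hadoop.fengyue02.com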
3. Configure /etc/hosts on every machine:
192.168.80.128 hadoop.fengyue.com hadoop.fengyue
192.168.80.129 hadoop.fengyue02.com hadoop.fengyue02
192.168.80.130 hadoop.fengyue03.com hadoop.fengyue03
====================================================================================
                128                 129                 130
HDFS            NameNode
                DataNode            DataNode            DataNode
                                                        SecondaryNameNode
YARN                                ResourceManager
                NodeManager         NodeManager         NodeManager
MapReduce       JobHistoryServer
Configuration:
    * HDFS
        * hadoop-env.sh   set JAVA_HOME
        * core-site.xml   NameNode address, temp directory, trash interval (see the sketch after this list)
        * hdfs-site.xml
        * slaves
    * YARN
        * yarn-env.sh
        * yarn-site.xml
        * slaves
    * MapReduce
        * mapred-env.sh
        * mapred-site.xml
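For reference, a minimal core-site.xml sketch covering those three items (the tmp path follows the install prefix used elsewhere in these notes; the trash interval is an arbitrary example, in minutes):
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop.fengyue.com:8020</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/modules/hadoop-2.5.0/data/tmp</value>
        </property>
        <!-- keep deleted files in .Trash for 7 days (value is in minutes) -->
        <property>
            <name>fs.trash.interval</name>
            <value>10080</value>
        </property>
    </configuration>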
Starting the cluster:
Run each start script on its service's master node:
    hadoop.fengyue.com      sbin/start-dfs.sh
    hadoop.fengyue02.com    sbin/start-yarn.sh
Notes:
1. The NameNode must be formatted first (bin/hdfs namenode -format).
2. Each master node (NameNode, ResourceManager) must ssh-copy-id to itself, not just to the other nodes; see the sketch after this list.
3. Before opening the web UIs, make sure the firewall is off.
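A sketch of the key distribution for note 2, run on both hadoop.fengyue.com and hadoop.fengyue02.com:
    ssh-keygen -t rsa
    ssh-copy-id hadoop.fengyue.com      # the master also copies the key to itself
    ssh-copy-id hadoop.fengyue02.com
    ssh-copy-id hadoop.fengyue03.com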
Cluster testing:
Basic tests:
    Start HDFS:                 sbin/start-dfs.sh
    Start YARN:                 sbin/start-yarn.sh
    Start the JobHistoryServer: sbin/mr-jobhistory-daemon.sh start historyserver
    Upload a file:      bin/hdfs dfs -put /opt/modules/hadoop-2.5.0/wcinput /user/hadoop/fengyue/mapreduce/input/
    Create a directory: bin/hdfs dfs -mkdir -p /user/hadoop/fengyue/mapreduce/input
    Create a file:      bin/hdfs dfs -touchz /user/hadoop/fengyue/mapreduce/input/wc.input
    Run wordcount:      bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /user/hadoop/fengyue/mapreduce/input/wcinput/wc.input /user/hadoop/fengyue/mapreduce/output/wcoutput/
Benchmark tests:
Measure cluster performance:
    * HDFS: read and write throughput
    * YARN: run a fixed volume of map and reduce tasks
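Hadoop ships these benchmarks in the jobclient tests jar; a sketch using TestDFSIO (jar name per the 2.5.0 layout; file count and size are arbitrary):
    # write benchmark, then read back the same files
    bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
    bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128MB
    # clean up the benchmark output
    bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -clean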
Synchronize the clocks of the machines on the internal network, using hadoop.fengyue.com as the time server:
    vim /etc/ntp.conf
    vim /etc/sysconfig/ntpd
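A sketch of the usual edits (the subnet matches this cluster; serving the local clock as a stratum-10 fallback is the common pattern for an isolated network):
    # /etc/ntp.conf: let the 192.168.80.0/24 subnet sync from this machine
    restrict 192.168.80.0 mask 255.255.255.0 nomodify notrap
    # comment out the external pool servers and serve the local clock instead
    server 127.127.1.0
    fudge  127.127.1.0 stratum 10

    # /etc/sysconfig/ntpd: also keep the hardware clock in sync
    SYNC_HWCLOCK=yes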
Start the service and enable it at boot:
    service ntpd status
    service ntpd start
    chkconfig ntpd on
The other machines pull the time from the time server every 10 minutes:
    crontab -e
    0-59/10 * * * * /usr/sbin/ntpdate hadoop.fengyue.com
ZooKeeper:
See the separate notes.
NameNode HA
HDFS High Availability Using the Quorum Journal Manager
NameNode Active
NameNode Standby
HA configuration approach:
    1. Write the edit log to a set of JournalNodes.
    2. Configure two NameNodes (one active, one standby).
    3. Configure a failover proxy provider so clients can find the currently active NameNode.
    4. Fence the two NameNodes so that only one can be active at a time.
System layout:
                   128    129    130
    NameNode       nn1    nn2    -
    JournalNode    yes    yes    yes
Note: turn off the firewall on every node.
hdfs-site.xml
<configuration>
    <!-- dfs.nameservices -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <!-- dfs.ha.namenodes.[nameservice ID] -->
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <!-- dfs.namenode.rpc-address.[nameservice ID].[name node ID] -->
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hadoop.fengyue.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hadoop.fengyue02.com:8020</value>
    </property>
    <!-- dfs.namenode.http-address.[nameservice ID].[name node ID] -->
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hadoop.fengyue.com:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hadoop.fengyue02.com:50070</value>
    </property>
    <!-- dfs.namenode.shared.edits.dir -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop.fengyue.com:8485;hadoop.fengyue02.com:8485;hadoop.fengyue03.com:8485/ns1</value>
    </property>
    <!-- dfs.client.failover.proxy.provider.[nameservice ID] -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- dfs.ha.fencing.methods -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>
</configuration>
core-site.xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/app/hadoop-2.5.0/data/dist_tmp</value>
</property>
HA startup steps:
1. Start the JournalNode on every machine:
    sbin/hadoop-daemon.sh start journalnode
2. Format nn1 and start it:
    bin/hdfs namenode -format
    sbin/hadoop-daemon.sh start namenode
3. On nn2, sync nn1's metadata:
    bin/hdfs namenode -bootstrapStandby
4. Start nn2:
    sbin/hadoop-daemon.sh start namenode
5. Transition nn1 to active:
    bin/hdfs haadmin -transitionToActive nn1
6. From nn1, start the DataNodes on all slaves:
    sbin/hadoop-daemons.sh start datanode
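To confirm the roles afterwards, haadmin can query each NameNode's state:
    bin/hdfs haadmin -getServiceState nn1
    bin/hdfs haadmin -getServiceState nn2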
HA automatic failover
Configuration for automatic failover:
hdfs-site.xml
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
core-site.xml
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop.fengyue.com:2181,hadoop.fengyue02.com:2181,hadoop.fengyue03.com:2181</value>
</property>
Startup steps:
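A sketch of the usual sequence, following the QJM HA guide referenced above (the zkServer.sh path assumes a stock ZooKeeper install):
    1. Stop HDFS:                                            sbin/stop-dfs.sh
    2. Start ZooKeeper on all three machines:                bin/zkServer.sh start
    3. Initialize the HA state in ZooKeeper (once, on nn1):  bin/hdfs zkfc -formatZK
    4. Start HDFS (also starts the ZKFC daemons):            sbin/start-dfs.sh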
