Hadoop Setup (7): HDFS HA Setup

Hadoop Setup (3): Hadoop Cluster Setup

Hadoop Setup (5): Setting Up the ZooKeeper Cluster Environment

This article builds on the Hadoop cluster setup from the posts above.

First back up the configuration and data on the three existing nodes:

the /opt/modules/hadoop-2.8.2/etc/hadoop directory

the /opt/modules/hadoop-2.8.2/tmp directory

cp -r hadoop/ backup-hadoop
cp -r tmp/ backup-tmp

I. hdfs-site.xml Configuration

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <!-- mycluster is a custom value; the settings below must reference this same value -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- Identifiers for the two NameNodes -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- Host and RPC port of each NameNode -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>centos01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>centos02:8020</value>
    </property>
    <!-- Web UI address of each NameNode -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>centos01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>centos02:50070</value>
    </property>
    <!-- URI of the JournalNode group -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://centos01:8485;centos02:8485;centos03:8485/mycluster</value>
    </property>
    <!-- Directory where the JournalNodes store edit logs and state -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/modules/hadoop-2.8.2/tmp/dfs/jn</value>
    </property>
    <!-- Class clients use to locate the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing method that isolates the failed NameNode in the HA cluster -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- Private key file used for the ssh fencing above -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <!-- hadoop is the current username -->
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/data</value>
    </property>
</configuration>

 

II. core-site.xml Configuration

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- <value>hdfs://centos01:9000</value> -->
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/modules/hadoop-2.8.2/tmp</value>
    </property>
</configuration>

Changing hdfs://centos01:9000 to hdfs://mycluster lets Hadoop locate the two NameNodes behind the nameservice at startup.

Send hdfs-site.xml and core-site.xml to the other two nodes:

scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/
scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/
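The four scp commands above can also be folded into one loop. A sketch that dry-runs by default (it only prints each command, via the illustrative RUN variable); set RUN to empty to actually copy:

```shell
# Dry-run by default: prints each scp command instead of executing it.
# RUN and CONF are illustrative names; paths match the cluster layout above.
RUN=${RUN-echo}
CONF=/opt/modules/hadoop-2.8.2/etc/hadoop
for host in centos02 centos03; do
    for f in hdfs-site.xml core-site.xml; do
        $RUN scp "$CONF/$f" "hadoop@$host:$CONF/"
    done
done
```

Running it unchanged prints the four commands so the targets can be checked before copying for real.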

 

 

III. Startup and Testing

1. Start the JournalNode processes

Delete all files under $HADOOP_HOME/tmp on every node.

Then, on each of the three nodes, go to the Hadoop installation directory and start the JournalNode process:

sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start journalnode
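To confirm the JournalNode actually came up on a node, its process can be checked against the jps output (a small sketch; jps ships with the JDK):

```shell
# Print a confirmation line if a JournalNode process appears in jps output
jps 2>/dev/null | awk '$2 == "JournalNode" {print "JournalNode running, pid " $1}'
```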

 

2. Format the NameNode

Run the following on centos01 (formatting is done on a node where a NameNode runs):

bin/hdfs namenode -format

If the output contains the following line, formatting succeeded:

common.Storage: Storage directory /opt/modules/hadoop-2.8.2/tmp/dfs/name has been successfully formatted.

3. Start NameNode1 (the active NameNode)

Go to the Hadoop installation directory on centos01 and start namenode1:

sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start namenode

The fsimage metadata is generated after startup.

4. Copy NameNode1's metadata

Go to the Hadoop installation directory on centos02 and run the following to copy the NameNode metadata from centos01 to centos02 (alternatively, copy $HADOOP_HOME/tmp on centos01 to the same path on centos02):

sh /opt/modules/hadoop-2.8.2/bin/hdfs namenode -bootstrapStandby

If the output contains the following line, the copy succeeded:

common.Storage: Storage directory /opt/modules/hadoop-2.8.2/tmp/dfs/name has been successfully formatted.

5. Start NameNode2 (the standby NameNode)

Go to the Hadoop installation directory on centos02 and start namenode2:

sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start namenode

After startup, open http://192.168.0.171:50070 in a browser to check NameNode1's state,
and http://192.168.0.172:50070 to check NameNode2's state.

Both NameNodes show the standby state.

6. Set NameNode1's state to active

Go to the Hadoop installation directory on centos01 and run:

sh /opt/modules/hadoop-2.8.2/bin/hdfs haadmin -transitionToActive nn1

Refresh http://192.168.0.171:50070 to check NameNode1's state.

The state changes to active.

Note that the DataNodes have not been started yet.

7. Restart HDFS

Go to the Hadoop installation directory on centos01.

Stop HDFS:

sh sbin/stop-dfs.sh

Start HDFS:

sh sbin/start-dfs.sh

8. Set NameNode1's state to active again

After the restart, the NameNode, DataNode, and other processes are running, but NameNode1 must be set back to active:

sh /opt/modules/hadoop-2.8.2/bin/hdfs haadmin -transitionToActive nn1

Check the state with:

bin/hdfs haadmin -getServiceState nn1
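Both NameNodes can be checked in one pass. A sketch that falls back to "unknown" when a NameNode cannot be reached (nn1/nn2 are the IDs configured in hdfs-site.xml):

```shell
# Query the HA state of each NameNode; print "unknown" if the call fails
HADOOP_HOME=${HADOOP_HOME:-/opt/modules/hadoop-2.8.2}
for nn in nn1 nn2; do
    state=$("$HADOOP_HOME/bin/hdfs" haadmin -getServiceState "$nn" 2>/dev/null || echo unknown)
    echo "$nn: $state"
done
```

With the cluster up, one line should report active and the other standby.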

 

9. Check each node with jps

Check the processes on every node:

jps

 

10. Test HDFS failover

Kill the NameNode process on centos01 (kill -9 35396), then manually activate NameNode2 on centos02 (as in step 6).

When a failure occurs, the switch must be done manually.
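The NameNode PID differs per run, so instead of hard-coding it, the PID can be pulled from jps. A sketch of the manual drill (run the first command on centos01, the second on centos02); the fallback echo is only so the command fails gracefully when run outside the cluster:

```shell
# On centos01: find the NameNode PID in jps output and kill it
jps 2>/dev/null | awk '$2 == "NameNode" {print $1}' | xargs -r kill -9

# On centos02: promote nn2 to active
/opt/modules/hadoop-2.8.2/bin/hdfs haadmin -transitionToActive nn2 2>/dev/null \
    || echo "hdfs not reachable from this shell"
```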


IV. Automatic Failover with ZooKeeper (requires the ZooKeeper cluster and the ZKFailoverController (ZKFC) processes)

ZooKeeper's main roles here are failure detection and NameNode election.

 

1. Enable automatic failover

On centos01, add the following to hdfs-site.xml:

    <!-- Enable automatic failover; mycluster is the custom nameservice ID -->
    <property>
        <name>dfs.ha.automatic-failover.enabled.mycluster</name>
        <value>true</value>
    </property>

The complete configuration:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <!-- mycluster is a custom value; the settings below must reference this same value -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- Identifiers for the two NameNodes -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>
    <!-- Host and RPC port of each NameNode -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>centos01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>centos02:8020</value>
    </property>
    <!-- Web UI address of each NameNode -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>centos01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>centos02:50070</value>
    </property>
    <!-- URI of the JournalNode group -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://centos01:8485;centos02:8485;centos03:8485/mycluster</value>
    </property>
    <!-- Directory where the JournalNodes store edit logs and state -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/modules/hadoop-2.8.2/tmp/dfs/jn</value>
    </property>
    <!-- Class clients use to locate the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing method that isolates the failed NameNode in the HA cluster -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- Private key file used for the ssh fencing above -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <!-- hadoop is the current username -->
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- Enable automatic failover; mycluster is the custom nameservice ID -->
    <property>
        <name>dfs.ha.automatic-failover.enabled.mycluster</name>
        <value>true</value>
    </property>

    <!-- Timeout for the sshfence fencing mechanism -->
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

    <property>
        <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
        <value>60000</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/data</value>
    </property>
</configuration>

2. Specify the ZooKeeper cluster

On centos01, add the following to core-site.xml:

    <!-- ZooKeeper cluster nodes and ports -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>centos01:2181,centos02:2181,centos03:2181</value>
    </property>

The complete configuration:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- <value>hdfs://centos01:9000</value> -->
        <value>hdfs://mycluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/modules/hadoop-2.8.2/tmp</value>
    </property>
    <!-- ZooKeeper cluster nodes and ports -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>centos01:2181,centos02:2181,centos03:2181</value>
    </property>
    <!-- Timeout for Hadoop's connection to ZooKeeper -->
    <property>
        <name>ha.zookeeper.session-timeout.ms</name>
        <value>1000</value>
        <description>ms</description>
    </property>
</configuration>

3. Sync the other nodes

Send the modified hdfs-site.xml and core-site.xml to the other two nodes:

scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/
scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/

4. Stop the HDFS cluster

Go to the Hadoop installation directory on centos01 and stop HDFS:

sh sbin/stop-dfs.sh

5. Start the ZooKeeper cluster

Log in to each node and start ZooKeeper:

sh /opt/modules/zookeeper-3.4.14/bin/zkServer.sh start

 

6. Initialize the HA state in ZooKeeper

Go to the Hadoop installation directory on centos01 and run the following to create the znode that stores the automatic-failover data:

sh bin/hdfs zkfc -formatZK
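Whether the znode was created can be checked from the ZooKeeper CLI. A sketch, assuming the default parent znode /hadoop-ha and the ZooKeeper install path used earlier; the fallback echo only covers running it outside the cluster:

```shell
# List the HA parent znode; it should contain mycluster after formatZK
ZKCLI=/opt/modules/zookeeper-3.4.14/bin/zkCli.sh
"$ZKCLI" -server centos01:2181 ls /hadoop-ha 2>/dev/null \
    || echo "zkCli not reachable from this shell"
```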


7. Start the HDFS cluster

Go to the Hadoop installation directory on centos01 and start HDFS:

sh sbin/start-dfs.sh

 

8. Start the ZKFC daemons

The ZKFC process must be started manually on every node that runs a NameNode (here, centos01 and centos02):

sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start zkfc
To stop:
sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh stop zkfc

The NameNode whose ZKFC starts first becomes active.

 

9. Test automatic HDFS failover

Check the process list on each node.

Upload a file as a test:

hdfs dfs -mkdir /input
hdfs dfs -put /opt/modules/hadoop-2.8.2/README.txt /input
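The upload can be verified through the logical nameservice URI rather than a specific NameNode, which also confirms client-side failover resolution works. A sketch with a fallback message for when the cluster is unreachable:

```shell
# List the uploaded file via the nameservice (resolves to the active NameNode)
hdfs dfs -ls hdfs://mycluster/input 2>/dev/null \
    || echo "cluster not reachable from this shell"
```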


Kill the NameNode on centos02 to test.

On checking again, centos01's state has become active.

⚠️ If one NameNode is killed and the other does not become active automatically:

Since dfs.ha.fencing.methods is set to sshfence, the fuser command is required; install it as follows on both NameNode nodes.

Running fuser reports "command not found":
fuser

Install it:
sudo yum install -y psmisc
[hadoop@centos02 hadoop-2.8.2]$ sudo yum install -y psmisc
[sudo] password for hadoop:
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.njupt.edu.cn
 * extras: mirrors.163.com
 * updates: mirrors.163.com
base                                                                                                      | 3.6 kB  00:00:00     
docker-ce-stable                                                                                          | 3.5 kB  00:00:00     
extras                                                                                                    | 2.9 kB  00:00:00     
updates                                                                                                   | 2.9 kB  00:00:00     
(1/3): extras/7/x86_64/primary_db                                                                         | 164 kB  00:00:06     
(2/3): docker-ce-stable/x86_64/primary_db                                                                 |  41 kB  00:00:07     
(3/3): updates/7/x86_64/primary_db                                                                        | 6.7 MB  00:00:07     
Resolving dependencies
--> Running transaction check
---> Package psmisc.x86_64.0.22.20-16.el7 will be installed
--> Finished dependency resolution

Dependencies resolved

=================================================================================================================================
 Package                      Arch                         Version                              Repository                Size
=================================================================================================================================
Installing:
 psmisc                       x86_64                       22.20-16.el7                         base                       141 k

Transaction summary
=================================================================================================================================
Install  1 package

Total download size: 141 k
Installed size: 475 k
Downloading packages:
psmisc-22.20-16.el7.x86_64.rpm                                                                            | 141 kB  00:00:06     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : psmisc-22.20-16.el7.x86_64                                                                                   1/1 
  Verifying  : psmisc-22.20-16.el7.x86_64                                                                                   1/1 

Installed:
  psmisc.x86_64 0:22.20-16.el7                                                                                                   

Complete!
[hadoop@centos02 hadoop-2.8.2]$ fuser
No process specification given
Usage: fuser [-fMuvw] [-a|-s] [-4|-6] [-c|-m|-n SPACE] [-k [-i] [-SIGNAL]] NAME...
       fuser -l
       fuser -V
Show which processes use the named files, sockets, or filesystems.

  -a,--all              display unused files too
  -i,--interactive      ask before killing (ignored without -k)
  -k,--kill             kill processes accessing the named file
  -l,--list-signals     list available signal names
  -m,--mount            show all processes using the named filesystems or block device
  -M,--ismountpoint     fulfill request only if NAME is a mount point
  -n,--namespace SPACE  search in this name space (file, udp, or tcp)
  -s,--silent           silent operation
  -SIGNAL               send this signal instead of SIGKILL
  -u,--user             display user IDs
  -v,--verbose          verbose output
  -w,--writeonly        kill only processes with write access
  -V,--version          display version information
  -4,--ipv4             search IPv4 sockets only
  -6,--ipv6             search IPv6 sockets only
  -                     reset options

  udp/tcp names: [local_port][,[rmt_host][,[rmt_port]]]

 

posted @ 2020-03-15 13:13  桥头堡洗脚城