Installing a Highly Available Hadoop Ecosystem (3): Installing Hadoop

3.    Installing Hadoop

3.1. Extract the package

※ Run on each of the 3 servers

tar -xf ~/install/hadoop-2.7.3.tar.gz -C /opt/cloud/packages

ln -s /opt/cloud/packages/hadoop-2.7.3 /opt/cloud/bin/hadoop
ln -s /opt/cloud/packages/hadoop-2.7.3/etc/hadoop /opt/cloud/etc/hadoop

mkdir -p /opt/cloud/hdfs/name
mkdir -p /opt/cloud/hdfs/data
mkdir -p /opt/cloud/hdfs/journal
mkdir -p /opt/cloud/hdfs/tmp/java
mkdir -p /opt/cloud/logs/hadoop/yarn
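
The same directory layout can also be created in a loop, which makes it easier to keep the three servers identical. The following is only a sketch: the function name make_hadoop_dirs and its optional root argument are assumptions made for illustration, while the subdirectories are the ones created above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the HDFS and log directories used in this guide
# under a given root (defaults to /opt/cloud, the root used throughout).
make_hadoop_dirs() {
    local root="${1:-/opt/cloud}"
    local dir
    for dir in hdfs/name hdfs/data hdfs/journal hdfs/tmp/java logs/hadoop/yarn; do
        # -p creates missing parents and succeeds if the directory already exists
        mkdir -p "${root}/${dir}" || return 1
    done
}
```

Calling make_hadoop_dirs with no argument reproduces the mkdir commands above; passing a scratch directory such as /tmp/cloud-test allows a dry run first.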

3.2. Set environment variables

Set the Java and Hadoop environment variables

vi ~/.bashrc

Add

export HADOOP_HOME=/opt/cloud/bin/hadoop
export HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
export HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
export HADOOP_PID_DIR=/opt/cloud/hdfs/tmp

export YARN_PID_DIR=/opt/cloud/hdfs/tmp
export HADOOP_OPTS="-Djava.io.tmpdir=/opt/cloud/hdfs/tmp/java"

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Apply the changes immediately

source ~/.bashrc

Copy to the other two servers

scp ~/.bashrc hadoop2:/home/hadoop
scp ~/.bashrc hadoop3:/home/hadoop

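3.3. Modify the Hadoop parameters

cd ${HADOOP_HOME}/etc/hadoop

Modify log4j.properties, hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, then distribute them to the same directory on hadoop2 and hadoop3.

3.3.1.    Modify the logging configuration file log4j.properties

hadoop.root.logger=INFO,DRFA

hadoop.log.dir=/opt/cloud/logs/hadoop

3.3.2.    Modify hadoop-env.sh

hadoop-env.sh sets some of Hadoop's environment variables. However, up to and including 2.7.3 it has a bug: it does not pick up the correct values from the system environment, so they must be set by hand. Near the top of the file, find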

export JAVA_HOME=${JAVA_HOME}

Comment it out and set the value by hand:

export JAVA_HOME="/usr/lib/jvm/java"

Search the file for #export HADOOP_LOG_DIR and add the following below it

export HADOOP_LOG_DIR=/opt/cloud/logs/hadoop

Search the file for export HADOOP_PID_DIR=${HADOOP_PID_DIR} and change it to

export HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/

To set Java's temporary directory, find

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true "

and change it to

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.io.tmpdir=/opt/cloud/hdfs/tmp/java"
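
Because the same edits have to be repeated on every server, they can be scripted. The sketch below covers only two of the changes described above (JAVA_HOME and HADOOP_PID_DIR) and assumes that the original lines appear in hadoop-env.sh exactly as quoted in this section. The function name patch_hadoop_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the JAVA_HOME and HADOOP_PID_DIR edits from this
# section to the hadoop-env.sh file given as $1. Lines that do not match
# exactly are copied through unchanged.
patch_hadoop_env() {
    local file="$1"
    awk '
        $0 == "export JAVA_HOME=${JAVA_HOME}" {
            print "#" $0                                   # comment out the original
            print "export JAVA_HOME=\"/usr/lib/jvm/java\"" # hard-coded value
            next
        }
        $0 == "export HADOOP_PID_DIR=${HADOOP_PID_DIR}" {
            print "export HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/"
            next
        }
        { print }
    ' "$file" > "${file}.new" && mv "${file}.new" "$file"
}
```

It would be invoked as patch_hadoop_env /opt/cloud/etc/hadoop/hadoop-env.sh, ideally on a backup copy first.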

3.3.3.    Modify yarn-env.sh

Find "default log directory" and add the following line after it

export YARN_LOG_DIR=/opt/cloud/logs/hadoop/yarn

3.3.4.    Modify slaves

# vi slaves

Configuration:

Remove: localhost

Add:

hadoop2
hadoop3

3.3.5.    Modify core-site.xml

# vi  core-site.xml

Configuration:

<configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/cloud/hdfs/tmp</value>
    </property>
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hadoop.groups</name>
      <value>hadoop</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hadoop.hosts</name>
      <value>hadoop1, hadoop2, hadoop3,127.0.0.1,localhost</value>
    </property>
    <property>
      <!-- see note [1] -->
      <name>ipc.client.rpc-timeout.ms</name>
      <value>4000</value>
    </property>
    <property>
      <name>ipc.client.connect.timeout</name>
      <value>4000</value>
    </property>
    <property>
      <name>ipc.client.connect.max.retries</name>
      <value>100</value>
    </property>
    <property>
      <name>ipc.client.connect.retry.interval</name>
      <value>10000</value>
    </property>
</configuration>

3.3.6.    Modify hdfs-site.xml

# vi  hdfs-site.xml

Configuration:

<configuration>
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>hadoop1:9000</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>hadoop1:50070</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>hadoop2:9000</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>hadoop2:50070</value>
    </property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/opt/cloud/hdfs/journal</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>
        sshfence
        shell(/bin/true)
      </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
       <name>dfs.ha.fencing.ssh.connect-timeout</name>
       <value>30000</value>
    </property>

   <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/opt/cloud/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/opt/cloud/hdfs/data</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.support.append</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
        <value>NEVER</value>
    </property>
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>8192</value>
    </property>
</configuration>

3.3.7.    Modify mapred-site.xml

mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

Configuration:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
     <name>mapreduce.jobhistory.address</name>
     <value>0.0.0.0:10020</value>
  </property>
  <property>
     <name>mapreduce.jobhistory.webapp.address</name>
     <value>0.0.0.0:19888</value>
  </property>
  <property> 
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx800m</value>
  </property>
  <property>
     <name>mapreduce.map.memory.mb</name>
     <value>512</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx400m</value>
  </property>
  <property>
     <name>mapreduce.reduce.memory.mb</name>
     <value>1024</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx800m</value>
  </property>
</configuration>
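
In the configuration above, every -Xmx heap value is smaller than the memory of the container it runs in, which leaves room for the JVM's non-heap memory. The sketch below makes that ratio explicit for the three value pairs used here. The function name heap_percent is an assumption for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JVM heap as an integer percentage of the
# container size. $1 = container memory in MB, $2 = -Xmx heap in MB.
heap_percent() {
    echo $(( $2 * 100 / $1 ))
}

# Value pairs taken from mapred-site.xml above: container MB, heap MB
heap_percent 1024 800   # application master
heap_percent 512 400    # map task
heap_percent 1024 800   # reduce task
```

All three pairs come out at about 78 percent, so changing a container size without adjusting the matching java.opts value would break that proportion.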

3.3.8.    Modify yarn-site.xml (non-HA version)

vi yarn-site.xml

Configuration:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
         <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

3.3.9.    Modify yarn-site.xml (HA version)

vi yarn-site.xml

Configuration:

<configuration>  
    <property>  
       <name>yarn.resourcemanager.ha.enabled</name>  
       <value>true</value>  
    </property>  
    <property>  
       <name>yarn.resourcemanager.cluster-id</name>  
       <value>clusteryarn</value>  
    </property>  
    <property>  
       <name>yarn.resourcemanager.ha.rm-ids</name>  
       <value>rm1,rm2</value>  
    </property>  
    <property>  
       <name>yarn.resourcemanager.hostname.rm1</name>  
       <value>hadoop1</value>  
    </property>  
    <property>  
       <name>yarn.resourcemanager.hostname.rm2</name>  
       <value>hadoop2</value>  
    </property>
    <property>
       <name>yarn.log-aggregation-enable</name>
       <value>true</value>
    </property>
    <property>  
       <name>yarn.resourcemanager.zk-address</name>  
       <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>  
    </property>
    <property>
       <name>yarn.nodemanager.aux-services</name>  
       <value>mapreduce_shuffle</value>  
    </property>
    <property>
         <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property> 
      <name>yarn.resourcemanager.connect.retry-interval.ms</name>
      <value>5000</value>
    </property>
    <property> 
       <name>yarn.nodemanager.resource.memory-mb</name>
       <value>3072</value>
    </property> 
    <property> 
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>4</value>
    </property>
    <property>
       <name>yarn.nodemanager.resource.cpu-vcores</name>
       <value>2</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
    </property>
</configuration>  
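
The memory settings above interact: the NodeManager memory divided by a container size gives the number of such containers one node can hold, and the vmem-pmem ratio multiplied by a container size gives its virtual memory limit. The following sketch works through the numbers from this file. It considers memory only, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the arithmetic behind the settings above.

# $1 = yarn.nodemanager.resource.memory-mb, $2 = container size in MB
containers_per_node() {
    echo $(( $1 / $2 ))
}

# $1 = container size in MB, $2 = yarn.nodemanager.vmem-pmem-ratio
vmem_limit_mb() {
    echo $(( $1 * $2 ))
}

containers_per_node 3072 512    # minimum-sized containers per node
containers_per_node 3072 2048   # maximum-sized containers per node
vmem_limit_mb 512 4             # virtual memory limit of a 512 MB container
```

With these values, one node holds six minimum-sized containers or a single maximum-sized one, and a 512 MB container may use up to 2048 MB of virtual memory.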

3.3.10.   Copy to the other 2 servers

Copy the configuration files:

scp /opt/cloud/bin/hadoop/etc/hadoop/* hadoop2:/opt/cloud/bin/hadoop/etc/hadoop/
scp /opt/cloud/bin/hadoop/etc/hadoop/* hadoop3:/opt/cloud/bin/hadoop/etc/hadoop/

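3.4. Starting HDFS for the first time

  • Start the JournalNode cluster:
cexec 'hadoop-daemon.sh start journalnode'

Note that this manual start is needed only the first time; afterwards, starting HDFS also starts the JournalNodes.

  • Format the first NameNode:
ssh hadoop1 'hdfs namenode -format -clusterId mycluster'

Formatting succeeded if the following two lines appear near the end of the output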

INFO common.Storage: Storage directory /opt/cloud/hdfs/name has been successfully formatted.

...

INFO util.ExitUtil: Exiting with status 0

  • Start the first NameNode:
ssh hadoop1 'hadoop-daemon.sh start namenode'

  • Format the second NameNode:
ssh hadoop2 'hdfs namenode -bootstrapStandby'

Formatting succeeded if the following two lines appear near the end of the output

INFO common.Storage: Storage directory /opt/cloud/hdfs/name has been successfully formatted.

...

INFO util.ExitUtil: Exiting with status 0

  • Start the second NameNode:
ssh hadoop2 'hadoop-daemon.sh start namenode'

  • Format the ZooKeeper state:
ssh hadoop1 'hdfs zkfc -formatZK'

The message

INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.

indicates that formatting succeeded

  • Start the 2 ZKFCs:
ssh hadoop1 'hadoop-daemon.sh start zkfc'
ssh hadoop2 'hadoop-daemon.sh start zkfc'

  • Start all DataNodes:
ssh hadoop1 'hadoop-daemons.sh start datanode'

Open http://hadoop1:50070 and http://hadoop2:50070 in a browser to check the status

One NameNode should be active and the other standby. On the active NameNode's page, the Written txid of the three QJM servers should be identical.

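3.5. Starting HDFS and YARN normally

Run on hadoop1

start-dfs.sh
start-yarn.sh

Start the second ResourceManager on hadoop2 (start-yarn.sh starts the ResourceManager only on the local host)

ssh hadoop2 'yarn-daemon.sh start resourcemanager'

Check the processes with jps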

[hadoop@hadoop1 ~]$ cexec jps
************************* cloud *************************
--------- hadoop1---------
1223 QuorumPeerMain
3757 DFSZKFailoverController
4787 Jps
3872 ResourceManager
3365 NameNode
3578 JournalNode
--------- hadoop2---------
1220 QuorumPeerMain
24240 NodeManager
24545 Jps
24022 JournalNode
24139 DFSZKFailoverController
23847 NameNode
23923 DataNode
24419 ResourceManager
--------- hadoop3---------
23764 Jps
23578 NodeManager
23471 JournalNode
23372 DataNode
1224 QuorumPeerMain
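
Comparing such a listing against the expected daemons by eye is error-prone, so the check can be scripted. The sketch below reads jps output on standard input and prints every expected process name that is missing. The function name missing_daemons is hypothetical, and the sketch assumes the usual "pid name" layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read jps output on stdin and print each expected
# daemon (given as arguments) that does not appear in it.
missing_daemons() {
    local output name
    output="$(cat)"
    for name in "$@"; do
        # jps prints "<pid> <name>"; compare the second column exactly
        if ! printf '%s\n' "$output" |
            awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
            echo "$name"
        fi
    done
}
```

On hadoop3, for example, it could be used as jps | missing_daemons JournalNode DataNode NodeManager; empty output means that all three daemons are running.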

 

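Open the following URLs in a browser to see the graphical monitoring pages

http://hadoop1:50070/  HDFS web monitoring UI

http://hadoop2:50070/  HDFS web monitoring UI; of hadoop1 and hadoop2, one is active and the other is standby

http://hadoop1:8088

http://hadoop2:8088 redirects automatically to http://hadoop1:8088

3.6. Starting HDFS automatically at boot

       CentOS 7 uses systemd as its init and service manager, which has several advantages, such as making dependencies easy to declare. However, every service starts with a freshly initialized environment ("systemd does not inherit any context"), so each unit file must set every environment variable it needs, one Environment=name=value line per variable. The good news is that Environment may appear on multiple lines; the bad news is that Environment cannot refer to variables that were declared earlier, that is, the value cannot contain $name or ${name}.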
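
The six unit files in the following subsections differ only in their description, the launcher script (hadoop-daemon.sh or yarn-daemon.sh) and the daemon name, so they can be generated from one template. The following is a sketch under that assumption. The function name hadoop_unit is hypothetical, and every Environment value is written out literally because systemd does not expand variables there.

```shell
#!/usr/bin/env bash
# Hypothetical generator: print a systemd unit for one Hadoop daemon.
# $1 = description, $2 = launcher script (hadoop-daemon.sh or yarn-daemon.sh),
# $3 = daemon name passed to the launcher.
hadoop_unit() {
    local desc="$1" script="$2" daemon="$3"
    cat <<EOF
[Unit]
Description=${desc}
After=network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
Environment=JAVA_HOME=/usr/lib/jvm/java
Environment=JRE_HOME=/usr/lib/jvm/java/jre
Environment=CLASSPATH=.:/usr/lib/jvm/java/jre/lib/rt.jar:/usr/lib/jvm/java/lib/dt.jar:/usr/lib/jvm/java/lib/tools.jar
Environment=HADOOP_HOME=/opt/cloud/bin/hadoop
Environment=HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
Environment=HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
Environment=HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/
ExecStart=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/${script} start ${daemon}'
ExecStop=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/${script} stop ${daemon}'
[Install]
WantedBy=multi-user.target
EOF
}
```

For example, hadoop_unit 'yarn node manager service' yarn-daemon.sh nodemanager > yarn-nm.service would write the NodeManager unit. The shell expands the variables when the file is generated, so the resulting unit contains only literal values.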

3.6.1.    journalnode service

vi hadoop-journalnode.service

[Unit]
Description=hadoop journalnode service
After= network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
Environment = JAVA_HOME=/usr/lib/jvm/java
Environment = JRE_HOME=/usr/lib/jvm/java/jre
Environment = CLASSPATH='.:/usr/lib/jvm/java/jre/lib/rt.jar:/usr/lib/jvm/java/lib/dt.jar:/usr/lib/jvm/java/lib/tools.jar'
Environment = HADOOP_HOME=/opt/cloud/bin/hadoop
Environment = HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
Environment = HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
Environment = HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/

ExecStart=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh start journalnode'
ExecStop =/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh stop journalnode'
[Install]
WantedBy=multi-user.target

3.6.2.    namenode service

vi hadoop-namenode.service

[Unit]
Description=hadoop namenode service
After= network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
Environment = JAVA_HOME=/usr/lib/jvm/java
Environment = JRE_HOME=/usr/lib/jvm/java/jre
Environment = CLASSPATH='.:/usr/lib/jvm/java/jre/lib/rt.jar:/usr/lib/jvm/java/lib/dt.jar:/usr/lib/jvm/java/lib/tools.jar'
Environment = HADOOP_HOME=/opt/cloud/bin/hadoop
Environment = HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
Environment = HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
Environment = HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/

ExecStart=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh start namenode'
ExecStop=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh stop namenode'
[Install]
WantedBy=multi-user.target

3.6.3.    datanode service

vi hadoop-datanode.service

[Unit]
Description=hadoop datanode service
After= network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
Environment = JAVA_HOME=/usr/lib/jvm/java
Environment = JRE_HOME=/usr/lib/jvm/java/jre
Environment = CLASSPATH='.:/usr/lib/jvm/java/jre/lib/rt.jar:/usr/lib/jvm/java/lib/dt.jar:/usr/lib/jvm/java/lib/tools.jar'
Environment = HADOOP_HOME=/opt/cloud/bin/hadoop
Environment = HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
Environment = HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
Environment = HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/

ExecStart=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh start datanode'
ExecStop =/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh stop datanode'
[Install]
WantedBy=multi-user.target

3.6.4.    zkfc service

vi hadoop-zkfc.service

[Unit]
Description=hadoop zkfc service
After= network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
Environment = JAVA_HOME=/usr/lib/jvm/java
Environment = JRE_HOME=/usr/lib/jvm/java/jre
Environment = CLASSPATH='.:/usr/lib/jvm/java/jre/lib/rt.jar:/usr/lib/jvm/java/lib/dt.jar:/usr/lib/jvm/java/lib/tools.jar'
Environment = HADOOP_HOME=/opt/cloud/bin/hadoop
Environment = HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
Environment = HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
Environment = HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/

ExecStart=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh start zkfc'
ExecStop=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/hadoop-daemon.sh stop zkfc'
[Install]
WantedBy=multi-user.target

3.6.5.    yarn resource manager service

vi yarn-rm.service

[Unit]
Description=yarn resource manager service
After= network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
Environment = JAVA_HOME=/usr/lib/jvm/java
Environment = JRE_HOME=/usr/lib/jvm/java/jre
Environment = CLASSPATH='.:/usr/lib/jvm/java/jre/lib/rt.jar:/usr/lib/jvm/java/lib/dt.jar:/usr/lib/jvm/java/lib/tools.jar'
Environment = HADOOP_HOME=/opt/cloud/bin/hadoop
Environment = HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
Environment = HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
Environment = HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/

ExecStart=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/yarn-daemon.sh start resourcemanager'
ExecStop=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/yarn-daemon.sh stop resourcemanager'
[Install]
WantedBy=multi-user.target

3.6.6.    yarn nodemanager service

vi yarn-nm.service

[Unit]
Description=yarn node manager service
After= network.target
[Service]
Type=forking
User=hadoop
Group=hadoop
Environment = JAVA_HOME=/usr/lib/jvm/java
Environment = JRE_HOME=/usr/lib/jvm/java/jre
Environment = CLASSPATH='.:/usr/lib/jvm/java/jre/lib/rt.jar:/usr/lib/jvm/java/lib/dt.jar:/usr/lib/jvm/java/lib/tools.jar'
Environment = HADOOP_HOME=/opt/cloud/bin/hadoop
Environment = HADOOP_CONF_DIR=/opt/cloud/etc/hadoop
Environment = HADOOP_LOG_DIR=/opt/cloud/logs/hadoop
Environment = HADOOP_PID_DIR=/opt/cloud/hdfs/tmp/

ExecStart=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/yarn-daemon.sh start nodemanager'
ExecStop=/usr/bin/sh -c '/opt/cloud/bin/hadoop/sbin/yarn-daemon.sh stop nodemanager'
[Install]
WantedBy=multi-user.target

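3.6.7.    Testing and enabling the services at boot

       Write the 6 unit files and copy them to the /etc/systemd/system directory on the servers that run the corresponding services

hadoop2 (6 services)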

systemctl start hadoop-journalnode
systemctl start hadoop-namenode
systemctl start hadoop-datanode
systemctl start hadoop-zkfc
systemctl start yarn-rm
systemctl start yarn-nm

Once the tests pass

systemctl enable hadoop-journalnode
systemctl enable hadoop-namenode
systemctl enable hadoop-datanode
systemctl enable hadoop-zkfc
systemctl enable yarn-rm
systemctl enable yarn-nm

hadoop1 (4 services)

systemctl enable hadoop-journalnode
systemctl enable hadoop-namenode
systemctl enable hadoop-zkfc
systemctl enable yarn-rm

hadoop3 (3 services)

systemctl enable hadoop-journalnode
systemctl enable hadoop-datanode
systemctl enable yarn-nm

Reboot the 3 servers and run cexec jps to check the system status

3.7. Uninstalling

  • Stop YARN, then stop HDFS:
ssh hadoop1 'stop-yarn.sh'
ssh hadoop2 'yarn-daemon.sh stop resourcemanager'
ssh hadoop1 'stop-dfs.sh'

 

     cexec jps should no longer show any HDFS or YARN processes

  • Disable the system services

hadoop2 (6 services)

systemctl disable hadoop-journalnode
systemctl disable hadoop-namenode
systemctl disable hadoop-datanode
systemctl disable hadoop-zkfc
systemctl disable yarn-rm
systemctl disable yarn-nm

hadoop1 (4 services)

systemctl disable hadoop-journalnode
systemctl disable hadoop-namenode
systemctl disable hadoop-zkfc
systemctl disable yarn-rm

hadoop3 (3 services)

systemctl disable hadoop-journalnode
systemctl disable hadoop-datanode
systemctl disable yarn-nm
  • Delete the data directories
rm /opt/cloud/hdfs -rf
rm /opt/cloud/logs/hadoop -rf
  • Delete the program directories
rm /opt/cloud/bin/hadoop -rf
rm /opt/cloud/etc/hadoop -rf
rm /opt/cloud/packages/hadoop-2.7.3 -rf
  • Restore the environment variables

       vi ~/.bashrc

       Delete the Hadoop-related lines

 



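[1] An important parameter: it sets the communication timeout between Hadoop services, which matters especially for the HA mechanism between the NodeManager and the ResourceManager

[2] Sized for a virtual machine with 4 GB of memory, so all values are on the low side

[3] Related to YARN high availability; a policy parameter that applies after a NodeManager connection failure

[4] The virtual machine has only 4 GB of memory and 2 cores, so these resource parameters are also small

[5] Because memory is short, the virtual-to-physical memory ratio is changed from 2.1 to 4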

posted @ 2016-12-18 12:14  范振勇