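The previous article covered the basic server configuration; this article covers the Hadoop setup. (The JDK has already been downloaded and extracted to its directory and the environment variables have been set, so those steps are not repeated here.)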

 

1 Create the directories used to store data

Run the following commands on each of the three machines:

mkdir /data/hadoop

mkdir /data/hadoop/hdfs
mkdir /data/hadoop/hdfs/nn
mkdir /data/hadoop/hdfs/dn
mkdir /data/hadoop/hdfs/snn
mkdir /data/hadoop/hdfs/tmp

mkdir /data/hadoop/yarn
mkdir /data/hadoop/yarn/nm

If you do not have permission to create them, run the commands with sudo, e.g. sudo mkdir xxx.
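The individual mkdir calls above can also be collapsed into a single loop. The following is a minimal sketch rather than part of the original setup: create_hadoop_dirs is a hypothetical helper name, and mkdir -p is used so that parent directories are created as needed and re-running the helper is harmless.

```shell
# Hypothetical helper: create the article's directory tree under a given root.
# mkdir -p creates missing parents and does not fail if a directory exists.
create_hadoop_dirs() {
    root="$1"
    for sub in hdfs/nn hdfs/dn hdfs/snn hdfs/tmp yarn/nm; do
        mkdir -p "$root/$sub" || return 1
    done
}

# Usage on each machine (run the mkdir calls under sudo if permission is denied):
#   create_hadoop_dirs /data/hadoop
```

The function only defines the layout; nothing is created until it is called with a root directory.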

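Directories created with sudo belong to root, so their owner must be changed to the ubuntu user. The following commands change the owner and grant read and write permissions: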

cd /data

sudo chown -R ubuntu:root *
sudo chmod 766 *
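After changing ownership, it can help to confirm that the user who will run Hadoop can actually use the directories. The sketch below is an illustration only: check_dirs_usable is a hypothetical helper, and it simply tests that each path is a directory the current user can write to and enter.

```shell
# Hypothetical helper: verify that every given path is a directory that the
# current user can write to (-w) and traverse (-x).
check_dirs_usable() {
    for dir in "$@"; do
        if [ ! -d "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
            echo "not usable: $dir" >&2
            return 1
        fi
    done
    echo "ok"
}

# Usage, run as the ubuntu user:
#   check_dirs_usable /data/hadoop/hdfs/nn /data/hadoop/hdfs/dn /data/hadoop/yarn/nm
```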

 

2 master machine: edit the HDFS-related configuration files

The following 7 files are all located in the /usr/local/hadoop-2.9.2/etc/hadoop/ directory.

 

2.1 The masters file

ubuntu@master

 

2.2 The slaves file

ubuntu@slave1
ubuntu@slave2

 

2.3 The hadoop-env.sh file

Add the following line to this file:

export JAVA_HOME=/usr/local/jdk1.8.0_261

 

2.4 The core-site.xml file

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///data/hadoop/hdfs/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
        <description>The number of seconds between two periodic checkpoints</description>
    </property>
    <property>
        <name>hadoop.proxyuser.spark.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.spark.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>fs.checkpoint.txns</name>
        <value>1000000</value>
    </property>
</configuration>

 

2.5 The hdfs-site.xml file

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1:50090</value>
    </property>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:50011</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

 

2.6 The mapred-site.xml file

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

 

2.7 The yarn-site.xml file

<configuration>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.nodemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>master:8042</value>
</property>
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/hadoop/yarn/nm</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
</configuration>
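With seven files to edit, it is easy to miss one. As a rough sketch (missing_config_files is a hypothetical helper, not a Hadoop tool), the following prints the names of any of the seven files from this section that are absent from a given configuration directory:

```shell
# Hypothetical helper: print the names of the seven files from section 2 that
# are missing from the given configuration directory (empty line if none).
missing_config_files() {
    conf_dir="$1"
    missing=""
    for f in masters slaves hadoop-env.sh core-site.xml hdfs-site.xml \
             mapred-site.xml yarn-site.xml; do
        [ -f "$conf_dir/$f" ] || missing="$missing $f"
    done
    printf '%s\n' "${missing# }"
}

# Usage:
#   missing_config_files /usr/local/hadoop-2.9.2/etc/hadoop
```

An empty result means all seven files are present; it says nothing about whether their contents are correct.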

 

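3 slave1 machine: edit the Hadoop settings

On slave1, six of the files are identical to those on master. The seventh, yarn-site.xml, differs from master in two properties, yarn.nodemanager.hostname and yarn.nodemanager.webapp.address. The full file is as follows: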

<configuration>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.nodemanager.hostname</name>
    <value>slave1</value>
</property>
<property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>slave1:8042</value>
</property>
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/hadoop/yarn/nm</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
</configuration>

 

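4 slave2 machine: edit the Hadoop settings

On slave2, six of the files are identical to those on master. The seventh, yarn-site.xml, differs from master in two properties, yarn.nodemanager.hostname and yarn.nodemanager.webapp.address. The full file is as follows: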

<configuration>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.nodemanager.hostname</name>
    <value>slave2</value>
</property>
<property>
    <name>yarn.nodemanager.webapp.address</name>
    <value>slave2:8042</value>
</property>
<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data/hadoop/yarn/nm</value>
</property>
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
</configuration>
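Because the slave versions of yarn-site.xml differ from the master version only in the values of yarn.nodemanager.hostname and yarn.nodemanager.webapp.address, they could be derived from the master file instead of being edited by hand. The following is a sketch under assumptions: make_slave_yarn_site is a hypothetical helper, and it relies on each <value> line directly following its <name> line, as in the listings above.

```shell
# Hypothetical helper: read master's yarn-site.xml on stdin and write a copy
# for the given slave host to stdout. Only the <value> line that directly
# follows each of the two NodeManager <name> lines is rewritten.
# Assumes the host name contains no characters special to sed (such as | or &).
make_slave_yarn_site() {
    host="$1"
    sed -e "/<name>yarn\.nodemanager\.hostname<\/name>/{n;s|<value>[^<]*</value>|<value>${host}</value>|;}" \
        -e "/<name>yarn\.nodemanager\.webapp\.address<\/name>/{n;s|<value>[^<]*</value>|<value>${host}:8042</value>|;}"
}

# Usage, from the configuration directory on master:
#   make_slave_yarn_site slave1 < yarn-site.xml > yarn-site.slave1.xml
#   make_slave_yarn_site slave2 < yarn-site.xml > yarn-site.slave2.xml
```

The generated files would still need to be copied to each slave as yarn-site.xml; all other properties, including the ResourceManager addresses, pass through unchanged.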

 

5 Format the NameNode and start the cluster

On the master machine, run the following commands to format the NameNode:

cd /usr/local/hadoop-2.9.2/bin
./hdfs namenode -format
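Formatting a NameNode that already holds data discards the existing file system metadata, so the format step is worth guarding when these commands end up in a script. A formatted NameNode directory contains a current/VERSION file, which the sketch below checks for; namenode_is_formatted is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given NameNode directory has already
# been formatted, judged by the presence of current/VERSION inside it.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Usage from /usr/local/hadoop-2.9.2/bin on master; formats only when needed:
#   namenode_is_formatted /data/hadoop/hdfs/nn || ./hdfs namenode -format
```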

 

The commands to start the cluster are:

cd /usr/local/hadoop-2.9.2/sbin

./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
./hadoop-daemons.sh start secondarynamenode
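Once the start scripts return, the jps tool that ships with the JDK lists the running Java processes on a machine, which gives a quick way to see whether the expected daemons came up. The following sketch is an assumption-laden illustration: missing_daemons is a hypothetical helper that reads jps output on stdin and prints the expected daemon names it did not find.

```shell
# Hypothetical helper: read `jps` output ("<pid> <name>" per line) on stdin and
# print the expected daemon names (given as arguments) that are not running.
missing_daemons() {
    jps_output=$(cat)
    missing=""
    for name in "$@"; do
        # grep -x matches whole lines, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$name"; then
            missing="$missing $name"
        fi
    done
    printf '%s\n' "${missing# }"
}

# Usage on master:
#   jps | missing_daemons NameNode ResourceManager JobHistoryServer
# Usage on slave1, which also hosts the SecondaryNameNode in this setup:
#   jps | missing_daemons DataNode NodeManager SecondaryNameNode
```

An empty line means every named daemon was found; any names printed point to the log files worth checking first.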

 

The commands to stop the cluster are:

cd /usr/local/hadoop-2.9.2/sbin

./stop-dfs.sh
./stop-yarn.sh
./mr-jobhistory-daemon.sh stop historyserver
./hadoop-daemons.sh stop secondarynamenode

 

Once Hadoop is deployed, the cluster status can be viewed through the web pages:

 

# HDFS web UI address
http://<master public IP>:50070/

# YARN web UI address
http://<master public IP>:8088/

 

The Hadoop deployment is now complete.

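Warning: exposing the YARN web port 8088 to the public internet is risky. The server used in this example was compromised with malware and turned into a cryptocurrency-mining bot, so do not open the YARN address to the public internet!

For details, see: https://www.cnblogs.com/qcloud1001/p/9173253.html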
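One way to reach the YARN page without opening port 8088 publicly is an SSH local port forward. The sketch below only builds the command; yarn_tunnel_command is a hypothetical helper, and it assumes SSH login as the ubuntu user on master's public IP and that the name master resolves on the master machine itself. The IP in the usage comment is a documentation placeholder.

```shell
# Hypothetical helper: print an ssh command that forwards a local port to the
# YARN web UI (master:8088) through the master's SSH service.
#   -N  do not run a remote command, only forward ports
#   -L  local_port:target_host:target_port, resolved on the remote side
yarn_tunnel_command() {
    public_ip="$1"
    local_port="${2:-8088}"
    printf 'ssh -N -L %s:master:8088 ubuntu@%s\n' "$local_port" "$public_ip"
}

# Usage (203.0.113.10 is a placeholder, not a real server):
#   yarn_tunnel_command 203.0.113.10
# Run the printed command, then open http://localhost:8088/ in a local browser.
```

With this approach port 8088 can stay closed in the cloud security group or firewall, and only SSH needs to be reachable.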

posted on 2020-08-31 10:17 by Sempron2800+