Hadoop Single Node Cluster

https://hadoop.apache.org/releases.html & https://dlcdn.apache.org/hadoop

Preparation

Install the SSH server (sshd)

sudo apt install --reinstall -y openssh-server
sudo systemctl start ssh # or: sudo service ssh restart
# Passwordless SSH to localhost, which the sbin/start-*.sh scripts use
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost # should log in without a password prompt
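Running the "cat ... >> ~/.ssh/authorized_keys" line a second time appends the same key again. A minimal sketch of an idempotent alternative, assuming a POSIX shell with GNU grep; add_key_once is a hypothetical helper name, not part of OpenSSH or Hadoop:

```shell
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
# Usage: add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
add_key_once() {
  local pub="$1" auth="$2" key
  key=$(cat "$pub")
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  chmod 0600 "$auth"
  # If the file is non-empty and lacks a trailing newline, add one so keys do not run together.
  if [ -s "$auth" ] && [ -n "$(tail -c 1 "$auth")" ]; then
    echo >> "$auth"
  fi
  # -q quiet, -x whole-line match, -F fixed string (the key is not a regex)
  if ! grep -qxF -- "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi
}
```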


Download the JDK

https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

curl -LOJ https://repo.huaweicloud.com/java/jdk/8u202-b08/jdk-8u202-linux-x64.tar.gz
tar -zxf jdk-8u202-linux-x64.tar.gz -C /opt/ # needs write access to /opt, which is usually owned by root
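Optionally, make the JDK available in interactive shells too. A minimal sketch, assuming bash and ~/.bashrc as the profile file; add_export_once is a hypothetical helper. Hadoop itself still reads JAVA_HOME from hadoop-env.sh, which is configured further down.

```shell
# Hypothetical helper: append "export NAME=VALUE" to a profile file
# unless that exact line is already there.
# Usage: add_export_once <profile-file> <NAME> <VALUE>
add_export_once() {
  local profile="$1" line="export $2=$3"
  touch "$profile"
  grep -qxF -- "$line" "$profile" || printf '%s\n' "$line" >> "$profile"
}

# Assumed paths (the JDK extracted to /opt as above):
# add_export_once ~/.bashrc JAVA_HOME /opt/jdk1.8.0_202
# add_export_once ~/.bashrc PATH '$JAVA_HOME/bin:$PATH'   # single quotes: expanded at login, not now
```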


Install Hadoop

Mirrors:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop
https://mirrors.huaweicloud.com/apache/hadoop/
https://mirrors.aliyun.com/apache/hadoop
https://mirrors.cloud.tencent.com/apache/hadoop/

On Windows you also need winutils.exe and hadoop.dll in the bin directory; download prebuilt copies or build them yourself.

curl -LOJ https://mirrors.cloud.tencent.com/apache/hadoop/common/stable/hadoop-3.4.1.tar.gz # stable/ holds only the current stable release; adjust the file name if it is no longer 3.4.1
tar -zxf hadoop-3.4.1.tar.gz -C /opt/
cd /opt/hadoop-3.4.1
bin/hadoop version
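The downloaded tarball can be checked against the SHA-512 checksum Apache publishes next to each release (a hadoop-3.4.1.tar.gz.sha512 file; fetching it from the Apache site rather than a mirror is the safer choice). A sketch, assuming GNU coreutils sha512sum; verify_sha512 is a hypothetical helper that only extracts the 128-hex-digit hash, so it does not depend on the exact layout of the checksum file.

```shell
# Hypothetical helper: compare a file's SHA-512 with the hash found in a checksum file.
# Accepts both "<hash>  <file>" and "SHA512 (<file>) = <hash>" layouts,
# because it only looks for the first 128-hex-digit run.
# Usage: verify_sha512 hadoop-3.4.1.tar.gz hadoop-3.4.1.tar.gz.sha512
verify_sha512() {
  local expected actual
  expected=$(grep -oE '[0-9a-fA-F]{128}' "$2" | head -n 1 | tr 'A-F' 'a-f')
  actual=$(sha512sum "$1" | cut -d ' ' -f 1)
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1" >&2
    return 1
  fi
}
```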


Configure Hadoop in pseudo-distributed mode

Default ports in 2.x: HDFS NameNode RPC: 8020 or 9000; HDFS NameNode HTTP UI: 50070; HDFS DataNode HTTP UI: 50075; YARN ResourceManager web UI (job status): 8088; JobHistory server web UI: 19888.

Default ports in 3.x: HDFS NameNode RPC: 8020, 9000 or 9820; HDFS NameNode HTTP UI: 9870; HDFS DataNode HTTP UI: 9864; YARN ResourceManager web UI (job status): 8088; JobHistory server web UI: 19888.
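The default web UI ports above, expressed as a small lookup. A sketch only: hadoop_ui_urls is a hypothetical helper, and any port overridden in the *-site.xml files will differ from what it prints.

```shell
# Hypothetical helper: print the default web UI URLs for Hadoop 2.x or 3.x,
# in the order NameNode, DataNode, ResourceManager, JobHistory server.
# Usage: hadoop_ui_urls <major-version> [host]
hadoop_ui_urls() {
  local host="${2:-localhost}" ports p
  case "$1" in
    2) ports="50070 50075 8088 19888" ;;
    3) ports="9870 9864 8088 19888" ;;
    *) echo "unknown Hadoop major version: $1" >&2; return 1 ;;
  esac
  for p in $ports; do
    echo "http://$host:$p"
  done
}
```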

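Configuration files:
core-site.xml: core settings shared by all components
hdfs-site.xml: HDFS file system parameters
yarn-site.xml: YARN resource manager settings
mapred-site.xml: MapReduce job settings
workers (3.x) / slaves (2.x): the worker nodes of the cluster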
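Each of these XML files holds a <configuration> element with <property> entries, each having a <name> and a <value>. On a running installation, bin/hdfs getconf -confKey <name> prints the effective value. For a quick offline look at a single file, a rough sketch: site_prop is hypothetical and assumes <name> and <value> each sit on their own line, as in the files in this post, so it is not a real XML parser.

```shell
# Hypothetical helper: print the <value> of a named property in a *-site.xml file.
# Assumes one <name> line followed by one <value> line per <property>.
# Usage: site_prop etc/hadoop/core-site.xml fs.defaultFS
site_prop() {
  awk -v key="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
    }
    /<value>/ && n == key {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)   # also drops a trailing <!-- comment -->
      print v
      exit
    }
  ' "$1"
}
```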

1. Configure HDFS

vim etc/hadoop/hadoop-env.sh
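export JAVA_HOME=/opt/jdk1.8.0_202 # set explicitly: the start scripts may launch daemons over ssh, where your login shell's JAVA_HOME is not necessarily set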

vim etc/hadoop/core-site.xml
<configuration>
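    <property><!-- Address of the HDFS NameNode -->
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value><!-- localhost or this machine's hostname, see: hostnamectl status | grep "Static hostname" -->
    </property>
    <property><!-- Base directory for files Hadoop creates at runtime (the default lives under /tmp, which may be cleared on reboot) -->
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoopTmp</value>
    </property>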
</configuration>
vim etc/hadoop/hdfs-site.xml
<configuration>
    <property><!-- Number of HDFS block replicas; 1 because there is only one DataNode -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
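Instead of editing in vim, the same two files can be written non-interactively. A sketch, assuming a POSIX shell; write_hdfs_conf is a hypothetical helper, it overwrites existing files, and its first argument is assumed to be the Hadoop config directory (for this post, /opt/hadoop-3.4.1/etc/hadoop).

```shell
# Hypothetical helper: write core-site.xml and hdfs-site.xml with the settings above.
# Usage: write_hdfs_conf <conf-dir> [hadoop.tmp.dir]
write_hdfs_conf() {
  local conf="$1" tmp="${2:-/opt/hadoopTmp}"
  # Unquoted EOF: $tmp is expanded into the file
  cat > "$conf/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>$tmp</value>
    </property>
</configuration>
EOF
  # Quoted 'EOF': written verbatim
  cat > "$conf/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
}
```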

bin/hdfs namenode -format # run once only: formatting again erases the HDFS metadata
sbin/start-dfs.sh
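Running namenode -format again later erases the HDFS metadata, and a DataNode that still holds old data then fails to start because its clusterID no longer matches. A sketch of a guard, assuming dfs.namenode.name.dir keeps its default of ${hadoop.tmp.dir}/dfs/name (with the settings above, /opt/hadoopTmp/dfs/name); needs_format is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) only if the NameNode directory
# has not been formatted yet, i.e. current/VERSION does not exist.
# Usage: needs_format <hadoop.tmp.dir>
needs_format() {
  [ ! -f "$1/dfs/name/current/VERSION" ]
}

# if needs_format /opt/hadoopTmp; then bin/hdfs namenode -format; fi
```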


Open the NameNode web UI on port 50070 (2.x default) or 9870 (3.x default), e.g. http://localhost:9870

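bin/hdfs dfs -mkdir -p /user/$USER # HDFS home directory; relative paths such as "input" resolve under it
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
# Run the example "grep" job: count matches of the regex dfs[a-z.]+ in the input files
# (it fails if "output" already exists in HDFS; remove it first with: bin/hdfs dfs -rm -r output)
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar grep input output 'dfs[a-z.]+'
# Either copy the result to the local file system and read it there:
# bin/hdfs dfs -get output output
# cat output/*
# or read it directly from HDFS:
bin/hdfs dfs -cat output/*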
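The grep example counts how often each distinct string matching the regex occurs in the input files. The result can be cross-checked locally; a rough sketch, assuming GNU grep and that it is run from /opt/hadoop-3.4.1 (local_grep_count is hypothetical, and the order of strings with equal counts may differ from the job output).

```shell
# Hypothetical helper: count every match of an extended regex across files,
# printed as "<count><TAB><match>", highest count first.
# Usage: local_grep_count 'dfs[a-z.]+' etc/hadoop/*.xml
local_grep_count() {
  local pattern="$1"
  shift
  grep -ohE "$pattern" "$@" | sort | uniq -c | sort -k1,1nr -k2 | awk '{ print $1 "\t" $2 }'
}
```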


2. Configure YARN

vim etc/hadoop/yarn-site.xml
<configuration>
    <property><!-- Auxiliary service that reducers use to fetch map output (shuffle) -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
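    <property><!-- Physical memory limit check for containers (disabled so containers are not killed on a small test machine) -->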
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property><!-- Virtual memory limit check for containers (disabled for the same reason) -->
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>

sbin/start-yarn.sh

Open port 8088 (e.g. http://localhost:8088) to view the YARN ResourceManager web UI
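Besides the web UI, the ResourceManager exposes a REST API under /ws/v1/cluster (see the ResourceManagerRest documentation referenced at the end of this post), for example /ws/v1/cluster/info and /ws/v1/cluster/apps with an optional states filter. A sketch of a URL builder; rm_url and the RM_ADDR variable are hypothetical names.

```shell
# Hypothetical helper: build ResourceManager REST URLs.
# RM_ADDR (hypothetical) overrides the default localhost:8088.
# Usage: rm_url info | rm_url apps [STATE]
rm_url() {
  local base="http://${RM_ADDR:-localhost:8088}/ws/v1/cluster"
  case "$1" in
    info) echo "$base/info" ;;
    apps)
      if [ -n "$2" ]; then echo "$base/apps?states=$2"; else echo "$base/apps"; fi ;;
    *) echo "usage: rm_url info | rm_url apps [STATE]" >&2; return 1 ;;
  esac
}

# curl -s "$(rm_url info)"
# curl -s "$(rm_url apps FINISHED)"
```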


3. Configure MapReduce

vim etc/hadoop/mapred-site.xml
<configuration>
    <property><!-- Run MapReduce jobs on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

# Run a MapReduce job that estimates pi: 10 map tasks, 100 samples per map
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar pi 10 100

The job then shows up in the web UI on port 8088.
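The pi example is a quasi-Monte Carlo estimator: it places points from a 2-D Halton sequence (bases 2 and 3) in the unit square and counts those inside the inscribed circle of radius 0.5, so pi ≈ 4 * inside / total (10 maps * 100 samples = 1000 points in the job above). A local sketch of the same idea in awk; halton_pi is hypothetical, and the point indices and the split across map tasks differ from Hadoop's own code, so the digits will not match the job output exactly.

```shell
# Hypothetical helper: quasi-Monte Carlo estimate of pi with a 2-D Halton sequence.
# Usage: halton_pi <maps> <samples-per-map>
halton_pi() {
  awk -v maps="$1" -v samples="$2" '
    # Radical inverse of i in base b: mirror the base-b digits of i around the radix point
    function radical_inverse(i, b,    f, r) {
      f = 1; r = 0
      while (i > 0) { f /= b; r += f * (i % b); i = int(i / b) }
      return r
    }
    BEGIN {
      n = maps * samples; inside = 0
      for (k = 1; k <= n; k++) {
        # Shift the point so the circle of radius 0.5 is centred at the origin
        x = radical_inverse(k, 2) - 0.5
        y = radical_inverse(k, 3) - 0.5
        if (x * x + y * y <= 0.25) inside++
      }
      printf "%.6f\n", 4 * inside / n
    }'
}

# halton_pi 10 100
```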


Other


1. Start the JobHistory server to view job history

bin/mapred --daemon start historyserver

Open port 19888 (e.g. http://localhost:19888) to view the JobHistory web UI


2. Configure log aggregation to view job run details

vim etc/hadoop/yarn-site.xml # add the properties below inside the existing <configuration>, keeping the ones added earlier
<configuration>
    <property><!-- Enable log aggregation -->
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property><!-- How long to keep aggregated logs, in seconds (604800 s = 7 days) -->
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
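yarn.log-aggregation.retain-seconds is given in seconds: 7 days = 7 * 24 * 60 * 60 = 604800. A trivial helper for other retention periods; days_to_seconds is hypothetical.

```shell
# Hypothetical helper: convert a retention period in days to seconds.
# Usage: days_to_seconds <days>
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

# days_to_seconds 7    # prints 604800
```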
# Restart all services so the new settings take effect
bin/mapred --daemon stop historyserver
sbin/stop-yarn.sh
sbin/stop-dfs.sh
sbin/start-dfs.sh
sbin/start-yarn.sh
bin/mapred --daemon start historyserver

# Alternatively, stop the daemons one at a time:
bin/mapred --daemon stop historyserver
bin/yarn --daemon stop nodemanager
bin/yarn --daemon stop resourcemanager
bin/hdfs --daemon stop secondarynamenode
bin/hdfs --daemon stop namenode
bin/hdfs --daemon stop datanode

# and start them one at a time: HDFS first, then YARN, then the JobHistory server
bin/hdfs --daemon start namenode
bin/hdfs --daemon start datanode
bin/hdfs --daemon start secondarynamenode
bin/yarn --daemon start resourcemanager
bin/yarn --daemon start nodemanager
bin/mapred --daemon start historyserver
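To confirm that all six daemons came up, the jps tool shipped with the JDK lists running Java processes by main class name. A sketch, assuming jps output lines of the form "<pid> <MainClass>"; missing_daemons is a hypothetical helper that reads that output on stdin and reports what is missing.

```shell
# Hypothetical helper: read jps output on stdin and list the expected daemons
# of this single-node setup that are not running (exit status 1 if any).
# Usage: "$JAVA_HOME/bin/jps" | missing_daemons && echo "all daemons running"
missing_daemons() {
  local running d status=0
  running=$(awk '{ print $2 }')
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager JobHistoryServer; do
    if ! printf '%s\n' "$running" | grep -qxF "$d"; then
      echo "missing: $d"
      status=1
    fi
  done
  return $status
}
```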

# Run another job; this time its details can be viewed
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar pi 10 100

View the details of the job you just ran. Jobs that ran before log aggregation was enabled have no viewable details.
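With log aggregation enabled, the logs of a finished application can also be fetched on the command line with bin/yarn logs -applicationId <application-id>. Application IDs have the form application_<clusterTimestamp>_<sequence> and typically appear in the job's console output and in the 8088 UI. A sketch that pulls them out of a saved console log and prints the matching commands; yarn_logs_cmds is hypothetical.

```shell
# Hypothetical helper: find YARN application IDs in a text file and print
# one "yarn logs" command per distinct ID.
# Usage: yarn_logs_cmds pi-job-console.log
yarn_logs_cmds() {
  grep -oE 'application_[0-9]+_[0-9]+' "$1" | sort -u | while read -r app; do
    echo "bin/yarn logs -applicationId $app"
  done
}
```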



https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html

posted @ 2019-03-20 22:43  江湖小小白