Learning Hadoop: Setting Up a Pseudo-Distributed Environment

Environment Requirements

This walkthrough records building a Hadoop cluster on Ubuntu Server 20.04. The required software on Linux and Windows includes: (1) Java — required; Hadoop 3.3 needs Java 8 or later, and Java 8 is recommended. (2) ssh — must be installed, with sshd kept running, so that the Hadoop scripts can manage the remote Hadoop daemons.

Software Downloads

MobaXterm: a versatile remote-connection tool that supports many protocols and has built-in SFTP. Download link
Java: Java 8 is currently the most widely used version (likely because newer releases carry a migration and learning cost), so this installation also uses Java 8 as the runtime environment, which keeps maintenance simple. Download link

SSH: the cluster is managed over remote connections, so install the SSH service on the master and the slaves. On Ubuntu the install command is apt-get install openssh-server.

Hadoop: the centerpiece of this installation. Download it from the Apache website: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz


Installing and Configuring the Software

Installing MobaXterm

After downloading MobaXterm, double-click the installer, click through the classic "Next" dialogs, choose an install location, and finish the installation.


Installing SSH

A fresh Ubuntu install comes with the SSH client but not the server, so the service is one-way, "outbound only": you can ssh to other hosts, but other hosts cannot ssh into this one. Install the OpenSSH server from the terminal:

apt-get update
apt-get install openssh-server


Configuring SSH

Enable the root account by setting its password: run passwd root, then enter and confirm the password.
First, before editing any system or critical service configuration file, make a backup:

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.bak

Then configure SSH with Vim:

vim /etc/ssh/sshd_config

Add the line PermitRootLogin yes, press Esc, type :wq to save and quit, then restart the SSH service:

service ssh restart
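
A quick sanity check (a sketch, assuming systemd as on Ubuntu 20.04 and password authentication at its default) that sshd restarted cleanly and now accepts root logins:

systemctl status ssh     # the service should show active (running)
ssh root@localhost       # should now prompt for the root password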

Remote Connection

In the virtual machine's terminal, run ip addr to check the IP address it was assigned.

Open MobaXterm, create a new remote-connection session, choose the protocol type, and enter the virtual machine's details from the previous step.

Enter the password of the root account configured earlier to complete the remote connection.


Installing and Configuring Java

MobaXterm's SFTP pane supports drag-and-drop upload, so just drag the downloaded Java 8 tarball into the target folder. Since everything runs on the local network, the transfer should be fast.

Change into the folder containing the Java 8 tarball and extract it:

tar -zxvf jdk-8u321-linux-x64.tar.gz

Configure the environment variables: run vim /etc/profile and append the following lines at the end:

JAVA_HOME=/path/to/java/jdk1.8.0_321
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH

Reload the profile with source /etc/profile, then verify the Java setup with java -version.

Installing and Configuring Hadoop

Repeat the steps used for Java: drag-and-drop the Hadoop tarball, extract it, and verify the installation; a sketch follows below.
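
A minimal sketch, assuming the tarball sits in the current directory and /usr/local/hadoop is the install root used later in this post:

mkdir -p /usr/local/hadoop
tar -zxvf hadoop-3.3.2.tar.gz -C /usr/local/hadoop/      # extract under the install root
/usr/local/hadoop/hadoop-3.3.2/bin/hadoop version        # prints the version if the extraction worked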

Hadoop Distributed Configuration

Cloning the Virtual Machines

Clone the fully configured host (Master) twice, producing Node1 and Node2.

In the clone options, be sure to select "Create a full clone"; otherwise the later steps will fail.


Configuring the Network and Hostnames

Because VMware cloning copies every detail of the source machine verbatim, software and hardware settings alike, the cloned nodes' networking has to be reconfigured.

  • Generate a new MAC address for each clone (in the VM's network-adapter settings).


  • Configure the IP address and gateway on each of the three hosts, using the shell command
vim /etc/netplan/00-installer-config.yaml

to edit the network configuration, then apply it with netplan --debug apply.

# This is the network config written by 'subiquity'
network:
  ethernets:
    ens33:
      addresses: [192.168.***.***/24]
      gateway4: 192.168.***.2
      nameservers:
        addresses: [8.8.8.8, 8.8.4.4]
      dhcp4: false
  version: 2


  • Change the hostname:
vim /etc/hostname

setting it to master, node1, and node2 respectively (or use the one-liner sketched below).
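
Equivalently, hostnamectl applies the change in one step; run the matching command on each host:

hostnamectl set-hostname node1    # on node1; use master / node2 on the other hosts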


The Distributed Cluster

With the steps above complete, connect to all three hosts (master, node1, node2) through MobaXterm.


  • Firewall: turn off the firewall on all three hosts and keep it from starting at boot; the commands are sketched below. On Ubuntu 20.04 the firewall is inactive by default, as ufw status will confirm.
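
For completeness, the ufw commands (ufw is Ubuntu's default firewall frontend):

ufw status              # expect: Status: inactive
ufw disable             # turn it off if it is active
systemctl disable ufw   # keep it from starting at boot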


  • Passwordless SSH: on the master host, run
ssh-keygen -t rsa

to generate the key pair (just press Enter at every prompt), then run

ssh-copy-id -i node1
ssh-copy-id -i node2

These commands simply append master's public key to /root/.ssh/authorized_keys on node1 and node2; the method doesn't matter, and copying the key into those files by hand achieves the same result (a manual sketch follows below). Either way, master can now log in to node1 and node2 without a password, which the later steps depend on.
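
If ssh-copy-id is unavailable, the manual equivalent (a sketch; assumes root logins are enabled on the nodes, as configured earlier):

cat ~/.ssh/id_rsa.pub | ssh root@node1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
cat ~/.ssh/id_rsa.pub | ssh root@node2 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
ssh node1    # should log in without prompting for a password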


  • IP mapping: edit the hosts file on the master (vim /etc/hosts); example entries are sketched below. Make the same change on node1 and node2, or push the file out with
scp /etc/hosts node<id>:/etc/hosts
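
The entries map each hostname to its address (the IPs here are placeholders; substitute the real addresses from your netplan configs):

192.168.***.*** master
192.168.***.*** node1
192.168.***.*** node2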


  • Hadoop configuration files: edit the workers file on the master (/path/to/hadoop/etc/hadoop/workers) and add the hostnames node1 and node2, or their IPs. Then edit hadoop-env.sh (/path/to/hadoop/etc/hadoop/hadoop-env.sh) and add the root directory of the Java installation, as sketched below.
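
Concretely (paths follow the placeholders used earlier in this post):

# /path/to/hadoop/etc/hadoop/workers — one worker hostname per line
node1
node2

# /path/to/hadoop/etc/hadoop/hadoop-env.sh — point Hadoop at the JDK
export JAVA_HOME=/path/to/java/jdk1.8.0_321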

Edit core-site.xml; the HDFS port and the Hadoop temporary directory vary from setup to setup.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:12369</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>
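
Since hadoop.tmp.dir above points at /usr/local/hadoop/tmp, it is worth creating that directory on every node up front (a precaution; formatting the NameNode usually creates it too):

mkdir -p /usr/local/hadoop/tmp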

Edit hdfs-site.xml.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:50070</value>
    </property>
</configuration>

Edit yarn-site.xml.

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

    <!-- Site specific YARN configuration properties -->

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>

Edit mapred-site.xml.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
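
One aside: mapred-site.xml points the job-history server at master:10020/19888, but start-all.sh does not launch that daemon; in Hadoop 3 it is started separately (assuming the install path used in this post):

/usr/local/hadoop/hadoop-3.3.2/bin/mapred --daemon start historyserver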


  • Copy the fully configured Hadoop directory from the master to node1 and node2:
scp -r /usr/local/hadoop/hadoop-3.3.2 node<id>:/usr/local/hadoop/

This overwrites whatever copy was already there on the nodes.

  • Start Hadoop: run
/usr/local/hadoop/hadoop-3.3.2/bin/hdfs namenode -format

to format the NameNode and generate its initial files, then run the startup script:

/usr/local/hadoop/hadoop-3.3.2/sbin/start-all.sh
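
Two notes, both hedged against this post's root-based setup. First, when the scripts run as root, Hadoop 3 refuses to start unless the daemon users are declared, e.g. by appending these lines to hadoop-env.sh before launching:

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

Second, a quick sanity check: jps on the master should list NameNode, SecondaryNameNode, and ResourceManager, while node1 and node2 should list DataNode and NodeManager; the NameNode web UI is then reachable at http://master:50070 (the port set in hdfs-site.xml above).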


  • All done!

