Setting Up a Hadoop 3.1 Pseudo-Distributed Environment on CentOS 7

    This tutorial uses JDK 1.8.0 and Hadoop 3.1.1

Related resources: https://pan.baidu.com/s/1EhkiCXidke-iN6kU3yuMJQ  Extraction code: p0bl

1. Install a virtual machine

  You can use either VMware or VirtualBox to install the virtual machine.

  This tutorial is based on VMware.

2. Install the operating system

  You can download whichever version you prefer from the official CentOS website.

  This tutorial is based on CentOS 7 x86_64-Minimal-1804.

3. Check whether ssh is installed (even a minimal CentOS 7 installation already includes openssh, so this step can usually be skipped)

  rpm -qa | grep ssh

  If it is already installed, proceed to the next step. If it is not, a minimal installation sketch is given below; this tutorial does not cover it in more detail.
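  If openssh does turn out to be missing, a minimal sketch of installing and enabling it with yum (package names assume the standard CentOS 7 repositories):

  # install the ssh server and client, then enable the service at boot
  yum install -y openssh-server openssh-clients
  systemctl enable sshd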

4. Configure ssh for passwordless login

  1. Start the sshd service

  systemctl start sshd.service

  2. Enter the ~/.ssh directory

  cd ~/.ssh

   If the directory does not exist, it can be created by logging in as root once with the following command:

  ssh root@localhost

   Type yes when prompted, then enter the local root password.

  3. Once inside the .ssh directory, run

  ssh-keygen -t rsa 

   Just press Enter at every prompt.

  4. Set up passwordless ssh authentication by running

  cat id_rsa.pub >> authorized_keys

  5. Adjust the file permissions

  chmod 644 authorized_keys

  6. Verify that passwordless login works

  ssh root@localhost

   If you can log in without being asked for a password, the setup succeeded. (A shortcut for steps 4 and 5 is sketched below.)
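   As a shortcut, steps 4 and 5 can usually be replaced by ssh-copy-id, which appends the public key to authorized_keys and fixes its permissions in one step (assuming the key pair from step 3 already exists):

  ssh-copy-id root@localhost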

5. Upload the JDK and configure environment variables

  Use a tool such as Xftp or WinSCP to upload the archive to the /usr/local/java directory on the CentOS 7 machine.

  Enter the directory and extract the archive:

  cd /usr/local/java
  tar -zxvf jdk-8u191-linux-x64.tar.gz

  Set the environment variables:

  vim ~/.bashrc

  Add the following at the bottom of the file:

  export JAVA_HOME=/usr/local/java/jdk1.8.0_191
  export PATH=$JAVA_HOME/bin:$PATH

  Apply the configuration with:

  source ~/.bashrc
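  To confirm that the JDK is now on the PATH, you can check the version (for this archive it should report 1.8.0_191):

  java -version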

6. Upload Hadoop and configure environment variables

 1. System environment variables

  Use a tool such as Xftp or WinSCP to upload the archive to the /usr/local/hadoop directory on the CentOS 7 machine.

  Enter the directory and extract the archive:

  cd /usr/local/hadoop
  tar -zxvf hadoop-3.1.1.tar.gz

  Set the environment variables:

  vim ~/.bashrc

  Add the following at the bottom of the file:

  export HADOOP_HOME=/usr/local/hadoop/hadoop-3.1.1
  export HADOOP_INSTALL=$HADOOP_HOME
  export HADOOP_MAPRED_HOME=$HADOOP_HOME
  export HADOOP_COMMON_HOME=$HADOOP_HOME
  export HADOOP_HDFS_HOME=$HADOOP_HOME
  export YARN_HOME=$HADOOP_HOME
  export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
  export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

  Apply the configuration with:

  source ~/.bashrc
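  To confirm that the Hadoop binaries are now on the PATH, a quick sanity check (the command ships with the Hadoop distribution):

  hadoop version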

 2. Preparation

  Create the directory for storing data:

  mkdir /usr/local/hadoop/tmp

  Create the directory where the namenode stores its name table (-p is needed because the dfs parent directory does not exist yet):

  mkdir -p /usr/local/hadoop/tmp/dfs/name

  Create the directory where the datanode stores data blocks:

  mkdir -p /usr/local/hadoop/tmp/dfs/data
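  Since mkdir -p creates any missing parent directories, the three commands above can also be collapsed into a single one (same paths as above):

  mkdir -p /usr/local/hadoop/tmp/dfs/{name,data}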

 3. Edit the core-site.xml configuration file under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

  By default Hadoop keeps its data under /tmp, which is cleared automatically on reboot, so we need to specify a directory of our own for Hadoop's data. We also need to configure the default file system Hadoop uses and the host on which the NameNode process runs.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<!-- Directory where Hadoop stores the files it generates at runtime -->
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<!-- Address of the HDFS namenode -->
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>

 4. Edit the hdfs-site.xml configuration file under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

  This file holds HDFS-specific settings. The default block replication factor must be changed: HDFS keeps 3 replicas of each block by default, but in pseudo-distributed mode there is only one datanode, so the replication factor has to be set to 1, otherwise Hadoop will report errors.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<!-- Number of replicas HDFS keeps for each block (3 by default) -->
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<!-- Directory where the namenode stores its name table -->
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<!-- Directory where the datanode stores data blocks -->
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
<property>
<!-- Address and port of the HDFS web UI -->
<name>dfs.http.address</name>
<value>0.0.0.0:50070</value>
</property>
</configuration>

 5. Edit the mapred-site.xml configuration file under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<!-- Run the MapReduce programming model on YARN -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

 6. Edit the yarn-site.xml configuration file under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<!-- Enable the MapReduce shuffle auxiliary service on the NodeManager -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

  7. Format the namenode (this only needs to be done once)

   hdfs namenode -format

   (The older form "hadoop namenode -format" still works in Hadoop 3 but prints a deprecation warning.)

  8. Start Hadoop

  start-all.sh
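  start-all.sh is itself marked as deprecated in Hadoop 3; HDFS and YARN can also be started separately, which is often easier to troubleshoot (these are the same sbin scripts edited in the troubleshooting section below):

  start-dfs.sh
  start-yarn.sh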

  9. Check the running processes to verify the startup

  jps

  If the five processes NameNode, SecondaryNameNode, DataNode, ResourceManager, and NodeManager are all listed, the startup succeeded.
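  A rough sketch of what the jps output should look like (the process IDs here are placeholders; the Jps process itself also appears in the list):

  12001 NameNode
  12002 DataNode
  12003 SecondaryNameNode
  12004 ResourceManager
  12005 NodeManager
  12006 Jps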

7. Troubleshooting

 1. Problem during initialization

  WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

  Solutions (two alternatives)

  1. Add the following line to ~/.bashrc:

  export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"

    Apply the configuration:

  source ~/.bashrc

  2. Or edit core-site.xml and add:

<property>
    <name>hadoop.native.lib</name>
    <value>false</value>
</property>
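  You can also inspect which native libraries Hadoop actually picks up with the checknative tool bundled with the distribution:

  hadoop checknative -a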

 2. Errors at startup

  Error 1

Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [localhost.localdomain]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

  Solution

  The errors are caused by missing user definitions, so edit both the start and stop scripts:

     start-dfs.sh and stop-dfs.sh under /usr/local/hadoop/hadoop-3.1.1/sbin

  Near the top, just below the #!/usr/bin/env bash line, add:

HDFS_DATANODE_USER=root 
HADOOP_SECURE_DN_USER=hdfs 
HDFS_NAMENODE_USER=root 
HDFS_SECONDARYNAMENODE_USER=root 

  Error 2

Starting resourcemanager 
ERROR: Attempting to launch yarn resourcemanager as root 
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting launch. 
Starting nodemanagers 
ERROR: Attempting to launch yarn nodemanager as root 
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting launch. 

   Solution

     Again caused by missing user definitions, so edit both the start and stop scripts:

      start-yarn.sh and stop-yarn.sh under /usr/local/hadoop/hadoop-3.1.1/sbin

   Near the top, just below the #!/usr/bin/env bash line, add:

YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

  Error 3

WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.

  Solution

  In start-dfs.sh and stop-dfs.sh under /usr/local/hadoop/hadoop-3.1.1/sbin, change

HDFS_DATANODE_USER=root  
HADOOP_SECURE_DN_USER=hdfs  
HDFS_NAMENODE_USER=root  
HDFS_SECONDARYNAMENODE_USER=root 

  to

HDFS_DATANODE_USER=root  
HDFS_DATANODE_SECURE_USER=hdfs  
HDFS_NAMENODE_USER=root  
HDFS_SECONDARYNAMENODE_USER=root 

8. Hadoop is now installed successfully

  HDFS web UI: http://192.168.0.3:50070 (replace 192.168.0.3 with your own VM's IP address)

  ResourceManager web UI: http://192.168.0.3:8088
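  As an optional final check, a minimal sketch of exercising HDFS from the command line (the /test directory name is only an example):

  hdfs dfs -mkdir /test
  hdfs dfs -ls /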

 
