hadoop-3.1.1 Distributed Cluster Setup

1. Upload, extract, and configure the environment variables

# 1. Extract the archive
tar -xvf hadoop-3.1.1.tar.gz

# 2. Edit the environment variables
vim /etc/profile

# 3. Append the following at the end of the file
export HADOOP_HOME=/usr/local/soft/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

# 4. Apply the environment variables
source /etc/profile
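
To confirm the variables took effect, a quick sanity check such as the following can be run (a minimal sketch; it only relies on the paths configured above):

# hadoop should resolve from the PATH and print its version
which hadoop
hadoop version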

2. Edit the configuration files

# 1. Go to the directory containing the Hadoop configuration files
cd /usr/local/soft/hadoop-3.1.1/etc/hadoop

# 2. Edit core-site.xml and add the following properties inside <configuration>
vim core-site.xml
# Add the following
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/soft/hadoop-3.1.1/tmp</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
</configuration>
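
Here fs.defaultFS points clients at the NameNode on master:9000, hadoop.tmp.dir moves Hadoop's working data out of /tmp, and fs.trash.interval=1440 keeps deleted files in the HDFS trash for 24 hours. A brief illustration of the trash behaviour (the path /test.txt is made up for the example):

# A deleted file is moved to the current user's .Trash directory instead of being removed immediately
hadoop fs -rm /test.txt
hadoop fs -ls /user/root/.Trash/Current
# Pass -skipTrash to delete permanently
hadoop fs -rm -skipTrash /test.txt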

# 3. Edit hdfs-site.xml and add the following properties inside <configuration>
vim hdfs-site.xml
# Add the following
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
</configuration>
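
dfs.replication=1 keeps a single copy of each block, which is fine for a small test cluster (production clusters normally use the default of 3), and dfs.permissions=false disables HDFS permission checks. Once the cluster is running, the replication factor of a file can be inspected, for example (the path is hypothetical):

# %r prints the replication factor of the given file
hdfs dfs -stat "%r" /user/root/test.txt
# fsck reports block and replication details for a whole tree
hdfs fsck / -files -blocks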

# 4. Edit yarn-site.xml and add the following properties inside <configuration>
vim yarn-site.xml
# Add the following
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
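
yarn.log-aggregation-enable=true collects the logs of finished containers onto HDFS so they can still be read after a job ends. Once a job has completed, its aggregated logs can be fetched roughly like this (the application id is a placeholder):

# List applications known to the ResourceManager
yarn application -list -appStates ALL
# Fetch the aggregated logs of one application (replace the id with a real one)
yarn logs -applicationId application_1700000000000_0001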


> mapreduce.framework.name: the runtime framework used to execute MapReduce jobs.

> mapreduce.jobhistory.address: Hadoop ships with a history server through which the records of completed MapReduce jobs can be viewed, such as how many map and reduce tasks were used and the job's submission, start, and completion times. The history server is not started by default; it can be started with the mr-jobhistory-daemon.sh start historyserver command. Jobs are normally submitted with hadoop jar, and the JobHistoryServer is the dedicated process used to inspect their logs afterwards. The defaults for mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address are 0.0.0.0:10020 and 0.0.0.0:19888 respectively.

# 5. Edit mapred-site.xml and add the following properties inside <configuration>
vim mapred-site.xml
# Add the following
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
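
With these addresses in place, the history server can be started by hand once the cluster is up. A brief sketch (in Hadoop 3 the mapred --daemon form is the current syntax; the older mr-jobhistory-daemon.sh script mentioned above still works but is deprecated):

# Start the JobHistoryServer on master
mapred --daemon start historyserver
# Its web UI then listens on http://master:19888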



# 6. Edit hadoop-env.sh
vim hadoop-env.sh
# Add the following (set JAVA_HOME to the JDK path on your nodes)
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home


# 7. Edit the workers file and list the hostnames of the worker nodes
vim workers
# Add the following
node1
node2

Hadoop 3.x refuses to start the HDFS/YARN daemons as the root user unless the corresponding *_USER variables are defined. There are two ways to set them.

Method 1: edit the following four scripts in the sbin folder under the Hadoop installation directory

1. For start-dfs.sh and stop-dfs.sh, add the following parameters:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

2. For start-yarn.sh and stop-yarn.sh, add the following parameters:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Method 2 (recommended): add the variables to hadoop-env.sh

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

Distribute the hadoop directory to the worker nodes

scp -r hadoop-3.1.1 node1:`pwd`
scp -r hadoop-3.1.1 node2:`pwd`

3. Format HDFS

# Format the NameNode
hdfs namenode -format
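
If the format succeeds, the NameNode metadata directory is created under the hadoop.tmp.dir configured above; a quick look confirms it (paths follow the configuration used in this guide):

# fsimage and VERSION files should now exist in the NameNode storage directory
ls /usr/local/soft/hadoop-3.1.1/tmp/dfs/name/current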

4. Start Hadoop

# Start Hadoop
start-all.sh

# Stop Hadoop
# stop-all.sh

# HDFS web UI
http://master:9870

# YARN web UI
http://master:8088
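
After start-all.sh finishes, the daemons can be verified with jps on every node (a minimal check; the process names below are the standard ones for this configuration):

# On master: NameNode, SecondaryNameNode and ResourceManager should be listed
jps
# On node1 and node2: DataNode and NodeManager should be listed
jps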

Troubleshooting

# Go to the log directory
cd /usr/local/soft/hadoop-3.1.1/logs

# 1. NameNode failed to start
cat hadoop-root-namenode-master.log

# 2. ResourceManager failed to start
cat hadoop-root-resourcemanager-master.log

# 3. DataNode failed to start (DataNodes run on the workers, so check the log on node1/node2;
#    the file name contains that node's hostname)
cat hadoop-root-datanode-node1.log
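
The log files grow quickly, so filtering for errors usually gets to the cause faster (a generic shell sketch using the NameNode log as an example):

# Show the most recent errors/exceptions in a log file
grep -iE "error|exception" hadoop-root-namenode-master.log | tail -n 20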

Re-formatting the cluster (brute-force approach)

1. Stop the cluster

stop-all.sh

2. Delete the tmp directory under the Hadoop root directory on every node

rm -rf tmp/

3. Re-format; run this in Hadoop's bin directory

hdfs namenode -format

4. Restart

start-all.sh
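
After the restart, a small MapReduce job makes a convenient smoke test; a sketch assuming the examples jar that ships with the 3.1.1 distribution:

# Run the bundled pi estimator with 2 map tasks and 10 samples per map
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 2 10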

Common HDFS Commands

Create directories

Usage: hadoop fs -mkdir [-p] <paths>

Takes path URIs as arguments and creates directories.

Options:

The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.
Example:

hadoop fs -mkdir /user/hadoop/dir1 /user/hadoop/dir2
hadoop fs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir
Exit Code:

Returns 0 on success and -1 on error.

Upload files from Linux to HDFS


Usage: hadoop fs -put [-f] [-p] [-l] [-d] [ - | <localsrc1> .. ]. <dst>

Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system if the source is set to “-”

Copying fails if the file already exists, unless the -f flag is given.

Options:

-p : Preserves access and modification times, ownership and the permissions. (assuming the permissions can be propagated across filesystems)
-f : Overwrites the destination if it already exists.
-l : Allow DataNode to lazily persist the file to disk, Forces a replication factor of 1. This flag will result in reduced durability. Use with care.
-d : Skip creation of temporary file with the suffix ._COPYING_.
Examples:

hadoop fs -put localfile /user/hadoop/hadoopfile
hadoop fs -put -f localfile1 localfile2 /user/hadoop/hadoopdir
hadoop fs -put -d localfile hdfs://nn.example.com/hadoop/hadoopfile
hadoop fs -put - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.
Exit Code:

Returns 0 on success and -1 on error.

View file contents

Usage: hadoop fs -cat [-ignoreCrc] URI [URI ...]

Copies source paths to stdout.

Options

The -ignoreCrc option disables checksum verification.
Example:

hadoop fs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2
hadoop fs -cat file:///file3 /user/hadoop/file4
Exit Code:

Returns 0 on success and -1 on error.

Copy files to another HDFS directory

Usage: hadoop fs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest>

Copy files from source to destination. This command allows multiple sources as well in which case the destination must be a directory.

‘raw.*’ namespace extended attributes are preserved if (1) the source and destination filesystems support them (HDFS only), and (2) all source and destination pathnames are in the /.reserved/raw hierarchy. Determination of whether raw.* namespace xattrs are preserved is independent of the -p (preserve) flag.

Options:

The -f option will overwrite the destination if it already exists.
The -p option will preserve file attributes [topx] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no arg, then preserves timestamps, ownership, permission. If -pa is specified, then preserves permission also because ACL is a super-set of permission. Determination of whether raw namespace extended attributes are preserved is independent of the -p flag.
Example:

hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2
hadoop fs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir
Exit Code:

Returns 0 on success and -1 on error.

Move HDFS files to another HDFS directory

Usage: hadoop fs -mv URI [URI ...] <dest>

Moves files from source to destination. This command allows multiple sources as well in which case the destination needs to be a directory. Moving files across file systems is not permitted.

Example:

hadoop fs -mv /user/hadoop/file1 /user/hadoop/file2
hadoop fs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1
Exit Code:

Returns 0 on success and -1 on error.

Delete files or directories

hadoop fs -rm -r -f /bigdata29/aaa

Download HDFS files to Linux

Usage: hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] <src> <localdst>

Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.

Example:

hadoop fs -get /user/hadoop/file localfile
hadoop fs -get hdfs://nn.example.com/user/hadoop/file localfile
Exit Code:

Returns 0 on success and -1 on error.

Options:

-p : Preserves access and modification times, ownership and the permissions. (assuming the permissions can be propagated across filesystems)
-f : Overwrites the destination if it already exists.
-ignorecrc : Skip CRC checks on the file(s) downloaded.
-crc: write CRC checksums for the files downloaded.

Quick reference (for HDFS paths, hadoop fs and hdfs dfs are equivalent):

hadoop fs -put xxx
hadoop fs -cat xxx
hadoop fs -get xxx
hadoop fs -rm -r -f xxx
hadoop fs -cp xxx
hadoop fs -mv xxx

hdfs dfs -put xxx
hdfs dfs -cat xxx
hdfs dfs -get xxx
hdfs dfs -rm -r -f xxx
hdfs dfs -cp xxx
hdfs dfs -mv xxx
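
As a combined illustration, the walk-through below exercises the commands above; the local file and HDFS paths are made up for the example:

# Create a directory, upload a local file, inspect it, download it, then clean up
hadoop fs -mkdir -p /user/root/demo
hadoop fs -put /root/words.txt /user/root/demo/
hadoop fs -cat /user/root/demo/words.txt
hadoop fs -get /user/root/demo/words.txt /root/words_copy.txt
hadoop fs -rm -r -f /user/root/demo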
