
HDFS Cluster Setup and Command Usage (Hadoop 2.x)

Introduction to HDFS

HDFS (Hadoop Distributed File System) originates from the paper Google published in 2003 describing its distributed file system GFS (Google File System). HDFS is an open-source implementation of GFS and is a block-level distributed file system.

To scale storage for massive data sets there are two options: scale-up (vertical scaling) and scale-out (horizontal scaling). Scaling up grows an existing storage system by adding larger-capacity devices; it is expensive, hard to upgrade, and quickly hits a ceiling (a single device can only grow so large). Scaling out adds nodes to the existing network to form a cluster. Scale-out brings its own problems:

1. Fault tolerance: the failure of individual nodes must not lose data

2. Efficiency: GB-scale files are common and there are many of them, so a distributed file system must be designed around I/O behaviour and block size

3. Write once, read many: files are written append-only, cannot be modified once written, and are read sequentially

HDFS addresses the problems above. It suits write-once, read-many workloads, does not support in-place modification, and is a good fit for OLAP (On-Line Analytical Processing) style data analysis.

The four core Hadoop modules

Module      Config file       Purpose
Common      core-site.xml     Configuration utilities, remote procedure calls (RPC), serialization, and the Hadoop FileSystem abstraction
HDFS        hdfs-site.xml     Distributed file system providing high-throughput, scalable, fault-tolerant access to application data
MapReduce   mapred-site.xml   Distributed offline parallel computing framework
YARN        yarn-site.xml     Resource scheduling and cluster resource management

The main Hadoop daemons and their default ports

Daemon           Port   Property                                        Description
NameNode         50070  dfs.namenode.http-address                       HTTP service port
                 50470  dfs.namenode.https-address                      HTTPS service port
                 8020   fs.defaultFS                                    RPC port that accepts client connections, used to retrieve file system metadata

DataNode         50075  dfs.datanode.http.address                       HTTP service port
                 50475  dfs.datanode.https.address                      HTTPS service port
                 50020  dfs.datanode.ipc.address                        IPC service port
                 50010  dfs.datanode.address                            DataNode service port, used for data transfer

ResourceManager  8030   yarn.resourcemanager.scheduler.address          Scheduler IPC port
                 8031   yarn.resourcemanager.resource-tracker.address   IPC port
                 8032   yarn.resourcemanager.address                    RM applications manager (ASM) port
                 8033   yarn.resourcemanager.admin.address              Admin IPC port
                 8088   yarn.resourcemanager.webapp.address             HTTP service port

NodeManager      8040   yarn.nodemanager.localizer.address              Localizer IPC port
                 8041   yarn.nodemanager.address                        Container manager port on the NM
                 8042   yarn.nodemanager.webapp.address                 HTTP service port

Other daemons

Daemon             Port   Property                               Description
JournalNode        8480   dfs.journalnode.http-address           HTTP service port
                   8485   dfs.journalnode.rpc-address            RPC service port

ZKFC               8019   dfs.ha.zkfc.port                       ZKFailoverController port

JobHistory Server  10020  mapreduce.jobhistory.address           IPC port
                   19888  mapreduce.jobhistory.webapp.address    HTTP service port
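The ports above are the Hadoop 2.x defaults. On a running cluster you can confirm the effective value of any of these properties with hdfs getconf; for example (0.0.0.0:50070 is the default and will differ if your site configuration overrides it):

$ hdfs getconf -confKey dfs.namenode.http-address
0.0.0.0:50070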

Building a Hadoop 2.6 cluster

Preparing the base environment

Address plan

Hostname   IP          Roles
node1      10.0.0.31   namenode, datanode
node2      10.0.0.32   datanode
node3      10.0.0.33   secondarynamenode, datanode

OS: CentOS 7.4

Disable the firewall

# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Disable SELinux (for a permanent change, also set SELINUX=disabled in /etc/selinux/config)
# setenforce 0

# getenforce 
Disabled

Configure the yum repositories

# mv /etc/yum.repos.d/CentOS-Base.repo{,.bak}

# wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

Install and configure the chrony time service

# yum install -y chrony

# vim /etc/chrony.conf
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server ntp1.aliyun.com iburst
# Allow NTP client access from local network.
allow 10/8

# rm -rf /etc/localtime
# ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# systemctl enable chronyd
# systemctl start chronyd
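A quick way to confirm that chrony is actually synchronizing (the source list depends on your network; the line marked ^* is the currently selected server):

# chronyc sources -v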

Install the JDK

# mkdir /opt/{software,module}

# cd /opt/software && ll
total 377944
-rw-r--r-- 1 root root 195257604 Mar 30  2019 hadoop-2.6.0.tar.gz
-rw-r--r-- 1 root root 191753373 Dec 16  2018 jdk-8u191-linux-x64.tar.gz

# tar xfz jdk-8u191-linux-x64.tar.gz -C ../module/

# ln -sv ../module/jdk1.8.0_191 ../module/jdk

Configure environment variables
# sed -i.ori '$a export JAVA_HOME=/opt/module/jdk\nexport PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH\nexport CLASSPATH=.$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar' /etc/profile

# . /etc/profile
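A quick sanity check that the JDK now on the PATH is the one just unpacked; for jdk-8u191 the version string should look roughly like this:

# java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)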

Install Hadoop

# tar xfz /opt/software/hadoop-2.6.0.tar.gz -C /opt/module/

# vim /etc/profile
export HADOOP_HOME=/opt/module/hadoop-2.6.0
export JAVA_HOME=/opt/module/jdk
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export CLASSPATH=.$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar

# . /etc/profile
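Likewise, confirm that the Hadoop binaries resolve from the updated PATH; the first line of the output should report the version that was unpacked:

# hadoop version
Hadoop 2.6.0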

Set JAVA_HOME for the Hadoop runtime

$ vim  /opt/module/hadoop-2.6.0/etc/hadoop/hadoop-env.sh

# The java implementation to use.
export JAVA_HOME=/opt/module/jdk

Set up passwordless SSH

# vim key.sh      # note: the script uses sshpass, which must be installed separately
#!/bin/bash

ssh-keygen -t rsa -f /root/.ssh/id_rsa -P ""

for i in 10.0.0.3{1..3};do
  sshpass -p123456 ssh-copy-id -i /root/.ssh/id_rsa.pub "-o StrictHostKeyChecking=no" root@$i
done

# useradd hdfs -s /bin/bash

# echo 123456|passwd --stdin hdfs
Changing password for user hdfs.
passwd: all authentication tokens updated successfully.

 

# su - hdfs

$ ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""


$ for i in node{1..3};do ssh-copy-id -i ~/.ssh/id_rsa.pub hdfs@$i;done
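Passwordless login for the hdfs user can be verified with a short loop; each node should print its hostname without prompting for a password (BatchMode makes ssh fail instead of asking):

$ for i in node{1..3}; do ssh -o BatchMode=yes hdfs@$i hostname; done
node1
node2
node3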

Hadoop configuration files and scripts

File                        Format             Description
hadoop-env.sh               Bash script        Environment variables used when running Hadoop
mapred-env.sh               Bash script        Environment variables used when running MapReduce (override hadoop-env.sh)
yarn-env.sh                 Bash script        Environment variables used when running YARN (override hadoop-env.sh)
core-site.xml               Hadoop config XML  Core Hadoop settings, e.g. I/O settings shared by HDFS, MapReduce and YARN
hdfs-site.xml               Hadoop config XML  HDFS daemon settings: namenode, secondarynamenode, datanode, etc.
mapred-site.xml             Hadoop config XML  MapReduce daemon settings, including the job history server
yarn-site.xml               Hadoop config XML  YARN daemon settings: resource manager, web app proxy server and node managers
slaves                      Plain text         Machines that run datanodes and node managers (one per line)
hadoop-metrics2.properties  Java properties    Properties controlling how metrics are published on the cluster
log4j.properties            Java properties    Properties for the system logs, the namenode audit log and task JVM logs
hadoop-policy.xml           Hadoop config XML  Access control lists used when running Hadoop in secure mode
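All of these files live in the configuration directory of the unpacked distribution; the vim commands in the following sections assume you are working inside it (path per the layout used above):

$ cd /opt/module/hadoop-2.6.0/etc/hadoop/
$ ls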

Configure slaves (the nodes that run DataNodes)

# vim slaves
node1
node2
node3

Configure core-site.xml (the property blocks below go inside the file's <configuration> element)

# vim core-site.xml
<!-- Address of the HDFS NameNode (master) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:8020</value>
</property>
<!-- Base directory for files Hadoop generates at runtime, including metadata and block data -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.6.0/data/tmp</value>
</property>
<!-- HDFS trash retention time, in minutes -->
<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
</property>

Configure the Secondary NameNode and the replication factor

# vim hdfs-site.xml

<configuration>
<!-- Number of block replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- Where the Secondary NameNode runs -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node3:50090</value>
</property>
</configuration>

Copy everything to the other two servers

# for i in 10.0.0.3{2..3};do scp -rp /opt/module hdfs@$i:/opt/ ;done

# for i in node{2..3};do scp -r /etc/profile root@${i}:/etc;done

# for i in node{2..3};do scp -r /etc/hosts root@${i}:/etc;done
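For the copied /etc/hosts to resolve the node names used throughout, it needs entries matching the address plan; a minimal example (adjust to your own network):

# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain
10.0.0.31   node1
10.0.0.32   node2
10.0.0.33   node3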

Initialize the cluster (format the NameNode)

$ hdfs namenode -format

20/03/04 20:37:32 INFO namenode.NameNode: Caching file names occuring more than 10 times
20/03/04 20:37:32 INFO util.GSet: Computing capacity for map cachedBlocks
20/03/04 20:37:32 INFO util.GSet: VM type       = 64-bit
20/03/04 20:37:32 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
20/03/04 20:37:32 INFO util.GSet: capacity      = 2^18 = 262144 entries
20/03/04 20:37:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/03/04 20:37:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
20/03/04 20:37:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
20/03/04 20:37:32 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/03/04 20:37:32 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/03/04 20:37:32 INFO util.GSet: Computing capacity for map NameNodeRetryCache
20/03/04 20:37:32 INFO util.GSet: VM type       = 64-bit
20/03/04 20:37:32 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
20/03/04 20:37:32 INFO util.GSet: capacity      = 2^15 = 32768 entries
20/03/04 20:37:32 INFO namenode.NNConf: ACLs enabled? false
20/03/04 20:37:32 INFO namenode.NNConf: XAttrs enabled? true
20/03/04 20:37:32 INFO namenode.NNConf: Maximum size of an xattr: 16384
20/03/04 20:37:32 INFO namenode.FSImage: Allocated new BlockPoolId: BP-112428611-10.0.0.31-1583325452737
20/03/04 20:37:33 INFO common.Storage: Storage directory /opt/module/hadoop-2.6.0/data/tmp/dfs/name has been successfully formatted.
20/03/04 20:37:33 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/03/04 20:37:33 INFO util.ExitUtil: Exiting with status 0
20/03/04 20:37:33 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/10.0.0.31

Start the NameNode, Secondary NameNode and DataNodes

$ start-dfs.sh
Starting namenodes on [node1]
node1: starting namenode, logging to /opt/module/hadoop-2.6.0/logs/hadoop-hdfs-namenode-node1.out
node2: starting datanode, logging to /opt/module/hadoop-2.6.0/logs/hadoop-hdfs-datanode-node2.out
node3: starting datanode, logging to /opt/module/hadoop-2.6.0/logs/hadoop-hdfs-datanode-node3.out
node1: starting datanode, logging to /opt/module/hadoop-2.6.0/logs/hadoop-hdfs-datanode-node1.out
Starting secondary namenodes [node3]
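Once the script returns, jps on each node should show the expected daemons (the PIDs below are only illustrative); node1 runs a NameNode and a DataNode, and node3 additionally shows a SecondaryNameNode:

$ jps
3086 NameNode
3214 DataNode
3395 Jps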

A look at what start-dfs.sh does

#Add other possible options
nameStartOpt="$nameStartOpt $@"

#---------------------------------------------------------
# namenodes

NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)

echo "Starting namenodes on [$NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" start namenode $nameStartOpt

#---------------------------------------------------------
# datanodes (using default slaves file)

if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  echo \
    "Attempting to start secure cluster, skipping datanodes. " \
    "Run start-secure-dns.sh as root to complete startup."
else
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --script "$bin/hdfs" start datanode $dataStartOpt
fi


# The script works as follows:
  1. Start a namenode on each host returned by `hdfs getconf -namenodes`
  2. Start a datanode on every host listed in the slaves file
  3. Start a secondarynamenode on each host returned by `hdfs getconf -secondarynamenodes`

HDFS commands

View the command help

$ hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                        Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      get all the existing block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.

The dfs command

hdfs dfs is a subset of hadoop fs. The difference is that hdfs dfs (and the older equivalent spelling hadoop dfs, now deprecated) works only against HDFS, while hadoop fs is not limited to HDFS and works with any file system Hadoop supports.

$ hdfs dfs
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
    [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] <path> ...]
    [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] <path> ...]
    [-expunge]
    [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] <src> <localdst>]
    [-help [cmd ...]]
    [-ls [-d] [-h] [-R] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touchz <path> ...]
    [-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

1. View help for a subcommand

$ hdfs dfs -help ls
-ls [-d] [-h] [-R] [<path> ...] :
  List the contents that match the specified file pattern. If path is not
  specified, the contents of /user/<currentUser> will be listed. Directory entries
  are of the form:
      permissions - userId groupId sizeOfDirectory(in bytes)
  modificationDate(yyyy-MM-dd HH:mm) directoryName
  
  and file entries are of the form:
      permissions numberOfReplicas userId groupId sizeOfFile(in bytes)
  modificationDate(yyyy-MM-dd HH:mm) fileName
                                                                                 
  -d  Directories are listed as plain files.                                     
  -h  Formats the sizes of files in a human-readable fashion rather than a number
      of bytes.                                                                  
  -R  Recursively list the contents of directories.                              

2. Create a directory in HDFS

$ hdfs dfs -mkdir /data
$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2020-03-04 22:39 /data

3. Create an empty file in HDFS

$ hdfs dfs -touchz /data/test.txt
$ hdfs dfs -ls /data
Found 1 items
-rw-r--r--   3 hdfs supergroup          0 2020-03-04 22:43 /data/test.txt

4. Append content to a file

$ echo test-content >test.txt
$ hdfs dfs -appendToFile test.txt /data/test.txt

$ hdfs dfs -cat /data/test.txt
test-content

5. Copy files within HDFS

[-cp [-f] [-p | -p[topax]] <src> ... <dst>]

-f   overwrite the destination file if it already exists
-p   preserve file attributes (timestamps, ownership, permissions)

$ hdfs dfs -cp /data/test.txt /
$ hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - hdfs supergroup          0 2020-03-04 22:43 /data
-rw-r--r--   3 hdfs supergroup         13 2020-03-04 22:48 /test.txt

6. Delete files

[-rm [-f] [-r|-R] [-skipTrash] <src> ...]

-f           ignore the error if the file does not exist
-r           delete recursively
-skipTrash   bypass the trash and delete the file immediately

$ hdfs dfs -rm  /test.txt
20/03/04 22:59:26 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://node1:8020/test.txt' to trash at: hdfs://node1:8020/user/hdfs/.Trash/Current

$ hdfs dfs -rm -r -skipTrash /data
Deleted /data

$ hdfs dfs -rm /a
rm: `/a': No such file or directory

$ hdfs dfs -rm  -f /a

7. Upload a local file to HDFS

$ hdfs dfs -put test.txt /

$ hdfs dfs -ls /
Found 2 items
-rw-r--r--   3 hdfs supergroup         13 2020-03-04 23:04 /test.txt
drwx------   - hdfs supergroup          0 2020-03-04 22:59 /user

8. Download a file from HDFS

$ hdfs dfs -put /etc/profile /

$ hdfs dfs -ls /
Found 3 items
-rw-r--r--   3 hdfs supergroup       2048 2020-03-04 23:16 /profile
-rw-r--r--   3 hdfs supergroup         13 2020-03-04 23:04 /test.txt
drwx------   - hdfs supergroup          0 2020-03-04 22:59 /user

$ hdfs dfs -get /profile

$ ll
total 8
-rw-r--r-- 1 hdfs hdfs 2048 Mar  4 23:17 profile
-rw-rw-r-- 1 hdfs hdfs   13 Mar  4 22:45 test.txt

9. List the local file system through HDFS commands

$ hdfs dfs -ls file:///home
Found 1 items
drwx------   - hdfs hdfs        146 2020-03-04 23:17 file:///home/hdfs

10. Upload and download with copyFromLocal / copyToLocal

$ hdfs dfs -mkdir /data
$ hdfs dfs -copyFromLocal -p .ssh/ /data

$ hdfs dfs -ls -R /data
drwx------   - hdfs hdfs          0 2020-03-04 20:26 /data/.ssh
-rw-------   3 hdfs hdfs       1176 2020-03-04 20:31 /data/.ssh/authorized_keys
-rw-------   3 hdfs hdfs       1679 2020-03-04 19:57 /data/.ssh/id_rsa
-rw-r--r--   3 hdfs hdfs        392 2020-03-04 19:57 /data/.ssh/id_rsa.pub
-rw-r--r--   3 hdfs hdfs        531 2020-03-04 20:03 /data/.ssh/known_hosts

$ hdfs dfs -copyToLocal /data
$ ll
total 8
drwxrwxr-x 3 hdfs hdfs   18 Mar  4 23:28 data
-rw-r--r-- 1 hdfs hdfs 2048 Mar  4 23:17 profile
-rw-rw-r-- 1 hdfs hdfs   13 Mar  4 22:45 test.txt

11. Move files within HDFS

$ hdfs dfs -mv /test.txt /data

$ hdfs dfs -ls /data
Found 2 items
drwx------   - hdfs hdfs                0 2020-03-04 20:26 /data/.ssh
-rw-r--r--   3 hdfs supergroup         13 2020-03-04 23:04 /data/test.txt

12. Count files, directories and space used in a directory

$ hdfs dfs -count -q -h /test1
        none             inf         768.0 M          84.0 M            1            2              228 M /test1

The output columns are:
namespace quota (max files/dirs)   remaining namespace quota   space quota (bytes)   remaining space quota   directory count   file count   content size (logical)   path

$ hdfs dfs -count -h /test1
           1            2              228 M /test1

13. Move a local file or directory into HDFS

$ mkdir a
$ hdfs dfs -moveFromLocal a /data

$ ll
total 8
drwxrwxr-x 3 hdfs hdfs   18 Mar  4 23:28 data
-rw-r--r-- 1 hdfs hdfs 2048 Mar  4 23:17 profile
-rw-rw-r-- 1 hdfs hdfs   13 Mar  4 22:45 test.txt

$ hdfs dfs -ls /data
Found 3 items
drwx------   - hdfs hdfs                0 2020-03-04 20:26 /data/.ssh
drwxr-xr-x   - hdfs supergroup          0 2020-03-04 23:41 /data/a
-rw-r--r--   3 hdfs supergroup         13 2020-03-04 23:04 /data/test.txt

14. Merge files from HDFS into a single local file (note that getmerge overwrites the local target, as the example shows)

# cat a.txt 
food 
flood
flood234
floodabc
floooddsfa

# hdfs dfs -getmerge /profile a.txt

# cat a.txt 
# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

15. Check HDFS free space

# hdfs dfs -df -h
Filesystem           Size   Used  Available  Use%
hdfs://node1:8020  51.0 G  216 K     35.7 G    0%

16. Check the space used by a directory

# hdfs dfs -du -s -h /data
3.7 K  /data

17. View a path's modification timestamp

# hdfs dfs -stat /data
2020-03-04 15:41:50

18. Lower the replication factor of existing files

# hdfs dfs -setrep 2 /data
Replication 2 set: /data/.ssh/authorized_keys
Replication 2 set: /data/.ssh/id_rsa
Replication 2 set: /data/.ssh/id_rsa.pub
Replication 2 set: /data/.ssh/known_hosts
Replication 2 set: /data/test.txt

19. Change file permissions

$ hdfs dfs -chmod -R 755 /data

20. Change the owner and group of files and directories

Change the group:
$ hdfs dfs -chgrp -R supergroup /data

Change the owner:
$ hdfs dfs -chown -R supergroup /data

The getconf command

1. Get the NameNode and Secondary NameNode hostnames

$ hdfs getconf -namenodes
node1

$ hdfs getconf -secondaryNameNodes
node3

2. Check the minimum block size (default 1048576 bytes; any override must be a multiple of 512)

$ hdfs getconf -confkey fs.defaultFS
hdfs://node1:8020

$ hdfs getconf -confKey dfs.namenode.fs-limits.min-block-size
1048576

3. Get the NameNode RPC address

$ hdfs getconf -nnRpcAddresses
node1:8020

The dfsadmin command

1. View the help

$ hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
    [-report [-live] [-dead] [-decommissioning]]
    [-safemode <enter | leave | get | wait>]
    [-saveNamespace]
    [-rollEdits]
    [-restoreFailedStorage true|false|check]
    [-refreshNodes]
    [-setQuota <quota> <dirname>...<dirname>]
    [-clrQuota <dirname>...<dirname>]
    [-setSpaceQuota <quota> <dirname>...<dirname>]
    [-clrSpaceQuota <dirname>...<dirname>]
    [-finalizeUpgrade]
    [-rollingUpgrade [<query|prepare|finalize>]]
    [-refreshServiceAcl]
    [-refreshUserToGroupsMappings]
    [-refreshSuperUserGroupsConfiguration]
    [-refreshCallQueue]
    [-refresh <host:ipc_port> <key> [arg1..argn]
    [-reconfig <datanode|...> <host:ipc_port> <start|status>]
    [-printTopology]
    [-refreshNamenodes datanode_host:ipc_port]
    [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
    [-setBalancerBandwidth <bandwidth in bytes per second>]
    [-fetchImage <local directory>]
    [-allowSnapshot <snapshotDir>]
    [-disallowSnapshot <snapshotDir>]
    [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
    [-getDatanodeInfo <datanode_host:ipc_port>]
    [-metasave filename]
    [-setStoragePolicy path policyName]
    [-getStoragePolicy path]
    [-help [cmd]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

2. Check the current safe mode state

$ hdfs dfsadmin -safemode get
Safe mode is OFF

3. Enter and leave safe mode

[-safemode <enter | leave | get | wait>]

enter   enter safe mode
leave   leave safe mode
get     report the current state
wait    block until the NameNode leaves safe mode (if it is in safe mode, the command blocks; see the scripting sketch after the example below)

$ hdfs dfsadmin -safemode enter
Safe mode is ON

$ hdfs dfsadmin -safemode get
Safe mode is ON

$ hdfs dfsadmin -safemode wait     

$ hdfs dfsadmin -safemode leave
Safe mode is OFF
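The wait subcommand is mainly useful in scripts: it blocks until the NameNode has left safe mode, so jobs launched right after the cluster boots do not fail. A minimal sketch (the paths are only illustrative):

#!/bin/bash
# Block until HDFS is writable, then load the incoming files.
hdfs dfsadmin -safemode wait
hdfs dfs -put /data/incoming/*.log /landing/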

4. View the cluster status report

$ hdfs dfsadmin -report
Configured Capacity: 54713647104 (50.96 GB)   # total configured capacity
Present Capacity: 38327644160 (35.70 GB)      # capacity currently available to HDFS (total minus non-DFS usage)
DFS Remaining: 38327468032 (35.70 GB)         # remaining DFS capacity
DFS Used: 176128 (172 KB)                     # space used by HDFS
DFS Used%: 0.00%                              # percentage of capacity used by HDFS
Under replicated blocks: 0                    # blocks with fewer replicas than configured
Blocks with corrupt replicas: 0               # blocks that have corrupt replicas
Missing blocks: 0                             # missing blocks

-------------------------------------------------
Live datanodes (3):                           # number of live DataNodes

Name: 10.0.0.31:50010 (node1)                 # DataNode address (hostname)
Hostname: node1                               # hostname
Decommission Status : Normal                  # decommission status of this DataNode
Configured Capacity: 18237882368 (16.99 GB)
DFS Used: 73728 (72 KB)
Non DFS Used: 5539364864 (5.16 GB)
DFS Remaining: 12698443776 (11.83 GB)
DFS Used%: 0.00%
DFS Remaining%: 69.63%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 05 10:24:22 CST 2020


Name: 10.0.0.32:50010 (node2)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 18237882368 (16.99 GB)
DFS Used: 73728 (72 KB)
Non DFS Used: 5147488256 (4.79 GB)
DFS Remaining: 13090320384 (12.19 GB)
DFS Used%: 0.00%
DFS Remaining%: 71.78%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 05 10:24:22 CST 2020


Name: 10.0.0.33:50010 (node3)
Hostname: node3
Decommission Status : Normal
Configured Capacity: 18237882368 (16.99 GB)
DFS Used: 28672 (28 KB)
Non DFS Used: 5699149824 (5.31 GB)
DFS Remaining: 12538703872 (11.68 GB)
DFS Used%: 0.00%
DFS Remaining%: 68.75%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 05 10:24:22 CST 2020

5. Namespace quota (setting the quota to n means the directory can hold at most n-1 files and subdirectories, since the directory itself counts against the quota)

[-setQuota <quota> <dirname>...<dirname>] 

$ hdfs dfs -mkdir /test
$ hdfs dfsadmin -setQuota 5 /test
$ hdfs dfs -put conf.xml /test
$ hdfs dfs -put batch.properties /test
$ touch test1 test2 test3

$ hdfs dfs -put test* /test
put: The NameSpace quota (directories and files) of directory /test is exceeded: quota=5 file count=6
put: The NameSpace quota (directories and files) of directory /test is exceeded: quota=5 file count=6

$ hdfs dfs -ls /test
Found 4 items
-rw-r--r--   3 hdfs supergroup          0 2020-03-05 11:02 /test/batch.properties
-rw-r--r--   3 hdfs supergroup          0 2020-03-05 11:02 /test/conf.xml
-rw-r--r--   3 hdfs supergroup          0 2020-03-05 11:03 /test/test1
-rw-r--r--   3 hdfs supergroup          0 2020-03-05 11:03 /test/test2

6. Space quota

$ dd if=/dev/zero of=b.txt bs=100M count=1
1+0 records in
1+0 records out
104857600 bytes (105 MB) copied, 2.15439 s, 48.7 MB/s

$ ll -h
total 51M
-rw-rw-r-- 1 hdfs hdfs  50M Mar  5 11:12 a.txt
-rw-rw-r-- 1 hdfs hdfs    0 Mar  5 11:02 batch.properties
-rw-rw-r-- 1 hdfs hdfs    0 Mar  5 10:51 conf.xml
drwxrwxr-x 3 hdfs hdfs   18 Mar  4 23:28 data
-rw-r--r-- 1 hdfs hdfs 2.0K Mar  4 23:17 profile
-rw-rw-r-- 1 hdfs hdfs    0 Mar  5 11:03 test1
-rw-rw-r-- 1 hdfs hdfs    0 Mar  5 11:03 test2
-rw-rw-r-- 1 hdfs hdfs    0 Mar  5 11:03 test3
-rw-rw-r-- 1 hdfs hdfs   13 Mar  4 22:45 test.txt

$ hdfs dfs -setrep 3 /test1

# One block (dfs.blocksize) is 128 MB; the quota here is two blocks plus 1 KB, multiplied by the replication factor of 3
$ hdfs dfsadmin -setSpaceQuota 805309440 /test1

Check the quota:
$ hdfs dfs -count -q -h /test1
        none             inf         768.0 M         768.0 M            1            1                  0 /test1

$ hdfs dfs -put b.txt /test1

$ dd if=/dev/zero of=a.txt bs=128M count=1
1+0 records in
1+0 records out
134217728 bytes (134 MB) copied, 1.82081 s, 73.7 MB/s

$ hdfs dfs -put a.txt /test1

$ hdfs dfs -count -q -h /test1
        none             inf         768.0 M          84.0 M            1            2              228 M /test1

# At this point uploading even a 1-byte file fails: the quota would be exceeded
$ hdfs dfs -put test.txt /test1
20/03/05 11:57:28 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /test1 is exceeded: quota = 805309440 B = 768.00 MB but diskspace consumed = 1119879168 B = 1.04 GB
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyDiskspaceQuota(DirectoryWithQuotaFeature.java:144)


Why does a 1-byte upload fail when 84 MB of quota is apparently still free?
Because space-quota accounting works in whole blocks. When HDFS starts writing a file it reserves a full block times the replication factor against the quota (here the default 128 MB blocksize x 3), so the check is effectively:
    space already consumed + blocksize x replication <= space quota
If the left-hand side exceeds the quota, the write is rejected.
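Plugging this example's numbers into that check reproduces the figure from the exception: 228 MB already stored plus one new 128 MB block, each replicated 3 times (a quick check with shell arithmetic):

$ echo $(( (228 + 128) * 3 * 1024 * 1024 ))   # consumed + reserved block, times 3 replicas, in bytes
1119879168
$ echo $(( 805309440 / 1024 / 1024 ))         # the space quota, in MB
768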

7. Clear quotas

$ hdfs dfsadmin -clrQuota /test1
$ hdfs dfsadmin -clrSpaceQuota /test1

Checking the file system with fsck

$ hdfs fsck /
Connecting to namenode via http://node1:50070
FSCK started by hdfs (auth:SIMPLE) from /10.0.0.31 for path / at Thu Mar 05 13:24:10 CST 2020
..............Status: HEALTHY
 Total size:    343938780 B                                        # total size of all files
 Total dirs:    13                                                 # number of directories
 Total files:    14                                                # number of files
 Total symlinks:        0                                          # number of symlinks
 Total blocks (validated):    10 (avg. block size 34393878 B)      # number of blocks validated
 Minimally replicated blocks:    10 (100.0 %)                      # blocks that meet the minimum replication
 Over-replicated blocks:    0 (0.0 %)                              # blocks with more replicas than configured
 Under-replicated blocks:    0 (0.0 %)                             # blocks with fewer replicas than configured
 Mis-replicated blocks:        0 (0.0 %)                           # blocks that violate the replica placement policy
 Default replication factor:    3                                  # default replication factor
 Average block replication:    2.5                                 # average number of replicas per block
 Corrupt blocks:        0                                          # number of corrupt blocks
 Missing replicas:        0 (0.0 %)                                # number of missing replicas
 Number of data-nodes:        3                                    # number of DataNodes
 Number of racks:        1                                         # number of racks
FSCK ended at Thu Mar 05 13:24:10 CST 2020 in 2 milliseconds

List files and their blocks (to locate corrupt blocks)

$ hdfs fsck  / -files -blocks
Connecting to namenode via http://node1:50070
FSCK started by hdfs (auth:SIMPLE) from /10.0.0.31 for path / at Thu Mar 05 13:39:31 CST 2020
/ <dir>
/data <dir>
/data/.ssh <dir>
/data/.ssh/authorized_keys 1176 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741829_1005 len=1176 repl=2

/data/.ssh/id_rsa 1679 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741830_1006 len=1679 repl=2

/data/.ssh/id_rsa.pub 392 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741831_1007 len=392 repl=2

/data/.ssh/known_hosts 531 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741832_1008 len=531 repl=2

/data/test.txt 13 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741827_1003 len=13 repl=2

/profile 2048 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741828_1004 len=2048 repl=3

/snap <dir>
/test <dir>
/test/batch.properties 0 bytes, 0 block(s):  OK

/test/conf.xml 0 bytes, 0 block(s):  OK

/test/test1 0 bytes, 0 block(s):  OK

/test/test2 0 bytes, 0 block(s):  OK

/test1 <dir>
/test1/a.txt 134217728 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741840_1016 len=134217728 repl=3

/test1/b.txt 104857600 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741839_1015 len=104857600 repl=3

/user <dir>
/user/hdfs <dir>
/user/hdfs/.Trash <dir>
/user/hdfs/.Trash/Current <dir>
/user/hdfs/.Trash/Current/data <dir>
/user/hdfs/.Trash/Current/data/a <dir>
/user/hdfs/.Trash/Current/test.txt 13 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741826_1002 len=13 repl=3

/user/hdfs/.Trash/Current/test1 <dir>
/user/hdfs/.Trash/Current/test1/b.txt 104857600 bytes, 1 block(s):  OK
0. BP-112428611-10.0.0.31-1583325452737:blk_1073741838_1014 len=104857600 repl=3

Status: HEALTHY
 Total size:    343938780 B
 Total dirs:    13
 Total files:    14
 Total symlinks:        0
 Total blocks (validated):    10 (avg. block size 34393878 B)
 Minimally replicated blocks:    10 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.5
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        3
 Number of racks:        1
FSCK ended at Thu Mar 05 13:39:31 CST 2020 in 2 milliseconds

Delete corrupted files

$ hdfs fsck / -delete

Viewing the fsimage with oiv

The directory Hadoop uses for its runtime data (hadoop.tmp.dir) contains two very important kinds of files: the fsimage (namespace image) and the edit log.

Syntax: hdfs oiv -p <output format> -i <fsimage file> -o <output path>

$ hdfs oiv -p XML -i fsimage_0000000000000000258 -o ~/img258.xml

$ cat ~/img258.xml
<?xml version="1.0"?>
<fsimage><NameSection>
<genstampV1>1000</genstampV1><genstampV2>1017</genstampV2><genstampV1Limit>0</genstampV1Limit><lastAllocatedBlockId>1073741841</lastAllocatedBlockId><txid>258</txid></NameSection>
<INodeSection><lastInodeId>16426</lastInodeId><inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>1583378291307</mtime><permission>hdfs:supergroup:rwxr-xr-x</permission><nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode>
<inode><id>16388</id><type>FILE</type><name>test.txt</name><replication>3</replication><mtime>1583333292056</mtime><atime>1583333291816</atime><perferredBlockSize>134217728</perferredBlockSize><permission>hdfs:supergroup:rw-r--r--</permission><blocks><block><id>1073741826</id><genstamp>1002</genstamp><numBytes>13</numBytes></block>
</blocks>

Viewing the edit log with oev

Syntax: hdfs oev -p <output format> -i <edits file> -o <output path>

$ hdfs oev -p XML -i edits_0000000000000000001-0000000000000000001 -o ~/edit.xml 

$ cat ~/edit.xml
<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-60</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA>
      <TXID>1</TXID>
    </DATA>
  </RECORD>
</EDITS>

Managing an HA cluster with haadmin

Help output

# hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
    [-transitionToActive <serviceId> [--forceactive]]
    [-transitionToStandby <serviceId>]
    [-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
    [-getServiceState <serviceId>]
    [-checkHealth <serviceId>]
    [-help <command>]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

1. Transition a NameNode to active

[-transitionToActive <serviceId> [--forceactive]]

# hdfs getconf -confkey dfs.ha.namenodes.mycluster
nn1,nn2

# hdfs haadmin -transitionToActive nn1

2. Check a NameNode's state

# hdfs haadmin -getServiceState nn1
active

3. Make nn2 active

# hdfs haadmin -getServiceState nn1
active

# hdfs haadmin -transitionToActive nn2
transitionToActive: Node nn1 is already active
Usage: HAAdmin [-transitionToActive <serviceId> [--forceactive]]

# hdfs haadmin -transitionToActive --forceactive nn2

# hdfs haadmin -getServiceState nn2
active

# hdfs haadmin -getServiceState nn1
active

Now query the state of nn1 and nn2 again. Because there were two active NameNodes (a split-brain situation) and fencing is defined in the configuration, one of the NameNodes gets killed:

# hdfs haadmin -getServiceState nn1
20/03/06 21:40:28 INFO ipc.Client: Retrying connect to server: node1/10.0.0.31:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
Operation failed: Call From node1/10.0.0.31 to node1:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

# hdfs haadmin -getServiceState nn2
active

# ss -lntp
State       Recv-Q Send-Q         Local Address:Port                        Peer Address:Port              
LISTEN      0      128                        *:22                                     *:*                   users:(("sshd",pid=742,fd=3))
LISTEN      0      100                127.0.0.1:25                                     *:*                   users:(("master",pid=936,fd=13))
LISTEN      0      128                        *:8480                                   *:*                   users:(("java",pid=7167,fd=187))
LISTEN      0      128                        *:8485                                   *:*                   users:(("java",pid=7167,fd=199))
LISTEN      0      128                       :::22                                    :::*                   users:(("sshd",pid=742,fd=4))
LISTEN      0      100                      ::1:25                                    :::*                   users:(("master",pid=936,fd=14))

Manual failover (dfs.ha.automatic-failover.enabled must not be true; the failover subcommand cannot be used while automatic failover is enabled)

# hdfs haadmin -getServiceState nn2
standby

# hdfs haadmin -getServiceState nn1
standby

# hdfs haadmin -transitionToActive --forceactive nn1

# hdfs haadmin -getServiceState nn1
active

# hdfs haadmin -failover -forcefence -forceactive -forcemanual nn1 nn2
You have specified the forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.

It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.

You may abort safely by answering 'n' or hitting ^C now.

Are you sure you want to continue? (Y or N) Y
20/03/06 21:59:51 INFO ha.NodeFencer: ====== Beginning Service Fencing Process... ======
20/03/06 21:59:51 INFO ha.NodeFencer: Trying method 1/2: org.apache.hadoop.ha.SshFenceByTcpPort(null)
20/03/06 21:59:52 INFO ha.SshFenceByTcpPort: Connecting to node1...
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Connecting to node1 port 22
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Connection established
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Remote version string: SSH-2.0-OpenSSH_7.4
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Local version string: SSH-2.0-JSCH-0.1.42
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: kex: server->client aes128-ctr hmac-sha1 none
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: kex: client->server aes128-ctr hmac-sha1 none
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
20/03/06 21:59:52 WARN SshFenceByTcpPort.jsch: Permanently added 'node1' (RSA) to the list of known hosts.
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Authentications that can continue: gssapi-with-mic,publickey,keyboard-interactive,password
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Next authentication method: gssapi-with-mic
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Authentications that can continue: publickey,keyboard-interactive,password
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Next authentication method: publickey
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Authentications that can continue: password
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Next authentication method: password
20/03/06 21:59:52 INFO SshFenceByTcpPort.jsch: Disconnecting from node1 port 22
20/03/06 21:59:52 WARN ha.SshFenceByTcpPort: Unable to connect to node1 as user root
com.jcraft.jsch.JSchException: Auth fail
    at com.jcraft.jsch.Session.connect(Session.java:452)
    at org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
    at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
    at org.apache.hadoop.ha.FailoverController.failover(FailoverController.java:216)
    at org.apache.hadoop.ha.HAAdmin.failover(HAAdmin.java:295)
    at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:455)
    at org.apache.hadoop.hdfs.tools.DFSHAAdmin.runCmd(DFSHAAdmin.java:120)
    at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:384)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.hdfs.tools.DFSHAAdmin.main(DFSHAAdmin.java:132)
20/03/06 21:59:52 WARN ha.NodeFencer: Fencing method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
20/03/06 21:59:52 INFO ha.NodeFencer: Trying method 2/2: org.apache.hadoop.ha.ShellCommandFencer(/bin/true)
20/03/06 21:59:52 INFO ha.ShellCommandFencer: Launched fencing command '/bin/true' with pid 15701
20/03/06 21:59:52 INFO ha.NodeFencer: ====== Fencing successful by method org.apache.hadoop.ha.ShellCommandFencer(/bin/true) ======
Failover from nn1 to nn2 successful

# hdfs haadmin -getServiceState nn1
standby

# hdfs haadmin -getServiceState nn2
active

 
