Cloudera Certified Associate Administrator Case Studies: Manage


                                      Author: 尹正杰

Copyright notice: This is an original work. Reproduction is not permitted; violators will be held legally liable.

 

 

 

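I. Download the NameNode image file

Problem description:
  The NameNode of your company's cluster failed today, and you want to troubleshoot it by analyzing the fsimage file. Download the latest fsimage file, rename it to "timestamp_xxx", where xxx is the Unix timestamp in seconds at the moment you perform the operation, and upload it to the /yinzhengjie/debug/hdfs/log/ directory on HDFS.

Solution:
  This task involves the dfsadmin and dfs subcommands of the hdfs command, plus basic Linux commands. Ideally you should be able to do this without consulting the official documentation, or at most after a quick look at each command's help output.

1>. Download the image file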

[root@node101.yinzhengjie.org.cn ~]# ll
total 0
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# hdfs dfsadmin -fetchImage ./            #Make sure the HDFS cluster is running normally, otherwise the download will fail
19/06/15 15:27:57 INFO namenode.TransferFsImage: Opening connection to http://node101.yinzhengjie.org.cn:50070/imagetransfer?getimage=1&txid=latest
19/06/15 15:27:57 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
19/06/15 15:27:57 INFO namenode.TransferFsImage: Transfer took 0.02s at 3263.16 KB/s
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 fsimage_0000000000000004578
[root@node101.yinzhengjie.org.cn ~]# 

2>. Rename the image file

[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 fsimage_0000000000000004578
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# mv fsimage_0000000000000004578 timestamp_`date +%s`
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 timestamp_1560583829
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# 

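3>. If the target directory does not exist, create the path on HDFS manually

[root@node101.yinzhengjie.org.cn ~]# su hdfs        #HDFS uses simple authentication by default, so switch to the hdfs user; otherwise a "Permission denied" exception is thrown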
[hdfs@node101.yinzhengjie.org.cn /root]$ 
[hdfs@node101.yinzhengjie.org.cn /root]$ hdfs dfs -mkdir -p /yinzhengjie/debug/hdfs/log
[hdfs@node101.yinzhengjie.org.cn /root]$  
[hdfs@node101.yinzhengjie.org.cn /root]$ hdfs dfs -chmod 777 /yinzhengjie/debug/hdfs/log/
[hdfs@node101.yinzhengjie.org.cn /root]$ 
[hdfs@node101.yinzhengjie.org.cn /root]$ exit
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# 

4>. Upload the renamed image file to HDFS

[root@node101.yinzhengjie.org.cn ~]# ll
total 64
-rw-r--r-- 1 root root 64384 Jun 15 15:27 timestamp_1560583829
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -copyFromLocal timestamp_1560583829 /yinzhengjie/debug/hdfs/log/
[root@node101.yinzhengjie.org.cn ~]# 
[root@node101.yinzhengjie.org.cn ~]# hdfs dfs -ls /yinzhengjie/debug/hdfs/log/
Found 1 items
-rw-r--r--   3 root supergroup      64384 2019-06-15 15:35 /yinzhengjie/debug/hdfs/log/timestamp_1560583829
[root@node101.yinzhengjie.org.cn ~]# 
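
The four steps above can be chained into a single script. What follows is only a minimal sketch under stated assumptions, not the procedure used in this post: the names DRY_RUN, TARGET_DIR, WORK_DIR and the run helper are hypothetical, and by default the script only prints the hdfs commands. Set DRY_RUN=0 on a host with a working HDFS client to actually execute them.

```shell
# Sketch: fetch the latest fsimage, rename it with a Unix timestamp, upload it.
# DRY_RUN, TARGET_DIR, WORK_DIR and run() are illustrative names, not part of HDFS.
DRY_RUN="${DRY_RUN:-1}"
TARGET_DIR="/yinzhengjie/debug/hdfs/log"
WORK_DIR="$(mktemp -d)"

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Seconds since the Unix epoch at the time of the operation.
NEW_NAME="timestamp_$(date +%s)"

run hdfs dfsadmin -fetchImage "$WORK_DIR"

# fetchImage stores the file as fsimage_<txid>; the empty temp dir keeps the glob unique.
if [ "$DRY_RUN" = "0" ]; then
    mv "$WORK_DIR"/fsimage_* "$WORK_DIR/$NEW_NAME"
fi

# Creating the directory may require the hdfs superuser, as in step 3 above.
run hdfs dfs -mkdir -p "$TARGET_DIR"
run hdfs dfs -copyFromLocal "$WORK_DIR/$NEW_NAME" "$TARGET_DIR/"
```

Computing the name once and reusing it avoids the timestamp changing between the rename and the upload.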

 

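II. Manually balance DataNode data

Problem description:
  Your company's cluster has just been expanded with a batch of new worker nodes, but the new nodes hold no data, so data is unevenly distributed across the cluster.
  You know that the HDFS balancer can solve this. Limit the bandwidth used by the balancer to within 1G, and start the balancer with a threshold of 5.

Solution:
  If an interviewer asks you this, it is basically a free point: we only need to run the balancer.

1>. Click "HDFS"

2>. Click Configuration and search for the keyword "dfs.datanode.balance.bandwidth"

3>. Set the maximum bandwidth each DataNode can use for balancing to 1GB

4>. Search for the keyword "重新平衡阈值" (the rebalancing threshold; in an English UI, search for "Threshold")

5>. Set the rebalancing threshold to 5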
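
The same two settings can also be applied from the command line instead of the web UI steps above. The sketch below is an assumption about how one might do that, not the procedure shown in this post: it takes 1 GB to mean 1024^3 bytes per second, and it only prints the `hdfs dfsadmin -setBalancerBandwidth` and `hdfs balancer -threshold` commands rather than running them. The variable names are hypothetical.

```shell
# 1 GB per second, assumed here to mean 1024^3 bytes.
BANDWIDTH_BYTES=$((1024 * 1024 * 1024))
# Allowed deviation of each DataNode from the cluster's average utilization, in percent.
THRESHOLD=5

SET_BW_CMD="hdfs dfsadmin -setBalancerBandwidth ${BANDWIDTH_BYTES}"
BALANCE_CMD="hdfs balancer -threshold ${THRESHOLD}"

# Printed only; run them as the hdfs user on a live cluster.
echo "$SET_BW_CMD"
echo "$BALANCE_CMD"
```

Note that a bandwidth set with -setBalancerBandwidth is not persistent: it applies until the DataNodes restart, after which the configured value takes effect again.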

 

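III. Reduce the HDFS replication factor (change files from 3 replicas to 2)

Problem description:
  You find that more than 80% of the total capacity of your company's HDFS cluster is in use, with the default of three replicas. You want to change the replication factor of the large files in one directory from 3 to 2 to free up some capacity.

Solution:
  If an interviewer asks you a question like this, congratulations: it is another free point.

1>. Upload files to the HDFS cluster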

[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ ll /yinzhengjie/softwares/jdk1.8.0_201/
total 25980
drwxr-xr-x 2 10 143     4096 Dec 16 03:45 bin
-r--r--r-- 1 10 143     3244 Dec 16 03:45 COPYRIGHT
drwxr-xr-x 3 10 143      132 Dec 16 03:45 include
-rw-r--r-- 1 10 143  5207434 Dec 12  2018 javafx-src.zip
drwxr-xr-x 5 10 143      185 Dec 16 03:45 jre
drwxr-xr-x 5 10 143      245 Dec 16 03:45 lib
-r--r--r-- 1 10 143       40 Dec 16 03:45 LICENSE
drwxr-xr-x 4 10 143       47 Dec 16 03:45 man
-r--r--r-- 1 10 143      159 Dec 16 03:45 README.html
-rw-r--r-- 1 10 143      424 Dec 16 03:45 release
-rw-r--r-- 1 10 143 21103945 Dec 16 03:45 src.zip
-rw-r--r-- 1 10 143   108109 Dec 12  2018 THIRDPARTYLICENSEREADME-JAVAFX.txt
-r--r--r-- 1 10 143   155002 Dec 16 03:45 THIRDPARTYLICENSEREADME.txt
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir /yinzhengjie/data
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -put /yinzhengjie/softwares/jdk1.8.0_201/* /yinzhengjie/data/
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   3 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   3 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   3 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   3 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   3 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   3 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   3 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   3 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:20:48 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
...................................Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:    1635
 Total symlinks:        0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0        #Clearly, the files under this directory currently have 3 replicas
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        4
 Number of racks:        1
FSCK ended at Sat Jun 15 18:20:48 CST 2019 in 78 milliseconds


The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$ 

2>. Change the replication factor of the files in an HDFS directory to 2

[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -setrep 2 -R -w /yinzhengjie/data/
......
Replication 2 set: /yinzhengjie/data/man/man1/javadoc.1
Replication 2 set: /yinzhengjie/data/man/man1/javafxpackager.1
Replication 2 set: /yinzhengjie/data/man/man1/javah.1
Replication 2 set: /yinzhengjie/data/man/man1/javap.1
Replication 2 set: /yinzhengjie/data/man/man1/javapackager.1
Replication 2 set: /yinzhengjie/data/man/man1/javaws.1
Replication 2 set: /yinzhengjie/data/man/man1/jcmd.1
Replication 2 set: /yinzhengjie/data/man/man1/jconsole.1
Replication 2 set: /yinzhengjie/data/man/man1/jdb.1
Replication 2 set: /yinzhengjie/data/man/man1/jdeps.1
Replication 2 set: /yinzhengjie/data/man/man1/jhat.1
Replication 2 set: /yinzhengjie/data/man/man1/jinfo.1
Replication 2 set: /yinzhengjie/data/man/man1/jjs.1
Replication 2 set: /yinzhengjie/data/man/man1/jmap.1
Replication 2 set: /yinzhengjie/data/man/man1/jmc.1
Replication 2 set: /yinzhengjie/data/man/man1/jps.1
Replication 2 set: /yinzhengjie/data/man/man1/jrunscript.1
Replication 2 set: /yinzhengjie/data/man/man1/jsadebugd.1
Replication 2 set: /yinzhengjie/data/man/man1/jstack.1
Replication 2 set: /yinzhengjie/data/man/man1/jstat.1
Replication 2 set: /yinzhengjie/data/man/man1/jstatd.1
Replication 2 set: /yinzhengjie/data/man/man1/jvisualvm.1
Replication 2 set: /yinzhengjie/data/man/man1/keytool.1
Replication 2 set: /yinzhengjie/data/man/man1/native2ascii.1
Replication 2 set: /yinzhengjie/data/man/man1/orbd.1
Replication 2 set: /yinzhengjie/data/man/man1/pack200.1
Replication 2 set: /yinzhengjie/data/man/man1/policytool.1
Replication 2 set: /yinzhengjie/data/man/man1/rmic.1
Replication 2 set: /yinzhengjie/data/man/man1/rmid.1
Replication 2 set: /yinzhengjie/data/man/man1/rmiregistry.1
Replication 2 set: /yinzhengjie/data/man/man1/schemagen.1
Replication 2 set: /yinzhengjie/data/man/man1/serialver.1
Replication 2 set: /yinzhengjie/data/man/man1/servertool.1
Replication 2 set: /yinzhengjie/data/man/man1/tnameserv.1
Replication 2 set: /yinzhengjie/data/man/man1/unpack200.1
Replication 2 set: /yinzhengjie/data/man/man1/wsgen.1
Replication 2 set: /yinzhengjie/data/man/man1/wsimport.1
Replication 2 set: /yinzhengjie/data/man/man1/xjc.1
Replication 2 set: /yinzhengjie/data/release
Replication 2 set: /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   2 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   2 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   2 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   2 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   2 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   2 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   2 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   2 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:24:03 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
...................................Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:    1635
 Total symlinks:        0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.0      #The files under this path now have 2 replicas
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        4
 Number of racks:        1
FSCK ended at Sat Jun 15 18:24:03 CST 2019 in 32 milliseconds


The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ 
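
To see how much raw capacity the change frees, multiply the "Total size" reported by fsck by the replication factor before and after. This is a rough back-of-the-envelope sketch with hypothetical variable names; it assumes raw usage is simply logical size times replication and ignores any per-block overhead.

```shell
# "Total size" in bytes, taken from the fsck output above.
TOTAL_SIZE=397764951

BEFORE=$((TOTAL_SIZE * 3))   # raw bytes stored with 3 replicas
AFTER=$((TOTAL_SIZE * 2))    # raw bytes stored with 2 replicas
SAVED=$((BEFORE - AFTER))    # one full copy of the data

echo "raw usage at replication 3: ${BEFORE} B"
echo "raw usage at replication 2: ${AFTER} B"
echo "capacity freed:             ${SAVED} B"
```

Dropping from 3 replicas to 2 frees one third of the raw space the directory occupied. For reference, the documented usage of the command is `hdfs dfs -setrep [-R] [-w] <numReplicas> <path>`, with the options placed before the replica count.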

  

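IV. Increase the HDFS replication factor (change files from 2 replicas to 3)

Problem description:
  During a routine check of the cluster, you find that a few important files have only two replicas, even though the cluster's default replication parameter is 3 and has never been changed. Fix the insufficient replication of the files under the "/yinzhengjie/data/" directory.

Solution:
  You should be fluent in the basic usage of the HDFS commands. If an interview tests HDFS commands, it is almost always a free point.

1>. Change the replication factor of all files under the directory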

[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   2 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   2 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   2 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   2 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   2 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   2 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   2 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   2 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:24:03 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
...................................Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:    1635
 Total symlinks:        0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.0      #The current replication factor is 2
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        4
 Number of racks:        1
FSCK ended at Sat Jun 15 18:24:03 CST 2019 in 32 milliseconds


The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -setrep 3  /yinzhengjie/data/
......
Replication 3 set: /yinzhengjie/data/man/man1/javap.1
Replication 3 set: /yinzhengjie/data/man/man1/javapackager.1
Replication 3 set: /yinzhengjie/data/man/man1/javaws.1
Replication 3 set: /yinzhengjie/data/man/man1/jcmd.1
Replication 3 set: /yinzhengjie/data/man/man1/jconsole.1
Replication 3 set: /yinzhengjie/data/man/man1/jdb.1
Replication 3 set: /yinzhengjie/data/man/man1/jdeps.1
Replication 3 set: /yinzhengjie/data/man/man1/jhat.1
Replication 3 set: /yinzhengjie/data/man/man1/jinfo.1
Replication 3 set: /yinzhengjie/data/man/man1/jjs.1
Replication 3 set: /yinzhengjie/data/man/man1/jmap.1
Replication 3 set: /yinzhengjie/data/man/man1/jmc.1
Replication 3 set: /yinzhengjie/data/man/man1/jps.1
Replication 3 set: /yinzhengjie/data/man/man1/jrunscript.1
Replication 3 set: /yinzhengjie/data/man/man1/jsadebugd.1
Replication 3 set: /yinzhengjie/data/man/man1/jstack.1
Replication 3 set: /yinzhengjie/data/man/man1/jstat.1
Replication 3 set: /yinzhengjie/data/man/man1/jstatd.1
Replication 3 set: /yinzhengjie/data/man/man1/jvisualvm.1
Replication 3 set: /yinzhengjie/data/man/man1/keytool.1
Replication 3 set: /yinzhengjie/data/man/man1/native2ascii.1
Replication 3 set: /yinzhengjie/data/man/man1/orbd.1
Replication 3 set: /yinzhengjie/data/man/man1/pack200.1
Replication 3 set: /yinzhengjie/data/man/man1/policytool.1
Replication 3 set: /yinzhengjie/data/man/man1/rmic.1
Replication 3 set: /yinzhengjie/data/man/man1/rmid.1
Replication 3 set: /yinzhengjie/data/man/man1/rmiregistry.1
Replication 3 set: /yinzhengjie/data/man/man1/schemagen.1
Replication 3 set: /yinzhengjie/data/man/man1/serialver.1
Replication 3 set: /yinzhengjie/data/man/man1/servertool.1
Replication 3 set: /yinzhengjie/data/man/man1/tnameserv.1
Replication 3 set: /yinzhengjie/data/man/man1/unpack200.1
Replication 3 set: /yinzhengjie/data/man/man1/wsgen.1
Replication 3 set: /yinzhengjie/data/man/man1/wsimport.1
Replication 3 set: /yinzhengjie/data/man/man1/xjc.1
Replication 3 set: /yinzhengjie/data/release
Replication 3 set: /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$

2>. Verify that the replication factor was changed successfully

[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
Found 13 items
-rw-r--r--   3 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
-rw-r--r--   3 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
-rw-r--r--   3 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
-rw-r--r--   3 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
-rw-r--r--   3 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
-rw-r--r--   3 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
-rw-r--r--   3 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
-rw-r--r--   3 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
[hdfs@node101.yinzhengjie.org.cn ~]$
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:37:24 CST 2019
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
...................................Status: HEALTHY
 Total size:    397764951 B
 Total dirs:    205
 Total files:    1635
 Total symlinks:        0
 Total blocks (validated):    1614 (avg. block size 246446 B)
 Minimally replicated blocks:    1614 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0      #The files under this path now have 3 replicas
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        4
 Number of racks:        1
FSCK ended at Sat Jun 15 18:37:24 CST 2019 in 17 milliseconds


The filesystem under path '/yinzhengjie/data' is HEALTHY
[hdfs@node101.yinzhengjie.org.cn ~]$ 
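
Reading the fsck report by eye works for a one-off check, but the two relevant fields can also be extracted with awk. The following is a small sketch, not part of the original procedure: the sample lines are copied from the fsck output above, and the variable names are hypothetical. On a live cluster, the here-document would be replaced by the output of `hdfs fsck /yinzhengjie/data/`.

```shell
# Sample lines copied from the fsck report above.
FSCK_OUTPUT=$(cat <<'EOF'
 Under-replicated blocks:    0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
EOF
)

# Take the text after the colon and strip all whitespace from it.
AVG=$(printf '%s\n' "$FSCK_OUTPUT" |
    awk -F: '/Average block replication/ {gsub(/[ \t]/, "", $2); print $2}')

# Take the first whitespace-separated token after the colon (the block count).
UNDER=$(printf '%s\n' "$FSCK_OUTPUT" |
    awk -F: '/Under-replicated blocks/ {split($2, f, " "); print f[1]}')

echo "average replication: ${AVG}, under-replicated blocks: ${UNDER}"
```

After raising the replication factor, a non-zero under-replicated count would suggest that the NameNode is still scheduling the extra copies.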

 

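V. Copy an HDFS file to another directory with a specified block size

Problem description:
  You notice that some large files in the cluster have a block size of 64MB, so MapReduce jobs that read them spawn a large number of map tasks by default, which wastes resources.
  You decide to back these files up to another directory with a 128MB block size. Back up the files under "/yinzhengjie/data/input" to "/yinzhengjie/data/output" with a 128MB block size.

Solution:
  This question mainly tests your understanding of HDFS. Besides the cluster-wide default, the block size can be set individually for each file, but once a file is written its block size cannot be changed; the only option is to make a new copy.

1>. Copy an HDFS file to another directory with a 64MB block size (this prepares the input data)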

[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir -p /yinzhengjie/data/input
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/debug/hdfs/log
Found 1 items
-rw-r--r--   3 root supergroup      64384 2019-06-15 16:37 /yinzhengjie/debug/hdfs/log/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -Ddfs.block.size=67108864 -cp /yinzhengjie/debug/hdfs/log/timestamp_1560583829 /yinzhengjie/data/input
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/data/input
Found 1 items
-rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:44 /yinzhengjie/data/input/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$ 

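2>. Confirm the cluster's default block size (in this cluster the default is already 256MB, so the block size must be specified explicitly when making the backup; if the default were 128MB, the parameter could be omitted)

3>. Create the backup directory and copy the data into it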

[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir /yinzhengjie/data/output
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -Ddfs.block.size=134217728 -cp /yinzhengjie/data/input/timestamp_1560583829  /yinzhengjie/data/output
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls  /yinzhengjie/data/input
Found 1 items
-rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:44 /yinzhengjie/data/input/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls  /yinzhengjie/data/output
Found 1 items
-rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:59 /yinzhengjie/data/output/timestamp_1560583829
[hdfs@node101.yinzhengjie.org.cn ~]$ 
[hdfs@node101.yinzhengjie.org.cn ~]$ 
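
The byte values passed to -Ddfs.block.size above, and the effect of block size on the number of blocks, can be checked with shell arithmetic. This is a minimal sketch with hypothetical names; the 1 GB file size is an assumed example, not a file from this post. Note that the sample file copied above is only 64384 bytes, so it occupies a single block at either setting.

```shell
MB=$((1024 * 1024))
BLOCK_64=$((64 * MB))      # value used with -Ddfs.block.size in step 1
BLOCK_128=$((128 * MB))    # value used with -Ddfs.block.size in step 3
FILE_SIZE=$((1024 * MB))   # hypothetical 1 GB input file

# Ceiling division: number of blocks a file of size $1 needs at block size $2.
blocks() { echo $(( ($1 + $2 - 1) / $2 )); }

BLOCKS_64=$(blocks "$FILE_SIZE" "$BLOCK_64")
BLOCKS_128=$(blocks "$FILE_SIZE" "$BLOCK_128")
echo "64MB blocks: ${BLOCKS_64}, 128MB blocks: ${BLOCKS_128}"

# On a live cluster, the block size recorded for the copy can be checked with
# the stat format %o (printed here, not executed):
echo 'hdfs dfs -stat "%o %n" /yinzhengjie/data/output/timestamp_1560583829'
```

Since a MapReduce job typically gets one input split per block by default, halving the block count roughly halves the number of map tasks. In current Hadoop releases the property is named dfs.blocksize, and dfs.block.size is kept as a deprecated alias.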

 

posted @ 2019-06-10 05:39  尹正杰