Handling HDFS missing blocks
Scenario: the HDFS web UI reports missing blocks.
1. The hdfs fsck command
$ hdfs fsck
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]]
    <path>                   start checking from this path (defaults to the root directory / if omitted)
    -move                    move corrupted files to /lost+found
    -delete                  delete corrupted files
    -files                   print out files being checked
    -openforwrite            print out files opened for write
    -includeSnapshots        include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
    -list-corruptfileblocks  print out list of missing blocks and files they belong to
    -blocks                  print out block report
    -locations               print out locations for every block
    -racks                   print out network topology for data-node locations
    -storagepolicies         print out storage policy summary for the blocks
    -blockId                 print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)
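For example, the -blockId option maps a block id back to its file and locations. A minimal sketch, using the block id diagnosed later in this article (note that -blockId takes the block name without the generation-stamp suffix):

# Show which file this block belongs to and where its replicas live
hdfs fsck -blockId blk_1073742040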
2. Check the HDFS state with commands
# Check the state of the file system
hdfs fsck /
# The last line of the output reads:
The filesystem under path '/' is CORRUPT
# List the files with corrupt blocks
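To see just the verdict and the corrupt/missing counters without scrolling through the full report, you can filter the output; a minimal sketch, assuming the standard fsck summary format:

# Print only the summary fields of interest
hdfs fsck / | egrep 'CORRUPT|HEALTHY|Missing blocks|Corrupt blocks'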
hdfs fsck -list-corruptfileblocks
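Each output line pairs a block id with the file it belongs to, which is why the awk '{print $2}' step in section 4 extracts the second field. A hypothetical sample (path taken from the example below):

blk_1073742040  /user/hive/warehouse/warehouse/tmp_bi_test_cluster2.db/tab1/000000_0
The filesystem under path '/' has 1 CORRUPT files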
3. Inspect a specific corrupt block
# Pick one corrupt file and take a closer look
hdfs fsck -locations -blocks -files /user/hive/warehouse/warehouse/tmp_bi_test_cluster2.db/tab1/000000_0
hdfs dfs -cat /user/hive/warehouse/warehouse/tmp_bi_test_cluster2.db/tab1/000000_0
This locates the block file: BP-835567911-172.17.26.204-1663860618746:blk_1073742040_1216
# Check on the DataNode host whether this block file still exists
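Before deciding how to handle the file, it can be worth cross-checking its metadata with the standard hdfs dfs -stat format specifiers (%b size in bytes, %r replication factor, %y modification time):

# Show size, replication factor, and modification time of the affected file
hdfs dfs -stat 'size=%b repl=%r mtime=%y' /user/hive/warehouse/warehouse/tmp_bi_test_cluster2.db/tab1/000000_0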
# Go to the block storage directory
[hdfs@hadoop-204 subdir0]$ pwd
/data/hadoop/hdfs/dn/current/BP-835567911-172.17.26.204-1663860618746/current/finalized/subdir0
# The smallest block id in subdir0 is blk_1073766400, while our corrupt block is blk_1073742040, so that block file is already gone
[hdfs@hadoop-204 subdir0]$ ll subdir0 |head
total 8868
-rw------- 1 hdfs hdfs 117033 Mar 15 12:27 blk_1073766400
-rw------- 1 hdfs hdfs    923 Mar 15 12:27 blk_1073766400_25606.meta
-rw------- 1 hdfs hdfs 117031 Mar 15 13:27 blk_1073766422
-rw------- 1 hdfs hdfs    923 Mar 15 13:27 blk_1073766422_25628.meta
-rw------- 1 hdfs hdfs 117033 Mar 15 13:27 blk_1073766428
-rw------- 1 hdfs hdfs    923 Mar 15 13:27 blk_1073766428_25634.meta
-rw------- 1 hdfs hdfs 115465 Mar 15 13:33 blk_1073766437
-rw------- 1 hdfs hdfs    911 Mar 15 13:33 blk_1073766437_25643.meta
-rw------- 1 hdfs hdfs 115466 Mar 15 13:33 blk_1073766438
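If you are not sure which subdir a block landed in, you can search the whole block pool on each DataNode; a minimal sketch, assuming the data directory from the pwd output above:

# Search every subdir of this block pool for the missing block file
find /data/hadoop/hdfs/dn/current/BP-835567911-172.17.26.204-1663860618746/current/finalized -name 'blk_1073742040*'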
4. Handle the missing blocks
# First collect the paths of the corrupt files
# Dump the list to a file
[hdfs@hadoop-204 ~]$ hdfs fsck -list-corruptfileblocks >>./fsck.log 2>&1
# Strip the header and footer lines that are not block paths
[hdfs@hadoop-204 ~]$ vim fsck.log
# Keep only the file paths
[hdfs@hadoop-204 ~]$ cat fsck.log |awk '{print $2}' >>currupt.txt
# Check that the line count matches what hdfs fsck -list-corruptfileblocks reported, and review the file contents
[hdfs@hadoop-204 ~]$ wc -l currupt.txt
[hdfs@hadoop-204 ~]$ vim currupt.txt
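A quick sanity check before deleting anything, assuming every valid entry is an absolute HDFS path:

# Count lines that do NOT start with '/'; should print 0 if the cleanup was complete
[hdfs@hadoop-204 ~]$ grep -cv '^/' currupt.txt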
# Delete the corrupt files with the command below; if there are many blocks, write a script. Be careful when doing this in production.
# The command
hdfs fsck -delete /user/hive/warehouse/s8/000000_0
# A simple script; again, be careful in production
[hdfs@hadoop-204 ~]$ cat del_currupt.sh
#!/bin/bash
# Delete every corrupt file listed in currupt.txt, appending all output to a log
for file in `cat currupt.txt`
do
  hdfs fsck -delete ${file} >>del_currupt.log 2>&1
done
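An equivalent one-liner without a script, using xargs; like the loop above, it assumes the HDFS paths contain no whitespace:

[hdfs@hadoop-204 ~]$ xargs -I{} hdfs fsck -delete {} <currupt.txt >>del_currupt.log 2>&1

After the deletions finish, rerun hdfs fsck / and confirm the last line reports the filesystem as HEALTHY rather than CORRUPT.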