Handling HDFS missing blocks

Situation: the HDFS web UI is reporting missing blocks.

1. The hdfs fsck command

$ hdfs fsck
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]]
    # the directory to check; defaults to the root / if omitted
    <path>  start checking from this path
    # move corrupted files to /lost+found
    -move   move corrupted files to /lost+found
    # delete corrupted files outright
    -delete delete corrupted files
    # print every file being checked
    -files  print out files being checked
    # print files that are open for write during the check
    -openforwrite   print out files opened for write
    # also check data under snapshot directories
    -includeSnapshots   include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
    # print the corrupt blocks and the files they belong to
    -list-corruptfileblocks print out list of missing blocks and files they belong to
    # print the block report
    -blocks print out block report
    # print each block's locations, i.e. which nodes hold it
    -locations  print out locations for every block
    # print the rack each block is on
    -racks  print out network topology for data-node locations
    # print the storage policy summary for the blocks
    -storagepolicies    print out storage policy summary for the blocks
    # print which file the given blockId belongs to, its locations, and other diagnostics
    -blockId    print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)
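
As an example, a deeper scan of a single directory can combine several of these flags; a quick sketch (the path here is illustrative):

# list each file under the path, its blocks, and the nodes holding the replicas
hdfs fsck /user/hive/warehouse -files -blocks -locations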

2. Checking the HDFS status from the command line

# Check the filesystem

hdfs fsck /

# The last line of the output reads:
The filesystem under path '/' is CORRUPT
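
On a healthy cluster this last line reports HEALTHY instead. To pull out just the verdict and the block counters, a grep along these lines should do (the exact summary labels can vary by Hadoop version):

# show only the summary lines of interest
hdfs fsck / | grep -E 'Missing blocks|CORRUPT|HEALTHY'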

# List the corrupt files

hdfs fsck -list-corruptfileblocks

3. Inspecting a corrupt block in detail

# Pick one corrupt file and inspect it

hdfs fsck -locations -blocks -files  /user/hive/warehouse/warehouse/tmp_bi_test_cluster2.db/tab1/000000_0

# try reading the file; a missing block surfaces as a read error
hdfs dfs -cat /user/hive/warehouse/warehouse/tmp_bi_test_cluster2.db/tab1/000000_0

This locates the block: BP-835567911-172.17.26.204-1663860618746:blk_1073742040_1216
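
Since -blockId appears in the fsck usage above, the same block can also be queried directly by its ID; a sketch:

# ask which file this block belongs to and print its diagnostics
hdfs fsck -blockId blk_1073742040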

# Check on the DataNode host whether the block file still exists

# Go into the storage directory
[hdfs@hadoop-204 subdir0]$ pwd
/data/hadoop/hdfs/dn/current/BP-835567911-172.17.26.204-1663860618746/current/finalized/subdir0

# The smallest block ID under subdir0 is blk_1073766400, while our missing block is blk_1073742040, so its block file is already gone
[hdfs@hadoop-204 subdir0]$ ll subdir0 |head 
total 8868
-rw------- 1 hdfs hdfs  117033 Mar 15 12:27 blk_1073766400
-rw------- 1 hdfs hdfs     923 Mar 15 12:27 blk_1073766400_25606.meta
-rw------- 1 hdfs hdfs  117031 Mar 15 13:27 blk_1073766422
-rw------- 1 hdfs hdfs     923 Mar 15 13:27 blk_1073766422_25628.meta
-rw------- 1 hdfs hdfs  117033 Mar 15 13:27 blk_1073766428
-rw------- 1 hdfs hdfs     923 Mar 15 13:27 blk_1073766428_25634.meta
-rw------- 1 hdfs hdfs  115465 Mar 15 13:33 blk_1073766437
-rw------- 1 hdfs hdfs     911 Mar 15 13:33 blk_1073766437_25643.meta
-rw------- 1 hdfs hdfs  115466 Mar 15 13:33 blk_1073766438
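
If the DataNode has more than one data directory, or the block might sit in a different subdir, searching the whole block pool is quicker than browsing; a sketch using the storage path from above:

# look for any file belonging to the missing block
find /data/hadoop/hdfs/dn/current/BP-835567911-172.17.26.204-1663860618746 -name 'blk_1073742040*'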

4. Cleaning up the missing blocks

# First collect the paths of the corrupt files

# Dump the list to a file
[hdfs@hadoop-204 ~]$ hdfs fsck -list-corruptfileblocks >>./fsck.log 2>&1

# Edit out the header/footer lines that are not block entries
[hdfs@hadoop-204 ~]$ vim fsck.log 

# Keep only the file path column
[hdfs@hadoop-204 ~]$ cat fsck.log |awk '{print $2}' >>currupt.txt

# Check that the line count matches what hdfs fsck -list-corruptfileblocks reported, and review the file contents
[hdfs@hadoop-204 ~]$ wc -l currupt.txt
[hdfs@hadoop-204 ~]$ vim currupt.txt
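
The manual vim step can also be skipped by filtering on the first column; this sketch assumes each data line starts with the block name followed by the path, which is what the awk '{print $2}' above already relies on:

# keep only lines whose first field is a block name, then print the path
awk '$1 ~ /^blk_/ {print $2}' fsck.log > currupt.txt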

# Delete the corrupt files with fsck -delete. With many files a script helps; be careful if this is a production cluster

# Command
hdfs fsck -delete /user/hive/warehouse/s8/000000_0
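
If the readable parts of a file are worth keeping, -move (shown in the usage above) moves the corrupt file to /lost+found instead of deleting it:

# salvage instead of delete
hdfs fsck -move /user/hive/warehouse/s8/000000_0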

# A simple script; again, be careful on a production cluster

[hdfs@hadoop-204 ~]$ cat del_currupt.sh
#!/bin/bash
# delete every corrupt file listed in currupt.txt, logging fsck output
while read -r file; do
  hdfs fsck -delete "${file}" >> del_currupt.log 2>&1
done < currupt.txt
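
Once the script finishes, re-run the check from step 2 to confirm the cleanup worked:

# the last line should now report HEALTHY instead of CORRUPT
hdfs fsck /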
