Hadoop 3.x EC: common commands

I. EC principles

II. Common commands and what they do

1. List the currently supported EC policies
hdfs ec -listPolicies
2023-05-30 10:10:43,251 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED

  

In the listing above, only the RS-6-3-1024k policy is enabled, as shown by State=ENABLED at the end of its line.
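
The policy names follow a codec-data-parity-cellsize pattern, which the Schema and CellSize fields in the listing confirm. As a rough sketch, the Python snippet below splits a name into its parts and estimates how many raw bytes are stored per byte of user data. The names `EcPolicy`, `parse_policy` and `storage_overhead` are made up for illustration and are not part of Hadoop.

```python
from typing import NamedTuple


class EcPolicy(NamedTuple):
    codec: str
    data_units: int
    parity_units: int
    cell_size: int  # bytes


def parse_policy(name: str) -> EcPolicy:
    """Split a name such as 'RS-6-3-1024k' into its components.

    Assumes the '<codec>-<data>-<parity>-<cell>k' naming seen in the
    listing; the codec part may itself contain a dash (RS-LEGACY).
    """
    codec, data, parity, cell = name.rsplit("-", 3)
    if not cell.endswith("k"):
        raise ValueError(f"unexpected cell size in {name!r}")
    return EcPolicy(codec.lower(), int(data), int(parity), int(cell[:-1]) * 1024)


def storage_overhead(policy: EcPolicy) -> float:
    """Raw bytes stored per byte of user data, assuming full stripes."""
    return (policy.data_units + policy.parity_units) / policy.data_units
```

For RS-6-3-1024k this gives 1.5, compared with 3.0 for three-way replication. Small files that do not fill a stripe cost more than this figure suggests.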

2. Check the EC policy of a directory or file (a new directory or file has no policy unless it inherits one from a parent directory)

hdfs ec -getPolicy -path /test.txt
2023-05-30 10:36:06,524 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /test.txt is unspecified

No policy has been set for this file yet, so it is reported as unspecified.

 

3. Set or change the EC policy on a directory or file

 

Disable a policy
hdfs ec -disablePolicy -policy RS-6-3-1024k
2023-05-30 11:37:28,365 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure coding policy RS-6-3-1024k is disabled
[root@worker1 ~]# hdfs ec -listPolicies
2023-05-30 11:37:37,981 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=DISABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=DISABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED

  

Enable a policy
[root@worker1 ~]# hdfs ec -enablePolicy -policy RS-3-2-1024k
2023-05-30 11:40:18,905 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure coding policy RS-3-2-1024k is enabled
[root@worker1 ~]# hdfs ec -listPolicies
2023-05-30 11:40:25,288 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=DISABLED
ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2], State=ENABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=DISABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED

  

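Setting a policy directly on a file throws an exception (RemoteException: Attempt to set an erasure coding policy for a file /test.txt); -setPolicy only accepts directories.

Setting an EC policy on a directory does not throw, but files that already exist under it are not converted to EC (Warning: setting erasure coding policy on a non-empty directory will not automatically convert existing files to RS-3-2-1024k erasure coding policy). Once the policy is set, files newly uploaded to the directory are encoded with that policy.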
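
The rule that a directory's policy applies to files written into it afterwards can be pictured as a nearest-ancestor lookup. The snippet below is only a toy model of that rule: `effective_policy` and the `dir_policies` mapping are hypothetical, and the model deliberately ignores files that already existed before the policy was set, since those keep their old layout.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def effective_policy(path: str, dir_policies: Dict[str, str]) -> Optional[str]:
    """Return the policy a NEW file at `path` would be written with.

    `dir_policies` maps a directory to the policy set on it with -setPolicy.
    The nearest ancestor directory that has a policy wins; None stands for
    "unspecified".
    """
    for parent in PurePosixPath(path).parents:
        policy = dir_policies.get(str(parent))
        if policy is not None:
            return policy
    return None


# Only /usr has a policy, as in the transcript that follows.
policies = {"/usr": "RS-3-2-1024k"}
print(effective_policy("/usr/test.txt", policies))
print(effective_policy("/test.txt", policies))
```

With only /usr configured, a new file under /usr resolves to RS-3-2-1024k while a new file directly under / resolves to None.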

[root@worker1 ~]# hdfs dfs -ls /
2023-05-30 11:42:13,816 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 root supergroup          6 2023-05-29 14:50 /test.txt
drwxr-xr-x   - root supergroup          0 2023-05-30 11:19 /usr
[root@worker1 ~]# hdfs ec -getPolicy -path /test.txt
2023-05-30 11:42:39,194 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /test.txt is unspecified
[root@worker1 ~]# hdfs ec -getPolicy -path /usr
2023-05-30 11:42:47,192 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /usr is unspecified
[root@worker1 ~]# hdfs ec -setPolicy -path /usr -policy RS-3-2-1024k
2023-05-30 11:43:43,737 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Set RS-3-2-1024k erasure coding policy on /usr
Warning: setting erasure coding policy on a non-empty directory will not automatically convert existing files to RS-3-2-1024k erasure coding policy
[root@worker1 ~]# hdfs ec -setPolicy -path /test.txt -policy RS-3-2-1024k
2023-05-30 11:44:13,565 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
RemoteException: Attempt to set an erasure coding policy for a file /test.txt
[root@worker1 ~]# hdfs ec -getPolicy -path /usr
2023-05-30 11:44:35,958 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
RS-3-2-1024k
[root@worker1 ~]# hdfs ec -getPolicy -path /test.txt
2023-05-30 11:44:46,720 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The erasure coding policy of /test.txt is unspecified
[root@worker1 ~]# hdfs dfs -cp /test.txt /usr
2023-05-30 11:45:24,300 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2023-05-30 11:45:25,427 WARN erasurecode.ErasureCodeNative: ISA-L support is not available in your platform... using builtin-java codec where applicable
2023-05-30 11:45:25,521 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 11:45:25,856 WARN hdfs.DFSOutputStream: Cannot allocate parity block(index=3, policy=RS-3-2-1024k). Not enough datanodes? Exclude nodes=[]
2023-05-30 11:45:25,856 WARN hdfs.DFSOutputStream: Cannot allocate parity block(index=4, policy=RS-3-2-1024k). Not enough datanodes? Exclude nodes=[]
2023-05-30 11:45:25,858 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 11:45:26,055 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2023-05-30 11:45:26,512 WARN hdfs.DFSOutputStream: Block group <1> failed to write 2 blocks. It's at high risk of losing data.
[root@worker1 ~]# hdfs ec -getPolicy -path /usr/test.txt
2023-05-30 11:45:38,334 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
RS-3-2-1024k
[root@worker1 ~]#
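
The `Cannot allocate parity block` and `failed to write 2 blocks` warnings in the transcript above are consistent with the cluster being too small for the policy. RS-3-2 writes 3 data blocks plus 2 parity blocks per block group and normally places each one on a different datanode, while the fsck output in section 4 reports only 3 datanodes. A minimal sketch of that arithmetic (`unplaced_blocks` is a hypothetical helper, not a Hadoop API):

```python
def unplaced_blocks(data_units: int, parity_units: int, datanodes: int) -> int:
    """Internal blocks per block group that cannot get a datanode of their own."""
    width = data_units + parity_units  # stripe width of the policy
    return max(0, width - datanodes)


# RS-3-2 on the 3-datanode cluster used in this post.
print(unplaced_blocks(data_units=3, parity_units=2, datanodes=3))
```

The result for this cluster is 2, the same number of blocks the warning says failed to write. With at least 5 datanodes the function returns 0.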

  

4. Inspect an EC-encoded file

hdfs fsck /usr/test.txt -files -blocks -locations
2023-05-30 11:53:31,626 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://master:50070/fsck?ugi=root&files=1&blocks=1&locations=1&path=%2Fusr%2Ftest.txt
FSCK started by root (auth:SIMPLE) from /172.16.20.239 for path /usr/test.txt at Tue May 30 11:53:32 CST 2023
/usr/test.txt 6 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s):  OK
0. BP-132737199-172.16.20.156-1685330616691:blk_-9223372036854775792_1003 len=6 Live_repl=3  
[blk_-9223372036854775792:DatanodeInfoWithStorage[172.16.20.239:9866,DS-909e37f9-2ba2-4c24-b777-367cd8c16c72,DISK],
blk_-9223372036854775789:DatanodeInfoWithStorage[172.16.20.193:9866,DS-e8d953b7-f21e-41f1-8fd3-a273cd3d49a1,DISK],
blk_-9223372036854775788:DatanodeInfoWithStorage[172.16.20.156:9866,DS-105ef4d2-e454-4acc-bbc7-1cc49ed7bfa5,DISK]
]

Status: HEALTHY
 Number of data-nodes: 3
 Number of racks: 1
 Total dirs: 0
 Total symlinks: 0

Replicated Blocks:
 Total size: 0 B
 Total files: 0
 Total blocks (validated): 0
 Minimally replicated blocks: 0
 Over-replicated blocks: 0
 Under-replicated blocks: 0
 Mis-replicated blocks: 0
 Default replication factor: 1
 Average block replication: 0.0
 Missing blocks: 0
 Corrupt blocks: 0
 Missing replicas: 0

Erasure Coded Block Groups:
 Total size: 6 B
 Total files: 1
 Total block groups (validated): 1 (avg. block group size 6 B)
 Minimally erasure-coded block groups: 1 (100.0 %)
 Over-erasure-coded block groups: 0 (0.0 %)
 Under-erasure-coded block groups: 0 (0.0 %)
 Unsatisfactory placement block groups: 0 (0.0 %)
 Average block group size: 3.0
 Missing block groups: 0
 Corrupt block groups: 0
 Missing internal blocks: 0 (0.0 %)
FSCK ended at Tue May 30 11:53:32 CST 2023 in 2 milliseconds

The filesystem under path '/usr/test.txt' is HEALTHY

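In the output above, the line 0. BP-132737199-172.16.20.156-1685330616691:blk_-9223372036854775792_1003 len=6 Live_repl=3 describes the block group: len=6 is the length of the file data and Live_repl=3 means three internal blocks are live. Of the three blocks listed after it, the first (blk_-9223372036854775792) is the actual data block and the other two are parity blocks.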
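
One way to check which of the listed blocks hold data and which hold parity is to look at the block IDs. The sketch below assumes the HDFS striped-block ID layout, in which the low 4 bits of an internal block ID give its index within the block group, and indices below the number of data units are data blocks. The function names are illustrative only.

```python
INDEX_MASK = 0xF  # assumption: low 4 bits of a striped block ID = index in group


def block_index(block_id: int) -> int:
    """Index of an internal block within its block group."""
    return block_id & INDEX_MASK


def block_role(block_id: int, data_units: int) -> str:
    """'data' for indices below data_units, otherwise 'parity'."""
    return "data" if block_index(block_id) < data_units else "parity"


# Block IDs copied from the fsck output above (policy RS-3-2-1024k).
for block_id in (-9223372036854775792, -9223372036854775789, -9223372036854775788):
    print(block_id, block_index(block_id), block_role(block_id, data_units=3))
```

Under that assumption the three IDs decode to indices 0, 3 and 4: one data block and the two parity blocks of RS-3-2.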

 

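Summary: with the RS-3-2-1024k policy, data is striped in cells of 1024k, i.e. 1 MB.
  1. If the file is smaller than 1 MB it is not split: it is stored in a single data block, and each parity block is the same size as that data block.
  2. If the file is large enough to split, it is cut into 1 MB cells that are spread evenly across the data blocks. A file larger than 3 MB still uses only 3 data blocks in total (within one block group), each holding several cells; a trailing piece smaller than 1 MB is stored as-is in one data block.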
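
The splitting rules in the summary above can be sketched as a small calculation. This is a simplified model that assumes the file fits within one block group and that cells are dealt round-robin to the data blocks; `internal_block_lengths` is a hypothetical helper, not a Hadoop function.

```python
from typing import List

CELL_SIZE = 1024 * 1024  # the 1024k cell of RS-3-2-1024k


def internal_block_lengths(file_size: int, data_units: int = 3,
                           parity_units: int = 2,
                           cell_size: int = CELL_SIZE) -> List[int]:
    """Bytes stored in each internal block (data blocks first, then parity)."""
    data = [0] * data_units
    full_cells, tail = divmod(file_size, cell_size)
    # Deal full cells round-robin across the data blocks.
    for cell in range(full_cells):
        data[cell % data_units] += cell_size
    # A trailing partial cell goes to the next data block in turn.
    if tail:
        data[full_cells % data_units] += tail
    # Each parity block is as long as the longest data block, which is block 0.
    return data + [data[0]] * parity_units


print(internal_block_lengths(6))                      # the 6-byte /usr/test.txt
print(internal_block_lengths(3 * CELL_SIZE + 512 * 1024))  # a 3.5 MB file
```

For the 6-byte /usr/test.txt the model gives one 6-byte data block, two empty data blocks and two 6-byte parity blocks, which would be consistent with the three live blocks that fsck lists. For a 3.5 MB file the first data block holds 1.5 MB and the other two hold 1 MB each.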

 

posted @ 2023-05-30 14:43  新际航