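Managing Hadoop Quotas

                                        Author: Yin Zhengjie (尹正杰)

Copyright notice: this is an original work. Reproduction is not permitted; violators will be held legally liable.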

 

 

 

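I. Overview of Hadoop quotas

  Quotas can be configured on HDFS directories to limit how much of the HDFS namespace and storage users or applications can consume.

  Space allocation in HDFS is not directly tied to space allocation on the underlying Linux file system.

  Hadoop supports two types of quota: name quotas and space quotas.
    Name quota:
      The maximum number of files and directories allowed in the tree rooted at the directory.
    Space quota:
      An upper bound on the number of bytes that files in the tree rooted at the directory may use.

  Tips:
    If a user's home directory is created without a name quota or a space quota, that user can fill HDFS up to the capacity of the cluster, which is rarely what you want.
    Name quotas and space quotas are attached to directories, not to users.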

 

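II. Managing name quotas

  A name quota limits the number of files and directories in a directory tree. If a user tries to create a file or directory beyond the quota, the operation fails.

  The following command shows the current quota information (at this point no quota has been configured in HDFS):
    [root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v  -h /user/root    # the "-q" option shows the name quota and space quota columns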
           QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
            none             inf            none             inf           15           18            374.0 M /user/root
    [root@hadoop101.yinzhengjie.com ~]# 

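    The columns mean the following:
      QUOTA:
        The name quota, i.e. the limit on the number of files and directories.
      REM_QUOTA:
        The number of files and directories that can still be created under this directory before the name quota is reached.
      SPACE_QUOTA:
        The space quota set on this directory.
      REM_SPACE_QUOTA:
        The space quota still available to this directory.
      DIR_COUNT:
        The number of directories, including the directory itself.
      FILE_COUNT:
        The number of files.
      CONTENT_SIZE:
        The total size of the files, not counting replicas.
      PATHNAME:
        The path name.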
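
As a rough illustration of how these columns line up, the sketch below labels the fields of one `hdfs dfs -count -q` output line with awk. The sample line is an assumption modeled on the transcript above (written in the form printed without -h, where sizes are plain byte counts), and `label_count_line` is a hypothetical helper name. It would not cope with path names that contain spaces.

```shell
# Label the eight columns of one `hdfs dfs -count -q` line (non -h form).
label_count_line() {
  echo "$1" | awk '{
    printf "QUOTA=%s REM_QUOTA=%s SPACE_QUOTA=%s REM_SPACE_QUOTA=%s ", $1, $2, $3, $4
    printf "DIR_COUNT=%s FILE_COUNT=%s CONTENT_SIZE=%s PATHNAME=%s\n", $5, $6, $7, $8
  }'
}

# Sample line: an assumption mirroring the transcript, not captured output.
sample="none inf none inf 15 18 392141136 /user/root"
label_count_line "$sample"
```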

1>. Setting a name quota on an HDFS directory

[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help setQuota        # "dfsadmin -setQuota" sets the HDFS name quota of a directory
-setQuota <quota> <dirname>...<dirname>: Set the quota <quota> for each directory <dirName>.
        The directory quota is a long integer that puts a hard limit
        on the number of names in the directory tree
        For each directory, attempt to set the quota. An error will be reported if
        1. quota is not a positive integer, or
        2. User is not an administrator, or
        3. The directory does not exist or is a file.
        Note: A quota of 1 would force the directory to remain empty.

[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls -R /user/root 
drwx------   - root admingroup          0 2020-08-15 08:00 /user/root/.Trash
drwx------   - root admingroup          0 2020-08-14 19:32 /user/root/.Trash/200814193733
-rw-r--r--   3 root admingroup        490 2020-08-14 19:31 /user/root/.Trash/200814193733/fstab
-rw-r--r--   3 root admingroup      10779 2020-08-14 19:32 /user/root/.Trash/200814193733/sysctl.conf
drwxr-xr-x   - root admingroup          0 2020-08-14 19:04 /user/root/.Trash/200814193733/test2
drwxr-xr-x   - root admingroup          0 2020-08-14 19:04 /user/root/.Trash/200814193733/test2/sub1
drwxr-xr-x   - root admingroup          0 2020-08-14 19:04 /user/root/.Trash/200814193733/test2/sub1/sub2
drwx------   - root admingroup          0 2020-08-14 19:21 /user/root/.Trash/200814193733/yinzhengjie
drwx------   - root admingroup          0 2020-08-15 00:04 /user/root/.Trash/200815080000
-rw-r--r--   3 root  admingroup          0 2020-08-14 22:47 /user/root/.Trash/200815080000/a.txt
-rw-r--r--   3 root  admingroup  392115733 2020-08-14 23:25 /user/root/.Trash/200815080000/hadoop-2.10.0.tar.gz
-rw-r--r--   3 root  admingroup          0 2020-08-14 22:58 /user/root/.Trash/200815080000/hdfs2020.log
-rw-r--r--   3 root  admingroup         26 2020-08-14 23:42 /user/root/.Trash/200815080000/hostname
-rw-r--r--   3 root  admingroup        371 2020-08-14 23:49 /user/root/.Trash/200815080000/hosts2020
-rw-r--r--   3 root  admingroup         69 2020-08-14 23:14 /user/root/.Trash/200815080000/wc.txt.gz
drwx-w-r-x   - jason admingroup          0 2020-08-14 21:46 /user/root/.Trash/200815080000/yinzhengjie
drwx-w-r-x   - jason admingroup          0 2020-08-14 07:07 /user/root/.Trash/200815080000/yinzhengjie/data
drwx-w-r-x   - jason admingroup          0 2020-08-14 07:07 /user/root/.Trash/200815080000/yinzhengjie/data/hadoop
drwx-w-r-x   - jason admingroup          0 2020-08-14 07:07 /user/root/.Trash/200815080000/yinzhengjie/data/hadoop/hdfs
drwx-w-r-x   - jason admingroup          0 2020-08-14 21:46 /user/root/.Trash/200815080000/yinzhengjie/softwares
drwxr-xr-x   - root  admingroup          0 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020
-rw-r--r--   3 root  admingroup         69 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/wc.txt.gz
drwxr-xr-x   - root  admingroup          0 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d
-rw-r--r--   3 root  admingroup       1664 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Base.repo
-rw-r--r--   3 root  admingroup       1309 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-CR.repo
-rw-r--r--   3 root  admingroup        649 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Debuginfo.repo
-rw-r--r--   3 root  admingroup        630 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Media.repo
-rw-r--r--   3 root  admingroup       1331 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Sources.repo
-rw-r--r--   3 root  admingroup       5701 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-Vault.repo
-rw-r--r--   3 root  admingroup        314 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/CentOS-fasttrack.repo
-rw-r--r--   3 root  admingroup       1050 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/epel-testing.repo
-rw-r--r--   3 root  admingroup        951 2020-08-14 23:48 /user/root/.Trash/200815080000/yinzhengjie2020/yum.repos.d/epel.repo
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls -R /user/root | wc -l          # 32 files and directories under "/user/root"; counting "/user/root" itself, 33 in total
32
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v  -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           15           18            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -setQuota 35 /user/root          # set the name quota of "/user/root" to 35; watch how the "count" statistics change
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               2            none             inf           15           18            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
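
The REM_QUOTA value of 2 can be checked by hand. The sketch below assumes the relation REM_QUOTA = QUOTA - (DIR_COUNT + FILE_COUNT), which is consistent with the numbers in the transcript; `remaining_names` is a hypothetical helper, not an HDFS command.

```shell
# REM_QUOTA = QUOTA - (DIR_COUNT + FILE_COUNT).
# DIR_COUNT already includes the directory the quota is set on.
remaining_names() {  # args: quota dir_count file_count
  echo $(( $1 - ($2 + $3) ))
}

remaining_names 35 15 18   # values taken from the transcript above
```

Because the directory itself counts as one name, a quota of 1 leaves no room for children, which is why the help text says such a directory must remain empty.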

2>. Verifying that the name quota takes effect

[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               2            none             inf           15           18            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls /user/root          # next, create files and directories under the directory to check that the name quota takes effect immediately
Found 1 items
drwx------   - root admingroup          0 2020-08-15 08:00 /user/root/.Trash
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -touchz /user/root/a.txt
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -mkdir /user/root/test
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               0            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -touchz /user/root/b.txt
touchz: The NameSpace quota (directories and files) of directory /user/root is exceeded: quota=35 file count=36
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -mkdir /user/root/test02
mkdir: The NameSpace quota (directories and files) of directory /user/root is exceeded: quota=35 file count=36
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls /user/root
Found 3 items
drwx------   - root admingroup          0 2020-08-15 08:00 /user/root/.Trash
-rw-r--r--   3 root admingroup          0 2020-08-19 18:32 /user/root/a.txt
drwxr-xr-x   - root admingroup          0 2020-08-19 18:32 /user/root/test
[root@hadoop101.yinzhengjie.com ~]# 
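
The "quota=35 file count=36" in the error messages follows from simple counting: the tree already holds 35 names after a.txt and test are created, so one more would make 36. A minimal sketch of that check, with `check_create` as a hypothetical helper that only mimics the wording of the message:

```shell
# A create is rejected when the names in use plus the new one exceed the quota.
check_create() {  # args: quota names_in_use
  after=$(( $2 + 1 ))
  if [ "$after" -gt "$1" ]; then
    echo "exceeded: quota=$1 file count=$after"
  else
    echo "ok: quota=$1 file count=$after"
  fi
}

check_create 35 33   # creating a.txt
check_create 35 34   # creating test
check_create 35 35   # creating b.txt
```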

3>. Clearing the current name quota

[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help clrQuota         # "dfsadmin -clrQuota" clears the current name quota
-clrQuota <dirname>...<dirname>: Clear the quota for each directory <dirName>.
        For each directory, attempt to clear the quota. An error will be reported if
        1. the directory does not exist or is a file, or
        2. user is not an administrator.
        It does not fault if the directory has no quota.
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
          35               0            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -clrQuota /user/root      # this command clears the name quota
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -q -v -h /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 

 

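III. Managing space quotas

  A space quota limits the storage assigned to a particular HDFS directory: it is the number of bytes that all files under the directory may use. Once a directory has used up its space quota, users and applications can no longer write data to files under it.

  A space quota is a hard limit on the disk space used by all files in the HDFS directory tree. You can cap a user's space consumption by setting a quota on the user's home directory or on other directories the user shares with others. A directory without a space quota is unlimited and can use the whole of HDFS.

  When configuring space quotas, keep in mind that HDFS needs enough remaining quota to hold an entire block. If a directory has 200MB of quota left and the HDFS block size is larger than that (say 256MB), no new data can be written there, however small the file is. That is before the replication factor is taken into account: each replica of the block also counts against the quota.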
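
The whole-block rule can be sketched as a one-line check. This assumes the NameNode requires block size times replication factor to fit in the remaining quota when a block is allocated; `can_allocate_block` is a hypothetical helper used only to illustrate the arithmetic.

```shell
MIB=1048576

# A block fits only if block_size * replication <= remaining quota.
can_allocate_block() {  # args: remaining_bytes block_size_bytes replication
  if [ $(( $2 * $3 )) -le "$1" ]; then echo yes; else echo no; fi
}

can_allocate_block $(( 200 * MIB )) $(( 256 * MIB )) 1   # 256MB block, 200MB left
can_allocate_block $(( 200 * MIB )) $(( 128 * MIB )) 1   # 128MB block, 200MB left
can_allocate_block $(( 200 * MIB )) $(( 128 * MIB )) 3   # same block, three replicas
```

The first and third calls print no and the second prints yes: a smaller block helps only while block size times replication still fits in what is left.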

  Tips:
     A space quota counts all replicated data. With a 30GB quota and the default replication factor of 3, storing 10GB of actual data in the directory uses up the whole quota (10GB x 3 = 30GB).
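
The same arithmetic in script form, as a small sketch. The helper names are made up for illustration and the sizes are whole gigabytes to keep the integer math simple.

```shell
# Quota charged = logical size * replication factor.
quota_charged_gb() {  # args: logical_gb replication
  echo $(( $1 * $2 ))
}

# Largest logical size (whole GB) that a quota can hold at a given replication.
max_logical_gb() {  # args: quota_gb replication
  echo $(( $1 / $2 ))
}

quota_charged_gb 10 3   # 10GB of data at replication 3
max_logical_gb 30 3     # data that fits in a 30GB quota at replication 3
```

When sizing a quota it is therefore usually easier to start from the logical data volume you expect and multiply by the replication factor, rather than the other way round.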

1>. Setting a space quota on an HDFS directory

[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help setSpaceQuota
-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>: Set the space quota <quota> for each directory <dirName>.
        The space quota is a long integer that puts a hard limit
        on the total size of all the files under the directory tree.
        The extra space required for replication is also counted. E.g.
        a 1GB file with replication of 3 consumes 3GB of the quota.

        Quota can also be specified with a binary prefix for terabytes,
        petabytes etc (e.g. 50t is 50TB, 5m is 5MB, 3p is 3PB).
        For each directory, attempt to set the quota. An error will be reported if
        1. quota is not a positive integer or zero, or
        2. user is not an administrator, or
        3. the directory does not exist or is a file.
        The storage type specific quota is set when -storageType option is specified.
        Available storageTypes are 
        - RAM_DISK
        - DISK
        - SSD
        - ARCHIVE
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root        # note the space quota columns
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -setSpaceQuota 2g /user/root    # here a 2G space quota is set on "/user/root" only
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         926.1 M           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
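
The REM_SPACE_QUOTA of 926.1 M can be reproduced from the figures in this article. The sketch below assumes binary multiples for the size suffix (as the help text describes) and takes 392141136 bytes as the logical content size, which is the sum of the file sizes in the recursive listing of /user/root; every file there has replication 3. `to_bytes` is a hypothetical helper that handles only the k, m, g and t suffixes.

```shell
# Convert a size such as 2g or 32m to bytes, using binary multiples.
to_bytes() {
  num=${1%[kmgt]}
  case "$1" in
    *k) echo $(( num * 1024 )) ;;
    *m) echo $(( num * 1024 * 1024 )) ;;
    *g) echo $(( num * 1024 * 1024 * 1024 )) ;;
    *t) echo $(( num * 1024 * 1024 * 1024 * 1024 )) ;;
    *)  echo "$num" ;;
  esac
}

quota=$(to_bytes 2g)            # bytes in the 2G quota
logical=392141136               # sum of the file sizes listed under /user/root
charged=$(( logical * 3 ))      # each file is stored with replication 3
remaining=$(( quota - charged ))

echo "$remaining"
# Same value in the unit used by `-count -h`.
awk -v b="$remaining" 'BEGIN { printf "%.1f M\n", b / 1048576 }'
```

The 374.0 M shown as CONTENT_SIZE is the logical size, while the quota is charged three times that amount, which is why a 2 G quota already has less than half of its space left.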


Tips:
  (1) The space quota set on a directory includes the space used by replicas;
  (2) A space quota can be set on several directories in one command.
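
For tip (2), the usage line `-setSpaceQuota <quota> ... <dirname>...<dirname>` means several directories can follow one quota value. The sketch below only prints such a command instead of running it; the directory names /user/alice and /user/bob are hypothetical, and `build_set_space_quota` is a made-up helper.

```shell
# Print (rather than run) a setSpaceQuota command covering several directories.
build_set_space_quota() {  # args: quota dir [dir...]
  quota=$1
  shift
  echo "hdfs dfsadmin -setSpaceQuota $quota $*"
}

# Hypothetical directories, for illustration only.
build_set_space_quota 2g /user/alice /user/bob
```

Printing the command first is a cheap way to review it before an administrator runs it for real.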

2>. Verifying that the space quota takes effect

[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root            # note that 926.1MB of space quota remain
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         926.1 M           16           19            374.0 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll -h
total 375M
-rw-r--r-- 1 root root 374M Aug 10 15:42 hadoop-2.10.0.tar.gz
-rw------- 1 root root 265K Aug 20 16:49 messages
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -put messages /user/root                # think about it: why does uploading a 265K file fail with a quota error?
put: The DiskSpace quota of /user/root is exceeded: quota = 2147483648 B = 2 GB but diskspace consumed = 2787036144 B = 2.60 GB
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -D dfs.blocksize=32m -put messages /user/root   # why does the upload succeed now?
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         925.3 M           16           20            374.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -D dfs.blocksize=128m -put hadoop-2.10.0.tar.gz /user/root      # why does the upload still fail even with the block size specified?
put: The DiskSpace quota of /user/root is exceeded: quota = 2147483648 B = 2 GB but diskspace consumed = 2385196977 B = 2.22 GB
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -D dfs.blocksize=128m -D dfs.replication=1 -put hadoop-2.10.0.tar.gz /user/root      # why does the upload succeed with these settings?
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root            # think about it: why are 551.3MB of space quota still left?
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         551.3 M           16           21            748.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls -h /user/root
Found 5 items
drwx------   - root admingroup          0 2020-08-15 08:00 /user/root/.Trash
-rw-r--r--   3 root admingroup          0 2020-08-19 18:32 /user/root/a.txt
-rw-r--r--   1 root admingroup    374.0 M 2020-08-20 17:13 /user/root/hadoop-2.10.0.tar.gz
-rw-r--r--   3 root admingroup    265.0 K 2020-08-20 17:12 /user/root/messages
drwxr-xr-x   - root admingroup          0 2020-08-19 18:32 /user/root/test
[root@hadoop101.yinzhengjie.com ~]# 
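
One way to answer the questions in the comments above is to redo the quota arithmetic by hand. The sketch below assumes that the NameNode charges a full block times the replication factor when it allocates a block, and that the cluster default is a 512 MiB block with replication 3. The 512 MiB figure is not stated anywhere in this article; it is inferred because it reproduces the 2787036144 B of the first error message exactly. The size of messages is approximated as 265 x 1024 bytes from the listing, and the helper names are made up for illustration.

```shell
MIB=1048576
quota=$(( 2 * 1024 * MIB ))     # 2 GiB = 2147483648 bytes
used=$(( 392141136 * 3 ))       # file sizes listed under /user/root, replication 3

# Bytes that would be charged if one more block were allocated.
charged_after_block() {  # args: used_bytes block_bytes replication
  echo $(( $1 + $2 * $3 ))
}

# Number of blocks of a file that fit before the quota check fails.
blocks_admitted() {  # args: remaining_bytes file_bytes block_bytes replication
  remaining=$1; left=$2; admitted=0
  while [ "$left" -gt 0 ]; do
    # A full block for every replica must fit at allocation time.
    [ $(( $3 * $4 )) -le "$remaining" ] || break
    if [ "$left" -lt "$3" ]; then chunk=$left; else chunk=$3; fi
    remaining=$(( remaining - chunk * $4 ))   # only the bytes written stay charged
    left=$(( left - chunk ))
    admitted=$(( admitted + 1 ))
  done
  echo "$admitted"
}

# 1) messages with the assumed 512 MiB default block: above the 2 GiB quota.
charged_after_block "$used" $(( 512 * MIB )) 3
# 2) messages with a 32 MiB block: below the quota, so the upload goes through.
charged_after_block "$used" $(( 32 * MIB )) 3

# Quota left once messages (about 265 KiB, three replicas) is stored.
left_after_messages=$(( quota - used - 265 * 1024 * 3 ))
# 3) tarball, 128 MiB blocks, replication 3: only two of its three blocks fit.
blocks_admitted "$left_after_messages" 392115733 $(( 128 * MIB )) 3
# 4) same tarball with replication 1: all three blocks fit.
blocks_admitted "$left_after_messages" 392115733 $(( 128 * MIB )) 1
```

Under these assumptions case 4 consumes one copy of the 374.0 M tarball, and 925.3 M - 374.0 M = 551.3 M, which matches the last -count output.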

3>. Clearing the current space quota

[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -help clrSpaceQuota
-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>: Clear the space quota for each directory <dirName>.
        For each directory, attempt to clear the quota. An error will be reported if
        1. the directory does not exist or is a file, or
        2. user is not an administrator.
        It does not fault if the directory has no quota.
        The storage type specific quota is cleared when -storageType option is specified.
        Available storageTypes are 
        - RAM_DISK
        - DISK
        - SSD
        - ARCHIVE
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root      # note the space quota columns
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf             2 G         551.3 M           16           21            748.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfsadmin -clrSpaceQuota /user/root    # clear the space quota
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -count -h -v -q /user/root
       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
        none             inf            none             inf           16           21            748.2 M /user/root
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# 

 

posted @ 2020-07-18 00:04  JasonYin2020