HDFS Basic Configuration Notes

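Chapter 5: HDFS
I. Working with HDFS
1. Web Console: port 50070
2. Command line: two types of commands (hdfs dfs for file operations, hdfs dfsadmin for administration)
3. Java API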

II. How HDFS transfers data (draw a diagram): fairly important
1. How data upload works (the process)
2. How data download works (the process)

Memory used to cache metadata: 1000 MB
/root/training/hadoop-2.7.3/etc/hadoop
File: hadoop-env.sh
# The maximum amount of heap to use, in MB. Default is 1000.
#export HADOOP_HEAPSIZE=
#export HADOOP_NAMENODE_INIT_HEAPSIZE=""
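To give a feel for what a 1000 MB heap means for the NameNode, here is a minimal sketch. It assumes the commonly quoted rule of thumb of roughly 150 bytes of heap per namespace object (file, directory, or block); that figure is an approximation and does not come from these notes. The names `BYTES_PER_OBJECT` and `max_namespace_objects` are hypothetical.

```python
# Rough upper bound on how many namespace objects fit in the NameNode heap.
# ASSUMPTION: ~150 bytes per object (file, directory, or block). This is a
# rule of thumb for illustration only, not a measured value.
BYTES_PER_OBJECT = 150


def max_namespace_objects(heap_mb: int) -> int:
    """Return an optimistic object count for a heap of heap_mb megabytes."""
    heap_bytes = heap_mb * 1024 * 1024
    return heap_bytes // BYTES_PER_OBJECT


if __name__ == "__main__":
    # 1000 MB is the default mentioned in hadoop-env.sh.
    print(max_namespace_objects(1000))
```

The real limit is lower, since the heap is also used for things other than metadata; the point is only that the heap size bounds how many files and blocks the NameNode can track.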

III. Advanced HDFS features
1. Trash: recyclebin
-rmr: deletes a directory, including its subdirectories
hdfs dfs -rmr /bbb
Log:
17/12/08 20:32:10 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /bbb

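(*) By default, the HDFS trash is disabled
(*) Enabling the trash: set this parameter ---> core-site.xml
In essence: once the trash is enabled, deleting data is really a Ctrl+X (cut) operation; the data is moved, not destroyed

<property>
   <name>fs.trash.interval</name>
   <value>1440</value>
</property>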

Log:
hdfs dfs -rmr /folder1
rmr: DEPRECATED: Please use 'rm -r' instead.
17/12/11 21:05:57 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://bigdata11:9000/folder1' to trash at: hdfs://bigdata11:9000/user/root/.Trash/Current
(*) Restore: this is really just a cp (copy)
hdfs dfs -cp /user/root/.Trash/Current/input/data.txt /input

Empty the trash: hdfs dfs -expunge
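The trash layout seen in the logs above can be sketched as a small path calculation. This is only a model of the naming pattern visible in these notes (`/user/<user>/.Trash/Current` followed by the original path), not Hadoop's implementation; the function names are hypothetical.

```python
import posixpath


def trash_path(user: str, deleted_path: str) -> str:
    """Return where a deleted absolute HDFS path lands in the user's trash."""
    if not deleted_path.startswith("/"):
        raise ValueError("deleted_path must be an absolute HDFS path")
    # ASSUMPTION: the user's home directory is /user/<user>, as in the log.
    current = posixpath.join("/user", user, ".Trash", "Current")
    return current + posixpath.normpath(deleted_path)


def restore_command(user: str, deleted_path: str, target_dir: str) -> str:
    """Build the hdfs dfs -cp command that copies a file back out of the trash."""
    return f"hdfs dfs -cp {trash_path(user, deleted_path)} {target_dir}"


def interval_hours(interval_minutes: int) -> float:
    """fs.trash.interval is expressed in minutes; convert it to hours."""
    return interval_minutes / 60


if __name__ == "__main__":
    print(restore_command("root", "/input/data.txt", "/input"))
    print(interval_hours(1440))
```

With `fs.trash.interval` set to 1440, deleted data stays recoverable for 24 hours before it is purged.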

(*) Aside: Oracle Database also has a recycle bin
SQL> select * from tab;

TNAME                          TABTYPE  CLUSTERID
------------------------------ ------- ----------
BIN$WBSNMvxJpWvgUAB/AQBygg==$0 TABLE
BONUS                          TABLE
DEPT                           TABLE
EMP                            TABLE
RESULT                         TABLE
SALGRADE                       TABLE

6 rows selected.

SQL> -- drop table mydemo1;
SQL> show recyclebin;
ORIGINAL NAME    RECYCLEBIN NAME                OBJECT TYPE  DROP TIME
---------------- ------------------------------ ------------ -------------------
MYDEMO1          BIN$WBSNMvxJpWvgUAB/AQBygg==$0 TABLE        2017-09-01:06:56:15
SQL> select count(*) from mydemo1;
select count(*) from mydemo1
*
ERROR at line 1:
ORA-00942: table or view does not exist

SQL> select count(*) from BIN$WBSNMvxJpWvgUAB/AQBygg==$0;
select count(*) from BIN$WBSNMvxJpWvgUAB/AQBygg==$0
*
ERROR at line 1:
ORA-00933: SQL command not properly ended

SQL> select count(*) from "BIN$WBSNMvxJpWvgUAB/AQBygg==$0";

COUNT(*)
----------
30

SQL> flashback table mydemo1 to before drop;

Flashback complete.

SQL> show recyclebin;
SQL> select count(*) from mydemo1;

COUNT(*)
----------
30

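2. Snapshots: backups ---> generally speaking, snapshots are not recommended

(*) By default, HDFS snapshots are disabled
(*) Step 1: the administrator enables snapshots on a directory
[-allowSnapshot <snapshotDir>]
[-disallowSnapshot <snapshotDir>]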

hdfs dfsadmin -allowSnapshot /mydir1

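(*) Step 2: create a snapshot with the HDFS commands
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]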

hdfs dfs -createSnapshot /mydir1 mydir1_backup_01
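Log: Created snapshot /mydir1/.snapshot/mydir1_backup_01
In essence: the snapshot appears under a hidden .snapshot directory inside the current directory; HDFS records metadata for it and does not copy the data blocks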

(*) Continuing the experiment
hdfs dfs -put student02.txt /mydir1
hdfs dfs -createSnapshot /mydir1 mydir1_backup_02

Compare snapshots: hdfs snapshotDiff /mydir1 mydir1_backup_01 mydir1_backup_02
Difference between snapshot mydir1_backup_01 and snapshot mydir1_backup_02 under directory /mydir1:
M .
+ ./student02.txt
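The shape of that report can be reproduced with a toy model that treats each snapshot as a set of file names. This is a simplified sketch, not how HDFS computes the diff: it only handles created (+) and deleted (-) entries in a single directory and ignores renames and modified files. `student01.txt` is a hypothetical file assumed to exist before the first snapshot.

```python
def snapshot_diff(old: set, new: set) -> list:
    """Return report lines describing how `new` differs from `old`."""
    created = sorted(new - old)
    deleted = sorted(old - new)
    lines = []
    if created or deleted:
        # The directory itself is marked modified when its children change.
        lines.append("M\t.")
    lines += [f"+\t./{name}" for name in created]
    lines += [f"-\t./{name}" for name in deleted]
    return lines


if __name__ == "__main__":
    backup_01 = {"student01.txt"}                    # hypothetical contents
    backup_02 = {"student01.txt", "student02.txt"}   # after the -put above
    for line in snapshot_diff(backup_01, backup_02):
        print(line)
```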

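3. Quotas: quota
(1) Name quota: limits the number of files and directories stored under a directory
Actual number you can store: N-1, because the directory itself counts toward the quota
[-setQuota <quota> <dirname>...<dirname>]
[-clrQuota <dirname>...<dirname>]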

hdfs dfs -mkdir /quota1
Set the name quota of this directory to 3
hdfs dfsadmin -setQuota 3 /quota1

When we put the third file:
hdfs dfs -put data.txt /quota1
put: The NameSpace quota (directories and files) of directory /quota1 is exceeded: quota=3 file count=4
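The N-1 behaviour can be sketched with a tiny counter. This is a simplified model under the assumption that the directory itself uses one unit of the quota and every file uses one more; the class name and the shortened error text are hypothetical, and the code never talks to HDFS.

```python
class NameQuotaDir:
    """Toy model of a directory with a name quota."""

    def __init__(self, quota: int):
        self.quota = quota
        self.count = 1  # the directory itself counts toward the quota

    def put(self, name: str) -> None:
        """Add one file, or raise if the name quota would be exceeded."""
        if self.count + 1 > self.quota:
            raise RuntimeError(
                f"quota={self.quota} file count={self.count + 1}"
            )
        self.count += 1


if __name__ == "__main__":
    quota1 = NameQuotaDir(3)
    quota1.put("a.txt")   # ok, count is now 2
    quota1.put("b.txt")   # ok, count is now 3
    try:
        quota1.put("data.txt")
    except RuntimeError as err:
        print(err)
```

With a quota of 3, only two files fit, and the third attempt reports a count of 4, which matches the message in the log above.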

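(2) Space quota: limits the total size of the files under a directory
[-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
[-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]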

hdfs dfs -mkdir /quota2
Set the space quota of this directory to 10M
hdfs dfsadmin -setSpaceQuota 10M /quota2

The correct approach: hdfs dfsadmin -setSpaceQuota 130M /quota2

Putting a file smaller than 10M still fails:
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.DSQuotaExceededException): The DiskSpace quota of /quota2 is exceeded: quota = 10485760 B = 10 MB but diskspace consumed = 134217728 B = 128 MB

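Note: although the data is smaller than 128M, the write still reserves a full 128M data block against the quota
Remember: when setting a space quota, the value must not be less than the block size (128M by default) multiplied by the replication factor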
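The arithmetic behind that rule can be sketched as follows. The replication factor of 1 is an assumption inferred from the log above, which reports 128 MB consumed for a single block; on a cluster with the usual replication of 3, the minimum would be three times larger. The function names are hypothetical.

```python
MB = 1024 * 1024


def min_space_quota(block_size: int = 128 * MB, replication: int = 1) -> int:
    """Smallest space quota (in bytes) that lets one block be written."""
    return block_size * replication


def first_block_fits(quota_bytes: int, block_size: int = 128 * MB,
                     replication: int = 1) -> bool:
    """True if a new file's first block can be reserved under the quota."""
    return min_space_quota(block_size, replication) <= quota_bytes


if __name__ == "__main__":
    print(first_block_fits(10 * MB))                   # the failing 10M quota
    print(first_block_fits(130 * MB))                  # the 130M quota
    print(first_block_fits(130 * MB, replication=3))   # same quota, 3 replicas
    print(min_space_quota(replication=3) // MB)        # minimum in MB
```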

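4. HDFS safe mode: safemode ---> HDFS is read-only
Command: hdfs dfsadmin -safemode get|wait|leave|enter
Purpose: checks the replication of the data blocks; if blocks do not have enough replicas, they are copied to other DataNodes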
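The replication check can be pictured as a simple threshold test. This is a simplified sketch, not the NameNode's code: it assumes safe mode ends once the fraction of blocks with enough reported replicas reaches a threshold, and it uses 0.999, the default of `dfs.namenode.safemode.threshold-pct`, which is not mentioned in these notes. The function name is hypothetical.

```python
def can_leave_safemode(safe_blocks: int, total_blocks: int,
                       threshold: float = 0.999) -> bool:
    """True if enough blocks have reported the minimum number of replicas."""
    if total_blocks == 0:
        # An empty file system has nothing to wait for.
        return True
    return safe_blocks / total_blocks >= threshold


if __name__ == "__main__":
    print(can_leave_safemode(998, 1000))    # still below the threshold
    print(can_leave_safemode(1000, 1000))   # all blocks reported
```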

6. HDFS clusters: a first look
The two main functions of a cluster: load balancing and high availability (failover)

(1) NameNode Federation ----> HDFS

(2) HA: HDFS, Yarn, HBase, Storm, Spark ---> all require ZooKeeper

posted @ 2018-02-09 15:30  好奇的小码农