December 25 Daily Report

Completed Lab 2 of the Large-Scale Database course, "Familiarization with Common HDFS Operations". The lab write-up follows.

Lab 2

Familiarization with Common HDFS Operations

1. Objectives

(1) Understand the role HDFS plays in the Hadoop architecture;

(2) Become proficient with the shell commands commonly used to operate HDFS;

(3) Become familiar with the Java APIs commonly used to operate HDFS.

2. Lab Platform

(1) Operating system: Linux (Ubuntu 16.04 or Ubuntu 18.04 recommended);

(2) Hadoop version: 3.1.3;

(3) JDK version: 1.8;

(4) Java IDE: Eclipse.

3. Lab Steps

(I) Implement the following functions programmatically, and complete the same tasks with the shell commands Hadoop provides (a Java sketch of task (1) follows this list):

(1) Upload any text file to HDFS; if the target file already exists in HDFS, let the user choose whether to append to the existing file or overwrite it;

(2) Download a specified file from HDFS; if a local file with the same name already exists, automatically rename the downloaded file;

(3) Print the contents of a specified HDFS file to the terminal;

(4) Display the permissions, size, creation time, path, and other information of a specified HDFS file;

(5) Given a directory in HDFS, output the permissions, size, creation time, path, and other information of every file in it; if an entry is itself a directory, recursively output the same information for all files under it;

(6) Given the path of a file in HDFS, create and delete that file; if the file's parent directory does not exist, create it automatically;

(7) Given the path of a directory in HDFS, create and delete that directory. When creating, automatically create any missing parent directories; when deleting a non-empty directory, let the user decide whether it should still be deleted;

(8) Append content to a specified HDFS file, letting the user choose whether the content goes at the beginning or the end of the file;

(9) Delete a specified file from HDFS;

(10) Move a file from a source path to a destination path within HDFS.
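A minimal Java sketch of task (1), assuming the cluster URI hdfs://localhost:9000 seen later in the transcript and the sample paths /home/hadoop/sample.txt and /user/sample.txt; the class name and the console prompt are illustrative, not part of the original report:

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Scanner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class UploadFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // matches the lab cluster
        FileSystem fs = FileSystem.get(conf);

        Path local = new Path("/home/hadoop/sample.txt");   // assumed local file
        Path remote = new Path("/user/sample.txt");          // assumed HDFS target

        if (!fs.exists(remote)) {
            fs.copyFromLocalFile(local, remote);
            System.out.println("File uploaded");
        } else {
            System.out.print("File already exists. Overwrite (y/n)? ");
            String choice = new Scanner(System.in).nextLine().trim();
            if (choice.equals("y")) {
                // delSrc=false, overwrite=true
                fs.copyFromLocalFile(false, true, local, remote);
                System.out.println("File overwritten");
            } else {
                // Append the local file's bytes to the existing HDFS file.
                try (InputStream in = new FileInputStream(local.toString());
                     OutputStream out = fs.append(remote)) {
                    IOUtils.copyBytes(in, out, 4096, false);
                }
                System.out.println("Content appended to end of file");
            }
        }
        fs.close();
    }
}

This mirrors the shell script shown later in the transcript (hadoop fs -test -e, then -put -f or -appendToFile).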

 

(II) Implement a class "MyFSDataInputStream" that extends "org.apache.hadoop.fs.FSDataInputStream", with the following requirement: provide a method "readLine()" that reads a specified HDFS file line by line, returning null when the end of the file is reached and otherwise returning one line of text.
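A minimal sketch of this class. One caveat: java.io.DataInputStream.readLine() is declared final, so a subclass of FSDataInputStream cannot literally override it; the sketch therefore names its method readline(), a common workaround in published solutions to this exercise. The byte-to-char cast assumes ASCII content; the path in main() is an illustrative assumption:

import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MyFSDataInputStream extends FSDataInputStream {

    public MyFSDataInputStream(InputStream in) {
        super(in); // the wrapped stream must be Seekable, e.g. fs.open(...)
    }

    // Read one line (up to '\n'); return null at end of file,
    // otherwise the line text without its trailing newline.
    public String readline() throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = read()) != -1) {
            if (b == '\n') {
                return sb.toString();
            }
            sb.append((char) b); // ASCII assumption; use a Reader for UTF-8
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        try (FSDataInputStream in = fs.open(new Path("/user/sample.txt"))) {
            MyFSDataInputStream my = new MyFSDataInputStream(in);
            String line;
            while ((line = my.readline()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}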

(III) Consult the Java documentation or other references, and use "java.net.URL" together with "org.apache.hadoop.fs.FsUrlStreamHandlerFactory" to print the contents of a specified HDFS file to the terminal.
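A minimal sketch, again assuming the cluster URI and sample path from the transcript. Note that URL.setURLStreamHandlerFactory may be called at most once per JVM, which is why it lives in a static initializer:

import java.io.InputStream;
import java.net.URL;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class UrlCat {
    static {
        // Teach java.net.URL to understand hdfs:// URLs; once per JVM.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;
        try {
            in = new URL("hdfs://localhost:9000/user/sample.txt").openStream();
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}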

 

4. Lab Report

Title: Familiarization with Common HDFS Operations

Name: 李健龙

Date: 2024/12/5

Lab environment: Ubuntu 18.04.6 LTS, Hadoop 3.1.3

Lab content and completion status:

Output of jps confirms the HDFS daemons are running, followed by a first listing of the root directory:

3073 SecondaryNameNode
2726 NameNode
2873 DataNode
hadoop@hadoop:~/hadoop$ hadoop fs -ls /
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2024-07-19 22:06 /tmp
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:08 /user
hadoop@hadoop:~/hadoop$

hadoop@hadoop:~/hadoop$ hadoop fs -put /path/to/localfile /path/in/hdfs
put: `/path/in/hdfs': No such file or directory: `hdfs://localhost:9000/path/in/hdfs'
hadoop@hadoop:~/hadoop$ hadoop fs -put /path/to/localfile /user
put: `/path/to/localfile': No such file or directory
hadoop@hadoop:~/hadoop$ cat /home/hadoop/sample.txt
cat: /home/hadoop/sample.txt: No such file or directory

hadoop@hadoop:~/hadoop$ echo "Hadoop HDFS File System" > /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "====================" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "" >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "1. HDFS is a distributed file system for storing large-scale data." >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "2. It is highly fault-tolerant and provides high-throughput data access." >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "3. Its core components are the NameNode, DataNode, and SecondaryNameNode." >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "4. Files are stored in HDFS as blocks; the default block size is 128 MB." >> /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ echo "5. HDFS supports data redundancy to ensure reliability." >> /home/hadoop/sample.txt

hadoop@hadoop:~/hadoop$ # First check whether the file already exists in HDFS
hadoop@hadoop:~/hadoop$ hadoop fs -test -e /user/sample.txt
hadoop@hadoop:~/hadoop$ if [ $? -eq 0 ]; then
>   echo "File already exists. Overwrite (y/n)?"
>   read choice
>   if [ "$choice" == "y" ]; then
>     hadoop fs -put -f /home/hadoop/sample.txt /user
>     echo "File overwritten"
>   else
>     echo "Appending instead"
>     cat /home/hadoop/sample.txt | hadoop fs -appendToFile - /user/sample.txt
>     echo "Content appended to end of file"
>   fi
> else
>   hadoop fs -put /home/hadoop/sample.txt /user
>   echo "File uploaded"
> fi
2024-11-11 16:15:56,750 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
File uploaded

hadoop@hadoop:~/hadoop$ hadoop fs -ls /user
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:10 /user/hadoop
-rw-r--r--   1 hadoop supergroup        385 2024-11-11 16:15 /user/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -get /user/sample.txt /home/hadoop/sample_download.txt
2024-11-11 16:17:04,040 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ hadoop fs -cat /user/sample.txt
2024-11-11 16:17:17,969 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
Hadoop HDFS File System
====================

1. HDFS is a distributed file system for storing large-scale data.
2. It is highly fault-tolerant and provides high-throughput data access.
3. Its core components are the NameNode, DataNode, and SecondaryNameNode.
4. Files are stored in HDFS as blocks; the default block size is 128 MB.
5. HDFS supports data redundancy to ensure reliability.
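The -get step above avoided a name clash by choosing sample_download.txt by hand. A minimal Java sketch of task (2), which picks a fresh local name automatically; the class name, paths, and the _1, _2, ... numbering scheme are illustrative assumptions:

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DownloadFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        Path remote = new Path("/user/sample.txt");
        File local = new File("/home/hadoop/sample.txt");

        // If the local name is taken, try sample_1.txt, sample_2.txt, ...
        int i = 1;
        while (local.exists()) {
            local = new File("/home/hadoop/sample_" + (i++) + ".txt");
        }
        fs.copyToLocalFile(remote, new Path(local.getAbsolutePath()));
        System.out.println("Downloaded to " + local.getAbsolutePath());
        fs.close();
    }
}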

hadoop@hadoop:~/hadoop$ hadoop fs -ls -l /user/sample.txt
-ls: Illegal option -l
Usage: hadoop fs [generic options]
    [-appendToFile <localsrc> ... <dst>]
    [-cat [-ignoreCrc] <src> ...]
    [-checksum <src> ...]
    [-chgrp [-R] GROUP PATH...]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
    [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
    [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
    [-createSnapshot <snapshotDir> [<snapshotName>]]
    [-deleteSnapshot <snapshotDir> <snapshotName>]
    [-df [-h] [<path> ...]]
    [-du [-s] [-h] [-v] [-x] <path> ...]
    [-expunge]
    [-find <path> ... <expression> ...]
    [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    [-getfacl [-R] <path>]
    [-getfattr [-R] {-n name | -d} [-e en] <path>]
    [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
    [-head <file>]
    [-help [cmd ...]]
    [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
    [-mkdir [-p] <path> ...]
    [-moveFromLocal <localsrc> ... <dst>]
    [-moveToLocal <src> <localdst>]
    [-mv <src> ... <dst>]
    [-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
    [-renameSnapshot <snapshotDir> <oldName> <newName>]
    [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
    [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
    [-setfattr {-n name [-v value] | -x name} <path>]
    [-setrep [-R] [-w] <rep> <path> ...]
    [-stat [format] <path> ...]
    [-tail [-f] [-s <sleep interval>] <file>]
    [-test -[defsz] <path>]
    [-text [-ignoreCrc] <src> ...]
    [-touch [-a] [-m] [-t TIMESTAMP ] [-c] <path> ...]
    [-touchz <path> ...]
    [-truncate [-w] <length> <path> ...]
    [-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

Usage: hadoop fs [generic options] -ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]

hadoop@hadoop:~/hadoop$ hadoop fs -ls -R /user
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:10 /user/hadoop
drwxr-xr-x   - hadoop supergroup          0 2024-11-04 17:10 /user/hadoop/test
-rw-r--r--   1 hadoop supergroup       4009 2024-11-04 17:10 /user/hadoop/test/.bashrc
-rw-r--r--   1 hadoop supergroup        385 2024-11-11 16:15 /user/sample.txt
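A minimal Java sketch of tasks (4) and (5), printing permission, size, time, and path, and recursing into subdirectories. Note that HDFS does not record a separate creation time; FileStatus exposes the modification time, which is also what the -ls listings above show. Class name and starting path are illustrative assumptions:

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListInfo {
    // Print info for each entry under path; recurse into directories.
    // For a single file, listStatus returns just that file's status,
    // so the same method covers task (4) as well.
    static void show(FileSystem fs, Path path) throws Exception {
        for (FileStatus s : fs.listStatus(path)) {
            String time = new SimpleDateFormat("yyyy-MM-dd HH:mm")
                    .format(new Date(s.getModificationTime()));
            System.out.printf("%s\t%d\t%s\t%s%n",
                    s.getPermission(), s.getLen(), time, s.getPath());
            if (s.isDirectory()) {
                show(fs, s.getPath());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        show(fs, new Path("/user"));
        fs.close();
    }
}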

hadoop@hadoop:~/hadoop$ hadoop fs -mkdir -p /user/newdir
hadoop@hadoop:~/hadoop$ hadoop fs -put /home/hadoop/sample.txt /user/newdir/sample.txt
2024-11-11 16:17:58,136 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ hadoop fs -rm /user/newdir/sample.txt
Deleted /user/newdir/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -mkdir -p /user/newdir
hadoop@hadoop:~/hadoop$ hadoop fs -rm -r /user/newdir
Deleted /user/newdir
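A minimal Java sketch of tasks (6) and (7). FileSystem.create() creates missing parent directories automatically (like -mkdir -p above), and the boolean on delete() is the user's "delete even if non-empty" choice; class name and paths are illustrative assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateDelete {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Task (6): create a file; parent directories are created as needed.
        Path file = new Path("/user/newdir/empty.txt");
        fs.create(file).close();
        fs.delete(file, false); // recursive=false: delete a single file

        // Task (7): mkdirs creates missing parents; delete's boolean
        // decides whether a non-empty directory is removed.
        Path dir = new Path("/user/newdir/sub");
        fs.mkdirs(dir);
        boolean deleteNonEmpty = true; // would come from user input
        fs.delete(new Path("/user/newdir"), deleteNonEmpty);
        fs.close();
    }
}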

hadoop@hadoop:~/hadoop$ # Append to the end of the file
hadoop@hadoop:~/hadoop$ hadoop fs -appendToFile /home/hadoop/sample.txt /user/sample.txt
2024-11-11 16:18:33,211 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ # HDFS does not support prepending directly; download the file, prepend locally, then re-upload
hadoop@hadoop:~/hadoop$ hadoop fs -get /user/sample.txt /home/hadoop/sample.txt
get: `/home/hadoop/sample.txt': File exists
hadoop@hadoop:~/hadoop$ echo "New content" | cat - /home/hadoop/sample.txt > /home/hadoop/temp.txt
hadoop@hadoop:~/hadoop$ mv /home/hadoop/temp.txt /home/hadoop/sample.txt
hadoop@hadoop:~/hadoop$ hadoop fs -put -f /home/hadoop/sample.txt /user/sample.txt
2024-11-11 16:18:41,903 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
hadoop@hadoop:~/hadoop$ hadoop fs -rm /user/sample.txt
Deleted /user/sample.txt
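The prepend case of task (8) can follow the same download-modify-reupload idea in Java: HDFS output streams are append-only, so the sketch below reads the old bytes into memory (fine for small lab files), then rewrites the file with the new content first. Class name, paths, and the inserted text are illustrative assumptions:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class PrependToFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/sample.txt");

        // Read the existing contents into memory.
        ByteArrayOutputStream old = new ByteArrayOutputStream();
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, old, 4096, false);
        }
        // Rewrite the file: new content first, then the old bytes.
        try (FSDataOutputStream out = fs.create(file, true)) { // overwrite=true
            out.write("New content\n".getBytes(StandardCharsets.UTF_8));
            out.write(old.toByteArray());
        }
        fs.close();
    }
}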

hadoop@hadoop:~/hadoop$ hadoop fs -mv /user/sample.txt /user/backup/sample.txt
mv: `/user/backup/sample.txt': No such file or directory: `hdfs://localhost:9000/user/backup/sample.txt'
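The -mv above failed because the destination directory /user/backup did not exist (and the source file had just been deleted). A minimal Java sketch of tasks (9) and (10) that creates the target directory first; class name and paths are illustrative assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteAndMove {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);

        // Task (9): delete a single file (recursive=false).
        fs.delete(new Path("/user/newdir/sample.txt"), false);

        // Task (10): move within HDFS; ensure the target directory exists.
        Path src = new Path("/user/sample.txt");
        Path dst = new Path("/user/backup/sample.txt");
        fs.mkdirs(dst.getParent());
        if (fs.exists(src) && fs.rename(src, dst)) {
            System.out.println("Moved " + src + " to " + dst);
        }
        fs.close();
    }
}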

hadoop@hadoop:~/hadoop$ hadoop fs -cat /etc/hadoop/core-site.xml
cat: `/etc/hadoop/core-site.xml': No such file or directory
hadoop@hadoop:~/hadoop$ hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000/

Problems encountered:

(1) hadoop fs -ls does not accept a -l option; the shell printed its full usage text instead.

(2) hadoop fs -mv to /user/backup/sample.txt failed because the destination directory /user/backup did not exist (and the source file had already been deleted in the previous step).

(3) hadoop fs -cat /etc/hadoop/core-site.xml failed because -cat resolves paths against HDFS (hdfs://localhost:9000), not the local file system.

Solutions (problems encountered and how they were handled; unresolved problems listed as such):

(1) Consulted the printed usage message; plain -ls already shows permissions, size, and modification time, and -ls -R lists recursively.

(2) The destination directory must be created first (hadoop fs -mkdir -p /user/backup) before moving; the Java sketch above calls fs.mkdirs() on the parent for the same reason.

(3) Confirmed the default file system with hdfs getconf -confKey fs.defaultFS; local files are read with the ordinary cat instead.
