24.10.28
Experiment 2
Getting Familiar with Common HDFS Operations
1. Objectives
(1) Understand the role of HDFS in the Hadoop architecture;
(2) Become proficient with the Shell commands commonly used to operate HDFS;
(3) Become familiar with the Java APIs commonly used to operate HDFS.
2. Platform
(1) Operating system: Linux (Ubuntu 16.04 or Ubuntu 18.04 recommended);
(2) Hadoop version: 2.7.3;
(3) JDK version: 1.8;
(4) Java IDE: IDEA.
3. Procedure
(I) Implement the following functions programmatically, and complete the same tasks using the Shell commands provided by Hadoop:
(1) Upload any text file to HDFS; if the target file already exists in HDFS, let the user specify whether to append to the end of the original file or overwrite it;
Bash
upload_file_to_hdfs() {
    local local_file=$1
    local hdfs_path=$2
    local mode=$3   # "append" or "overwrite"
    if hdfs dfs -test -e "$hdfs_path"; then
        if [[ $mode == "append" ]]; then
            hdfs dfs -appendToFile "$local_file" "$hdfs_path"
        elif [[ $mode == "overwrite" ]]; then
            hdfs dfs -put -f "$local_file" "$hdfs_path"
        else
            echo "Invalid mode: $mode"
        fi
    else
        hdfs dfs -put "$local_file" "$hdfs_path"
    fi
}
(2) Download a specified file from HDFS; if a local file with the same name already exists, automatically rename the downloaded file;
download_file_from_hdfs() {
    local hdfs_file=$1
    local local_dir=$2
    local filename=$(basename "$hdfs_file")
    if [[ -f "$local_dir/$filename" ]]; then
        # A file with the same name exists locally: insert a timestamp
        # before the extension (assumes the file name has an extension).
        local timestamp=$(date +%s)
        local new_filename="${filename%.*}_$timestamp.${filename##*.}"
        hdfs dfs -get "$hdfs_file" "$local_dir/$new_filename"
    else
        hdfs dfs -get "$hdfs_file" "$local_dir"
    fi
}
(3) Output the content of a specified HDFS file to the terminal;
output_file_content() {
    local hdfs_file=$1
    hdfs dfs -cat "$hdfs_file"
}
(4) Display the read/write permissions, size, creation time, path, and other information of a specified HDFS file;
show_file_info() {
    local hdfs_file=$1
    # -ls prints permissions, replication, owner, group, size, modification
    # time, and path. (The original -stat format used %h, which is not a
    # valid specifier in Hadoop 2.7, and -stat there cannot print permissions;
    # note also that HDFS records modification time, not creation time.)
    hdfs dfs -ls "$hdfs_file"
}
(5) Given an HDFS directory, output the read/write permissions, size, creation time, path, and other information of every file under it; if an entry is a directory, recursively output the information of all files under that directory;
show_directory_info() {
    local hdfs_dir=$1
    hdfs dfs -ls -R "$hdfs_dir"
}
(6) Given the path of a file in HDFS, create and delete that file. If the directory containing the file does not exist, create the directory automatically;
manage_file_in_hdfs() {
    local hdfs_file=$1
    local action=$2   # "create" or "delete"
    if [[ $action == "create" ]]; then
        local dir_path=$(dirname "$hdfs_file")
        hdfs dfs -mkdir -p "$dir_path"
        hdfs dfs -touchz "$hdfs_file"
    elif [[ $action == "delete" ]]; then
        hdfs dfs -rm "$hdfs_file"
    else
        echo "Invalid action: $action"
    fi
}
(7) Given the path of an HDFS directory, create and delete that directory. When creating, if the parent directories do not exist, create them automatically; when deleting, let the user specify whether the directory should still be deleted when it is not empty;
manage_directory_in_hdfs() {
    local hdfs_dir=$1
    local action=$2   # "create" or "delete"
    local force=$3    # "force" applies to delete only
    if [[ $action == "create" ]]; then
        hdfs dfs -mkdir -p "$hdfs_dir"
    elif [[ $action == "delete" ]]; then
        if [[ $force == "force" ]]; then
            hdfs dfs -rm -r "$hdfs_dir"
        else
            # -rmdir fails if the directory is not empty
            hdfs dfs -rmdir "$hdfs_dir"
        fi
    else
        echo "Invalid action: $action"
    fi
}
(8) Append content to a specified HDFS file, with the user specifying whether the content is added at the beginning or the end of the original file;
append_content_to_file() {
    local content=$1
    local hdfs_file=$2
    local position=$3   # "head" or "tail"
    local tmp_file=$(mktemp)
    echo "$content" > "$tmp_file"
    if [[ $position == "head" ]]; then
        # Prepend: write the new content first, append the original file
        # after it, then overwrite the HDFS copy with the combined result.
        hdfs dfs -cat "$hdfs_file" >> "$tmp_file"
        hdfs dfs -put -f "$tmp_file" "$hdfs_file"
    elif [[ $position == "tail" ]]; then
        hdfs dfs -appendToFile "$tmp_file" "$hdfs_file"
    else
        echo "Invalid position: $position"
    fi
    rm "$tmp_file"
}
(9) Delete a specified file from HDFS;
delete_file_in_hdfs() {
    local hdfs_file=$1
    hdfs dfs -rm "$hdfs_file"
}
(10) Move a file from a source path to a destination path within HDFS.
move_file_in_hdfs() {
    local src_path=$1
    local dest_path=$2
    hdfs dfs -mv "$src_path" "$dest_path"
}
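These functions can be exercised from an interactive shell after sourcing the script; an illustrative session (all file and directory paths below are hypothetical):
upload_file_to_hdfs ./test.txt /user/hadoop/test.txt append
download_file_from_hdfs /user/hadoop/test.txt ./downloads
show_file_info /user/hadoop/test.txt
manage_directory_in_hdfs /user/hadoop/tmpdir create
append_content_to_file "new first line" /user/hadoop/test.txt head
move_file_in_hdfs /user/hadoop/test.txt /user/hadoop/renamed.txt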
To run the Java programs below, first create a lib directory in the IDEA project, use the find command to locate all HDFS-related jar files under the Hadoop installation directory, copy them into lib, and add that directory as a library in IDEA; the code then compiles and runs correctly.
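The classes below call a small helper, HDFSConfig.getFileSystem(), that this report does not list. A minimal sketch is given here, assuming the NameNode address hdfs://192.168.70.143:8020 (the address used in part (III)) and a user name of hadoop; adjust both for your cluster:
package org.example.hdfs;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
public class HDFSConfig {
    // NameNode address and user name are assumptions; change them for your cluster.
    private static final String HDFS_URI = "hdfs://192.168.70.143:8020";
    private static final String HDFS_USER = "hadoop";
    public static FileSystem getFileSystem() throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", HDFS_URI);
        try {
            // newInstance() (rather than the caching get()) returns a fresh
            // FileSystem each time, so the fs.close() at the end of every
            // method in HDFSOperations does not invalidate later calls.
            return FileSystem.newInstance(new URI(HDFS_URI), conf, HDFS_USER);
        } catch (InterruptedException | URISyntaxException e) {
            throw new IOException(e);
        }
    }
}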
package org.example.hdfs;
import org.apache.hadoop.fs.*;
import java.io.FileInputStream;
import java.io.IOException;
public class HDFSOperations {
    // Upload a file to HDFS
    public static void uploadFile(String localFilePath, String hdfsPath, boolean overwrite, boolean append) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path srcPath = new Path(localFilePath);
        Path destPath = new Path(hdfsPath);
        System.out.println("Uploading file from: " + localFilePath + " to HDFS path: " + hdfsPath);
        if (fs.exists(destPath)) {
            if (overwrite) {
                System.out.println("File exists. Overwriting...");
                fs.delete(destPath, true);
            } else if (append) {
                System.out.println("File exists. Appending...");
                FSDataOutputStream out = fs.append(destPath);
                // The source is a local file, so read it with a local stream
                // instead of opening the path through the HDFS FileSystem.
                FileInputStream in = new FileInputStream(localFilePath);
                byte[] buffer = new byte[1024];
                int bytesRead;
                while ((bytesRead = in.read(buffer)) != -1) {
                    out.write(buffer, 0, bytesRead);
                }
                in.close();
                out.close();
                System.out.println("File appended successfully.");
                fs.close();
                return;
            } else {
                System.out.println("File exists. No changes made.");
                fs.close();
                return;
            }
        }
        fs.copyFromLocalFile(srcPath, destPath);
        System.out.println("File uploaded successfully.");
        fs.close();
    }
    // Download a file from HDFS to the local file system
    public static void downloadFile(String hdfsFilePath, String localFilePath) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path hdfsPath = new Path(hdfsFilePath);
        Path localPath = new Path(localFilePath);
        System.out.println("Downloading file from HDFS path: " + hdfsFilePath + " to local path: " + localFilePath);
        if (fs.exists(hdfsPath)) {
            if (fs.exists(localPath)) {
                // A local file with the same name exists: rename the download
                localPath = new Path(localFilePath + "_new");
                System.out.println("Local file already exists. Renaming downloaded file to: " + localPath);
            }
            fs.copyToLocalFile(hdfsPath, localPath);
            System.out.println("File downloaded successfully.");
        } else {
            System.out.println("File not found on HDFS.");
        }
        fs.close();
    }
    // Print the file content to the terminal
    public static void viewFile(String hdfsFilePath) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path filePath = new Path(hdfsFilePath);
        System.out.println("Viewing file content for: " + hdfsFilePath);
        if (fs.exists(filePath)) {
            FSDataInputStream in = fs.open(filePath);
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                System.out.write(buffer, 0, bytesRead);
            }
            in.close();
            System.out.println("\nFile content displayed successfully.");
        } else {
            System.out.println("File not found on HDFS.");
        }
        fs.close();
    }
    // Show file information
    public static void fileInfo(String hdfsFilePath) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path filePath = new Path(hdfsFilePath);
        System.out.println("Fetching file information for: " + hdfsFilePath);
        if (fs.exists(filePath)) {
            FileStatus status = fs.getFileStatus(filePath);
            System.out.println("Path: " + status.getPath());
            System.out.println("Owner: " + status.getOwner());
            System.out.println("Permissions: " + status.getPermission());
            System.out.println("Size: " + status.getLen());
            System.out.println("Modification Time: " + status.getModificationTime());
        } else {
            System.out.println("File not found on HDFS.");
        }
        fs.close();
    }
    // Recursively show information for every file under a directory
    public static void dirInfo(String hdfsDirPath) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path dirPath = new Path(hdfsDirPath);
        System.out.println("Fetching directory information for: " + hdfsDirPath);
        if (fs.exists(dirPath)) {
            RemoteIterator<LocatedFileStatus> files = fs.listFiles(dirPath, true);
            while (files.hasNext()) {
                LocatedFileStatus fileStatus = files.next();
                System.out.println("Path: " + fileStatus.getPath());
                System.out.println("Owner: " + fileStatus.getOwner());
                System.out.println("Permissions: " + fileStatus.getPermission());
                System.out.println("Size: " + fileStatus.getLen());
                System.out.println("Modification Time: " + fileStatus.getModificationTime());
                System.out.println("-------------------------------------------");
            }
        } else {
            System.out.println("Directory not found on HDFS.");
        }
        fs.close();
    }
    // Create a file (fs.create also creates missing parent directories)
    public static void createFile(String hdfsFilePath) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path filePath = new Path(hdfsFilePath);
        System.out.println("Creating file: " + hdfsFilePath);
        if (!fs.exists(filePath)) {
            FSDataOutputStream out = fs.create(filePath);
            out.close();
            System.out.println("File created successfully.");
        } else {
            System.out.println("File already exists.");
        }
        fs.close();
    }
    // Delete a file or directory
    public static void deletePath(String hdfsPath, boolean recursive) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path path = new Path(hdfsPath);
        System.out.println("Deleting path: " + hdfsPath);
        if (fs.exists(path)) {
            fs.delete(path, recursive);
            System.out.println("Path deleted successfully.");
        } else {
            System.out.println("Path not found on HDFS.");
        }
        fs.close();
    }
    // Append content to the end of a file
    public static void appendToFile(String hdfsFilePath, String content) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path filePath = new Path(hdfsFilePath);
        System.out.println("Appending content to file: " + hdfsFilePath);
        if (fs.exists(filePath)) {
            FSDataOutputStream out = fs.append(filePath);
            out.write(content.getBytes());
            out.close();
            System.out.println("Content appended successfully.");
        } else {
            System.out.println("File not found on HDFS.");
        }
        fs.close();
    }
    // Move (rename) a file
    public static void moveFile(String srcHdfsPath, String destHdfsPath) throws IOException {
        FileSystem fs = HDFSConfig.getFileSystem();
        Path srcPath = new Path(srcHdfsPath);
        Path destPath = new Path(destHdfsPath);
        System.out.println("Moving file from: " + srcHdfsPath + " to: " + destHdfsPath);
        if (fs.exists(srcPath)) {
            fs.rename(srcPath, destPath);
            System.out.println("File moved successfully.");
        } else {
            System.out.println("Source file not found on HDFS.");
        }
        fs.close();
    }
    public static void main(String[] args) {
        try {
            // 1. Upload a file
            HDFSOperations.uploadFile("src/main/java/org/example/hdfs/test.txt", "/user/hadoop/testFile.txt", true, false);
            // 2. Download a file
            HDFSOperations.downloadFile("/user/hadoop/testFile.txt", "localFile.txt");
            // 3. View the file content
            HDFSOperations.viewFile("/user/hadoop/testFile.txt");
            // 4. Show file information
            HDFSOperations.fileInfo("/user/hadoop/testFile.txt");
            // 5. Show directory information
            HDFSOperations.dirInfo("/user/hadoop");
            // 6. Create a file
            HDFSOperations.createFile("/user/hadoop/newFile");
            // 7. Delete a file
            HDFSOperations.deletePath("/user/hadoop/newFile", false);
            // 8. Append content
            HDFSOperations.appendToFile("/user/hadoop/testFile.txt", "This is appended content.");
            // 9. Move a file
            HDFSOperations.moveFile("/user/hadoop/testFile.txt", "/user/hadoop/movedTestFile.txt");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
(II) Implement a class MyFSDataInputStream that extends org.apache.hadoop.fs.FSDataInputStream, with the following requirement: provide a method that reads the specified HDFS file line by line, returning null when the end of the file is reached and otherwise returning one line of text. (The assignment names the method readLine(), but java.io.DataInputStream, an ancestor of FSDataInputStream, already declares readLine() as final, so the implementation below exposes it as hdfsReadLine() instead.)
package org.example.hdfs;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class MyFSDataInputStream extends FSDataInputStream {
    private final BufferedReader bufferedReader;
    public MyFSDataInputStream(FSDataInputStream fsDataInputStream) {
        super(fsDataInputStream.getWrappedStream());
        this.bufferedReader = new BufferedReader(new InputStreamReader(fsDataInputStream));
    }
    /**
     * Reads the HDFS file line by line.
     *
     * @return one line of the file, or null once the end of the file is reached.
     * @throws IOException if an error occurs while reading the file.
     */
    public String hdfsReadLine() throws IOException {
        return bufferedReader.readLine();
    }
    @Override
    public void close() throws IOException {
        bufferedReader.close();
        super.close();
    }
    public static void main(String[] args) {
        try {
            // Initialize the HDFS file system
            FileSystem fs = HDFSConfig.getFileSystem();
            Path filePath = new Path("/user/hadoop/movedTestFile.txt");
            // Check that the file exists
            if (!fs.exists(filePath)) {
                System.out.println("File not found: " + filePath);
                return;
            }
            // Open the file
            FSDataInputStream fsDataInputStream = fs.open(filePath);
            MyFSDataInputStream myFSDataInputStream = new MyFSDataInputStream(fsDataInputStream);
            // Read the file line by line
            String line;
            System.out.println("Reading file line by line:");
            while ((line = myFSDataInputStream.hdfsReadLine()) != null) {
                System.out.println(line);
            }
            // Close the streams
            myFSDataInputStream.close();
            fs.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
(III) Consult the Java documentation or other references and use java.net.URL together with org.apache.hadoop.fs.FsUrlStreamHandlerFactory to print the text of a specified HDFS file to the terminal.
package org.example.hdfs;
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
public class HDFSFileViewer {
    public static void showFileContentFromHDFS() {
        try {
            String remotePath = "/user/hadoop/movedTestFile.txt";
            // setURLStreamHandlerFactory may be called at most once per JVM,
            // so it must run before anything else installs a factory.
            URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
            InputStream inputStream = new URL("hdfs", "192.168.70.143", 8020, remotePath).openStream();
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
            bufferedReader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        showFileContentFromHDFS();
    }
}
