24.10.28

Experiment 2
Getting Familiar with Common HDFS Operations

1. Experiment Objectives
(1) Understand the role of HDFS in the Hadoop architecture;
(2) Become proficient with the common Shell commands for operating HDFS;
(3) Become familiar with the common Java APIs for operating HDFS.
2. Experiment Platform
(1) Operating system: Linux (Ubuntu 16.04 or Ubuntu 18.04 recommended);
(2) Hadoop version: 2.7.3;
(3) JDK version: 1.8;
(4) Java IDE: IDEA.
3. Experiment Steps
(I) Implement the following functions programmatically, and complete the same tasks with the Shell commands provided by Hadoop:
(1) Upload an arbitrary text file to HDFS; if the specified file already exists in HDFS, let the user choose whether to append to the end of the existing file or overwrite it;
upload_file_to_hdfs() {
    local local_file=$1
    local hdfs_path=$2
    local mode=$3  # "append" or "overwrite"

    if hdfs dfs -test -e "$hdfs_path"; then
        if [[ $mode == "append" ]]; then
            hdfs dfs -appendToFile "$local_file" "$hdfs_path"
        elif [[ $mode == "overwrite" ]]; then
            hdfs dfs -put -f "$local_file" "$hdfs_path"
        else
            echo "Invalid mode: $mode"
        fi
    else
        hdfs dfs -put "$local_file" "$hdfs_path"
    fi
}
(2) Download a specified file from HDFS; if a local file with the same name already exists, automatically rename the downloaded file;
download_file_from_hdfs() {
    local hdfs_file=$1
    local local_dir=$2
    local filename=$(basename "$hdfs_file")

    if [[ -f "$local_dir/$filename" ]]; then
        # A same-named local file exists: rename with a timestamp suffix
        local timestamp=$(date +%s)
        local new_filename="${filename%.*}_$timestamp.${filename##*.}"
        hdfs dfs -get "$hdfs_file" "$local_dir/$new_filename"
    else
        hdfs dfs -get "$hdfs_file" "$local_dir"
    fi
}
(3) Output the contents of a specified HDFS file to the terminal;
output_file_content() {
    local hdfs_file=$1
    hdfs dfs -cat "$hdfs_file"
}


(4) Display the read/write permissions, size, creation time, path, and other information of a specified HDFS file;
show_file_info() {
    local hdfs_file=$1
    # -ls prints permissions, replication, owner, group, size,
    # modification time, and path in a single line; -stat in
    # Hadoop 2.7.3 cannot show permissions
    hdfs dfs -ls "$hdfs_file"
}
(5) Given a directory in HDFS, output the read/write permissions, size, creation time, path, etc. of all files under it; if an entry is itself a directory, recursively output the information of all files under that directory;
show_directory_info() {
    local hdfs_dir=$1
    # -ls -R lists the directory tree recursively
    hdfs dfs -ls -R "$hdfs_dir"
}


(6) Given the path of a file in HDFS, create and delete that file; when creating, if the directory containing the file does not exist, create it automatically;
manage_file_in_hdfs() {
    local hdfs_file=$1
    local action=$2  # "create" or "delete"

    if [[ $action == "create" ]]; then
        # Create the parent directories first, then an empty file
        local dir_path=$(dirname "$hdfs_file")
        hdfs dfs -mkdir -p "$dir_path"
        hdfs dfs -touchz "$hdfs_file"
    elif [[ $action == "delete" ]]; then
        hdfs dfs -rm "$hdfs_file"
    else
        echo "Invalid action: $action"
    fi
}
(7) Given the path of a directory in HDFS, create and delete that directory; when creating, if the parent directories do not exist, create them automatically; when deleting, let the user specify whether the directory should still be deleted if it is not empty;
manage_directory_in_hdfs() {
    local hdfs_dir=$1
    local action=$2  # "create" or "delete"
    local force=$3   # "force": used only when deleting

    if [[ $action == "create" ]]; then
        hdfs dfs -mkdir -p "$hdfs_dir"
    elif [[ $action == "delete" ]]; then
        if [[ $force == "force" ]]; then
            hdfs dfs -rm -r "$hdfs_dir"
        else
            # -rmdir only removes empty directories
            hdfs dfs -rmdir "$hdfs_dir"
        fi
    else
        echo "Invalid action: $action"
    fi
}

(8) Append content to a specified file in HDFS, letting the user specify whether the content is added at the beginning or the end of the original file;
append_content_to_file() {
    local content=$1
    local hdfs_file=$2
    local position=$3  # "head" or "tail"

    local tmp_file=$(mktemp)
    echo "$content" > "$tmp_file"

    if [[ $position == "head" ]]; then
        # HDFS files are append-only, so prepending requires rebuilding
        # the file locally and re-uploading it
        hdfs dfs -cat "$hdfs_file" >> "$tmp_file"
        hdfs dfs -put -f "$tmp_file" "$hdfs_file"
    elif [[ $position == "tail" ]]; then
        hdfs dfs -appendToFile "$tmp_file" "$hdfs_file"
    else
        echo "Invalid position: $position"
    fi

    rm "$tmp_file"
}
(9) Delete a specified file in HDFS;
delete_file_in_hdfs() {
    local hdfs_file=$1
    hdfs dfs -rm "$hdfs_file"
}
(10) Move a file from a source path to a destination path within HDFS.
move_file_in_hdfs() {
    local src_path=$1
    local dest_path=$2
    hdfs dfs -mv "$src_path" "$dest_path"
}
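
To try these helpers out, source the script and call the functions directly; the invocations below are illustrative only (the script name, local file, and HDFS paths are all assumptions):

# Assumes the functions above are saved in hdfs_ops.sh and that
# /user/hadoop exists; every path here is hypothetical
source hdfs_ops.sh
upload_file_to_hdfs local.txt /user/hadoop/remote.txt overwrite
download_file_from_hdfs /user/hadoop/remote.txt .
show_directory_info /user/hadoop
manage_directory_in_hdfs /user/hadoop/newdir create
move_file_in_hdfs /user/hadoop/remote.txt /user/hadoop/newdir/remote.txt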

Before running the Java code, create a lib directory in the IDEA project, use the find command to locate all HDFS-related jar packages under the Hadoop installation, copy them into lib, and add that directory as a project library; after that the program runs correctly, as sketched below.
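
For example, a minimal sketch assuming Hadoop is installed at /usr/local/hadoop (adjust the path to your own installation):

# Collect the Hadoop jars into the project's lib directory
mkdir -p lib
find /usr/local/hadoop/share/hadoop -name "*.jar" -exec cp {} lib/ \;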

package org.example.hdfs;

import org.apache.hadoop.fs.*;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HDFSOperations {
// Upload a file to HDFS
public static void uploadFile(String localFilePath, String hdfsPath, boolean overwrite, boolean append) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path srcPath = new Path(localFilePath);
    Path destPath = new Path(hdfsPath);

    System.out.println("Uploading file from: " + localFilePath + " to HDFS path: " + hdfsPath);
    if (fs.exists(destPath)) {
        if (overwrite) {
            System.out.println("File exists. Overwriting...");
            fs.delete(destPath, true);
        } else if (append) {
            System.out.println("File exists. Appending...");
            FSDataOutputStream out = fs.append(destPath);
            // Read the source with a local java.io stream; fs.open(srcPath)
            // would incorrectly look for the local file on HDFS
            InputStream in = new FileInputStream(localFilePath);
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            in.close();
            out.close();
            System.out.println("File appended successfully.");
            fs.close();
            return;
        } else {
            System.out.println("File exists. No changes made.");
            fs.close();
            return;
        }
    }

    fs.copyFromLocalFile(srcPath, destPath);
    System.out.println("File uploaded successfully.");
    fs.close();
}

// Download a file to the local file system
public static void downloadFile(String hdfsFilePath, String localFilePath) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path hdfsPath = new Path(hdfsFilePath);
    Path localPath = new Path(localFilePath);

    System.out.println("Downloading file from HDFS path: " + hdfsFilePath + " to local path: " + localFilePath);
    if (fs.exists(hdfsPath)) {
        if (fs.exists(localPath)) {
            localPath = new Path(localFilePath + "_new");
            System.out.println("Local file already exists. Renaming downloaded file to: " + localPath);
        }
        fs.copyToLocalFile(hdfsPath, localPath);
        System.out.println("File downloaded successfully.");
    } else {
        System.out.println("File not found on HDFS.");
    }

    fs.close();
}

// Print the contents of a file to the terminal
public static void viewFile(String hdfsFilePath) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path filePath = new Path(hdfsFilePath);

    System.out.println("Viewing file content for: " + hdfsFilePath);
    if (fs.exists(filePath)) {
        FSDataInputStream in = fs.open(filePath);
        byte[] buffer = new byte[1024];
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            System.out.write(buffer, 0, bytesRead);
        }
        in.close();
        System.out.println("\nFile content displayed successfully.");
    } else {
        System.out.println("File not found on HDFS.");
    }

    fs.close();
}

// Display file information
public static void fileInfo(String hdfsFilePath) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path filePath = new Path(hdfsFilePath);

    System.out.println("Fetching file information for: " + hdfsFilePath);
    if (fs.exists(filePath)) {
        FileStatus status = fs.getFileStatus(filePath);
        System.out.println("Path: " + status.getPath());
        System.out.println("Owner: " + status.getOwner());
        System.out.println("Permissions: " + status.getPermission());
        System.out.println("Size: " + status.getLen());
        System.out.println("Modification Time: " + status.getModificationTime());
    } else {
        System.out.println("File not found on HDFS.");
    }

    fs.close();
}

// Recursively display information for files under a directory
public static void dirInfo(String hdfsDirPath) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path dirPath = new Path(hdfsDirPath);

    System.out.println("Fetching directory information for: " + hdfsDirPath);
    if (fs.exists(dirPath)) {
        // listFiles(..., true) recurses into subdirectories; it returns
        // only files, not the directory entries themselves
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(dirPath, true);
        while (files.hasNext()) {
            LocatedFileStatus fileStatus = files.next();
            System.out.println("Path: " + fileStatus.getPath());
            System.out.println("Owner: " + fileStatus.getOwner());
            System.out.println("Permissions: " + fileStatus.getPermission());
            System.out.println("Size: " + fileStatus.getLen());
            System.out.println("Modification Time: " + fileStatus.getModificationTime());
            System.out.println("-------------------------------------------");
        }
    } else {
        System.out.println("Directory not found on HDFS.");
    }

    fs.close();
}

// Create a file
public static void createFile(String hdfsFilePath) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path filePath = new Path(hdfsFilePath);

    System.out.println("Creating file: " + hdfsFilePath);
    if (!fs.exists(filePath)) {
        FSDataOutputStream out = fs.create(filePath);
        out.close();
        System.out.println("File created successfully.");
    } else {
        System.out.println("File already exists.");
    }

    fs.close();
}

// Delete a file or directory
public static void deletePath(String hdfsPath, boolean recursive) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path path = new Path(hdfsPath);

    System.out.println("Deleting path: " + hdfsPath);
    if (fs.exists(path)) {
        fs.delete(path, recursive);
        System.out.println("Path deleted successfully.");
    } else {
        System.out.println("Path not found on HDFS.");
    }

    fs.close();
}

// Append content to a file
public static void appendToFile(String hdfsFilePath, String content) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path filePath = new Path(hdfsFilePath);

    System.out.println("Appending content to file: " + hdfsFilePath);
    if (fs.exists(filePath)) {
        FSDataOutputStream out = fs.append(filePath);
        out.write(content.getBytes());
        out.close();
        System.out.println("Content appended successfully.");
    } else {
        System.out.println("File not found on HDFS.");
    }

    fs.close();
}

// Move a file
public static void moveFile(String srcHdfsPath, String destHdfsPath) throws IOException {
    FileSystem fs = HDFSConfig.getFileSystem();
    Path srcPath = new Path(srcHdfsPath);
    Path destPath = new Path(destHdfsPath);

    System.out.println("Moving file from: " + srcHdfsPath + " to: " + destHdfsPath);
    if (fs.exists(srcPath)) {
        fs.rename(srcPath, destPath);
        System.out.println("File moved successfully.");
    } else {
        System.out.println("Source file not found on HDFS.");
    }

    fs.close();
}
public static void main(String[] args) {
    try {
        // 1. Upload a file
        HDFSOperations.uploadFile("src/main/java/org/example/hdfs/test.txt", "/user/hadoop/testFile.txt", true, false);

        // 2. Download a file
        HDFSOperations.downloadFile("/user/hadoop/testFile.txt", "localFile.txt");

        // 3. View file contents
        HDFSOperations.viewFile("/user/hadoop/testFile.txt");

        // 4. View file information
        HDFSOperations.fileInfo("/user/hadoop/testFile.txt");

        // 5. View directory information
        HDFSOperations.dirInfo("/user/hadoop");

        // 6. Create a file
        HDFSOperations.createFile("/user/hadoop/newFile");

        // 7. Delete a file
        HDFSOperations.deletePath("/user/hadoop/newFile", false);

        // 8. Append content
        HDFSOperations.appendToFile("/user/hadoop/testFile.txt", "This is appended content.");

        // 9. Move a file
        HDFSOperations.moveFile("/user/hadoop/testFile.txt", "/user/hadoop/movedTestFile.txt");

    } catch (IOException e) {
        e.printStackTrace();
    }
}

}
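
The methods above rely on a helper class HDFSConfig that is not shown in this post; a minimal sketch is given below. The NameNode address reuses the hdfs://192.168.70.143:8020 endpoint from part (III), and the user name "hadoop" is an assumption. FileSystem.newInstance() hands back a fresh instance on every call, so the fs.close() at the end of each method cannot invalidate a shared cached FileSystem for later callers.

package org.example.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HDFSConfig {
// NameNode address, assumed from part (III); adjust to your cluster
private static final String HDFS_URI = "hdfs://192.168.70.143:8020";

public static FileSystem getFileSystem() throws IOException {
    Configuration conf = new Configuration();
    try {
        // newInstance() bypasses the cache used by FileSystem.get(), so
        // closing one instance does not break subsequent calls
        return FileSystem.newInstance(new URI(HDFS_URI), conf, "hadoop");
    } catch (InterruptedException | URISyntaxException e) {
        throw new IOException(e);
    }
}

}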

(II) Implement a class "MyFSDataInputStream" that extends "org.apache.hadoop.fs.FSDataInputStream", with the following requirement: implement a method "readLine()" that reads the specified HDFS file line by line, returning null when the end of the file is reached and otherwise returning one line of text.
package org.example.hdfs;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class MyFSDataInputStream extends FSDataInputStream {
private final BufferedReader bufferedReader;

public MyFSDataInputStream(FSDataInputStream fsDataInputStream) {
    super(fsDataInputStream.getWrappedStream());
    this.bufferedReader = new BufferedReader(new InputStreamReader(fsDataInputStream));
}

/**
 * Reads one line from the HDFS file. (java.io.DataInputStream.readLine()
 * is final and cannot be overridden, which is why this method is named
 * hdfsReadLine() rather than readLine().)
 *
 * @return one line of text, or null when the end of the file is reached.
 * @throws IOException if an error occurs while reading.
 */
public String hdfsReadLine() throws IOException {
    return bufferedReader.readLine();
}

@Override
public void close() throws IOException {
    bufferedReader.close();
    super.close();
}
public static void main(String[] args) {
    try {
        // Initialize the HDFS file system
        FileSystem fs = HDFSConfig.getFileSystem();
        Path filePath = new Path("/user/hadoop/movedTestFile.txt");

        // Check whether the file exists
        if (!fs.exists(filePath)) {
            System.out.println("File not found: " + filePath);
            return;
        }

        // Open the file
        FSDataInputStream fsDataInputStream = fs.open(filePath);
        MyFSDataInputStream myFSDataInputStream = new MyFSDataInputStream(fsDataInputStream);

        // Read the file line by line
        String line;
        System.out.println("Reading file line by line:");
        while ((line = myFSDataInputStream.hdfsReadLine()) != null) {
            System.out.println(line);
        }

        // Close the streams
        myFSDataInputStream.close();
        fs.close();

    } catch (IOException e) {
        e.printStackTrace();
    }
}

}

(III) Consult the Java documentation or other materials, and use "java.net.URL" together with "org.apache.hadoop.fs.FsUrlStreamHandlerFactory" to print the text of a specified HDFS file to the terminal.
package org.example.hdfs;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
public class HDFSFileViewer {

public static void showFileContentFromHDFS() {
    try {
        String remotePath = "/user/hadoop/movedTestFile.txt";
        // setURLStreamHandlerFactory may only be called once per JVM
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
        InputStream inputStream = new URL("hdfs", "192.168.70.143", 8020, remotePath).openStream();
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            System.out.println(line);
        }
        bufferedReader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public static void main(String[] args) {
    showFileContentFromHDFS();
}

}
