In modern Hadoop environments like your 3.4.2 setup, there is functionally no difference between the two commands when interacting with HDFS: both launch the same file system shell (the FsShell class). However, they are reached through different layers of the Hadoop architecture.
1. The Definitions
- hadoop fs: This is the generic file system command. It is designed to interact with any file system that Hadoop supports, including HDFS, the local file system (file:///), Amazon S3 (s3a://), Azure Blob Storage (wasb://), and others.
- hdfs dfs: This is the HDFS-oriented command, launched through the hdfs script. It is intended for interacting with the Hadoop Distributed File System, although it runs the same file system shell and will also accept other file system URIs.
2. Key Differences
| Feature | hadoop fs | hdfs dfs |
| --- | --- | --- |
| Scope | Broad (HDFS, S3, Local, Azure) | HDFS by convention |
| Origin | Part of the core Hadoop common tools | Part of the HDFS sub-project |
| Execution | Runs through the generic hadoop script | Runs through the hdfs script |
| Functionality | Supports all file system operations | Supports the same file system operations; HDFS-specific tools such as dfsadmin are separate subcommands of the hdfs script (hdfs dfsadmin), not of hdfs dfs |
3. Which one is "better"?
Use hdfs dfs if:
You are working specifically within an HDFS environment (for example, managing your Hive warehouse). Many Hadoop administrators prefer this form because it signals explicitly that you intend to talk to the distributed cluster, and it is widely treated as the conventional choice for HDFS operations.
Use hadoop fs if:
You are working in a multi-cloud or hybrid environment. If you are running a script that might copy data from a local folder to an S3 bucket and then to HDFS, hadoop fs is the "one tool for everything" approach.
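The multi-file-system case above can be sketched as a small script. This is only an illustration under stated assumptions: the local path, the bucket name example-bucket, the HDFS destination, and the run and copy_pipeline helpers are all hypothetical, and the S3 step assumes the s3a connector and credentials are already configured. By default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Hypothetical locations; replace them with your own.
LOCAL_SRC="/tmp/export/data.csv"
S3_STAGE="s3a://example-bucket/staging/data.csv"
# A path without a scheme resolves against fs.defaultFS (assumed to be HDFS here).
HDFS_DEST="/data/staging/data.csv"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_pipeline() {
  # Local file system -> S3
  run hadoop fs -put -f "$LOCAL_SRC" "$S3_STAGE"
  # S3 -> HDFS
  run hadoop fs -cp -f "$S3_STAGE" "$HDFS_DEST"
  # Confirm the file arrived
  run hadoop fs -ls "$HDFS_DEST"
}

copy_pipeline
```

The same hadoop fs entry point handles all three steps; only the URI scheme of each argument changes.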
> [!NOTE]
> There is a third command, hadoop dfs, which you might see in very old tutorials. This was deprecated in Hadoop 2.x and you should avoid using it.
The Verdict
For your work on ZZHPC configuring Hive: Stick with hdfs dfs. It is more specific to the service you are using, and it's the convention used in modern documentation for Hadoop 3.x.
Example of the most "standard" way to check your Hive directory:
hdfs dfs -ls /user/hive/warehouse
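
If you want to check for yourself that the two commands behave the same on HDFS, one option is to compare their listings for the same directory. The sketch below assumes both scripts are on your PATH and that fs.defaultFS points at your HDFS cluster; compare_listings is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: compares the listings produced by both commands.
compare_listings() {
  dir="$1"
  # Skip gracefully when the Hadoop scripts are not installed.
  if ! command -v hdfs >/dev/null 2>&1 || ! command -v hadoop >/dev/null 2>&1; then
    echo "skipped: hdfs and hadoop must both be on PATH"
    return 0
  fi
  # Capture stdout only; log messages on stderr are discarded.
  via_hdfs="$(hdfs dfs -ls "$dir" 2>/dev/null)"
  via_hadoop="$(hadoop fs -ls "$dir" 2>/dev/null)"
  if [ "$via_hdfs" = "$via_hadoop" ]; then
    echo "identical listings for $dir"
  else
    echo "listings differ for $dir"
  fi
}

compare_listings /user/hive/warehouse
```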
