CREATE TABLE events ( event_id BIGINT, event_date DATE, event_type STRING ) WRITE LOCALLY ORDERED BY event_date DESC STORED BY ICEBERG STORED AS ORC;
For the above HiveQL code, how can 'STORED BY' and 'STORED AS' be used at the same time?
In HiveQL (specifically when working with Apache Iceberg), seeing both STORED BY and STORED AS in a single CREATE statement can be confusing because, in traditional Hive tables, they are usually mutually exclusive.
However, in the context of Iceberg tables, they serve two distinct purposes:
1. STORED BY ICEBERG
This clause tells Hive which storage handler to use.
- It signifies that this is not a standard Hive table, but an Iceberg table.
- The storage handler manages the metadata, snapshots, and manifest files that define the Iceberg format. Without this clause, Hive would create an ordinary native Hive table with no Iceberg metadata.
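As a minimal sketch of what the ICEBERG shorthand stands for: Hive setups that predate the keyword name the storage handler by its fully qualified class, as in the Iceberg documentation for Hive. The table name events_handler is hypothetical, and the exact parameters printed by DESCRIBE FORMATTED may differ between Hive versions.

```sql
-- Long form: name the Iceberg storage handler class explicitly
-- instead of using the STORED BY ICEBERG shorthand.
CREATE TABLE events_handler (
  event_id   BIGINT,
  event_date DATE,
  event_type STRING
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

-- Inspect the table definition; the table parameters should show
-- that a storage handler (rather than a plain file format) is in use.
DESCRIBE FORMATTED events_handler;
```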
2. STORED AS ORC
This clause tells Iceberg which file format to use for the actual data files on disk.
- Iceberg supports multiple underlying file formats (ORC, Parquet, and Avro).
- Even though the table structure is managed by Iceberg, the data needs to be written in a specific binary format. By adding STORED AS ORC, you are overriding the default (typically Parquet) to use Optimized Row Columnar (ORC) files.
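The file format choice can also be expressed as an Iceberg table property. The sketch below assumes that Hive's Iceberg integration treats the write.format.default property the same way as STORED AS ORC; the property itself is a standard Iceberg table property, but the equivalence is an assumption worth verifying on your Hive version. The table name events_props is hypothetical.

```sql
-- Assumed alternative to STORED AS ORC: set Iceberg's default
-- write format through a table property.
CREATE TABLE events_props (
  event_id   BIGINT,
  event_date DATE,
  event_type STRING
)
STORED BY ICEBERG
TBLPROPERTIES ('write.format.default' = 'orc');

-- List the table properties to confirm which write format is recorded.
SHOW TBLPROPERTIES events_props;
```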
How they work together
Think of it as a hierarchy of management:
| Clause | Responsibility | Analogy |
| --- | --- | --- |
| STORED BY ICEBERG | The Table Manager: Handles versioning, ACID transactions, and partition evolution. | The Librarian |
| STORED AS ORC | The Paper Format: Defines how the words are actually written on the page. | The Language/Font |
Why use both?
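If you omit STORED BY ICEBERG, Hive creates an ordinary native ORC table rather than an Iceberg table, so none of the Iceberg metadata (snapshots, manifests) exists. If you omit STORED AS ORC, Iceberg simply uses its default file format. By using both, you explicitly choose the table format and the data file format in one statement. Note that ORC is a file format, not a compression codec; compression is configured separately.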
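To make the contrast concrete, here is a sketch of the three combinations side by side. The table names are hypothetical, and the default format of the second table depends on your Iceberg configuration.

```sql
-- 1. STORED AS only: a native Hive table backed by ORC files.
--    No Iceberg snapshots or manifests are created.
CREATE TABLE events_native (
  event_id BIGINT, event_date DATE, event_type STRING
)
STORED AS ORC;

-- 2. STORED BY only: an Iceberg table whose data files use
--    Iceberg's default format (typically Parquet).
CREATE TABLE events_iceberg_default (
  event_id BIGINT, event_date DATE, event_type STRING
)
STORED BY ICEBERG;

-- 3. Both clauses: an Iceberg table whose data files are ORC.
CREATE TABLE events_iceberg_orc (
  event_id BIGINT, event_date DATE, event_type STRING
)
STORED BY ICEBERG
STORED AS ORC;
```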
A note on your code:
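The clause WRITE LOCALLY ORDERED BY is an optimization that tells Hive to sort the rows within each writer (task), here by event_date in descending order, before committing to Iceberg. The ordering is local to each data file, not global across the table. Grouping similar values together can improve the compression ratio of the resulting ORC files and can make per-file min/max statistics more useful for skipping data when queries filter on event_date.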
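A small usage sketch against the events table from the question follows. The sample rows are made up for illustration, and how much a filter on event_date benefits from the local ordering depends on the query engine and the data volume.

```sql
-- Rows may arrive in any order; each writer task sorts its own
-- rows by event_date DESC before the data files are committed.
INSERT INTO events VALUES
  (1, DATE '2024-01-15', 'click'),
  (2, DATE '2024-03-02', 'view'),
  (3, DATE '2024-02-20', 'click');

-- A filter on the sort column is the kind of query that can
-- benefit from sorted files and their min/max statistics.
SELECT event_id, event_type
FROM events
WHERE event_date = DATE '2024-02-20';

-- Local ordering does not guarantee the order of query results;
-- an explicit ORDER BY is still required for sorted output.
SELECT event_id, event_date
FROM events
ORDER BY event_date DESC;
```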
