ZhangZhihui's Blog  

 

CREATE TABLE events (
  event_id BIGINT,
  event_date DATE,
  event_type STRING
)
WRITE LOCALLY ORDERED BY event_date DESC
STORED BY ICEBERG
STORED AS ORC;

In the HiveQL code above, how can 'STORED BY' and 'STORED AS' be used at the same time?

 

In HiveQL (specifically when working with Apache Iceberg), seeing both STORED BY and STORED AS in a single CREATE statement can be confusing because, in traditional Hive tables, they are usually mutually exclusive.

However, in the context of Iceberg tables, they serve two distinct purposes:

1. STORED BY ICEBERG

This clause tells Hive which storage handler to use.

  • It signifies that this is not a standard Hive table, but an Iceberg table.

  • The storage handler manages the metadata, snapshots, and manifest files that define the Iceberg format. Without this clause, Hive would create an ordinary native Hive table with no Iceberg metadata layer.
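
One way to check which handler a table ended up with is to inspect its metadata. This is a minimal sketch, assuming a Hive build with Iceberg support and the events table from the question; the exact rows printed vary by Hive version.

```sql
-- Show the table's detailed metadata, including storage information
-- and table parameters.
DESCRIBE FORMATTED events;

-- For an Iceberg table, the output is expected to list an Iceberg
-- storage handler class and Iceberg-related table parameters.
-- (Assumption: the precise parameter names depend on the Hive release.)
```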

2. STORED AS ORC

This clause tells Iceberg which file format to use for the actual data files on disk.

  • Iceberg supports multiple underlying file formats (ORC, Parquet, and Avro).

  • Even though the table structure is managed by Iceberg, the data needs to be written in a specific binary format. By adding STORED AS ORC, you are overriding the default (which is often Parquet) to use Optimized Row Columnar (ORC) files.
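
The file format choice can also be written as an Iceberg table property. The sketch below assumes that Hive records STORED AS ORC in Iceberg's write.format.default property, so the two spellings should behave the same; events_props is a hypothetical table name used only for illustration.

```sql
-- Hypothetical table: same columns as events, but the data file format
-- is set through the Iceberg table property instead of STORED AS.
CREATE TABLE events_props (
  event_id BIGINT,
  event_date DATE,
  event_type STRING
)
STORED BY ICEBERG
TBLPROPERTIES ('write.format.default' = 'orc');

-- List the table properties to see which write format was recorded.
SHOW TBLPROPERTIES events_props;
```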


How they work together

Think of it as a hierarchy of management:

Clause | Responsibility | Analogy
STORED BY ICEBERG | The Table Manager: handles versioning, ACID transactions, and partition evolution. | The Librarian
STORED AS ORC | The Paper Format: defines how the words are actually written on the page. | The Language/Font

Why use both?

If you omit STORED BY ICEBERG, Hive creates a regular Hive table rather than an Iceberg table, so none of the Iceberg metadata is maintained. If you omit STORED AS ORC, Iceberg simply uses its default file format. By using both, you explicitly define the table format (Iceberg) and the data file format (ORC) in the same statement.
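
The three combinations can be put side by side. This is a sketch under the assumption of a Hive build with Iceberg support; the table names are hypothetical and chosen only to label each case.

```sql
-- 1. STORED AS only: a native Hive table whose data files are ORC.
--    No Iceberg snapshots or manifests are involved.
CREATE TABLE events_hive_orc (
  event_id BIGINT,
  event_date DATE,
  event_type STRING
)
STORED AS ORC;

-- 2. STORED BY only: an Iceberg table that writes data files in
--    Iceberg's default format.
CREATE TABLE events_ice_default (
  event_id BIGINT,
  event_date DATE,
  event_type STRING
)
STORED BY ICEBERG;

-- 3. Both clauses: an Iceberg table whose data files are ORC.
CREATE TABLE events_ice_orc (
  event_id BIGINT,
  event_date DATE,
  event_type STRING
)
STORED BY ICEBERG
STORED AS ORC;
```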

A note on your code:

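The clause WRITE LOCALLY ORDERED BY is an optimization that tells Hive to sort the data within each writer (task) before committing to Iceberg. Rows with similar values end up next to each other, which can improve the compression ratio of the resulting ORC files and makes the min/max statistics of each file more useful for skipping data when filtering on event_date.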
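
A short usage sketch for the events table defined at the top, assuming it was created with the WRITE LOCALLY ORDERED BY clause as shown. The rows are made-up sample values; with this clause, each writer task is expected to sort its rows by event_date in descending order before writing them out.

```sql
-- Sample rows (invented for illustration), deliberately inserted
-- out of date order.
INSERT INTO events VALUES
  (1, DATE '2026-01-12', 'click'),
  (2, DATE '2026-01-14', 'view'),
  (3, DATE '2026-01-13', 'click');

-- A filter on the sort column is the kind of query that can benefit
-- from data being clustered by event_date inside each file.
SELECT event_id, event_type
FROM events
WHERE event_date = DATE '2026-01-14';
```

The ordering is local to each writer, so it does not guarantee a global sort across all files in the table.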

 

posted on 2026-01-14 16:56  ZhangZhihuiAAA