公告

日历

If a temporary table is created with a database/table name of a permanent table which already exists in the database, then within that session any references to that table will resolve to the temporary table, rather than to the permanent table. The user will not be able to access the original table within that session without either dropping the temporary table, or renaming it to a non-conflicting name.

Temporary tables have the following limitations:

Partition columns are not supported.
No support for creation of indexes.

CREATE TEMPORARY TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING);

While CTEs (Common Table Expressions) are generally cleaner and easier to read, there are several technical scenarios where a Temporary Table is objectively better.

The choice usually comes down to reusability and optimization.

1. Multiple References (Performance)

A CTE is just a "named subquery." If you reference a CTE three times in your main query, Hive might execute the logic of that CTE three separate times (depending on the optimizer's version and complexity).

A Temporary Table is materialized. Hive calculates the result once, saves it to the scratch directory, and then simply reads the file for every subsequent reference.

Use a Temporary Table if you are going to join or reference that specific result set multiple times in a long script.

2. Multi-Step Logic

CTEs are limited to a single query block. If your data pipeline requires 10 steps of logic, a CTE becomes a massive, unreadable "spaghetti query."

Temporary tables allow you to break the logic into discrete steps:2

Filter the data (Store in tmp1).
Aggregate the data (Store in tmp2).
Perform the final Join.

3. Optimizer Statistics (Join Performance)

This is the most critical technical reason.

CTEs: Since they aren't materialized, the Hive optimizer has to "guess" how many rows the CTE will produce. If it guesses wrong, it might choose a slow Shuffle Join instead of a fast Map Join.
Temporary Tables: After creating a temporary table, you can run ANALYZE TABLE ... COMPUTE STATISTICS. This tells Hive exactly how big the table is, allowing the engine to choose the most efficient execution plan (like a Map-side join).

4. Summary Comparison

Feature	CTE (WITH clause)	Temporary Table
Scope	One single query	Entire Beeline session
Storage	Virtual (Logic only)	Physical (Materialized in `/tmp`)
Optimization	Optimizer makes estimates	Can run `ANALYZE` for perfect stats
Debugging	Hard (Must run whole query)	Easy (Can `SELECT *` from it)
Syntax	WITH name AS (...)	CREATE TEMPORARY TABLE ...

When to use which?

Use a CTE for simple, one-off logic where you just want to avoid nesting subqueries. It’s better for readability and doesn't leave "footprints" in the file system.
Use a Temporary Table if the logic is heavy (takes minutes to run), if you need to reuse the results across multiple SQL statements, or if you need to force a specific join optimization by providing the optimizer with accurate statistics.

posted on 2026-01-13 17:41 ZhangZhihuiAAA 阅读(0) 评论(0) 收藏举报

刷新页面返回顶部


博客园 © 2004-2026 浙公网安备 33010602011771号浙ICP备2021040463号-3

导航

Temporary Tables

1. Multiple References (Performance)

2. Multi-Step Logic

3. Optimizer Statistics (Join Performance)

4. Summary Comparison

When to use which?