Fact table surrogate keys are often misunderstood. Let’s break it down clearly.
1. Fact Tables and Keys
-
A fact table contains measures (quantitative data like transaction amount, balance, etc.) and foreign keys pointing to dimensions (branch, customer, product, time, etc.).
-
Normally, a fact table is identified by the combination of its dimension foreign keys (sometimes with a transaction ID).
So where do surrogate keys in fact tables come in?
2. Uses of Fact Table Surrogate Keys
(A) Uniquely Identifying Rows
-
A surrogate key (e.g.,
fact_sales_key
) gives each fact row a unique identifier. -
Without it, uniqueness depends on a composite of many dimension keys, which:
-
Makes joins cumbersome.
-
Slows down lookups and updates.
-
-
With a surrogate key:
is much simpler than:
(B) Supporting Slowly Changing Facts
-
Facts can change over time (e.g., correcting a transaction amount).
-
If you only had natural composite keys, updating would overwrite history.
-
A surrogate fact key lets you add a new version of the fact row while keeping the old one (similar to SCD Type 2 in dimensions).
(C) Handling Degenerate Dimensions (DD)
-
Sometimes a fact has a transaction identifier that is not stored in a dimension (e.g., invoice number, loan contract ID).
-
You can use the fact surrogate key to represent the transaction itself, and keep degenerate attributes (invoice number, reference code) in the same table.
(D) Performance & Referential Integrity
-
Surrogate keys:
-
Make joins to other systems easier if you need to reference a single fact row elsewhere.
-
Provide a compact clustered index instead of wide multi-column keys.
-
Can serve as a parent key if you build “fact-to-fact” relationships (e.g., fact snapshot derived from a transaction fact).
-
3. When Surrogate Keys Are Optional
-
If the fact table is append-only, immutable, and you never need to update a single row, surrogate keys might be unnecessary.
-
In many high-volume transaction warehouses, the natural composite keys are enough, and surrogate keys would just add storage overhead.
4. Best Practice
-
Large transaction fact tables (billions of rows): usually no surrogate key → rely on composite dimension keys + DD (to save space).
-
Moderate fact tables, snapshots, or facts with updates/corrections: surrogate key is useful for uniqueness, history, and ease of joins.
-
Fact tables used as sources for other tables: surrogate key is recommended.
✅ Summary:
A fact table surrogate key is not always required, but when used, it serves mainly to:
-
Provide a simple unique row identifier.
-
Simplify joins and lookups.
-
Enable handling of corrections/history.
-
Support degenerate transaction identifiers.