ZhangZhihui's Blog  

 Fact table surrogate keys are often misunderstood. Let’s break it down clearly.


1. Fact Tables and Keys

  • A fact table contains measures (quantitative data like transaction amount, balance, etc.) and foreign keys pointing to dimensions (branch, customer, product, time, etc.).

  • Normally, a fact table is identified by the combination of its dimension foreign keys (sometimes with a transaction ID).

So where do surrogate keys in fact tables come in?


2. Uses of Fact Table Surrogate Keys

(A) Uniquely Identifying Rows

  • A surrogate key (e.g., fact_sales_key) gives each fact row a unique identifier.

  • Without it, uniqueness depends on a composite of many dimension keys, which:

    • Makes joins cumbersome.

    • Slows down lookups and updates.

  • With a surrogate key:

    SELECT *
    FROM fact_sales
    WHERE fact_sales_key = 123456;

     

    is much simpler than:

    WHERE date_key = 20250101
      AND customer_key = 20012
      AND product_key = 3105
      AND branch_key = 17;

     


(B) Supporting Slowly Changing Facts

  • Facts can change over time (e.g., correcting a transaction amount).

  • If you only had natural composite keys, updating would overwrite history.

  • A surrogate fact key lets you add a new version of the fact row while keeping the old one (similar to SCD Type 2 in dimensions).


(C) Handling Degenerate Dimensions (DD)

  • Sometimes a fact has a transaction identifier that is not stored in a dimension (e.g., invoice number, loan contract ID).

  • You can use the fact surrogate key to represent the transaction itself, and keep degenerate attributes (invoice number, reference code) in the same table.


(D) Performance & Referential Integrity

  • Surrogate keys:

    • Make joins to other systems easier if you need to reference a single fact row elsewhere.

    • Provide a compact clustered index instead of wide multi-column keys.

    • Can serve as a parent key if you build “fact-to-fact” relationships (e.g., fact snapshot derived from a transaction fact).


3. When Surrogate Keys Are Optional

  • If the fact table is append-only, immutable, and you never need to update a single row, surrogate keys might be unnecessary.

  • In many high-volume transaction warehouses, the natural composite keys are enough, and surrogate keys would just add storage overhead.


4. Best Practice

  • Large transaction fact tables (billions of rows): usually no surrogate key → rely on composite dimension keys + DD (to save space).

  • Moderate fact tables, snapshots, or facts with updates/corrections: surrogate key is useful for uniqueness, history, and ease of joins.

  • Fact tables used as sources for other tables: surrogate key is recommended.


Summary:
A fact table surrogate key is not always required, but when used, it serves mainly to:

    • Provide a simple unique row identifier.

    • Simplify joins and lookups.

    • Enable handling of corrections/history.

    • Support degenerate transaction identifiers.

 

posted on 2025-08-17 14:03  ZhangZhihuiAAA  阅读(4)  评论(0)    收藏  举报