🔹 1. Late Arriving Facts (a.k.a. Early Arriving Facts)
-
The fact record itself (transaction, order line, etc.) comes in earlier than some of its required dimension context.
-
Example:
-
You receive an Order Fact for Customer
C789
. -
But Customer
C789
doesn’t exist yet in the Customer Dimension.
-
-
Problem: Fact can’t find its foreign key in the dimension.
👉 Handling: Insert a dummy/placeholder dimension row so the fact can load, then update when the dimension arrives.
🔹 2. Late Arriving Dimensions
-
The dimension record itself arrives into the warehouse later than it should have — often after facts that reference it have already been loaded.
-
Example:
-
You already loaded Sales Facts for Customer
C789
(using a placeholder row). -
Days later, the real Customer
C789
dimension record arrives.
-
👉 Handling: Update the placeholder row with the correct attributes when the late dimension arrives.
🔹 The Difference in Perspective
-
Late Arriving Fact → The fact row is “too early,” i.e., arrives before its dimensions are ready.
-
Late Arriving Dimension → The dimension row is “too late,” i.e., arrives after facts that reference it.
They describe the same timing mismatch, but from different angles:
-
If you’re looking at the fact ETL pipeline, you call it a late arriving fact problem.
-
If you’re looking at the dimension ETL pipeline, you call it a late arriving dimension problem.
✅ So:
-
They are related and often occur together.
-
But the naming emphasizes which side of the ETL flow you are focusing on (fact vs. dimension).
Here’s a step-by-step ETL workflow diagram for handling late arriving facts and late arriving dimensions with surrogate keys and placeholders.
🛠 Workflow for Late Arriving Facts & Dimensions
🔑 Key Flow
-
Staging Layer
-
Capture raw data from source systems (Sales, CRM).
-
-
Dimension Load
-
Load or update dimension rows.
-
If dimension is missing but fact exists → insert placeholder row with surrogate key.
-
-
Fact Load
-
Facts join to dimensions using business keys, then load surrogate keys.
-
Placeholders guarantee referential integrity.
-
-
Late Arriving Dimension Handling
-
When the dimension record finally arrives, update the placeholder.
-
Surrogate key stays the same → no need to touch fact table.
-
✅ This is the standard Kimball-style handling for early arriving facts (late arriving facts) and late arriving dimensions.
✅ It ensures fact tables are always loadable, and later corrections flow only into dimension rows, without rewriting fact history.
Let’s build on the late arriving fact/dimension scenario we discussed and show real SQL insert examples for fact table loads from staging.
📂 Setup
Dimension Table
Fact Table
Staging Tables
🟢 1. Load Dimension First (with placeholder logic)
🟢 2. Handle Late Arriving Facts (fact comes before dimension)
Before inserting facts, ensure placeholder customers exist:
🟢 3. Insert Facts (using surrogate keys)
Now we can safely insert facts, resolving business key → surrogate key:
✅ Every fact row will find a matching customer_sk
because placeholders guarantee referential integrity.
🟢 4. Handle Late Arriving Dimensions (update placeholders later)
🔎 Example in Action
-
Sales fact arrives first (no customer dimension yet):
-
ETL ensures placeholder:
-
Fact is loaded with surrogate key for
C789
: -
Later, real customer data arrives:
-
Update placeholder:
✅ Fact row remains valid — it still points to customer_sk=5
, which now has real attributes.