When you have a variable depth hierarchy (the number of levels is not fixed), using a pathstring column in the dimension table is a practical and efficient design—especially for reporting and drill-downs.
Let’s walk through the design for your bank branches example.
1. The Problem
Fixed depth works well when the hierarchy is predictable (e.g., always Country → Region → City → Branch).
But in reality:
-
Some branches may have an extra subdivision.
-
Some may skip a level.
-
The number of levels might change over time.
This makes fixed columns (level_1_name
, level_2_name
, etc.) messy.
2. Pathstring Approach
In the pathstring model:
-
Each record represents one node in the hierarchy (could be a branch, city, region, etc.).
-
A column stores the full path from the root to that node, using a delimiter.
-
You can store both keys and names in the path for flexibility.
Example Table: dim_branch
branch_key | branch_name | branch_type | parent_branch_key | pathstring | pathstring_names | depth |
---|---|---|---|---|---|---|
1 | USA | Country | NULL | /1/ | /USA/ | 1 |
2 | Northeast | Region | 1 | /1/2/ | /USA/Northeast/ | 2 |
3 | Boston | City | 2 | /1/2/3/ | /USA/Northeast/Boston/ | 3 |
4 | Boston-01 | Branch | 3 | /1/2/3/4/ | /USA/Northeast/Boston/B01/ | 4 |
5 | Midwest | Region | 1 | /1/5/ | /USA/Midwest/ | 2 |
6 | Chicago | City | 5 | /1/5/6/ | /USA/Midwest/Chicago/ | 3 |
7 | Chicago-01 | Branch | 6 | /1/5/6/7/ | /USA/Midwest/Chicago/C01/ | 4 |
Columns
-
branch_key: Surrogate key for the node.
-
branch_name: Name of the node (e.g., "Chicago").
-
branch_type: What kind of node it is (Country, Region, City, Branch, etc.).
-
parent_branch_key: Links to the parent node’s surrogate key.
-
pathstring: Path of IDs from root to this node, delimited (e.g.,
/1/5/6/7/
). -
pathstring_names: Human-readable path for reporting.
-
depth: Numeric depth level (root=1).
3. Advantages of Pathstring
-
Variable depth friendly — you don’t need to redesign when levels change.
-
Easy to filter for descendants:
-
Easy to find ancestors:
4. Best Practices
-
Delimiter choice: Use
/
or another character unlikely to appear in IDs. -
Indexing:
-
For descendant queries, index the
pathstring
column. -
For ancestor queries, consider materialized paths with reverse strings or store both directions.
-
-
Depth column: Store
depth
for quick filtering (e.g., find only level 3). -
SCD handling: If node names or parents can change, consider SCD Type 2 so you preserve historical paths.
-
Pathstring vs. path IDs:
-
Keep
pathstring
with surrogate keys for joins. -
Keep
pathstring_names
for display purposes.
-
5. When to Use Pathstring
✅ When the hierarchy is irregular and changes over time.
✅ When you need fast descendant/ancestor queries without recursive CTEs.
✅ When you can afford a slightly larger dimension table (path redundancy).