ZhangZhihui's Blog  

When you have a variable depth hierarchy (the number of levels is not fixed), using a pathstring column in the dimension table is a practical and efficient design—especially for reporting and drill-downs.

Let’s walk through the design for your bank branches example.


1. The Problem

Fixed depth works well when the hierarchy is predictable (e.g., always Country → Region → City → Branch).
But in reality:

  • Some branches may have an extra subdivision.

  • Some may skip a level.

  • The number of levels might change over time.

This makes fixed columns (level_1_name, level_2_name, etc.) messy.


2. Pathstring Approach

In the pathstring model:

  • Each record represents one node in the hierarchy (could be a branch, city, region, etc.).

  • A column stores the full path from the root to that node, using a delimiter.

  • You can store both keys and names in the path for flexibility.


Example Table: dim_branch

branch_keybranch_namebranch_typeparent_branch_keypathstringpathstring_namesdepth
1 USA Country NULL /1/ /USA/ 1
2 Northeast Region 1 /1/2/ /USA/Northeast/ 2
3 Boston City 2 /1/2/3/ /USA/Northeast/Boston/ 3
4 Boston-01 Branch 3 /1/2/3/4/ /USA/Northeast/Boston/B01/ 4
5 Midwest Region 1 /1/5/ /USA/Midwest/ 2
6 Chicago City 5 /1/5/6/ /USA/Midwest/Chicago/ 3
7 Chicago-01 Branch 6 /1/5/6/7/ /USA/Midwest/Chicago/C01/ 4

Columns

  • branch_key: Surrogate key for the node.

  • branch_name: Name of the node (e.g., "Chicago").

  • branch_type: What kind of node it is (Country, Region, City, Branch, etc.).

  • parent_branch_key: Links to the parent node’s surrogate key.

  • pathstring: Path of IDs from root to this node, delimited (e.g., /1/5/6/7/).

  • pathstring_names: Human-readable path for reporting.

  • depth: Numeric depth level (root=1).


3. Advantages of Pathstring

  • Variable depth friendly — you don’t need to redesign when levels change.

  • Easy to filter for descendants:

    -- Get all descendants of 'Midwest'
    SELECT *
    FROM dim_branch
    WHERE pathstring LIKE '/1/5/%';

     

  • Easy to find ancestors:

    -- Get all ancestors of branch_key=7
    SELECT *
    FROM dim_branch
    WHERE '/1/5/6/7/' LIKE CONCAT(pathstring, '%');

     


4. Best Practices

  • Delimiter choice: Use / or another character unlikely to appear in IDs.

  • Indexing:

    • For descendant queries, index the pathstring column.

    • For ancestor queries, consider materialized paths with reverse strings or store both directions.

  • Depth column: Store depth for quick filtering (e.g., find only level 3).

  • SCD handling: If node names or parents can change, consider SCD Type 2 so you preserve historical paths.

  • Pathstring vs. path IDs:

    • Keep pathstring with surrogate keys for joins.

    • Keep pathstring_names for display purposes.


5. When to Use Pathstring

✅ When the hierarchy is irregular and changes over time.
✅ When you need fast descendant/ancestor queries without recursive CTEs.
✅ When you can afford a slightly larger dimension table (path redundancy).

 

posted on 2025-08-16 09:57  ZhangZhihuiAAA  阅读(14)  评论(0)    收藏  举报