ZhangZhihui's Blog  
Can you give me an example to query parent-child hierarchy using Spark SQL recursive CTE?
 
ChatGPT said:

Sure! Here are clear, complete Spark SQL examples that show how to query a parent-child hierarchy using a recursive CTE.
(Supported in Spark 3.4+.)


⭐ Example Scenario

Suppose you have a table:

CREATE TABLE org (
  id INT,
  parent_id INT,
  name STRING
);

Example data:

idparent_idname
1 NULL CEO
2 1 VP Eng
3 1 VP Sales
4 2 Manager A
5 2 Manager B
6 4 Engineer

This forms a tree:

CEO
 ├─ VP Eng
 │    ├─ Manager A
 │    │     └─ Engineer
 │    └─ Manager B
 └─ VP Sales

 


Spark SQL Recursive CTE Example

WITH RECURSIVE org_tree AS (
  
  -- 1. Anchor: all root nodes
  SELECT
    id,
    parent_id,
    name,
    CAST(name AS STRING) AS path,
    0 AS level
  FROM org
  WHERE parent_id IS NULL
  
  UNION ALL
  
  -- 2. Recursive: join children to parents
  SELECT
    c.id,
    c.parent_id,
    c.name,
    CONCAT(p.path, ' > ', c.name) AS path,
    p.level + 1 AS level
  FROM org c
  JOIN org_tree p
    ON c.parent_id = p.id
)

SELECT *
FROM org_tree
ORDER BY path;

 


✔ Output

idparent_idnamepathlevel
1 NULL CEO CEO 0
2 1 VP Eng CEO > VP Eng 1
3 1 VP Sales CEO > VP Sales 1
4 2 Manager A CEO > VP Eng > Manager A 2
5 2 Manager B CEO > VP Eng > Manager B 2
6 4 Engineer CEO > VP Eng > Manager A > Engineer 3

🔍 How it works

1. Anchor query

Gets root rows (parent_id IS NULL).

2. Recursive query

Keeps joining child rows to the current “tree”.

3. Spark handles recursion

Spark automatically expands:

level 0 → level 1 → level 2 → …

Until no more child rows exist.


⭐ Bonus: Query all descendants of a specific id

WITH RECURSIVE sub AS (
  SELECT *
  FROM org
  WHERE id = 2        -- start from VP Eng
  UNION ALL
  SELECT c.*
  FROM org c
  JOIN sub p ON c.parent_id = p.id
)
SELECT * FROM sub;

 


⭐ Bonus: Query all ancestors of a specific id

WITH RECURSIVE ancestors AS (
  SELECT *
  FROM org
  WHERE id = 6         -- Engineer
  UNION ALL
  SELECT p.*
  FROM org p
  JOIN ancestors c ON p.id = c.parent_id
)
SELECT * FROM ancestors;

 

 

posted on 2025-11-17 09:31  ZhangZhihuiAAA  阅读(0)  评论(0)    收藏  举报