Can you give me an example to query parent-child hierarchy using Spark SQL recursive CTE?
ChatGPT said:
Sure! Here are clear, complete Spark SQL examples that show how to query a parent-child hierarchy using a recursive CTE.
(Supported in Spark 3.4+.)
⭐ Example Scenario
Suppose you have a table:
CREATE TABLE org ( id INT, parent_id INT, name STRING );
Example data:
| id | parent_id | name |
|---|---|---|
| 1 | NULL | CEO |
| 2 | 1 | VP Eng |
| 3 | 1 | VP Sales |
| 4 | 2 | Manager A |
| 5 | 2 | Manager B |
| 6 | 4 | Engineer |
This forms a tree:
CEO ├─ VP Eng │ ├─ Manager A │ │ └─ Engineer │ └─ Manager B └─ VP Sales
✅ Spark SQL Recursive CTE Example
WITH RECURSIVE org_tree AS ( -- 1. Anchor: all root nodes SELECT id, parent_id, name, CAST(name AS STRING) AS path, 0 AS level FROM org WHERE parent_id IS NULL UNION ALL -- 2. Recursive: join children to parents SELECT c.id, c.parent_id, c.name, CONCAT(p.path, ' > ', c.name) AS path, p.level + 1 AS level FROM org c JOIN org_tree p ON c.parent_id = p.id ) SELECT * FROM org_tree ORDER BY path;
✔ Output
| id | parent_id | name | path | level |
|---|---|---|---|---|
| 1 | NULL | CEO | CEO | 0 |
| 2 | 1 | VP Eng | CEO > VP Eng | 1 |
| 3 | 1 | VP Sales | CEO > VP Sales | 1 |
| 4 | 2 | Manager A | CEO > VP Eng > Manager A | 2 |
| 5 | 2 | Manager B | CEO > VP Eng > Manager B | 2 |
| 6 | 4 | Engineer | CEO > VP Eng > Manager A > Engineer | 3 |
🔍 How it works
1. Anchor query
Gets root rows (parent_id IS NULL).
2. Recursive query
Keeps joining child rows to the current “tree”.
3. Spark handles recursion
Spark automatically expands:
level 0 → level 1 → level 2 → …
Until no more child rows exist.
⭐ Bonus: Query all descendants of a specific id
WITH RECURSIVE sub AS ( SELECT * FROM org WHERE id = 2 -- start from VP Eng UNION ALL SELECT c.* FROM org c JOIN sub p ON c.parent_id = p.id ) SELECT * FROM sub;
⭐ Bonus: Query all ancestors of a specific id
WITH RECURSIVE ancestors AS ( SELECT * FROM org WHERE id = 6 -- Engineer UNION ALL SELECT p.* FROM org p JOIN ancestors c ON p.id = c.parent_id ) SELECT * FROM ancestors;

浙公网安备 33010602011771号