Tree data as a nested list redux
-
Comments:
- here.
Some time ago, I wrote about using python to aggregate data that is stored with a Materialized Path into a Nested List structure.
But we should be able to do that same aggregation using Postgres, and from an Adjacency List structure.
Let’s start with a table definition:
CREATE TABLE location (
node_id SERIAL PRIMARY KEY,
name TEXT,
parent_id INTEGER REFERENCES location(node_id)
);
And some data:
INSERT INTO location (node_id, name, parent_id) VALUES
(1, 'Australia', NULL),
(2, 'South Australia', 1),
(3, 'Victoria', 1),
(4, 'South-East', 2),
(5, 'Western Districts', 3),
(6, 'New Zealand', NULL),
(7, 'Barossa Valley', 2),
(8, 'Riverland', 2),
(9, 'South Island', 6),
(10, 'North Island', 6),
(11, 'Eastern Bay of Plenty', 10);
To begin with, we need to get all of the items, and their depth:
WITH RECURSIVE location_with_level AS (
SELECT *,
0 AS lvl
FROM location
WHERE parent_id IS NULL
UNION ALL
SELECT child.*,
parent.lvl + 1
FROM location child
JOIN location_with_level parent ON parent.node_id = child.parent_id
)
SELECT * FROM location_with_level;
node_id │ name │ parent_id │ lvl
─────────┼───────────────────────┼───────────┼─────
1 │ Australia │ <NULL> │ 0
6 │ New Zealand │ <NULL> │ 0
2 │ South Australia │ 1 │ 1
3 │ Victoria │ 1 │ 1
9 │ South Island │ 6 │ 1
10 │ North Island │ 6 │ 1
4 │ South-East │ 2 │ 2
5 │ Western Districts │ 3 │ 2
7 │ Barossa Valley │ 2 │ 2
8 │ Riverland │ 2 │ 2
11 │ Eastern Bay of Plenty │ 10 │ 2
(11 rows)
Because of the way recursive queries work, we need to find the deepest node(s), and start there:
WITH RECURSIVE location_with_level AS (
SELECT *,
0 AS lvl
FROM location
WHERE parent_id IS NULL
UNION ALL
SELECT child.*,
parent.lvl + 1
FROM location child
JOIN location_with_level parent ON parent.node_id = child.parent_id
),
maxlvl AS (
SELECT max(lvl) maxlvl FROM location_with_level
)
SELECT * FROM maxlvl;
We then need to build up the tree (this clause is the next one in our CTE chain, I’ve omitted the first two for clarity):
c_tree AS (
SELECT location_with_level.*,
NULL::JSONB children
FROM location_with_level, maxlvl
WHERE lvl = maxlvl
UNION
(
SELECT (branch_parent).*,
jsonb_agg(branch_child)
FROM (
SELECT branch_parent,
to_jsonb(branch_child) - 'lvl' - 'parent_id' - 'node_id' AS branch_child
FROM location_with_level branch_parent
JOIN c_tree branch_child ON branch_child.parent_id = branch_parent.node_id
) branch
GROUP BY branch.branch_parent
UNION
SELECT c.*,
NULL::JSONB
FROM location_with_level c
WHERE NOT EXISTS (SELECT 1
FROM location_with_level hypothetical_child
WHERE hypothetical_child.parent_id = c.node_id)
)
)
The first part of this query gets all of the deepest leaf nodes.
This is then combined with another recursive subquery, that creates branches. This relies on the fact it’s possible use the “type”, and have records as columns in a query. The second part of this subquery finds all remaining leaf nodes, and combines them in. This second subquery will keep executing until it doesn’t find any new rows, which will happen when all root nodes have been processed.
We can see from the results of this last clause that we just need to limit this to root nodes:
node_id │ name │ parent_id │ lvl │ children
─────────┼───────────────────────┼───────────┼─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
4 │ South-East │ 2 │ 2 │ <NULL>
5 │ Western Districts │ 3 │ 2 │ <NULL>
7 │ Barossa Valley │ 2 │ 2 │ <NULL>
8 │ Riverland │ 2 │ 2 │ <NULL>
11 │ Eastern Bay of Plenty │ 10 │ 2 │ <NULL>
3 │ Victoria │ 1 │ 1 │ [{"name": "Western Districts", "children": null}]
10 │ North Island │ 6 │ 1 │ [{"name": "Eastern Bay of Plenty", "children": null}]
9 │ South Island │ 6 │ 1 │ <NULL>
2 │ South Australia │ 1 │ 1 │ [{"name": "Riverland", "children": null}, {"name": "Barossa Valley", "children": null}, {"name": "South-East…
│ │ │ │…", "children": null}]
6 │ New Zealand │ <NULL> │ 0 │ [{"name": "South Island", "children": null}, {"name": "North Island", "children": [{"name": "Eastern Bay of …
│ │ │ │…Plenty", "children": null}]}]
1 │ Australia │ <NULL> │ 0 │ [{"name": "South Australia", "children": [{"name": "Riverland", "children": null}, {"name": "Barossa Valley"…
│ │ │ │…, "children": null}, {"name": "South-East", "children": null}]}, {"name": "Victoria", "children": [{"name": …
│ │ │ │…"Western Districts", "children": null}]}]
(11 rows)
So our final query, using the new jsonb_pretty
function:
WITH RECURSIVE location_with_level AS (
SELECT *,
0 AS lvl
FROM location
WHERE parent_id IS NULL
UNION ALL
SELECT child.*,
parent.lvl + 1
FROM location child
JOIN location_with_level parent ON parent.node_id = child.parent_id
),
maxlvl AS (
SELECT max(lvl) maxlvl FROM location_with_level
),
c_tree AS (
SELECT location_with_level.*,
NULL::JSONB children
FROM location_with_level, maxlvl
WHERE lvl = maxlvl
UNION
(
SELECT (branch_parent).*,
jsonb_agg(branch_child)
FROM (
SELECT branch_parent,
to_jsonb(branch_child) - 'lvl' - 'parent_id' - 'node_id' AS branch_child
FROM location_with_level branch_parent
JOIN c_tree branch_child ON branch_child.parent_id = branch_parent.node_id
) branch
GROUP BY branch.branch_parent
UNION
SELECT c.*,
NULL::JSONB
FROM location_with_level c
WHERE NOT EXISTS (SELECT 1
FROM location_with_level hypothetical_child
WHERE hypothetical_child.parent_id = c.node_id)
)
)
SELECT jsonb_pretty(
array_to_json(
array_agg(
row_to_json(c_tree)::JSONB - 'lvl' - 'parent_id' - 'node_id'
)
)::JSONB
) AS tree
FROM c_tree
WHERE lvl=0;
And our results:
tree
──────────────────────────────────────────────────────────
[
{
"name": "New Zealand",
"children": [
{
"name": "South Island",
"children": null
},
{
"name": "North Island",
"children": [
{
"name": "Eastern Bay of Plenty",
"children": null
}
]
}
]
},
{
"name": "Australia",
"children": [
{
"name": "South Australia",
"children": [
{
"name": "Riverland",
"children": null
},
{
"name": "Barossa Valley",
"children": null
},
{
"name": "South-East",
"children": null
}
]
},
{
"name": "Victoria",
"children": [
{
"name": "Western Districts",
"children": null
}
]
}
]
}
]
(1 row)
Oh, that is rather neat.
This query is mostly cribbed from a fantastic Stack Overflow answer by David Guillot.