Tree data as a nested list redux

Some time ago, I wrote about using python to aggregate data that is stored with a Materialized Path into a Nested List structure.

But we should be able to do that same aggregation using Postgres, and from an Adjacency List structure.

Let’s start with a table definition:

CREATE TABLE location (
  node_id SERIAL PRIMARY KEY,
  name TEXT,
  parent_id INTEGER REFERENCES location(node_id)
);

And some data:

INSERT INTO location (node_id, name, parent_id) VALUES
  (1, 'Australia', NULL),
  (2, 'South Australia', 1),
  (3, 'Victoria', 1),
  (4, 'South-East', 2),
  (5, 'Western Districts', 3),
  (6, 'New Zealand', NULL),
  (7, 'Barossa Valley', 2),
  (8, 'Riverland', 2),
  (9, 'South Island', 6),
  (10, 'North Island', 6),
  (11, 'Eastern Bay of Plenty', 10);

To begin with, we need to get all of the items, and their depth:

WITH RECURSIVE location_with_level AS (
  SELECT *,
         0 AS lvl
    FROM location
   WHERE parent_id IS NULL

  UNION ALL

  SELECT child.*,
         parent.lvl + 1
    FROM location child
    JOIN location_with_level parent ON parent.node_id = child.parent_id
)
SELECT * FROM location_with_level;
 node_id │         name          │ parent_id │ lvl
─────────┼───────────────────────┼───────────┼─────
       1 │ Australia             │    <NULL> │   0
       6 │ New Zealand           │    <NULL> │   0
       2 │ South Australia       │         1 │   1
       3 │ Victoria              │         1 │   1
       9 │ South Island          │         6 │   1
      10 │ North Island          │         6 │   1
       4 │ South-East            │         2 │   2
       5 │ Western Districts     │         3 │   2
       7 │ Barossa Valley        │         2 │   2
       8 │ Riverland             │         2 │   2
      11 │ Eastern Bay of Plenty │        10 │   2
(11 rows)

Because of the way recursive queries work, we need to find the deepest node(s), and start there:

WITH RECURSIVE location_with_level AS (
  SELECT *,
         0 AS lvl
    FROM location
   WHERE parent_id IS NULL

  UNION ALL

  SELECT child.*,
         parent.lvl + 1
    FROM location child
    JOIN location_with_level parent ON parent.node_id = child.parent_id
),
maxlvl AS (
  SELECT max(lvl) maxlvl FROM location_with_level
)

SELECT * FROM maxlvl;

We then need to build up the tree (this clause is the next one in our CTE chain, I’ve omitted the first two for clarity):

c_tree AS (
  SELECT location_with_level.*,
         NULL::JSONB children
    FROM location_with_level, maxlvl
   WHERE lvl = maxlvl

   UNION

   (
     SELECT (branch_parent).*,
            jsonb_agg(branch_child)
       FROM (
         SELECT branch_parent,
                to_jsonb(branch_child) - 'lvl' - 'parent_id' - 'node_id' AS branch_child
           FROM location_with_level branch_parent
           JOIN c_tree branch_child ON branch_child.parent_id = branch_parent.node_id
       ) branch
       GROUP BY branch.branch_parent

       UNION

       SELECT c.*,
              NULL::JSONB
       FROM location_with_level c
       WHERE NOT EXISTS (SELECT 1
                           FROM location_with_level hypothetical_child
                          WHERE hypothetical_child.parent_id = c.node_id)
   )
)

The first part of this query gets all of the deepest leaf nodes.

This is then combined with another recursive subquery, that creates branches. This relies on the fact it’s possible use the “type”, and have records as columns in a query. The second part of this subquery finds all remaining leaf nodes, and combines them in. This second subquery will keep executing until it doesn’t find any new rows, which will happen when all root nodes have been processed.

We can see from the results of this last clause that we just need to limit this to root nodes:

 node_id │         name          │ parent_id │ lvl │                                                   children
─────────┼───────────────────────┼───────────┼─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────
       4 │ South-East            │         2 │   2 │ <NULL>
       5 │ Western Districts     │         3 │   2 │ <NULL>
       7 │ Barossa Valley        │         2 │   2 │ <NULL>
       8 │ Riverland             │         2 │   2 │ <NULL>
      11 │ Eastern Bay of Plenty │        10 │   2 │ <NULL>
       3 │ Victoria              │         1 │   1 │ [{"name": "Western Districts", "children": null}]
      10 │ North Island          │         6 │   1 │ [{"name": "Eastern Bay of Plenty", "children": null}]
       9 │ South Island          │         6 │   1 │ <NULL>
       2 │ South Australia       │         1 │   1 │ [{"name": "Riverland", "children": null}, {"name": "Barossa Valley", "children": null}, {"name": "South-East…
         │                       │           │     │…", "children": null}]
       6 │ New Zealand           │    <NULL> │   0 │ [{"name": "South Island", "children": null}, {"name": "North Island", "children": [{"name": "Eastern Bay of …
         │                       │           │     │…Plenty", "children": null}]}]
       1 │ Australia             │    <NULL> │   0 │ [{"name": "South Australia", "children": [{"name": "Riverland", "children": null}, {"name": "Barossa Valley"…
         │                       │           │     │…, "children": null}, {"name": "South-East", "children": null}]}, {"name": "Victoria", "children": [{"name": …
         │                       │           │     │…"Western Districts", "children": null}]}]
(11 rows)

So our final query, using the new jsonb_pretty function:

WITH RECURSIVE location_with_level AS (
  SELECT *,
         0 AS lvl
    FROM location
   WHERE parent_id IS NULL

  UNION ALL

  SELECT child.*,
         parent.lvl + 1
    FROM location child
    JOIN location_with_level parent ON parent.node_id = child.parent_id
),
maxlvl AS (
  SELECT max(lvl) maxlvl FROM location_with_level
),
c_tree AS (
  SELECT location_with_level.*,
         NULL::JSONB children
    FROM location_with_level, maxlvl
   WHERE lvl = maxlvl

   UNION

   (
     SELECT (branch_parent).*,
            jsonb_agg(branch_child)
       FROM (
         SELECT branch_parent,
                to_jsonb(branch_child) - 'lvl' - 'parent_id' - 'node_id' AS branch_child
           FROM location_with_level branch_parent
           JOIN c_tree branch_child ON branch_child.parent_id = branch_parent.node_id
       ) branch
       GROUP BY branch.branch_parent

       UNION

       SELECT c.*,
              NULL::JSONB
       FROM location_with_level c
       WHERE NOT EXISTS (SELECT 1
                           FROM location_with_level hypothetical_child
                          WHERE hypothetical_child.parent_id = c.node_id)
   )
)

SELECT jsonb_pretty(
         array_to_json(
           array_agg(
             row_to_json(c_tree)::JSONB - 'lvl' - 'parent_id' - 'node_id'
           )
         )::JSONB
       ) AS tree
  FROM c_tree
  WHERE lvl=0;

And our results:

                           tree
 ──────────────────────────────────────────────────────────
  [
      {
          "name": "New Zealand",
          "children": [
              {
                  "name": "South Island",
                  "children": null
              },
              {
                  "name": "North Island",
                  "children": [
                      {
                          "name": "Eastern Bay of Plenty",
                          "children": null
                      }
                  ]
              }
          ]
      },
      {
          "name": "Australia",
          "children": [
              {
                  "name": "South Australia",
                  "children": [
                      {
                          "name": "Riverland",
                          "children": null
                      },
                      {
                          "name": "Barossa Valley",
                          "children": null
                      },
                      {
                          "name": "South-East",
                          "children": null
                      }
                  ]
              },
              {
                  "name": "Victoria",
                  "children": [
                      {
                          "name": "Western Districts",
                          "children": null
                      }
                  ]
              }
          ]
      }
  ]
 (1 row)

Oh, that is rather neat.

This query is mostly cribbed from a fantastic Stack Overflow answer by David Guillot.