Graphs in Django and Postgres

I have written a bunch of posts about dealing with trees in Postgres and Django, and Funkybob used some of this to start the package django-closure-view.

Today, someone was looking for similar functionality, but for a graph. Specifically, a Directed Acyclic Graph. Now, not every graph, or even every DAG is a tree, but every tree is a DAG.

So, the difference between a tree and a graph in this context is that a given node may have an arbitrary number of parents. But, and this is worth noting now, none of it’s parents may also be dependencies.

The first part of this tells us that we can no longer just use a simple self-relation in our model to store the relationship: because there could be multiple parents. Instead, we will need to have a many-to-many relation to store that.

from django.db import models


class Node(models.Model):
    node_id = models.AutoField(primary_key=True)
    name = models.TextField(unique=True)
    parents = models.ManyToManyField(
        'self',
        related_name='children',
        symmetrical=False,
    )

We can put some meaningful data into this graph to make it a little more obvious if our queries are sane:

django, pytz, sqlparse, asgiref = Node.objects.bulk_create([
    Node(name='django'),
    Node(name='pytz'),
    Node(name='sqlparse'),
    Node(name='asgiref'),
])

django.parents.add(pytz, sqlparse, asgiref)

graph_demo, psycopg2 = Node.objects.bulk_create([
    Node(name='graph_demo'),
    Node(name='psycopg2')
])

graph_demo.parents.add(psycopg2, django)

Let’s have a bit of a look at some of the queries we might need to think about.

-- All root nodes
SELECT node_id, name
  FROM graph_node
  LEFT OUTER JOIN graph_node_parents ON (node_id = from_node_id)
 WHERE to_node_id IS NULL;

As expected, this gives us back all packages that have no dependencies (parents):

 node_id │   name
─────────┼──────────
       6 │ psycopg2
       2 │ pytz
       4 │ asgiref
       3 │ sqlparse
(4 rows)

And now, all packages which are not depended upon by any other packages (no parents):

SELECT node_id, name
  FROM graph_node
  LEFT OUTER JOIN graph_node_parents ON (node_id = to_node_id)
 WHERE from_node_id IS NULL;

We should only have one package here: graph_demo.

From each of these, we can build up a recursive query to get all descendants, or all ancestors of each root/leaf node.

WITH RECURSIVE ancestors AS (
  SELECT node_id, '{}'::INTEGER[] AS ancestors
    FROM graph_node
    LEFT OUTER JOIN graph_node_parents ON (node_id = from_node_id)
   WHERE to_node_id IS NULL

   UNION

  SELECT node.from_node_id,
         ancestors.ancestors || ancestors.node_id
    FROM ancestors
   INNER JOIN graph_node_parents node
           ON (ancestors.node_id = to_node_id)
) SELECT * FROM ancestors;

From here, we can annotate on the names to double check:

WITH RECURSIVE ancestors AS (
  SELECT node_id, '{}'::INTEGER[] AS ancestors
    FROM graph_node
    LEFT OUTER JOIN graph_node_parents ON (node_id = from_node_id)
   WHERE to_node_id IS NULL

   UNION

  SELECT node.from_node_id,
         ancestors.ancestors || ancestors.node_id
    FROM ancestors
   INNER JOIN graph_node_parents node
           ON (ancestors.node_id = to_node_id)
)
SELECT node_id,
       node.name,
       ancestors,
       ARRAY(SELECT name
               FROM unnest(ancestors) node_id
              INNER JOIN graph_node USING (node_id)
       ) AS ancestor_names
  FROM ancestors
  INNER JOIN graph_node node USING (node_id);

So that has given us all ancestor chains: but what about if we just want the closure table: all ancestor/descendant pairs?

WITH RECURSIVE closure_table AS (
  SELECT from_node_id AS descendant,
         to_node_id AS ancestor
    FROM graph_node_parents

   UNION

  SELECT descendant,
         to_node_id AS ancestor
    FROM closure_table
   INNER JOIN graph_node_parents ON (from_node_id = ancestor)
)
SELECT * FROM closure_table

Okay, that was even easier than the previous query.

Once we have our closure table query, then we can look at preventing cycles.

CREATE OR REPLACE RECURSIVE VIEW
graph_closure_table (descendant, ancestor) AS (

  SELECT from_node_id AS descendant,
         to_node_id AS ancestor
    FROM graph_node_parents

   UNION

  SELECT descendant,
         to_node_id AS ancestor
    FROM graph_closure_table
   INNER JOIN graph_node_parents ON (from_node_id = ancestor)
);

And we can now use this in a function to prevent cycles

CREATE OR REPLACE FUNCTION prevent_cycles()
RETURNS TRIGGER AS $$

BEGIN
  IF EXISTS(SELECT 1
              FROM graph_closure_table
             WHERE ancestor = NEW.from_node_id
               AND descendant = NEW.to_node_id
           ) THEN
    RAISE EXCEPTION 'cycle detected';
  END IF;
  RETURN NEW;
END;

$$ LANGUAGE plpgsql STRICT;

CREATE TRIGGER prevent_cycles
BEFORE UPDATE OR INSERT ON graph_node_parents
FOR EACH ROW EXECUTE PROCEDURE prevent_cycles();

And this will prevent us from being able to set an invalid dependency relationship: ie, one that would trigger a cycle:

>>> django.parents.add(graph_demo)
Traceback (most recent call last):
  File "...django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.RaiseException: cycle detected
CONTEXT:  PL/pgSQL function prevent_cycles() line 9 at RAISE

It’s not totally ideal, but it does show how it protects against saving invalid relationships.


Interestingly, if we drop that constraint, we can still run the closure table query: it doesn’t give us an infinite loop, because the view uses a UNION instead of a UNION ALL: it’s going to drop any rows that are already in the output when it deals with each row - and since there are not an infinite number of combinations for a given set of dependencies, it will eventually return data.

So, where from here? I’m not sure. This was just something that I thought about while answering a question in IRC, and I felt like I needed to explore the idea.