Adjacency Lists in Django with Postgres
-
Comments:
- here.
Today, I’m going to walk through modelling a tree in Django, using an Adjacency List, and a Postgres View that dynamically creates the materialised path of ancestors for each node.
With this, we will be able to query the tree for a range of operations using the Django ORM.
We will start with our model:
class Node(models.Model):
node_id = models.AutoField(primary_key=True)
parent = models.ForeignKey('tree.node', related_name='children', null=True, blank=True)
class Meta:
app_label = 'tree'
We will also build an unmanaged model that will be backed by our view.
from django.contrib.postgres.fields import ArrayField
class Tree(models.Model):
root = models.ForeignKey(Node, related_name='+')
node = models.OneToOneField(Node, related_name='tree_node', primary_key=True)
ancestors = ArrayField(base_field=models.IntegerField())
class Meta:
app_label = 'tree'
managed = False
You’ll notice I’ve included a root relation. This could be obtained by using ancestors[0] if ancestors else node_id
, but that’s a bit cumbersome.
So, on to the View:
CREATE RECURSIVE VIEW tree_tree(root_id, node_id, ancestors) AS
SELECT node_id, node_id, ARRAY[]::INTEGER[]
FROM tree_node WHERE parent_id IS NULL
UNION ALL
SELECT tree.root_id, node.node_id, tree.ancestors || node.parent_id
FROM tree_node node INNER JOIN tree_tree tree ON (node.parent_id = tree.node_id)
I’ve written this view before, so I won’t go into any detail.
We can create a tree. Normally I wouldn’t specify the primary key, but since we want to talk about those values shortly, I will. It also means you can delete them, and recreate with this code, and not worry about the sequence values.
from tree.models import Node
Node.objects.bulk_create([
Node(pk=1),
Node(pk=2, parent_id=1),
Node(pk=3, parent_id=1),
Node(pk=4, parent_id=2),
Node(pk=5, parent_id=2),
Node(pk=6, parent_id=3),
Node(pk=7, parent_id=3),
Node(pk=8, parent_id=4),
Node(pk=9, parent_id=8),
Node(pk=10),
Node(pk=11, parent_id=10),
Node(pk=12, parent_id=11),
Node(pk=13, parent_id=11),
Node(pk=14, parent_id=12),
Node(pk=15, parent_id=12),
Node(pk=16, parent_id=12),
])
Okay, let’s start looking at how we might perform some operations on it.
We’ve already seen how to create a node, either root or leaf nodes. No worries there.
What about inserting an intermediate node, say between 11 and 12?
node = Node.objects.create(parent_id=11)
node.parent.children.exclude(pk=node.pk).update(parent=node)
I’m not sure if it is possible to do it in a single statement.
Okay, let’s jump to some tree-based statements. We’ll start by finding a sub-tree.
Node.objects.filter(tree_node__ancestors__contains=[2])
Oh, that’s pretty nice. It’s not necessarily sorted, but it will do for now.
We can also query directly for a root:
Node.objects.filter(tree_node__root=10)
We could spell that one as tree_node__ancestors__0=10
, but I think this is more explicit. Also, that one will not include the root node itself.
Deletions are also simple: if we can build a queryset, we can delete it. Thus, deleting a full tree could be done by following any queryset by a .delete()
Fetching a node’s ancestors is a little trickier: because we only have an array of node ids; thus it does two queries.
Node.objects.filter(pk__in=Node.objects.get(pk=15).tree_node.ancestors)
The count of ancestors doesn’t require the second query:
len(Node.objects.get(pk=15).tree_node.ancestors)
Getting ancestors to a given depth is also simple, although it still requires two queries:
Node.objects.filter(pk__in=Node.objects.get(pk=15).tree_node.ancestors[-2:])
This is a fairly simple way to enable relatively performance-aware queries of tree data. There are still places where it’s not perfect, and in reality, you’d probably look at building up queryset or model methods for wrapping common operations.