Today, I’m going to walk through modelling a tree in Django, using an Adjacency List, and a Postgres View that dynamically creates the materialised path of ancestors for each node.
With this, we will be able to query the tree for a range of operations using the Django ORM.
We will start with our model:
class Node(models.Model): node_id = models.AutoField(primary_key=True) parent = models.ForeignKey('tree.node', related_name='children', null=True, blank=True) class Meta: app_label = 'tree'
We will also build an unmanaged model that will be backed by our view.
from django.contrib.postgres.fields import ArrayField class Tree(models.Model): root = models.ForeignKey(Node, related_name='+') node = models.OneToOneField(Node, related_name='tree_node', primary_key=True) ancestors = ArrayField(base_field=models.IntegerField()) class Meta: app_label = 'tree' managed = False
You’ll notice I’ve included a root relation. This could be obtained by using
ancestors if ancestors else node_id, but that’s a bit cumbersome.
So, on to the View:
CREATE RECURSIVE VIEW tree_tree(root_id, node_id, ancestors) AS SELECT node_id, node_id, ARRAY::INTEGER FROM tree_node WHERE parent_id IS NULL UNION ALL SELECT tree.root_id, node.node_id, tree.ancestors || node.parent_id FROM tree_node node INNER JOIN tree_tree tree ON (node.parent_id = tree.node_id)
I’ve written this view before, so I won’t go into any detail.
We can create a tree. Normally I wouldn’t specify the primary key, but since we want to talk about those values shortly, I will. It also means you can delete them, and recreate with this code, and not worry about the sequence values.
from tree.models import Node Node.objects.bulk_create([ Node(pk=1), Node(pk=2, parent_id=1), Node(pk=3, parent_id=1), Node(pk=4, parent_id=2), Node(pk=5, parent_id=2), Node(pk=6, parent_id=3), Node(pk=7, parent_id=3), Node(pk=8, parent_id=4), Node(pk=9, parent_id=8), Node(pk=10), Node(pk=11, parent_id=10), Node(pk=12, parent_id=11), Node(pk=13, parent_id=11), Node(pk=14, parent_id=12), Node(pk=15, parent_id=12), Node(pk=16, parent_id=12), ])
Okay, let’s start looking at how we might perform some operations on it.
We’ve already seen how to create a node, either root or leaf nodes. No worries there.
What about inserting an intermediate node, say between 11 and 12?
node = Node.objects.create(parent_id=11) node.parent.children.exclude(pk=node.pk).update(parent=node)
I’m not sure if it is possible to do it in a single statement.
Okay, let’s jump to some tree-based statements. We’ll start by finding a sub-tree.
Oh, that’s pretty nice. It’s not necessarily sorted, but it will do for now.
We can also query directly for a root:
We could spell that one as
tree_node__ancestors__0=10, but I think this is more explicit. Also, that one will not include the root node itself.
Deletions are also simple: if we can build a queryset, we can delete it. Thus, deleting a full tree could be done by following any queryset by a
Fetching a node’s ancestors is a little trickier: because we only have an array of node ids; thus it does two queries.
The count of ancestors doesn’t require the second query:
Getting ancestors to a given depth is also simple, although it still requires two queries:
This is a fairly simple way to enable relatively performance-aware queries of tree data. There are still places where it’s not perfect, and in reality, you’d probably look at building up queryset or model methods for wrapping common operations.