Tree data as a nested list
-
Comments:
- here.
One of the nice things about Adjacency Lists as a method of storing tree structures is that there is not much redundancy: you only store a reference to the parent, and that’s it.
It does mean that getting that data in a nested object is a bit complicated. I’ve written before about getting data out of a database: I’ll revisit that again I’m sure, but for now, I’m going to deal with data that has the following shape: that is, has been built up into a Materialized Path:
[
{
"node": 1,
"ancestors": [],
"label": "Australia"
},
{
"node": 2,
"ancestors": [1],
"label": "South Australia"
},
{
"node": 3,
"ancestors": [1],
"label": "Victoria"
},
{
"node": 4,
"ancestors": [1, 2],
"label": "South-East"
},
{
"node": 5,
"ancestors": [1, 3],
"label": "Western Districts"
},
{
"node": 6,
"ancestors": [],
"label": "New Zealand"
},
{
"node": 7,
"ancestors": [1, 2],
"label": "Barossa Valley"
},
{
"node": 8,
"ancestors": [1, 2],
"label": "Riverland"
}
]
From here, we want to build up something that looks like:
- Australia
- South Australia
- Barossa Valley
- Riverland
- South East
- Victoria
- Western Districts
- South Australia
- New Zealand
Or, a nested python data structure:
[
('Australia', [
('South Australia', [
('Barossa Valley', []),
('Riverland', []),
('South-East', [])
]),
('Victoria', [
('Western Districts', [])
])
]),
('New Zealand', [])
]
You’ll see that each node is a 2-tuple, and each set of siblings is a list. Even a node with no children still gets an empty list.
We can build up this data structure in two steps: based on the fact that a dict, as key-value pairs, matches a 2-tuple. That is, we will start by creating:
{
1: {
2: {
4: {},
7: {},
8: {},
},
3: {
5: {},
}
},
6: {},
}
You might be reaching for python’s defaultdict
class at this point, but there is a slightly nicer way:
class Tree(dict):
def __missing__(self, key):
value = self[key] = type(self)()
return value
(Note: This class, and the seed of the idea, came from this answer on StackOverflow).
We can also create a recursive method on this class that creates a node and all of it’s ancestors:
def insert(self, key, ancestors):
if ancestors:
self[ancestors[0]].insert(key, ancestors[1:])
else:
self[key]
>>> tree = Tree()
>>> for node in data:
... tree.insert(node['node'], node['ancestors'])
>>> print tree
{1: {2: {8: {}, 4: {}, 7: {}}, 3: {5: {}}}, 6: {}}
Looking good.
Let’s make another method that allows us to actually insert the labels (and apply a sort, if we want):
def label(self, label_dict, sort_key=lambda x: x[0]):
return sorted([
(label_dict.get(key), value.label(label_dict, sort_key))
for key, value in self.items()
], key=sort_key)
We also need to build up the simple key-value store to pass as label_dict
, but that’s pretty easy.
Let’s look at the full code: first the complete class:
class Tree(dict):
"""Simple Tree data structure
Stores data in the form:
{
"a": {
"b": {},
"c": {},
},
"d": {
"e": {},
},
}
And can be nested to any depth.
"""
def __missing__(self, key):
value = self[key] = type(self)()
return value
def insert(self, node, ancestors):
"""Insert the supplied node, creating all ancestors as required.
This expects a list (possibly empty) containing the ancestors,
and a value for the node.
"""
if not ancestors:
self[node]
else:
self[ancestors[0]].insert(node, ancestors[1:])
def label(self, labels, sort_key=lambda x: x[0]):
"""Return a nested 2-tuple with just the supplied labels.
Realistically, the labels could be any type of object.
"""
return sorted([
(
labels.get(key),
value.label(labels, sort_key)
) for key, value in self.items()
], key=sort_key)
Now, using it:
>>> tree = Tree()
>>> labels = {}
>>>
>>> for node in data:
>>> tree.insert(node['node'], node['ancestors'])
>>> labels[node['node']] = node['label']
>>>
>>> from pprint import pprint
>>> pprint(tree.label(labels))
[('Australia',
[('South Australia',
[('Barossa Valley', []), ('Riverland', []), ('South-East', [])]),
('Victoria', [('Western Districts', [])])]),
('New Zealand', [])]
Awesome. Now use your template rendering of choice to turn this into a nicely formatted list.