Aggregating ranges in Python
-
Comments:
- here.
This is a bit of a follow-up to Aggregating Ranges in Postgres.
Since we don’t have a nice range type in Python, we will just use a tuple that contains a lower and upper bound. We will assume that this is a canonical form: the lower bound is inclusive, the upper bound is non-inclusive. We will also assume (for simplicity) that the 0-th item is always less than the 1-th item in the tuple.
Given a list of these range-tuples, we want to do the two things we saw in the previous post.
- Aggregate them into the simplest possible array of range-tuples, using a union operation.
- Determine the parts that are missing.
We will use in in a similar way:
>>> data =[(2,3),(3,4),(11,12),(5,12),(4,5)]
>>> aggregate_union(data)
(2, 12)
>>> missing_ranges(data)
[(None, 2), (12, None)]
Luckily, None
sorts before any integer in python, so we will just be able to use the normal sort.
def aggregate_union(data):
if not data:
return None
sorted_data = sorted(data)
result = sorted_data[0]
for lower, upper in sorted_data[1:]:
# If we ever find result[1] is None, we know it covers any
# other possible values, so we can stop at that point.
if result[1] is None:
break
if lower > result[1]:
return None # Or raise an exception, if you want.
result = (result[0], max(upper, result[1]))
return result
The alternative, missing_ranges(data)
takes cues from the SQL version too.
def missing_ranges(data):
if not data:
return (None, None)
result = []
# We do a little fancy stuff here: append an extra item that
# mimics what we were able to use lead for, but in a different
# way so we can use [i + 1] later.
sorted_data = sorted(data) + [(None, None)]
if sorted_data[0][0] is not None:
# Note: the upper bound here is not quite correct!
result.append((None, sorted_data[0][1]))
for i, (lower, upper) in enumerate(sorted_data):
# Grab the next item in our sorted list. Normally, we would check
# to see if we are at the end, but we will rely on the break later
# on: we can always stop processing when the upper bound of our
# next item is `None`
next = sorted_data[i + 1]
if upper < next[0] or (next[0] is None and upper is not None):
result.append((upper, next[0]))
# Now, exit before we ever get to a bounds error on getting the next item.
if next[1] is None:
break
return result
However, there is a problem here that is nicely solved by the Range types within postgres:
>>> missing_ranges(data)
[(None, 3), (12, None)]
We need to subtract from the first object’s upper bound. Which is easy with integer tuple-ranges (basically, any discrete range type), but not so much with continuous ranges.