Aggregating ranges in Python

Comments:
 here.
This is a bit of a followup to Aggregating Ranges in Postgres.
Since we don’t have a nice range type in Python, we will just use a tuple that contains a lower and upper bound. We will assume that this is a canonical form: the lower bound is inclusive, the upper bound is noninclusive. We will also assume (for simplicity) that the 0th item is always less than the 1th item in the tuple.
Given a list of these rangetuples, we want to do the two things we saw in the previous post.
 Aggregate them into the simplest possible array of rangetuples, using a union operation.
 Determine the parts that are missing.
We will use in in a similar way:
>>> data =[(2,3),(3,4),(11,12),(5,12),(4,5)]
>>> aggregate_union(data)
(2, 12)
>>> missing_ranges(data)
[(None, 2), (12, None)]
Luckily, None
sorts before any integer in python, so we will just be able to use the normal sort.
def aggregate_union(data):
if not data:
return None
sorted_data = sorted(data)
result = sorted_data[0]
for lower, upper in sorted_data[1:]:
# If we ever find result[1] is None, we know it covers any
# other possible values, so we can stop at that point.
if result[1] is None:
break
if lower > result[1]:
return None # Or raise an exception, if you want.
result = (result[0], max(upper, result[1]))
return result
The alternative, missing_ranges(data)
takes cues from the SQL version too.
def missing_ranges(data):
if not data:
return (None, None)
result = []
# We do a little fancy stuff here: append an extra item that
# mimics what we were able to use lead for, but in a different
# way so we can use [i + 1] later.
sorted_data = sorted(data) + [(None, None)]
if sorted_data[0][0] is not None:
# Note: the upper bound here is not quite correct!
result.append((None, sorted_data[0][1]))
for i, (lower, upper) in enumerate(sorted_data):
# Grab the next item in our sorted list. Normally, we would check
# to see if we are at the end, but we will rely on the break later
# on: we can always stop processing when the upper bound of our
# next item is `None`
next = sorted_data[i + 1]
if upper < next[0] or (next[0] is None and upper is not None):
result.append((upper, next[0]))
# Now, exit before we ever get to a bounds error on getting the next item.
if next[1] is None:
break
return result
However, there is a problem here that is nicely solved by the Range types within postgres:
>>> missing_ranges(data)
[(None, 3), (12, None)]
We need to subtract from the first object’s upper bound. Which is easy with integer tupleranges (basically, any discrete range type), but not so much with continuous ranges.