Merging Adjacent Ranges in Postgres
-
Comments:
- here.
Previously, I detailed a solution to split/trim/replace overlapping items in a table. Subsequently, I decided I needed to merge all adjacent items that could be merged. In this case, that was with two other fields (only one of which was subject to the exclusion constraint) being identical in adjacent periods.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE team_membership (
membership_id SERIAL,
player_id INTEGER,
team_id INTEGER,
period DATERANGE,
CONSTRAINT prevent_overlapping_memberships EXCLUDE USING gist(player_id WITH =, period WITH &&)
);
Before we can implement the plpgsql trigger function, we need to tell Postgres how to aggregate ranges:
CREATE AGGREGATE sum(anyrange) (
stype = anyrange,
sfunc = range_union
);
We should note at this point that range_union
, or +
(hence the reason I’ve called it SUM
) will fail with an error if the two ranges that are being combined do not overlap or touch. We must make sure that in any queries where we are going to use it, that all of the ranges will overlap (and I believe they also must be in “order”, so that as we perform the union on each range in a “reduce” manner we never end up with non-contiguous ranges).
So, let’s look at the trigger function. Initially, I wrote this as two queries:
CREATE OR REPLACE FUNCTION merge_adjacent()
RETURNS TRIGGER AS $$
BEGIN
NEW.period = (SELECT SUM(period) FROM (SELECT NEW.period UNION ALL ...));
DELETE FROM team_membership ...;
RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;
This required me to duplicate the WHERE clauses, and was messy.
Then I remembered you can use the RETURNING
clause, and use a CTE, with a SELECT INTO
:
CREATE OR REPLACE FUNCTION merge_adjacent()
RETURNS TRIGGER AS $$
BEGIN
WITH matching AS (
DELETE FROM team_membership mem
WHERE mem.player_id = NEW.player_id
AND mem.team_id = NEW.team_id
AND (mem.period -|- NEW.period OR mem.period && NEW.period)
AND mem.membership_id <> NEW.membership_id
RETURNING period
)
SELECT INTO NEW.period (
SELECT SUM(period) FROM (
SELECT NEW.period
UNION ALL
SELECT period FROM matching
ORDER BY period
) _all
);
RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;
CREATE TRIGGER merge_adjacent
BEFORE INSERT OR UPDATE ON team_membership
FOR EACH ROW EXECUTE PROCEDURE merge_adjacent();
The other thing to note about this construct is that it will only work on “already merged” data: if you had ranges:
[2019-01-01, 2019-01-04)
[2019-01-04, 2019-02-02)
# Note there is a gap here...
[2019-05-01, 2019-05-11)
[2019-05-11, 2020-01-01)
and you added in a value to the missing range:
INSERT INTO range (period) VALUES ('[2019-02-02, 2019-05-01)')
You would not merge all of the ranges, only those immediately adjacent. That is, you would wind up with rows:
[2019-01-01, 2019-01-04)
[2019-01-04, 2019-05-11)
[2019-05-11, 2020-01-01)
However, if this trigger is active on the table you would never get to the stage where your data was adjacent but not merged.
This post was updated on 2021-01-04 to ensure that updates to a single row will not result in an error.