A few years ago, I wrote up some stuff about Postgres Composite Types in Django. Holy cow, that appears to be 5 years ago.
Anyway, it’s come up a bit recently on #postgresql on IRC, and I thought I might expand a little on how I’m currently using that concept, and some ideas that could be used to do more.
The composite type itself is quite straightforward: we store two values representing the opening time, and then the length of time that the business is open. This allows us to model things that go over midnight without having to worry about a bunch of checks about (start > finish), and whatever that means.
CREATE TYPE open_period AS (
start TIME,
length INTERVAL
);
We could have use a DOMAIN TYPE
to limit the length to less than or equal to 24 hours, however I’ll omit that for now.
From there, we can use the new type wherever we would use any other type: including in an array.
CREATE TABLE stores (
store_id SERIAL PRIMARY KEY,
name TEXT,
default_opening_hours open_period[7]
);
Nothing new here since the last post.
However, let’s look at coming up with a mechanism that prevents subsequent days from overlapping with one another. Since we have all of these in an array, we can write a single function that ensures the values are acceptable together. There are a couple of different approaches we could use. One would be to “materialise” the open periods, and then compare them to one another.
CREATE OR REPLACE FUNCTION materialise(open_period, DATE)
RETURNS TSRANGE AS $$
SELECT TSRANGE(
($2 || 'T' || $1.start || 'Z')::TIMESTAMP,
($2 || 'T' || $1.start || 'Z')::TIMESTAMP + $1.length
);
$$ LANGUAGE SQL STRICT IMMUTABLE;
CREATE OR REPLACE FUNCTION materialise(open_period)
RETURNS TSRANGE AS $$
SELECT materialise($1, '1979-01-01'::DATE);
$$ LANGUAGE SQL STRICT IMMUTABLE;
We have a version there that takes a specific day, but also one that just uses the epoch date. That may be useful later…
…but right now we want to be able to apply subsequent days to each item in the array, and then look for overlaps.
WITH default_opening_hours AS (
SELECT UNNEST(ARRAY[
('09:00', '08:00')::open_period, -- Monday, but we won't really use that today.
('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '12:00')::open_period,
('09:00', '08:00')::open_period,
('10:00', '07:00')::open_period,
('11:00', '06:00')::open_period
]) AS hours
), materialised_opening_hours AS (
SELECT materialise(hours, (now() + INTERVAL '1 day' * row_number() OVER ())::DATE) AS hours
FROM default_opening_hours
), overlapping_periods AS (
SELECT hours && LEAD(hours, 1) OVER () AS overlap
FROM materialised_opening_hours
)
SELECT * FROM overlapping_periods WHERE overlap;
We don’t (at this point in time) really mind if the weekdays that the open periods refer to is the correct weekday: instead we just need to ensure that we have 7 consecutive days, with the sequence of open_periods materialised to the correct value based on the offset from the first day.
This is pretty close: it will find any overlaps between days, except for if the finish of the last day overlaps with the start of the next day. We can cheat a little to make that work:
WITH default_opening_hours AS (
SELECT UNNEST(ARRAY[
('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '12:00')::open_period,
('09:00', '08:00')::open_period,
('10:00', '07:00')::open_period,
('11:00', '06:00')::open_period
]) AS hours
), materialised_opening_hours AS (
SELECT materialise(hours, (now() + INTERVAL '1 day' * row_number() OVER ())::DATE) AS hours
FROM default_opening_hours
UNION ALL
SELECT materialise((SELECT hours FROM default_opening_hours LIMIT 1),
(now() + INTERVAL '8 days')::DATE
)
), overlapping_periods AS (
SELECT hours && LEAD(hours, 1) OVER () AS overlap
FROM materialised_opening_hours
)
SELECT * FROM overlapping_periods WHERE overlap;
Let’s put a couple of values in there to see that the overlaps are detected:
WITH default_opening_hours AS (
SELECT UNNEST(ARRAY[
('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '28:00')::open_period,
('09:00', '12:00')::open_period,
('09:00', '08:00')::open_period,
('10:00', '07:00')::open_period,
('11:00', '24:00')::open_period
]) AS hours
), materialised_opening_hours AS (
SELECT materialise(hours, (now() + INTERVAL '1 day' * row_number() OVER ())::DATE) AS hours
FROM default_opening_hours
UNION ALL
SELECT materialise((SELECT hours FROM default_opening_hours LIMIT 1),
(now() + INTERVAL '8 days')::DATE)
), overlapping_periods AS (
SELECT hours && LEAD(hours, 1) OVER () AS overlap
FROM materialised_opening_hours
)
SELECT * FROM overlapping_periods WHERE overlap;
overlap
─────────
t
t
(2 rows)
Now, we can bundle this up into a function that we can then use in a CHECK CONSTRAINT
(as we cannot use a subquery directly in a check constraint):
CREATE OR REPLACE FUNCTION find_subsequent_day_overlaps(open_period[])
RETURNS BOOLEAN AS $$
SELECT NOT EXISTS (
WITH materialised_opening_hours AS (
SELECT materialise(hours, (now() + INTERVAL '1 day' * row_number() OVER ())::DATE) AS hours
FROM unnest($1) hours
UNION ALL
SELECT materialise($1[1], (now() + INTERVAL '8 days')::DATE)
), overlapping_periods AS (
SELECT hours && LEAD(hours, 1) OVER () AS overlap FROM materialised_opening_hours
)
SELECT * FROM overlapping_periods WHERE overlap
)
$$ LANGUAGE SQL STRICT IMMUTABLE;
ALTER TABLE store
ADD CONSTRAINT prevent_default_opening_hours_overlap
CHECK (find_subsequent_day_overlaps(default_opening_hours));
And, now to check:
INSERT INTO stores (name, default_opening_hours) VALUES
(
'John Martins',
ARRAY[
('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '12:00')::open_period,
('09:00', '08:00')::open_period,
('10:00', '07:00')::open_period,
('11:00', '06:00')::open_period
]
);
And with invalid data:
INSERT INTO stores (name, default_opening_hours) VALUES (
'Foo',
ARRAY[('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '08:00')::open_period,
('09:00', '12:00')::open_period,
('09:00', '08:00')::open_period,
('10:00', '07:00')::open_period,
('11:00', '24:00')::open_period]);
…which throws an exception:
ERROR: new row for relation "store" violates check constraint "prevent_default_opening_hours_overlap"
DETAIL: Failing row contains (2, Foo, {"(09:00:00,08:00:00)","(09:00:00,08:00:00)","(09:00:00,08:00:00...).
Righto, what other things might we want to do with these composite types?
Some businesses have a concept of “Day Parts”, for instance, within a single day we may want to look at a sub-set of that day. For instance, sales during Breakfast may have a different set of Key Performance Indicators than those during Lunch or Tea. So, we may want to store something like:
+------------+------------+-------------+
| Day Period | Start time | Finish time |
+============+============+=============+
| Breakfast | 06:00 | 10:00 |
| Lunch | 11:00 | 14:00 |
| Tea | 16:00 | 21:00 |
+------------+------------+-------------+
Again, it might make sense to store these as an open_period
instead, because they could go over midnight. We’ll also want the name to be unique per store, but that’s something we can do with a plain old unique index:
CREATE TABLE day_parts (
day_part_id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores(store_id),
name TEXT,
period OPEN_PERIOD
);
CREATE UNIQUE INDEX distinct_name_per_day_period ON day_parts (store_id, name)
We can use an exclusion constraint to prevent overlaps, however you may need to enable support first:
CREATE EXTENSION btree_gist;
Now, let’s see the exclusion constraint:
ALTER TABLE day_parts
ADD CONSTRAINT prevent_overlapping_day_parts
EXCLUDE USING gist(
materialise(period) WITH &&,
store_id WITH =
);
Turns out that is actually easier to implement than the values in the array!
The other thing we may want to do is annotate on the Day Period to an object of some sort. To do this we will need to materialise all of the day periods for the given day(s), and see which one of them our timestamp is within. We will expand on a couple of things here: specifically, we need to have a timezone within which our store is located. To make things easier to follow, we will have all of the DDL code anew. This is partly because this example will not use the concept of default opening hours.
CREATE TABLE stores (
store_id SERIAL PRIMARY KEY,
name TEXT UNIQUE NOT NULL,
timezone TEXT NOT NULL CHECK (now() AT TIME ZONE timezone IS NOT NULL)
-- Note we validate that this column contains a valid timezone by
-- attempting to coerce now() to that timezone: this will report
-- back an error if the timezone name is not recognised.
);
CREATE TABLE day_parts (
day_part_id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores (store_id),
name TEXT,
period OPEN_PERIOD,
CONSTRAINT prevent_overlapping_day_parts EXCLUDE USING gist(
materialise(period) WITH &&,
store_id WITH =
)
);
CREATE UNIQUE INDEX distinct_name_per_day_period ON day_parts(store_id, name);
CREATE TABLE transactions (
transaction_id SERIAL PRIMARY KEY,
store_id INTEGER REFERENCES stores (store_id),
timestamp TIMESTAMPTZ,
amount NUMERIC
);
And now add some data:
INSERT INTO stores (name, timezone)
VALUES ('John Martins', 'Australia/Adelaide');
INSERT INTO day_parts (store_id, name, period)
VALUES (1, 'Morning', ('09:00', '02:00')),
(1, 'Lunch', ('11:00', '03:00')),
(1, 'Afternoon', ('14:00', '03:00')),
(1, 'Evening', ('17:00', '04:00'));
INSERT INTO transactions (store_id, timestamp, amount)
VALUES (1, '2019-05-27T01:25:22', '33.77'),
(1, '2019-05-27T04:33:47', '724.75'),
(1, '2019-05-27T06:00:42', '47.48'),
(1, '2019-05-27T08:33:12', '3.44');
The first thing we want to do is show the transactions at the time it was in the store when they were completed:
SELECT transactions.*,
transactions.timestamp AT TIME ZONE stores.timezone AS local_time
FROM transactions
INNER JOIN stores USING (store_id)
transaction_id │ store_id │ timestamp │ amount │ local_time
1 │ 1 │ 2019-05-27 01:25:22+00 │ 33.77 │ 2019-05-27 10:55:22
2 │ 1 │ 2019-05-27 04:33:47+00 │ 724.75 │ 2019-05-27 14:03:47
3 │ 1 │ 2019-05-27 06:00:42+00 │ 47.48 │ 2019-05-27 15:30:42
4 │ 1 │ 2019-05-27 08:33:12+00 │ 3.44 │ 2019-05-27 18:03:12
Next, we want to annotate on which day part corresponds to that local time:
SELECT trans.*,
day_part.name AS day_part
FROM (
SELECT transactions.*,
transactions.timestamp AT TIME ZONE stores.timezone AS local_time
FROM transactions
INNER JOIN stores USING (store_id)
) trans
LEFT OUTER JOIN LATERAL (
SELECT materialise(day_parts.period, trans.local_time::DATE) AS day_part,
day_parts.name
FROM day_parts
WHERE day_parts.store_id = trans.store_id
) day_part ON (day_part @> local_time)
transaction_id │ store_id │ timestamp │ amount │ local_time │ day_part
────────────────┼──────────┼────────────────────────┼────────┼─────────────────────┼───────────
1 │ 1 │ 2019-05-27 01:25:22+00 │ 33.77 │ 2019-05-27 10:55:22 │ Morning
2 │ 1 │ 2019-05-27 04:33:47+00 │ 724.75 │ 2019-05-27 14:03:47 │ Afternoon
3 │ 1 │ 2019-05-27 06:00:42+00 │ 47.48 │ 2019-05-27 15:30:42 │ Afternoon
4 │ 1 │ 2019-05-27 08:33:12+00 │ 3.44 │ 2019-05-27 18:03:12 │ Evening
From there, we could look at aggregation within day parts, or comparisons between different days, but only the same day part.
Those of you paying attention may notice that I used TSRANGE
instead of TSTZRANGE
in the materialise functions. Can we look at a version of these functions that accepts a timezone as well as a date (and open_period), and gives back a TSTZRANGE?
CREATE OR REPLACE FUNCTION materialise(open_period, DATE, timezone TEXT)
RETURNS TSTZRANGE AS $$
SELECT TSTZRANGE(
($2 || 'T' || $1.start)::TIMESTAMP AT TIME ZONE timezone,
(($2 || 'T' || $1.start)::TIMESTAMP + $1.length) AT TIME ZONE timezone
);
$$ LANGUAGE SQL STRICT IMMUTABLE;
Now we can rewrite our last query:
SELECT transactions.*,
day_part.name AS day_part
FROM transactions
LEFT OUTER JOIN LATERAL (
SELECT materialise(day_parts.period, transactions.timestamp::DATE, stores.timezone) AS day_part,
day_parts.name
FROM day_parts
INNER JOIN stores USING (store_id)
WHERE day_parts.store_id = transactions.store_id
) day_part ON (day_part.day_part @> transactions.timestamp)
transaction_id │ store_id │ timestamp │ amount │ day_part
1 │ 1 │ 2019-05-27 01:25:22+00 │ 33.77 │ Morning
2 │ 1 │ 2019-05-27 04:33:47+00 │ 724.75 │ Afternoon
3 │ 1 │ 2019-05-27 06:00:42+00 │ 47.48 │ Afternoon
4 │ 1 │ 2019-05-27 08:33:12+00 │ 3.44 │ Evening
Although, I think this might be a bit harder to do aggregation per-day, because you’d still need to get the “local” timestamp to put them on the same day, although, that’s actually part of the materialisation of the store’s full open period anyway.