Postgres Generated Columns
A little while ago, I wrote about creating a nice way to have a Django ComputedField. It is pretty neat, except it needs to do some black magic to sniff up the stack to work around a limitation in the way a Ref/Col works in Django.
The way it works is that you define the expression in Python, and it evaluates it in the database, allowing you to query based on this, and have it automatically annotated on.
What it doesn’t do, however, is actually store that value in the database. Indeed, if you are actually querying on this column, you’d probably want to have a functional index that uses the same expression, so that the database can do a reasonable job of improving query times on that column.
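As a rough sketch of what such a functional index might look like: the table name person_plain and the index name below are hypothetical, made up purely for illustration, and the expression is the same name-concatenation used later in this post.

```sql
-- Hypothetical table, used only for this sketch.
CREATE TABLE person_plain (
    person_id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    first_name TEXT,
    last_name TEXT
);

-- Index the expression itself; no extra column is stored in the table.
-- Note the extra pair of parentheses around the expression.
CREATE INDEX person_plain_full_name_idx ON person_plain (
    (COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))
);

-- A query needs to use the same expression for the planner to consider the index.
SELECT person_id
  FROM person_plain
 WHERE COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') = 'alice aardvark';
```

The downside is that the expression has to be kept in sync in two places: the index definition, and whatever builds the query.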
New in Postgres 12 is a feature that really piqued my interest: Generated Columns.
These are basically what the ComputedField does, but at the database level. Rather than being an expression that is evaluated at query time, it is an expression that is evaluated at write time, and stored in an actual column (that could then have an index applied to it).
Let’s have a look at an example:
CREATE TABLE person (
    person_id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    first_name TEXT,
    last_name TEXT,
    full_name TEXT GENERATED ALWAYS AS (
        COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
    ) STORED
);
Again, I’m aware I’m failing to note at least one of the falsehoods programmers believe about names.
Notes about this:
- I’ve used the similar (preferred) syntax for generating the primary key.
- You must have the keyword STORED at the end of the column definition: or more specifically, the syntax must be <column> <type> GENERATED ALWAYS AS (<expression>) STORED.
- You may only refer to other columns within the same row: similar to how a functional index would work.
- You may not refer to other generated columns: that would likely require parsing the expressions to determine which one to calculate first. I’d love to see Postgres implement that at some point though!
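To make that last restriction concrete, here is a small sketch. The tables person_chained and person_repeated and the column full_name_upper are hypothetical names that exist only for this illustration. The first statement should be rejected by Postgres 12, because full_name_upper refers to another generated column; the second works around it by repeating the underlying expression.

```sql
-- Expected to fail: full_name_upper refers to full_name, which is itself generated.
CREATE TABLE person_chained (
    first_name TEXT,
    last_name TEXT,
    full_name TEXT GENERATED ALWAYS AS (
        COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
    ) STORED,
    full_name_upper TEXT GENERATED ALWAYS AS (UPPER(full_name)) STORED
);

-- Workaround: build the second column from the plain columns directly.
CREATE TABLE person_repeated (
    first_name TEXT,
    last_name TEXT,
    full_name TEXT GENERATED ALWAYS AS (
        COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
    ) STORED,
    full_name_upper TEXT GENERATED ALWAYS AS (
        UPPER(COALESCE(first_name, '') || ' ' || COALESCE(last_name, ''))
    ) STORED
);
```

Run the two statements separately: if they share a transaction, the failure of the first will abort the second as well.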
So, let’s have a look at that with some data:
INSERT INTO person (first_name, last_name)
VALUES
    ('alice', 'aardvark'),
    ('bob', 'burger'),
    ('chuck', NULL),
    (NULL, 'darris');
And when we query it:
SELECT * FROM person;
 person_id │ first_name │ last_name │ full_name
------------------------------------------------------
         1 │ alice      │ aardvark  │ alice aardvark
         2 │ bob        │ burger    │ bob burger
         3 │ chuck      │ <NULL>    │ chuck
         4 │ <NULL>     │ darris    │  darris
(4 rows)
Oh, bother. We didn’t want the space before ‘darris’ (or the one you can’t see, after ‘chuck’). We’ll have to fix that in a sec.
So, what happens when we try to write to the full_name column?
UPDATE person SET first_name = 'dave', full_name='foo' WHERE first_name IS NULL;
ERROR: column "full_name" can only be updated to DEFAULT
DETAIL: Column "full_name" is a generated column.
Okay, that’s nice to know. If the write had been silently ignored, we could have just used a custom Django field and ignored the value, but since it is an error, we’ll need something similar to how ComputedField prevents writing values. I’ll have to investigate that further.
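The error message does hint at the one value that is accepted: DEFAULT. A minimal sketch of that, wrapped in a transaction that is rolled back so the example data used in the rest of this post stays untouched:

```sql
BEGIN;

-- Explicitly asking for DEFAULT is allowed: the column is simply recomputed.
UPDATE person
   SET first_name = 'dave', full_name = DEFAULT
 WHERE first_name IS NULL;

-- With the expression as currently defined, this row should now have
-- a full_name of 'dave darris'.
SELECT first_name, last_name, full_name
  FROM person
 WHERE last_name = 'darris';

-- Undo the change, so the table is back to how it was.
ROLLBACK;
```

Leaving full_name out of the SET clause altogether has the same effect, which is probably the simpler thing for an ORM field to do.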
But, back to the fact that I forgot to trim any leading or trailing spaces. It turns out that, in Postgres 12, there is no way to alter the expression that is being used in a generated column. Which, when you think a little more about it, sort-of makes sense. At the very least, it would need to write a new value to every row where the new value was different to the old value.
Instead, you need to drop the column, and re-add it with the correct expression. You’ll almost certainly want to do this in a transaction:
BEGIN;
ALTER TABLE person DROP COLUMN full_name;
ALTER TABLE person ADD COLUMN full_name TEXT
    GENERATED ALWAYS AS (TRIM(
        COALESCE(first_name, '') || ' ' ||
        COALESCE(last_name, '')
    )) STORED;
COMMIT;
And now we can query our table again:
SELECT * FROM person;
 person_id │ first_name │ last_name │ full_name
------------------------------------------------------
         1 │ alice      │ aardvark  │ alice aardvark
         2 │ bob        │ burger    │ bob burger
         3 │ chuck      │ <NULL>    │ chuck
         4 │ <NULL>     │ darris    │ darris
(4 rows)
Sweet.
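Since the value now lives in a real column, it can be indexed like any other. A minimal sketch, where the index name person_full_name_idx is just one I’ve made up for illustration:

```sql
-- A plain index on the stored column: there is no need to repeat the expression.
CREATE INDEX person_full_name_idx ON person (full_name);

-- Queries refer to the column directly.
SELECT person_id, full_name
  FROM person
 WHERE full_name = 'bob burger';
```

On a four-row table the planner will most likely still choose a sequential scan, so don’t expect to see the index used until there is a decent amount of data. Also worth keeping in mind: dropping the column (as we did above to change the expression) drops any index on it too, so the index would need to be recreated afterwards.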