Automatically expire rows in Postgres

2021-09-09 @ 14:16:36

here.

Here’s a fun idea: how to make a database table where the data is kept only for a certain period of time?

One solution could be to have a column updated_at, which is set to the current timestamp each time a row is updated. Then you need a scheduled task that periodically clears out rows older than the threshold.

We can do this in a single trigger function:

CREATE OR REPLACE FUNCTION keep_for() RETURNS TRIGGER AS $$
  DECLARE
    primary_key_name TEXT = (
        SELECT attname
        FROM pg_index
        JOIN pg_attribute ON
            attrelid = indrelid
            AND attnum = ANY(indkey)
        WHERE indrelid = TG_RELID AND indisprimary
    );
    primary_key_value TEXT;
  BEGIN
    IF TG_WHEN <> 'BEFORE' THEN
      RAISE EXCEPTION 'keep_for() may only run as a BEFORE trigger';
    END IF ;

    IF TG_ARGV[0]::INTERVAL IS NULL THEN
      RAISE EXCEPTION 'keep_for() must be installed with an INTERVAL to keep data for';
    END IF;

    NEW.updated_at = now();

    IF TG_OP = 'INSERT' THEN
      EXECUTE 'DELETE FROM ' || quote_ident(TG_TABLE_NAME) || ' WHERE updated_at < now() - INTERVAL ' || quote_literal(TG_ARGV[0]::TEXT) || ';';
    ELSIF TG_OP = 'UPDATE' THEN

      EXECUTE 'DELETE FROM ' || quote_ident(TG_TABLE_NAME)
                             || ' WHERE updated_at < now() - INTERVAL ' || quote_literal(TG_ARGV[0]::TEXT)
                             || ' AND ' || quote_ident(primary_key_name) || ' <> ' || quote_literal(row_to_json(NEW) ->> primary_key_name) || ';';
    END IF;

    RETURN NEW;
  END;
$$ LANGUAGE plpgsql;

Now, we can install this on a table (that has a single primary key column):

CREATE TRIGGER keep_for_one_week
BEFORE UPDATE OR INSERT ON <target_table>
FOR EACH ROW
EXECUTE PROCEDURE keep_for('1 week');

This only works on tables that are updated somewhat regularly, and that are not too big. You’ll also want an index on that updated_by column.

This was a thought experiment (which I did implement), until my co-worker pointed out that there was no reason to actually store this data in the database to begin with, since we already use Redis we can just set the TTL on the key when we set it, and it will automatically expire without us having to do extra work.