Horizontal Partitioning in Postgres

2014-10-28 @ 22:00:07

here.

It never surprises me when I find another neat feature of Postgres that makes doing a potentially difficult task simple. Today, I discovered that since 9.0, Postgres has supported a really powerful way to horizontally partition data into separate tables.

For those who haven’t heard of the concept before, horizontal partitioning is where different rows are stored in different tables, depending upon something about the data within the row.

For instance, we could partition audit data into tables based upon the timestamp of the action. Thus, all entries created during October, 2014 would be stored in a table called audit_2014_10, and entries created during March 2011 would be stored in a table called audit_2011_03. And so on. Alternatively, we could have a single table per year, or however we want to partition.

This is called “Range” partitioning.

There are a couple of ways you could think about doing this in a DBMS. You could have a writable view that redirects the writes to the correct table. The problem then is that when you add a new child table, you need to rewrite your view.

Instead, we can use Postgres’ neat table inheritance to handle all of this for you. Indeed, it is discussed in the Postgres documentation.

If we inherit one table from another, and do a query on the parent table, we will also get the rows that match the query from all child tables. That obviates the need for a view that uses UNION ALL or similar to fetch the data.

I’m going to use a toy example here, that just contains a single column.

CREATE TABLE "data" ("value" TIMESTAMPTZ);

CREATE TABLE "data_2014" (
  CHECK (
    "value" >= '2014-01-01' AND "value" < '2015-01-01')
) INHERITS ("data");

CREATE TABLE "data_2015" (
  CHECK (
    "value" >= '2015-01-01' AND "value" < '2016-01-01')
) INHERITS ("data");

This, however, is only part of the picture. Any data that is added to either of the child tables (or indeed the parent table, but we won’t be doing that), will be returned when we query the parent table.

But we want to ensure that data is partitioned out nicely. For that, we can use a trigger.

A naïve trigger may look something like:

CREATE OR REPLACE FUNCTION data_insert_trigger()
RETURNS TRIGGER AS $$

BEGIN

  IF (NEW.value >= '2014-01-01' AND NEW.value < '2015-01-01') THEN
    INSERT INTO data_2014 VALUES (NEW.*);
  ELSIF (NEW.value >= '2015-01-01' AND NEW.value < '2016-01-01') THEN
    INSERT INTO data_2015 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'Date out of range. Please fix the data_insert_trigger() function.';
  END IF;

  RETURN NULL;
END;

$$ LANGUAGE plpgsql;

As you can see by the ELSE clause, we will actually need to do maintainence on this function as we start to get data that falls outside of our existing ranges. We will also need to create a new table for those rows.

It would be nice if we could automatically create tables that are missing, and handle any arbitrary values.

CREATE OR REPLACE FUNCTION data_insert_trigger()
RETURNS TRIGGER AS $$

DECLARE
  table_name text;
  year integer;
  start text;
  finish text;

BEGIN
  year := date_part('year', NEW.value);
  table_name := 'data_' || year;
  start := year || '-01-01';
  finish := (year + 1) || '-01-01';

  PERFORM 1 FROM pg_tables WHERE tablename = table_name;

  IF NOT FOUND THEN
    EXECUTE
      'CREATE TABLE '
      || quote_ident(table_name)
      || ' (CHECK ("value" >= '
      || quote_literal(start)
      || ' AND "value" < '
      || quote_literal(finish)
      || ')) INHERITS (data)';
  END IF;

  EXECUTE
    'INSERT INTO '
    || quote_ident(table_name)
    || ' VALUES ($1.*)'
  USING NEW;

  RETURN NULL;
END;

$$ LANGUAGE plpgsql;

CREATE TRIGGER data_insert_trigger
BEFORE INSERT ON data
FOR EACH ROW EXECUTE PROCEDURE data_insert_trigger();

You would also want to create any indexes on the child tables, as this is where they need to be, rather than the parent table.

This function is pretty neat: it first stores what the table name should be in a variable, as well as the two bounds for this table (start and finish). Then, we see if that table exists, and if not, create it. Finally, we then insert the values into the correct child table. I’m not sure I’d recommend using it as-is: it’s quite possibly subject to a race condition if two new records came in at the same time.

The one thing that was concerning me was that DDL changes to the parent table would not propagate to the child tables: however this turned out to not be an issue at all. Since I mostly use Django, I want as little hard stuff that would require custom migration operations.

ALTER TABLE data ADD COLUMN confirmed BOOLEAN DEFAULT false;

The other thing worth noting is that Postgres will do a really good job of limiting the tables that are accessed to those that contain the relevant data:

INSERT INTO data VALUES
  ('2014-01-06 09:00:00'),
  ('2015-01-09 12:00:00'),
  ('2016-02-22 15:39:00');

EXPLAIN
SELECT * FROM data
WHERE value > '2014-01-01'
AND value < '2014-07-01';

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Append  (cost=0.00..42.10 rows=12 width=8)
   ->  Seq Scan on data  (cost=0.00..0.00 rows=1 width=8)
         Filter: ((value > '2014-01-01 00:00:00'::timestamp with time zone) AND
                  (value < '2014-07-01 00:00:00'::timestamp with time zone))
   ->  Seq Scan on data_2014  (cost=0.00..42.10 rows=11 width=8)
         Filter: ((value > '2014-01-01 00:00:00'::timestamp with time zone) AND
                  (value < '2014-07-01 00:00:00'::timestamp with time zone))
 Planning time: 0.292 ms
(6 rows)

We see that only the data_2014 table is hit by the query. If your constraints do something like cast to DATE, then this may not happen. This was causing me some concern earlier, but letting Postgres coerce the data types fixed it.

However, you can’t use a tstzrange to query if you want these constraints to help the query planner:

-- Actually hits every table.
SELECT * FROM data WHERE value <@ '[2014-01-01, 2014-07-01)'::tstzrange;

It’s worth noting that if you change a value that would cause that row to belong in a different partition, this will fail.

There are moves afoot to have this feature more tightly integrated into Postgres, perhaps using a syntax like:

CREATE TABLE data (value TIMESTAMPTZ)
PARTITION BY RANGE (value)
(PARTITION data_2014 VALUES LESS THAN '2015-01-01');

CREATE PARTITION data_2015 ON data VALUES LESS THAN '2016-01-01';

It’s not clear to me how you actually define the range. It also seems counter-productive to have to manually create the partition tables.

There are also tools that might be useful to handle the heavy lifting, like pg_partman. I haven’t used this, but it looks interesting.