Postgres multi-column unique index

Consider something that requires a “best match” facility, based on the geographic location. There may be a set of behaviours that apply when an object is in Australia, and a different set within the USA. If this is all you have, then it’s simple to just have those behaviours based on a single country column in your database.

But now consider that in some circumstances, it’s possible to have behaviours within a single state of Australia that should override the nationwide behaviours. For instance, Western Australia may just use the Australian rules, but South Australia has a custom set of rules. Likewise, California may use a custom set of rules, but Hawaii just uses the inherited USA rules.

To ensure that a consistent set of rules is applied, there could only be a single “Australia” rule set, but there could be an “Australia+South Australia” set stored. This could also be modelled, in a slightly more complicated fashion, by having two columns, country, and state, and requiring country to be unique if state is NULL, and both country and state to be unique together.

Now, consider that there could be a whole stack more levels, and some of these levels have different names (and different orders in the hierarchy) in different countries. For instance, New Zealand has regions, but Great Britian has Nations (England, Northern Ireland, Scotland and Wales), but also counties (or council areas in Scotland).

Django Localflavor has a whole bunch of these divisions, but there are only nine “names” for these:

  • nation
  • state
  • province
  • provincial district
  • prefecture
  • region
  • county
  • department
  • municipality
CREATE TABLE geo_table (
  id SERIAL PRIMARY KEY,
  country VARCHAR(2) NOT NULL,
  county TEXT,
  department TEXT,
  municipality TEXT,
  nation TEXT,
  prefecture TEXT,
  province TEXT,
  provincial_district TEXT,
  region TEXT,
  state TEXT
);
INSERT INTO geo_table(country, state)
VALUES ('AU', NULL), ('AU', 'SA');

So, to prevent having data in our database that could result in an inability to determine which rule set we should use for a given object, we want to have the following:

A unique constraint that applies to all columns that are not NULL

It could be possible to model this with a number of different unique constraints, but that doesn’t seem like heaps of fun to deal with.

However, postgres allows us to define “functional” indices, that is, they apply a function to some columns from the row, and use that as the stored value in the index.

We can write an expression that takes the whole row, turns it into JSON, removes the primary key column, and then any columns that are NULL, and use that as the index.

SELECT jsonb_strip_nulls(
         to_jsonb(geo_table) - 'id'::TEXT
       )
  FROM geo_table;
  {"country": "AU"}
  {"country": "AU", "state": "SA"}

So, that looks pretty good, but what’s going on here?

In this case, we take the row, and turn it into a JSONB object, remove the primary key (in this case, called id), and then remove any keys that have a NULL value.

Now, we can’t just use this expression in an index, because to_jsonb (and row_to_json) are not IMMUTABLE, because the value they turn a TIMESTAMPTZ into depends upon something that may not be fixed. However, because our data will only contain strings, which can be turned reliably into the same JSON value regardless of anything else. So, we’ll need to create a function:

CREATE OR REPLACE FUNCTION geo_unique_indexer(anyelement, pk TEXT)
RETURNS jsonb AS $$

    SELECT jsonb_strip_nulls(to_jsonb($1) - pk);

$$
LANGUAGE SQL IMMUTABLE;

You’ll notice that I had to use anyelement, as it’s not possible to have an SQL function refer to a record type. I’ve also made it configurable what the primary key column is, so it doesn’t depend on using the column “id”.

Finally, we can create the index:

CREATE UNIQUE INDEX geo_unique_match ON geo_table(
  geo_unique_indexer(geo_table, 'id')
);

Okay, that looks good. Now we can try to insert some values that should not be permitted. We already have a row with just Australia:

INSERT INTO geo_table(country, state) VALUES ('AU', NULL);
ERROR:  duplicate key value violates unique constraint "geo_unique_match"
DETAIL:  Key (geo_unique_indexer(geo_table.*, 'id'::text))=({"country": "AU"}) already exists.

…and another row with both Australia and SA.

INSERT INTO geo_table(country, state) VALUES ('AU', 'SA');
ERROR:  duplicate key value violates unique constraint "geo_unique_match"
DETAIL:  Key (geo_unique_indexer(geo_table.*, 'id'::text))=({"country": "AU", "state": "SA"}) already exists.

Finally, this one should succeed:

INSERT INTO geo_table(country, state) VALUES ('AU', 'NSW');

Excellent.