Best match in hierarchy using Postgres

2018-10-18 @ 10:33:51


The other day, I had an issue where I needed to prevent duplicates in a subset of nullable columns for the purpose of modelling different regional breakdowns; I came up with quite a neat solution that uses JSONB functions.

What I didn’t realise until later was how this same function can be used to detect the “best match” of an object within these rows.

In this case, the best match is the row that has the most specificity (i.e., the highest number of non-null values), but does not have any values that differ from those in the target object.

For instance, given the data (and the function defined in the aforementioned post):

{"country": "AU"}
{"country": "AU", "state": "SA"}
{"country": "NZ"}

We would want an object that is within Australia (AU) to match the first row, unless it is also within South Australia (SA), in which case it should match the second row.

We can use the JSONB containment operator (<@, "is contained by") to query this match:

SELECT geo_unique_indexer(geo_table, 'id')
  FROM geo_table
 WHERE geo_unique_indexer(geo_table, 'id') <@ '{"country": "AU", "state": "SA"}';

Okay, that’s really close. It returns every “parent” row, that is, every row whose values are all present in the target object:

{"country": "AU"}
{"country": "AU", "state": "SA"}

So, we need to ensure that only the most specific one of these is actually returned:

SELECT geo_unique_indexer(geo_table, 'id')
  FROM geo_table
 WHERE geo_unique_indexer(geo_table, 'id') <@ '{"country": "AU", "state": "SA"}'
 ORDER BY (SELECT count(*) FROM JSONB_OBJECT_KEYS(geo_unique_indexer(geo_table, 'id'))) DESC
 LIMIT 1;

And, presto, we have our best match:

{"country": "AU", "state": "SA"}