Adding JSON(B) operators to PostgreSQL

This is a followup/replacement post for this older post. In it, I discussed a way to use PL/python to perform function (and in turn operations) on JSON data. Don’t do that, it’s way too slow.

It’s actually possible, using raw SQL (and some handy functions) to perform some of the operations that are “missing” from the JSON(B) datatypes in Postgres.

In all cases, I’ll work with just JSONB as the input/output formats. In practice, when I first wrote these, I wrote up to four versions of each (JSON+JSON, JSON+JSONB, JSONB+JSON, JSONB+JSONB). I believe it’s possible to write polymorphic versions of these functions, but I’m not that familiar with them just yet.

We’ll start with concatenation: basically joining two JSONB objects into one. This is a good one to start with, as operands must be either JSON or JSONB, so all forms of this function are the same, just with different functions or casting of operands.

Before I begin, I’ll mention a post I saw on Michael Paquier’s excellent blog: Manipulating jsonb data by abusing of key uniqueness. In it, Michael uses the json_object_agg function to build up a JSON object from a query:

CREATE FUNCTION "json_append" (jsonb, jsonb) RETURNS jsonb AS $$

WITH json_union AS
(
  SELECT * FROM jsonb_each($1)
  UNION ALL
  SELECT * FROM jsonb_each($2)
)
SELECT json_object_agg(key, value) FROM json_union;

$$ LANGUAGE SQL;

This seems like a good idea, however it actually performs around two orders of magnitude slower than just iterating over the objects and building them up using string_agg and ||:

CREATE FUNCTION "json_concatenate" (jsonb, jsonb) RETURNS jsonb AS $$

SELECT ('{' || string_add(to_json("key")::text || ':' ||"value", ',') || '}')::jsonb
FROM (
  SELECT * FROM jsonb_each($1) UNION ALL SELECT * jsonb_each($2)
);

$$ LANGUAGE SQL;

CREATE OPERATOR || (
  LEFTARG = jsonb,
  RIGHTARG = jsonb,
  PROCEDURE = jsonb_concatenate
);

It seems that this is not just because of the Common Table Expression (although, using a CTE does make the second function perform just as poorly).

=# SELECT * FROM BENCHMARK(10000,
  'jsonb_concatenate(''{"a": 1, "b":2}''::jsonb, ''{"a":2}''::jsonb)',
  'jsonb_append(''{"a": 1, "b":2}''::jsonb, ''{"a":2}''::jsonb)'
  );
           code              |  runtime   | corrected
-----------------------------+------------+------------
 [Control]                   | 0.00550699 |          0
 jsonb_concatenate(...)      | 0.00652981 | 0.00102282
 jsonb_append(...)           |   0.484099 |   0.478592

My attempt at reimplementing the json_object_agg aggregate function in SQL proved even slower. Not at all surprising.

The next one we will tackle is the - operator from Hstore.

hstore - text     : delete key from left operand
hstore - text[]   : delete keys from left operand
hstore - hstore   : delete matching pairs from left operand

We can reimplement these for JSON: first as functions, and then create operators using those if we want. What is interesting is that we use the same construction pattern for our output JSONB object, however, I don’t seem to be able to figure out how to extract this out into another function.

CREATE OR REPLACE FUNCTION "jsonb_subtract"(
  "json" jsonb,
  "remove" TEXT
)
  RETURNS jsonb
  LANGUAGE sql
  IMMUTABLE
  STRICT
AS $function$
SELECT CASE WHEN "json" ? "remove" THEN COALESCE(
  (SELECT ('{' || string_agg(to_json("key")::text || ':' || "value", ',') || '}')
     FROM jsonb_each("json") -- Until this function is added!
    WHERE "key" <> "remove"),
  '{}'
)::jsonb
ELSE "json"
END
$function$;

CREATE OPERATOR - (
  LEFTARG = jsonb,
  RIGHTARG = text,
  PROCEDURE = jsonb_subtract
);

You’ll notice that there’s a test for if the key to remove is in the object first: this should be much faster in the situation where it doesn’t appear, as we then don’t need to recreate the object.

The other forms are quite similar, but the WHERE clause varies, and the initial test varies or is removed:

CREATE OR REPLACE FUNCTION "jsonb_subtract"(
  "json" jsonb,
  "keys" TEXT[]
)
  RETURNS jsonb
  LANGUAGE sql
  IMMUTABLE
  STRICT
AS $function$
SELECT CASE WHEN "json" ?| "keys" THEN COALESCE(
  (SELECT ('{' || string_agg(to_json("key")::text || ':' || "value", ',') || '}')
     FROM jsonb_each("json")
    WHERE "key" <> ALL ("keys")),
  '{}'
)::jsonb
ELSE "json"
END
$function$;

CREATE OPERATOR - (
  LEFTARG = jsonb,
  RIGHTARG = text[],
  PROCEDURE = jsonb_subtract
);

CREATE OR REPLACE FUNCTION "jsonb_subtract"(
  "json" jsonb,
  "remove" jsonb
)
  RETURNS jsonb
  LANGUAGE sql
  IMMUTABLE
  STRICT
AS $function$
SELECT COALESCE(
  (
    SELECT ('{' || string_agg(to_json("key")::text || ':' || "value", ',') || '}')
    FROM jsonb_each("json")
    WHERE NOT
      ('{' || to_json("key")::text || ':' || "value" || '}')::jsonb <@ "remove"
      -- Note: updated using code from http://8kb.co.uk/blog/2015/01/16/wanting-for-a-hstore-style-delete-operator-in-jsonb/
  ),
  '{}'
)::jsonb
$function$;

CREATE OPERATOR - (
  LEFTARG = jsonb,
  RIGHTARG = jsonb,
  PROCEDURE = jsonb_subtract
);

There is also the #= operator for hstore: but this seems to be record #= hstore, rather than hstore #= hstore. I’m not sure how to implement this, but I can implement a jsonb #= jsonb function:

CREATE OR REPLACE FUNCTION "jsonb_update_only_if_present"(
  "json" jsonb,
  "other" jsonb
)
  RETURNS jsonb
  LANGUAGE sql
  IMMUTABLE
  STRICT
AS $function$

SELECT COALESCE(
  (SELECT ('{' || string_agg(to_json("key")::text || ':' || "value", ',') || '}')
     FROM (SELECT * FROM jsonb_each("json") UNION ALL SELECT * FROM jsonb_each("other")) AS a
     WHERE "json" ? "key"::text
  ),
  '{}'
)::jsonb
$function$;

CREATE OPERATOR #= (
  LEFTARG = jsonb,
  RIGHTARG = jsonb,
  PROCEDURE = jsonb_update_only_if_present
);

Finally, there are operators %% (convert to array of alternating key/value pairs), and %# (convert to 2-dimensional array of keys, values).

I haven’t figured out a way to create these yet either.


So, that’s what I’ve got so far. Obviously, these will be slower than pure C implementations however, we can run some benchmarks against the hstore operators for comparisons.

=# SELECT * FROM BENCHMARK(100000,
   ' ''{"a": 1, "b":2}''::jsonb || ''{"a":2}''::jsonb',
   ' ''a=>1, b=>2''::hstore || ''a=>2''::hstore'
   );
 
                     code                      |  runtime  |  corrected
-----------------------------------------------+-----------+-------------
 [Control]                                     | 0.0837061 |           0
  '{"a": 1, "b":2}'::jsonb || '{"a":2}'::jsonb | 0.0842681 | 0.000561953
  'a=>1, b=>2'::hstore || 'a=>2'::hstore       | 0.0843148 | 0.000608683
(3 rows)

Whoa. That’s actually pretty good!

And similar for subtract:

=# SELECT * FROM BENCHMARK(100000,
  ' ''{"a": 1, "b":2}''::jsonb - ''a''::text ',
  ' ''a=>1, b=>2''::hstore - ''a''::text '
);
                  code                  |  runtime  | corrected
----------------------------------------+-----------+------------
 [Control]                              | 0.0818689 |          0
  '{"a": 1, "b":2}'::jsonb - 'a'::text  |  0.083431 | 0.00156212
  'a=>1, b=>2'::hstore - 'a'::text      |  0.083159 | 0.00129008
(3 rows)

I might have to run these with some more complicated queries, and compare the results.

Python, postgres and jsonb

I maintain a json field for django, and was working today on getting the new (1.7+) lookup code to play nicely: in order for this to happen, you basically need to be running Postgres 9.4, and using a jsonb column. Otherwise, querying kind-of sucks.

After a significant amount of work, where I drift backwards and forwards between having old and new code working, I had an idea.

Some time ago I discovered that psycopg2 has really nice support for some custom types. Indeed, it’s super-easy to get it to handle UUID and json data. But it seems that it hasn’t yet been made to work with jsonb.

However, the registration process for handling the data makes it possible to do so, and trivial, since the serialised form will be essentially identical for both:

psycopg2.extras.register_json(
    conn_or_curs=None,
    globally=False,
    loads=None,
    oid=None,
    array_oid=None
)

Note the last two arguments. We can trick psycopg2 into using jsonb instead of json.

Is your database, execute:

SELECT oid, typarray FROM pg_type WHERE typname = 'jsonb';
-- oid      --> 3802
-- typarray --> 3807

(Syntax highlighting fail means I can’t include the actual results).

Your values may vary (I’m really not sure), but you’ll simply need to call register_json with the first two:

register_json(oid=3802, array_oid=3807)

Now, assuming you have a jsonb column, when you fetch data from it, it will already be turned into python objects.

Python 2.7.5 (default, Mar  9 2014, 22:15:05) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import psycopg2
>>> conn = psycopg2.connect("")
>>> cur = conn.cursor()
>>> cur.execute("SELECT * FROM jsonb_test;")
>>> data = cur.fetchone()
>>> data
(1, '{}')
>>> from psycopg2.extras import register_json
>>> register_json(oid=3802, array_oid=3807)
(<psycopg2._psycopg.type 'JSON' at 0x101713418>, <psycopg2._psycopg.type 'JSONARRAY' at 0x101721208>)
>>> cur.execute("SELECT * FROM jsonb_test;")
>>> data = cur.fetchone()
>>> data
(1, {})