Multiple Homebrew Pythons

The Homebrew project brings a really nice package installer to OS X, allowing you to install command line programs in a really simple way. For instance, to install the latest version of Python 3, you would do:

$ brew install python3

Because not all projects are as aware of old versions as python, when brew upgrades a package, it removes the old versions linked binaries and support files. This is actually not a good thing for python: it means you can no longer access the older interpreter.

Python keeps version-named interpreters, and then just symlinks the most recently installed to the python executable. Thus, it’s not uncommon to see, for python2:

$ ls -1 /usr/bin/python*

This means, if you want to run an older version (for instance, say you use tox and want to do some testing against a range of versions), you can just use:

$ python2.5
Python 2.5.6 (r256:Unversioned directory, Mar  9 2014, 22:15:03) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

But Homebrew breaks this.

However, if you have the old versions already installed, you can easily recreate the symlinks. Indeed, here is a script that will visit all homebrew installed python3 versions, creating

cd /usr/local/Cellar/python3/

for VERSION in `ls`
  find bin -name \*3\* -exec ln -s -f `pwd`/{} /usr/local/{} \;
  find lib -name \*python\* -maxdepth 1 -exec ln -s -F `pwd`/{} /usr/local/{} \;
  find share -name python\* -exec ln -s -f `pwd`/{} /usr/local/{} \;
  find Frameworks -name 3.\* -exec ln -s -f `pwd`/{} /usr/local/{} \;
  cd ..

It worked for me for python3 with the following versions installed:

  • 3.2.3
  • 3.3.3
  • 3.4.1

Now I just need to figure out how to get Homebrew to download and build specific versions of packages.

Adding JSON operators to PostgreSQL

Don’t use the code in this post. Instead, read: Adding JSON(B) operators to PostgreSQL

Notably, the new JSONB data type in postgres is missing some of the features that the hstore data type has had for some time.

hstore - text     : delete key from left operand
hstore - text[]   : delete keys from left operand
hstore - hstore   : delete matching pairs from left operand

However, using two awesome features of Postgres, it’s possible to add these operators in.

Firstly, python as a language in postgres:

matt=# CREATE LANGUAGE plpythonu;

Then, you can write functions in python, that import standard system libraries, like json.

CREATE OR REPLACE FUNCTION json_subtract(json_object json, key text) RETURNS json AS $body$
import json
data = json.loads(json_object)
data.pop(key, None)
return json.dumps(data)
$body$ LANGUAGE plpythonu;

Finally, you can now overload the operator to get the syntactic sugar you want:

  LEFTARG = json,
  RIGHTARG = text,
  PROCEDURE = json_subtract

Now, you can use the same syntax as for hstore:

matt=# SELECT '{"a":1, "b":2}'::json - 'a'::text;
 {"b": 2}
(1 row)

It’s possible to repeat these for the other subtraction operators:

CREATE OR REPLACE FUNCTION json_subtract(json_object json, keys text[]) RETURNS json AS $body$
import json
data = json.loads(json_object)
for key in keys:
    data.pop(key, None)
return json.dumps(data)
$body$ LANGUAGE plpythonu;

  LEFTARG = json,
  RIGHTARG = text[],
  PROCEDURE = json_subtract

CREATE OR REPLACE FUNCTION json_subtract(json_object json, pairs json) RETURNS json AS $body$
import json
data = json.loads(json_object)
pairs_data = json.loads(pairs)
for key,value in pairs_data.items():
  if key in data and data[key] == value:
return json.dumps(data)
$body$ LANGUAGE plpythonu;

  LEFTARG = json,
  RIGHTARG = json,
  PROCEDURE = json_subtract

I’ll leave it as an exercise to write functions for the other operators.

In-Database Audit Trail

I’ve been thinking about audit trails, object versioning and the like a bit lately. We have situations where it’s actually important to be able to know what changes were made, when, and by whom.

The tool we are currently using in Django for this is django-reversion, but I’m not that happy with it. Part of the problem is that it adds a fair amount of overhead into each request, which we have noticed, and also that querying the data it produces is not simple, unless you are just looking for changes made to a specific object.

For our case, we want to know all the changes made to a set of objects (that may not all be of the same type), made within a given range, where there is (eventually) a foreign key relationship back to our Company model.

Part of the problem is just that: a custom solution could store the Company relationship explicitly, but also, we really care about what the changes were, rather than what the object looks like after the changes. Well, technically, we want both.

However, it has gotten me thinking about other ways to solve this problem.

In most DBMSs, it’s possible to get a trigger function to run when an update happens to each row, which makes it possible to get access to this information. Indeed, googling “Postgres audit trigger” pops up some hits that are highly relevant. I looked at Audit trigger 91plus, since it’s “improved”. It has some really nice features like storing the audit data in an hstore column, which means you can query it.

However, one place where this type of trigger breaks down is that it’s not generally possible to get the “application user” associated with a request, only the “database user”, which for our system (and most other web applications) is fixed for all access.

One way to get around this might be to, at the start of every database transaction, inject a call that creates a temporary table, with the extra bits of data that you want to log, and then use that in the trigger function.

  "_app_user" (user_id integer, ip_address inet);

Then we need to add (or update) the one row that will contain our data. We must ensure that we only ever have one row in this table.

UPDATE _app_user SET user_id=%s, ip_address=%s;
INSERT INTO _app_user (user_id, ip_address)

This code will ensure that the first statement (UPDATE) will affect all rows in the table (of which there will be at most one), and the second statement (INSERT ... SELECT ... WHERE NOT EXISTS ...) will only create a new row if there are no rows currently in the table.

It’s up to you to then pass the correct data to this. I’m currently looking at doing this using Django middleware, although I suspect this may fall down using the newer transaction handling, as otherwise we could have just ensured our middleware ran after the TransactionMiddleware. It may be possible to do it with a custom database backend, but it needs to somehow get access to the request object (which contains the user, and the ip address). Obviously, you could log other data about the request, too.

The final part of the puzzle is to inject this data into the row that will be used for the audit table entry. I modified the table definition so it included columns for the data I wanted: app_user_id and app_ip_address.

Then, inside the actual trigger function, after the audit_row object has been created, but before it is written to the table, we inject the data we want.

We need to be a little careful, as it’s possible the table does not exist:

    n.nspname, c.relname 
    pg_catalog.pg_class c 
    pg_catalog.pg_namespace n
  ON n.oid = c.relnamespace
    n.nspname like 'pg_temp_%' 
    c.relname = '_app_user';

    FOR r IN SELECT * FROM _app_user LIMIT 1 LOOP
      audit_row.app_user_id = r.user_id;
      audit_row.app_ip_address = r.ip_address;
    END IF;

This checks to see if the _app_user table exists in any of the valid temporary table namespaces, and if so, grabs the first (and only, from above) entry, using the values to update the row.

This function then works: if there is a temporary table with this name, it uses these fields when creating the audit, if not, it creates the audit row with empty values. This would mean that some audit statements may not contain proper user data, but in the case of Django, it’s possible to make changes outside of the request-response cycle. You could require that a user starting a shell session authenticates with a valid django username+password, but that still leaves management commands. I guess you could have a system account, but leaving these entries blank is like an explicit system user.

I haven’t got any production code using anything like this: I’d still want to test that it works as expected as part of the request, and would want to build up some method of querying it. There’s probably no reason you couldn’t do the table definition as a Django model (and indeed, have the function definition as a migration).