Versioning complex database migrations

Recently, I’ve been writing lots of raw SQL code that is either a complex VIEW, or a FUNCTION. Much of the time these will be used as the “source” for a Django model, but not always. Sometimes, there are complex functions that need to be run as a trigger in Postgres, or even a rule to execute when attempting a write operation on a view.

Anyway, these definitions are all code, and should be stored within the project they belong to. Using Django’s migrations you can apply them at the appropriate time, using a RunSQL statement.

Hovewer, you don’t really want to have the raw SQL in the migration file. Depending upon the text editor, it may not syntax highlight correctly, and finding the correct definition can be difficult.

Similarly, you don’t want to just have a single file, because to recreate the database migration sequence, it needs to apply the correct version at the correct time (otherwise, other migrations may fail to apply).

Some time ago, I adopted a policy of manually versioning these files. I have a pattern of naming, that seemed to be working well:

special_app/
  migrations/
    __init__.py
    0001_initial.py
    0002_update_functions.py
  sql/
    foo.function.0001.sql
    foo.function.0002.sql
    foo.trigger.0001.sql
    bar.view.0001.sql

The contents of the SQL files are irrelevant, and the migrations mostly so. There is a custom migration operation I wrote that loads the content from a file:

    LoadSQLScript('special_app', 'foo.function', 1)

The mechanics of how it works are not important.

So, this had been working well for several months, but I had a nagging feeling that the workflow was not ideal. This came to a head in my mind when I recognised that doing code review on database function/view changes was next to impossible.

See, the problem is that there is a completely new file each time you create a new version of a definition.

Instead, I’ve come up with a similar, but different solution. You still need to have versioned scripts for the purpose of historical migrations, but the key difference is that you don’t actually write these. Instead, you have something that notices that the “current” version of a definition is different to the latest version snapshot. You then also have a tool that copies the current version to a new snapshot, and creates a migration.

You can still modify a snapshot (for instance, if you’ve already created one, but it’s only in your current development tree), but mostly you don’t need to.

$ ./manage.py check
System check identified some issues:

WARNINGS:
?: (sql_helpers.W002) The versioned migration file for core: iso8601.function.sql is out of date,
and needs to be updated or a new version created.

Oh, thanks for that. Checking the file, I see that it does indeed need a new version:

$ ./manage.py make_sql_migrations core
...
Copied <project>/core/sql/iso8601.function.sql to version 0002

You still need to ensure that any dependencies between SQL files are dealt with appropriately (for instance, a function that relies on a newly added column to a view needs to have that view’s definition updated before the function can be updated). But this is a much smaller problem, and something that your database should complain about when you try to apply the migrations.

I haven’t packaged this up yet, it’s currently only an internal app, as I want to use it a bit for the next little while and see what else shakes out. I was pretty sure the way I was doing it before was “the best way” just after I thought that up, so we’ll see what happens.

JavaScript Array Widget

I’ve been making more and more use of the django.contrib.postgres classes, and will often store data in an ArrayField where appropriate.

There are two form fields that are supplied with Django for handling these types: one of which has the array values in a single text input (comma separated), and the other has a different text input element for each value.

However, the latter does not really work that well with a dynamic length array (it could work with up to N items, but in my case, I really don’t often have an N).

It could be possible to build similar functionality that you see with the Django Admin’s formset handling like here, however this turns out to be lots of mucking around.

It might be simpler to just have the simple array field rendered, and have JS bolted on that builds the dynamic list of text inputs based on this.

In this instance, I am actually storing the state in the widgets themselves: this means it’s relatively easy to add in the ability to re-order. I’ve done this with the Sortable library.

Django Dynamic Formsets

Django forms are one of the most important parts of the stack: they enable us to write declarative code that will validate user input, and ensure we protect ourselves from malicious input.

Formsets are an extension of this: they deal with a set of homogeous forms, and will ensure that all of the forms are valid independently (and possibly do some inter-form validation, but that’s a topic for a later day).

The Django Admin contains an implementation of a dynamic formset: that is, it handles adding and removing forms from a formset, and maintains the management for accordingly. This post details an alternative implementation.


A Formset contains a Form (and has zero or more instances of that Form). It also contains a “Management Form”, which has metadata about the formset: the number of instances of the form that were provided initially, the number that were submitted by the user, and the maximum number of forms that should be accepted.

A Formset has a “prefix”, which is prepended to each element within the management form:

<input type="hidden" name="prefix-INITIAL_FORM_COUNT" value="...">
<input type="hidden" name="prefix-TOTAL_FORM_COUNT" value="...">
<input type="hidden" name="prefix-MIN_NUM_FORM_COUNT" value="...">
<input type="hidden" name="prefix-MAX_NUM_FORM_COUNT" value="...">

Each Form within the Formset uses the prefix, plus it’s index within the list of forms. For instance, if we have a Formset that contains three forms, each containing a single “name” field, we would have something similar to:

<input type="text" name="prefix-0-name" value="Alice">
<input type="text" name="prefix-1-name" value="Bob">
<input type="text" name="prefix-2-name" value="Carol">

Note that the form’s prefix is <formset_prefix>-<form_index>.

To make a Formset dynamic, we just need to be able to add (and possibly remove, but there’s a little more complexity there) extra forms. The managment form needs to be updated to reflect this, and we need to ensure that the new form’s fields are named appropriately.

A Formset also contains an empty_form. This is an unbound form, where the form’s “index” is set to __prefix__. Thus, the empty form for the above formset might look somewhat like:

<input type="text" name="prefix-__prefix__-name" value="">

We can leverage this to allow us to have simpler code: instead of having to duplicate elements and remove the value, we can just duplicate the empty form, and replace the string __prefix__ with whatever the index of the newly created form should be.

Here’s an implementation that has no dependencies, but does make some assumptions: