Using Postgres Composite Types in Django
-
Comments:
- here.
Note: this post turned out to be far more complicated than I had hoped. I may write another one that deals with a less complicated type!
Postgres comes with a pretty large range of column types, and the ability to use these types in an ARRAY. There’s also JSON(B) and Hstore, which are useful for storing structured (but possibly varying) data. Additionally, there are also a range of, well, range
types.
However, sometimes you actually want to store data in a strict column, but that isn’t a simple scalar type, or one of the standard range types. Postgres allows you to define your own composite types.
There is a command CREATE TYPE
that can be used to create an arbitrary type. There are four forms: for now we will just look at Composite Types.
We will create a Composite type that represents the opening hours for a store, or more specifically, the default opening hours. For instance, a store may have the following default opening hours:
+------------+--------+---------+
| Day | Open | Close |
+------------+--------+---------+
| Monday | 9 am | 5 pm |
| Tuesday | 9 am | 5 pm |
| Wednesday | 9 am | 5 pm |
| Thursday | 9 am | 9 pm |
| Friday | 9 am | 5 pm |
| Saturday | 10 am | 5 pm |
| Sunday | 11 am | 5 pm |
+------------+--------+---------+
During the Christmas season this store may be open longer (perhaps even 24 hours). There may also be differences at Easter time, or other public holidays, where the store is closed, or closes early.
It would be nice to be able to store the default opening hours for a store, and then, when creating a week, use these to create concrete (TIMESTAMP
) values for each day, which could be overridden on any given day.
There are a few ways we could model this. Postgres has no timerange
type, so that’s out. We could create a RANGE
type, or we could use (start-time
, finish-time
). But what about when a store is open after midnight, or for 24 hours? Storing this data implicitly is a real pain, because you need to always check to see if the finish time is less than (or equal to) the start time whenever doing anything. Trust me, this is not the best approach.
An alternative I’ve been toying with is (start-time
, interval
). You could limit it so that the interval’s maximum is '1 day'
, but not (from what I can tell) when you define the type. Anyway, the syntax for creating this type is:
CREATE TYPE opening_hours AS (
start time,
length interval
);
As an aside, every table in the database also has an associated type (of the same name as the table).
Now, we have our type: we can use it in a table:
CREATE TABLE store (
store_id SERIAL PRIMARY KEY,
name TEXT
);
CREATE TABLE default_opening_hours (
store_id INTEGER REFERENCES store (store_id),
monday opening_hours,
tuesday opening_hours,
wednesday opening_hours,
thursday opening_hours,
friday opening_hours,
saturday opening_hours,
sunday opening_hours
);
An alternative way of storing this information might be to use an array of opening_hours
, directly on the store model. We’ll use this one instead, as it’s a little neater (and means we will look at how to use opening_hours[]
later too).
CREATE TABLE store (
store_id SERIAL PRIMARY KEY,
name TEXT,
default_opening_hours opening_hours[7]
);
Now, we can put data in there:
INSERT INTO store (name, default_opening_hours) VALUES
(
'John Martins',
ARRAY[
('09:00', '08:00')::opening_hours,
('09:00', '08:00')::opening_hours,
('09:00', '08:00')::opening_hours,
('09:00', '12:00')::opening_hours,
('09:00', '08:00')::opening_hours,
('10:00', '07:00')::opening_hours,
('11:00', '06:00')::opening_hours
]
);
Note how we need to cast all of the values from record
to opening_hours
.
In practice, we would probably also want to have some type of restriction where the opening time from one day, plus the default open hours is less than or equal to the starting time on the next day. I’m still not sure of the best way to do this in Postgres, but it is possible to do it in Django.
Speaking of Django, we want to be able to access this data type there. We can leverage a really nice feature of Psycopg2 to have these values automatically turned into a Python namedtuple. We do this by registering the type within Psycopg2, using the Django cursor.
from django.db import connection
from psycopg2.extras import register_composite
register_composite('opening_hours', connection.cursor().cursor)
But, this is only half of the pattern. We also need to register an adapter so that values going back the other way are also automatically cast into opening_hours
.
from django.db import connection
from psycopg2.extras import register_composite
from psycopg2.extensions import register_adapter, adapt, AsIs
# Get a reference to the namedtuple class
OpeningHours = register_composite(
'opening_hours',
connection.cursor().cursor,
globally=True
).type
def adapt_opening_hours(value):
return AsIs("(%s, %s)::opening_hours" % (
adapt(value.start).getquoted(),
adapt(value.length).getquoted()
))
register_adapter(OpeningHours, adapt_opening_hours)
Now, we can fetch data from the database, and know that we will get OpeningHours
instances, and, when passing an OpeningHours
instance back to the database, know it will be converted into the correct type.
Obviously, in order to do this, the type must exist in the database. We did that manually in this case. In a real situation you would want to do that as a database migration. And that is where things get tricky. You can’t run the register_adapter
function until the type exists in the database. I did come up with a relatively neat workaround for this when writing a framework for generic Composite fields, where the registration of the composite type attempts to execute, and if it fails, it stores the data for later registration, and then the actual migration operation fires off a signal, which is handled by a listener that actually performs the registration.
The final piece of the puzzle is the Django Field subclass, which is actually not that complicated. In essence, we are relying on Psycopg to handle the adaptation in both directions, so it can be a bare field (perhaps with a formfield
method to get a custom form field). In practice, I wrote the generic CompositeField subclass, which uses some metaclass magic to handle the late registration:
from django.db.models import fields
from django.db import connection
from django.dispatch import receiver, Signal
from psycopg2.extras import register_composite
from psycopg2.extensions import register_adapter, adapt, AsIs
from psycopg2 import ProgrammingError
_missing_types = {}
class CompositeMeta(type):
def __init__(cls, name, bases, clsdict):
super(CompositeMeta, cls).__init__(name, bases, clsdict)
cls.register_composite()
def register_composite(cls):
db_type = cls().db_type(connection)
if db_type:
try:
cls.python_type = register_composite(
db_type,
connection.cursor().cursor,
globally=True
).type
except ProgrammingError:
_missing_types[db_type] = cls
else:
def adapt_composite(composite):
return AsIs("(%s)::%s" % (
", ".join([
adapt(getattr(composite, field)).getquoted() for field in composite._fields
]), db_type
))
register_adapter(cls.python_type, adapt_composite)
class CompositeField(fields.Field):
__metaclass__ = CompositeMeta
"""
A handy base class for defining your own composite fields.
It registers the composite type.
"""
composite_type_created = Signal(providing_args=['name'])
@receiver(composite_type_created)
def register_composite_late(sender, db_type, **kwargs):
_missing_types.pop(db_type).register_composite()
We also want to have a custom migration operation:
from django.db.migrations.operations.base import Operation
# Or wherever the code above is located.
from .fields.composite import composite_type_created
class CreateCompositeType(Operation):
def __init__(self, name=None, fields=None):
self.name = name
self.fields = fields
@property
def reversible(self):
return True
def state_forwards(self, app_label, state):
pass
def database_forwards(self, app_label, schema_editor, from_state, to_state):
schema_editor.execute('CREATE TYPE %s AS (%s)' % (
self.name, ", ".join(["%s %s" % field for field in self.fields])
))
composite_type_created.send(sender=self.__class__, db_type=self.name)
def state_backwards(self, app_label, state):
pass
def database_backwards(self, app_label, schema_editor, from_state, to_state):
schema_editor.execute('DROP TYPE %s' % self.name)
This is a bit manual, however. You need to create your own migration that creates the composite type, and then begin to use the field.
# migrations/XXXX_create_opening_hours.py
class Migration(migrations.Migration):
dependencies = []
operations = [
CreateCompositeType(
name='opening_hours',
fields=[
('start', 'time'),
('length', 'interval')
],
),
]
The place this pattern falls down is that this migration must be manually created: we don’t have any way to automatically create the migration from the Field subclass, which just looks like:
class OpeningHoursField(CompositeField):
def db_type(self, connection):
return 'opening_hours'
def formfield(self, **kwargs):
defaults = {
'form_class': OpeningHoursFormField
}
defaults.update(**kwargs)
return super(OpeningHoursField, self).formfield(**defaults)
I think in the future I’ll attempt to use further metaclass magic to allow defining the fields of the Composite type. This could then be used to automatically create a form field (which is a subclass of forms.MultiValueField
).
class OpeningHoursField(CompositeField):
start = models.DateField()
length = IntervalField()
def db_type(self, connection):
return 'opening_hours'
However, in the meantime, we can still get by. I’m not sure it’s going to be possible to inject extra operations into the migration based upon the field types anyway.
Finally, we can use this in a model:
class Store(models.Model):
store_id = models.AutoField(primary_key=True)
name = models.CharField(max_length=128)
default_opening_hours = ArrayField(
base_field=OpeningHoursField(null=True, blank=True),
size=7
)
I’ve used the ArrayField
from django.contrib.postgres, purely for illustration purposes.
The CompositeField
and associated operation are part of my django-postgres project: once I have worked out some more kinks, I may submit a pull request to django.contrib.postgres
, unless someone else beats me to it.
Oh, and a juicy little extra. Above I mentioned something about preventing overlaps. The logic I use in my form is:
from django import forms
from django.utils.translation import string_concat, ugettext_lazy as _
import postgres.forms
from .fields import OpeningHoursFormField
from .models import Store
def finish(obj):
"Given an OpeningHours value, get the finish time"
date = datetime.date(1, 1, 1)
return (datetime.datetime.combine(date, obj.start) + obj.duration).time()
class StoreForm(forms.ModelForm):
OVERLAPS_PREVIOUS = _('Open hours overlap previous day.')
default_opening_hours = postgres.forms.SplitArrayField(
base_field=OpeningHoursFormField(required=False),
size=7,
)
class Meta:
model = Store
def clean_default_opening_hours(self):
opening_hours = self.cleaned_data['default_opening_hours']
field = self.fields['default_opening_hours']
# Ensure consecutive days do not overlap.
errors = []
for i in range(7):
today = opening_hours[i]
if today.start is None or today.duration is None:
continue
yesterday = opening_hours[(i + 6) % 7]
if yesterday.start is None or yesterday.duration is None:
continue
if finish(yesterday) <= yesterday.start:
if today.start < finish(yesterday):
errors.append(forms.ValidationError(
string_concat(
field.error_messages['item_invalid'],
self.OVERLAPS_PREVIOUS
),
code='item_invalid',
params={'nth': i}
))
if errors:
raise forms.ValidationError(errors)
return opening_hours
I’m currently not displaying the duration/length: I dynamically calculate it based on the entered start/finish pair, but that’s getting quite complicated.