Django Single Table Inheritance on the cheap

There was a recent question on Stack Overflow about Django Single Table Inheritance (STI). It got me thinking about how I could use my FSM-proxy approach for plain STI, without the state machine.

Note: this only works when all subclasses have the same fields. Also, unlike a state machine, the example used here does not allow an object to change its type after it has been created.

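from django.db import models

class Sheep(models.Model):
  # Long enough for any class name we store, including the 'sheep' fallback.
  type = models.CharField(max_length=16)
  tag_number = models.CharField(max_length=64)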

class Ram(Sheep):
  class Meta:
    proxy = True
  
class Ewe(Sheep):
  class Meta:
    proxy = True

In this case, we can fetch all sheep as Sheep.objects.all(). However, this gives us the objects as Sheep instances, when we want those with type='ram' to return Ram instances, and those with type='ewe' to return Ewe instances.

We can do this by the magic of type(self).__subclasses__(), which returns the direct subclasses of an instance's class.

class Sheep(models.Model):
  # fields as above

  def __init__(self, *args, **kwargs):
    super(Sheep, self).__init__(*args, **kwargs)
    subclasses = self.__class__.__subclasses__()
    if not subclasses:
      # A leaf class (no subclasses): force the type field to match it.
      self.type = self.__class__.__name__.lower()
    else:
      # Look for a direct subclass whose lowercased name matches the type...
      matches = [x for x in subclasses if x.__name__.lower() == self.type]
      if matches:
        # ...and downcast this instance to that subclass.
        self.__class__ = matches[0]
      else:
        # Invalid or missing type: fall back to the current class.
        self.type = self.__class__.__name__.lower()

This will automatically downcast Sheep objects to the correct subclass, based upon the type field.

It also sets the type field on objects that are instantiated without one (based on the current instance class). This enables us to do things like:

# Fetch all Sheep, downcast to correct subclass.
>>> Sheep.objects.all()
[<Ram: Ram object>, <Ram: Ram object>, <Ewe: Ewe object>]

# Automatically set the type on a class.
>>> Ram()
<Ram: Ram object>
>>> Ram().type
'ram'
>>> Sheep()
<Sheep: Sheep object>
>>> Sheep().type
'sheep'

# Automatically set the class on a valid subclass/type
>>> Sheep(type='ram')
<Ram: Ram object>
# Force the type field on an invalid type argument. [see below]
>>> Ram(type='ewe')
<Ram: Ram object>
>>> Sheep(type='foo')
<Sheep: Sheep object>
>>> Sheep(type='foo').type
'sheep'

The assumption I have made here is that when a class is instantiated with a type value that is not valid (that is, neither our class nor one of our subclasses), the type field is changed to match the current class.

The other assumption is that the parent class is also valid. In this case, it wouldn’t be, as sheep must be either ewes or rams (or wethers, but that’s another story).
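The downcasting itself does not depend on Django, so the rules are easy to try out in plain Python. In this sketch, Ram and Ewe are ordinary subclasses standing in for the proxy models, and type is passed as a keyword argument rather than loaded from a database:

```python
# A plain-Python sketch (no Django) of the downcasting rules in Sheep.__init__.
class Sheep(object):
    def __init__(self, type=''):
        self.type = type
        subclasses = self.__class__.__subclasses__()
        if not subclasses:
            # Leaf class: the type field always matches the class name.
            self.type = self.__class__.__name__.lower()
        else:
            matches = [c for c in subclasses if c.__name__.lower() == self.type]
            if matches:
                # Swap the instance's class for the matching subclass.
                self.__class__ = matches[0]
            else:
                # Unknown (or missing) type: fall back to our own class.
                self.type = self.__class__.__name__.lower()


class Ram(Sheep):
    pass


class Ewe(Sheep):
    pass


print(type(Sheep(type='ram')).__name__)  # Ram
print(Ram(type='ewe').type)              # ram
print(Sheep(type='foo').type)            # sheep
```

Note that __subclasses__() only lists direct subclasses, so a hypothetical subclass of Ram would not be found from Sheep without walking the hierarchy recursively. Matching on the lowercased class name also means two subclasses with the same name would clash.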

We also need to be able to fetch Ewe and Ram objects using their own managers. This is as simple as filtering on the type:

class ProxyManager(models.Manager):
  def get_query_set(self):  # Note: renamed get_queryset() in Django 1.6+
    return super(ProxyManager, self).get_query_set().filter(type=self.model.__name__.lower())

class Ram(Sheep):
  objects = ProxyManager()
  class Meta:
    proxy = True

class Ewe(Sheep):
  objects = ProxyManager()
  class Meta:
    proxy = True

Now, we can do:

>>> Ram.objects.all()
[<Ram: Ram object>, <Ram: Ram object>]

Clearly, the models have been simplified: I have not shown any of the model methods that would implement each subclass's different behaviour.

Who is having a birthday?

The latest thing I have been working on is notifications for our project. One of the required notification types is upcoming (and today’s) birthdays (and expiring work visas, but that’s a much easier problem).

This actually turns out to be quite a hard problem. There are some simple solutions, but none of them meets our requirements:

  1. Store only the month and day of a person’s birthday. This is unsatisfactory as we use their age to calculate their wage, if applicable.
  2. Create a pseudo-column that contains their upcoming birthday. This gets hard when you take leap-day birthdays into account.

We need to fetch all people who have a birthday coming up in the next X days. Matching only people whose birthday is exactly X days away is not enough, because (a) leap days are easy to miss, and (b) a change to either the query period or a person’s birthday could mean some events are missed.

Instead, we will query for birthdays in a range, and see if we have already sent a notification for this instance of their birthday. If not, we will send a notification.
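The "already sent?" check might look something like this sketch. The names sent and should_notify are hypothetical, and I am assuming the caller knows the date in the range on which each person's birthday fell. Keying on (person, year) means that overlapping daily queries, or a birthday that matches on two nearby days, still produce only one notification:

```python
import datetime

# Hypothetical bookkeeping: (person_id, year) pairs for which a birthday
# notification has already gone out. A real app would use a database table.
sent = set()


def should_notify(person_id, occurrence):
    """Return True the first time we see this person's birthday in a given year.

    `occurrence` is the date in the query range on which the birthday matched.
    """
    key = (person_id, occurrence.year)
    if key in sent:
        return False
    sent.add(key)
    return True


print(should_notify(42, datetime.date(2023, 2, 28)))  # True
print(should_notify(42, datetime.date(2023, 3, 1)))   # False (same year)
print(should_notify(42, datetime.date(2024, 2, 29)))  # True (next year)
```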

One solution is to look at all of the dates in the given range, in the format -%m-%d, and query using contains against this list:

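import datetime, operator
from functools import reduce  # a builtin in Python 2
from django.db.models import Q
dates = [start + datetime.timedelta(days=i) for i in range((finish - start).days + 1)]  # include finish
filters = [Q(dob__contains=x.strftime('-%m-%d')) for x in dates]
Person.objects.filter(reduce(operator.or_, filters))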

But this too fails for leap-day birthdays when the current year is not a leap year: 29 February is never in the list of dates, so nobody born on it is matched.

(We use the -%m-%d format instead of %m-%d so we don’t get false matches from the year part of the date).
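Both points can be checked in plain Python. This sketch assumes dob is rendered as an ISO 'YYYY-MM-DD' string when the contains lookup runs; the real comparison happens in the database:

```python
import datetime

# Without the leading '-', an 'MM-DD' pattern can match across the
# year/month boundary of an ISO date string.
dob = '2012-03-15'
print('12-03' in dob)   # True: a false match on '2012-03'
print('-12-03' in dob)  # False

# In a non-leap year the patterns never include '-02-29', so someone
# born on 29 February is never matched.
start, finish = datetime.date(2023, 2, 25), datetime.date(2023, 3, 5)
patterns = [
    (start + datetime.timedelta(days=i)).strftime('-%m-%d')
    for i in range((finish - start).days + 1)
]
print(any(p in '2000-02-29' for p in patterns))  # False
```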

Then I came across a post by Zoltán Böszörményi which contains the following useful function:

CREATE OR REPLACE FUNCTION indexable_month_day(date) RETURNS TEXT as $BODY$
  SELECT to_char($1, 'MM-DD');
$BODY$ language 'sql' IMMUTABLE STRICT;

There are a couple of things to notice: it does MM-DD, not the other way around. This allows us to sort lexically. Also, declaring it as IMMUTABLE means we will be able to create an index using it. And since we are querying against it, having an index may be useful:

CREATE INDEX person_birthday_idx ON people (indexable_month_day(dob));
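To see why MM-DD is the right order, here is a Python stand-in for indexable_month_day() (assumed to produce the same text as to_char($1, 'MM-DD')). Its strings sort in calendar order regardless of year, while DD-MM strings do not:

```python
import datetime


def indexable_month_day(d):
    # Python equivalent of the SQL function's to_char($1, 'MM-DD').
    return d.strftime('%m-%d')


dates = [datetime.date(1980, 12, 1), datetime.date(1995, 1, 31), datetime.date(2001, 2, 1)]
print(sorted(indexable_month_day(d) for d in dates))  # ['01-31', '02-01', '12-01']
print(sorted(d.strftime('%d-%m') for d in dates))     # ['01-02', '01-12', '31-01']
```

The DD-MM ordering puts 1 February and 1 December before 31 January, so range comparisons on it would be meaningless.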

Now, we can also query against this. I like to use Django queryset methods (see building a higher-level query API), so my version looks like:

import datetime


class BirthdayQuerySetMixin(object):
    def birthday_between(self, start, finish):
        assert start <= finish, "Start must be less than or equal to finish"
        # Widen the range by a day at each end and compare strictly, so a
        # leap-day birthday ('02-29') still matches in non-leap years.
        start = start - datetime.timedelta(days=1)
        finish = finish + datetime.timedelta(days=1)
        # Same year: a birthday must satisfy both bounds (AND).
        # Range crosses New Year: it need only satisfy one of them (OR).
        andor = 'AND' if start.year == finish.year else 'OR'
        return self.extra(
            where=[
                "indexable_month_day(dob) < %s " + andor +
                " indexable_month_day(dob) > %s"
            ],
            params=[finish.strftime('%m-%d'), start.strftime('%m-%d')],
        )

    def birthday_on(self, date):
        return self.birthday_between(date, date)

This has a caveat: in non-leap years, a leap-day birthday matches on both 28 February and 1 March. This is intentional, as other logic will prevent duplicate notifications, and we don’t know which offsetting method people will prefer.

The logic behind it is that it offsets the start and the finish by one day each, and then filters using strict less-than and greater-than comparisons. This is what allows us to find leap-day birthdays. The other tricky part is using AND when the (offset) start and finish fall in the same year, and OR when the finish is in the next year. This allows it to match across year boundaries.
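To check these edge cases without a database, the where clause can be mirrored in plain Python. This is only a sketch for reasoning about the logic (month_day and birthday_matches are hypothetical names); filtering in Python like this would mean loading every person, which the indexed SQL version avoids:

```python
import datetime


def month_day(d):
    # Same 'MM-DD' string that indexable_month_day() produces in SQL.
    return d.strftime('%m-%d')


def birthday_matches(dob, start, finish):
    """Plain-Python mirror of the birthday_between() where clause."""
    assert start <= finish, "Start must be less than or equal to finish"
    start = start - datetime.timedelta(days=1)
    finish = finish + datetime.timedelta(days=1)
    md = month_day(dob)
    below = md < month_day(finish)
    above = md > month_day(start)
    # AND within one year, OR when the widened range crosses New Year.
    return (below and above) if start.year == finish.year else (below or above)


leap = datetime.date(2000, 2, 29)
print(birthday_matches(leap, datetime.date(2023, 2, 28), datetime.date(2023, 2, 28)))  # True
print(birthday_matches(leap, datetime.date(2023, 3, 1), datetime.date(2023, 3, 1)))    # True
print(birthday_matches(leap, datetime.date(2024, 3, 1), datetime.date(2024, 3, 1)))    # False
print(birthday_matches(datetime.date(1990, 1, 2),
                       datetime.date(2023, 12, 30), datetime.date(2024, 1, 3)))        # True
```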

Oh, there should also be a check to ensure that we have less than a full year between start and finish: if it’s a year or more, we can just return everyone!
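One way to add that check, sketched in plain Python under my own assumptions (one_year_later and covers_whole_year are hypothetical names). Without it, a range such as 1 January to 31 December 2023 is widened to '12-31' and '01-01', the years differ so OR is used, and no birthday satisfies either comparison. The sketch treats a range as covering everyone once it spans a full calendar year, mapping a 29 February start to 1 March of the following year:

```python
import datetime


def one_year_later(d):
    """The same calendar date one year on (29 February maps to 1 March)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        return datetime.date(d.year + 1, 3, 1)


def covers_whole_year(start, finish):
    """True when the inclusive range start..finish spans a full calendar year."""
    return finish + datetime.timedelta(days=1) >= one_year_later(start)


print(covers_whole_year(datetime.date(2023, 1, 1), datetime.date(2023, 12, 31)))  # True
print(covers_whole_year(datetime.date(2023, 6, 1), datetime.date(2024, 5, 30)))   # False
```

Inside birthday_between(), a True result could simply short-circuit to return self.all().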

Otherwise, it’s all good, and we can use it to filter a queryset. I’ve put those methods in my PersonManager and PersonQuerySet (via PassThroughManager from django-model-utils), so I can do things like:

>>> today = datetime.date.today()
>>> Person.objects.birthday_between(today, today + datetime.timedelta(7))

… which provides me with a list of people who have a birthday within the next seven days.

Django sessions and security

We have an interesting set of requirements regarding session timeouts.

Our application is currently split into two parts: the newer parts run in a browser, while the older parts are a native app that consumes a JSON API. We recently stopped using HTTP Basic authentication, and now use session-based authentication in both places. This was handy, as it allows us to:

  1. Not store the user’s password, even in memory on the local machine.
  2. Automatically have the user logged in when the native client links to an HTML page (by passing the session id through).

This is all well and good, but we have discovered a potential issue:

  1. User logs in to native client.
  2. User clicks on a button that loads a page in the browser (logging them in automatically).
  3. User closes browser window, but does not quit browser.
  4. Native client does not cleanly exit, and logout code is not called.

This means that the browser session is still logged in, even though the user has no idea of this. This is a very bad thing(tm), as the next person to use the computer could have access to all of the previous user’s data.

So, we need the following to happen:

  • Logging out of the client logs out all of the linked (same session id) browser instances.
  • Closing a given browser window does not log out the session (the client may still be open, or there may be other linked browser windows).
  • When no requests are received within a given time period, the session expires.

So, we need a short session expiry time, but this should refresh every time a request occurs. The browser pages fetch notifications every 30 seconds, but the native client will also need to ping the server with some frequency for this to work.

This is somewhat different to the way django-session-security works. That package does add a feature that may also be useful: if no user input is received on a given page within a timeout period, the session expires. However, this may be hard to manage, as one page may see no activity while another is getting lots of it. For now, we might leave this out as a requirement.

It turns out Django can do everything that is required, out of the box. All you need to do is configure it correctly:

# settings.py

# Save the session (and so push back its expiry) on every request.
SESSION_SAVE_EVERY_REQUEST = True
# Session lifetime, in seconds.
SESSION_COOKIE_AGE = 60

The key is understanding that the session expiry time is only refreshed when the session is saved. Most requests will not save it (my fetch of unread notifications doesn’t, for instance), so without SESSION_SAVE_EVERY_REQUEST the session would expire SESSION_COOKIE_AGE seconds after it was last saved, even if requests had been made in the meantime.
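To make this concrete, here is a deliberately simplified model of the server-side expiry. It is not Django's implementation: it ignores cookies and assumes the session is saved once at login (time 0). Times are seconds since login, and the two parameters stand in for SESSION_COOKIE_AGE and SESSION_SAVE_EVERY_REQUEST:

```python
def expires_at(request_times, cookie_age=60, save_every_request=True):
    """Return the time (seconds since login) at which the session expires."""
    expiry = cookie_age  # the session is saved at login (time 0)
    for t in sorted(request_times):
        if t >= expiry:
            return expiry  # the session lapsed before this request arrived
        if save_every_request:
            expiry = t + cookie_age  # saving pushes the expiry back
    return expiry


print(expires_at(range(30, 301, 30)))                            # 360
print(expires_at(range(30, 301, 30), save_every_request=False))  # 60
print(expires_at(range(30, 121, 30)))                            # 180
```

With pings every 30 seconds the expiry keeps sliding forward; without saving on every request it lapses at 60 seconds regardless; and if the client stops pinging at 120 seconds (say, after a crash), the session dies 60 seconds later.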