Custom Element Form Submission

Custom elements are starting to get some traction in web development. There have been some really nice recent posts: the one that got me back on track was Web Components: Why You’re Already an Expert, and also Custom Elements: defining new elements in HTML and Performance and Custom Elements. Also, Polymer makes heavy use of Custom Elements, and I’ve had a bit of a look there too.

I really like KnockoutJS, so I started playing around with that. I have a nicely defined DatePicker element that would work well as a Custom Element, and would let me use the Shadow DOM to hide the internals of the code.

However, I soon hit a problem. Browsers, and jQuery, will only add data to a form from <input>, <select>, <textarea> and <keygen> elements. In jQuery, for instance, this list is hard-coded as the rsubmittable regular expression (see the jQuery source).

This means that you cannot just use a <x-date-picker> element, and have it submit. You still need to use some type of a hidden <input> element, and link that to the value. “Subclassing” <input> is not sufficient.

This seems to be a fairly large oversight, and I have not been able to find anything else on the internet that discusses this.

I haven’t tried “subclassing” <button> yet, to see if it is possible to create custom elements that can be used to submit forms, rather than provide data for them.

Per-command Virtualenv

Recently, I finally got around to re-installing OS X from scratch on my work machine. It was long overdue: I was frequently unable to wake the machine from display sleep, and saving a file in a monitored directory would take the wsgi-monitor package tens of seconds to restart django.

One thing I wanted to do this time was to install things only as necessary, and also to put every pip-installed command line tool in its own virtualenv. However, this has one drawback: it is a little repetitive.

For instance, to install Fabric, my deployment tool of choice:

$ virtualenv ~/.venv/fabric
$ . ~/.venv/fabric/bin/activate
(fabric)$ pip install fabric
(fabric)$ ln -s ~/.venv/fabric/bin/fab /usr/local/bin/

This is fine if you only have one ‘tool’ to install, but something like docutils actually installs a whole stack of command line tools.

What we want is something like:

  • create the virtualenv
  • get a list of items already in the <virtualenv>/bin
  • install the required tool (and any extra modules)
  • link all of the newly added commands in <virtualenv>/bin to /usr/local/bin

We could just add each <virtualenv>/bin to our PATH, but then pip would resolve to the copy in whichever virtualenv came first on the PATH, and I don’t want a global pip installed at all.

Additionally, it would be nice to be able to specify a required version of the package to install, and other (non-dependency) packages that should be installed. For instance, I want mercurial_keyring to be installed in the mercurial virtualenv.

This last one is probably less important, as you can just use that virtualenv’s pip to install them after. But the version number stuff might be nice.

virtualenv has the nice ability to be able to create bootstrap scripts, which will do other stuff (like install specific packages). We can co-opt this to build a tool for doing the automatic installation and linking:

import virtualenv, subprocess

data = """
import os, subprocess

def extend_parser(optparse_parser):
    optparse_parser.add_option(
        "--upgrade",
        action="store_true",
        dest="upgrade",
        default=False,
        help="Upgrade package",
    )
    optparse_parser.add_option(
        "--path",
        dest="path",
        default='~/.venv/',
        help="Parent path of virtualenvs"
    )
    optparse_parser.add_option(
        '--package',
        dest="packages",
        action="append",
        help="Other packages to install"
    )
    
def adjust_options(options, args):
    global package
    if not args: 
        return
    package = args[0]
    if '==' in args[0]:
        args[0], version = args[0].split('==', 1)
    args[0] = os.path.join(os.path.expanduser(options.path), args[0])

def after_install(options, home_dir):
    global package
    venv = os.path.join(os.path.expanduser(options.path), home_dir)
    before = os.listdir(os.path.join(venv, 'bin'))
    command = [os.path.join(venv, 'bin', 'pip'), 'install', package]
    if options.upgrade:
        command += ['--upgrade']
    if options.packages:
        command += options.packages
    subprocess.call(command)
    after = os.listdir(os.path.join(venv, 'bin'))
    
    for command in set(after).difference(before):
        subprocess.call([
            'ln', '-s', 
            os.path.join(venv, 'bin', command),
            '/usr/local/bin'
        ])
"""

output = virtualenv.create_bootstrap_script(data)
with open('/usr/local/bin/pip-install', 'w') as f:
    f.write(output)
subprocess.call(['chmod', '+x', '/usr/local/bin/pip-install'])

There is one caveat: if a file with the same name as a command to be linked already exists in /usr/local/bin, that command is not linked. That is, it does not overwrite existing commands. I think this is preferable, as it is marginally safer.

Linking commands like this is better than copying them, as it means you can just do a pip install --upgrade <package> in the relevant virtualenv, and the linked commands pick up the upgrade. You can also use pip-install <package>==<new-version>, and that should work too. However, if you unlink a command (or remove the file that stopped one from being linked) and then run pip-install again, it will not link the commands that were already installed in that virtualenv, because only newly added files in <virtualenv>/bin are linked.
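The heart of the bootstrap script is the before/after comparison of <virtualenv>/bin. Here is a minimal standalone sketch of just that step, assuming you want it outside virtualenv’s bootstrap machinery: it uses os.symlink instead of shelling out to ln -s, and the function name link_new_commands, the touch helper and the temporary directories standing in for a virtualenv and /usr/local/bin are all illustrative, not part of the script above. As in the caveat, names that already exist in the target are skipped.

```python
import os
import tempfile


def link_new_commands(bin_dir, before, target_dir):
    """Symlink entries of bin_dir that are not in `before` into target_dir.

    Names that already exist in target_dir are skipped, never overwritten.
    Returns the (sorted) names that were linked.
    """
    linked = []
    for name in sorted(set(os.listdir(bin_dir)) - set(before)):
        destination = os.path.join(target_dir, name)
        # lexists() is also true for dangling symlinks, so nothing is clobbered.
        if os.path.lexists(destination):
            continue
        os.symlink(os.path.join(bin_dir, name), destination)
        linked.append(name)
    return linked


def touch(path):
    # Create an empty file (hypothetical helper for the demo below).
    open(path, "w").close()


# Temporary directories stand in for <virtualenv>/bin and /usr/local/bin.
with tempfile.TemporaryDirectory() as venv_bin, tempfile.TemporaryDirectory() as target:
    for name in ("activate", "pip", "python"):
        touch(os.path.join(venv_bin, name))
    before = os.listdir(venv_bin)
    # Pretend `pip install docutils` added these two scripts.
    for name in ("rst2html.py", "rst2latex.py"):
        touch(os.path.join(venv_bin, name))
    touch(os.path.join(target, "rst2latex.py"))  # an existing command
    print(link_new_commands(venv_bin, before, target))  # ['rst2html.py']
```

Because the "before" snapshot is taken after the virtualenv exists, its own pip, python and activate scripts are never linked.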

Anyway, your mileage may vary. I’m using it now, and it seems good.

Duck-punch misbehaving software

Recently, I found myself having to interact with an API that uses SOAP. I’ve been using the SOAPpy package, which has made it possible, but not exactly easy. But that’s not what I am going to write about right now.

In order to make the linking of data between my software and that system easier, I needed to get a dump of the other system’s data, in a CSV that I could send to the client.

So, since the SOAPpy module gives you something that looks dict-y, I thought I’d just be able to pass it to csv.DictWriter’s writerow() method.

Not quite.

See, whilst it supports the python dict-like [key] syntax, SOAPpy.Types.structType doesn’t have a .get() method, which DictWriter uses to extract each value (it has to, so that it can substitute a default for missing keys).

So, here is a simple duck-punch (does it quack like a duck? No? Punch it harder, so it quacks!) that enables you to pass a structType object to a DictWriter.writerow() method call:

from SOAPpy.Types import structType
structType.get = lambda x,y,z=None : x[y]

In this case, I was able to use this simple version, as I knew every key it would be asked for would exist, but you could write a slightly more complex one that checks whether the key exists and, if not, returns z. You do need all three arguments in the lambda, though, since DictWriter calls get(key, restval) on the row.
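Here is a sketch of that safer variant. SOAPpy isn’t needed to show the idea, so FakeStruct below is a hypothetical stand-in that, like structType, supports struct[key] but has no .get(). Which exception the real structType raises for a missing key is an assumption on my part, so the patched get catches both KeyError and AttributeError.

```python
import csv
import io


class FakeStruct(object):
    """Hypothetical stand-in for SOAPpy.Types.structType: it supports
    struct[key] but has no .get() method."""

    def __init__(self, **fields):
        self._fields = fields

    def __getitem__(self, key):
        return self._fields[key]


def _get(self, key, default=None):
    # DictWriter calls row.get(key, restval), so return the default
    # for missing keys instead of raising.
    try:
        return self[key]
    except (KeyError, AttributeError):
        return default


FakeStruct.get = _get  # the duck-punch

out = io.StringIO()
# extrasaction="ignore": with the default "raise", Python 3's DictWriter
# also calls row.keys(), which this stand-in does not have.
writer = csv.DictWriter(out, fieldnames=["name", "email"], restval="",
                        extrasaction="ignore")
writer.writeheader()
writer.writerow(FakeStruct(name="Alice", email="alice@example.com"))
writer.writerow(FakeStruct(name="Bob"))  # no email: restval is written
print(out.getvalue())
```

The extrasaction detail matters on Python 3: with the default setting, DictWriter checks the row’s keys for unexpected fields before it ever calls .get().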

I Hate Generic Foreign Keys, but this works anyway

I’m really not a fan of the concept of Generic Foreign Keys. They do have their place, though, and the app I’ve just started using is a reasonable example.

It’s django-activity-streams, and I’m using it essentially as an audit stream. It stores the user who performed the change, the object that was changed, when it was changed, and a serialised version of the fields that have changed, in the format of:

[
  {
    "field": "date_of_birth",
    "old": "1955-01-10",
    "new": "1955-10-01"
  }
]

Now, the complication comes when trying to generate reports based on this stuff, and that is all down to the use of GFKs.

Essentially, what I want to be able to do is:

Action.objects.between(start, finish).verb('updated').filter(
  target__in=queryset
)

But this will not work, as there is no real target field: it’s a GFK field. But we can query on the two fields that make it up: target_content_type and target_object_id.

So, you might think we can do something like:

ctype = ContentType.objects.get_for_model(queryset.model)
Action.objects.filter(
  target_content_type=ctype,
  target_object_id__in=queryset
)

Alas, this will not work either, as target_object_id is a “character varying”, and a queryset kind-of looks like a set of integers (or whatever the primary key for that table is).

So, we need a list of strings, instead of integers.

pks = map(str, queryset.values_list('id', flat=True))
Action.objects.filter(
  target_content_type=ctype,
  target_object_id__in=pks
)

Indeed, that works, but (a) it requires two queries (one to get the PKs, and the other to get the actions), and (b) the second query will get very long if there are lots of objects in the queryset.

So, we want a query that we can use as a subquery. Enter postgres:

pks = queryset.extra(select={'_id': 'SELECT CAST("id" AS text)'}).values('_id')
Action.objects.filter(
  target_content_type=ctype,
  target_object_id__in=pks
)

Bingo:

SELECT
    "actstream_action"."id",
    "actstream_action"."actor_content_type_id",
    "actstream_action"."actor_object_id",
    "actstream_action"."verb",
    "actstream_action"."description",
    "actstream_action"."target_content_type_id",
    "actstream_action"."target_object_id",
    "actstream_action"."action_object_content_type_id",
    "actstream_action"."action_object_object_id",
    "actstream_action"."timestamp",
    "actstream_action"."public",
    "actstream_action"."data"
FROM
    "actstream_action"
WHERE (
    "actstream_action"."verb" = created
    AND "actstream_action"."timestamp" <= 2013-09-11 00:00:00
    AND "actstream_action"."timestamp" >= 2001-01-01 00:00:00
    AND "actstream_action"."target_object_id" IN (
        SELECT
          (SELECT CAST("id" AS text)) AS "_id"
        FROM 
            "people" U0 
        WHERE 
            U0."comp_id" = 1 
    ) 
    AND "actstream_action"."target_content_type_id" = 17
)
ORDER BY
    "actstream_action"."timestamp" DESC;

You can see the subquery SELECT (SELECT CAST(...)) after the IN, which in the previous version was a list of string versions of the ids.
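To see the cast-in-a-subquery idea outside Django, here is a small sketch using Python’s built-in sqlite3 module as a stand-in for Postgres. The tables are simplified, hypothetical versions of people and actstream_action, and the rows are made up; the point is only that the IN subquery returns text ids, so the whole lookup is a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, comp_id INTEGER);
    -- Generic-FK style table: the object id is stored as text.
    CREATE TABLE action (
        id INTEGER PRIMARY KEY,
        verb TEXT,
        target_content_type_id INTEGER,
        target_object_id TEXT
    );
    INSERT INTO people (id, comp_id) VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO action (verb, target_content_type_id, target_object_id) VALUES
        ('updated', 17, '1'),  -- person 1, company 1: should match
        ('updated', 17, '3'),  -- person 3 is in company 2
        ('updated', 99, '2');  -- right id, but a different content type
""")

# One round trip: the subquery casts the integer primary keys to text so
# they line up with the text target_object_id column.
rows = conn.execute("""
    SELECT target_object_id FROM action
    WHERE target_content_type_id = ?
      AND target_object_id IN (
          SELECT CAST(id AS TEXT) FROM people WHERE comp_id = ?
      )
    ORDER BY id
""", (17, 1)).fetchall()
print(rows)
```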

arp -a | vendor

I have lots of things on my local network. Most of them behave nicely with the DHCP server, and provide their machine name as part of their DHCP request (Client ID), which means I can see them in the list in Airport Utility.

However, some of them don’t, which means I have some blank rows.

It would be nice to be able to figure out which devices these are, especially for those that don’t provide any services (particularly a web interface).

Enter MAC Vendor Lookup.

You can register, and get an API key that will return values in the format you desire.

Then, it’s possible to do:

$ curl --silent http://www.macvendorlookup.com/api/:API_KEY/:MAC_ADDRESS | cut -f 1 -d \|

(I use the pipe delimited version).

This is all well and good, but who wants to have to type them in? Not this guy.

Let’s look at how we can get them from arp -a.

$ arp -a | cut -f 4 -d ' '

Okay, that’s promising: it gives me a list of MAC addresses. Almost. It drops the leading zero from single-digit octets, which the API rejects. And it includes entries that are incomplete.

Cue about an hour mucking about with the (limited) sed regex docs:

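$ arp -a |
    cut -f 4 -d ' ' |
    sed -E 's/:([[:xdigit:]]):/:0\1:/g' |
    sed -E 's/:([[:xdigit:]]):/:0\1:/g' |
    sed -E 's/^.:/0&/' |
    sed -E 's/:(.)$/:0\1/'

Ah, that’s better. Now we have the proper MAC addresses. (The first substitution runs twice because, with the g flag, sed resumes searching after each match: in …:a:b:c:…, padding :a: consumes the colon in front of b, so b is only padded on the second pass.)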

Now, we can pipe this information through the API call.

This is where we need to start to get a bit tricky. We need to create a function that will allow us to call the API with a new value each time. You’ll want to stick this in your .bashrc.

function mac_vendor() {
  local API_URL="http://www.macvendorlookup.com/api"
  local API_KEY="<your api key>"
  if [[ $1 ]]; then
    curl --silent "$API_URL/$API_KEY/$1" | cut -f 1 -d \|
  else
    while read DATA; do
      curl --silent "$API_URL/$API_KEY/$DATA" | cut -f 1 -d \|
    done
  fi
}

The if statement means we can use it by passing an argument on the command line:

$ mac_vendor 00:00:00:00:00:00
Xerox Corporation

Or by passing through data from stdin:

$ arp -a |
    cut -f 4 -d ' ' |
    sed -E 's/:([[:xdigit:]]):/:0\1:/g' |
    sed -E 's/:([[:xdigit:]]):/:0\1:/g' |
    sed -E 's/^.:/0&/' |
    sed -E 's/:(.)$/:0\1/' |
    mac_vendor

Okay, that’s nice, but we now can’t see which IP address is associated with which vendor.

Let’s move that ugly chained sed call into its own function, called normalise_mac_address, which we will also wrap in a while read DATA; do ... done clause, so we can pipe data through it:

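function normalise_mac_address() {
  while read DATA; do
    # The first substitution runs twice: with /g, sed resumes after each
    # match, so a single-digit octet right after a padded one is skipped.
    echo "$DATA" |
      sed -E 's/:([[:xdigit:]]):/:0\1:/g' |
      sed -E 's/:([[:xdigit:]]):/:0\1:/g' |
      sed -E 's/^.:/0&/' |
      sed -E 's/:(.)$/:0\1/'
  done
}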

Nearly there!

We now need to be able to grab out the IP address and the MAC address from arp, and pass only the MAC address through our conversion functions. By default the bash for … in … construct will iterate through words, so we need to tell it to deal with a line at a time:

function get_all_local_vendors() {
  # Split on newlines only, and keep that change local to this function.
  local IFS=$'\n'
  local LINE MAC IP VENDOR
  for LINE in `arp -a | cut -f 2,4 -d ' '`; do
    # We have LINE="(<ip.address.here>) <mac:address:here>"
    MAC=`echo "$LINE" | cut -f 2 -d ' ' | normalise_mac_address`
    IP=`echo "$LINE" | cut -f 1 -d ' '`
    # We only want ones that were still active
    if [ "$MAC" != '(incomplete)' ]; then
      VENDOR=`echo "$MAC" | mac_vendor`
      echo "$VENDOR" "$IP"
    fi
  done
}

I’m hardly a bash expert, so there may be a better way of doing things rather than the repeated VARIABLE=`foo thing` construct I keep using.

So, the outcome I get when I run this looks something like:

$ get_all_local_vendors 
Apple, Inc. (10.0.1.1)
Sparklan Communications, Inc. (10.0.1.3)
Devicescape Software, Inc. (10.0.1.4)
Mitrastar Technology (10.0.1.5)
Apple, Inc. (10.0.1.15)
Silicondust Engineering Ltd (10.0.1.16)
Apple Computer (10.0.1.21)
none (10.0.1.255)

Getting rid of that last one is left as an exercise to the reader: the MAC address is FF:FF:FF:FF:FF:FF.
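For comparison, here is a rough Python sketch of the parsing half of this pipeline: it pulls the IP and MAC out of each arp -a line, zero-pads every octet and skips incomplete entries. The sample lines are an assumed OS X-style format, the function names just mirror the bash ones, and the vendor lookup itself is left out. Splitting on ":" pads each octet independently, so there are no overlapping-match subtleties to worry about.

```python
import re

# Matches "(<ip>) at <mac>" in an `arp -a` line such as
# "? (10.0.1.4) at 0:1b:63:a:c:3 on en0 ifscope [ethernet]" (assumed format).
ARP_LINE = re.compile(r"\((?P<ip>[^)]+)\) at (?P<mac>\S+)")


def normalise_mac_address(mac):
    """Zero-pad every octet, e.g. '0:1b:63:a:c:3' -> '00:1b:63:0a:0c:03'."""
    return ":".join(octet.zfill(2) for octet in mac.split(":"))


def parse_arp(output):
    """Yield (ip, mac) pairs from `arp -a` output, skipping incomplete entries."""
    for line in output.splitlines():
        match = ARP_LINE.search(line)
        if match is None or match.group("mac") == "(incomplete)":
            continue
        yield match.group("ip"), normalise_mac_address(match.group("mac"))


sample = (
    "? (10.0.1.1) at 0:1b:63:a:c:3 on en0 ifscope [ethernet]\n"
    "? (10.0.1.2) at (incomplete) on en0 ifscope [ethernet]\n"
)
for ip, mac in parse_arp(sample):
    print(mac, ip)  # 00:1b:63:0a:0c:03 10.0.1.1
```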

Django Single Table Inheritance on the cheap

There was a recent question on Stack Overflow about Django Single Table Inheritance (STI). It got me thinking about how to adapt my FSM-proxy approach to plain STI.

Note: this only works when all sub-classes have the same fields: the example we are going to use here is different to a state machine, in that an object may not change state after it has been created.

from django.db import models

class Sheep(models.Model):
  type = models.CharField(max_length=16)  # must fit 'sheep' as well as 'ram'/'ewe'
  tag_number = models.CharField(max_length=64)

class Ram(Sheep):
  class Meta:
    proxy = True
  
class Ewe(Sheep):
  class Meta:
    proxy = True

In this case, we can fetch all sheep as Sheep.objects.all(). However, this gives us the objects as Sheep instances, when we want those with type='ram' to return Ram instances, and those with type='ewe' to return Ewe instances.

We can do this, by the magic of type().__subclasses__().

class Sheep(models.Model):
  # fields as above
  
  def __init__(self, *args, **kwargs):
    super(Sheep, self).__init__(*args, **kwargs)
    # If we don't have a subclass at all, then we need the type attribute to match
    # our current class. 
    if not self.__class__.__subclasses__():
      self.type = self.__class__.__name__.lower()
    else:
      subclass = [x for x in self.__class__.__subclasses__() if x.__name__.lower() == self.type]
      if subclass:
        self.__class__ = subclass[0]
      else:
        self.type = self.__class__.__name__.lower()
    

This will automatically downcast Sheep objects to the correct subclass, based upon the type field.

It also sets the type field on objects that are instantiated without one (based on the current instance class). This enables us to do things like:

# Fetch all Sheep, downcast to correct subclass.
>>> Sheep.objects.all()
[<Ram: Ram object>, <Ram: Ram object>, <Ewe: Ewe object>]

# Automatically set the type on a class.
>>> Ram()
<Ram: Ram object>
>>> Ram().type
'ram'
>>> Sheep()
<Sheep: Sheep object>
>>> Sheep().type
'sheep'

# Automatically set the class on a valid subclass/type
>>> Sheep(type='ram')
<Ram: Ram object>
# Force the type field on an invalid type argument. [see below]
>>> Ram(type='ewe')
<Ram: Ram object>
>>> Sheep(type='foo')
<Sheep: Sheep object>
>>> Sheep(type='foo').type
'sheep'

The assumption I have made here is that if a class is instantiated with an invalid type value (neither our class nor one of our subclasses), the type field is reset to match the current class.

The other assumption is that the parent class is also valid. In this case, it wouldn’t be, as sheep must be either ewes or rams (or wethers, but that’s another story).
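Stripped of Django, the downcasting trick is just __subclasses__() plus assignment to __class__. The sketch below is a plain-Python stand-in (no database, no manager) for illustration only; it folds the two branches of the __init__ above into one, since a class without subclasses simply finds no match and falls through to resetting the type.

```python
class Sheep(object):
    """Plain-Python stand-in for the Sheep model; only the `type` field."""

    def __init__(self, type=""):
        self.type = type
        # Direct subclasses whose lowercased name matches the type value.
        matches = [sub for sub in self.__class__.__subclasses__()
                   if sub.__name__.lower() == self.type]
        if matches:
            # Downcast to the matching subclass.
            self.__class__ = matches[0]
        else:
            # No subclasses, or an unknown type: force type to our class name.
            self.type = self.__class__.__name__.lower()


class Ram(Sheep):
    pass


class Ewe(Sheep):
    pass


# Simulate rows coming back from the database with different type values.
flock = [Sheep(type=t) for t in ("ram", "ram", "ewe", "foo")]
print([s.__class__.__name__ for s in flock])  # ['Ram', 'Ram', 'Ewe', 'Sheep']
```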

We also need to be able to fetch Ewe and Ram objects using their manager. This is just as simple as filtering on the type.

class ProxyManager(models.Manager):
  def get_query_set(self): # Note: get_queryset in Django1.6+
    return super(ProxyManager, self).get_query_set().filter(type=self.model.__name__.lower())

class Ram(Sheep):
  objects = ProxyManager()
  class Meta:
    proxy = True

class Ewe(Sheep):
  objects = ProxyManager()
  class Meta:
    proxy = True

Now, we can do:

>>> Ram.objects.all()
[<Ram: Ram object>, <Ram: Ram object>]

Clearly, the models have been simplified: I have not shown any model methods that would be the different behaviours that the subclasses have.

Who is having a birthday?

The latest thing I have been working on is notifications for our project. One of the required notification types is upcoming (and today’s) birthdays (and expiring work visas, but that’s a much easier problem).

This actually turns out to be quite a hard problem. There are some simple solutions, but none of them meets our requirements:

  1. Store only the month and day of a person’s birthday. This is unsatisfactory as we use their age to calculate their wage, if applicable.
  2. Create a pseudo-column that contains their upcoming birthday. This gets hard when you take leap-day birthdays into account.

We need to fetch all people who have a birthday coming up in the next X days. This is a requirement because if we just matched people who had a birthday in X days, (a) leap days are easy to miss, and (b) changes to either the query period or a person’s birthday could mean some events were missed.

Instead, we will query for birthdays in a range, and see if we have already sent a notification for this instance of their birthday. If not, we will send a notification.

One solution is to look at all of the dates in the given range, in the format -%m-%d, and query using contains against this list:

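import datetime, operator
from functools import reduce
from django.db.models import Q
dates = [start + datetime.timedelta(i) for i in range((finish - start).days + 1)]  # includes finish
filters = [Q(dob__contains=x.strftime('-%m-%d')) for x in dates]
Person.objects.filter(reduce(operator.or_, filters))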

But, this too fails when a birthday on a leap day exists, and this year is not a leap year.

(We use the -%m-%d format instead of %m-%d so we don’t get false matches from the year part of the date).

Then I came across a post by Zoltán Böszörményi, that contains the following useful function:

CREATE OR REPLACE FUNCTION indexable_month_day(date) RETURNS TEXT as $BODY$
  SELECT to_char($1, 'MM-DD');
$BODY$ language 'sql' IMMUTABLE STRICT;

There are a couple of things to notice: it does MM-DD, not the other way around. This allows us to sort lexically. Also, declaring it as IMMUTABLE means we will be able to create an index using it. And since we are querying against it, having an index may be useful:

CREATE INDEX person_birthday_idx ON people (indexable_month_day(dob));

Now, we can also query against this. I like to use django queryset methods (see building a higher-level query API), so my stuff looks like:

import datetime

class BirthdayQuerySetMixin(object):
    def birthday_between(self, start, finish):
        assert start <= finish, "Start must be less than or equal to finish"
        start = start - datetime.timedelta(1)
        finish = finish + datetime.timedelta(1)
        return self.extra(where=[
            """
            indexable_month_day(dob) < '%(finish)s' 
            %(andor)s 
            indexable_month_day(dob) > '%(start)s'
            """ % {
                'start': start.strftime('%m-%d'),
                'finish': finish.strftime('%m-%d'),
                'andor': 'and' if start.year == finish.year else 'or'
            }
        ])
      
    def birthday_on(self, date):
        return self.birthday_between(date, date)

This has a caveat: it returns two matches for leap-day birthdays during non-leap-years. This is intentional, as other logic will prevent duplicate notifications, and we don’t know which offsetting method people will prefer.

The logic behind it is that it widens the window by one day at each end, and then filters using strict less-than and greater-than comparisons. This is what allows us to find leap-day birthdays. The other tricky part is using AND when the (widened) start and finish fall in the same year, and OR when the finish is in the next year. This allows it to match across year boundaries.

Oh, there should also be a check to ensure that we have less than a full year between start and finish: if it’s a year or more, we can just return everyone!

Otherwise, it’s all good, and we can use it to filter a queryset. I’ve put those methods in my PersonManager and PersonQuerySet (via PassThroughManager), so I can do things like:

>>> today = datetime.date.today()
>>> Person.objects.birthday_between(today, today + datetime.timedelta(7))

… which provides me with a list of people who have a birthday within the next seven days.
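To check the window arithmetic without a database, here is a pure-Python mirror of the WHERE clause that birthday_between builds. It is a sketch for testing the logic, not a replacement for the indexed query, and the 365-day cut-off for "return everyone" is my own assumption for the full-year check mentioned above.

```python
import datetime

ONE_DAY = datetime.timedelta(days=1)


def birthday_between(dob, start, finish):
    """Return True if dob's month/day falls in [start, finish] (inclusive)."""
    if start > finish:
        raise ValueError("start must be less than or equal to finish")
    # Assumed cut-off: a window of 365 days or more covers every birthday.
    if (finish - start).days >= 365:
        return True
    # Widen by a day at each end, then compare MM-DD strings strictly.
    lower, upper = start - ONE_DAY, finish + ONE_DAY
    month_day = dob.strftime("%m-%d")
    if lower.year == upper.year:
        return lower.strftime("%m-%d") < month_day < upper.strftime("%m-%d")
    # The window wraps around the new year, so either side may match.
    return month_day > lower.strftime("%m-%d") or month_day < upper.strftime("%m-%d")


today = datetime.date(2013, 9, 28)
print(birthday_between(datetime.date(1955, 10, 1), today, today + 7 * ONE_DAY))  # True
```

A leap-day birthday matches both a window ending on 28 February and one starting on 1 March in a non-leap year, which is the intentional double match described above.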

Django sessions and security

We have an interesting set of requirements regarding session timeouts.

Our application is currently split into two parts: the newer stuff runs in a browser, but the older parts consume a JSON API, and are part of a native app. We recently stopped using HTTP Basic authentication, and instead use session-based authentication in both places. This was handy, as it allows us to:

  1. Not store the user’s password, even in memory on the local machine.
  2. Automatically have the user logged in when the native client links to an HTML page (by passing the session id through).

This is all well and good, but we have discovered a slight possible issue.

  1. User logs in to native client.
  2. User clicks on a button that loads a page in the browser (logging them in automatically).
  3. User closes browser window, but does not quit browser.
  4. Native client does not cleanly exit, and logout code is not called.

This means that the browser session is still logged in, even though the user would have no idea of this. This is a very bad thing(tm), as the next person to use the computer could have access to all of the previous user’s data.

So, we need the following to happen:

  • Logging out of the client logs out all of the linked (same session id) browser instances.
  • Closing a given browser window does not log out the session (the client may still be open, or there may be other linked browser windows).
  • When no requests are received within a given time period, the session expires.

So, we need a short session expiry time, but this should refresh every time a request occurs. The browser pages fetch notifications every 30 seconds, but the native client will also need to ping the server with some frequency for this to work.

This is somewhat different to the way django-session-security works. However, django-session-security does add a feature that may also be useful: if no user input is received on a given page within a timeout period, the session expires. This may be hard to manage, though, as one page may see no activity while another is getting lots. For now, we might leave this out as a requirement.

It turns out Django can do everything that is required, out of the box. All you need to do is configure it correctly:

# settings.py

SESSION_SAVE_EVERY_REQUEST = True
SESSION_COOKIE_AGE = 60

The key is understanding that the session’s expiry time is only refreshed when the session is saved. Most requests will not save it (my fetch of unread notifications doesn’t, for instance), so without SESSION_SAVE_EVERY_REQUEST the session would expire SESSION_COOKIE_AGE seconds after the last save, even if requests had been made in the meantime.
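To make the difference concrete, here is a toy timeline model (plain Python, not Django’s session code) of a logged-in page polling for notifications every 30 seconds. Its single rule, that the session expires SESSION_COOKIE_AGE seconds after the last save, is a simplification of the behaviour described above, and session_expiry is a hypothetical helper.

```python
SESSION_COOKIE_AGE = 60  # seconds, matching the settings above


def session_expiry(request_times, save_every_request):
    """Return the time at which a session created at t=0 expires.

    Hypothetical model: expiry is the time of the last save plus
    SESSION_COOKIE_AGE; saves happen at login (t=0) and, if
    save_every_request is True, on every later request.
    """
    last_save = 0
    for t in sorted(request_times):
        if t > last_save + SESSION_COOKIE_AGE:
            break  # this request arrived after the session had expired
        if save_every_request:
            last_save = t
    return last_save + SESSION_COOKIE_AGE


polls = [30, 60, 90, 120]  # notification fetches every 30 seconds
print(session_expiry(polls, save_every_request=False))  # 60
print(session_expiry(polls, save_every_request=True))   # 180
```

Without saving on every request, the poll at t=90 finds the session already gone, despite the steady stream of requests.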

Django Fieldsets

HTML forms contain a construct called a fieldset. These are generally used to segment a form: splitting a form into groups of fields that are logically grouped. Each fieldset may also have a legend.

Django’s forms have no concept of a fieldset natively, but with a bit of patching, we can make every django form capable of rendering itself using fieldsets, yet still be backwards compatible with non-fieldset-aware templates.

Ideally, we would like to be able to render a form in a way similar to:

<form>
  {% for fieldset in form.fieldsets %}
  <fieldset>
    <legend>{{ fieldset.title }}</legend>
    
    <ul>
      {% for field in fieldset %}
        <li>
          {{ field.label_tag }}
          {{ field }}
          {{ field.help_text }}
          {{ field.errors }}
        </li>
      {% endfor %}
    </ul>
  </fieldset>
  {% endfor %}
  
  <!-- submit button -->
</form>

And, it would make sense to be able to declare a form’s fieldsets in a manner such as:

class MyForm(forms.Form):
  field1 = forms.BooleanField(required=False)
  field2 = forms.CharField()
  
  class Meta:
    fieldsets = (
      ('Fieldset title', {
        'fields': ('field1', 'field2')
      }),
    )

This is similar to how fieldsets are declared in the django admin.

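We can’t simply create a subclass of forms.Form and do everything there: we want this to work for every form, and a ModelForm’s metaclass builds form._meta from the inner Meta class using ModelFormOptions, which ignores attributes it doesn’t know about (such as fieldsets). Instead, we need to duck-punch.

First, we want to redefine ModelFormOptions.__init__, so it will keep the fieldsets attribute.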

from django import forms
from django.forms.models import ModelFormOptions

_old_init = ModelFormOptions.__init__

def _new_init(self, options=None):
  _old_init(self, options)
  self.fieldsets = getattr(options, 'fieldsets', None)

ModelFormOptions.__init__ = _new_init

Next, we will need a Fieldset class:

class Fieldset(object):
  def __init__(self, form, title, fields, classes):
    self.form = form
    self.title = title
    self.fields = fields
    self.classes = classes
  
  def __iter__(self):
    # Similar to how a form can iterate through its fields...
    for field in self.fields:
      yield field

And finally, we need to give every form a fieldsets method, which will yield each fieldset, as a Fieldset defined above:

def fieldsets(self):
  meta = getattr(self, '_meta', None)
  if not meta:
    meta = getattr(self, 'Meta', None)

  # Plain forms may have no Meta at all, or a Meta without fieldsets.
  fieldsets = getattr(meta, 'fieldsets', None)
  if not fieldsets:
    return

  for name, data in fieldsets:
    yield Fieldset(
      form=self,
      title=name,
      # A list (not a generator), so a fieldset can be iterated more than once.
      fields=[self[f] for f in data.get('fields', ())],
      classes=data.get('classes', '')
    )

forms.BaseForm.fieldsets = fieldsets

I am using this code (or something very similar to it), in projects. It works for me, but your mileage may vary…
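The grouping logic does not depend on Django itself, so it can be exercised with a stand-in form. In this sketch ContactForm and its fields are hypothetical (form[name] just returns a placeholder string instead of a bound field), and Fieldset stores its fields as a list so that a template could loop over a fieldset more than once.

```python
class Fieldset(object):
    def __init__(self, form, title, fields, classes=""):
        self.form = form
        self.title = title
        self.fields = list(fields)  # a list survives repeated iteration
        self.classes = classes

    def __iter__(self):
        return iter(self.fields)


def fieldsets(form):
    # ModelForms expose their options as form._meta; plain forms only have Meta.
    meta = getattr(form, "_meta", None) or getattr(form, "Meta", None)
    for title, data in getattr(meta, "fieldsets", None) or ():
        yield Fieldset(
            form=form,
            title=title,
            fields=(form[name] for name in data.get("fields", ())),
            classes=data.get("classes", ""),
        )


class ContactForm(object):
    """Hypothetical stand-in for a Django form."""

    class Meta:
        fieldsets = (
            ("About you", {"fields": ("name", "dob")}),
            ("Contact", {"fields": ("email",), "classes": "collapse"}),
        )

    def __getitem__(self, name):
        return "<field %s>" % name


# Roughly what the {% for fieldset in form.fieldsets %} template loop does.
for fieldset in fieldsets(ContactForm()):
    print(fieldset.title, list(fieldset))
```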

Django Proxy Model State Machine

Finite State Machines (FSMs) are a great way to model something that has, well, a finite number of known states. You can easily specify the different states, and the transitions between them.

Some time ago, I came across a great way of doing this in python: Dynamic State Machines. This maps well onto an idea I have been toying with lately, replacing a series of linked models representing different phases in a process with one model type. Initially, I had thought to just use a type flag, but actually changing the class seems like a better idea.

One aspect of django’s models that makes it easy to do this is the concept of a Proxy Model. These are models that share the database table, but have different class definitions. However, usually a model instance will be of the type that was used to fetch it:

from django.db import models

class ModelOne(models.Model):
  field = models.CharField(max_length=64)
  
class ModelOneProxy(ModelOne):
  class Meta:
    proxy = True

ModelOneProxy.objects.get(pk=1) # Returns a ModelOneProxy object.
ModelOne.objects.all() # Returns all ModelOne objects.

However, by using a type field, we can, at the time it is fetched from the database, turn it into the correct type.

class StateMachineModel(models.Model):
  status = models.CharField(max_length=64)
  
  def __init__(self, *args, **kwargs):
    super(StateMachineModel, self).__init__(*args, **kwargs)
    self.__class__ = class_mapping[self.status]

However, having to store a registry of status : <ProxyModelClass> objects is not much fun.

Enter __subclasses__.

  @property
  def _get_states(self):
    """
    Get a mapping of {status: SubClass, ...}
    
    The status key will be the name of the SubClass, with the
    name of the superclass stripped out.
    
    It is intended that you prefix your subclasses with a meaningful
    name, that will be used as the status value.
    """
    return dict([
      (
        sub.__name__.replace(self.__class__.__name__, '').lower(),
        sub
      ) for sub in self.__class__.__subclasses__()
    ])
  
  # in __init__, above, replace the last line with:
    self.__class__ = self._get_states[self.status]

Now, we need to change the underlying class when the type gets changed

  def __setattr__(self, attr, value):
    if attr == 'status':
      states = self._get_states
      if value in states:
        self.__class__ = states[value]
    return super(StateMachineModel, self).__setattr__(attr, value)

As the docstring on _get_states indicates, it looks at the subclass name, and compares it to the superclass name to work out the values that will be stored as the status (and used to dynamically change the class).

This has a fairly large implication: you cannot fetch database objects of any of the subclass types directly. Instead, you would need to do:

SuperClass.objects.filter(status="prefix")

Of course, you could use queryset methods to do this: that’s what I have been doing.

This is still a bit of a work in progress: it’s not well tested, but is an interesting idea.

The full version of this model class, which is slightly different to above:

from django.db import models

class StateMachineModel(models.Model):
    status = models.CharField(max_length=64)
    
    class Meta:
        abstract = True
    
    def __init__(self, *args, **kwargs):
        self._states = dict([
            (sub.__name__.replace(self.__class__.__name__, '').lower(), sub)
            for sub in self.__class__.__subclasses__()
        ])
        super(StateMachineModel, self).__init__(*args, **kwargs)
        self._meta.get_field_by_name('status')[0]._choices = [(x, x) for x in self._states.keys()]
        self._set_state()
            
    def _set_state(self):
        if getattr(self, 'status', None) in self._states:
            self.__class__ = self._states[self.status]
    
    def __setattr__(self, attr, value):
        super(StateMachineModel, self).__setattr__(attr, value)
        if attr == 'status':
            # Switch class only after the new status has been stored.
            self._set_state()
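The same idea works outside Django. This plain-Python sketch (Ticket, OpenTicket and ClosedTicket are hypothetical names, and there is no database) computes the status-to-class mapping once per instance, like the full version above, and switches class only after the new status has been stored, so transitions keep working after the first change of class.

```python
class StateMachine(object):
    """Plain-Python stand-in for StateMachineModel (no Django involved)."""

    def __init__(self, status=None):
        base = self.__class__
        # {'open': OpenTicket, ...}: subclass name minus the base name,
        # computed once so transitions still work after the class changes.
        self._states = {
            sub.__name__.replace(base.__name__, "").lower(): sub
            for sub in base.__subclasses__()
        }
        self.status = status

    def __setattr__(self, attr, value):
        # Store the value first, then switch class to match the new status.
        super(StateMachine, self).__setattr__(attr, value)
        if attr == "status" and value in self._states:
            self.__class__ = self._states[value]


class Ticket(StateMachine):
    def can_edit(self):
        return False


class OpenTicket(Ticket):
    def can_edit(self):
        return True


class ClosedTicket(Ticket):
    pass


ticket = Ticket(status="open")
print(type(ticket).__name__, ticket.can_edit())  # OpenTicket True
ticket.status = "closed"
print(type(ticket).__name__, ticket.can_edit())  # ClosedTicket False
```

An unknown status is stored but leaves the class unchanged, matching the "if value in states" check in the snippets above.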