Developing RESTful Web APIs with Python, ...

This week’s Python Weekly has a link to a presentation by Nicola Iarocci, called Developing RESTful Web APIs with Python, Flask and MongoDB.

I have a few minor concerns with some aspects of the content.

No form validation

This concerns me. I’ve recently started using django’s forms for my validation layer for API calls, and also for the generation of the serialised output. It’s not completely flawless, but it seems to be working quite well. It certainly is more robust than my hand-rolled scheme of validating data, and using code that is better tested than my own is always a bonus.

Instead, as we see later, there is a data validation layer. It has basically the same goal as django’s forms, but is somewhat more nested, rather than using classes. Also, using classes makes it easier to have inheritance, a great way to have shared rules. You could do this using the same function in your custom validation, but this feels disconnected.

MongoDB
scalable, high-performance, …

The integrity of my data is important to me. It’s very rare that the db is the limiting factor in my system’s performance, and having stuff written to disk as soon as it is ‘real’ is kind-of critical.

http://api.example.com/v1/contacts/

Okay, this is where I jump on my high horse: “versioning should happen in the media-type”. Or even better, resources should be forwards and backwards compatible, and clients should be written to handle (or ignore) changes to schemata.

@mimerender( ... )

A decorator that has 5 arguments? That will be applied to every view function? Surely there’s a way to do this without having to decorate every function. Django CBV FTW here.

“Thu, 1 Mar 2012 10:00:49 UTC”

Egad. I can’t think of a reason to have machine readable dates in any format other than ISO 8601. Purely for the reason of being able to sort dates whilst they are still strings.
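A small sketch of the sorting argument, using only the standard library. The three timestamps and the `http_style` helper are made up for illustration; the point is only that ISO 8601 strings in the same timezone sort chronologically as plain strings, whilst the "Thu, 1 Mar 2012" style does not.

```python
from datetime import datetime, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def http_style(moment):
    """Format a datetime like the date string quoted from the slides."""
    return "%s, %d %s %d %02d:%02d:%02d UTC" % (
        DAYS[moment.weekday()], moment.day, MONTHS[moment.month - 1],
        moment.year, moment.hour, moment.minute, moment.second,
    )


# Three instants, deliberately listed out of chronological order.
moments = [
    datetime(2012, 3, 1, 10, 0, 49, tzinfo=timezone.utc),
    datetime(2011, 12, 25, 8, 30, 0, tzinfo=timezone.utc),
    datetime(2012, 1, 15, 23, 59, 59, tzinfo=timezone.utc),
]
in_order = sorted(moments)

# Sort the *strings*, then compare with the true chronological order.
iso_sorts_correctly = (
    sorted(m.isoformat() for m in moments)
    == [m.isoformat() for m in in_order]
)
http_sorts_correctly = (
    sorted(http_style(m) for m in moments)
    == [http_style(m) for m in in_order]
)
```

This only holds when every string uses the same timezone offset and precision, which is one more reason to pick a single format and stick to it.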

PATCH
Why not PUT?

Why not POST?

This is something that has been debated for ages. I think I kind-of agree with the author: PATCH is more explicitly a partial update. It does make me think about using some type of diff, but I guess using concurrency control covers the same ground.
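To make the partial update plus concurrency control idea concrete, here is a minimal sketch in plain Python. The `apply_patch` function, the `PreconditionFailed` exception and the etag values are all hypothetical names for illustration, not anything from the presentation; a real server would compare the `If-Match` request header against the stored etag.

```python
class PreconditionFailed(Exception):
    """The client's etag no longer matches the stored resource."""


def apply_patch(resource, changes, current_etag, if_match):
    """Return a copy of `resource` with only the supplied fields changed.

    `current_etag` is what the server has stored; `if_match` is the etag
    the client saw when it last fetched the resource.
    """
    if if_match != current_etag:
        # Someone else changed the resource in the meantime.
        raise PreconditionFailed("resource has changed since it was fetched")
    patched = dict(resource)   # leave the original untouched
    patched.update(changes)    # fields not mentioned are kept as they were
    return patched


contact = {"name": "Alan Tenari", "email": "alan.tenari@example.com"}
patched = apply_patch(
    contact, {"email": "alan@example.com"}, current_etag="v1", if_match="v1"
)
```

With a stale `if_match` the call raises instead of silently overwriting, which is the same ground a diff-based update would need to cover.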

"<link rel='parent' ... />"

Okay, HTML/XML inside a JSON object?

Why not have:

{
  "rel": "parent",
  "title": "...",
  "href": "..."
}

At least that way you’ll be able to parse the data out natively.

“updated”: “…”,
“etag”: “…”

I’m not sure if it is necessary/warranted/desired to have the etag as part of the representation. Especially if the etag is generated from the content: that would kind-of preclude it.

Personally, I generate etags from a higher resolution timestamp (possibly hashed with the object id, class or whatever). Whilst etags are opaque, having them as human readable helps with troubleshooting.

To me, this seems to be metadata, and should not be part of the object. I think you could argue a case that within Collection+JSON you could add this in, for convenience. It certainly would make it easier not to have to store the etag in a separate variable on the client, for one.
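Going back to generating etags from a timestamp, this is roughly what that could look like. It is a sketch under assumptions: the function names, the `contact` class label and the choice of SHA-1 are illustrative only, and any stable hash would do.

```python
import hashlib
from datetime import datetime, timezone


def readable_etag(obj_class, obj_id, updated):
    """Human-readable etag: class, id and a microsecond-resolution timestamp."""
    return "%s-%s-%s" % (obj_class, obj_id, updated.strftime("%Y%m%dT%H%M%S.%f"))


def hashed_etag(obj_class, obj_id, updated):
    """The same information, hashed so that clients treat it as opaque."""
    readable = readable_etag(obj_class, obj_id, updated)
    return hashlib.sha1(readable.encode("utf-8")).hexdigest()


updated = datetime(2012, 3, 1, 10, 0, 49, 123, tzinfo=timezone.utc)
etag = readable_etag("contact", 42, updated)
opaque = hashed_etag("contact", 42, updated)
```

The readable form is the one that helps with troubleshooting; the hashed form gives up that convenience but discourages clients from parsing it.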

The discussion about Concurrency Control is quite good. Which reminds me: I enjoyed most of this presentation. I have some minor nitpicks, but for some of those I understand the author’s choices. For others (the date format) I don’t. It’s certainly better than the REST API Design Rulebook, which is a load of junk.

KnockoutJS persistence using Simperium

I really like KnockoutJS. I’ve said that lots of times, but I mean it. It does one thing, two-way bindings between a data model and the GUI elements, really well.

Perhaps my biggest hesitation in using it in a big project is that there is no built-in persistence layer. This would appear to be a situation where something like Backbone has an advantage.

And then, last week, I came across Simperium.

“So,” I thought, “what if you were able to transparently persist KnockoutJS models using Simperium?”

// Assume we have a SIMPERIUM_APP_ID, and a logged in user's access_token.
var simperium = new Simperium(SIMPERIUM_APP_ID, {token: access_token});
// mappingOptions is a ko.mapping mappingOptions object: really only useful
// if your bucket contains homogenous objects.
var store = new BucketMapping(simperium.bucket(BUCKET_NAME), mappingOptions);

var tony = store.all()[0];

var alan = store.create({
  name: "Alan Tenari",
  date_of_birth: "1965-02-06",
  email: "alan.tenari@example.com"
});

Now, tony is an existing object we loaded up from the server, and alan is one we just created.

Both of these objects are mapped using ko.mapping, but (and this is the exciting bit) every time we make a change to any of their attributes, they are automatically persisted back to Simperium.

There is a little more to it than that: we may want to only persist valid objects, for instance.

This totally gets me excited. And, I’ve already written a big chunk of the code that actually does this!

But for that, you’ll just have to wait…

Metaclass magic registry pattern

The Registry Pattern is something I use relatively frequently. In django, for instance, we see it used for the admin interface, and I used very derivative code for my first API generation tool: django-rest-api. For our integration with external POS and other systems, we need to register importers, so that the automated stats fetching is able to look for units that need to fetch data from an external system’s website, or parse incoming email headers for matching delivered data.

I had been using something similar to:

from base import BaseStatsImporter, register

class FooStatsImporter(BaseStatsImporter):
    # ...

register(FooStatsImporter)

This is all well and good, but it is annoying. I need to remember to register each class after I declare it.

Then I discovered the magic of __metaclass__, used with __new__:

class RegistryMetaClass(type):
    def __new__(cls, clsname, bases, attrs):
        new_class = super(RegistryMetaClass, cls).__new__(cls, clsname, bases, attrs)
        register(new_class)
        return new_class
        
class BaseStatsImporter(object):
    __metaclass__ = RegistryMetaClass
    
    # ...

As long as your subclasses don’t override __metaclass__, then every new subclass will be added to the registry.

Obviously, this is magic, and in some cases the explicit way would be better.
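For reference, Python 3 ignores the `__metaclass__` attribute and takes the metaclass as a keyword in the class statement instead. The sketch below is a self-contained Python 3 version of the same pattern; the module-level `registry` list and the `register` function stand in for whatever the real `base` module provides, and skipping the base class itself is an extra assumption that the code above does not make.

```python
registry = []


def register(cls):
    """Stand-in for the real register() function."""
    registry.append(cls)


class RegistryMetaClass(type):
    def __new__(mcs, clsname, bases, attrs):
        new_class = super().__new__(mcs, clsname, bases, attrs)
        # Only register subclasses: the base class has no parent that was
        # itself created by this metaclass.
        if any(isinstance(base, RegistryMetaClass) for base in bases):
            register(new_class)
        return new_class


class BaseStatsImporter(metaclass=RegistryMetaClass):
    pass


class FooStatsImporter(BaseStatsImporter):
    pass


class BarStatsImporter(BaseStatsImporter):
    pass
```

After the module has been imported, `registry` holds `FooStatsImporter` and `BarStatsImporter`, in declaration order, without any explicit `register()` calls.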

The Organism Application

I had an email from a self-confessed django beginner, asking for some assistance. Here is my solution, as I worked through it.

The Application

The application is designed to allow tracking information related to identifying various organisms. An organism may have many identifying features, such as on a tree, the height, and the leaf morphology, or on a bird, the colour of the feathers, size of the egg and so on. To make it simpler for the users, it would be useful to classify organisms as belonging to a type, which can then be used to limit the available choices of identifying features: if an organism is a bird, then we only show those features that make sense for a bird.

To do all of this, we can have a class structure that looks somewhat like:

# models.py
from django.db import models

class OrganismType(models.Model):
    description = models.CharField(max_length=200)

class IdentificationField(models.Model):
    type = models.ForeignKey(OrganismType, related_name='id_fields')
    name = models.CharField(max_length=200)
    
    class Meta:
        unique_together = ('type', 'name')

class Organism(models.Model):
    common_name = models.CharField(max_length=200)
    latin_name = models.CharField(max_length=200, unique=True)
    type = models.ForeignKey(OrganismType, related_name='organisms')

class IdentificationDetail(models.Model):
    organism = models.ForeignKey(Organism, related_name="id_details")
    field = models.ForeignKey(IdentificationField)
    description = models.CharField(max_length=250)
    
    class Meta:
        unique_together = ('organism', 'field')

You’ll see I’ve also included a couple of unique_together constraints: I’ve assumed that each field for a given organism should only appear once.

Bending the admin to our will

Next, we can put all of this into the admin. This is really quite simple, but, as we will see, has its limits.

# admin.py
from django.contrib import admin

from models import OrganismType, Organism, IdentificationField, IdentificationDetail

class IdentificationFieldInline(admin.TabularInline):
    model = IdentificationField
    extra = 0

class OrganismTypeAdmin(admin.ModelAdmin):
    inlines = [IdentificationFieldInline]

class IdentificationDetailInline(admin.TabularInline):
    model = IdentificationDetail
    extra = 0

class OrganismAdmin(admin.ModelAdmin):
    inlines = [IdentificationDetailInline]    
    list_display = ('common_name', 'latin_name', 'type')
    list_filter = ('type',)

admin.site.register(OrganismType, OrganismTypeAdmin)
admin.site.register(Organism, OrganismAdmin)

I’ve removed the extra empty forms on the formsets: it looks much cleaner. I’ve also used a couple of the nice features of the admin (list_display and list_filter) to make the list of organisms easier to browse.

At this point, thanks to the magic of django, you now have an administrative interface. But, it doesn’t quite do what we want: that is, we haven’t limited which identification fields will be available in the organism’s inlines.

To do that, we need to fiddle with the formset.

# forms.py
from django import forms

from models import IdentificationField, OrganismType

class IdentificationDetailFormSet(forms.models.BaseInlineFormSet):
    def __init__(self, *args, **kwargs):
        super(IdentificationDetailFormSet, self).__init__(*args, **kwargs)
        for form in self.forms:
            self.update_choices(form)

    # We need to override the constructor (and the associated property) for the
    # empty form, so dynamic forms work.
    def _get_empty_form(self, **kwargs):
        form = super(IdentificationDetailFormSet, self)._get_empty_form(**kwargs)
        self.update_choices(form)
        return form
    empty_form = property(_get_empty_form)

    # This updates one form's 'field' field queryset, if there is an organism with type
    # associated with the formset. Otherwise, make the choice list empty.
    def update_choices(self, form):
        if self.data.get('type'):
            # A type was submitted with the form: use its fields.
            id_fields = OrganismType.objects.get(pk=self.data['type']).id_fields.all()
        elif self.instance.pk and self.instance.type:
            id_fields = self.instance.type.id_fields.all()
        else:
            # The choices are IdentificationField objects, so the empty
            # queryset must come from that model too.
            id_fields = IdentificationField.objects.none()

        form.fields['field'].queryset = id_fields

This process is something I’ve talked about before (and finding that post was what pointed the questioner in my direction), but I’ll discuss it again here, as this is perhaps a more concrete example.

We want to change the queryset available to a given field (in this case, confusingly called field), based on the value of a related object. In this case, we want to set the queryset of an identification detail’s field to all of the available identification fields on the related organism’s type. Whew!

As it turns out, it’s easier to see this in the code. Note also that if there is no selected organism type (as would be the case when an empty form is presented), no fields can be selected.

This alone would work: except that changing the organism’s type should change the available list of field types. There are two approaches that can be used: have all of the data available in the page somewhere, and use JavaScript to filter the available list of field types, or fetch the data dynamically from the server (again, using JavaScript) at the time the type is changed. If I were using something like KnockoutJS, then the former would be easier, and improve the responsiveness: the change would be immediate. Since I’m not using anything that doesn’t come with django, I’ll fetch the data on each change.

So, we are going to need some JavaScript. When we do the end-user page, it’s easy to see how to put that in, but we need to understand how to override django’s admin templates in order to inject it in this case.

The django documentation has some nice detail about how to do this: Overriding admin templates. In this case, we need to create a file within our app at templates/admin/organisms/organism/change_form.html. We want to just add data to the regular template, so we just inherit from it.

{% extends 'admin/change_form.html' %}

{% block after_related_objects %}
{{ block.super }}
<script>
django.jQuery(function($){
  $('#id_type').change(function(evt){
    $.ajax({
      url: "/admin/organisms/organismtype/" + this.value + '/fields/',
      type: 'get',
      success: function(data) {
        $('tr.form-row td.field-field select').html(data);
      }
    });
  });
});
</script>
{% endblock %}

The script here adds a change event handler to the organism type <select> element that hits the server and gets the list of fields for that type. It then sets the content of the field <select> in each inline identification detail row to the data the server returned. This clears whatever had been selected there previously, but that is probably the behaviour we want in this case. Note that I am hard-coding the URL for now: we’ll see a better way to handle that later.

Only one thing remains: to actually write the view that returns the desired content of the <select> element. For now, we will put this into the admin class of the organism type. Again, later we’ll move this to a proper separate view, but doing it this way shows how easy it is to extend the admin interface.

Back in our admin.py file, we want to change the OrganismTypeAdmin class:

# admin.py

from django.contrib import admin
from django.conf.urls import patterns, url
from django.http import HttpResponse

# [snip]

class OrganismTypeAdmin(admin.ModelAdmin):
    inlines = [IdentificationFieldInline]
    
    def get_urls(self, **kwargs):
        urls = super(OrganismTypeAdmin, self).get_urls(**kwargs)
        urls = patterns('', 
            url(r'^(.*)/fields/$', self.get_fields, name='organisms_organismtype_fields'),
        ) + urls
        return urls
    urls = property(get_urls)
    
    def get_fields(self, request, *args, **kwargs):
        data = "<option value>---------</option>"
        if args[0]:
            data += "".join([
                "<option value='%(id)s'>%(name)s</option>" % x 
                for x in OrganismType.objects.get(pk=args[0]).id_fields.values()
            ])
        return HttpResponse(data)

We can use the fact that the admin model object provides its own urls, and we can override the method that generates them. We need to put our fields view before the existing ones (and allow empty strings where we want the primary key), else it will be matched by another route.

Finally, we write the view itself. If there was no primary key provided, we return a “null” option, otherwise we include that and the actual list of choices.
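One thing the view above does not do is escape the field names before putting them into the markup. If a name could ever contain a character like < or &, something along these lines would be safer. This is a standalone sketch using the standard library: `render_options` is a hypothetical helper, and the rows are assumed to be dicts with `id` and `name` keys, as `.values()` returns. Inside a Django project, `django.utils.html.escape` would be the more natural choice.

```python
from html import escape


def render_options(fields):
    """Build the <option> markup for a sequence of {'id': ..., 'name': ...} rows."""
    options = ["<option value=''>---------</option>"]
    for field in fields:
        options.append(
            "<option value='%s'>%s</option>" % (
                escape(str(field["id"])),   # escape() also escapes quotes by default
                escape(field["name"]),
            )
        )
    return "".join(options)


markup = render_options([
    {"id": 1, "name": "Feather colour"},
    {"id": 2, "name": "Egg size <mm>"},
])
```

With no rows, the function returns just the "null" option, which matches what the view does when no primary key is supplied.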

Doing it for real

Of course, in a real environment, we probably don’t want to give access to the admin interface to anyone but trusted users. And even then, limit that to as few as possible. In this case, I would suggest that the admin users would be creating the OrganismType objects, but creating Organism objects would be done by regular users. Which means we really only have a couple of pages that need to be written for the outside world:

  • View a list of organisms.
    • Filter the list of organisms by OrganismType
    • Search for an organism by common name or latin name
    • Search for an organism by some other means (feather colour, etc)
  • Create a new organism
  • Edit an existing organism
  • Fetch a list of field types for a given organism type (the get_fields view above.)

This may come in a future post: I had forgotten about this and need some time to get back into it.

Hypermedia APIs in Django: Leveraging Class Based Views

It seems that I keep rewriting code that generates APIs from django. I think I’m getting closer to actually getting it right, though :)

I’m rather keen on Collection+JSON at the moment, and spent some time over Easter writing an almost complete Collection+JSON client, using KnockoutJS. It loads up a root API url, and then allows navigation around the API using links. While working on this, it occurred to me that Collection+JSON really encodes the same information as a web page:

  • every <link> or <a href=...></a> element is either in links or queries.
  • form-based queries map nicely to queries elements that have a data attribute.
  • items encapsulates the actual data that should be presented.
  • template contains data that can be used to render a form for creating/updating an object.

Ideally, what feels best from my perspective is to have a pure HTML representation of the API, which can be rendered by browsers with JS disabled, and then all of the same urls could also be fetched as Collection+JSON. Then, you are sharing the code, right up to the point where the output is generated.

To handle this, I’ve come up with a protocol for developing django Class Based Views that can be represented as Collection+JSON or plain old HTML. Basically, your view needs to be able to provide links, queries and items. The template comes from a form object (called form), and by default items is the queryset attribute.

leveraging views

I subscribe to the idea that the less code that is written the better, and I believe that the API wrapper should (a) have minimal code itself, and (b) allow the end developer to write as little code as possible. Django is a great framework, and we should leverage as much as possible of that well-written (and well-tested) package.

The part of a hypermedia API that is sometimes ignored by web developers is handling the media type selection. I believe this is the domain of the “Accept” and “Content-Type” headers, not anything to do with the URL. Thus, I have a mixin that allows for selecting the output format based on the Accept header. It uses the inbuilt render_to_response method that a django View class has, and handles choosing how to render the response. As it should.
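As a rough illustration of selecting the output format from the Accept header, here is a simplified, framework-free sketch. The `parse_accept` and `choose_media_type` names are hypothetical and are not django-hypermedia’s actual API. It also cuts corners: wildcards other than `*/*` are ignored, `q=0` is not treated as "not acceptable", and an unsatisfiable header falls back to a default rather than producing a 406.

```python
def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    entries = []
    for index, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0  # the default when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by position in the header.
        entries.append((-quality, index, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def choose_media_type(header, available, default="text/html"):
    """Pick the first acceptable type we can render, else the default."""
    for wanted in parse_accept(header):
        if wanted in available:
            return wanted
        if wanted == "*/*":
            return default
    return default


AVAILABLE = ("text/html", "application/vnd.collection+json")
chosen = choose_media_type(
    "application/vnd.collection+json, text/html;q=0.8", AVAILABLE
)
```

A browser’s usual `text/html,application/xhtml+xml,*/*;q=0.8` header ends up with `text/html`, so the same URL serves both kinds of client.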

The other trick is how to get the links, queries, items and template into the context. For this, we can use get_context_data. We can call self.get_FOO(**kwargs) for FOO in each of those items. It is then up to the View class to handle those methods.
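The get_FOO idea can be sketched without django at all. In the following, `ContextBase` stands in for a django view class that already has a `get_context_data` method, and `ResourceContextMixin` and `TaskList` are hypothetical names. One assumption that differs slightly from the description above: a missing `get_FOO` method is simply skipped rather than being required.

```python
class ContextBase:
    """Stand-in for a django view: returns its kwargs as the context."""

    def get_context_data(self, **kwargs):
        return dict(kwargs)


class ResourceContextMixin:
    resource_parts = ("links", "queries", "items", "template")

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        for part in self.resource_parts:
            # Call self.get_links(), self.get_queries(), ... if defined.
            getter = getattr(self, "get_%s" % part, None)
            if getter is not None:
                context[part] = getter(**kwargs)
        return context


class TaskList(ResourceContextMixin, ContextBase):
    def get_links(self, **kwargs):
        return [{"rel": "root", "href": "/", "prompt": "Home"}]

    def get_items(self, **kwargs):
        return ["write post", "go running"]


context = TaskList().get_context_data(page=1)
```

Note that the mixin comes before the base class in the list of parents, so that its `get_context_data` runs first and can call up the chain with `super()`.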

By default, a Model-based Resource is likely to have a form class, and a model class or a queryset. These can be used to get the items, and in the case of the form, the template. Even in the instance of the queryset (or model), we use the form class to turn the objects into something that can be rendered.

Finally, so it’s super-easy to use the same pattern as with django’s Views (generic.CreateView, for instance), I have a couple of classes: ListResource and DetailResource, which map directly onto CreateView and UpdateView. In the simplest case, you can just use:

urlpatterns = patterns('',
    url(r'^foo/$', ListResource.as_view(model=Foo)),
    url(r'^foo/(?P<pk>\d+)/$', DetailResource.as_view(model=Foo))
)

There is also a Resource, which just combines the resource-level bits with generic.TemplateView. You can use ResourceMixin with any other class-based-view, but make sure it appears earlier than the django view class, to make sure we get the correct method resolution order.

There is still the matter of the links attribute. Knowing what to put into this can be a bit tricky. I’ve come to realise that this should contain a list of the valid states that can be accessed when you are in a given state. You will want to use django’s reverse function to populate the href attribute:

class Root(Resource):
    template_name = 'base.html'
    
    def get_links(self):
        return [
            {"rel": "root", "href": reverse('root'), "prompt": "Home"},
            {"rel": "user", "href": reverse('user'), "prompt": "You"},
            {"rel": "links", "href": reverse('tasks:list'), "prompt": "Task List"},
        ]

Note that you actually need to provide the view names (and namespaces, if appropriate) to reverse. Similarly, for any queries, you would want to use reverse, to make it easier to change the URL later. Also, django will complain if you have not installed something you reference, meaning your links and queries should never 404.

I’m still toying with the idea of having an automatic list of links that should be used for every view. Obviously, this should only contain states that can be moved to from any state within the system.

For rendering HTML, you may need to change your templates: actually, you should change your templates. Instead of using:

<a href="{% url 'foo'  %}">Foo Link</a>

You would reference the item in your links array:

<a href="{{ links.foo.href }}">{{ links.foo.prompt }}</a>

I have used a little bit of magic here too: in order to be able to access links items according to their rel attribute, when rendering HTML, we use a sub-class of list that allows for __getattr__ to look through the items and find the first one that matches by rel type.
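The list sub-class is only a few lines. This is a sketch of the idea rather than the actual django-hypermedia code: `LinkList` is a hypothetical name, and the links are assumed to be plain dicts with a `rel` key.

```python
class LinkList(list):
    """A list of link dicts that can also be looked up by rel as an attribute."""

    def __getattr__(self, rel):
        # __getattr__ is only called when normal attribute lookup fails,
        # so list methods such as append() are unaffected.
        for link in self:
            if link.get("rel") == rel:
                return link  # the first match wins
        raise AttributeError(rel)


links = LinkList([
    {"rel": "root", "href": "/", "prompt": "Home"},
    {"rel": "foo", "href": "/foo/", "prompt": "Foo Link"},
    {"rel": "foo", "href": "/other-foo/", "prompt": "Another Foo"},
])
foo = links.foo
```

A rel that happens to share a name with a list method (say, `index` or `copy`) would not be reachable this way, since the real attribute is found first.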

enter django-hypermedia

As you may surmise from the above text: I’ve already written a big chunk of this. It’s not complete (see below), but you can see where it is at now: django-hypermedia.

There is a demo/test project included, that has some functionality. It shows how you still need to do things “the django way”, and then you get the nice hypermedia stuff automatically.

what is still to come?

I’ve never really been happy with Collection+JSON’s error object, so I haven’t started handling that yet. I want to be able to reference where the error lies, similar to how django’s forms can display their own errors.

I want to flesh out the demo/test project. It has some nice bits already, but I want to have it so that it also uses my nice KnockoutJS client. Pretty helps. :)

Belair Hill Climb

Since I bought my Garmin Forerunner 405cx in November 2010, I’ve gradually gotten more and more into running. Having a computer that tracks when, where, and how fast I run appeals to me, and has probably been the biggest motivator to me running as much as I do now.

I was tracking my running using Garmin Connect, but recently, thanks to my good friend Travis, got hooked on Strava. The key feature for me is the segments, and how competitive they enable me to be. Mostly against myself (my hard running occurs on the same track each week, which no-one else using Strava seems to have discovered).

Technically, I’m in training for this year’s City to Bay, although that is a long way away, so I’m spending the time working on getting faster over that type of distance. Last year I finished in 1490th place, with a time of 00:54:43. My target time had been 00:54:00, so I was a little disappointed to miss that by such a short margin. I did speed up a little too early (2km out), which nearly killed me, and I needed to back off. Also, I was absolutely exhausted by the end.

In fact, if I look at my performance, at the 47 minute mark I sped up, increasing my pace from 4:30 to 4:00 min/km, and promptly had to slow to a walk. There may have been some dry retching going on there too.

So, I set a target time for this year’s race of 00:48:00, with the possibility of reducing that to 00:44:00 if I could get to my target time 12 weeks before the race. Now, it does occur to me that my target pace is the one that forced me to stop early last year, but I think I’m already in much better shape now than I was then.

To improve my speed, I determined that first I needed to reduce my heart rate. In last year’s City to Bay, my heart rate was basically in the Threshold zone (Z4) for the entire race. It did get a little higher when I sped up, but otherwise was fairly constant, which means I probably did run that race as fast as I could have then.

So, my training plan of late has been to run a lot more with my HR in zone 3, and see if I can get my speed up. So far, it seems to be working. I’ve been doing lots of 30 and 60 minute easy runs, where my Forerunner will beep at me if I get my HR above Z3. I’m finding that lately I’ve been running at around the same pace, but find my HR sometimes dips below Z3, so perhaps I can speed up a little.

Tonight, I ran faster up the Belair Hill Climb than I ever had before. Not only that, but each Strava segment was faster, too. More importantly, my Strava “Suffer Score” was only 44, as compared to a 62 on my previous PB up the hill. When done, I was all primed to then run some more hill climbs (6x600m uphill), but it started to rain, and it was time for dinner.

Perhaps the only thing that’s missing from Strava for me is the ability to track my weight: Garmin Connect has its “Health” tab, which enables you to enter a weight manually, or accepts data from a supported scale. This information is useful to me, as I can see that my weight increased significantly over the leadup to NTL, where I was training much harder, but obviously bulking up a bit too. Lots more speed and strength work there: I do recall having pants that no longer fit my thighs. I’m now down to 78.1kg, after a high of 84.6, and I’d love to be able to import this data into Strava too.

Something that even Garmin Connect doesn’t do, and which I need to keep Garmin Training Center around for is the advanced workouts. Oh, you can enter them into Garmin Connect, but the interface is slow and clunky, and I was never able to get more than one to upload to my watch at a time. Not that useful when I was in a more free-form mode, and would pick workouts based on how I felt. Now, I have a pre-programmed schedule for the next 20-odd weeks, all stored in there. I think I’ll look at a web-app that improves on the process though, as GTC is a bit rubbish.

Oh, and I have my eye on the Garmin Forerunner 610. Not sure when I will get around to upgrading. The 405cx still works really well for my needs, but there are a few nice new features in the 610.

Collection+JSON Primer (and comments)

Collection+JSON, created by Mike Amundsen, is a standard way of creating hypermedia APIs. There were a few things I didn’t pick up correctly reading through his great book, or the spec.

First, let us look at a partial example document.

{
  collection: {
    version: "1.0",
    href: "http://api.example.com/",
    links: [],
    items: [],
    queries: [],
    template: {},
    error: {}
  }
}

I’m not so keen on having the version number in the document itself, as this refers to the version of Collection+JSON, rather than the version of the document. In my mind, the version of Collection+JSON should be contained within the media-type (Content-Type: application/vnd.collection+json;version=1.0), just as the version of the document is contained within the Etag header (Etag: 026e10f644ba4b06). Anyway, I’ll let that slide for now.

Secondly, having the href of the collection seems a little superfluous. I’m assuming there will always be an entry in links that has a rel=self, which should give you the same value. Again, not a big issue.

What I was a little unclear on was the difference between links and queries. We can have a look at a couple of examples:

links: [
  {href: "http://api.example.com", rel: "self", prompt: "Home", name: "home", render: "link"},
  {href: "http://api.example.com", rel: "users", prompt: "Users", name: "users", render: "link"}
],
queries: [
  {href: "http://api.example.com", rel: "search", prompt: "Enter search string", data: [
    {name: "search", value: ""}
  ]}
]

The difference between links and queries to me seems somewhat artificial. Sure, in this case, my query has data fields, but it seems that this is not always necessary. The example Mike uses in his book:

queries: [
  {href: "...", rel: "all", prompt: "All tasks"},
  {href: "...", rel: "open", prompt: "Open tasks"},
  {href: "...", rel: "closed", prompt: "Closed tasks"},
  {href: "...", rel: "date-due", prompt: "Date Due", data: [
    {name: "dateStart", value: "", prompt: "Start Date"},
    {name: "dateStop", value: "", prompt: "Stop Date"}
  ]},
]

I’m not quite sure when I should be using a link, and when I should be using a query. In this example, it looks like a query is a filter on the collection: maybe that is the difference?

The other sticking point I have is that both queries and links are GET requests: the data attribute is simply the query string applied to the URL. Before I continue, we need to look at the items and template attributes of the collection object. In this case, we have a single-object collection, including a write template for it.

items: [
  {
    href: "...",
    data: [
      {name: "first_name", value: "Matthew", prompt: "First name"},
      {name: "last_name", value: "Schinckel", prompt: "Last name"},
      {name: "email", value: "matt@schinckel.net", prompt: "Email address"},
      {name: "gender", value: "male", prompt: "Gender"}
    ]
  }
],
template: {
  data: [
    {name: "first_name", value: "Matthew", prompt: "First name"},
    {name: "last_name", value: "Schinckel", prompt: "Last name"},
    {name: "email", value: "matt@schinckel.net", prompt: "Email address", regexp: "^[^@]+@[^@]+\.[^@]+"},
    {name: "gender", value: "male", prompt: "Gender", options: [
      {value: "male", text: "Male"}, 
      {value: "female", text: "Female"}
    ]}
  ]
}

Again, we see duplicate information. In this case, the template is populated: if it were a ‘proper’ collection rather than a single object, the template would be used for creating new objects in the collection, so I’m prepared to let this one go. You’ll also notice that I’m using a couple of undocumented features: regexp and options. These enable us to either present a list of choices to the user, or have client-side validation based on a regular expression.

To update an object, we can use a PUT (or POST) to the object’s href, and we send the name/value parts of the updated template data:

{
  template: {
    data: [
      {name: "first_name", value: "Matthew"},
      {name: "last_name", value: "Schinckel"},
      {name: "gender", value: "male"},
      {name: "email", value: "matt@schinckel.net"}
    ]
  }
}

To create a new object, we send the same type of data in a POST request to a collection’s href. To delete an object, we can send a DELETE request to the object’s href.
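Building that request body from a plain mapping is straightforward. The following is a minimal sketch with the standard library; `build_template` is a hypothetical helper name, and the body would then be sent with whichever HTTP client you prefer, using the `application/vnd.collection+json` content type.

```python
import json


def build_template(values):
    """Wrap a mapping of field names to values in a Collection+JSON write template."""
    data = [{"name": name, "value": value} for name, value in values.items()]
    return {"template": {"data": data}}


body = json.dumps(build_template({
    "first_name": "Matthew",
    "last_name": "Schinckel",
    "gender": "male",
    "email": "matt@schinckel.net",
}))
```

The same body works for both cases described above: PUT it to an item’s href to update, or POST it to the collection’s href to create.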

Finally, we come to the error property. I wrote last night how I think this is a little limiting: Collection+JSON error objects. Anyway, an error looks like:

error: {
  title: "Error saving your details",
  code: "409",
  message: "Your date of birth is invalid (19977-11-30 is not a valid date)"
}

After writing most of this, I did come across Collection+JSON - Examples, but I may have described it in a slightly different manner. It still doesn’t elaborate on the difference between links and queries, however.

Collection+JSON error objects

I’m still keen on the idea of implementing a rich hypermedia API based on django’s forms.

One of the nicest things about the django forms is that they handle the validation of incoming data. Each form field has a .clean() method, which will clean the data. On a form, it is then possible to have extra methods, .clean_FIELDNAME(), which will process the incoming data again, meaning you don’t need to subclass a field to add simple cleaning functionality. Finally, the form itself has a .clean() method, that can be used to clean composite data, say, ensuring that start is before finish.

The form validation code will create an errors property on the form, that will contain the fields that have errors, and any non-field errors (such as the last example above). When rendering an HTML page, and displaying a form that has errors, these are marked up with CSS classes that enable you to show which fields have invalid or missing data, and also display relatively friendly messages (which you can customise).

But Collection+JSON has a fairly simple error property on the collection object:

{
  "collection": {
    "error": {
      "title": "Error saving your details",
      "code": "409",
      "message": "Your date of birth is invalid (19777-11-30)"
    }
  }
}

Compare this to the format I have been using for JSON responses:

{
  "message": "Error saving your details",
  "detail": {
    "date_of_birth": "The value '19777-11-30' is not a valid date."
  }
}

Programmatically, this allows me to attach the error messages to where they belong: the message value is shown in the main messages area of the client, the detail values for each field are attached to the fields for which they apply.
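To make the difference concrete, here is a minimal sketch that takes a plain dict shaped like a django form's errors (field name mapped to a list of messages) and produces both formats. The function names are hypothetical, and flattening everything into one string is just one way of squeezing field errors into the Collection+JSON error object.

```python
def to_detail_response(errors, message):
    """Keep each field's errors attached to the field they apply to."""
    return {
        "message": message,
        "detail": {field: " ".join(msgs) for field, msgs in errors.items()},
    }


def to_collection_error(errors, title, code):
    """Flatten every field error into the single Collection+JSON message."""
    flat = "; ".join(
        f"{field}: {' '.join(msgs)}" for field, msgs in errors.items()
    )
    return {"collection": {"error": {"title": title, "code": code, "message": flat}}}


# Stand-in for form.errors; a real form would supply this after validation.
errors = {"date_of_birth": ["The value '19777-11-30' is not a valid date."]}

detailed = to_detail_response(errors, "Error saving your details")
flattened = to_collection_error(errors, "Error saving your details", "409")
```

With the first format a client can look up detail["date_of_birth"] directly; with the second it would have to parse the message string to find out which field was at fault.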

Django and Collection+JSON

Recently, I have been reading (and re-reading) Building Hypermedia APIs with HTML5 and Node. There’s lots to like about this book, especially after reading (and mostly discarding) REST API Design Rulebook.

There is one thing that bugs me, and that is the way that templates are used to generate the JSON. As I said to Mike Amundsen:

His response was that he sometimes used JSON.stringify, at other times templates. But it got me thinking. I have written lots of code that serialises Django models, or more recently forms into JSON and other formats. Getting a nice Collection+JSON representation actually maps quite nicely onto these django constructs, as we often have the metadata that is required for the fields.

Consider the following (simple) django model:

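from django.db import models
from django.urls import reverse  # django.core.urlresolvers on Django < 1.10


class Article(models.Model):
    title = models.CharField('Title of Article', max_length=128)
    content = models.TextField('Content of Article')
    author = models.ForeignKey('auth.User', verbose_name='Author of Article',
                               on_delete=models.CASCADE)

    def get_absolute_url(self):
        # reverse() returns the URL for the named route directly.
        return reverse('article_detail', kwargs={'pk': self.pk})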

I don’t normally supply verbose_names, but I have in this case. We’ll see why in a minute.

Now, what I would declare to be the obvious JSON representation of this is something like:

{
  "title": "Title goes here",
  "content": "Content goes here",
  "author": 1,
  "href": "…"
}

But, I’m quite interested in Collection+JSON. We might see something like:

{
  "collection": {
    "version": "1.0",
    "href": "…",
    "links": [
      {"href":"…", "rel":"…", "prompt":"…", "name":"…", "render":"string"}
    ],
    "items": [
      {
        "href": "…",
        "data": [
          {"name":"title", "value":"Title goes here", "prompt":"Title of Article"},
          {"name":"content", "value":"Content goes here", "prompt":"Content of Article"},
          {"name":"author", "value":"1", "prompt":"Author of Article"}
        ],
        "links": []
      }
    ]
  }
}
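The outer envelope is mechanical enough to sketch with plain dicts. The helper name build_collection and the example hrefs below are placeholders, not anything from the spec or from my own code.

```python
import json


def build_collection(href, items, links=None):
    """Wrap already-built item dicts in the Collection+JSON envelope."""
    return {
        "collection": {
            "version": "1.0",
            "href": href,
            "links": links or [],
            "items": list(items),
        }
    }


# A single item in the shape shown above; the hrefs are made-up placeholders.
item = {
    "href": "/articles/1/",
    "data": [
        {"name": "title", "value": "Title goes here", "prompt": "Title of Article"},
        {"name": "content", "value": "Content goes here", "prompt": "Content of Article"},
        {"name": "author", "value": "1", "prompt": "Author of Article"},
    ],
    "links": [],
}

document = build_collection("/articles/", [item])
serialised = json.dumps(document, indent=2)
```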

From a django ModelForm, we should be able to easily generate each member of items:

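def form_to_item(form):
    # Forms may optionally carry a `links` attribute.
    links = getattr(form, 'links', [])
    return {
        "data": [
            {"name": f.name, "prompt": f.label, "value": f.value()} for f in form
        ],
        # Where the href should come from is still an open question.
        "href": None,
        "links": links
    }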

The only bit that we are missing out of the form/field data is data type, or more specifically in this case, the available choices that are permitted for the author field. Now, this is missing from the Collection+JSON spec, so I’m not sure how to handle that.

I think this is actually an important problem: if we have a discoverable/hypermedia API, how do we indicate to the client which values are valid for a given field?

For those not familiar with django: the verbose_name on a model field is used for the default label on a form field. If you were not using a model, you could just supply a label in the form class.

The other part that is a little hard to think about now is the remaining attributes: href and links. Now, these may actually coalesce into one, as links.self.href should give us href. Perhaps we have to look on the form object for a links property. But, in django, it’s not really the domain of the form to contain information about that. For now, I’m going to have a links property on my forms, but that feels dirty too.
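If href really can be derived from links, the lookup might be as small as this. It is only a sketch: it assumes links is a list of dicts and that one of them has a rel of "self", which nothing in the spec guarantees, and the helper name is made up.

```python
def href_from_links(links, rel="self"):
    """Return the href of the first link with the given rel, or None."""
    for link in links:
        if link.get("rel") == rel:
            return link.get("href")
    return None


# Placeholder links, standing in for whatever a form's links property holds.
links = [
    {"rel": "author", "href": "/users/1/"},
    {"rel": "self", "href": "/articles/1/"},
]

href = href_from_links(links)
missing = href_from_links([])
```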

Steve Jobs, Enid Blyton and my mother

We watched the movie Enid the other week. I read lots of Enid Blyton books as a child, and really enjoyed them. This movie really pushed home how, whilst she had an amazing impact on, and connection with, millions of children, she really didn’t have a very good connection with her own. My tip is, if you don’t know much about her life, but enjoyed her work, don’t watch the movie. Whilst it was excellent, it really sours the memory of her books.

Similarly, I read Walter Isaacson’s biography Steve Jobs recently. Unlike what seems like everyone else, I actually quite enjoyed it. Sure, there may have been some factual errors, and maybe it could have been a much better book, but I felt it did give me a lot of insight into the man that I never had up until that point. I would like to have known more about the NeXT years, but it still contained a lot of what was to me new information.

Interestingly, my mind drew a lot of parallels between these two people, lots of them coming after the fact as I finally got around to listening to all of the 5by5 podcasts discussing the book. The main similarity for me was that these two people had huge impacts on lots of people, but failed to connect effectively with their own children.

Which brings me to my mother. It was her 60th birthday on the weekend, and I gave a short, crappy speech. What I really wanted to say only crystallised in my mind after a couple of other people had spoken, and I had some time to think about it.

My mum worked for many years running the child day care centre in Naracoorte, and whilst she didn’t touch quite as many children’s lives as Jobs and Blyton, the number of children she had a significant impact on was by no means small.

The difference was, she still managed to have a great connection with her children.