Developing RESTful Web APIs with Python, ...

This week’s Python Weekly has a link to a presentation by Nicola Iarocci, called Developing RESTful Web APIs with Python, Flask and MongoDB.

I have a few minor concerns with some aspects of the content.

No form validation

This concerns me. I’ve recently started using django’s forms for my validation layer for API calls, and also for the generation of the serialised output. It’s not completely flawless, but it seems to be working quite well. It certainly is more robust than my hand-rolled scheme of validating data, and using code that is better tested than my own is always a bonus.

Instead, as we see later, there is a data validation layer. It has basically the same goal as django’s forms, but is somewhat more nested, rather than using classes. Also, using classes makes it easier to have inheritance, a great way to have shared rules. You could do this using the same function in your custom validation, but this feels disconnected.

MongoDB
scalable, high-performance, …

The integrity of my data is important to me. It’s very rare that the db is the limiting factor in my system’s performance, and having stuff written to disk as soon as it is ‘real’ is kind-of critical.

http://api.example.com/v1/contacts/

Okay, this is where I jump on my high horse: “versioning should happen in the media-type”. Or even better, resources should be forwards and backwards compatible, and clients should be written to handle (or ignore) changes to schemata.

@mimerender( ... )

A decorator that has 5 arguments? That will be applied to every view function? Surely there’s a way to do this without having to decorate every function. Django CBV FTW here.

“Thu, 1 Mar 2012 10:00:49 UTC”

Egad. I can’t think of a reason to have machine readable dates in any format other than ISO 8601. Purely for the reason of being able to sort dates whilst they are still strings.

PATCH
Why not PUT?

Why not POST?

This is something that has been debated for ages. I think I kind-of agree with the author: PATCH is more explicitly a partial update. It does make me think about using some type of diff, but I guess using concurrency control covers the same ground.

"<link rel='parent' ... />"

Okay, HTML/XML inside a JSON object?

Why not have:

{
  "rel": "parent",
  "title": "...",
  "href": "..."
}

At least that way you’ll be able to parse the data out natively.

“updated”: “…”,
“etag”: “…”

I’m not sure if it is necessary/warranted/desired to have the etag as part of the representation. Especially if the etag is generated from the content: that would kind-of preclude it.

Personally, I generate etags from a higher resolution timestamp (possibly hashed with the object id, class or whatever). Whilst etags are opaque, having them as human readable helps with troubleshooting.

To me, this seems to be metadata, and should not be part of the object. I think you could argue a case that within Collection+JSON you could add this in, for convenience. It certainly would make it easier not to have to store the etag in a seperate variable on the client, for one.

The discussion about Concurrency Control is quite good. Which reminds me: I enjoyed most of this presentation. I have some minor nitpicks, but some of those I understand the author’s choices. Some I don’t (date format). It’s certainly better than the REST API Design Rulebook, which is a load of junk.

Hypermedia APIs in Django: Leveraging Class Based Views

It seems that I keep rewriting code that generates APIs from django. I think I’m getting closer to actually getting it right, though :)

I’m rather keen on Collection+JSON at the moment, and spent some time over easter writing an almost complete Collection+JSON client, using KnockoutJS. It loads up a root API url, and then allows navigation around the API using links. While working on this, it occurred to me that Collection+JSON really encodes the same information as a web page:

  • every <link> or <a href=...></a> element is either in links or queries.
  • form-based queries map nicely to queries elements that have a data attribute.
  • items encapsulates the actual data that should be presented.
  • template contains data that can be used to render a form for creating/updating an object.

Ideally, what feels best from my perspective is to have a pure HTML representation of the API, which can be rendered by browsers with JS disabled, and then all of the same urls could also be fetched as Collection+JSON. Then, you are sharing the code, right up to the point where the output is generated.

To handle this, I’ve come up with a protocol for developing django Class Based Views that can be represented as Collection+JSON or plain old HTML. Basically, your view needs to be able to provide links, queries, items. template comes from a form object (called form), and by default items is the queryset attribute.

leveraging views

I subscribe the idea that the less code that is written the better, and I believe that the API wrapper should (a) have minimal code itself, and (b) allow the end developer to write as little code as possible. Django is a great framework, we should leverage as much as is possible of that well written (and well tested) package.

The part of a hypermedia API that is sometimes ignored by web developers is handling the media type selection. I believe this is the domain of the “Accept” and “Content-Type” headers, not anything to do with the URL. Thus, I have a mixin that allows for selecting the output format based on the Accept header. It uses the inbuilt render_to_response method that a django View class has, and handles choosing how to render the response. As it should.

The other trick is how to get the links, queries, items and template into the context. For this, we can use get_context_data. We can call self.get_FOO(**kwargs) for FOO in each of those items. It is then up to the View class to handle those methods.

By default, a Model-based Resource is likely to have a form class, and a model class or a queryset. These can be used to get the items, and in the case of the form, the template. Even in the instance of the queryset (or model), we use the form class to turn the objects into something that can be rendered.

Finally, so it’s super-easy to use the same pattern as with django’s Views (generic.CreateView, for instance), I have a couple of classes: ListResource and DetailResource, which map directly onto CreateView and UpdateView. In the simplest case, you can just use:

urlpatterns = patterns('',
    url(r'^foo/$', ListResource.as_view(model=Foo)),
    url(r'^foo/(<?P<pk>\d+)/$', DetailResource.as_view(model=Foo))
)

There is also a Resource, which just combines the resource-level bits with generic.TemplateView. You can use ResourceMixin with any other class-based-view, but make sure it appears earlier than the django view class, to make sure we get the correct method resolution order.

There is still the matter of the links attribute. Knowing what to put into this can be a bit tricky. I’ve come to realise that this should contain a list of the valid states that can be accessed when you are in a given state. You will want to use django’s reverse function to populate the href attribute:

class Root(Resource):
    template_name = 'base.html'
    
    def get_links(self):
        return [
            {"rel": "root", "href": reverse('root'), "prompt": "Home"},
            {"rel": "user", "href": reverse('user'), "prompt": "You"},
            {"rel": "links", "href": reverse('tasks:list'), "prompt": "Task List"},
        ]

Note that you actually need to provide the view names (and namespaces, if appropriate) to reverse. Similarly, for any queries, you would want to use reverse, to make it easier to change the URL later. Also, django will complain if you have not installed something you reference, meaning your links and queries should never 404.

I’m still toying with the feature of having an automatic list of links that should be used for every view. Obviously, this should only contain a list of states that can be moved to from any state within the system.

For rendering HTML, you may need to change your templates: actually, you should change your templates. Instead of using:

<a href="{% url 'foo'  %}">Foo Link</a>

You would reference that items in your links array:

<a href="{{ links.foo.href }}">{{ links.foo.prompt }}</a>

I have used a little bit of magic here too: in order to be able to access links items according to their rel attribute, when rendering HTML, we use a sub-class of list that allows for __getattr__ to look through the items and find the first one that matches by rel type.

enter django-hypermedia

As you may surmise from the above text: I’ve already written a big chunk of this. It’s not complete (see below), but you can see where it is at now: django-hypermedia.

There is a demo/test project included, that has some functionality. It shows how you still need to do things “the django way”, and then you get the nice hypermedia stuff automatically.

what is still to come?

I’ve never really been happy with Collection+JSON’s error object, so I haven’t started handling that yet. I want to be able to reference where the error lies, similar to how django’s forms can display their own errors.

I want to flesh out the demo/test project. It has some nice bits already, but I want to have it so that it also uses my nice KnockoutJS client. Pretty helps. :)

Django and Collection+JSON

Recently, I have been reading (and re-reading) Building Hypermedia APIs with HTML5 and Node. There’s lots to like about this book, especially after reading (and mostly discarding) REST API Design Rulebook.

There is one thing that bugs me, and that is the way that templates are used to generate the JSON. As I said to Mike Amundsen:

His response was that he sometimes used JSON.stringify, at other times templates. But it got me thinking. I have written lots of code that serialises Django models, or more recently forms into JSON and other formats. Getting a nice Collection+JSON representation actually maps quite nicely onto these django constructs, as we often have the metadata that is required for the fields.

Consider the following (simple) django model:

class Article(models.Model):
    title = models.CharField('Title of Article', max_length=128)
    content = models.TextField('Content of Article')
    author = models.ForeignKey('auth.User', verbose_name='Author of Article')
    
    @permalink
    def get_absolute_url(self):
        return reverse('article_detail', kwargs={'pk', self.pk})

I don’t normally supply verbose_names, but I have in this case. We’ll see why in a minute.

Now, what I would declare is the obvious JSON representation of this is something like:

{
  "title": "Title goes here",
  "content": "Content goes here",
  "author": 1,
  "href": "…"
}

But, I’m quite interested in Collection+JSON. We might see something like:

{
  "collection": {
    "version": "1.0",
    "href": "…",
    "links": [
      {"href":"…", "rel":"…", "prompt":"…", "name":"…", "render":"string"}
    ],
    "items": [
      {
        "href": "…",
        "data": [
          {"name":"title", "value":"Title goes here", "prompt":"Title of Article"},
          {"name":"content", "value":"Content goes here", "prompt":"Content of Article"},
          {"name":"author", "value":"1", "prompt":"Author of Article"},
        ],
        "links": []
      }
    ]
  }
}

From a django ModelForm, we should be able to easily generate each member of items:

links = getattr(form, 'links', [])
return {
    "data": [
        {"name":f.name, "prompt":f.label, "value":f.value()} for f in form
    ],
    "href": ,
    "links": links
}

The only bit that we are missing out of the form/field data is data type, or more specifically in this case, the available choices that are permitted for the author field. Now, this is missing from the Collection+JSON spec, so I’m not sure how to handle that.

I think this is actually an important problem: if we have a discoverable/hypermedia API, how do we indicate to the client what are valid values that can be entered for a given field?

For those not familiar with django: the verbose_name on a model field is used for the default label on a form field. If you were not using a model, you could just supply a label in the form class.

The other part that is a little hard to think about now are the other attributes: href, and links. Now, these may actually coalesce into one, as links.self.href should give us href. Perhaps we have to look on the form object for a links property. But, in django, it’s not really the domain of the form to contain information about that. For now, I’m going to have a links property on my forms, but that feels dirty too.

Collection+JSON error objects

I’m still keen on the idea of implementing a rich hypermedia API based on django’s forms.

One of the nicest things about the django forms is that they handle the validation of incoming data. Each form field has a .clean() method, which will clean the data. On a form, it is then possible to have extra methods, .clean_FIELDNAME(), which will process the incoming data again, meaning you don’t need to subclass a field to add simple cleaning functionality. Finally, the form itself has a .clean() method, that can be used to clean composite data, say, ensuring that start is before finish.

The form validation code will create an errors property on the form, that will contain the fields that have errors, and any non-field errors (such as the last example above). When rendering an HTML page, and displaying a form that has errors, these are marked up with CSS classes that enable you to show which fields have invalid or missing data, and also display relatively friendly messages (which you can customise).

But Collection+JSON has a fairly simple error property on the collection object:

{
  "collection": {
    "error": {
      "title": "Error saving your details",
      "code": "409",
      "message": "Your date of birth is invalid (19777-11-30)"
    }
  }
}

Compare this to the format I have been using for JSON responses:

{
  "message": "Error saving your details",
  "detail": {
    "date_of_birth": "The value '19777-11-30' is not a valid date."
  }
}

Programmatically, this allows me to attach the error messages to where they belong: the message value is shown in the main messages area of the client, the detail values for each field are attached to the fields for which they apply.

REST, permissions, hypermedia and DRY.

I’ve been working through the whole concept of REST over the past couple of years. I really like the fact that in theory the API should be clean, well designed, and resource-based.

One of the first takeaways I got from this whole school of thought is that it becomes the verbs in HTTP that become important. That is, given a resource, say at http://api.example.com/foo/1/, we can use GET on that resource to view it, DELETE to remove it, and PUT (or POST) to update it.

Since I prefer JSON to XML, I’ll use that for the resource representation examples.

We might get the following back when we do a GET on http://api.example.com/user/1/:

{
  "username": "barry",
  "first_name": "Barry",
  "last_name": "Citizen",
  "links": [
    {
      "rel": "self",
      "uri": "http://api/example.com/user/1/"
    }
  ]
}

I like the concept of having a links attribute, that contains links representing changes of state. But one thing that bothered me, from a UI perspective, is that I need to be able to know beforehand if the user has write/delete access on a resource. How can this best be represented?

I thought one way might be to have:

{
  ...
  "links": [
    {
      "rel": "self",
      "uri": "<whatever>",
      "actions": ["get", "put", "delete"]
    }
  ]
}

But I haven’t seen anything that does something similar to this. And, I have been leaning more towards using POST rather than PUT in lots of cases, as the business logic of my project is that often a user will not have access to the full data associated with a resource. PUT implies that the data that is being sent should replace the resource entirely.

For instance, a manager of a shop who has staff that she shares with another shop may not have access to the other shop’s data, and therefore may not know anything about that shop. In fact, we have at least one case where some managers must not know that another particular shop exists. Thus, she would send back an incomplete list of shops that her employee works at, thus removing him from the ones she cannot see.

As an aside, which I discovered while writing this article, there is also some discussion that it is valid to use PUT to update the bits we know about: RESTful architecture: what should we PUT?.

So, is this a valid way of representing what actions are permissable on a URI? Or is there a standard?

I then remembered that, REST in Practice talks about link types. On page 117:

self

The uri value can be used to GET the latest resource representation of the order. ### update

Consumers can change the order using a POST to transfer a representation to the linked resource.

cancel

This is the uri to be used to DELETE the order resource should the consumer wish to cancel the order.

There is also a link elsewhere in the book to Link Relations. There, we see instead of an update link, an edit link, but no remove or delete links. And this seems to fly against the whole concept of using the verb, rather than the URL, as in my case, those three URLs would all be the same. Besides, this seems to be a bit anti-DRY.

However, maybe this is the best solution: making the resource:

{
  "username": "barry",
  "first_name": "Barry",
  "last_name": "Citizen",
  "links": [
    {"rel": "self", "uri": "http://api/example.com/user/1/"},
    {"rel": "edit", "uri": "http://api/example.com/user/1/"}
  ]
}

The implication here is that the edit URI can also be used to delete the object, but that still doesn’t solve my issue. If we know the user may not delete a resource, then we can prevent access to the control that would be used to delete it.

Indeed, RFC 5023 specifically talks about the Member URI, which is (a) returned in the Location header when an object is created, and then goes on to talk about sending PUT and DELETE requests to the Member URI. But there is no mention about access control.


So then as I think more about it, HATEOAS is all about navigating around using links to control system state. Thus, it makes sense to have seperate links for edit and delete. And, according to Jim Webber, one of the authors of Rest in Practice:

Side note: it’s best not to conflate links with URIs. A link is a hypermedia control, a URI is an identifier and usually an address, and we’re interested in links with well-known semantics and not very interested in the format of the URIs.

(From the REST in Practice discussion group).


The other concern I have is that it would be nice to know what query parameters we can send to a resource collection to filter it. I have put a bit of thought into using URLs for the rel attribute, and those in turn being a resource that can be read using a GET request. This would then provide information like:

  • What parameters can be passed to this collection resource to filter it.
  • What data needs to be sent to this collection resource in order to create a new object resource. This would be field names, and the data types.

Indeed, I started to plan something like this that could be automatically generated (at least the second part of it) from within django-repose. The idea is that the process of registering an ApiView class also automatically registered the reltype url route.

Having said all of that, the code I have been working on lately looks explicitly for rel=self, and uses that to store the URL that should be operated on.


I’m going to end this now: mainly because I have just run across JSON Linking with HAL, and I’m going to re-work how I generate and handle links.