Collection+JSON Primer (and comments)

Collection+JSON, created by Mike Amundsen, is a standard way of creating hypermedia APIs. There were a few things I didn’t pick up correctly reading through his great book, or the spec.

First, let us look at a partial example document.

{
  collection: {
    version: "1.0",
    href: "http://api.example.com/",
    links: [],
    items: [],
    queries: [],
    template: {},
    error: {}
  }
}

I’m not so keen on having the version number in the document itself, as this refers to the version of Collection+JSON, rather than the version of the document. In my mind, the version of Collection+JSON should be contained within the media-type (Content-Type: application/vnd.collection+json;version=1.0), just as the version of the document is contained within the ETag header (ETag: 026e10f644ba4b06). Anyway, I’ll let that slide for now.
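To make that concrete, here is a rough sketch of how a client could read a version out of the media type instead of the document body. Note that the version parameter is only my suggestion, not something the Collection+JSON media type defines, and split_media_type is a made-up helper name.

```python
from email.message import Message


def split_media_type(content_type):
    """Return (media type, version parameter or None) for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param() returns None when the header carries no such parameter.
    return msg.get_content_type(), msg.get_param("version")


media_type, version = split_media_type(
    "application/vnd.collection+json;version=1.0"
)
```

A server that never sends the parameter simply yields None for the version, so a client could fall back to the version attribute inside the document.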

Secondly, having the href of the collection seems a little superfluous. I’m assuming there will always be an entry in links that has a rel=self, which should give you the same value. Again, not a big issue.
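If both are present, a client has to pick one. A minimal sketch of how I would resolve it, assuming (and this is only my assumption, the spec does not promise it) that a rel=self entry may or may not be in links; self_href is a hypothetical helper:

```python
def self_href(collection):
    """Prefer a rel=self link, falling back to the collection's own href."""
    for link in collection.get("links", []):
        if link.get("rel") == "self":
            return link["href"]
    return collection.get("href")


collection = {
    "version": "1.0",
    "href": "http://api.example.com/",
    "links": [{"href": "http://api.example.com/", "rel": "self"}],
}
```

Here self_href(collection) gives the same URL either way, which is rather the point: one of the two is redundant.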

What I was a little unclear on was the difference between links and queries. We can have a look at a couple of examples:

links: [
  {href: "http://api.example.com", rel: "self", prompt: "Home", name: "home", render: "link"},
  {href: "http://api.example.com", rel: "users", prompt: "Users", name: "users", render: "link"}
],
queries: [
  {href: "http://api.example.com", rel: "search", prompt: "Enter search string", data: [
    {name: "search", value: ""}
  ]}
]

The difference between links and queries to me seems somewhat artificial. Sure, in this case, my query has data fields, but it seems that this is not always necessary. The example Mike uses in his book:

queries: [
  {href: "...", rel: "all", prompt: "All tasks"},
  {href: "...", rel: "open", prompt: "Open tasks"},
  {href: "...", rel: "closed", prompt: "Closed tasks"},
  {href: "...", rel: "date-due", prompt: "Date Due", data: [
    {name: "dateStart", value: "", prompt: "Start Date"},
    {name: "dateStop", value: "", prompt: "Stop Date"}
  ]},
]

I’m not quite sure when I should be using a link, and when I should be using a query. In this example, it looks like a query is a filter on the collection: maybe that is the difference?
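Whatever the intended distinction, following a query is a GET request with its data array supplying the query string. A small sketch of how a client might build that URL; query_url and the http://api.example.com/tasks address are made up for illustration, since the hrefs in the book example are elided.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def query_url(query, values=None):
    """Build the GET URL for a query object, filling in user-supplied values."""
    values = values or {}
    params = [
        (field["name"], values.get(field["name"], field.get("value", "")))
        for field in query.get("data", [])
    ]
    scheme, netloc, path, existing, fragment = urlsplit(query["href"])
    # Keep any query string already present on the href, then append the fields.
    pairs = parse_qsl(existing, keep_blank_values=True) + params
    return urlunsplit((scheme, netloc, path, urlencode(pairs), fragment))


date_due = {
    "href": "http://api.example.com/tasks",  # hypothetical URL
    "rel": "date-due",
    "data": [{"name": "dateStart", "value": ""}, {"name": "dateStop", "value": ""}],
}
url = query_url(date_due, {"dateStart": "2012-01-01", "dateStop": "2012-02-01"})
```

A query with no data array comes back as its bare href, which is exactly what a link would give you, and is why the split feels artificial to me.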

The other sticking point I have is that both queries and links are GET requests: the data attribute is simply the query string applied to the URL. Before I continue, we need to look at the items and template attributes of the collection object. In this case, we have a single-object collection, including a write template for it.

items: [
  {
    href: "...",
    data: [
      {name: "first_name", value: "Matthew", prompt: "First name"},
      {name: "last_name", value: "Schinckel", prompt: "Last name"},
      {name: "email", value: "matt@schinckel.net", prompt: "Email address"},
      {name: "gender", value: "male", prompt: "Gender"}
    ]
  }
],
template: {
  data: [
    {name: "first_name", value: "Matthew", prompt: "First name"},
    {name: "last_name", value: "Schinckel", prompt: "Last name"},
    {name: "email", value: "matt@schinckel.net", prompt: "Email address", regexp: "^[^@]+@[^@]+\\.[^@]+"},
    {name: "gender", value: "male", prompt: "Gender", options: [
      {value: "male", text: "Male"}, 
      {value: "female", text: "Female"}
    ]}
  ]
}

Again, we see duplicate information. In this case, the template is populated: if it were a ‘proper’ collection rather than a single object, the template would be used for creating new objects in the collection, so I’m prepared to let this one go. You’ll also notice that I’m using a couple of undocumented features: regexp and options. These enable us to either present a list of choices to the user, or have client-side validation based on a regular expression.
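As a sketch of what a client could do with those two extra keys, here is a check of a single value against a template field. Keep in mind that regexp and options are my own additions rather than part of Collection+JSON, validate is a made-up name, and Python's re module stands in for whatever regex engine the real client has.

```python
import re


def validate(field, value):
    """Return a list of problems with value for one template data field."""
    errors = []
    options = field.get("options")
    if options is not None and value not in [o["value"] for o in options]:
        errors.append("%r is not one of the permitted choices" % (value,))
    pattern = field.get("regexp")
    if pattern and not re.search(pattern, value):
        errors.append("%r does not match %s" % (value, pattern))
    return errors


email = {"name": "email", "regexp": r"^[^@]+@[^@]+\.[^@]+"}
gender = {"name": "gender", "options": [
    {"value": "male", "text": "Male"},
    {"value": "female", "text": "Female"},
]}
```

Fields that carry neither key always pass, so a client that understands the extensions still works against a server that does not send them.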

To update an object, we can use a PUT (or POST) to the object’s href, and we send the name/value parts of the updated template data:

{
  template: {
    data: [
      {name: "first_name", value: "Matthew"},
      {name: "last_name", value: "Schinckel"},
      {name: "gender", value: "male"},
      {name: "email", value: "matt@schinckel.net"}
    ]
  }
}

To create a new object, we send the same type of data in a POST request to a collection’s href. To delete an object, we can send a DELETE request to the object’s href.
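Roughly, the write side looks like this from a Python client. It is only a sketch: write_request and template_body are hypothetical helpers, the URL is invented, and the request is built but not sent.

```python
import json
from urllib.request import Request

MEDIA_TYPE = "application/vnd.collection+json"


def template_body(values):
    """Wrap plain name/value pairs in a Collection+JSON template document."""
    data = [{"name": name, "value": value} for name, value in values.items()]
    return json.dumps({"template": {"data": data}})


def write_request(href, values, method="PUT"):
    """Build (but do not send) an update or create request for href."""
    return Request(
        href,
        data=template_body(values).encode("utf-8"),
        method=method,
        headers={"Content-Type": MEDIA_TYPE},
    )


# Hypothetical item URL; use method="POST" with the collection's href to create.
update = write_request("http://api.example.com/users/1", {"first_name": "Matthew"})
```

Passing the result to urllib.request.urlopen would actually perform the request; a DELETE needs no body at all, just the object's href.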

Finally, we come to the error property. I wrote last night, in Collection+JSON error objects, about how I think this is a little limiting. Anyway, an error looks like:

error: {
  title: "Error saving your details",
  code: "409",
  message: "Your date of birth is invalid (19977-11-30 is not a valid date)"
}

After writing most of this, I did come across Collection+JSON - Examples, but I may have described it in a slightly different manner. It still doesn’t elaborate on the difference between links and queries, however.

Collection+JSON error objects

I’m still keen on the idea of implementing a rich hypermedia API based on django’s forms.

One of the nicest things about the django forms is that they handle the validation of incoming data. Each form field has a .clean() method, which will clean the data. On a form, it is then possible to have extra methods, .clean_FIELDNAME(), which will process the incoming data again, meaning you don’t need to subclass a field to add simple cleaning functionality. Finally, the form itself has a .clean() method, that can be used to clean composite data, say, ensuring that start is before finish.

The form validation code will create an errors property on the form, that will contain the fields that have errors, and any non-field errors (such as the last example above). When rendering an HTML page, and displaying a form that has errors, these are marked up with CSS classes that enable you to show which fields have invalid or missing data, and also display relatively friendly messages (which you can customise).

But Collection+JSON has a fairly simple error property on the collection object:

{
  "collection": {
    "error": {
      "title": "Error saving your details",
      "code": "409",
      "message": "Your date of birth is invalid (19777-11-30)"
    }
  }
}

Compare this to the format I have been using for JSON responses:

{
  "message": "Error saving your details",
  "detail": {
    "date_of_birth": "The value '19777-11-30' is not a valid date."
  }
}

Programmatically, this allows me to attach the error messages to where they belong: the message value is shown in the main messages area of the client, the detail values for each field are attached to the fields for which they apply.
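To show the difference in code, here is a sketch that renders the same set of errors both ways. It assumes the errors arrive as a plain dict of field name to list of messages (which is roughly the shape of a django form's errors property); both function names are made up.

```python
def detail_response(message, errors):
    """My format: one overall message plus a per-field detail mapping."""
    return {
        "message": message,
        "detail": {field: " ".join(msgs) for field, msgs in errors.items()},
    }


def collection_error(title, code, errors):
    """Collection+JSON format: everything is flattened into one message string."""
    message = "; ".join(
        "%s: %s" % (field, " ".join(msgs)) for field, msgs in errors.items()
    )
    return {"collection": {"error": {"title": title, "code": code, "message": message}}}


errors = {"date_of_birth": ["The value '19777-11-30' is not a valid date."]}
mine = detail_response("Error saving your details", errors)
theirs = collection_error("Error saving your details", "409", errors)
```

With the second form, the client would have to parse the message string to find out which field was at fault, which is the limitation I am complaining about.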

Django and Collection+JSON

Recently, I have been reading (and re-reading) Building Hypermedia APIs with HTML5 and Node. There’s lots to like about this book, especially after reading (and mostly discarding) REST API Design Rulebook.

There is one thing that bugs me, and that is the way that templates are used to generate the JSON. I raised this with Mike Amundsen.

His response was that he sometimes used JSON.stringify, at other times templates. But it got me thinking. I have written lots of code that serialises Django models, or more recently forms, into JSON and other formats. Getting a nice Collection+JSON representation actually maps quite nicely onto these django constructs, as we often have the metadata that is required for the fields.

Consider the following (simple) django model:

from django.db import models
from django.urls import reverse


class Article(models.Model):
    title = models.CharField('Title of Article', max_length=128)
    content = models.TextField('Content of Article')
    author = models.ForeignKey(
        'auth.User', on_delete=models.CASCADE, verbose_name='Author of Article'
    )

    def get_absolute_url(self):
        # Look up the detail URL for this article by its primary key.
        return reverse('article_detail', kwargs={'pk': self.pk})

I don’t normally supply verbose_names, but I have in this case. We’ll see why in a minute.

Now, what I would call the obvious JSON representation of this is something like:

{
  "title": "Title goes here",
  "content": "Content goes here",
  "author": 1,
  "href": "…"
}

But, I’m quite interested in Collection+JSON. We might see something like:

{
  "collection": {
    "version": "1.0",
    "href": "…",
    "links": [
      {"href":"…", "rel":"…", "prompt":"…", "name":"…", "render":"string"}
    ],
    "items": [
      {
        "href": "…",
        "data": [
          {"name":"title", "value":"Title goes here", "prompt":"Title of Article"},
          {"name":"content", "value":"Content goes here", "prompt":"Content of Article"},
          {"name":"author", "value":"1", "prompt":"Author of Article"}
        ],
        "links": []
      }
    ]
  }
}

From a django ModelForm, we should be able to easily generate each member of items:

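def form_to_item(form):
    # Build one member of items from a bound form.
    links = getattr(form, 'links', [])
    return {
        "data": [
            {"name": f.name, "prompt": f.label, "value": f.value()} for f in form
        ],
        "href": None,  # where the item's href should come from is an open question
        "links": links
    }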

The only bit that we are missing out of the form/field data is data type, or more specifically in this case, the available choices that are permitted for the author field. Now, this is missing from the Collection+JSON spec, so I’m not sure how to handle that.

I think this is actually an important problem: if we have a discoverable/hypermedia API, how do we indicate to the client what are valid values that can be entered for a given field?
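One possibility, and it is only a sketch of a non-standard extension, is to attach an options list to the data entry whenever the field has a fixed set of choices. The data_entry helper and the author names below are invented for illustration; choices is assumed to be a list of (value, label) pairs, which is the shape django uses for field choices.

```python
def data_entry(name, prompt, value, choices=None):
    """Build one data entry, adding a non-standard options list when given choices."""
    entry = {"name": name, "prompt": prompt, "value": value}
    if choices:
        entry["options"] = [
            {"value": choice, "text": label} for choice, label in choices
        ]
    return entry


author = data_entry(
    "author", "Author of Article", 1,
    choices=[(1, "matt"), (2, "mike")],  # made-up users
)
```

A client that does not know about options can ignore the key and still treat the entry as an ordinary field.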

For those not familiar with django: the verbose_name on a model field is used for the default label on a form field. If you were not using a model, you could just supply a label in the form class.

The other parts that are a little hard to think about right now are the remaining attributes: href and links. These may actually coalesce into one, as links.self.href should give us href. Perhaps we have to look on the form object for a links property. But, in django, it’s not really the domain of the form to contain information about that. For now, I’m going to have a links property on my forms, but that feels dirty too.