Multi-tenanted Django

TL;DR: I made a thing: django-multi-schema. As pointed out in the comments, it’s now known as django-boardinghouse.

Software, as a Service (SaaS)

This is a term that has been around for a while. Basically, providing software that runs on a server, and selling access to that system. Instead of charging people for the software, you charge them a recurring fee for access to the hosted service.

If you have more than one customer, then you need to ensure that each customer can only access data that is ‘theirs’, rather than seeing everything. You need some way to partition the data, and there are two main ways to do this.

  1. Every customer has their own server/database.
  2. All the data is stored in one server/database.

To each their own.

One way of partitioning data is to provision an application server, and a database for each customer.

These web servers, and indeed the databases may not even be on different virtual machines, let alone physical machines. I manage a legacy PHP/MySQL application that runs each ‘server’ as an Apache VirtualHost, shares the codebase, and uses some configuration to route the database connections.

The advantage of this type of system is that it is very easy to move a single instance of the application onto a separate server if the limits of server performance are reached. Depending upon the configuration, you may still only need to upgrade one codebase, although typically there would be a separate code installation per customer. The advantage of that is that you can upgrade customers individually, perhaps moving specific customers to a beta version.

The disadvantages of this type of setup are similar to what you would have if you had a big enough customer base anyway: multiple servers need to be upgraded, although getting them all done at exactly the same time is not as important if all requests to a given domain always go to a given server (and then to a given database).

However, if you are sharing code between installations, and each has a separate database, then you do need to migrate all of the databases at the same time, and at the same time as the codebase is updated.

The real disadvantages are about adding a new customer. You need to provision a new server, or at the least, set up a new VirtualHost, and create a new database. You also need to run any database migrations on several databases, but that should be part of the deployment process to each installation anyway.

Another issue that may arise is that each installation requires a separate set of connections to its own database. If the databases are on the same database server, then this may be a limit that you reach sooner than you would like: but at that point you can just split off some databases to a separate database server.

There may be only one.

The alternative method of segmenting data is to use Foreign Keys. There is one database table, corresponding to a customer (which may not be a single user). Then, relevant other tables contain a foreign key back to that table, or to a table that links back to that table, and so on.

This is the way the main system I work on works. We have one django installation (or, possibly, several installations that share one database, but are effectively identical clones, just used for load handling). We have a Company table, and everything that should be limited to a given company links back to that, either directly or through a parent. For instance, a RosteredShift does not have a link to a company, but it does link to a RosterUnit, which is linked to a company.

The advantages of this are that you have a single server that needs to be upgraded, or multiple identical servers (that you can just upgrade in parallel). You have a single database, that, again, only needs to be upgraded once. You only have to manage a single database backup, and, importantly, your database connections are equally shared across all of your customers. Indeed, your load is evenly shared, too.

Scaling up is still possible: we can easily stick an extra N app servers into our app server pool, and the load balancer will just farm requests off to them. Sharding databases becomes a bit harder, as you cannot just shift a single customer’s data off onto a separate database (or indeed push a highly used customer’s app server onto a separate machine).

The big danger is that a customer may get access to data that belongs to a different customer. In some ways this is the same issue as some of a customer’s own users seeing data they shouldn’t, but it is a bit scarier.

With a single server, and a single domain name, usernames must be unique across all of your customers’ users. This sounds easy: just use email addresses. They are unique. That works well, until an employee of one customer moves to a different employer, that also happens to be your customer, and BANG, email conflict. You don’t want to just move the user, as that would break history for the previous employer. Indeed, they may even be employed by both of your customers at the same time. Possibly without those employers knowing about one another. Privacy, man.

Another useful thing is being able to shift data to a different customer. This is a bit of a double-edged sword - in reality we stopped doing this, and now create a copy of the relevant data for the new customer (for instance, when a Subway store changes hands). That means the previous owner can retain access to their historical data.

Finally, support staff only need to look in one place to see all customer data. They don’t need to extract from a frazzled user who they work for in order to check why their login is not working correctly, for instance.

A middle ground

One of the advantages of a single server is that you share database connections. No more running out of connections because each customer requires X connections. But, Postgres has a feature called schemas, that sits between database and table:

<database>.<schema>.<table>.<column>

Any query can use a fully qualified, or partially qualified name to refer to a database, schema, table or column.

Postgres uses the term schema differently to the idea of an SQL schema. Instead of meaning the table definitions, a schema is a named sub-database. Every postgres database has at least one schema, usually called public.

Postgres determines which schema should be searched for a matching table using a search_path. This can be a list of schemata: the first one in the list with a matching table will be used for queries against that table name. The default search path is "$user","public", which looks in the schema matching the connected user, then the public schema. A schema that does not exist is just ignored in the search path.

So, we can split our data into one schema per customer. That has the nice side effect of preventing data leakage due to programmer error, as a connection to the database (which has a specific search path at that time), cannot possibly return data to which it should not have access.
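At the connection level, that just means setting the search path before running any queries. A minimal sketch of what that looks like through Django’s database connection (the function name and the example schema name are purely illustrative):

from django.db import connection

def activate_schema(schema_name):
    """Put the customer's schema at the front of the search path.

    Queries will then find tables in that schema first, falling back to
    public for anything shared.
    """
    cursor = connection.cursor()
    # psycopg2 interpolates the parameter as a quoted string literal,
    # which is valid in a SET search_path statement.
    cursor.execute('SET search_path TO %s,public', [schema_name])

# e.g. activate_schema('customer_1') before running that customer's queries.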

Starting to narrow

I work in Django, so from here on in, it starts to get rather specific to that framework.

Which schema, when?

One solution to this problem is to mirror the ‘one server per customer’ approach. That is, each customer gets a separate domain. Incoming requests are matched against the domain name that was used in the request, and the search path is set accordingly. This is quite simple, and is how django-schemata works. Some middleware matches up the incoming domain to the relevant schema settings, and sets the search path. Indeed, this is the simplest possible approach, as was intended. The schemata are set in the django settings file, which means you cannot create a new schema on the fly. django-appschema does allow this, by using a model in the public schema that contains the list of schemata.

This is not the approach I want, as I want everything to come in on the same domain.

So, I came up with a different concept. Base the schema selection upon the logged in user.

When a request comes in, use a lookup table of some sort to determine which schema should be used. Now, since this needs to happen after authentication, we will still need to store auth_user and friends in the public schema. But that’s alright. We can have a related object that determines which schema should be used for looking up the rest of the data, and ensure that those requests which should be schema-aware use that schema.

Indeed, it may be possible for a user to be able to choose between schemata, so we also need a way to pass this information. I settled on storing the desired schema in request.session, and the middleware checks that the authenticated user is allowed to access that schema.
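Roughly, the middleware ends up looking something like this (a sketch only: the session key, the user-to-schema relation and the module layout are made up for illustration, not the actual django-boardinghouse API):

from django.core.exceptions import PermissionDenied
from django.db import connection

class SchemaMiddleware(object):
    """Set the Postgres search path based on the logged-in user's session."""

    def process_request(self, request):
        if not request.user.is_authenticated():
            return  # Anonymous requests only ever see the public schema.

        schema = request.session.get('schema')
        if not schema:
            return

        # Hypothetical relation: the user must be linked to this schema.
        if not request.user.schemata.filter(schema=schema).exists():
            raise PermissionDenied('Not allowed to access this schema.')

        cursor = connection.cursor()
        cursor.execute('SET search_path TO %s,public', [schema])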

That was the easy part

Working out what the search path should be is indeed the easy part. Set it at the start of the request (our middleware should be early in the chain), and away you go.

The hard problems are avoided by django-schemata because each ‘site’ has its own schema, and all of its tables are stored in there. Thus, you can simply run ./manage.py syncdb or ./manage.py migrate for each schema, and away you go. They provide a manage_schemata tool, which does just this. They also avoid shared apps, to simplify things.

I needed to be able to share models: indeed, by default, a model is shared. They live in the public schema, and queries will be on this schema. The approach I used was that, instead of using the SHARED_APPS setting, you need to explicitly mark a Model as “schema aware”.
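For the examples that follow, assume the marker is just an attribute on the model class (the real project may well spell this differently):

from django.db import models

class RosterUnit(models.Model):
    """Schema aware: a copy of this table lives in each customer schema."""
    _schema_aware = True  # hypothetical marker attribute

    name = models.CharField(max_length=100)

class Schema(models.Model):
    """Not schema aware: lives once, in the public schema."""
    schema = models.CharField(max_length=36, unique=True)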

The only time this really matters is at DDL time. When you query a table, Postgres looks in the search path. As long as you have schema_name,public, you will be fine for all reads and writes. However, to create the tables, you need to use some smarts.

syncdb

Whenever a syncdb happens, we need to do a few things:

  • Make sure we have our clone_schema SQL function loaded. This will be used to clone the schema structure from __template__ to new schemata.
  • Make sure we have a schema with the name __template__.
  • Set the search path: in this case it will be public,__template__. Usually, it will be the other way around, but for this case, we want tables to be created in public, unless we explicitly mark them to be created in __template__.
  • Run the syncdb command.
  • Create schemata for any Schema objects we have in the database.

The syncdb command just runs the normal old syncdb (or south’s, if that is installed). However, we do have a couple of bits of evil, evil magic.

Firstly, we don’t use the standard postgres database backend. Instead, we have one that subtly changes the DatabaseCreation.sql_create_model process. If a model is schema aware, we inject the schema into the CREATE TABLE commands. Thus:

CREATE TABLE "foo" ...;

Becomes:

CREATE TABLE "__template__"."foo" ...;

In fact, it’s a little more than that. We can pass in the name of the schema we are writing to, so, in our second trick, we add a post_syncdb listener, which iterates through all of the schemata, re-running just this command, with the schema passed in.
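In outline, that listener looks something like this (a loose sketch: the Schema model, the marker attribute and the create_model_in_schema helper are stand-ins for whatever the real implementation does):

from django.db.models.signals import post_syncdb

def sync_schemata(sender, created_models, **kwargs):
    """After syncdb, make sure every customer schema has the new tables."""
    from multi_schema.models import Schema                    # hypothetical
    from multi_schema.creation import create_model_in_schema  # hypothetical

    for model in created_models:
        if getattr(model, '_schema_aware', False):
            for schema in Schema.objects.all():
                # Re-run just the CREATE TABLE for this model, with the
                # schema name injected, as described above.
                create_model_in_schema(model, schema.schema)

post_syncdb.connect(sync_schemata)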

loaddata

An override of the loaddata command adds the ability to declare the schema: the search path is then set before running the command. Simple.
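A rough sketch of that override, using the optparse-style options that management commands took at the time (activate_schema being a hypothetical helper that sets the search path, as sketched earlier):

# management/commands/loaddata.py (illustrative only)
from optparse import make_option

from django.core.management.commands.loaddata import Command as LoadData

class Command(LoadData):
    option_list = LoadData.option_list + (
        make_option('--schema', action='store', dest='schema',
                    default='__template__',
                    help='Schema to load the fixture data into.'),
    )

    def handle(self, *fixture_labels, **options):
        from multi_schema.schema import activate_schema  # hypothetical helper
        activate_schema(options['schema'])
        return super(Command, self).handle(*fixture_labels, **options)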

dumpdata

The override for this command also allows passing in the schema name. Data in schema aware models will only be fetched from here (or the template schema, if nothing is passed in, which should be empty).

Migrations

Unless you are crazy, you probably already use South for migrations. So, the second really hard problem is how to apply the migrations to each schema.

Basically, we want to run each operation, if the model is schema aware, on the template schema, and then on each real schema. But, we can’t just run the migrations multiple times with the search path altered, because (a) South stores its migration record in a table in public, so the subsequent runs of the migration would not do anything, and (b) even if we could run them, any operations on the public schema would fail, as they have already been performed.

This looks like a really hard problem, and initially seemed insurmountable. However, there is one thing which makes it really quite simple. South looks in your settings file for SOUTH_DATABASE_ADAPTERS. I’m not sure this normally gets used, but it is required if you are not using a django database backend that South recognises. Like ours.

So, the database adapter is just a thin wrapper over South’s builtin postgres backend. It expects to find a class called DatabaseOperations, and wraps all of the methods that create/rename/delete or whatever tables and columns.

And the wrapper is quite simple. It finds the django model that matches the database table name, and then, if that model is schema aware, repeats the operation on each schema, and then the template schema. If the model is not schema aware, then we set the search path just to the public schema, so create operations will affect that.
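The shape of the wrapper is roughly as follows (very much a sketch: the helper functions are invented names for the steps described above, and only two of the wrapped operations are shown):

from south.db import postgresql_psycopg2

class DatabaseOperations(postgresql_psycopg2.DatabaseOperations):
    """Repeat schema-aware DDL in every customer schema."""

    def _for_each_schema(self, operation, table, *args, **kwargs):
        # Hypothetical helpers: map a table to its model, list the schemata,
        # and set the search path, as described in the text.
        from multi_schema.schema import (
            is_schema_aware, all_schemata, activate_schema, activate_public)

        if is_schema_aware(table):
            for schema in all_schemata():
                activate_schema(schema)
                operation(table, *args, **kwargs)
            # Finish with the template schema, so it stays the master copy.
            activate_schema('__template__')
        else:
            activate_public()
        return operation(table, *args, **kwargs)

    def create_table(self, table_name, fields):
        parent = super(DatabaseOperations, self).create_table
        return self._for_each_schema(parent, table_name, fields)

    def delete_table(self, table_name, cascade=True):
        parent = super(DatabaseOperations, self).delete_table
        return self._for_each_schema(parent, table_name, cascade=cascade)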

More hackery

Admin users are permitted to see data from all schemata, so it’s possible they’ll see a link to an object that is not in their schema. Primary keys are unique across all schemata (but this is simply because the index is shared between them in this implementation), so they’ll get a 404 if they try to follow it. If PKs were not unique across schemata, then they might see the wrong object. Anyway, we can leverage the schema-switching middleware by passing in the correct schema in the URL. But to do this, we need to know at LogEntry creation time which schema is active, if the object is schema aware. So, I monkey-patch LogEntry to store this information, and generate URLs that include it.

I’ve also done a bit of work on adding the schema in when you serialise data. This is mainly for dumpdata, as I don’t think you should be passing around objects for deserialisation to untrusted sources: it really should be going through form validation or similar. But for loaddata/dumpdata, it might be useful. At this stage the schema value is not used, but eventually the deserialiser should look at that, and somehow ensure the object is created in the correct schema. For now, just use loaddata --schema .... That’s better anyway.

ImproperlyConfigured

One really nice thing about django is that it has an ImproperlyConfigured exception, which I have leveraged. If the database engine(s), south database adapter(s) or middleware are missing or incorrect, it refuses to load. This is conservative: you may have a database of a different engine type, or have no schema aware models, but for now it’s not a bad idea.
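Concretely, the checks are along these lines (a sketch; the module paths and engine name are placeholders):

from django.conf import settings
from django.core.exceptions import ImproperlyConfigured

MIDDLEWARE = 'multi_schema.middleware.SchemaMiddleware'  # hypothetical path

if MIDDLEWARE not in settings.MIDDLEWARE_CLASSES:
    raise ImproperlyConfigured(
        'You must include %s in MIDDLEWARE_CLASSES.' % MIDDLEWARE)

for db in settings.DATABASES.values():
    if db['ENGINE'] != 'multi_schema.backends.postgres':  # hypothetical engine
        raise ImproperlyConfigured(
            'All databases must use the schema-aware postgres backend.')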

Also, if South is not installed before us, we need to bail out.

Well, that’s most of it. There’s some more gravy (signals before and after activation of schema, as well as when a new schema is created), but while it has been tested, there is no automated test suite as yet. Nor is there a public example project. But they are coming.

Oh, and it’s up on BitBucket: django-multi-schema. Although, I’m not that happy with the name.

Django AJAX Edit Mode

I can’t even remember where I saw this, but the suggestion was that viewing and editing data are different operations, and should be different modes.

For instance, when viewing some data, you would need to explicitly decide to enter the edit mode. In a web page, this would be by following a link to a page that had a form that allowed for editing. Attempting to submit the form would result in either the same page being displayed, along with any validation errors, or being redirected back to the viewing page.

This is the pattern I’ve been working on implementing with a module for our work system. However, we can reduce the amount of data sent, and the amount of page redraw, by using dynamic element replacement. So, we can have some AJAX that loads in the edit form in-place, and replaces the view form. Saving it then either returns the edit form with validation errors, or the view form again.

We can do all of this with one view. Depending upon how it is accessed, it will return either a different template, or render the template differently.

Solution One: render a different template.

# views.py
from django.core.urlresolvers import reverse
from django.views import generic

from .models import MyModel

class AjaxEditView(generic.UpdateView):
  model = MyModel
  
  def get_template_names(self):
    if self.request.is_ajax():
      if 'edit' in self.request.GET:
        return 'partial/edit.html'
      return 'partial/display.html'
    return 'detail.html'
  
  def get_success_url(self):
    return reverse('mymodel_detail', kwargs={'pk': self.object.pk})
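For reference, the route this relies on would look something like the following (the names match the reverse() call and the templates; everything else is up to you):

# urls.py (illustrative)
from django.conf.urls import patterns, url

from .views import AjaxEditView

urlpatterns = patterns('',
    url(r'^mymodel/(?P<pk>\d+)/$', AjaxEditView.as_view(),
        name='mymodel_detail'),
)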

The disadvantages of this are that we need to name the url route, and use it here in this view, and that we have different templates. The templates are almost identical, however:

<!-- partial/display.html -->
{% load url from future %}

<form method="GET" action="{% url 'mymodel_detail' object.pk %}">
  <button>EDIT</button>
  <input type="hidden" name="edit" value="1">
  
  <table>
    {% for field in form %}
      <tr>
        <th>{{ field.label }}</th>
        <td>{{ field.value }}</td>
      </tr>
    {% endfor %}
  </table>
</form>
<!-- partial/edit.html -->
{% load url from future %}

<form method="POST" action="{% url 'mymodel_detail' object.pk %}">
  <button type="submit">SAVE</button>
  <a class="cancel" href="{% url 'mymodel_detail' object.pk %}">CANCEL</a>
  
  {% csrf_token %}
  
  <table>
    {% for field in form %}
    <tr>
      <th>{{ field.label }}</th>
      <td>{{ field.errors }}{{ field }}</td>
      <td>{{ field.help_text }}</td>
    </tr>
    {% endfor %}
  </table>
  
</form>

The partial/display.html template has a few things of note: it has a method="GET", and a single <input> element, which tells the view to render the response on a GET request ready for editing.

The partial/edit.html template has a link/button for cancelling. In this case, we could look at the form for the location we should load, but this is a bit more explicit.

Solution Two: context_data variable

The other solution uses just one AJAX template, but adds to the context of the view if it should be rendered in editable form or not.

# views.py
from django.views import generic

class AjaxEditView(generic.UpdateView):
  model = MyModel
  
  def get_context_data(self, **context):
    context['edit'] = self.request.GET.get('edit', False)
    return super(AjaxEditView, self).get_context_data(**context)
    
  def get_template_names(self):
    if self.request.is_ajax():
      return 'partial/form.html'
    return 'detail.html'
  
  def get_success_url(self):
    return reverse('mymodel_detail', kwargs={'pk': 1})

And our template looks like:

<!-- partial/form.html -->
{% load url from future %}

<form method="{% if edit %}POST{% else %}GET{% endif %}" 
      action="{% url 'user_detail' object.pk %}">
  
  <button type="submit">
    {% if edit %}SAVE{% else %}EDIT{% endif %}
  </button>
  
  {% if edit %}
    <a class="cancel" href="{% url 'user_detail' object.pk %}">
      CANCEL
    </a>
    {% csrf_token %}
  {% else %}
    <input name="edit" type="hidden" value="1">
  {% endif %}
  
  <table>
    {% for field in form %}
    <tr>
      <th>{{ field.label }}</th>
      <td>
        {% if edit %}
          {{ field.errors }}{{ field }}
        {% else %}
          {{ field.value }}
        {% endif %}
      </td>
      {% if edit %}
      <td>{{ field.help_text }}</td>
      {% endif %}
    </tr>
    {% endfor %}
  </table>
  
</form>

The downside of this one is that the template is much more complicated. I’ve been using the latter in a work project, but I may switch later.

The last part that ties all of this together is the Javascript. It’s fairly simple, written using jQuery:

$(function() {
  // Submit handler. Simply submit the form, and replace the
  // form in its entirety with the response from the server.
  $(document).on('submit', 'form', function(evt) {
    var form = evt.target;
    var $form = $(form);
    evt.preventDefault();
    $.ajax({
      url: form.action,
      type: form.method,
      data: $form.serialize(),
      success: function(data){
        $form.replaceWith($.parseHTML(data));
      }
    });
  });
  
  // Cancel editing button handler. Do an ajax fetch on the
  // url of the button, and replace the parent form with
  // the response from the server.
  $(document).on('click', '.cancel', function(evt) {
    evt.preventDefault();
    $.ajax({
      url: evt.target.href,
      success: function(data){ 
        $(evt.target).closest('form').replaceWith($.parseHTML(data));
      }
    });
  });
});

The other thing that it is possible to do is make it so that the edit button will only display if the logged in user is permitted to edit that object.

In practice, I’m combining multiple display/edit views within one view (for related concepts: for instance Bank account details, Tax File Number and Superannuation details in the one page, but they have separate models). I have some ideas about a nice way to handle this, but that’s for another post.

There is a project available on bitbucket that demonstrates this: dynamic-form-demo. There is a separate branch for each solution outlined above.

jQuery dynamic forms

Some javascript, currently using jQuery, that will convert a form into an ajax form.

First up, a shim to enable FormData for non-compliant browsers. Based on formdata.js, but with some changes. Man, callback-based code is a bitch when all you need to do is get some value, and you have to work around that.

(function(w, $) {
  // Don't override if it is native.
  if (w.FormData)
    return;
      
  function FormData(form) {
    var fd = this;
    this.fake = true;
    this.boundary = "--------FormData" + Math.random();
    this._fields = [];
    this.contentType = 'multipart/form-data; boundary=' + this.boundary;
          
    if (form) {
      var $form = $(form);
      $.each($form.serializeArray(), function(i, obj){
        fd.append(obj.name, obj.value);
      });
      $form.find('[type=file]').each(function(i, file) {
        fd.append(file.name, file.files[0]);
      });
    }
  }
    
  // A listener to automatically add the binary version of the data. This may suck.
  $('[type=file]').change(function updateData(change) {
    var reader = new FileReader();
    reader.onload = function(load) {
      $(change.target.files[0]).data('binary-file-data', load.target.result);
    }
    reader.readAsBinaryString(change.target.files[0]);
  });
    
  // The interface FormData provides...
  FormData.prototype.append = function(key, value) {
    this._fields.push([key, value]);
  }
    
  // But, we will actually look more like a string to XMLHttpRequest.
  FormData.prototype.toString = function() {
    var boundary = this.boundary;
    var body = '';
    $.each(this._fields, function(i, field) {
      body += '--' + boundary + '\r\n';
      if (field[1].name) {
        var file = field[1];
        body += "Content-Disposition: form-data; name=\""+ field[0] +"\"; filename=\""+ file.name +"\"\r\n";
        body += "Content-Type: "+ file.type +"\r\n\r\n";
        body += $(file).data('binary-file-data') + "\r\n";
      } else {
        body += "Content-Disposition: form-data; name=\""+ field[0] +"\";\r\n\r\n";
        body += field[1] + "\r\n";
      }
    });
          
    body += "--" + boundary +"--";
    return body;
  }
  w.FormData = FormData;
})(window, jQuery);

Now, for the jQuery code. This will override the submit event on all forms. If there are any <input type=file> elements, then it will use a FormData object, else it will just .serialize() the object. The response will override the .html() content of the form, but not the form itself.

$('form').submit(function(evt) {
  evt.preventDefault();
  var form = evt.target, $form = $(form), data;
  var options = {
    cache: false,
    type: form.method,
    url: form.action,
    // $.ajax expects 'success'/'error' options; done/fail are promise methods.
    success: function(data) {
      $form.html(data);
    },
    error: function(xhr, status, error) {
      console.log(status, error, xhr);
    }
  };
    
  if ($form.find('[type=file]').length) {
    data = new FormData(form);
    // Native FormData objects set this automatically (how???), but we need to manually do it.
    options.contentType = data.contentType || false;
    options.processData = false;
  } else {
    data = $form.serialize();
  }
  options.data = data;
  $.ajax(options);
});

Ikea bodychair


Django urlpattern nested regex groups

Had one of those annoying problems where I could not figure out why something was not working; I learned something about how django’s url routing works along the way.

I created a new view for within the admin, that provides a summary of the permissions associated with a group, or set of groups. For our purposes, a company can have a number of groups associated with it, so I wanted to be able to optionally provide a company id: if it was provided, it would only show the groups+permissions for that company; if not provided it should show all of the groups and their permissions.

So, I had the urlpattern like:

# Included under '/company/...'
url(r'^((?P<company>\d+)?/)?groups/$', 'group_perms', name='company_group_permissions'),

This resolves fine. All of these variations work as expected:

http://example.com/company/10/groups/
http://example.com/company/groups/
http://example.com/company//groups/

However, I wanted to put a link in the admin change page for the company class, but was getting resolution errors, so I tried reverse directly:

reverse('group_perms', kwargs={'company': 10})
# -> NoReverseMatch: Reverse for 'group_perms' with 
#    arguments '()' and keyword arguments '{}' not found.

That’s odd. Maybe I was getting the name or something wrong:

resolve('/company/10/groups/')
# Result:
ResolverMatch(
  func=<function group_permissions at 0x104b96de8>, 
  args=(), kwargs={'company': '10'}, 
  url_name='company_group_permissions', 
  app_name='None', 
  namespace=''
)

Then, I removed the extra grouping in the regex:

url(r'^(?P<company>\d+)?/groups/$', 'group_perms', name='company_group_permissions'),

And it all works as expected. However, this slightly limits the available urls:

http://example.com/company/10/groups/
http://example.com/company//groups/

This one no longer works:

http://example.com/company/groups/

I can live with that.

I can’t find anything in the django docs that details this, although I kind-of remember reading that there are limits as to the ability of reverse() to generate urls.

Trust your tools, or how django's ORM bested me

Within my system, there is a complicated set of rules for determining if a person is “inactive”.

They may have been explicitly marked as inactive, or their company may have been marked as inactive. These are simple to discover and filter to only get active people:

Person.objects.filter(active=True, company__active=True)

The other clause for inactive users is if they only work at locations that have been marked as inactive. This means we can disable a location (within a company that remains active), and not have to manually deactivate the staff who only work at that location; it also means when we reactivate a location, staff will automatically be restored to an active state.

I’ve written the code several times that determines the activity status, but have never really been that happy with it. It generally degenerates into something that uses N+1 queries to discover the activity status of N people, or requires using django’s queryset.extra() method to run queries within the database.

Now, I had cause to fetch all active staff from the entire system. I had written a query to do this, but it was mistakenly including staff who are only active at inactive units. I tried playing around with .extra(select={...}), but was not able to filter on the pseudo-fields that were generated.

Then, I had the idea to do the following:

from django.db.models import Q

active = Location.objects.active()
inactive = Location.objects.inactive()

Person.objects.filter(
  Q(locations__in=active) | ~Q(locations__in=inactive)
)
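Here, active() and inactive() are just custom manager methods. A sketch of what they might look like, guessing the status values from the generated SQL below:

from django.db import models
from django.db.models.query import QuerySet

class LocationQuerySet(QuerySet):
    # Status values guessed from the SQL below: 0 = active, 1 = inactive.
    def active(self):
        return self.filter(status=0)

    def inactive(self):
        return self.filter(status=1)

class LocationManager(models.Manager):
    def get_query_set(self):
        return LocationQuerySet(self.model, using=self._db)

    def active(self):
        return self.get_query_set().active()

    def inactive(self):
        return self.get_query_set().inactive()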

As long as the objects active and inactive are querysets, they will be lazily evaluated, and the SQL that is generated is relatively concise:

SELECT ... 
FROM "people" 
LEFT OUTER JOIN "people_locations" 
ON ("people"."id" = "people_locations"."person_id") 
WHERE (
  "people_locations"."location_id" IN (
    SELECT U0."id" FROM "location" U0 WHERE U0."status" = 0
  )
  OR NOT ((
    "people"."id" IN (
      SELECT U1."person_id" FROM "people_locations" U1 WHERE (
        U1."location_id" IN (
          SELECT U0."id" FROM "location" U0 WHERE U0."status" = 1
        )
        AND U1."person_id" IS NOT NULL
      )
    ) 
    AND "people"."id" IS NOT NULL)
  )
)
ORDER BY "..." ASC

This is much better than how I had previously done it, and has the bonus of being db-agnostic: whereas my previous solution used Postgres ARRAY types to aggregate the statuses of locations into a list.

The moral of the story: trust your high-level abstraction tools, and use them first. If you still have performance issues, then look at optimising.

Sorting dates in DataTables

If you have tabular data, then semantically, you’ll want to put it into an HTML table. It makes sense, and is certainly easier than trying to style nested divs to look like a table.

The other really nice thing is that it’s fairly easy to use DataTables to then make that table dynamic. Especially useful if your table is large: I use it on a report of all customers in my work, and have just started using it in some user-facing pages. In essence, it is as simple as doing:

$('#table-id').dataTable();

With this, you get sortable columns, pagination, and searching.

But sorting of dates sucks, unless they are in ISO8601 format. ISO8601 is fantastic, by the way. Not only do you get dates/datetimes that are inherently no longer ambiguous, but they sort alphabetically, as you would expect. Because every field is more significant than all of the fields following it, and all fields are zero-padded, every date or datetime will be correctly sorted.

However, the general public does not understand these two reasons for a ‘one true date format’, so we are generally forced to display it in a more readable format. Which doesn’t sort alphabetically.

There is a trick you can use to get correct sorting and nice dates using DataTables, though. For example, the following (rendered) html will sort correctly, both ascending and descending, while only displaying the nicely formatted date:

<td>
  <span style="display: none;">2012-06-07</span>
  Thursday, June 7th, 2012
</td>

In django, you can use the following snippet:

<td>
  <span style="display: none;">{{ value|date:"Y-m-d" }}</span>
  {{ value|date:"l, F jS, Y" }}
</td>

Recently, DataTables also had a blog post about how to use it with Twitter Bootstrap 2. I think it looks rather nice. And with this tip, it is so much more useful.

You can also use this way of thinking on other things that should be sorted differently to how they are printed.

Spurious CORS Errors from Sentry

I realised the other day that Sentry, the awesome system we have been using for a while to track our error logs from our Django project, can also be used to track exceptions from other systems. Like Javascript. In fact, there is a client available: raven.js.

So, we have a server set up for work, but I have a side-project I have been working on, Workout Builder. So, I thought I’d set up a server in Heroku to act as my sentry server. And I found a nice simple way to get up and running: Daniel Watkins has a nice post over at Odd_Blog, Deploying Sentry on Heroku.

It’s pretty straightforward: I got it up and running in no time, and then attempted to set up an email service. Rather than use my actual account for sending, I thought I’d set up a sending-only account at my domain, hosted as a Gmail Apps domain. So, I set it up, and set about testing.

All of a sudden, I was getting errors (which took 30 seconds to appear) saying that my test domain was not permitted to send a request due to CORS. But I had been sending requests successfully before that.

After lots of dicking around, I discovered it was because I did not have the gmail settings quite right. Instead of telling me what the problem was, something was masking the issue (that the server was timing out because the server/port combination was not correct), and jQuery thought it was a CORS issue.

So, fixing up the email sending settings, and it’s all gravy:

EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'

EMAIL_HOST = 'smtp.gmail.com'
EMAIL_HOST_PASSWORD = '<oh no you don\'t>'
EMAIL_HOST_USER = 'noreply@schinckel.net'
EMAIL_PORT = 587
EMAIL_USE_TLS = True

Initially, I had used mail.google.com, port 25, and EMAIL_USE_TLS = False. Eventually, I got it all right.

A new Garmin Communicator plugin

As part of my plan to create a workout editor, I had to look into the method of communicating between the Garmin plugin and the browser.

It feels like a Java application. It’s documented like it, too. But, it’s written in Prototype, and includes a whole stack of other tools, like XML handling, Ajax communication, and messaging. Things that should belong in separate parts, IMHO.

So, after a fair bit of plugging around, I was able to make enough sense of it to figure out exactly how it works:

  1. You unlock the plugin with a key-pair
  2. You get a list of devices
  3. If this is a send, then you set the value of a certain property.
  4. You start an Async communication.
  5. You poll the ‘finish’ version of that communication.
  6. When the communication is finished, if this was a receive, you load the data from a property.

I understand why they have made the plugin handle its communication in an async manner, but seriously: why not allow for a callback function when the communication is finished? To me, that feels like it would make so much more sense.

Anyway, my other main criticism is that it is inherently unsafe for multiple operations. Instead of returning the data (as would be possible with a callback that gets executed when the communication is finished), it puts it into a property within the plugin. That does mean that any bit of code can read it, but it also means it’s possible to accidentally overwrite it, as the same property is used for writes.

So, the API for replacing it looks more like:

var plugin = new Garmin.Communicator();
plugin.selectDevice(0);
plugin.readActivities(function(data) {
  // data contains the XML activity data.
});

It’s actually a little more complicated than this: we can pass in delegates, that will have callback methods called when certain events occur. These events are also pushed (using jQuery) onto the HTML element that is the plugin object. But, due to a jQuery bug, you need to listen further up the chain: so you can listen for these events on body.

This script will also add the plugin to the page if it cannot find it, and will run as a singleton: calling the constructor a second time will return the original object, but also add a new delegate to the list of delegates.

I’m tempted to remove the delegate handling, and simply have it as callback-based, but this is sort-of a transition from the way the Garmin team have done it. I’m concerned there may be issues with non-UI initiated read/write events (ie, those that happen on page load) ‘beating’ the plugin being ready, but that is a job for another day.

I’ve also written some Knockout bindings for this: but those are not quite ready for public consumption. I may actually write parsing code for the Training Center Database XML file, and the types it contains, and include that with this project. But, then I may be approaching the bloat seen in the actual Garmin plugin. At this stage, if you have a server that accepts TCX files, then this should be enough.

The project is on BitBucket, as usual: garmin-plugin.

TCX Files and Garmin Goals

I’m partway through writing a workout planning tool: it’s web-based, similar to Garmin Connect, but hopefully with a better interface. I want to be able to create workouts, but I’m really happy with Strava for my activity tracking.

Part of the appeal is being able to export the data to my Garmin Forerunner HRM: this really is one of those ‘scratch my own itch’ tools. So, I’ve had to learn a bit about the Garmin TCX format. There is documentation: it is just an XML file that matches the desired schema.

I’ve made a lot of progress with the workout creation, and even exporting this to TCX. Today, I decided to work on the Goal planning.

Some Garmin HRMs have a neat feature where you can set goals, which the watch will track as you work out. Thus, you could decide you want to run 50km in a given week, and it will show you how far along that goal you are, and how much time you have remaining. However, there is no way on the Forerunner 405cx to set goals on the device, nor with Garmin Training Center, and you have to use Garmin Connect.

The thing is, this part of the TCX file is undocumented. It is stored in the <Extensions> section, and here is my plan to document it a little better.

The basic structure of the file is:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<TrainingCenterDatabase 
  xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xsi:schemaLocation="http://www.garmin.com/xmlschemas/ActivityGoals/v1 
  http://www.garmin.com/xmlschemas/ActivityGoalExtensionv1.xsd 
  http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 
  http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">

  <Folders/>

  <Author xsi:type="Application_t">
    <!-- Application info goes here -->
  </Author>

  <Extensions>
    <ActivityGoals xmlns="http://www.garmin.com/xmlschemas/ActivityGoals/v1">
      <!-- List of goals goes here -->
    </ActivityGoals>
  </Extensions>

</TrainingCenterDatabase>

We are only interested in what happens in the list of goals.

Mostly, a goal is fairly simple:

<ActivityGoal Current="0.0000000" Measure="DistanceMeters" Sport="All" Target="1000.0000000">
  <Name>Run 1km</Name>
  <Period Recurrence="Once">
    <StartDateTime>2012-07-15T00:00:00Z</StartDateTime>
    <EndDateTime>2012-07-21T23:59:59Z</EndDateTime>
  </Period>
</ActivityGoal>

From this we can see the following fields:

  • Current The amount of Measure that has been completed.
  • Measure The type of goal. Allowable values are: DistanceMeters, TimeSeconds, Calories and NumberOfSessions.
  • Sport You may limit the goal to activities of a given sport. Allowable values are: All, Running, Biking and Other. Note that Garmin Connect will allow you to choose other sports, however, the value will effectively be cast to one of these. Note also that these are the exact same values that are valid for a Workout sport type (with the addition of All).
  • Target What the actual target is.
  • Name The name of this goal. This will not be displayed on a Forerunner 405cx: not sure about other devices.
  • Period Recurrence At this stage, I’m not sure what other values than Once are permitted, but I will be investigating this: this could turn out to be a really nice way to have a repeating weekly goal.
  • StartDateTime, EndDateTime happy to see these in ISO8601 format. Not surprised by that, as the Activity spec stuff (as well as Workout scheduling) is also all in ISO8601.
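As a quick sketch, generating one of these elements in Python might look like the following (purely illustrative; it only covers the fields listed above, and does no validation against the schema):

from xml.etree import ElementTree as ET

def activity_goal(name, measure, target, start, end, sport='All', current=0):
    """Build an <ActivityGoal> element matching the structure above."""
    goal = ET.Element('ActivityGoal', {
        'Current': '%.7f' % current,
        'Measure': measure,            # DistanceMeters, TimeSeconds, ...
        'Sport': sport,                # All, Running, Biking or Other
        'Target': '%.7f' % target,
    })
    ET.SubElement(goal, 'Name').text = name
    period = ET.SubElement(goal, 'Period', {'Recurrence': 'Once'})
    # ISO8601 timestamps, e.g. '2012-07-15T00:00:00Z'
    ET.SubElement(period, 'StartDateTime').text = start
    ET.SubElement(period, 'EndDateTime').text = end
    return goal

# ET.tostring(activity_goal('Run 1km', 'DistanceMeters', 1000,
#                           '2012-07-15T00:00:00Z', '2012-07-21T23:59:59Z'))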

I do have a couple of comments so far: the HRM watches are essentially timezone aware, and they pull their time from the GPS satellites. I wonder if goals will then respect this: I’m at +0930: if I set a goal to end at 2012-07-21T23:59:59Z, will it finish at that time (which is a UTC timestamp), or will it finish at midnight local time? Can you set goals that finish at other times than midnight?

Initial experiments appear to show no. Setting a time other than 23:59:59 means that the goal is not shown on the device. I don’t see this as a big disadvantage. Testing the timezone-ness of the period is harder: I need to wait until midnight to do so!

Secondly, what values are valid for the recurrence period? This requires some experimentation.

It appears to accept a value of Weekly, but as to whether this actually does anything, I’m yet to discover. Considering it has an explicit StartDateTime and EndDateTime, unless the watch extrapolates and updates it, I’m not expecting it to do anything. Certainly, setting an EndDateTime in the past, and choosing Weekly does not appear to have any effect. Again, I’m going to have to wait until midnight clicks over to test this properly. Hopefully, it will update the start and finish times, and reset the current amount.

Also of interest: Garmin Connect sends through a dummy goal for every goal Measure you do not provide a goal for. However, this is not necessary: removing all but the goals you want to use from the generated TCX file does not prevent sending it to the device, but having an invalid Author block does prevent it from sending.

The Forerunner 405cx will only display one goal of each type (Measure). I believe it shows only the one that is closest to expiring.

When the Garmin agent sends the data to the watch, it removes it from the filesystem. This prevents it being re-sent. When data is received from the watch, it appears to re-create the activity file from the current goals set up in Garmin Connect. This kind-of makes sense, but is annoying, as any goals that have been set up in Garmin Connect will override the goals created elsewhere.

In practice, it means that in order to send goal data to the device, you must first download the relevant activities, and calculate just how much of each goal has been completed. I was hoping to be able to avoid this: if the watch sent us the goal Current figure, then we could just load this, and apply any changes to targets, without affecting the current value. However, with my device, at least, ActivityGoals are InputToUnit only. At least, if you have no goals in Garmin Connect, it doesn’t send back bogus (dummy) goal data!