Postgres and Django

Frank Wiles gave a great talk, Secrets of PostgreSQL Performance.

Don’t do dumb things

  • Dedicate a single server to your database
  • Only fetch what you need

Do smart things

  • cache everything
  • limit number of queries

Tuning

  • shared_buffers : 25% of available RAM
  • effective_cache_size : OS disk cache size
  • work_mem : in-memory sort size

Less important

  • wal_buffers : set to 16MB
  • checkpoint_segments : at least 10
  • maintenance_work_mem : 50MB for every GB of RAM
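
As a rough worked example of the rules of thumb above, the following sketch turns a RAM figure into suggested values. The function name suggest_settings and the choice to express everything in MB are assumptions made for illustration; only the ratios come from the notes.

```python
def suggest_settings(ram_gb):
    """Apply the rules of thumb from the notes to a server with ram_gb GB of RAM.

    Hypothetical helper: the name and the output format are illustrative only.
    """
    ram_mb = ram_gb * 1024
    return {
        "shared_buffers": "%dMB" % (ram_mb // 4),        # 25% of RAM
        "wal_buffers": "16MB",                           # fixed value from the notes
        "maintenance_work_mem": "%dMB" % (50 * ram_gb),  # 50MB for every GB of RAM
    }
```

For a 16GB server this gives 4096MB of shared_buffers and 800MB of maintenance_work_mem. Treat the numbers as a starting point for measurement, not a final configuration.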

Can also transactionally turn on grouping of transactions.

Hardware

  • As much RAM as you can afford - fit whole db if you can.
  • Faster disks.
    • Disk speed is important
    • RAID5 is bad
    • RAID-1+0 is good
    • WAL on own disk → 4x write performance
  • CPU speed - unlikely to be the limiting factor.

Other

  • use PgBouncer to pool connections
  • use tablespaces to move tables/indexes onto other disks
    • i.e., indexes on the fastest disk
    • stuff that might run in background and hit only specific tables that are not used by other bits

django and jQuery templates

KnockoutJS is a great way to create relationships between data objects and interface elements. You can, for instance, bind a date value to an HTML input[type=date] element, and have it converted into a proper date object. You could then display data based on this, or do anything else you wanted.

KnockoutJS 1.2 (the current stable version) defaults to using jQuery templates (jQuery-tmpl), whose syntax happens to conflict with django templates.

For instance, if you were to have the following in your django template file:

{{if foo > bar }}
  <div>Stuff Here</div>
{{/if }}

Then django would attempt to process that, because the double braces look like the django template engine’s value placeholder.

One workaround is to wrap any jQuery templates in something that prevents django from interpreting them.

But I don’t like that solution. For starters, almost every text editor will try to syntax highlight data between <script> tags as javascript, even when explicitly marked as <script type="text/html"> or any other non-javascript mime type.

So, it would be nicest (and cleanest) to be able to have each jQuery template item in a separate file in my project.

Enter {% jquery_template %}. With a custom django templatetag, you can not only have it include the template in your django template, but it will automatically add the script tags, and even add an id.

For instance, you can do:

{% jquery_template 'path/to/template.html' 'templateName' %}

This will include the data from path/to/template.html, which it finds in any template location, but wrapped in <script type="text/html" id="templateName">.
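
The wrapping step on its own is small. The sketch below is not the code from jquery_template.py. It is a plain Python function, with the hypothetical name wrap_jquery_template, that shows only how the template source ends up inside the script tag. Finding the file in the template locations is left out.

```python
def wrap_jquery_template(source, template_id):
    """Wrap raw jQuery template source in a text/html script tag.

    Hypothetical helper: the real template tag also has to locate the file
    in the template directories, which is not shown here.
    """
    return '<script type="text/html" id="%s">\n%s\n</script>' % (template_id, source)
```

Because the source is inserted as-is, the jQuery template syntax inside it is never seen by the django template engine.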

I have a django app that contains this template tag, as well as some other useful stuff for jquery, and other javascript stuff (including knockoutjs). You can see this template tag at: jquery_template.py.

Hope it’s useful.

Why CustomUser subclasses are not such a good idea

Background

The system I work on has People who may or may not be Users, and very infrequently Users who may not be a Person. In fact, an extension to the system means there will be more of the latter: a User who needs to be able to generate reports, but who is never rostered on for shifts, which is what the Person class is all about. An example is a Franchisor who may only access aggregate data from franchises, which might belong to multiple companies.

Anyway, the long and the short of this was that I thought it might be a good idea to look at sub-classing User for ManagementUser.

I guess I should have listened to those smarter than me who shouted that sub-classing User is not cool. They never gave any concrete reasons, but now I have one.

You cannot easily convert a superclass object to a specialised sub-class. Once a user is a User, it’s hard to make them into a ManagementUser.

It can be done: the following code takes a User (or whatever) subclass, a User (or any parent class) object, and any other keyword arguments that should be passed into the constructor. It saves the newly upgraded object, and returns it.

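def create_subclass(SubClass, old_instance, **kwargs):
    # Extra keyword arguments go to the subclass constructor.
    new_instance = SubClass(**kwargs)
    # Copy every field defined on the parent model, including its primary key.
    for field in old_instance._meta.local_fields:
        setattr(new_instance, field.name, getattr(old_instance, field.name))
    new_instance.save()
    return new_instance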

However, it really should check that there isn’t an existing instance, and maybe some other checks.

What advantages does sub-classing have?

The biggest advantage, or so I thought, was that you can automatically downcast your models on user login, and then get access to the extended user details. For instance, if your authentication backend automatically converts User to Person, then you can get access to the Person’s attributes (like the company they work for, their shifts, etc) without an extra level of attribute access:

# request.user is always an auth.User instance:
request.user.person.company
# request.user might be a person, etc.
request.user.company

But it turns out that even this is bad. Now, in guard decorators on view functions, you cannot just test the value of an attribute, as not all users will have that attribute. Instead, you need to test to see if the attribute exists, and then test the attribute itself.
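
To make the two-step test concrete, here is a minimal sketch using plain Python classes as stand-ins for the downcast user objects. PlainUser, PersonUser and works_for are hypothetical names, and no django code is involved.

```python
class PlainUser(object):
    """Stand-in for a user that was not downcast: it has no company attribute."""


class PersonUser(object):
    """Stand-in for a user that was downcast to a Person-like subclass."""

    def __init__(self, company):
        self.company = company


def works_for(company):
    """Build a guard predicate that is safe for both kinds of user."""
    def test(user):
        # First check that the attribute exists, then test its value.
        return hasattr(user, "company") and user.company == company
    return test
```

Every guard has to repeat the hasattr check, which is exactly the extra noise that makes the subclass approach less attractive.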

So, what do you do instead?

The preferred method in django for extending User is to use a UserProfile class. This is just a model that has a OneToOneField linked back to User. I would look at doing a very small amount of duck-punching just to make getting hold of the profile object easier:

import logging
from django.contrib.auth.models import User
from django.db import models

class Person(models.Model):
    # on_delete is required from django 2.0; CASCADE was the implicit default before that.
    user = models.OneToOneField(User, related_name="_person", on_delete=models.CASCADE)
    date_of_birth = models.DateField(null=True, blank=True)

def get_person(user):
    # Accessing the reverse relation raises DoesNotExist when there is no Person.
    try:
        return user._person
    except Person.DoesNotExist:
        return None

def set_person(user, person):
    user._person = person

if hasattr(User, 'person'):
    logging.error('Model User already has an attribute "person".')
else:
    User.person = property(get_person, set_person)

By having the person’s related name attribute as _person, we can wrap read access to it in an exception handler, and then use a view decorator like:

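from django.contrib.auth.decorators import user_passes_test

# getattr() guards against AnonymousUser, which has no person property.
@user_passes_test(lambda u: getattr(u, 'person', None))
def person_only_view(request, **kwargs):
    pass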

We know this view will only be available to logged in users who have a related Person object.

I will point out that I am duck-punching/monkey-patching here. However, I feel that this particular method of doing it is relatively safe. I check before adding the property, and in reality I probably would raise an exception rather than just log an error.

BBEdit - Strip Outer HTML tags

So, I monitor the BBEdit Google group, now that I’m a paid-up BBEdit user. One question piqued my interest today, and here is my solution:

tell application "BBEdit"
	tell front window
		set cursorPos to characterOffset of selection
		balance tags
		set startPos to characterOffset of selection
		set endPos to startPos + (length of selection)
		select (characters (startPos - 6) thru (endPos + 6))
		set selectedText to selection as text
		if characters 1 thru 6 of selectedText as text is equal to "<span>" then
			set replaceText to characters startPos thru (endPos - 1) as text
			set selection to replaceText
			select insertion point before character (cursorPos - 6)
		else
			select insertion point before character (cursorPos)
		end if
	end tell
end tell

In summary, it uses the BBEdit builtin command to select the contents of the current tag, and then extends that selection to grab the span tags that surround it. If it was indeed a span block, then it removes those tags.

This is just a simple one-off, but it might be useful as a basis for generating a script that has more features: like arbitrary tag types (rather than just span), or some other thing I haven’t thought of.

Note that it will only strip the outer tags. BBEdit has a Remove Markup feature, but that does not seem to be accessible using AppleScript.

Dreamweaver Password Decoding

For future reference:

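def decode_dreamweaver_password(encoded):
    output = ""
    # Each character is stored as two hex digits.
    for i in range(0, len(encoded), 2):
        # Subtract the character's position (i // 2) from its stored value.
        val = int(encoded[i:i+2], 16) - i // 2
        output += chr(val)
    return output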
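
A quick way to sanity-check a decoder like this is to round-trip a known value. The encoder below is inferred by inverting the decoding rule (two hex digits per character, offset by the character's position). It is an assumption, not something taken from Dreamweaver documentation, and both function names are hypothetical.

```python
def decode_password(encoded):
    """Decode pairs of hex digits, subtracting each character's position."""
    chars = []
    for i in range(0, len(encoded), 2):
        chars.append(chr(int(encoded[i:i + 2], 16) - i // 2))
    return "".join(chars)


def encode_password(plain):
    """Inverse of decode_password (assumed, not documented).

    Only valid while ord(char) + position fits in two hex digits.
    """
    return "".join("%02X" % (ord(char) + position)
                   for position, char in enumerate(plain))
```

With these definitions, encode_password("abc") gives "616365", and decoding that string returns "abc".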

Knockout Collection

I am loving KnockoutJS. It makes it super easy to bind data values to UI elements in a declarative manner. You no longer have to worry about callbacks updating your data model and/or your view widgets.

The addition to KnockoutJS that I have been working on is a ‘collection’ that can be used to contain a set of objects fetched from a server, each of which has its own resource URI that is used to update or delete it.

For instance, we may have a collection URI:

GET "http://example.com/people/"

When we access this using a GET request, we might see something like:

[
  {
    "first_name": "Adam",
    "last_name": "Smith",
    "links": [
      {"rel":"self", "uri": "http://example.com/people/552/"}
    ]
  },
  {
    "first_name": "John",
    "last_name": "Citizen",
    "links": [
      {"rel":"self", "uri": "http://example.com/people/32/"}
    ]
  }
]

Each linked resource contains the full (or as much as the logged-in user is able to see) representation. Example:

GET "http://example.com/people/552/"
{
  "first_name": "Adam",
  "last_name": "Smith",
  "date_of_birth": "1910-02-11",
  "email": "adam.smith@example.com",
  "links": [
    {"rel":"self", "uri": "http://example.com/people/552/"}
  ]
}

Now, this is just the beginning. Obviously, we want to turn all of these fields into observables. I also wanted to know when any data had changed (so the “Save” button can be disabled when the object is not dirty). Clearly, I also needed to be able to write the data back to the server, as well as create new objects, and delete them. Further, I needed to be able to do conditional reads and writes (only allow the object to be saved if no-one else has touched it since we last fetched it).
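
Conditional reads and writes come down to which HTTP header carries the ETag we last saw. The sketch below shows that choice in isolation. It is written in Python rather than JavaScript, and conditional_headers is a hypothetical helper that is not part of the collection code.

```python
def conditional_headers(method, etag=None):
    """Return the precondition headers for a request, given the last known ETag.

    Hypothetical helper for illustration only.
    """
    headers = {}
    if etag is None:
        # Nothing fetched yet, so the request is unconditional.
        return headers
    if method.upper() in ("GET", "HEAD"):
        # Conditional read: the server answers 304 Not Modified if unchanged.
        headers["If-None-Match"] = etag
    else:
        # Conditional write: the server answers 412 Precondition Failed
        # if someone else changed the resource since we fetched it.
        headers["If-Match"] = etag
    return headers
```

A 412 response is the signal that the local copy is stale and that conflicts may need to be resolved before saving again.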

The place where the ko.mapping plugin broke down for me was that updating the resource from the full representation didn’t add the new fields that came back from the server. It may be that indeed this is possible (I think it is), but at the time, I could not see how to do this. It may be that I will rewrite this to use the ko.mapping stuff, but I’m not so sure right now.

Anyway, after a couple of revisions, I have a working framework.

To use it, you can just do:

// Add a dependentObservable called 'name'.
var processPerson = function(item) {
  item.name = ko.dependentObservable(function(){
    return item.first_name() + ' ' + item.last_name();
  });
};

var people = ko.collection({
  url: "http://example.com/people/",
  processItem: processPerson
});

There is one main caveat at this stage:

  • It is expected that each object will have a ‘name’ property. If your server does not return one, you’ll need to set up a dependentObservable as shown in processPerson above.

First, the ko.collection object:

  1 ko.collection = function(options) {
  2   // Let jQuery know we always want JSON
  3   $.ajaxSetup({
  4     contentType: 'application/json',
  5     dataType: 'json',
  6     cache: false // This is browser cache! Needs to be set for Firefox.
  7   });
  8   
  9   options = options || {};
 10   var url = options.url;                  // Allow passing in a url.
 11   var processItem = options.processItem;  // Allow passing in a function to process each item after it is fetched.
 12   var etag;
 13   
 14   
 15   // Initial setup. We need to set these early so we can access them, even
 16   // if we have no data for them.
 17   var self = {
 18     items: ko.observableArray([]),
 19     selectedItem: ko.observable(null),
 20     selectedIndexes: ko.observableArray([]),
 21     filters: ko.observable({})
 22   };
 23   
 24   /*
 25   Message handling.
 26   
 27   We have a messages observableArray, but we use this dependent observable
 28   to access it. This allows us to have messages that expire.
 29   
 30   self.messages() => provide access to the array of messages.
 31   self.messages({
 32     type: "error|notice|warning|whatever",    => This will usually be used to apply a class
 33     message: "Text of message",               => This text will be displayed
 34     timeout: 1500                             => If this is non-zero, message expires (and 
 35                                                  is automatically removed after this many 
 36                                                  milliseconds)
 37   });
 38   
 39   Every message object gets given a callback function (.remove()), that,
 40   when executed, will immediately remove that message, and get rid of the
 41   timer that normally removes that message after timeout.
 42   
 43   The messages object is also given a flush() function, that will remove
 44   all of the messages within it.
 45   
 46   Not sure if I should move this to a separate plugin?
 47   */
 48   var messages = ko.observableArray([]);
 49   self.messages = ko.dependentObservable({
 50     read: function() {
 51       return messages();
 52     },
 53     write: function(message) {
 54       var timeout;
 55       message.remove = function() {
 56         messages.remove(message);
 57         clearTimeout(timeout);
 58       };
 59       messages.remove(function(item) {
 60         return item.message === message.message;
 61       });
 62       messages.push(message);
 63       if (message.timeout) {
 64         timeout = setTimeout(function(){
 65           messages.remove(message);
 66         }, message.timeout);
 67       }
 68     }
 69   });
 70   self.messages.flush = function() {
 71     $.each(messages().slice(), function(i, message){ // Copy first: remove() mutates the array.
 72       message.remove();
 73     });
 74   };
 75     
 76   /*
 77   filteredItems : a subset of self.items() that has been passed through
 78                   all of the self.filters(), and selects only those that
 79                   match. A filter must be an object of the form:
 80                   {
 81                     value: ko.observable(""),
 82                     attr: "name",
 83                     test: function(test_value, obj_value) {}
 84                   }
 85                   
 86                   The filtering code handles getting the correct values to
 87                   pass to the test function, the attr is the name of the 
 88                   attribute on each member of self.items() that will be
 89                   tested.
 90                   Having 'value' passed in means we can have a default
 91                   value when app starts.
 92   */
 93   self.filteredItems = ko.dependentObservable(function() {
 94     var filteredItems = self.items();
 95     $.each(self.filters(), function(name, filt){
 96       filteredItems = ko.utils.arrayFilter(filteredItems, function(item){
 97         if (!filt.attr || !item[filt.attr]) {
 98           return true;
 99         }
100         return filt.test(filt.value(), item[filt.attr]());
101       });
102     });
103     return filteredItems;
104   });
105   
106   /*
107     This is really only used by a select[multiple] object, and is used in
108     conjunction with selectedIndexes.
109     
110     TODO: make this a writeable dependentObservable.
111   */
112   self.selectedItems = ko.dependentObservable(function() {
113     return self.items().filter(function(el){
114       return $.inArray(self.items().indexOf(el), self.selectedIndexes()) >= 0;
115     });
116   });
117   
118   /*
119     Filter self.items() finding only those that have at least one attribute
120     that is marked as dirty.
121   */
122   self.dirtyItems = ko.dependentObservable(function() {
123     return self.items().filter(function(el){
124       return el.isDirty();
125     });
126   });
127   
128   /*
129     Filter self.items(), finding only those that have at least one attribute
130     marked as conflicted.
131   */
132   self.conflictedItems = ko.dependentObservable(function() {
133     return self.items().filter(function(el){
134       return el.hasConflicts();
135     });
136   });
137   
138   self.setSource = function(newUrl) {
139     url = newUrl;
140   };
141   
142   /*
143     Fetch all items from the url we have for the index.
144     
145     It is allowable that the index does not return the full body of each
146     item, but instead only contains perhaps a name, and links for that
147     item. Then, we can use self.selectedItem().fetch() to get the full
148     data for the item.
149   */
150   self.fetchItems = function() {
151     if (!url) {
152       return;
153     }
154     var headers = {};
155     if (etag) {
156       headers['If-None-Match'] = etag;
157     }
158     $.ajax({
159       url: url,
160       type: "get",
161       headers: headers,
162       statusCode: {
163         200: function(data, textStatus, jqXHR) {
164           // Successful. If we already had objects, then
165           // we need to update that list.
166           $.each(self.items().slice(), function(i, item){
167             // Is there an item in the new data items list that matches
168             // the item we are now looking at?
169             var matchingItem = data.filter(function(el){
170               var links = el.links.filter(function(link){
171                 return link.rel==="self";
172               });
173               return links[0] && links[0].uri === item._url();
174             })[0];
175             if (matchingItem) {
176               // Update the item that matched.
177               item.updateData(matchingItem);
178               if (processItem) {
179                 processItem(item, matchingItem);
180               }
181               // Remove from data.
182               data.splice(data.indexOf(matchingItem), 1);
183               // Not sure if this should be here.
184               // item.isDirty(false);
185             } else {
186               // Not found in incoming data: remove from our local store.
187               // Safe: $.each is iterating over a copy of self.items().
188               self.items.remove(item);
189             }
190           });
191           
192           // Any items that we have left in data (which will be all if we
193           // haven't loaded this up before) now need to be added to items().
194           // On a clean fetch, this will be the first code that is run.
195           $.each(data, function(i, el){
196             var item = ko.collectionItem(el, self);
197             if (processItem) {
198               processItem(item, el);
199             }
200             self.items.push(item);
201           });
202           
203           // Finally, update the etag.
204           etag = jqXHR.getResponseHeader('Etag');
205         }
206       }
207     });
208   };
209   
210   /*
211     A shortcut method that allows us to bind an action to fetch the
212     data from the server for the currently selected item.
213   */
214   self.fetchSelectedItemDetail = function(evt) {
215     if (self.selectedItem && self.selectedItem()) {
216       self.selectedItem().fetch();
217     }
218   };
219   
220   /*
221     Create an item. I haven't implemented this yet, because I haven't 
222     figured out a way to see what fields are needed to be created when
223     there are no currently loaded items. I'm thinking about using a
224     Wizard in my application, so this might be overridden by the app.
225   */
226   self.createItem = function(evt) {
227     console.log("ADDING ITEM (NOT FINISHED YET)");
228     // The trick here is knowing what fields need to be created.
229     // self.items.push(ko.collectionItem({}));
230   };
231   
232   /*
233     Permanently remove the selectedItem, and delete it on the server.
234   */
235   self.removeSelectedItem = function(evt) {
236     if (self.selectedItem && self.selectedItem()) {
237       var sure = confirm("This will permanently remove " + self.selectedItem().name());
238       if (sure){
239         self.selectedItem().destroy();        
240       }
241     }
242   };
243   
244   /*
245     Iterate through self.items(), finding those that match all of the data
246     we pass in.
247     
248     For instance, you can do things like: 
249     
250       viewModel.findMatchingItems({date_of_birth: "1995-01-01"})
251     
252     This is used internally to find matches for objects when updating. Not
253     sure why it is exposed as a public member function though.
254   */
255   self.findMatchingItems = function(options) {
256     return self.items().filter(function(el){
257       var match = true;
258       $.each(options, function(opt, val) {
259         if (el[opt]() !== val) {
260           // Returning false causes $.each to stop, too.
261           return match = false;
262         }
263       });
264       return match;
265     });
266   };
267     
268   if (url) {
269     self.fetchItems();
270   }
271   
272   return ko.observable(self);
273 };
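
The filter chain in filteredItems is the part that is easiest to get wrong, so here is the same logic reduced to plain data. This is a Python sketch, not part of the plugin: apply_filters is a hypothetical name, and plain dicts and values stand in for the observables.

```python
def apply_filters(items, filters):
    """Return the items that pass every filter.

    items   -- list of dicts (stand-ins for collection items)
    filters -- dict of name -> {"value": ..., "attr": ..., "test": callable}
    """
    result = list(items)
    for filt in filters.values():
        attr = filt.get("attr")
        result = [
            item for item in result
            # Items that lack the attribute pass through untouched,
            # mirroring the early "return true" in the JavaScript.
            if not attr or attr not in item
            or filt["test"](filt["value"], item[attr])
        ]
    return result


people = [{"name": "Adam Smith"}, {"name": "John Citizen"}, {"id": 3}]
name_filter = {
    "name": {
        "value": "adam",
        "attr": "name",
        "test": lambda needle, haystack: needle in haystack.lower(),
    }
}
matches = apply_filters(people, name_filter)
```

Here matches contains the Adam Smith entry and the entry without a name. Each filter narrows the result of the previous one, so the order of the filters does not change the final set.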

Second, the ko.collectionItem object. This may eventually be hidden in the collection object, as it isn’t really intended to be used separately.

  1 ko.collectionItem = function(initialData, parentCollection) {
  2   var self = {
  3     isFetched: ko.observable(false)
  4   };
  5   var links = [];
  6   var etag = null;
  7   var url = null;
  8   var attributes = ko.observableArray([]);
  9   var collection = parentCollection;
 10   var dirtyFlag = ko.observable(false);
 11   
 12   /* Private methods */
 13   
 14   /*
 15     Given the incoming 'data' for this object, look through the fields for
 16     things that differ between the server representation and the client
 17     representation. Store both values for any differences in an attribute
 18     of the observable called conflicts().
 19     
 20     For each conflict, create a member function on the observable that
 21     allows you to resolve the conflict. When the last conflict is resolved,
 22     our etag is updated to the value the server gave us.
 23     
 24     This method returns true if all conflicts could be resolved (ie, the
 25     data in all fields was the same, just the etag had changed).
 26   */
 27   var parseConflicts = function(data, newEtag) {
 28     $.each(data, function(attr, value){
 29       if (attr !== "links") {
 30         if ($.compare(value, self[attr]() === undefined ? "" : self[attr]())) {
 31           // Server and client values match.
 32           // We need to do some funky stuff with undefined values, and treat
 33           // them as "". I don't really like this, but it works for now.
 34           self[attr].conflicts([]);
 35           self[attr].resolveConflict = function(){};
 36         } else {
 37           self[attr].conflicts([value, self[attr]() === undefined ? "" : self[attr]()]);
 38           self[attr].resolveConflict = function(chosenValue) {
 39             // Mark the entire object as dirty, so we can allow it to be
 40             // saved, even if we set it to the original value we had (which
 41             // differed from the server's value).
 42             self.isDirty(true);
 43             self[attr](chosenValue);
 44             self[attr].conflicts([]);
 45             if (!self.hasConflicts()) {
 46               // If this was the last conflict, we can use the new etag from
 47               // the server.
 48               etag = newEtag;
 49             }
 50           };
 51         }        
 52       }
 53     });
 54     var conflicts = self.hasConflicts();
 55     if (!conflicts) {
 56       etag = newEtag;
 57     }
 58     return !conflicts;
 59   };
 60   
 61   /*
 62   Given an object containing errors, we want to apply each of these
 63   errors onto the relevant field. We want to remove any errors that are
 64   already on any field.
 65   
 66   If we have any errors leftover, we need to notify globally, using the
 67   parentCollection's messages object.
 68   */
 69   var markErrors = function(errors) {
 70     $.each(attributes(), function(i,attr){
 71       if (!self[attr].errors) {
 72         self[attr].errors = ko.observableArray([]);
 73       }
 74       if (errors[attr]) {
 75         self[attr].errors(errors[attr]);
 76         delete errors[attr];
 77       } else {
 78         self[attr].errors([]);
 79       }
 80     });
 81     
 82     $.each(errors, function(field){
 83       parentCollection.messages({type:"error", message: field + ": " + errors[field].join("<br>"), timeout: 3000});
 84     });
 85   };
 86   
 87   /*
 88     Get the attributes ready for sending to the server.
 89     
 90     We can't just iterate through properties, as some will not apply. We
 91     use the convention that we will only send back properties that the
 92     server sent to us.
 93   */
 94   var prepareAttributes = function() {
 95     var data = {};
 96     $.each(attributes(), function(i,attr){
 97       data[attr] = self[attr]();
 98     });
 99     return data;
100   };
101   /* Public methods */
102   
103   /*
104     Update the data fields associated with this object from the provided
105     data.
106     
107     This may create new attributes, which need to be noted so we can send
108     those values back to the server.
109     
110     We can mark all updated attributes as not dirty, not conflicted, and
111     not having errors.
112   */
113   self.updateData = function(data) {
114     if (data.links) {
115       // We want to store the links, but not attach them to the object.
116       links = data.links;
117       delete data.links;
118       $.each(links, function(i, obj){
119         if (obj.rel === "self") {
120           url = obj.uri;
121         }
122       });
123     }
124     
125     $.each(data, function(attr, value){
126       if (attributes().indexOf(attr) < 0) {
127         self[attr] = ko.observable(value);
128         self[attr].errors = ko.observableArray([]);
129         self[attr].conflicts = ko.observableArray([]);
130         ko.dirtyFlag(self[attr], false);
131         // Need to add this last to cause the dirtyFields dependentObservable
132         // to work correctly when editing the last field.
133         attributes.push(attr);
134       } else {
135         self[attr](value);
136         self[attr].errors([]);
137         self[attr].conflicts([]);
138         self[attr].isDirty.reset();
139       }
140     });
141     // Put the links back in case a post-processor needs them.
142     data.links = links;
143   };
144   
145   self.serialize = function(evt) {
146     return JSON.stringify(prepareAttributes());
147   };
148   
149   /*
150     Discard any local changes, and pull the data from the server.
151   */
152   self.revert = function(evt) {
153     etag = null;
154     self.fetch();
155     parentCollection.messages({type:'warning', message:'The object "' + self.name() + '" was reverted to the version stored on the server.', timeout: 5000});
156   };
157   
158   /*
159     Attempt to save the data to the server.
160     
161     Only permitted to do this if we have successfully fetched the data
162     at some point.
163     
164     Notes: We use POST instead of PUT, in case we do not have access to
165            all of the fields of the object. PUT implies the complete resource
166            is being updated.
167            Errors may come back in {'field-errors': []}, or {'detail':[]}.
168            Currently, this makes assumptions about server type, which are
169            bad. I need to refactor the error handling code. (400,409)
170            Precondition Failed (412) needs to be handled differently, as
171            we need to fetch the data from the server if none was provided
172            as to the current state of the resource.
173            
174   */
175   self.save = function(evt) {
176     if (self.isFetched()) {
177       $.ajax({
178         url: url,
179         type: 'post', // We can't PUT in case we don't know about all fields.
180         headers: {'If-Match': etag},
181         data: self.serialize(),
182         statusCode: {
183           200: function(data, textStatus, jqXHR) {
184             // Object saved.
185             // In case some fields were reformatted by the server, redo our data.
186             self.updateData(data);
187             etag = jqXHR.getResponseHeader('Etag');
188             parentCollection.messages({type:'success', message:'The object "' + self.name() + '" was saved.', timeout: 2500});
189             self.isDirty(false);
190           },
191           201: function(data, textStatus, jqXHR) {
192             // Object saved for the first time (created)
193             // In case some fields were reformatted by the server, redo our data.
194             self.updateData(data);
195             etag = jqXHR.getResponseHeader('Etag');
196             url = jqXHR.getResponseHeader('Location');
197             parentCollection.messages({type:'success', message:'The object "' + self.name() + '" was created.', timeout: 2500});
198             self.isDirty(false);
199           },
200           400: function(jqXHR, textStatus, errorThrown) {
201             var data = JSON.parse(jqXHR.responseText);
202             if (data['field-errors']) {
203               markErrors(data['field-errors']);
204             }
205             parentCollection.messages({type:'error', message:'The object "' + self.name() + '" could not be saved. Please check the highlighted field(s).', timeout: 10000});
206           },
207           409: function(jqXHR, textStatus, errorThrown) {
208             // Errors saving the data. Likely to be validation errors.
209             // We should have a detail object with info to display.
210             var data = JSON.parse(jqXHR.responseText);
211             if (data.detail) {
212               markErrors(data.detail);
213             }
214             parentCollection.messages({type:'error', message:'The object "' + self.name() + '" could not be saved. Please check the highlighted field(s).', timeout: 10000});
215           },
216           412: function(jqXHR, textStatus, errorThrown) {
217             // Data was changed on server since we last fetched it.
218             // There may be conflicts to deal with.
219             // See if the server gave us a current version back...
220             var data, serverEtag;
221             if (jqXHR.responseText) {
222               data = JSON.parse(jqXHR.responseText);
223             } else {
224               $.ajax({
225                 url: url,
226                 async: false,
227                 success: function(newData, textStatus, jqXHR) {
228                   data = newData;
229                   serverEtag = jqXHR.getResponseHeader('Etag');
230                 }
231               });
232             }
233             if (parseConflicts(data, serverEtag)) {
234               // We were able to resolve all of the conflicts, now we can
235               // try to re-save; but only if it was the first time we saved,
236               // to prevent infinite recursion.
237               if (evt) {
238                 self.save();
239               }
240             } else {
241               parentCollection.messages({type:'error', message:'The object "' + self.name() + '" has been modified on the server. Please check the changed field(s) and select the appropriate value(s).', timeout: 10000});
242             }
243           }
244         }
245       });
246     }
247   };
248   
249   /*
250     Permanently delete the object from the server.
251   */
252   self.destroy = function(evt) {
253     if (self.isFetched() && etag) {
254       console.log("DELETING ITEM");
255       $.ajax({
256         url: url,
257         type: 'delete',
258         headers: {'If-Match': etag},
259         success: function(data, textStatus, jqXHR) {
260           if (collection) {
261             collection.items.remove(self);
262           }
263           parentCollection.messages({type:'success', message:'The object "' + self.name() + '" was deleted.', timeout: 2500});
264         },
265         error: function(jqXHR, textStatus, errorThrown) {
266           // Display error message about not being able to delete?
267           parentCollection.messages({type:'error', message:'The object "' + self.name() + '" could not be deleted.', timeout: 10000});
268         }
269       });
270     }
271   };
272   
273   /*
274     (Re)Fetch the resource from the server.
275     
276     Handle conflicts if the arise (when the object has already been fetchd)
277   */
278   self.fetch = function(evt) {
279     var headers = {};
280     if (etag) {
281       headers['If-None-Match'] = etag;
282     }
283     $.ajax({
284       type: 'get',
285       url: url,
286       headers: headers,
287       statusCode: {
288         200: function(data, textStatus, jqXHR) {
289           // If we have an etag already, this means the object has been
290           // updated on the server, and we need to look for conflicts.
291           if (etag) {
292             var serverEtag = jqXHR.getResponseHeader('Etag');
293             // If we were unable to handle all conflicts, we need to exit.
294             if (!parseConflicts(data, serverEtag)) {
295               parentCollection.messages({type:'error', message:'The object "' + self.name() + '" has been modified on the server. Please check the changed field(s) and select the appropriate value(s).', timeout: 10000});
296               return;
297             };
298           }
299           
300           // Otherwise, we can update the data and the etag.
301           self.updateData(data);
302           etag = jqXHR.getResponseHeader('Etag');
303           self.isFetched(true);
304         },
305         304: function() {
306         }
307       },
308       error: function(jqXHR, textStatus, errorThrown) {
309         parentCollection.messages({type:"error", message:"There was an error fetching the data from the server"});
310       }
311     });
312   };
313   
314   
315   
316   /* Dependent Observables */
317   self.dirtyFields = ko.dependentObservable(function(){
318     return ko.utils.arrayFilter(attributes(), function(attr){
319       return self[attr] && self[attr].isDirty && self[attr].isDirty();
320     });
321   });
322   
323   self.conflictedFields = ko.dependentObservable(function() {
324     return ko.utils.arrayFilter(attributes(), function(attr){
325       return self[attr] && self[attr].conflicts && self[attr].conflicts().length > 0;
326     });
327   });
328   
329   var filterAttributes = function(property) {
330     return function() {
331       return ko.utils.arrayFilter(attributes(), function(attr){
332         return self[attr] && self[attr][property] && self[attr][property]().length > 0;
333       }).length > 0;
334     };      
335   };
336   
337   self.hasErrors = ko.dependentObservable(filterAttributes('errors'));
338   self.hasConflicts = ko.dependentObservable(filterAttributes('conflicts'));
339   
340   /*
341     An object is dirty when:
342       - any of its fields/attributes are dirty. (we aks them), OR
343       - we have explicitly marked it as dirty.
344       
345     We need to do the latter for when we have merged a conflict, by choosing
346     our value, which differed from the server. The local model would
347     normally think it wasn't dirty, but it differs from the server, and
348     does need to be saved.
349   */
350   self.isDirty = ko.dependentObservable({
351     read: function() {
352       return self.dirtyFields().length > 0 || dirtyFlag();
353     },
354     write: function(value) {
355       dirtyFlag(value);
356       if (!value) {
357         $.each(attributes(), function(attr){
358           if (self[attr] && self[attr].isDirty) {
359             console.log(attr);
360             self[attr].isDirty.reset();          
361           }
362         });
363       }
364     }
365   });
366   
367   /*
368     Can this object be saved back to the server?
369     Only when it is dirty, and has been fetched.
370   */
371   self.canSave = ko.dependentObservable(function() {
372     return self.isDirty() && self.isFetched();
373   });
374   
375   self._etag = function(){return etag;};
376   self._attributes = function(){ return attributes();};
377   self._url = function() {return url;};
378   
379   if (initialData) {
380     self.updateData(initialData);
381   }
382   
383   return self;
384 };
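
The client code above leans on HTTP conditional requests: it sends If-Match when deleting (and, going by the 412 handler, when saving), sends If-None-Match when re-fetching, and treats 412 and 304 as meaningful answers. As a rough sketch of the server's half of that contract, the check could look something like the following. The function name precondition_status and the plain-dict headers are assumptions made for illustration, not part of the original project, and the comparison ignores weak validators and comma-separated ETag lists.

```python
def precondition_status(method, headers, current_etag):
    """Return the status code to short-circuit with, or None to carry on.

    Hypothetical helper: `headers` is a plain dict of request headers and
    `current_etag` is the ETag of the resource as stored on the server.
    """
    if_match = headers.get('If-Match')
    if if_match is not None and if_match != current_etag:
        # The client edited a stale copy: refuse the write.
        return 412

    if_none_match = headers.get('If-None-Match')
    if method == 'GET' and if_none_match is not None and if_none_match == current_etag:
        # The client's copy is still current: no body needed.
        return 304

    return None


stale_delete = precondition_status('DELETE', {'If-Match': 'v1'}, 'v2')
fresh_fetch = precondition_status('GET', {'If-None-Match': 'v2'}, 'v2')
valid_save = precondition_status('PUT', {'If-Match': 'v2'}, 'v2')
```

Returning None for "carry on" keeps the helper independent of any particular web framework; the view decides what a successful response looks like.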

Filtering querysets in django.contrib.admin forms

I make extensive use of the django admin interface. It is the primary tool for our support team to look at user data for our product, and I have stretched it in many ways to suit my needs.

One problem I often come back to is a need to filter querysets in forms and formsets. Specifically, the objects that should be presented to the admin user in a relationship to the currently viewed object should be filtered. In most cases, this is something as simple as making sure the Person and the Units they work at are within the same company.

There is a simple bit of boilerplate that can do this. You need to create a custom form, and attach this to the ModelAdmin for the parent object:

from django.contrib import admin
from django import forms
from .models import Person

class PersonAdminForm(forms.ModelForm):
    class Meta:
        model = Person

    def __init__(self, *args, **kwargs):
        super(PersonAdminForm, self).__init__(*args, **kwargs)
        # This is the bit that matters: only offer the units that
        # belong to this person's company.
        self.fields['units'].queryset = self.instance.company.units.all()

class PersonAdmin(admin.ModelAdmin):
    form = PersonAdminForm

In actuality, it is a little more complicated than this: you need to test if the selected object has a company, and really, if the user has changed the company (or selected it on a new person), you should use that instead. So the code looks a bit more like:

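# Inside PersonAdminForm.__init__, after the call to the parent __init__.
company = None
if self.data.get('company', None):
    # The user has picked (or changed) the company in the submitted form.
    try:
        company = Company.objects.get(pk=self.data['company'])
    except Company.DoesNotExist:
        pass
else:
    # Fall back to the company already stored on the instance.
    try:
        company = self.instance.company
    except Company.DoesNotExist:
        pass
if company:
    self.fields['units'].queryset = company.units.all()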

Now, having to write all of that every time you have to filter the choices available wears rather thin. And wait until you need to do it to a formset instead: you also need to apply the same filtering to the empty_form, so that when you dynamically add an inline form, it has the correct choices.

Enter FilteringForm, and her niece FilteringFormSet:

from django import forms
from django.core.exceptions import ObjectDoesNotExist

class FilterMixin(object):
    # Maps a field name to a Q object applied to that field's queryset.
    filters = {}
    # Maps a field name to a dotted attribute path on the instance,
    # such as 'company.units'.
    instance_filters = {}

    def apply_filters(self, forms=None):
        # If we didn't get a forms argument, we apply to ourself.
        if forms is None:
            forms = [self]
        # We need to apply instance filters first, as they allow us to
        # select an attribute on our instance to be the queryset, and
        # then apply a filter onto that with filters.
        for field, attr in self.instance_filters.items():
            # It may be using a related attribute. person.company.units
            tokens = attr.split('.')

            source = None
            # See if there is any incoming data first.
            if self.data.get(tokens[0], ''):
                try:
                    # Newer Django versions spell this lookup
                    # _meta.get_field(name).remote_field.model
                    related_model = self.instance._meta.get_field_by_name(tokens[0])[0].rel.to
                    source = related_model.objects.get(pk=self.data[tokens[0]])
                except ObjectDoesNotExist:
                    pass
            # Else, look for a match on the object we already have stored
            if not source:
                try:
                    source = getattr(self.instance, tokens[0])
                except ObjectDoesNotExist:
                    pass

            # Now, look for child attributes.
            if source:
                for segment in tokens[1:]:
                    source = getattr(source, segment)
                if forms:
                    for form in forms:
                        form.fields[field].queryset = source

        # We can now apply any simple filters to the queryset.
        for field, q_filter in self.filters.items():
            for form in forms:
                form.fields[field].queryset = form.fields[field].queryset.filter(q_filter)


class FilteringForm(forms.ModelForm, FilterMixin):
    def __init__(self, *args, **kwargs):
        super(FilteringForm, self).__init__(*args, **kwargs)
        self.apply_filters()

class FilteringFormSet(forms.models.BaseInlineFormSet, FilterMixin):
    filters = {}
    instance_filters = {}

    def __init__(self, *args, **kwargs):
        super(FilteringFormSet, self).__init__(*args, **kwargs)
        self.apply_filters(self.forms)

    # The empty form is what gets cloned when an inline is added
    # dynamically, so it needs the same filtering.
    def _get_empty_form(self, **kwargs):
        form = super(FilteringFormSet, self)._get_empty_form(**kwargs)
        self.apply_filters([form])
        return form
    empty_form = property(_get_empty_form)

Now, to use all of this, you still need to subclass, but you can declare the filters:

class PersonAdminForm(FilteringForm):
    class Meta:
        model = Person

    instance_filters = {
        'units': 'company.units'
    }
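
The 'company.units' string is just a dotted attribute path that gets walked one segment at a time, starting from the instance. As a minimal, Django-free sketch of that walk (resolve_path and the stand-in Company and Person classes are hypothetical names used only for illustration):

```python
from functools import reduce


def resolve_path(obj, dotted_path):
    """Follow a dotted attribute path such as 'company.units' from obj."""
    return reduce(getattr, dotted_path.split('.'), obj)


class Company(object):
    """Stand-in for a model; units is a plain list here, not a manager."""
    def __init__(self, units):
        self.units = units


class Person(object):
    """Stand-in for a model with a company relation."""
    def __init__(self, company):
        self.company = company


person = Person(Company(units=['Ward 1', 'Ward 2']))
units = resolve_path(person, 'company.units')
```

The mixin does the same thing with an explicit loop, because it treats the first segment specially: that one may come from the submitted form data rather than from the instance.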

You can also have non-instance filters, and they will be applied after the instance_filters:

from django.db import models

class PersonAdminForm(FilteringForm):
    class Meta:
        model = Person

    instance_filters = {
        'units': 'company.units'
    }
    filters = {
        'units': models.Q(is_active=True)
    }

I think it might be nice to be able to add an extra set of filters for the empty form in a formset, so that, for instance, only the choices that haven’t already been selected are available. But that isn’t an issue for me right now.

Displaying only objects without subclasses

Sometimes, the django.contrib.auth User model just doesn’t cut it.

I have bounced around between ways of handling this sorry fact: from the nasty system of Person-User relationships that my production system uses (where, due to old legacy code, I need to keep the primary keys in sync), to monkey-patching User, to using UserProfiles, to subclassing User.

First, a little on the nasty hack I have in place (and how that will affect my choices later on).

My project at work is a rostering system, where not everyone who is a Person in the system needs to be a User. For instance, most managers (who are Users) do not need their staff to be able to log in. However, they themselves must be a Person as well as a User, if they are to be able to log in, but also be rostered on.

Thus, there are many people in the system who are not Users. They don’t have a username, and may not even have an email address. Not that having an email address is that useful in the django User model, as there is no unique constraint upon that.

So, I am currently kind-of using Person as a UserProfile object, but there are Person instances that do not have an associated User, and some of these are required to have an email address, and have first and last names. So, there is lots of duplication across these two tables. Which need to be kept in sync.

The solution I am looking at now moves in the other direction.

A Person is a subclass of User. It has the extra data that we need for our business logic (mobile phone number, company they work for), but I have also monkey-patched User to not require a username. We are moving towards using email addresses for login names anyway, so that isn’t a problem. That has its own concerns (not everyone has a unique email address, but there are workarounds for that).

But not every User will have a Person attached. The admin team’s logins will not (and this will be used to allow them to masquerade as another user for testing and bug-hunting purposes). So, we can’t just ignore the User class altogether and do everything with the Person class.

This is all well and good. I have an authentication backend that will return a Person object instead of a User object (if one matches the credentials). Things are looking good.

Except then I look in the admin interface. And there we have all of the Person objects’ related User objects, in the User table. It would be nice if we only had the ‘pure’ Users in here, and all Person objects were just in their own category.

So, I needed a way to filter this list.

Luckily, django’s admin has this capability. In my person/admin.py file, I had the following code:

from django.contrib import admin
from django.contrib.auth.admin import UserAdmin as AuthUserAdmin
from django.contrib.auth.models import User

class UserAdmin(AuthUserAdmin):
    # On Django 1.6 and later this hook is named get_queryset().
    def queryset(self, request):
        # Hide any User that has a Person row attached.
        return super(UserAdmin, self).queryset(request).filter(person=None)

admin.site.unregister(User)
admin.site.register(User, UserAdmin)

And, indeed, this works.

But then I found another User subclass. Now we needed a type of user that is distinct from Person (they are never rostered, are not associated with a given company, but do log into the system).

I wanted the changes to the admin to be isolated within the different apps, so I needed to be able to get the currently installed UserAdmin class, and subclass that to filter the queryset. So the code becomes (in both admin.py files):

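from django.contrib import admin
from django.contrib.auth.models import User

# Whatever admin class is registered for User right now. This reads a
# private attribute of the admin site, and relies on the auth admin
# (and any earlier app's override) having been registered first.
BaseUserAdmin = type(admin.site._registry[User])

class UserAdmin(BaseUserAdmin):
    # On Django 1.6 and later this hook is named get_queryset().
    def queryset(self, request):
        return super(UserAdmin, self).queryset(request).filter(foo=None)

admin.site.unregister(User)
admin.site.register(User, UserAdmin)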

The only difference in the two files is the foo. This becomes whatever this sub-class’s name is. Thus, it is person in the person/admin.py file, and orguser in the orguser/admin.py file.
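
To see why reading the class back out of the registry lets the two apps stack their filters, here is a small Django-free sketch. Everything in it (FakeQuerySet, the registry dict, hide_subclass) is a hypothetical stand-in for the real admin site, written only to show how the subclasses chain:

```python
class FakeQuerySet(object):
    """Stand-in for a QuerySet: it only records the filters applied."""
    def __init__(self, filters=()):
        self.filters = tuple(filters)

    def filter(self, **kwargs):
        return FakeQuerySet(self.filters + tuple(sorted(kwargs.items())))


class BaseUserAdmin(object):
    """Stand-in for the stock UserAdmin."""
    def queryset(self, request):
        return FakeQuerySet()


# Stand-in for admin.site._registry: model name -> admin instance.
registry = {'User': BaseUserAdmin()}


def hide_subclass(relation_name):
    """Re-register User with one more filter layered on the current admin."""
    current_cls = type(registry['User'])

    class UserAdmin(current_cls):
        def queryset(self, request):
            base = super(UserAdmin, self).queryset(request)
            return base.filter(**{relation_name: None})

    registry['User'] = UserAdmin()


# Each app's admin.py would do the equivalent of one of these calls.
hide_subclass('person')
hide_subclass('orguser')
final_filters = registry['User'].queryset(request=None).filters
```

Because each new class inherits from whatever was registered before it, the second app's filter is applied on top of the first app's, rather than replacing it.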

The next step is to change the backend so that it will automatically downcast the logged in user to their child class. Other people have detailed this in the past: mostly the performance issue vanishes here because we are only looking at a single database query for a single object.
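
As a rough sketch of what that downcast could look like, assuming the child rows hang off the user as person and orguser attributes: the names StandInUser and downcast are hypothetical, and the classes are plain Python rather than models. With real Django models, accessing a missing child raises that model's DoesNotExist, so the lookup would be wrapped in a try/except instead of relying on a getattr default.

```python
class StandInUser(object):
    """Hypothetical stand-in for auth.User; children are plain attributes."""
    def __init__(self, username, **children):
        self.username = username
        for name, child in children.items():
            setattr(self, name, child)


def downcast(user, child_names=('person', 'orguser')):
    """Return the first child object attached to user, else user itself."""
    for name in child_names:
        child = getattr(user, name, None)
        if child is not None:
            return child
    return user


admin_login = StandInUser('admin')  # a 'pure' User with no child row
manager = StandInUser('mgr', person='<Person: mgr>')
```

An authentication backend would call something like this once, on the single user it has just loaded, which is why the cost stays small.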

Adobe PDF Workflow under Snow Leopard

My partner is a mad keen Macromedia Freehand user. This is one of the reasons she has been able (and prepared) to stick with her trusty old G4 iMac until now. It is also the reason our brand new iMac won’t be running Lion anytime soon.

So, when we got the new iMac, I had to set up Freehand so it worked. The next thing was to bring across all of her thousands of fonts. Tip: if fonts look jaggy, force the font cache to rebuild and restart.

Finally, we got to the stage where she was trying to create some PDFs. And since Adobe is not always the best OS citizen, we found the old way she used to create them no longer worked under Snow Leopard. Using the system PDF generator resulted in far inferior PDF quality: jaggy fonts, lines within curves.

Now, I get the feeling that the system isn’t at fault here, as it is very capable of creating PDF files of correct quality. Indeed, I was able to get high quality PDFs generated from other programs (and as we’ll see in a minute, even from files generated from Freehand). So, it seems that Freehand ‘knows’ this is a ‘preview’ version, and cuts the quality of data it sends.

Eventually, after much work, I found that creating a PostScript file worked okay, but the page size was incorrect. At this stage I had installed the printer driver for our old Epson Stylus PHOTO EX, which resulted in the print dialog box no longer showing all of the Freehand MX settings.

The final solution was to create an IPP printer, to localhost, that is called Adobe PDF. This is set to use the generic PostScript driver. All of a sudden, we are able to access the Freehand MX advanced settings in the print dialog, and create PostScript files that are the right size, and of suitable quality. She then either uses Preview or Acrobat Distiller to turn these into PDFs.

Installing django (or any python framework, really)

TL;DR

$ pip install virtualenv
$ virtualenv /path/to/django_project/
$ . /path/to/django_project/bin/activate
$ pip install django

I hang around a fair bit in #django now on IRC. It’s open most of the time I am at work: if I am waiting for something to deploy, I’ll keep an eye out for someone that needs a hand, or whatever. Yesterday, I attempted to help someone out with an issue with django and apache: I ended up having to go home before it got sorted out.

One of the things that came up was how to actually install django. The person was following instructions on how to do so under Ubuntu, but they weren’t exactly ‘best practice’.

One of the things I wish had been around when I first started developing using python is virtualenv. This tool allows you to isolate a python environment, and install stuff into it that will not affect other virtual environments, or the system python installation.

Unfortunately, it does not come standard with python. If it were part of the standard library, more people would probably use it. The upside of it not being in the standard library is that it gets updated more frequently.

Installing virtualenv

First, see if virtualenv is installed:

$ virtualenv --version

If not, you’ll need to install it. You can install it using pip or easy_install, if you have either of those installed. If you are a super-user on your machine (ie, it is your computer), then you may want to use sudo to install it system-wide. Alternatively, you can install it just in your user account, which you might need to do on a shared computer.

You’ll probably also want to install pip at the system level. I do this first, and use it to install virtualenv, fabric and other packages that I need to use outside of a virtualenv (mercurial springs to mind). Do note that a virtualenv contains an install of pip by default, so this is up to you: once you have virtualenv installed, you can use pip in every virtualenv to install packages.

Setting up a virtual environment

I recommend using virtualenv for both development and deployment.

I think I use virtualenv slightly differently to most other people. My project structure tends to look like:

/home/user/development/<project-name>/
    bin/
    fabfile.py
    include/
    lib/python2.6/site-packages/...
    project/
        # Project-specific stuff goes here
    src/
        # pip install -e stuff goes here
    tmp/

Thus, my $VIRTUAL_ENV is actually also my $PROJECT_ROOT. This means that everything is self-contained. It has the negative side-effect of meaning if I clone my project, I need to install everything again. This is not such a bad thing, as I use Fabric to automate the setup and deployment processes. It takes a bit of time, but using a local pypi mirror makes it fairly painless.

Obviously, I ignore bin/, lib/ and the other virtualenv created directories in my source control.

However, since we are starting from scratch, we won’t have a fabfile.py to begin with, and we’ll just do stuff manually.

$ cd /location/to/develop
$ virtualenv my_django_project

That’s it. You now have a virtual environment.
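
For comparison, Python 3.3 and later ship a venv module in the standard library that can do much the same job from Python code. This is a different tool from the virtualenv command used in this post, and the helper name create_env is made up for the sketch:

```python
import os
import venv


def create_env(path):
    """Create a bare virtual environment at path with the stdlib venv module.

    Returns the path of the pyvenv.cfg marker file that venv writes.
    """
    # with_pip=False skips bootstrapping pip, so creation needs no network.
    # Pass with_pip=True instead if you want pip inside the environment.
    venv.create(path, with_pip=False)
    return os.path.join(path, 'pyvenv.cfg')
```

Calling create_env('my_django_project') from /location/to/develop would leave a directory with roughly the same bin/, include/ and lib/ layout described above.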

Installing django/other python packages

You’ll want to activate your new virtualenv to install the stuff you will need:

$ cd my_django_project
$ . bin/activate
(my_django_project)$

Notice the prompt changes to show you are in a virtual environment.
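
The prompt is only a hint. If you want to check from inside python itself, a small sketch like this should work; the function name in_virtual_env is an assumption, and the check relies on sys attributes rather than on anything virtualenv-specific being imported:

```python
import sys


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix on the interpreter.
    if hasattr(sys, 'real_prefix'):
        return True
    # The stdlib venv module and newer virtualenv releases instead make
    # sys.base_prefix point at the original installation.
    return getattr(sys, 'base_prefix', sys.prefix) != sys.prefix


active = in_virtual_env()
```

This can be handy at the top of a fabfile or management script, to bail out early when someone forgets to activate the environment.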

Install the packages you need (from now on, I’ll assume your virtualenv is active):

(my_django_project)$ pip install django

There has been some discussion about having packages like psycopg2 installed at the system level: I tend to install everything into the virtualenv.

So that’s it. You now have django installed in a virtual environment. I plan to write some more later about my deployment process, as well as how I structure my django projects.