Generating Coverage Badges

Drone.io is a pretty neat (free for open-source projects) continuous integration server. The best feature from my perspective is that it works with BitBucket repositories.

It’s pretty nice having status badges indicating if a build is passing or failing, but even better is also getting a coverage report.

I’ve been using django-coverage for this, and ages ago manually created a set of drone.io style badges, and added a patch to copy the relevant file across. But then, shortly after, drone.io changed their status badge format. I never got around to redoing my badges, as it was pretty time consuming.

Enter Pillow.

from PIL import Image, ImageDraw, ImageFont

SIZE = (95, 18)

BACKGROUND = hex_colour('#4A4A4A')
SUCCESS = hex_colour('#94B944')
WARNING = hex_colour('#E4A83C')
ERROR = hex_colour('#B10610')

# You may need a different font filename if you aren't on a Mac
FONT = ImageFont.truetype(size=10, filename="/Library/Fonts/Arial.ttf")
FONT_SHADOW = hex_colour('#525252')

PADDING_TOP = 3

def build_image(percentage, colour):
    # Create a brand-new Image object, with the background
    # as the main badge colour.
    image = Image.new('RGB', SIZE, color=BACKGROUND)
    drawing = ImageDraw.Draw(image)
    
    # Write the word 'coverage' in our specified font.
    # Fake a text-shadow by drawing the text twice.
    # TODO: Make the text-shadow better.
    drawing.text((8, PADDING_TOP+1), 'coverage', font=FONT, fill=FONT_SHADOW)
    drawing.text((7, PADDING_TOP), 'coverage', font=FONT)
    
    # Do the percentage text.
    # TODO: Make the text-shadow better.
    # TODO: Make the text centred in the coloured box.
    drawing.rectangle([(55, 0), SIZE], colour, colour)
    drawing.text((63, PADDING_TOP+1), '%s%%' % percentage, font=FONT, fill=FONT_SHADOW)
    drawing.text((62, PADDING_TOP), '%s%%' % percentage, font=FONT)
    
    return image

Creating the required RGB tuple from a hex colour is also fairly easy:

def hex_colour(hex):
    if hex[0] == '#':
        hex = hex[1:]
    return int(hex[:2], 16), int(hex[2:4], 16), int(hex[4:6], 16)
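
For example:

>>> hex_colour('#94B944')
(148, 185, 68)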

Finally, you can just generate an image for every percentage point, and save them:

SUCCESS_CUTOFF = 85
WARNING_CUTOFF = 45

# range(101) -> [0, 1, 2, ..., 99, 100]
for i in range(101):
    with open('%i.png' % i, 'wb') as output:
        if i < WARNING_CUTOFF:
            build_image(i, ERROR).save(output)
        elif i < SUCCESS_CUTOFF:
            build_image(i, WARNING).save(output)
        else:
            build_image(i, SUCCESS).save(output)

It’s not quite perfect: that isn’t quite the font they use, but it will do for now.

Dear EzyReg

I went to register with EzyReg to set up monthly direct debit for my car registration, and to register I was required to enter a secret question or two.

These secret questions were all essentially publicly available information, or things inane enough that I probably would not remember my answers. Thus, they are useless from both sides of the fence: the meaningless ones could not help me reset a password (not really that much of a problem), but if I used one of the “one true answer” ones, someone could easily discover, for instance, my mother’s maiden name.

I could enter a random value, but I decided to get on my high horse, and supply feedback.

Then, after writing for a few minutes: “You may only enter 1000 characters in feedback”.

What. The. Fuck.

So, after splitting it into two comments, here is the entirety of my comment to the braindead fuckwads who wrote the registration system:

I want to use monthly direct debit to pay my car registration, but the current requirement to have an ezyreg account, and the requirement that said account be “secured” with secret questions, mean that I cannot in good faith complete the registration process.

There are well documented flaws with secret questions as a second level of security, or as a mechanism to reset or change a password. They become an active attack vector that, for someone who practises good password hygiene, partially defeats the processes I have in place to protect access to my accounts.

This is made even worse in the case of your security questions, of which there are very few, and I am unable to create my own.

Some examples of the arguments against security questions, and why they are a security risk, not an improvement:

http://www.oneid.com/blog/passwords-are-bad-but-security-questions-are-worse/

https://www.schneier.com/blog/archives/2005/02/the_curse_of_th.html

http://www.intego.com/mac-security-blog/your-secret-question-may-not-be-so-secret-easy-to-guess-password-retrieval-questions-you-should-avoid-and-why/

Please consider removing this requirement from your account registration process.

pbgrep: grep your clipboard history

I’ve used ClipMenu as my clipboard history manager for several years now: it’s unobtrusive, and does almost exactly what I need.

Except, you can’t search the clipboard history.

I keep thousands of items in my clipboard history, and today I was trying to find a specific item that I knew was in there. And I couldn’t find it after about a minute of scanning through submenus.

Now, ClipMenu can persist its history to disk, in ~/Library/Application Support/ClipMenu/clips.data, which is a binary plist file.

We can view it using plutil:

$ plutil -p ~/Library/Application\ Support/ClipMenu/clips.data

I made the decision to limit searching to single-line clips: this means I can grep for lines that contain:

<string>.*(QUERY).*</string>

Doing single-line matches means I can use grep (or, as I discovered later, ack), which should be faster than firing up a python interpreter.

My first iteration was:

$ plutil -convert xml1 ~/Library/Application\ Support/ClipMenu/clips.data -o - \
  | grep "<string>.*test.*</string>" -o

This works quite well, but includes the XML string tags. I did strip them out using sed, but that’s an extra command. It turns out that grep’s basic and extended regular expressions can’t handle lookahead/lookbehind assertions, and Mac OS X’s grep does not support --perl-regexp (-P), so I reached for ack:

function pbgrep() {
  plutil -convert xml1 ~/Library/Application\ Support/ClipMenu/clips.data -o - \
    | ack "(?<=<string>).*$1.*(?=</string>)" -o
}

That now takes pride of place in my .bashrc, and I can pbgrep foo to my heart’s content.

I guess I could, if there was only one match, put the value back into the clipboard. That might be kind of nice.
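
If I ever do, it would probably be easier in Python than in bash. A rough sketch (untested, and assuming a unique match should go straight back via pbcopy; pbgrep_py is just an illustrative name):

import os
import re
import subprocess

CLIPS = os.path.expanduser('~/Library/Application Support/ClipMenu/clips.data')

def pbgrep_py(query):
    # Convert the binary plist to XML, just like the shell version does.
    xml = subprocess.check_output(['plutil', '-convert', 'xml1', CLIPS, '-o', '-'])
    matches = re.findall('<string>(.*%s.*)</string>' % re.escape(query), xml)
    if len(matches) == 1:
        # Exactly one hit: push it straight back onto the clipboard.
        pbcopy = subprocess.Popen(['pbcopy'], stdin=subprocess.PIPE)
        pbcopy.communicate(matches[0])
    return matches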

Custom Element Form Submission

Custom elements are starting to get some traction in Web Development. There have been some really nice recent posts, the one which got me back on track was Web Components: Why You’re Already an Expert, but also Custom Elements: defining new elements in HTML and Performance and Custom Elements. Also, Polymer makes heavy use of Custom Elements, and I’ve had a bit of a look there too.

I really like KnockoutJS, so I started playing around with that. I have a nicely defined DatePicker element that would work well as a Custom Element, and would allow me to use the Shadow DOM to hide the internals of the code.

However, I soon hit a problem. Browsers, and jQuery, will only add data to a form when the element is one of <input>, <select>, <textarea> or <keygen>. In jQuery, for instance, this is hard-coded in as rsubmittable, see source.

This means that you cannot just use an <x-date-picker> element and have it submit. You still need some type of hidden <input> element, linked to the value. “Subclassing” <input> is not sufficient.

This seems to be a fairly large oversight, and I have not been able to find anything else on the internet that discusses this.

I haven’t tried “subclassing” <button> yet, to see if it is possible to create custom elements that can be used to submit forms, rather than provide data for them.

Per-command Virtualenv

Recently, I finally got around to re-installing OS X from scratch on my work machine. It was past time: I would frequently be unable to wake the machine from display sleep, and after saving a file in a monitored directory, the wsgi-monitor package would take tens of seconds to restart django.

One thing I wanted to do this time was only install stuff as necessary, but also put every pip-installed command-line tool in its own virtualenv. However, this has one drawback, in that it is a little repetitive.

For instance, to install Fabric, my deployment tool of choice:

$ virtualenv ~/.venv/fabric
$ . ~/.venv/fabric/bin/activate
(fabric)$ pip install fabric
(fabric)$ ln -s ~/.venv/fabric/bin/fab /usr/local/bin/

This is fine if you only have one ‘tool’ to install, but something like docutils actually installs a whole stack of command line tools.

What we want, is something like:

  • create the virtualenv
  • get a list of items already in the <virtualenv>/bin
  • install the required tool (and any extra modules)
  • link all of the newly added commands in <virtualenv>/bin to /usr/local/bin

We could just add each <virtualenv>/bin to our path, but that would mean that the first virtualenv created would be used for pip, which I don’t want installed at all.

Additionally, it would be nice to be able to specify a required version of the package to install, and other (non-dependency) packages that should be installed. For instance, I want mercurial_keyring to be installed in the mercurial virtualenv.

This last one is probably less important, as you can just use that virtualenv’s pip to install them after. But the version number stuff might be nice.

virtualenv has the nice ability to be able to create bootstrap scripts, which will do other stuff (like install specific packages). We can co-opt this to build a tool for doing the automatic installation and linking:

import virtualenv, subprocess

data = """
import os, subprocess

def extend_parser(optparse_parser):
    optparse_parser.add_option(
        "--upgrade",
        action="store_true",
        dest="upgrade",
        default=False,
        help="Upgrade package",
    )
    optparse_parser.add_option(
        "--path",
        dest="path",
        default='~/.venv/',
        help="Parent path of virtualenvs"
    )
    optparse_parser.add_option(
        '--package',
        dest="packages",
        action="append",
        help="Other packages to install"
    )
    
def adjust_options(options, args):
    global package
    if not args: 
        return
    package = args[0]
    if '==' in args[0]:
        args[0], version = args[0].split('==', 1)
    args[0] = os.path.join(os.path.expanduser(options.path), args[0])

def after_install(options, home_dir):
    global package
    venv = os.path.join(os.path.expanduser(options.path), home_dir)
    before = os.listdir(os.path.join(venv, 'bin'))
    command = [os.path.join(venv, 'bin', 'pip'), 'install', package]
    if options.upgrade:
        command += ['--upgrade']
    if options.packages:
        command += options.packages
    subprocess.call(command)
    after = os.listdir(os.path.join(venv, 'bin'))
    
    for command in set(after).difference(before):
        subprocess.call([
            'ln', '-s', 
            os.path.join(venv, 'bin', command),
            '/usr/local/bin'
        ])
"""

output = virtualenv.create_bootstrap_script(data)
open('/usr/local/bin/pip-install', 'w').write(output)
subprocess.call(['chmod', '+x', '/usr/local/bin/pip-install'])
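
With that in place, installing a tool (and linking any new commands) is a single command. Repeating the earlier examples:

$ pip-install fabric
$ pip-install mercurial --package mercurial_keyring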

There is one caveat: if an existing file is found in /usr/local/bin that matches one that should be linked, it will be ignored. That is, it does not overwrite existing commands. I think this is preferable, as it is marginally safer.

Linking commands like this is better than copying them, as it means you can just do a pip install --upgrade <package> in the relevant virtualenv, and it will upgrade commands. You can also use pip-install <package>==<new-version>, and that should work too. However, if you unlink a command (or remove one that would have linked but failed), and do a pip-install, it will not link the commands that were already installed in that virtualenv.

Anyway, your mileage may vary. I’m using it now, and it seems good.

Duck-punch misbehaving software

Recently, I found myself having to interact with an API that uses SOAP. I’ve been using the SOAPpy package. Which has made it possible, but not exactly easy. But that’s not what I am going to write about right now.

In order to make the linking of data between my software and that system easier, I needed to get a dump of the other system’s data, in a CSV that I could send to the client.

So, since the SOAPpy module gives you something that looks dict-y, I thought I’d just be able to pass it to csv.DictWriter’s writerow() method.

Not quite.

See, whilst it supports the python dict-like [key] syntax, SOAPpy.Types.structType doesn’t support the .get() method that DictWriter uses to extract the data (it needs to use .get() so it can trap missing keys).

So, here is a simple duck-punch (does it quack like a duck? No? Punch it harder, so it quacks!) that enables you to pass a structType object to a DictWriter.writerow() method call:

from SOAPpy.Types import structType
structType.get = lambda x, y, z=None: x[y]

In this case, I was able to use this simple version, as I knew the keys it would be asked for would all exist, but you could make a slightly more complex one that checked whether the key exists, and if not, returned z. You need all three arguments in the lambda though, since DictWriter passes them in.
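
For instance, something like this sketch (untested, since in my case the keys always existed) would honour the default value:

from SOAPpy.Types import structType

def _get(self, key, default=None):
    # Mimic dict.get(): trap missing keys and return the default,
    # rather than raising.
    try:
        return self[key]
    except (KeyError, AttributeError):
        return default

structType.get = _get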

I Hate Generic Foreign Keys, but this works anyway

I’m really not a fan of the concept of Generic Foreign Keys. They do have their place, and the app I’ve just started is a reasonable example.

It’s django-activity-streams, and I’m using it essentially as an audit stream. It stores the user who performed the change, the object that was changed, when it was changed, and a serialised version of the fields that have changed, in the format of:

[
  {
    "field": "date_of_birth",
    "old": "1955-01-10",
    "new": "1955-10-01"
  }
]

Now, the complication comes when trying to generate reports based on this stuff, and that is all down to the use of GFKs.

Essentially, what I want to be able to do is:

Action.objects.between(start, finish).verb('updated').filter(
  target__in=queryset
)

But this will not work, as there is no real target field: it’s a GFK field. But we can query on the two fields that make it up: target_content_type and target_object_id.

So, you might think we can do something like:

ctype = ContentType.objects.get_for_model(queryset.model)
Action.objects.filter(
  target_content_type=ctype,
  target_object_id__in=queryset
)

Alas, this will not work either, as target_object_id is a “character varying”, and a queryset kind-of looks like a set of integers (or whatever the primary key for that table is).

So, we need a list of characters, instead of integers.

pks = map(str, queryset.values_list('id', flat=True))
Action.objects.filter(
  target_content_type=ctype,
  target_object_id__in=pks
)

Indeed, that works, but (a) it requires two queries (one to get the PKs, and the other to get the actions), and (b) the second query will get very long if there are lots of objects in the queryset.

So, we want a query that we can use as a subquery. Enter postgres:

pks = queryset.extra(select={'_id': 'SELECT CAST("id" AS text)'}).values('_id')
Action.objects.filter(
  target_content_type=ctype,
  target_object_id__in=pks
)

Bingo:

SELECT
    "actstream_action"."id",
    "actstream_action"."actor_content_type_id",
    "actstream_action"."actor_object_id",
    "actstream_action"."verb",
    "actstream_action"."description",
    "actstream_action"."target_content_type_id",
    "actstream_action"."target_object_id",
    "actstream_action"."action_object_content_type_id",
    "actstream_action"."action_object_object_id",
    "actstream_action"."timestamp",
    "actstream_action"."public",
    "actstream_action"."data"
FROM
    "actstream_action"
WHERE (
    "actstream_action"."verb" = created
    AND "actstream_action"."timestamp" <= 2013-09-11 00:00:00
    AND "actstream_action"."timestamp" >= 2001-01-01 00:00:00
    AND "actstream_action"."target_object_id" IN (
        SELECT
          (SELECT CAST("id" AS text)) AS "_id"
        FROM 
            "people" U0 
        WHERE 
            U0."comp_id" = 1 
    ) 
    AND "actstream_action"."target_content_type_id" = 17
)
ORDER BY
    "actstream_action"."timestamp" DESC;

You can see the subquery SELECT (SELECT CAST(...)) after the IN, which in the previous version was a list of string versions of the ids.

arp -a | vendor

I have lots of things on my local network. Most of them behave nicely with the DHCP server, and provide their machine name as part of their DHCP request (Client ID), which means I can see them in the list in Airport Utility.

However, some of them don’t, which means I have some blank rows.

It would be nice to be able to figure out which devices these are, especially for those that don’t provide any services (particularly a web interface).

Enter MAC Vendor Lookup.

You can register, and get an API key that will return values in the format you desire.

Then, it’s possible to do:

$ curl --silent http://www.macvendorlookup.com/api/:API_KEY/:MAC_ADDRESS | cut -f 1 -d \|

(I use the pipe delimited version).

This is all well and good, but who wants to have to type them in? Not this guy.

Let’s look at how we can get them from arp -a.

$ arp -a | cut -f 4 -d ' '

Okay, that’s promising: it gives me a list of MAC addresses. Almost. It strips leading zeros from each octet, which the API rejects, and it includes entries that are incomplete.

Cue about an hour mucking about with the (limited) sed regex docs:

$ arp -a | 
    cut -f 4 -d ' ' | 
    sed -E 's/:([[:xdigit:]]):/:0\1:/g' | 
    sed -E 's/^.:/0&/' | 
    sed -E 's/:(.)$/:0\1/'

Ah, that’s better. Now we have the proper MAC addresses.

Now, we can pipe this information through the API call.

This is where we need to start to get a bit tricky. We need to create a function that will allow us to call the API with a new value each time. You’ll want to stick this in your .bashrc.

function mac_vendor() {
  API_URL="http://www.macvendorlookup.com/api"
  API_KEY="<your api key>"
  if [[ $1 ]]; then
    curl --silent "$API_URL/$API_KEY/$1" | cut -f 1 -d \|
  else
    while read DATA; do
      curl --silent "$API_URL/$API_KEY/$DATA" | cut -f 1 -d \|
    done
  fi
}

The if statement means we can use it by passing an argument on the command line:

$ mac_vendor 00:00:00:00:00:00
Xerox Corporation

Or by passing through data from stdin:

$ arp -a | 
    cut -f 4 -d ' ' | 
    sed -E 's/:([[:xdigit:]]):/:0\1:/g' | 
    sed -E 's/^.:/0&/' | 
    sed -E 's/:(.)$/:0\1/' |
    mac_vendor

Okay, that’s nice, but we now can’t see which IP address is associated with which vendor.

Let’s move that ugly chained sed call into its own function, called normalise_mac_address, which we will also wrap in a while read DATA; do ... done clause, so we can pipe data through it:

function normalise_mac_address() {
  while read DATA; do
    echo $DATA |
      sed -E 's/:([[:xdigit:]]):/:0\1:/g' |
      sed -E 's/^.:/0&/' |
      sed -E 's/:(.)$/:0\1/'
  done
}

Nearly there!

We now need to be able to grab out the IP address and the MAC address from arp, and pass only the MAC address through our conversion functions. By default the bash for … in … construct will iterate through words, so we need to tell it to deal with a line at a time:

function get_all_local_vendors() {
  IFS=$'\n'
  for LINE in `arp -a | cut -f 2,4 -d ' '`; do
    # We have LINE="(<ip.address.here>) <mac:address:here>"
    MAC=`echo $LINE | cut -f 2 -d ' ' | normalise_mac_address`
    IP=`echo $LINE | cut -f 1 -d ' '`
    # We only want ones that were still active
    if [ "$MAC" != '(incomplete)' ]; then
      VENDOR=`echo $MAC | mac_vendor`
      echo $VENDOR $IP
    fi
  done
}

I’m hardly a bash expert, so there may be a better way of doing things rather than the repeated VARIABLE=`foo thing` construct I keep using.

So, the outcome I get when I run this looks something like:

$ get_all_local_vendors 
Apple, Inc. (10.0.1.1)
Sparklan Communications, Inc. (10.0.1.3)
Devicescape Software, Inc. (10.0.1.4)
Mitrastar Technology (10.0.1.5)
Apple, Inc. (10.0.1.15)
Silicondust Engineering Ltd (10.0.1.16)
Apple Computer (10.0.1.21)
none (10.0.1.255)

Getting rid of that last one is left as an exercise to the reader: the MAC address is FF:FF:FF:FF:FF:FF.
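
(If you did want to filter it out, one approach, sketched below, is to extend the guard inside get_all_local_vendors; note that arp prints the broadcast MAC in lowercase.)

    if [ "$MAC" != '(incomplete)' -a "$MAC" != 'ff:ff:ff:ff:ff:ff' ]; then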

Django Single Table Inheritance on the cheap.

There was a recent question on Stack Overflow about Django Single Table Inheritance (STI). It got me thinking about how to use my FSM-proxy stuff to do just STI.

Note: this only works when all sub-classes have the same fields: the example we are going to use here is different to a state machine, in that an object may not change state after it has been created.

class Sheep(models.Model):
  type = models.CharField(max_length=5)
  tag_number = models.CharField(max_length=64)

class Ram(Sheep):
  class Meta:
    proxy = True
  
class Ewe(Sheep):
  class Meta:
    proxy = True

In this case, we can fetch all sheep as Sheep.objects.all(). However, this gives us the objects as Sheep instances, when we want those with type='ram' to return Ram instances, and those with type='ewe' to return Ewe instances.

We can do this by the magic of self.__class__.__subclasses__().

class Sheep(models.Model):
  # fields as above
  
  def __init__(self, *args, **kwargs):
    super(Sheep, self).__init__(*args, **kwargs)
    # If we don't have a subclass at all, then we need the type attribute to match
    # our current class. 
    if not self.__class__.__subclasses__():
      self.type = self.__class__.__name__.lower()
    else:
      subclass = [x for x in self.__class__.__subclasses__() if x.__name__.lower() == self.type]
      if subclass:
        self.__class__ = subclass[0]
      else:
        self.type = self.__class__.__name__.lower()
    

This will automatically downcast Sheep objects to the correct subclass, based upon the type field.

It also sets the type field on objects that are instantiated without one (based on the current instance class). This enables us to do things like:

# Fetch all Sheep, downcast to correct subclass.
>>> Sheep.objects.all()
[<Ram: Ram object>, <Ram: Ram object>, <Ewe: Ewe object>]

# Automatically set the type on a class.
>>> Ram()
<Ram: Ram object>
>>> Ram().type
'ram'
>>> Sheep()
<Sheep: Sheep object>
>>> Sheep().type
'sheep'

# Automatically set the class on a valid subclass/type
>>> Sheep(type='ram')
<Ram: Ram object>
# Force the type field on an invalid type argument. [see below]
>>> Ram(type='ewe')
<Ram: Ram object>
>>> Sheep(type='foo')
<Sheep: Sheep object>
>>> Sheep(type='foo').type
'sheep'

The assumption I have made here is that when instantiating a class, and the type value is not a valid value (our class, or one of our subclasses), then it changes the type field to the current class.

The other assumption is that the parent class is also valid. In this case, it wouldn’t be, as sheep must be either ewes or rams (or wethers, but that’s another story).

We also need to be able to fetch Ewe and Ram objects using their manager. This is just as simple as filtering on the type.

class ProxyManager(models.Manager):
  def get_query_set(self): # Note: get_queryset in Django 1.6+
    return super(ProxyManager, self).get_query_set().filter(type=self.model.__name__.lower())

class Ram(Sheep):
  objects = ProxyManager()
  class Meta:
    proxy = True

class Ewe(Sheep):
  objects = ProxyManager()
  class Meta:
    proxy = True

Now, we can do:

>>> Ram.objects.all()
[<Ram: Ram object>, <Ram: Ram object>]

Clearly, the models have been simplified: I have not shown any model methods that would be the different behaviours that the subclasses have.

Who is having a birthday?

The latest thing I have been working on is notifications for our project. One of the required notification types is upcoming (and today’s) birthdays (and expiring work visas, but that’s a much easier problem).

This actually turns out to be quite a hard problem. There are some simple solutions, but none of them meet our requirements:

  1. Store only the month and day of a person’s birthday. This is unsatisfactory as we use their age to calculate their wage, if applicable.
  2. Create a pseudo-column that contains their upcoming birthday. This gets hard when you take leap-day birthdays into account.

We need to fetch all people who have a birthday coming up in the next X days. This is a requirement because if we only matched people whose birthday is exactly X days away, (a) leap days are easy to miss, and (b) changes to either the query period or a person’s birthday could mean some events were missed.

Instead, we will query for birthdays in a range, and see if we have already sent a notification for this instance of their birthday. If not, we will send a notification.

One solution is to look at all of the dates in the given range, in the format -%m-%d, and query using contains against this list:

import datetime
import operator

from django.db.models import Q

dates = [start + datetime.timedelta(i) for i in range((finish - start).days)]
filters = [Q(dob__contains=x.strftime('-%m-%d')) for x in dates]
Person.objects.filter(reduce(operator.or_, filters))

But, this too fails when a birthday on a leap day exists, and this year is not a leap year.

(We use the -%m-%d format instead of %m-%d so we don’t get false matches from the year part of the date).

Then I came across a post by Zoltán Böszörményi, that contains the following useful function:

CREATE OR REPLACE FUNCTION indexable_month_day(date) RETURNS TEXT as $BODY$
  SELECT to_char($1, 'MM-DD');
$BODY$ language 'sql' IMMUTABLE STRICT;

There are a couple of things to notice: it does MM-DD, not the other way around. This allows us to sort lexically. Also, declaring it as IMMUTABLE means we will be able to create an index using it. And since we are querying against it, having an index may be useful:

CREATE INDEX person_birthday_idx ON people (indexable_month_day(dob));

Now, we can also query against this. I like to use django queryset methods (see building a higher-level query API), so my stuff looks like:

class BirthdayQuerySetMixin(object):
    def birthday_between(self, start, finish):
        assert start <= finish, "Start must be less than or equal to finish"
        start = start - datetime.timedelta(1)
        finish = finish + datetime.timedelta(1)
        return self.extra(where=[
            """
            indexable_month_day(dob) < '%(finish)s' 
            %(andor)s 
            indexable_month_day(dob) > '%(start)s'
            """ % {
                'start': start.strftime('%m-%d'),
                'finish': finish.strftime('%m-%d'),
                'andor': 'and' if start.year == finish.year else 'or'
            }
        ])
      
    def birthday_on(self, date):
        return self.birthday_between(date, date)

This has a caveat: it returns two matches for leap-day birthdays during non-leap-years. This is intentional, as other logic will prevent duplicate notifications, and we don’t know which offsetting method people will prefer.

The logic behind it is that it offsets the start and the finish by one day each, and then filters using strict less-than and greater-than comparisons. This is what allows us to find leap-day birthdays. The other tricky part is using AND when the years of the start and finish are the same, and OR if the finish is in the next year: for example, a query from December 28 to January 3 becomes MM-DD > '12-27' OR MM-DD < '01-04', which wraps over the year boundary.

Oh, there should also be a check to ensure that we have less than a full year between start and finish: if it’s a year or more, we can just return everyone!
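
A minimal sketch of that guard, added at the top of birthday_between (assuming 365 or more days counts as a full year):

        if (finish - start).days >= 365:
            # A full year covers every possible MM-DD, so skip the filter.
            return self.all()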

Otherwise, it’s all good, and we can use it to filter a queryset. I’ve put those methods in my PersonManager and PersonQuerySet (via PassThroughManager), so I can do things like:

>>> today = datetime.date.today()
>>> Person.objects.birthday_between(today, today + datetime.timedelta(7))

… which provides me with a list of people who have a birthday within the next seven days.