Models from JSON(B)

One of the things we’ll often try to do is reduce the number of database queries. In Django, this is most often done by using the select_related queryset method, which does a join to the related objects, thus returning the data from those, and then automatically creates the instances for those objects.

This works great if you have a foreign key relationship (for instance, you are fetching Employee objects, and you also want to fetch the Company for which they work).

Tt does not work if you are following the reverse of a foreign key, or a many-to-many relation. You can’t select_related to get all of the employee’s locations at which she can work, for instance. In order to get around this, Django also provides a prefetch_related queryset method, that will do a second query and fetch all of the related objects for all of the objects in the initial queryset. This is evaluated at the same time as the initial queryset, so works pretty well during pagination, for example.

But, we don’t always want all of the objects: sometimes we might only want the most recent related object. Perhaps we have a queryset of Employee objects, and we want their most recent EmploymentPeriod. If we just want one field from that object (their start date, for instance), then we can do that using a subquery:

employment_start = EmploymentPeriod.objects.filter(
    employee=OuterRef('pk'),
).order_by('-start').values('start')[:1]

employees = Employee.objects.filter(
    company=company,
).annotate(
    start_date=Subquery(employment_start)
)

We can go a little bit further, and limit to only those currently employed (that is, they have no termination date, or their termination date is in the future).

employment_start = EmploymentPeriod.objects.filter(
    employee=OuterRef('pk'),
).filter(
    models.Q(finish__isnull=True) |
    models.Q(finish__gte=datetime.date.today())
).order_by('-start').values('start')[:1]

employees = Employee.objects.filter(
    company=company,
).filter(
    # New Django 3.0 feature alert!
    Exists(employment_start)
).annotate(
    start_date=Subquery(employment_start)
)

We could rewrite this to remove the OR, perhaps by using a DateRange annotation and using an overlap: but we’d want to ensure there was an index on the table that postgres would be able to use. Alternatively, if we stored our employment period using a date range instead of a pair of date fields, but that makes some of the other queries around this a bit more complicated.

But this does not help us if we want to get the whole related object. We could try to use a Prefetch object to obtain this:

Employee.objects.filter(
    company=company,
).prefetch(models.Prefetch(
    'employment_periods',
    queryset=EmploymentPeriod.objects.order_by('-start')[:1], to_attr='current_employment_periods',
))

But this will not work: because the filtering to the employee’s own employment periods is not applied until after the slice. We could do the ordering, and then just select the first one as the current employment period, but we would then still be returning a bunch of objects when we only need at most one.

It would be excellent if we could force django to join to a subquery (because then we could use select_related), however that’s not currently possible (or likely in the short term). Instead, we can turn our EmploymentPeriod into a JSON object in tha database, and then turn that JSON object into an instance of our target model.

class ToJSONB(Subquery):
    template = '(SELECT to_jsonb("row") FROM (%(subquery)s) "row")'
    output_field = JSONField()


current_employment_period = EmploymentPeriod.objects.filter(
    employee=OuterRef('pk')
).order_by('-start')[:1]

employees = Employee.objects.filter(
    company=company
).annotate(
    current_employment_period=ToJSONB(current_employment_period)
)

This gets us part of the way, as it turns our employment period into a JSON object. Let’s try turning that JSON object into an instance:

class EmployeeQuerySet(models.query.QuerySet):
    def with_current_employment_period(self):
        current_employment_period = EmploymentPeriod.objects.filter(
            employee=OuterRef('pk')
        ).order_by('-start')[:1]

        return self.annotate(
            current_employment_period=ToJSONB(current_employment_period)
        )


class Employee(models.Model):
    # field definitions...

    objects = EmployeeQuerySet.as_manager()

    @property
    def current_employment_period(self):
        return getattr(self, '_current_employment_period', None)

    @current_employment_period.setter
    def current_employment_period(self, value):
        if value:
          self._current_employment_period = EmploymentPeriod(**value)

And now we can query and include the current employment period (as an instance), without having to have an extra query:

>>> Employee.objects.with_current_employment_period()[0].current_employment_period
<EmploymentPeriod: 2019-01-01  >

To improve this further, we could allow assigning an EmploymentPeriod object to the value (instead of just the dict), and if that is the case, we could write that value to the database, but that’s probably going to play havoc with constraints that would prevent overlaps.