Django Implied Relationship

A little while ago, Alec McGavin put up a post on the Kogan blog about Custom Relationships in Django. This is a really cool way to get a relationship in Django that exists in the database, but cannot be modelled correctly in Django. For instance, this could be data in the database that does not have a Foreign Key, either because it’s legacy data, or because it’s possible either side of the relationship might be added to the database before the other, rather than the normal order in a database where the target table of the FK is always added to first.

However, I have another slighly different situation where an implied relationship exists, but should not be stored directly.

Consider the following data structures:

class Employee(models.Model):
    name = models.TextField()


class EmploymentPeriod(models.Model):
    employee = models.ForeignKey(
        Employee,
        related_name='employment_periods',
        on_delete=models.CASCADE,
    )
    valid_period = DateRangeField()

    class Meta:
        constraints = [
            ExclusionConstraint(
                name='employment_overlap',
                expressions=[
                    ('employee', RangeOperators.EQUAL),
                    ('valid_period', RangeOperators.OVERLAPS),
                ]
            )
        ]


class Shift(models.Model):
    employee = models.ForeignKey(
        Employee,
        related_name='shifts',
        on_delete=models.CASCADE,
    )
    date = models.DateField()
    start_time = models.TimeField()
    duration = models.DurationField()
    employment_period = Relationship(
        EmploymentPeriod,
        from_fields=['employee', 'date'],
        to_fields=['employee', 'valid_period'],
    )

    @property
    def start(self):
        return datetime.datetime.combine(self.date, self.start_time)

    @property
    def finish(self):
        return self.start + self.duration

Now, there is a direct relationship between Shift and Employee, and also between EmploymentPeriod and Employee, but there could be an inferred relatonship between Shift and EmploymentPeriod. Because of the exclusion constraint, we know there will only be one EmploymentPeriod for a given Shift.

It would be really nice to be able to create this relationship, so we can reference the employment period (or lack thereof) directly. The Relationship class above goes close, but tries to use an equality check between the date and date_range fields.

It turns out, we can add a bit to that class, and teach it how to handle the various ways this relationship can be accessed:

  • Shift().employment_period
  • EmploymentPeriod().shifts
  • Shift.objects.select_related(‘employment_period’)
  • EmploymentPeriod.objects.prefetch_related(‘shifts’)

Let’s have a look at the extended class:

from django.db import models


class Relationship(models.ForeignObject):
    """
    Create a django link between models on a field where a foreign key isn't used.
    This class allows that link to be realised through a proper relationship,
    allowing prefetches and select_related.

    Non-exact relationships can also be created, currently this supports
    DateField->DateRangeField. In any case, it is expected that there will
    only be one target object for a given source object (ie, a ForeignKey).

    https://schinckel.net/2021/07/14/django-implied-relationships/

    Based on an idea from:

    https://devblog.kogan.com/blog/custom-relationships-in-django
    """

    def __init__(self, model, from_fields, to_fields, **kwargs):
        super().__init__(
            model,
            on_delete=models.DO_NOTHING,
            from_fields=from_fields,
            to_fields=to_fields,
            null=True,
            blank=True,
            **kwargs,
        )

    def contribute_to_class(self, cls, name, private_only=False, **kwargs):
        # override the default to always make it private
        # this ensures that no additional columns are created
        super().contribute_to_class(cls, name, private_only=True, **kwargs)

    def resolve_related_fields(self):
        """
        We need to filter out fields that will use a non-exact match,
        as the Django ORM uses self.related_fields for the JOIN conditions,
        but also for the filtering when fetching a related instance.
        """
        exact_fields = []
        in_fields = []
        for lhs, rhs in super().resolve_related_fields():
            if get_key(lhs.cached_col, rhs.cached_col) in JOIN_LOOKUPS:
                in_fields.append((lhs, rhs))
            else:
                exact_fields.append((lhs, rhs))
        self._extra_related_fields = in_fields
        return exact_fields

    def get_extra_restriction(self, where_class, alias, remote_alias):
        """
        This method is used to get extra JOIN conditions.

        We don't need to include the exact conditions, only those
        that we filtered out from the regular related_fields.

        """
        if self._extra_related_fields:
            cond = where_class()
            for local, remote in self._extra_related_fields:
                local, remote = local.cached_col, remote.cached_col
                lookup = local.get_lookup(JOIN_LOOKUPS[get_key(local, remote)])(local, remote)
                cond.add(lookup, 'AND')

            return cond

    def get_extra_descriptor_filter(self, instance):
        """
        The sibling to get_extra_restriction, this one is used to get the extra
        filters that are required to limit to the correct objects.
        """
        extra_filter = {}
        for lhs, rhs in self._extra_related_fields:
            lookup = JOIN_LOOKUPS[get_key(rhs.cached_col, lhs.cached_col)]
            extra_filter[f'{rhs.name}__{lookup}'] = getattr(instance, lhs.attname)
        return extra_filter


# It should be possible to just add more field pairs
# here and get functionality for other implied relationship
# types.
JOIN_LOOKUPS = {
    ('DateField', 'DateRangeField'): 'contained_by',
}


def get_key(lhs, rhs):
    return (lhs.output_field.get_internal_type(), rhs.output_field.get_internal_type())

Optus Billing Misleading Chart

I really hate when people use charts or graphs in a misleading manner. It’s even worse when it’s big companies, and in my latest bill from Optus, I noticed a particularly egregious chart.

Chart with really misleading scaling on the axis

For those that don’t notice at first (it’s taken me a bunch of billing months to notice, actually), the middle line is slightly longer than the other two. In that week, I went over my data plan, and had to pay an extra $10 to get more data.

I don’t have a problem with this, but I do have a problem with the scaling on the axis.

The shorter ones are months where I was billed $35. The longer one is where I was billed $45.

That’s a difference of $10, which is ~30%. However, the middle bar in the chart is not even close to 20% longer. It looks to me to be around 20 pixels longer, or around 3.5%

So, when you overspend on your bill, it looks like you are only paying a small amound extra, when in fact I am paying a significant amount more on that given bill.

The only way you could have scaling like this would be if you had a logarithmic scale. Which, granted, many more people are familiar with now than before the pandemic, but you don’t use a logarithmic scale when your values are all of the same order of magnitude, unless you want to mislead people.

I find it pretty unlikely that a high proportion of Optus’s customers have a monthly billing variance that differs by orders of magnitude. I also find it pretty unlikely that this is a mistake - someone at Optus (and I suspect not someone in their tech department, great work dumping on them chat support person) has decided that their charts should be tweaked so that overspends look like they are “not that big”.

Who wants to bet Optus never get back to me about my complaint?