Close Contacts and Community Transmission

The Australian government (and specifically the South Australian government) has made a big deal about how, so far at least, there has been virtually no “Community Transmission”: that it’s all been imports from overseas, other states, or “Close Contacts”.

But what do the terms “Close Contacts” and “Community Transmission” actually mean? There is a gap between what normal people would assume they mean and what the governments are using them to mean.

To me, a “Close Contact” would be a member of my immediate family, or perhaps one of the three people who share my office. Possibly even the other six people in my company who work in the same place as me, since the air we all breathe is the same, and we share the same kitchen and bathroom facilities. You’d probably also count my extended family: although I don’t see them often, when I do see them we are in close physical proximity. Indeed, even our close friends, whom we don’t see as often as we would like (the sort we catch up with every few months for dinner), would count as close contacts.

A generous expansion of this might include the parents we see and chat with every morning at school drop-off. And of course, our kids’ close contacts would include their teacher and their closest friends. Note the word close.

So, you’d think that “Community Transmission” would include everyone else. People you come across in service roles, like the guy at the sushi bar, or your barista. Maybe the neighbour you chat to while walking your dog. The other parents you stop briefly to say hello to. The Principal of the school, who you talk to perhaps once a week.

But, according to The Conversation AU, these people would be counted by the official tally as “Close Contacts”.

I think this reveals how serious a mismatch there is between the government’s current position and communication, and reality. Keeping schools open will not, according to their definition, cause Community Transmission. It cannot, because by definition this would count as Close Contacts.

Perhaps this is part of the reason they want to keep schools open. Being able to claim that cases are still being spread only through Close Contacts, when in reality the virus is being transmitted through the school COMMUNITY.

Maybe I’m turning into a tin-hatter, but I feel like our government is not trying hard enough to reduce the growth in number of cases. At this stage, the number of cases in South Australia is speeding up: it was growing at 23%, but now seems to be growing at 33%. Sure, some of this is better testing, but it doesn’t seem at this stage that many of the measures they have started to take have had much impact on the spread.

Lockdown: Day 6

So I guess this is becoming a bit of a journal. I missed yesterday though… ;)

The Australian government announced further economic aid to people who are newly unemployed or under-employed due to COVID-19. Notably, this included doubling the Jobseeker allowance (or whatever it’s currently called). This is surely a kick in the guts for the previously unemployed. Does it not concede that the previous amount was way less than what an average person needs to survive? But now that there are going to be significant numbers of people who until a week ago were fully employed, the amount doubles.

Interesting.

We are going to take the boys out for a walk in the local National Park. I don’t know how draconian lockdown rules are going to be in the future. Keeping schools open seems to suggest they don’t really give a fuck though.

Our internet was pretty crappy all day yesterday; and a bit intermittent on Friday. Right now, it’s totally gone - allegedly there is an issue at some place that is up the chain from me: they have turned the power off there and are working on it. Hopefully it comes back on today.

Using Zoom for remote Cello lessons

With COVID-19 starting to cause significant lockdowns, our music lessons have moved to online. Our piano and cello teachers are both using Zoom for this.

Piano was fine, but the Cello teacher noticed that when my eldest son was playing some notes (especially the lower ones), the audio was cutting out. When he mentioned it was fine at the start of the note, but dropped in the middle, and then came back at the end, it triggered something in my brain.

Turns out Zoom has automatic background noise cancellation. This was picking up the low sounds of the Cello and blocking them.

This is fairly easy to fix:

Settings → Audio → Advanced

Then, set both types of background noise suppression to disabled:

[Screenshot: Zoom’s Settings → Audio → Advanced pane]

I’m not sure which of the two matters more (you’d think it was probably the intermittent one), but it was easy enough in our case just to leave them both disabled.

Lockdown: Day 4

It’s Friday. End of our first week of working and schooling from home.

We bought a new iPad for the boys, which arrived within 25 hours. They are now using it attached to the trampoline; the old iPad was not working with that, so that’s a bonus. We also bought a water filter, so it’s been a productive week.

Working from home has been fine. I’m sure I’m not as productive as I was in previous weeks, but I don’t know how much of that is COVID-19 in general playing on my mind.

Australia’s infection rate is still above what my 23% compounding increase model suggests. https://covid-19-au.github.io is currently showing 854 confirmed cases, whilst my simple predictions said 835 by today. Watch it go over 1000 tomorrow.

According to the Guardian, our rates are slightly better than most other countries, although only Japan is an outlier there. Since that’s a logarithmic scale, we are perhaps even better off than it seems. Not sure how much of that is just insufficient testing though. On the same page, you can see that our total number of deaths is low: however, there is nothing that shows deaths-per-infection.

SA still sits at 50, which is about 30% better than predicted. However, we are not doing nearly as much testing as NSW.

Recommendations are to stay at home all weekend. I suspect I will be taking the boys somewhere - perhaps we could go for a walk in BNP.

Lockdown: Day 3

Took the boys down to the park this afternoon. They hadn’t left the house since Monday evening.

The school adjacent to the park was closed (due to a case of COVID-19), but otherwise, it didn’t feel that different. There were a couple of other groups of people there, and we ran/scooted/played for about an hour.

Getting them out of the house was really good - not only did they get to burn a bunch of energy, but it seems to have made them better able to play together this evening. They’ve been really good so far in terms of not fighting that much, but it is only day 3.

I’d be so happy if we turn out to have gone way too gung-ho on this - if South Australia (and Australia) manage to get this outbreak under control in weeks, rather than many months, I’ll be happy to cop shit for going overboard and pulling the kids out of school. But the number of cases has continued to grow at a rate higher than 23% per day.

My modelling, which is simplistic, looks like this:

    date    │ infected
────────────┼──────────
 2020-03-17 │      449
 2020-03-18 │      552
 2020-03-19 │      679
 2020-03-20 │      835
 2020-03-21 │     1027
 2020-03-22 │     1263
 2020-03-23 │     1553
 2020-03-24 │     1910
 2020-03-25 │     2349
 2020-03-26 │     2889
 2020-03-27 │     3553
 2020-03-28 │     4370
 2020-03-29 │     5375
 2020-03-30 │     6611
 2020-03-31 │     8132
 2020-04-01 │    10002

That’s stopping at 10k, by April 1st.

Currently, it’s 2020-03-19, and there are 691 confirmed cases.
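The table above is just 23% daily compounding with rounding; a few lines of Python (my reconstruction, not anything from the original model code) reproduce it exactly from the 449 cases on 2020-03-17:

```python
from datetime import date, timedelta

# 23% daily compounding growth, rounded to whole cases each day.
infected = 449                 # confirmed cases on 2020-03-17 (first table row)
day = date(2020, 3, 17)

for _ in range(16):
    print(f"{day} | {infected:5d}")
    infected = round(infected * 1.23)
    day += timedelta(days=1)
```

The last row printed is 2020-04-01 with 10002 cases, matching the table.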

Lockdown: Day 2

I really don’t know how this is going to play out.

It seems that the Australian government just doesn’t want to shut schools because they know when they do shut them, they won’t be re-opening them for many months. What they’d prefer to do is hold that off as long as possible, at the cost of people’s lives.

Currently, in Australia, the number of cases of COVID-19 appears to be growing at a rate exceeding 23% per day.

At that rate, we’ll have over 1000 cases by the weekend, and 10k within a fortnight or so. Within three weeks, there will no longer be anywhere near enough intensive care beds or ventilators to support the number of critically ill patients, and we’ll start to see Italy-like levels of mortality.

How will schools stay open when teachers are dying?


Our parents don’t seem to realise how dangerous this is.

My parents are apparently still planning on travelling from a rural town of 6000 people (they actually live on a farm, so that’s even better), where they are relatively safe, to Adelaide to attend a funeral. I understand that it’s one of their friends that died, but to be honest, there will be a lot more of those in the not too distant future.

And my partner’s mother was planning on going to the hairdresser tomorrow.


Personally, we have savings, and I have a stable job (for now). But a large number of our customers are in the service industry, and I have little doubt that quite a few of them will be out of business in short order.

There will be a new normal, we just don’t know what that is going to look like.

Lockdown: Day One

The Australian government is in denial.

It’s becoming clear that the global health crisis from COVID-19 is very likely to worsen quite quickly. From information I have come across from reliable medical sources (mostly from the ABC), it seems that our government is probably not doing enough to slow the growth of cases, and that the health risks to all age groups are significant, not just the elderly or vulnerable. Similarly, it feels like our health system has every likelihood of being overwhelmed by the increasing case load of critically ill people.

Because I have the “luxury” of being able to work from home (and still be gainfully employed), I am now doing so; additionally, we are keeping our children home from school. Whilst there is probably very little practical risk from any two students attending one specific school, every person you come into contact with, and more so every person you remain in contact with for a significant period of time, is an increased risk.

Other countries have been recommending against gatherings of more than 5 people, on the basis that groups that size are sufficient to continue the spread of the virus. Whilst it may seem that Australia (and Adelaide specifically) has less risk right now than some European and Asian countries, we have already seen Victoria declare a State of Emergency. Yet our government maintains that gatherings of 500 people are still safe.

The number of confirmed cases in Australia has reached a higher proportion of the general population than that of the United States, which has many states and cities implementing strict shutdown policies.

The actual number of confirmed cases has started to grow exponentially, with an increase of almost 50% in one day (20 to 29 on March 16th). The school 400m away from where I live has a confirmed case - the principal of my children’s school has a child that attends there. The school my niece attends has a confirmed case.

Around 4000 doctors sent a petition to the government asking for strict lockdowns, fearing that our outcomes could be worse than that of Italy.

The government continues to maintain that this disease is just not that risky for young people.

This is just a flat-out lie.

The only reason we have not had deaths of young people in Australia is just a matter of numbers. Currently, anyone who has had severe respiratory issues has been able to receive artificial respiration. This will change, as there are only so many respirators in Australian hospitals. As more people are infected, more will get serious complications, and some of them will miss out: either on a respirator altogether, or, later in the epidemic, perhaps even on a bed.

Children appear not to be having severe complications, or even especially strong symptoms, but there is evidence that asymptomatic people shed a higher amount of the virus, and children continue to have the virus in stool samples well after they are deemed to be “clear” according to a throat swab.


So, my kids, partner and I stayed home today. I think it’s still fun for them - we have tried to help them understand the scale of this problem (“We could be here, just the four of us together for at least two weeks, maybe even longer”), but the novelty (oops) of being home-schooled has yet to wear off.

I’m thankful I have the means and opportunity to work from home, and have my immediate family here with me. We aren’t going to see the boys’ grandparents for some weeks, and I’m sure we’ll all go batshit insane from being around one another non-stop for an indefinite period of time.

Stay safe.

Modelling a residential battery in Postgres

Having five-minute interval generation and consumption data from my PV system means that, in theory at least, I should be able to do some predictive modelling of what adding a battery to the system will do to my import and export levels, and more importantly, determine how much money having a battery of a specific capacity and efficiency would save me.

There are several approaches that could be used to model a battery. They all boil down to the same main idea:

  • The battery has a fixed capacity
  • The battery has a current level
  • If generating more power than consuming, this excess could be fed into the battery (assuming the battery is not full)
  • If the battery is full, any excess generation is exported
  • If we are consuming more power than generating, and we have battery level > 0, then we draw power from the battery
  • Any power we are drawing from the battery is divided by the round-trip efficiency of the battery
  • Any excess consumption that cannot be supplied by the battery results in imports

There are some assumptions that this model makes.

  • There is no current (as in Amperes, not “now”) limit to either exporting or importing
  • If we are both importing and exporting, the battery can handle both of these. In practice, most batteries take some time to switch between charging and discharging
  • The usable capacity is the same as the battery capacity

In all of these cases, the limitations we are not addressing would more than likely result in a poorer outcome than the model will predict.
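Before reaching for SQL, the rules above can be sketched as a single-interval update in plain Python (a sketch only: the function and parameter names are mine, and it applies the whole round-trip penalty on discharge, as the rules above do):

```python
def battery_step(level, capacity, efficiency, export, import_):
    """Apply one interval of the simple battery model.

    export:  energy this interval that would otherwise be exported (Wh)
    import_: energy this interval that would otherwise be imported (Wh)
    Returns (new_level, import_after_battery, export_after_battery).
    """
    # Charge: soak up excess generation until the battery is full.
    to_battery = min(export, capacity - level)
    level += to_battery
    export -= to_battery

    # Discharge: supply excess consumption, paying the round-trip
    # efficiency penalty on the energy drawn out of storage.
    from_battery = min(import_, level * efficiency)
    level -= from_battery / efficiency
    import_ -= from_battery

    return level, import_, export
```

For a 13500Wh battery at 90% efficiency: charging 1000Wh leaves nothing to export, and later supplying 500Wh of consumption drains roughly 556Wh of stored energy.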

So, on to the technical details of creating the model. In the first instance, I tried using a recursive view (using in this case the 13.5kWh/90% efficient Tesla Powerwall):

CREATE OR REPLACE RECURSIVE VIEW battery_status(
  timestamp,
  capacity,
  efficiency,
  level,
  import,
  export
) AS

SELECT '2019-03-07'::TIMESTAMP,
       13500 AS capacity,
       0.90 AS efficiency,
       0::NUMERIC AS level,
       0::NUMERIC AS import,
       0::NUMERIC AS export

 UNION ALL

SELECT battery.timestamp + INTERVAL '5 minutes',
       battery.capacity,
       battery.efficiency,
       GREATEST(
         0,
         battery.level + delta.amount_to_battery - delta.amount_from_battery / battery.efficiency
       )::NUMERIC(12, 3) AS level,
       (summary.import - delta.amount_from_battery)::NUMERIC(8, 3) AS import,
       (summary.export - delta.amount_to_battery)::NUMERIC(8, 3) AS export
  FROM battery_status battery
  LEFT OUTER JOIN summary ON (summary.timestamp = battery.timestamp + INTERVAL '5 minutes')
 INNER JOIN LATERAL (
   -- The energy we can supply to the load is limited by what is stored (less
   -- the round-trip efficiency penalty); the energy we can store is limited
   -- by the remaining capacity.
   SELECT LEAST(summary.import, battery.level * battery.efficiency) AS amount_from_battery,
          LEAST(summary.export, battery.capacity - battery.level) AS amount_to_battery
 ) delta ON (true)

Whilst this sort-of worked, it was unsatisfactory in a couple of ways. Firstly, it took ages to run: way longer than I expected, even when I used materialised views and changed the interval to hourly instead of five-minutely.

Additionally, because I have some gaps in my data (for instance, when my power was disconnected, or the inverter was off because I was cleaning the panels, or whatever), the view stopped recursing at this point, so I was never actually able to get a result that went for more than a couple of months. Even generating missing values seemed to be insufficient, so at some point I gave up on this.

I even tried the same approach on a daily summary - this was never going to give me anything close to an accurate result, but at that point I was grasping at straws, so I tried a bunch of things.

It seemed unusual that it was taking as long as it did (dozens of seconds, and that was only to build a few months of data). It should only need to pass over the summary data once, and store…that gave me an idea. I could write a fairly simple Python generator function to perform the modelling.

def battery_status(capacity, efficiency):
    # The simulation is order-dependent, so make the interval ordering explicit.
    result = plpy.execute('SELECT * FROM summary ORDER BY "timestamp"')
    energy = 0

    for row in result:
        export_amount = row['export']
        if export_amount and energy < capacity:
            energy += export_amount
            if energy > capacity:
                export_amount = energy - capacity
                energy = capacity
            else:
                export_amount = 0
        import_amount = row['import']
        if import_amount and energy > 0:
            energy -= import_amount / efficiency
            if energy < 0:
                import_amount = -energy * efficiency
                energy = 0
            else:
                import_amount = 0
        yield (row['timestamp'], energy, import_amount, export_amount)

Following it through, we can see some benefits to this approach. We don’t need to pass the capacity and efficiency (or the current energy level in the battery) through to each iteration: because this is a generator, the state remains in the function, and those variables are just re-used over and over again.

The only thing that might be unusual there is the plpy.execute('SELECT * FROM summary'): this hints that this code is actually going inside a plpythonu function and living inside the database. We can use a composite type to make the return values easier to deal with:

CREATE TYPE battery_status AS (
    timestamp TIMESTAMP,
    stored NUMERIC,
    import NUMERIC,
    export NUMERIC
);

CREATE FUNCTION battery_status(capacity NUMERIC, efficiency NUMERIC)
RETURNS SETOF battery_status AS $$

# The simulation is order-dependent, so make the interval ordering explicit.
result = plpy.execute('SELECT * FROM summary ORDER BY "timestamp"')
energy = 0

for row in result:
    export_amount = row['export']
    if export_amount and energy < capacity:
        energy += export_amount
        if energy > capacity:
            export_amount = energy - capacity
            energy = capacity
        else:
            export_amount = 0
    import_amount = row['import']
    if import_amount and energy > 0:
        energy -= import_amount / efficiency
        if energy < 0:
            import_amount = -energy * efficiency
            energy = 0
        else:
            import_amount = 0
    yield (row['timestamp'], energy, import_amount, export_amount)

$$ LANGUAGE plpythonu STRICT STABLE;  -- STABLE, not IMMUTABLE: the function reads from a table

Again, I’ll reiterate that in this context we apply both charging and discharging in the same period (which could happen), but this assumes that there is zero turnaround time required between charge and discharge, and that there is no limit on the charge and discharge rates. We could probably improve the model to take those two into account fairly easily though.

To revisit what the other data stores look like, let’s have a look at my summary view:

CREATE OR REPLACE VIEW summary AS

SELECT "timestamp",
       COALESCE(generation.energy, 0::double precision) AS generation,
       COALESCE(import.energy, 0) AS import,
       COALESCE(export.energy, 0) AS export,
       COALESCE(import.energy, 0)::double precision
           + COALESCE(generation.energy, 0::double precision)
           - COALESCE(export.energy, 0)::double precision AS consumption
  FROM generation
  JOIN import USING ("timestamp")
  JOIN export USING ("timestamp")

My generation table just contains the generation for each interval: the import and export are both views that use the cumulative import/export totals and calculate the value by subtracting the previous row:

CREATE OR REPLACE VIEW import AS

SELECT import_cumulative."timestamp",
       import_cumulative.energy
           - lag(import_cumulative.energy, 1) OVER (ORDER BY import_cumulative."timestamp")
           AS energy
  FROM import_cumulative
 ORDER BY import_cumulative."timestamp";
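The lag() differencing is the SQL spelling of a very simple idea; in Python terms (illustrative numbers only):

```python
# Cumulative meter readings (Wh) at consecutive intervals...
cumulative = [0, 120, 300, 300, 480]

# ...become per-interval energy by subtracting the previous reading.
interval = [b - a for a, b in zip(cumulative, cumulative[1:])]
```

This yields [120, 180, 0, 180]: one fewer interval value than readings, just as lag() produces NULL for the first row.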

We can create a materialised view of the battery status per-day, since that’s all we will need for the purposes of cost calculations:

CREATE MATERIALIZED VIEW daily_summary_battery AS

SELECT battery_status."timestamp"::date AS "timestamp",
       sum(battery_status.import) AS import,
       sum(battery_status.export) AS export
  FROM battery_status(13500::numeric, 0.9)
       battery_status("timestamp", stored, import, export)
 GROUP BY (battery_status."timestamp"::date)
 ORDER BY (battery_status."timestamp"::date);

So, let’s have a look at what results we get.

Refreshing the materialised view takes me around 9 seconds (this is quite stable), and I currently have about 106000 records (370 days of data, 288 records per day if we got them all). This is nice: it’s not immediate, but not going to be a deal-breaker if I need to run a bunch of different models of different battery capacity or efficiencies.

I can use the same cost functions from my actual cost calculator to look at what this battery model will do to my costs:

CREATE VIEW daily_battery_cost AS
SELECT daily_summary_battery."timestamp" AS date,
       daily_summary_battery.import,
       daily_summary_battery.export,
       cost(retailer.*,
            daily_summary_battery.import::integer,
            daily_summary_battery.export::integer) AS cost,
       billing_periods.period
  FROM daily_summary_battery
  JOIN billing_periods ON daily_summary_battery."timestamp" <@ billing_periods.period
  JOIN retailer ON retailer.name = billing_periods.retailer
 ORDER BY daily_summary_battery."timestamp";

And for comparisons in my billing periods:

CREATE VIEW billing_period_battery_costs AS
SELECT daily_battery_cost.period,
       sum(daily_battery_cost.cost) AS cost
  FROM daily_battery_cost
 GROUP BY daily_battery_cost.period
 ORDER BY daily_battery_cost.period;

CREATE VIEW billing_period_battery_savings AS
SELECT costs.period,
       costs.cost - battery.cost AS savings
  FROM billing_period_costs costs
  JOIN billing_period_battery_costs battery USING (period)
 ORDER BY costs.period;

And for a per-day saving: it works out to around $1.05 per day.

Which is ~$380 per year.

A Tesla Powerwall 2 battery would cost me in excess of $5000 (excluding installation, but including the SA battery rebate). That’s still almost a 15-year payback time.
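Spelling that arithmetic out (the $1.05/day is the modelled saving from above; the $5000 is an approximate after-rebate price, not a quote):

```python
daily_saving = 1.05        # modelled saving, dollars per day
battery_cost = 5000        # approximate Powerwall 2 cost after rebate, excluding install

annual_saving = daily_saving * 365
payback_years = battery_cost / annual_saving

print(f"${annual_saving:.0f}/year, {payback_years:.1f} year payback")
```

That comes out at about $383 per year and a 13-year payback; installation costs on top push it out towards 15 years.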


What’s quite amusing is that I distinctly remember calculating what I thought a battery would save me shortly after getting my PV system installed. I have such a strong recollection of it that I can remember exactly where I was: I was out running at lunch time, and was partway up the hill at the back of Flinders Uni.

And my in-head calculations were that it would save me $1 per day.


I’m actually more interested in a DC coupled battery system, partly because of the reduction in round-trip efficiency losses, but more so because I should be able to add more panels that would just feed the battery. I’m especially interested in the Fronius Gen24 inverters and a compatible battery, however there aren’t clear details on what that would cost. The other benefit is that SA currently has a 10kW inverter limit, and this includes the inverter in an AC coupled battery like the Tesla Powerwall; a DC coupled battery shares the one inverter.

These are likely to come midway through this year. I’ll be interested to see the pricing.

I think I’ll need to do a bit more on my calculations to be able to model this: my panel_factor calculations have a 5kW limit - this goes away, and instead the limit applies after taking the battery charging into account. But this can wait for another day.

QuerySets of various models

In general, the first thing we try to do to reduce the load time of a page in Django is to reduce the number of queries we are making to the database. This often has the greatest impact, sometimes orders of magnitude greater than other improvements.

One problem I recently hit with a specific set of pages was that there are potentially seven different models that may have zero or more items for a given request. This could mean we do seven queries that are all empty.

These are all distinct models, but in this case they are used for the same purpose, and we always need all of them, if there are any relevant records. Whilst there are seven models today, there may be more in the future.

Each of these models has a specific method that validates some data against the rules applicable to that model and its stored attributes. But each model has a different set of attributes, and future models will more than likely have different attributes again.

There are at least three different ways we could have solved this problem. It turns out we have solved similar problems in the past the first two ways, and the third one is the one I’ve used in this case.


Solution 1: Store all data in the same model, and use a JSON field to store the attributes specific to this “class”.

This also requires a “type” field of some sort. Then, when loading data, we have the model use this type field to work out which attributes should apply.

This has a bunch of problems. First and foremost, it becomes far more difficult to get the database (postgres, in this case) to apply integrity constraints. It’s not impossible, but a check constraint that inspects the type field and then applies a JSON expression is much harder to read. Changing a constraint, assuming it’s done using a check constraint and not a trigger, is still possible, but is likely to be harder to understand.

Secondly, it no longer becomes “automatic” to get Django Model Form support. It’s not impossible to use a Django Model Form, but you need to work a bit harder to get the initial data in, and ensure that the cleaned data for the fields is applied to the JSON field correctly.

Finally, as hinted above, using a “type” field means the logic for building the checks is more complex, unless you use class swizzling and proxy models or similar to have a different class for each type value. If an instance was accidentally updated to the wrong type, then all sorts of things could go wrong.

This was the first solution we used for our external integrations, and whilst convenient at some level, turned out to be much harder to manage than distinct model classes. It is not subject to the problem that is the basis of this article: we can always fetch objects of different logical types, as it’s all the same model class. Indeed, to only fetch a single class, we need to perform extra filtering.


Solution 2: Use concrete/multi-table inheritance.

This is the solution we moved to with our external integrations, and has been much nicer than the previous solution. Instead of having a JSON field with an arbitrary blob of data in it, we have distinct fields. This makes it much easier to have unique constraints, as well as requiring values, or values of a specific type. Knowing that the database is going to catch us accidentally putting a text value into the external id field for a system that requires an integer is reassuring.

This overcomes the second problem. We can now just use a Django ModelForm, and this makes writing views much easier. The validation for a given field or set of fields lives on the model, and where possible also in the database, as an exclusion or check constraint.

It also overcomes the third problem. We have distinct classes, which can have their own methods. We don’t need to try to use some magical proxy model code, and it’s easy for new developers to follow.

Finally, thanks to django-model-utils InheritanceManager, we can fetch all of our objects using the concrete parent model, and the .select_subclasses() method to downcast to our required class.

There are a couple of drawbacks to using concrete inheritance. Any fetch of an instance will perform a JOIN in your database, but more importantly, it’s impossible to perform a bulk_create() for models of these types.


Solution 3: Use a Postgres VIEW and JSONB to perform one query, and reconstruct models.

In the problem I’ve recently solved, we had a bunch of different models that, although being used in similar ways, didn’t have that much similarity. They were pre-existing, and it wasn’t worth the effort to move them to concrete inheritance, and using JSON fields for all data is not something I would repeat.

Instead, I came up with an idea that seems novel, based on some previous work I’ve done converting JSON data into models:

CREATE VIEW combined_objects AS

SELECT 'foo.Foo' AS model,
       foo.id,
       foo.tenant_id,
       TO_JSONB(foo) AS data
  FROM foo

 UNION ALL

SELECT 'foo.Bar' AS model,
       bar.id,
       baz.tenant_id,
       TO_JSONB(bar) AS data
  FROM bar
 INNER JOIN baz ON (baz.id = bar.baz_id)

 UNION ALL

SELECT 'qux.Qux' AS model,
       qux.id,
       tenant.id,
       TO_JSONB(qux) AS data
  FROM tenant
  JOIN qux ON (true)

This builds up a postgres view that contains all of the data from the model, but in a generic way. It also contains a tenant_id, which in this case was the mechanism that we’ll be using to filter the ones that are required at any given time. This can be a field on a model (as shown in the first subquery), or a field on a related model (as shown in the second). It could even be every object in a table for every tenant, as shown in the third.

From there, we need a model that will recreate the model instances correctly:

class CombinedObjectQuerySet(models.query.QuerySet):
    def for_tenant(self, tenant):
        return self.filter(tenant=tenant)

    def as_model_instances(self):
        return [x.instance for x in self]


class CombinedObject(models.Model):
    model = models.TextField()
    tenant = models.ForeignKey('tenant.Tenant', on_delete=models.DO_NOTHING)
    data = JSONField()

    objects = CombinedObjectQuerySet.as_manager()

    class Meta:
        managed = False
        db_table = 'combined_objects'

    def __str__(self):
        return '{} wrapper ({})'.format(self.model, self.instance)

    def __eq__(self, other):
        return self.instance == other.instance

    @property
    def model_class(self):
        return apps.get_model(*self.model.split('.'))

    @cached_property
    def instance(self):
        return self.model_class(**self.data)

This works great, as long as you don’t need to apply additional annotations, typecast to Python values, or deal with related objects. That is where it starts to get a bit tricky.

We can handle annotations and typecasting:

@cached_property
def instance(self):
    data = self.data
    model_class = self.model_class
    field_data = {
        field.name: field.to_python(data[field.name])
        for field in model_class._meta.fields
        if field.name in data
    }
    instance = model_class(**field_data)
    for attr, value in data.items():
        if attr not in field_data:
            setattr(instance, attr, value)
    return instance

There’s still the issue of foreign keys in the target models: in this case I know the code is not going to traverse these and trigger extra database hits. We could look at omitting those fields to prevent that being possible, but this is working well enough for now.

Postgres VIEW from Django QuerySet

It’s already possible, given an existing Postgres (or other database) VIEW, to stick a Django Model in front of it, and have it fetch data from that instead of a table.

Creating the views can currently be done using raw SQL (and a RunSQL migration operation), or using some helpers to store the SQL in files for easy versioning.

It would be excellent if it was possible to use Django’s ORM to actually generate the VIEW, and even better if you could make the migration autodetector generate migrations.

But why would this be necessary? Surely, if you were able to create a QuerySet instance that contains the items in your view, that should be good enough?

Not quite, because currently using the ORM it is not possible to perform the following type of query:

SELECT foo.a,
       foo.b,
       bar.d
  FROM foo
  INNER JOIN (
    SELECT baz.a,
           ARRAY_AGG(baz.c) AS d
      FROM baz
     GROUP BY baz.a) bar ON (foo.a = bar.a)

That is, generating a join to a subquery is not possible in the ORM. You could probably get away with a correlated Subquery here, but it would likely not perform as well as the join: a subquery in the SELECT clause is evaluated once for each row, whereas the joined subquery is evaluated only once.
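That evaluation difference can be mimicked in plain Python (a sketch with lists standing in for the foo and baz tables, not ORM code; a real query planner may be able to decorrelate, but the principle holds):

```python
foo = [{'a': 1}, {'a': 2}, {'a': 3}]
baz = [{'a': 1, 'c': 'x'}, {'a': 1, 'c': 'y'}, {'a': 2, 'c': 'z'}]

evaluations = []


def aggregate_for(a):
    # Correlated subquery: re-evaluated for every outer row.
    evaluations.append(a)
    return [row['c'] for row in baz if row['a'] == a]


correlated = [dict(f, d=aggregate_for(f['a'])) for f in foo]
# len(evaluations) == 3: once per row in foo.

# Join against a pre-aggregated "view": built once, then looked up per row.
bar = {}
for row in baz:
    bar.setdefault(row['a'], []).append(row['c'])

joined = [dict(f, d=bar[f['a']]) for f in foo if f['a'] in bar]
# joined == [{'a': 1, 'd': ['x', 'y']}, {'a': 2, 'd': ['z']}]
```

Note the inner join also drops rows with no match (a=3), whereas the correlated version keeps them with an empty aggregate, the same semantic difference you would see in SQL.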

So, we could use a VIEW for the subquery component:

CREATE OR REPLACE VIEW bar AS

SELECT baz.a,
       ARRAY_AGG(baz.c) AS d
  FROM baz
 GROUP BY baz.a;

And then stick a model in front of that, and join accordingly:

SELECT foo.a,
       foo.b,
       bar.d
  FROM foo
 INNER JOIN bar ON (foo.a = bar.a)

The Django model for the view would look something like:

class Bar(models.Model):
    a = models.OneToOneField(
        'foo.Foo',
        on_delete=models.DO_NOTHING,
        primary_key=True,
        related_name='bar'
    )
    d = django.contrib.postgres.fields.ArrayField(
        base_field=models.TextField()
    )

    class Meta:
        managed = False

The on_delete=models.DO_NOTHING is important: without it, a delete of a Foo instance would trigger an attempted delete of a Bar instance - which would cause a database error, because it’s coming from a VIEW instead of a TABLE.

Then, we’d be able to use:

queryset = Foo.objects.select_related('bar')

That’s the logic behind needing to be able to join to a subquery, and it becomes even more compelling if you need that subquery/view to filter the objects, or perform some other expression or operation. So, how can we make Django emit the SQL that will enable us to handle that?

There are two problems:

  • Turn a queryset into a VIEW.
  • Get the migration autodetector to trigger VIEW creation.

The other day I came across Create Table As Select in Django, and it made me realise that we can use basically the same logic for creating a view. So, we can create a migration operation that will perform this for us:

class CreateOrReplaceView(migrations.Operation):
    def __init__(self, view_name, queryset):
        self.view_name = view_name
        self.queryset = queryset

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        queryset = self.queryset
        compiler = queryset.query.get_compiler(using=schema_editor.connection.alias)
        sql, params = compiler.as_sql()
        sql = 'CREATE OR REPLACE VIEW {view} AS {sql}'.format(
            view=schema_editor.connection.ops.quote_name(self.view_name),
            sql=sql
        )
        schema_editor.execute(sql, params)

    def state_forwards(self, app_label, state):
        pass

We can then use this operation in a migration, passing it the queryset that defines the view.

This doesn’t really solve how to define the queryset for the view, nor give us a mechanism for detecting changes to that queryset (so we can generate a new migration if necessary). It also means we have a queryset written directly into our migration operation. We won’t be able to leave it like that: because of app-loading issues, you won’t be able to import model classes while the migration is being set up. Even if you could, you shouldn’t be accessing them during a migration anyway; you should use the models from the ProjectState that is tied to your current position in the migration graph.

What would be excellent is if we could write something like:

class Bar(models.Model):
    a = models.OneToOneField(
        'foo.Foo',
        on_delete=models.DO_NOTHING,
        primary_key=True,
        related_name='bar',
    )
    d = django.contrib.postgres.fields.ArrayField(
        base_field=models.TextField()
    )

    class Meta:
        managed = False

    @classmethod
    def get_view_queryset(cls):
        return Baz.objects.values('a').annotate(d=ArrayAgg('c'))

And then, if we change our view definition:

@classmethod
def get_view_queryset(cls):
    return Baz.objects.values('a').filter(
        c__startswith='qux',
    ).annotate(
        d=ArrayAgg('c')
    )

… we would want a migration operation generated that includes the new queryset, or at least be able to know that it has changed. Ideally, we’d want to have a queryset attribute inside the Meta class in our model, which could itself be a property. However, that’s not possible without making changes to Django itself.

In the meantime, we can borrow the pattern used by RunPython to have a callable that is passed some parameters during application of the migration, which returns the queryset. We can then have a migration file that looks somewhat like:

def view_queryset(apps, schema_editor):
    Baz = apps.get_model('foo', 'Baz')

    return Baz.objects.values('a').filter(
        c__startswith='qux'
    ).annotate(
        d=ArrayAgg('c')
    )


class Migration(migrations.Migration):
    dependencies = [
        ('foo', '0001_initial'),
    ]

    operations = [
        migrations.CreateModel(
            name='Bar',
            fields=[
                ('a', models.OneToOneField(...)),
                ('d', ArrayField(base_field=models.TextField(), ...)),
            ],
            options={
                'managed': False,
            }
        ),
        CreateOrReplaceView('Bar', view_queryset),
    ]

We still need to have the CreateModel statement so Django knows about our model, but the important bit in this file is the CreateOrReplaceView, which references the callable.

Now for the actual migration operation.

class CreateOrReplaceView(migrations.Operation):
    def __init__(self, model, queryset_factory):
        self.model = model
        self.queryset_factory = queryset_factory

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        model = from_state.apps.get_model(app_label, self.model)
        queryset = self.queryset_factory(from_state.apps, schema_editor)
        compiler = queryset.query.get_compiler(using=schema_editor.connection.alias)
        sql, params = compiler.as_sql()
        sql = 'CREATE OR REPLACE VIEW {view_name} AS {query}'.format(
            view_name=schema_editor.connection.ops.quote_name(model._meta.db_table),
            query=sql,
        )
        schema_editor.execute(sql, params)

    def state_forwards(self, app_label, state):
        # The view itself has no representation in the project state.
        pass

The backwards migration is not quite a solved problem yet: I do have a working solution that steps up the stack to determine what the current migration name is, and then finds the previous migration that contains one of these operations for this model, but that’s a bit nasty.


There’s no clean way to inject ourselves into the migration autodetector and “notice” when we need to generate a new version of the view. However, we can leverage the checks framework to notify the user when the view queryset is out of date compared to the latest migration.

from django.apps import apps
from django.core.checks import Warning, register

@register()
def check_view_definitions(app_configs, **kwargs):
    errors = []

    if app_configs is None:
        app_configs = apps.app_configs.values()

    for app_config in app_configs:
        errors.extend(_check_view_definitions(app_config))

    return errors

And then we need to implement _check_view_definitions:

def get_out_of_date_views(app_config):
    app_name = app_config.name

    view_models = [
        model
        # We need the real app_config, not the migration one.
        for model in apps.get_app_config(app_name.split('.')[-1]).get_models()
        if not model._meta.managed and hasattr(model, 'get_view_queryset')
    ]

    for model in view_models:
        latest = get_latest_queryset(model)
        current = model.get_view_queryset()

        if latest is None or current.query.sql_with_params() != latest.query.sql_with_params():
            yield MissingViewMigration(
                model,
                current,
                latest,
                Warning(W003.format(app_name=app_name, model_name=model._meta.model_name), id='sql_helpers.W003'),
            )


def _check_view_definitions(app_config):
    return [x.warning for x in get_out_of_date_views(app_config)]

The last puzzle piece there is get_latest_queryset, which is a bit more complicated:

def get_latest_queryset(model, before=None):
    from django.db.migrations.loader import MigrationLoader
    from django.db import connection

    migration_loader = MigrationLoader(None)
    migrations = dict(migration_loader.disk_migrations)
    migration_loader.build_graph()
    state = migration_loader.project_state()
    app_label = model._meta.app_label
    root_node = dict(migration_loader.graph.root_nodes())[app_label]
    # We want to skip any migrations in our reverse list until we have
    # hit a specific node: however, if that is not supplied, it means
    # we don't skip any.
    seen_before = not before
    for node in migration_loader.graph.backwards_plan((app_label, root_node)):
        if node == before:
            seen_before = True
            continue
        if not seen_before:
            continue
        migration = migrations[node]
        for operation in migration.operations:
            if (
                isinstance(operation, CreateOrReplaceView) and
                operation.model.lower() == model._meta.model_name.lower()
            ):
                return operation.queryset_factory(state.apps, connection.schema_editor())

This also has code to allow us to pass in a node (before), which limits the search to migrations that occur before that node in the forwards migration plan.
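The skip-until-seen traversal at the heart of that loop can be exercised in isolation. A sketch, with migration nodes replaced by plain strings, newest first (as `backwards_plan()` yields them); `latest_matching` and `has_view` are hypothetical names for this illustration:

```python
def latest_matching(nodes, predicate, before=None):
    # If `before` is not supplied, don't skip anything;
    # otherwise skip nodes until we have passed it.
    seen_before = before is None
    for node in nodes:
        if node == before:
            seen_before = True
            continue
        if not seen_before:
            continue
        if predicate(node):
            return node


# Newest-first, mirroring the order of a backwards plan.
plan = ['0003_change_view', '0002_create_view', '0001_initial']


def has_view(name):
    return 'view' in name
```

With no `before`, this returns `'0003_change_view'`; with `before='0003_change_view'`, the search resumes after that node and returns `'0002_create_view'`, which is exactly what we need when comparing the current queryset against the one in the previous migration.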

Since we already have the bits in place, we could also have a management command that creates a stub migration (without the queryset factory, that’s a problem I haven’t yet solved). I’ve built this into my related “load SQL from files” app.


This is still a bit of a work in progress, but writing it down helped me clarify some concepts.