It’s already possible, given an existing Postgres (or other database) VIEW, to stick a Django Model in front of it, and have it fetch data from that instead of a table.
Creating the views can currently be done using raw SQL (and a RunSQL migration operation), or using some helpers to store the SQL in files for easy versioning.
It would be excellent if it was possible to use Django’s ORM to actually generate the VIEW, and even better if you could make the migration autodetector generate migrations.
But why would this be necessary? Surely, if you were able to create a QuerySet instance that contains the items in your view, that should be good enough?
Not quite, because currently using the ORM it is not possible to perform the following type of query:
SELECT foo.a,
       foo.b,
       bar.d
  FROM foo
 INNER JOIN (
        SELECT baz.a,
               ARRAY_AGG(baz.c) AS d
          FROM baz
         GROUP BY baz.a) bar ON (foo.a = bar.a)
That is, the ORM cannot generate a join to a subquery. In this case, you could probably get away with a correlated Subquery, but that would probably not perform as well as a join. This is because a subquery in the SELECT list is typically evaluated once for each row, whereas a joined subquery is evaluated once.
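To make the two query shapes concrete, here is a small sketch using Python's built-in sqlite3 module rather than Postgres. The table contents are invented for illustration, and COUNT stands in for ARRAY_AGG (which SQLite lacks), so this shows only the shape of the queries and says nothing about their relative performance.

```python
import sqlite3

# In-memory database with made-up rows; COUNT stands in for ARRAY_AGG.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (a INTEGER PRIMARY KEY, b TEXT);
    CREATE TABLE baz (a INTEGER, c TEXT);
    INSERT INTO foo (a, b) VALUES (1, 'one'), (2, 'two');
    INSERT INTO baz (a, c) VALUES (1, 'x'), (1, 'y'), (2, 'z');
""")

# Join to a subquery: the aggregate is computed once, then joined.
join_rows = conn.execute("""
    SELECT foo.a, foo.b, bar.d
      FROM foo
     INNER JOIN (
            SELECT baz.a, COUNT(baz.c) AS d
              FROM baz
             GROUP BY baz.a) bar ON (foo.a = bar.a)
     ORDER BY foo.a
""").fetchall()

# Correlated subquery: the inner SELECT refers to the outer row.
correlated_rows = conn.execute("""
    SELECT foo.a,
           foo.b,
           (SELECT COUNT(baz.c) FROM baz WHERE baz.a = foo.a) AS d
      FROM foo
     ORDER BY foo.a
""").fetchall()
```

With this data both queries return the same rows. They would differ for a foo row with no matching baz rows: the inner join drops it, while the correlated version keeps it with a count of 0.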
So, we could use a VIEW for the subquery component:
CREATE OR REPLACE VIEW bar AS
SELECT baz.a,
       ARRAY_AGG(baz.c) AS d
  FROM baz
 GROUP BY baz.a;
And then stick a model in front of that, and join accordingly:
SELECT foo.a,
       foo.b,
       bar.d
  FROM foo
 INNER JOIN bar ON (foo.a = bar.a)
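As a rough illustration of the view-plus-join approach, the following sketch again uses sqlite3 with invented rows and COUNT in place of ARRAY_AGG. SQLite has no CREATE OR REPLACE VIEW, so the sketch drops and recreates the view instead; on Postgres the statement above would be used as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (a INTEGER PRIMARY KEY, b TEXT);
    CREATE TABLE baz (a INTEGER, c TEXT);
    INSERT INTO foo (a, b) VALUES (1, 'one'), (2, 'two');
    INSERT INTO baz (a, c) VALUES (1, 'x'), (2, 'z');

    -- Stand-in for CREATE OR REPLACE VIEW, which SQLite does not support.
    DROP VIEW IF EXISTS bar;
    CREATE VIEW bar AS
        SELECT baz.a, COUNT(baz.c) AS d
          FROM baz
         GROUP BY baz.a;
""")

JOIN_SQL = """
    SELECT foo.a, foo.b, bar.d
      FROM foo
     INNER JOIN bar ON (foo.a = bar.a)
     ORDER BY foo.a
"""

before = conn.execute(JOIN_SQL).fetchall()
# The view is evaluated at query time, so new rows in baz show up
# without the view being recreated.
conn.execute("INSERT INTO baz (a, c) VALUES (1, 'y')")
after = conn.execute(JOIN_SQL).fetchall()
```

The join now reads like a join to an ordinary table, which is what lets a plain model sit in front of it.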
The Django model for the view would look something like:
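import django.contrib.postgres.fields
from django.db import models


class Bar(models.Model):
    a = models.OneToOneField(
        'foo.Foo',
        on_delete=models.DO_NOTHING,
        primary_key=True,
        related_name='bar'
    )
    d = django.contrib.postgres.fields.ArrayField(
        base_field=models.TextField()
    )

    class Meta:
        # Django must not try to create or alter a table for this model.
        managed = False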
The on_delete=models.DO_NOTHING is important: without it, a delete of a Foo instance would trigger an attempted delete of a Bar instance - which would cause a database error, because it’s coming from a VIEW instead of a TABLE.
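The failure mode can be sketched without Django: a cascading delete ends up issuing a DELETE against the related relation, and the database rejects that for an aggregate view like this one. The snippet below uses sqlite3 as a stand-in; the exact error class and message on Postgres will differ, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE baz (a INTEGER, c TEXT);
    INSERT INTO baz (a, c) VALUES (1, 'x');
    CREATE VIEW bar AS
        SELECT baz.a, COUNT(baz.c) AS d FROM baz GROUP BY baz.a;
""")

# Roughly what a cascade from Foo to Bar would attempt.
try:
    conn.execute("DELETE FROM bar WHERE a = 1")
except sqlite3.OperationalError as exc:
    delete_error = str(exc)
else:
    delete_error = None
```

With DO_NOTHING, Django skips that DELETE entirely, and the view simply stops returning the row once the underlying data is gone.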
Then, we’d be able to use:
queryset = Foo.objects.select_related('bar')
So, that’s the logic behind needing to be able to do a subquery, and it becomes even more compelling if you need that subquery/view to filter the objects, or perform some other expression/operation. So, how can we make Django emit code that will enable us to handle that?
There are two problems:
- Turn a queryset into a VIEW.
- Get the migration autodetector to trigger VIEW creation.
The other day I came across Create Table As Select in Django, and it made me realise that we can use basically the same logic for creating a view. So, we can create a migration operation that will perform this for us:
from django.db.migrations.operations.base import Operation


class CreateOrReplaceView(Operation):
    def __init__(self, view_name, queryset):
        self.view_name = view_name
        self.queryset = queryset

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        queryset = self.queryset
        # Compile the queryset to SQL for the connection being migrated.
        compiler = queryset.query.get_compiler(using=schema_editor.connection.alias)
        sql, params = compiler.as_sql()
        sql = 'CREATE OR REPLACE VIEW {view} AS {sql}'.format(
            view=schema_editor.connection.ops.quote_name(self.view_name),
            sql=sql
        )
        schema_editor.execute(sql, params)

    def state_forwards(self, app_label, state):
        # The view does not change the project state.
        pass
We can then use this operation in a migration, passing it the name of the view and a queryset.
This doesn’t really solve how to define the queryset for the view, nor does it give us a mechanism for detecting changes made to that queryset (so we can generate a new migration if necessary). It also means we have a queryset written in our migration operation. We won’t be able to leave it like that: due to loading issues, you won’t be able to import model classes during the migration setup - and even if you could, you shouldn’t be accessing them during a migration anyway - you should use models from the ProjectState, which is tied to where in the migration graph you currently are.
What would be excellent is if we could write something like:
class Bar(models.Model):
    a = models.OneToOneField(
        'foo.Foo',
        on_delete=models.DO_NOTHING,
        primary_key=True,
        related_name='bar',
    )
    d = django.contrib.postgres.fields.ArrayField(
        base_field=models.TextField()
    )

    class Meta:
        managed = False

    @property
    def view_queryset(self):
        return Baz.objects.values('a').annotate(d=ArrayAgg('c'))
And then, if we change our view definition:
    @property
    def view_queryset(self):
        return Baz.objects.values('a').filter(
            c__startswith='qux',
        ).annotate(
            d=ArrayAgg('c')
        )
… we would want a migration operation generated that includes the new queryset, or at least be able to know that it has changed. Ideally, we’d want to have a queryset attribute inside the Meta class in our model, which could itself be a property. However, that’s not possible without making changes to Django itself.
In the meantime, we can borrow the pattern used by RunPython to have a callable that is passed some parameters during application of the migration, which returns the queryset. We can then have a migration file that looks somewhat like:
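from django.contrib.postgres.aggregates import ArrayAgg
from django.contrib.postgres.fields import ArrayField
from django.db import migrations, models

# CreateOrReplaceView is the custom operation shown further down;
# import it from wherever it lives in your project.


def view_queryset(apps, schema_editor):
    # Use the historical model from the migration state, not a direct import.
    Baz = apps.get_model('foo', 'Baz')
    return Baz.objects.values('a').filter(
        c__startswith='qux'
    ).annotate(
        d=ArrayAgg('c')
    )


class Migration(migrations.Migration):
    dependencies = [
        ('foo', '0001_initial'),
    ]

    operations = [
        migrations.CreateModel(
            name='Bar',
            fields=[
                ('a', models.OneToOneField(...)),
                ('d', ArrayField(base_field=models.TextField(), ...)),
            ],
            options={
                'managed': False,
            }
        ),
        CreateOrReplaceView('Bar', view_queryset),
    ]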
We still need to have the CreateModel statement so Django knows about our model, but the important bit in this file is the CreateOrReplaceView, which references the callable.
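The point of passing a callable rather than a queryset is that the model lookup is deferred until the operation is applied, against whichever app registry is current at that point in the migration graph. A minimal sketch of that idea in plain Python follows; FakeApps and DeferredViewOperation are hypothetical stand-ins, not Django classes, and the "queryset" is just a string.

```python
class FakeApps:
    """Hypothetical stand-in for the historical app registry."""

    def __init__(self, models):
        self._models = models

    def get_model(self, app_label, model_name):
        return self._models[(app_label, model_name.lower())]


def view_queryset(apps, schema_editor):
    # The model is looked up at apply time, never imported at module level.
    Baz = apps.get_model('foo', 'Baz')
    return 'queryset over {}'.format(Baz)


class DeferredViewOperation:
    """Hypothetical operation that stores the factory, not its result."""

    def __init__(self, model, queryset_factory):
        self.model = model
        self.queryset_factory = queryset_factory

    def apply(self, apps, schema_editor=None):
        return self.queryset_factory(apps, schema_editor)


operation = DeferredViewOperation('Bar', view_queryset)

# The same operation resolves against different historical states.
old_state = FakeApps({('foo', 'baz'): 'Baz as of 0001'})
new_state = FakeApps({('foo', 'baz'): 'Baz as of 0002'})

old_result = operation.apply(old_state)
new_result = operation.apply(new_state)
```

The operation object is built once when the migration module is loaded, but what the factory returns depends on the state it is handed later.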
Now for the actual migration operation.
from django.db.migrations.operations.base import Operation


class CreateOrReplaceView(Operation):
    def __init__(self, model, queryset_factory):
        self.model = model
        self.queryset_factory = queryset_factory

    def state_forwards(self, app_label, state):
        # CreateModel has already registered the model, so the project
        # state does not change. The base class raises if this is missing.
        pass

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        model = from_state.apps.get_model(app_label, self.model)
        queryset = self.queryset_factory(from_state.apps, schema_editor)
        compiler = queryset.query.get_compiler(using=schema_editor.connection.alias)
        sql, params = compiler.as_sql()
        sql = 'CREATE OR REPLACE VIEW {view_name} AS {query}'.format(
            view_name=schema_editor.connection.ops.quote_name(model._meta.db_table),
            query=sql,
        )
        schema_editor.execute(sql, params)
The backwards migration is not quite a solved problem yet: I do have a working solution that steps up the stack to determine what the current migration name is, and then finds the previous migration that contains one of these operations for this model, but that’s a bit nasty.
There’s no (clean) way to inject ourselves into the migration autodetector and “notice” when we need to generate a new version of the view. However, we can leverage the checks framework to notify the user when our view queryset is out of date compared to the latest migration.
from django.apps import apps
from django.core.checks import register


@register()
def check_view_definitions(app_configs, **kwargs):
    errors = []
    if app_configs is None:
        app_configs = apps.app_configs.values()
    for app_config in app_configs:
        errors.extend(_check_view_definitions(app_config))
    return errors
And then we need to implement _check_view_definitions:
from django.core.checks import Warning

# MissingViewMigration and W003 are used here but not shown in this post.


def get_out_of_date_views(app_config):
    app_name = app_config.name
    view_models = [
        model
        # We need the real app_config, not the migration one.
        for model in apps.get_app_config(app_name.split('.')[-1]).get_models()
        if not model._meta.managed and hasattr(model, 'get_view_queryset')
    ]
    for model in view_models:
        latest = get_latest_queryset(model)
        current = model.get_view_queryset()
        # Compare both the SQL text and its parameters.
        if latest is None or current.query.sql_with_params() != latest.query.sql_with_params():
            yield MissingViewMigration(
                model,
                current,
                latest,
                Warning(W003.format(app_name=app_name, model_name=model._meta.model_name), id='sql_helpers.W003'),
            )


def _check_view_definitions(app_config):
    return [x.warning for x in get_out_of_date_views(app_config)]
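The comparison at the heart of that check can be sketched on its own. The helper name view_is_out_of_date is made up for illustration, and the tuples are hand-written examples of the (sql, params) pair that sql_with_params() returns, not output captured from Django.

```python
def view_is_out_of_date(current, latest):
    """Return True when a new view migration is needed.

    Both arguments are (sql, params) tuples; latest is None when no
    migration has created the view yet.
    """
    return latest is None or current != latest


# Hand-written example pairs, shaped like sql_with_params() output.
sql = 'SELECT a, ARRAY_AGG(c) AS d FROM baz WHERE c LIKE %s GROUP BY a'
in_migration = (sql, ('qux%',))
unchanged = (sql, ('qux%',))
changed_filter = (sql, ('quux%',))

never_created = view_is_out_of_date(unchanged, None)
same = view_is_out_of_date(unchanged, in_migration)
different = view_is_out_of_date(changed_filter, in_migration)
```

Comparing the parameters as well as the SQL text matters: changing only the filter value leaves the SQL string identical, so a text-only comparison would miss it.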
The last puzzle piece there is get_latest_queryset, which is a bit more complicated:
def get_latest_queryset(model, before=None):
    from django.db.migrations.loader import MigrationLoader
    from django.db import connection

    migration_loader = MigrationLoader(None)
    migrations = dict(migration_loader.disk_migrations)
    migration_loader.build_graph()
    state = migration_loader.project_state()
    app_label = model._meta.app_label
    root_node = dict(migration_loader.graph.root_nodes())[app_label]
    # We want to skip any migrations in our reverse list until we have
    # hit a specific node: however, if that is not supplied, it means
    # we don't skip any.
    seen_before = not before
    for node in migration_loader.graph.backwards_plan((app_label, root_node)):
        if node == before:
            seen_before = True
            continue
        if not seen_before:
            continue
        migration = migrations[node]
        for operation in migration.operations:
            if (
                isinstance(operation, CreateOrReplaceView) and
                operation.model.lower() == model._meta.model_name.lower()
            ):
                return operation.queryset_factory(state.apps, connection.schema_editor())
This also has code to allow us to pass in a node (before), which limits the search to migrations that occur before that node in the forwards migration plan.
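The skipping logic is easier to see in isolation. The sketch below walks a hand-written newest-first plan using plain tuples and strings; operations_before and the sample data are hypothetical, and nothing here touches Django's migration graph.

```python
def operations_before(backwards_plan, operations_by_node, before=None):
    """Yield (node, operation) pairs, newest first.

    When `before` is given, everything up to and including that node
    is skipped; when it is None, nothing is skipped.
    """
    seen_before = before is None
    for node in backwards_plan:
        if node == before:
            seen_before = True
            continue
        if not seen_before:
            continue
        for operation in operations_by_node.get(node, []):
            yield node, operation


# Newest migration first, as in a backwards plan.
plan = [('foo', '0003'), ('foo', '0002'), ('foo', '0001')]
operations = {
    ('foo', '0003'): ['view v3'],
    ('foo', '0001'): ['view v1'],
}

latest = next(operations_before(plan, operations))
previous = next(operations_before(plan, operations, before=('foo', '0003')))
```

Here latest is the operation from 0003, while passing before=('foo', '0003') finds the one from 0001. That is the lookup a backwards migration would need in order to restore the prior view definition.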
Since we already have the bits in place, we could also have a management command that creates a stub migration (without the queryset factory, that’s a problem I haven’t yet solved). I’ve built this into my related “load SQL from files” app.
This is still a bit of a work in progress, but writing it down helped me clarify some concepts.