Random, Fixed Ordering and Pagination

Consider the following situation:

We have a set of items that we want to show in a random order. The order, however, should remain fixed for a period. The display of items will need to be paginated, and we have over 200,000 items.

If the ordering is not truly random, then we could have an expression index, and allow ordering based on that. However, that doesn’t really help out with the pagination, and issues around LIMIT/OFFSET ordering of very large sets.

Instead, I came up with a solution this afternoon that uses Postgres Materialised Views.

Let’s start with a Django model:

class Post(models.Model):
    title = models.TextField()
    content = models.TextField()
    posted_at = models.DateTimeField()

We can build up a materialised view that associates each post with a position:

CREATE MATERIALISED VIEW post_ordering AS

SELECT post.id AS post_id,
       row_number() OVER () AS position
  FROM (
    SELECT id FROM blog_post ORDER BY random()
  ) post;

CREATE INDEX post_ordering_id ON post_ordering(post_id);
CREATE INDEX post_ordering_position ON post_ordering(position);

Because a materialised view stores a copy of the data, we need to index it if we want to get performance benefits.

This materialised view is interesting from the pagination perspective because there are no gaps in the position values, even if there are missing post_id values. This means we can use filtering to paginate, instead of having to use the regular slicing notation.

Do note that the ordering needs to be done inside a subquery, otherwise it will not work correctly.

We do need to stick a model in front of this to access it from Django:

class PostPosition(models.Model):
    post = models.OneToOneField(
        primary_key=True,
        related_name='ordering',
        on_delete=models.DO_NOTHING,
    )
    position = models.IntegerField()

    class Meta:
        managed = False
        db_table = 'post_ordering'

Now, because we have a relationship defined, we may filter using this:

page_2 = Post.objects.filter(
    ordering__position__gte=100,
    ordering__position__lt=200
).order_by('ordering__position')

From here, we just need to create a custom paginator to use this instead of the normal slicing.

class PositionPaginator(django.core.paginators.Paginator):
    def page(self, number):
        number = self.validate_number(number)
        bottom = (number - 1) * self.per_page
        top = bottom + self.per_page
        if top + self.orphans >= self.count:
            top = self.count
        return self._get_page(self.object_list.filter(
            ordering__position__gte=bottom,
            ordering__position__lt=top
        ).order_by('ordering__position'), self)

The last piece of the puzzle is getting the refresh of the ordering. In postgres, you just need:

REFRESH MATERIALISED VIEW post_ordering;

Orders please!

Things are better when they are ordered.

Why might we want things ordered?

From a user experience perspective, if things are not ordered the same way upon repeated viewings, they are liable to get confused, and think that things have changed, when really it’s only the order that has changed.

Some time ago, there were versions of Microsoft Office that used to move menu items closer to the start of the menu that had been used more recently and more frequently. This is not “most recent documents”, or onything like that: this was the menu items like “Format” and other tools. There’s a pretty good rationalé for why they stopped doing that!

Why might we need things ordered?

There are a few reasons why we must have things ordered. One good example is the reason Django will complain about a queryset without ordering when you use pagination: if you select the first 10 items when there is no ordering, and then select the next 10 items, there is no guarantee that those items will not have some of the same items as the first set. Without ordering, pagination is useless. There’s no way to page through items and ensure that you see all of them.

Another reason is where you need to process things in a specific order: either to ensure that the first (or last) matching item is “the winner”. We’ll come across one of those shortly.

Ordering by a specific value or set of values (or function over the same) is usually a sane way to order. This could be alphabetically, by date of insertion, or some numeric value, for instance.

How should we order things?

Sometimes, it is obvious how we should order a list of things. Sometimes the order doesn’t matter, and sometimes we want to allow an ordering that doesn’t rely at all on any attributes of the objects in question.

Let’s look at a list of condition names that might be used for a workplace agreement in Australia.

  • Regular Hours
  • Public Holiday
  • Saturday
  • Sunday
  • After 6pm
  • Regular Overtime (first 2 hours)
  • Extended Overtime (more than 2 hours)

For those outside of Australia: under many work circumstances, employees will get paid at a higher rate because of what are called “Penalty” conditions. This could include being paid, for example, a 15% loading on a Saturday, and a 25% loading on a Sunday. We can rewrite this list with the loading percentages:

Condition Penalty Rule
Regular Hours 100 % When no other rules apply
Public Holiday 200 % Any hours on a public holiday
Saturday 125 % Any hours on a Saturday
Sunday 150 % Any hours on a Sunday
After 6pm 117 % Any hours between 6pm and midnight
Regular Overtime 150 % After 8 hours worked in a day or 38 hours in a week
Extended Overtime 200 % After 2 hours of overtime worked, or any overtime on a Sunday

So now we have two possible ways of ordering our list, by name:

  • After 6pm
  • Extended Overtime (more than 2 hours)
  • Public Holiday
  • Regular Hours
  • Regular Overtime (first 2 hours)
  • Saturday
  • Sunday

Or by penalty rate:

  • 100% Regular Hours
  • 117% After 6pm
  • 125% Saturday
  • 150% Regular Overtime (first 2 hours)
  • 150% Sunday
  • 200% Extended Overtime (more than 2 hours)
  • 200% Public Holiday

However, there’s actually a little more complexity than this. We can have the rules in this order, and apply them sequentially, and we should get the right result for an employee’s wages. But, according to our rules, we should be applying Extended Overtime for any overtime hours on a Sunday. So we need some way of controlling the ordering manually.

Specifically, we want to have our conditions ordered in a way that we can check each rule in turn, and ensure that the last rule that is applied will be the one that should apply, if multiple rules do match.

  • Regular hours
  • After 6pm
  • Saturday
  • Sunday
  • Regular Overtime
  • Extended Overtime
  • Public Holiday

So, how do we actually store this custom ordering?

The naive solution is to store an integer for each item, and ensure that we only have one of each value. Let’s look at that as a Django model:

class Condition(models.Model):
    name = models.TextField()
    penalty = models.DecimalField()
    position = models.IntegerField(unique=True)

Then, we can ensure we get our conditions in the correct order by applying an ordering to our queryset:

queryset = Condition.objects.order_by('position')

However, this solution is not without it’s drawbacks. Primarily, to insert an object between two others, we renumber all of the subsequent items:

condition = Condition(name='New condition', position=3, penalty=Decimal('1.20'))
Condition.objects.filter(position__gte=condition.position).update(position=F('position') + 1)
condition.save()

We must ensure that within our database, we are able to suspend constraints until our transaction is complete, or we must perform actions in a specific order. If we care about not having “holes” between our positions, then we also need to renumber remaining objects when we remove one.

Let’s have a look at how we could handle this, with some client and server code. We can use a formset here, which has ordering permitted, and then we just need the JS to make that happen:

ConditionPositionFormSet = modelformset_factory(Condition, ordering=True, fields=())

class OrderConditionsView(FormView):
    form_class = ConditionPositionFormSet

This will result in rewriting all items one by one, which must happen in a transaction, or in the correct order.


Alternatively, we could use a form that just indicates the new position of a given condition, and have that form handle the UPDATE queries: this will result in just two queries instead.

class ConditionPositionForm(forms.ModelForm):
    class Meta:
        model = Condition
        fields = ('position',)


class OrderConditionsView(UpdateView):
    """
    Handle an AJAX request to move one condition to a new position.

    The condition in that position will be moved to the next position, and so on.
    """
    form_class = ConditionPositionForm

    def form_valid(self, form):
        position = form.cleaned_data('position')
        self.get_queryset.filter(position__gte=position)\
                         .exclude(pk=form.instance.pk)\
                         .update(position=F('position') + 1)
        form.save()
        return HttpResponse(json.dumps({'status': 'ok'}))

    def form_invalid(self, form):
      return HttpResponse(json.dumps({
        'status': 'error',
        'message': form.fields['position'].errors,
      }), status_code=409)

    def get_queryset(self):
        return Condition.objects.order_by('position')

# urls.py: simplified

urlpatterns = [
  include('/condition/', [
    path('/condition/<int:pk>/position/', OrderConditionsView.as_view(), name='position'),
    # ...
  ], namespace='condition'),
  # ...
]

In this case, our HTML and JS can just send back the one that was moved, and it’s new position. We’ll want some mechanism for indicating that the move was successful though, because it may not be permitted to move some objects to specific positions.

  <ul id="conditions">
    {% for condition in object_list %}
      <li data-position-url="{% url 'condition:position' condition.pk %}">
        {{ condition.name }} ({{ condition.penalty }}%)
      </li>
    {% endfor %}
  </ul>

  <script>
  // Uses SortableJS: https://github.com/SortableJS/Sortable

  require(['Sortable'], (Sortable) => {
    function save(evt) {
      const condition = evt.item;

      function revert() {
        if (evt.oldIndex < evt.newIndex) {
          // Move back to the old index.
          condition.before(condition.parentElement.children[evt.oldIndex]);
        } else {
          // The old index is actually in the wrong spot now,
          // move back to the old index + 1
          condition.before(condition.parentElement.children[evt.oldIndex + 1]);
        }
      }

      fetch(condition.dataset.positionUrl, {
        method: 'POST',
        body: {position: evt.newIndex},
        credentials: 'same-origin'
      }).then(
        response => response.json()
      ).then(
        response => {
          if (response.status === 'error') {
            // We got an error: let's put the object back into the previous
            // position and then show that error.
            revert();
            // Maybe a nicer way than this though!
            alert(response.message);
          }
        }
      );
    }

    new Sortable(document.getElementById('conditions'), {onUpdate: evt => save});

  });
  </script>

So, are there any other ways we could store the positions of our conditions, which would allow us to insert a new one, or move one around, without having to renumber a bunch of other items?

Why yes, yes we can.

Instead of storing each position as an integer, we can instead store two numbers, and use the ratio of these two numbers as our ordering:

POSITION = ExpressionWrapper(
  models.F('position_a') / models.F('position_b'),
  output_field=models.DecimalField()
)


class PositionQuerySet(models.query.QuerySet):
    def with_position(self):
        return self.annotate(position=POSITION)

    def order_by_position(self):
        return self.order_by(POSITION)


class Condition(models.Model):
    name = models.TextField()
    penalty = models.DecimalField()
    position_a = models.IntegerField()
    position_b = models.IntegerField()

    class Meta:
      unique_together = (
        ('position_a', 'position_b'),  # In fact, this is not quite strict enough, more below.
      )

The logic behind this is that here is always the ability to find a fraction that is between two other fractions. Our initial “first” item can be given the values (1, 1), which will evaluate to 1.0

To place another item, we have three cases:

  • The new value is before the first item in the list.
  • The new value is between two existing items.
  • The new value is after the last item in the list.

Let’s look at how we can calculate the position for the newly added item:

  • Before the first item in the list:

    Increment the denominator (position_b), and use the same numerator (position_a).

    For instance, adding a new first value where our current first value is (1, 1) would give us (1, 2), which evaluates to 0.5, less than 1.0

  • After the last item in the list:

    Increment the numerator (position_a), and use the same denominator (position_b).

    For instance, adding a new value where the last value is (5, 1) would give us (6, 1), which evaluates to 6.0, greater than 5.0

  • Inserting an item between two existing items.

    Use the sum of the two numerators (position_a) as the numerator, and the sum of the two denominators (position_b) as the new denominator. Optionally, you can then simplify this fraction if there are common factors.

    For instance, if we have two values (1, 4) and (5, 1), we can make a new value (6, 5), which evaluates to 1.2, which is between 0.25 and 5.0.

The same operations can be performed on a move: we don’t actually care what the values are, just that the relative values are still consistent. Whenever we move an object, we just change it’s position_a and/or position_b to whatever makes sense for the new position.

<ul id="conditions">
  {% for condition in object_list %}
    <li class="condition"
        data-position-url="{% url 'condition:position' condition.pk %}"
        data-position-a="{{ condition.position_a }}"
        data-position-b="{{ condition.position_b}}">
      {{ condition.name }}
    </li>
  {% endfor %}
</ul>

<script>
  require(['sortable'], (Sortable) => {
    function save(evt) {
      const item = evt.item;
      const prev = condition.previousElementSibling;
      const next = condition.nextElementSibling;

      if (prev == null && next == null) {
        // This is the only item in the list.
        item.dataset.position_a = 1;
        item.dataset.position_b = 1;
      } else if (prev == null) {
        // This is now the first item in the list.
        item.dataset.position_a = next.dataset.position_a;
        item.dataset.position_b = Number(next.dataset.position_b) + 1;
      } else if (next == null) {
        // This is now the last item in the list.
        item.dataset.position_a = Number(prev.dataset.position_a) + 1;
        item.dataset.position_b = prev.dataset.position_b;
      } else {
        // Neither the first nor the last.
        item.dataset.position_a = Number(prev.dataset.position_a) + Number(next.dataset.position_a);
        item.dataset.position_b = Number(prev.dataset.position_b) + Number(next.dataset.position_b);
      }

      function revert() {
        if (evt.oldIndex < evt.newIndex) {
          // Move back to the old index.
          condition.before(condition.parentElement.children[evt.oldIndex]);
        } else {
          // The old index is actually in the wrong spot now,
          // move back to the old index + 1
          condition.before(condition.parentElement.children[evt.oldIndex + 1]);
        }
      }

      // This could do with some error handling...
      fetch(condition.dataset.positionUrl, {
        method: 'POST',
        body: {
          position_a: item.dataset.position_a,
          position_b: item.dataset.position_b
        },
        credentials: 'same-origin'
      }).then(response => response.json()).then(
        response => {
          if (response.status == 'error') {
            revert();
          }
        }
      );
    }


  })
</script>

Ratio Ordering in Postgres

A while ago, I had an idea about how to use a materialised view to handle ordering and pagination. Today in #postgresql on IRC, there was a discussion about using ratio ordering. Which reminded me that I also did some work on that.

It’s possible to use both of these techniques together to allow easy access to ordinal position, and more importantly, fast pagination over large data sets. We can also make use of a few other new postgres features to make things even nicer.

CREATE TABLE condition (
  condition_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name TEXT UNIQUE NOT NULL,
  position_a INTEGER NOT NULL,
  position_b INTEGER NOT NULL,
  ratio_position NUMERIC UNIQUE GENERATED ALWAYS AS (position_a::NUMERIC / position_b::NUMERIC) STORED
);

We can add some data to make sure we have correct behaviour on those generated columns:

INSERT INTO condition (name, position_a, position_b)
VALUES ('One', 1, 1),
       ('Two', 2, 1);

We can also try adding in values that should be prevented:

-- Different position_a/position_b, but position_a / position_b clashes.
INSERT INTO condition (name, position_a, position_b)
VALUES ('Three', 2, 2);

-- Not able to divide by zero.
INSERT INTO condition (name, position_a, position_b)
VALUES ('Three', 2, 0);

Okay, it would be really neat if by default we just inserted a new value at the end of the list. However, this would require us to be able to have a DEFAULT on a pair of columns. Perhaps if we were storing these as a custom type, with a custom sequence, we could do that.

Anyway, onward. We can insert values between two others by doing some simple mathematics:

INSERT INTO condition (name, position_a, position_b)
VALUES (
  'One-point-five',
  1 + 2,  -- The sum of the previous and next items' position_a values
  1 + 1   -- The sum of the previous and next items' position_b values
);

And, we can see that we now have the sortable by the correct field:

SELECT name FROM condition ORDER BY ratio_position;
name
One
One-point-five
Two

(3 rows)

So, what about getting the ordinal positions? We can do that using a row_number() window function, but that’s only useful if we are fetching all of them.

We can re-use the trick of creating a materialised view that contains all of them:

CREATE MATERIALIZED VIEW condition_position AS

SELECT condition_id,
       row_number() OVER () AS position
  FROM condition
  ORDER BY ratio_position;

CREATE UNIQUE INDEX condition_position_condition ON condition_position(condition_id);
CREATE UNIQUE INDEX condition_position_position ON condition_position(position);

And now we can use it:

SELECT *
  FROM condition
 INNER JOIN condition_position USING (condition_id)
 ORDER BY position;

We want this to recalculate whenever any positions change:

CREATE OR REPLACE FUNCTION refresh_positions()
RETURNS TRIGGER AS $$
BEGIN
  EXECUTE 'REFRESH MATERIALIZED VIEW ' || TG_ARGV[0];
  RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;

CREATE TRIGGER condition_refresh_positions
AFTER INSERT OR UPDATE OR DELETE OR TRUNCATE
ON condition
FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_positions('condition_position');

Now we can add more data:

INSERT INTO condition (name, position_a, position_b)
VALUES ('Three', 5, 1);

And fetch our data again:

SELECT *
  FROM condition
 INNER JOIN condition_position USING (condition_id)
 ORDER BY position;
condition_id name position_a position_b ratio_position position
1 One 1 1 1.00000000000000000000 1
2 Two 2 1 2.0000000000000000 2
5 One-point-five 3 2 1.5000000000000000 3
8 Three 5 1 5.0000000000000000 4

(4 rows)