Locking rows in PostgreSQL

Someone asked a question today in #postgresql about locking rows for a long running task, where multiple tasks could be updating rows. There is a flag in each row that indicates whether a row has been processed, but not one indicating if it is currently being processed.

One solution is to use advisory locks to mark that we are currently working on a given row. The advantage of this over having a column that indicates it is currently being processed (or mis-using the “processed” column by setting it before we’ve finished building the data) is that if our task that generates the data fails, the lock is released when the session is closed.

So, let’s look at the things we need to make this happen. We’ll start with something that represents our long-running process:

CREATE OR REPLACE FUNCTION expensive_operation(INTEGER)
RETURNS JSONB AS $$

  BEGIN
    PERFORM pg_sleep(10);
    RETURN JSONB_BUILD_OBJECT(
        'foo', 'bar',
        'baz', $1
    );
  END;

$$ LANGUAGE plpgsql;

And we need our table to put the tasks in (and some unprocessed rows):

CREATE TABLE tasks (
    id SERIAL PRIMARY KEY,
    processed BOOLEAN DEFAULT FALSE,
    data JSONB
);

INSERT INTO tasks (data) VALUES (NULL), (NULL), (NULL);

Finally, we are ready to process our tasks.

WITH next_task AS (
  SELECT t.id, pg_advisory_lock(t.id)
    FROM (
      SELECT id
        FROM tasks
       WHERE NOT processed
         AND id NOT IN (SELECT objid
                          FROM pg_locks
                         WHERE locktype = 'advisory')
       LIMIT 1
    ) t
)
UPDATE tasks
   SET data = expensive_operation(id),
       processed = true
 WHERE id IN (SELECT id FROM next_task)
   AND NOT processed;

The CTE will get a task.id that has not been processed (and is not currently being processed). It will take an advisory lock on that value, which marks the task as in-progress.

Then, when it has the lock, it performs the expensive operation and updates the data (and the processed status).

I’m not totally sure if the AND NOT processed at the bottom of the query is necessary - it seems to me that there could be a situation where two workers try to get the lock at the same time (and therefore are both asking for the same lock), and one of them gets it. The other one, when it finally gets the lock, would now see that row as processed, and so would not update anything.

I’m happy to see a better way to do this.