Too many rows!

We had an interesting problem at work today.

It seems that the sequence on one of our tables had exceeded 231 (2147483648), and since the primary key was an SERIAL column, this was problematic. From Numeric Types, we can see that only 4 bytes were used. Not enough.

This was presenting some problems, was was only limited to two aspects of the system, neither of which meant that it was worth bringing down the rest of the system to fix it.

Since the obvious fix would have resulted in downtime of somewhere between 20 minutes and an hour, we discarded that:

ALTER TABLE big_problem_here
ALTER COLUMN id TYPE BIGINT;

We tried that on our staging database, which had far fewer rows. That took 20 minutes to rewrite the table, during which time the entire database was essentially out of order.

Instead, we came up with a different solution:

Create a new table, which is identical to the other table (including using the same sequence: this is very important), except has the bigger integer type:

CREATE TABLE big_problem_here_fixed (
  id BIGINT NOT NULL PRIMARY KEY DEFAULT nextval('big_problem_here_id_seq'::regclass),
  user_id INTEGER NOT NULL,
  ...
);

ALTER TABLE big_problem_here_fixed
ADD CONSTRAINT user_id_refs_id_6ccf0120
FOREIGN KEY (user_id) REFERENCES auth_user (id)
DEFERRABLE INITIALLY DEFERRED;

CREATE INDEX big_problem_here_fixed_user_id
ON big_problem_here_fixed(user_id);

Then, we can copy the data from the old table into the new one. This is safe, because we can’t have any new rows inserted into the old table at the moment anyway, as all writes to it occur in a transaction, and there are no cases (other than a celery task, which only runs late at night) where an update or delete is not accompanied by at least one new row.

If this happens to you: you would need to ensure that there are not any rows being updated or deleted whilst you are doing the copy, otherwise you would lose those changes.

INSERT INTO big_problem_here_fixed SELECT * FROM big_problem_here;

This part took about an hour. I’m not sure if it took longer than the staging rewrite because there is more to do in this case, or just because there is more data.

Finally, the last part. We can rename both tables in a single transaction, so there won’t be any errors from missing tables between when we rename the first and the second.

BEGIN;
  ALTER TABLE big_problem_here RENAME TO big_problem_here_replaced;
  ALTER TABLE big_problem_here_fixed RENAME TO big_problem_here;
COMMIT;