Bowled over by Postgres

There’s a nice article at Bowled Over by SQL Window Functions by a chap called Dwain Camps. It’s written from the perspective of T-SQL, which has some differences to Postgres’s DDL and querying. I’ve reworked his stuff into what feels nice for me from a Postgressy perspective.

I’ll recap a couple of things he mentions, but you’ll probably want to head there and read that first.

  • Strike: all pins knocked down on the first ball of a frame. Scores 10 (from this frame), plus the total of whatever the next two balls knock down.
  • Spare: all pins knocked down on the second ball of a frame. Scores 10 (from this frame), plus how ever many pins you knock down on the next ball.
  • Open: at least one pin remains standing at the end of the frame. Score is how ever many pins you knocked down.

By convention, only the running tally is shown on the scoresheet. I’ve kept the frame score as the score for this frame only, and the total will contain the running total.


The first thing I do a bit differently is to use Postgres’ DOMAIN structures, which enables us to remove some of the check constraints, and simplify some others:

CREATE SCHEMA bowling;

CREATE DOMAIN bowling.frame_number AS integer
  CHECK ('[1,10]'::int4range @> VALUE)
  NOT NULL;

CREATE DOMAIN bowling.ball AS integer
  CHECK ('[0,10]'::int4range @> VALUE);

So, now we have two integer domains: the frame number may be between 1 and 10, and the ball pin count may be null, or between 0 and 10.

We’ll start by recreating the table structure: initially without constraints:

CREATE TABLE bowling.frame
(
  game_id INTEGER NOT NULL,
  player_id INTEGER NOT NULL,
  frame bowling.frame_number NOT NULL,
  ball1 bowling.ball NOT NULL,
  ball2 bowling.ball NULL,
  ball3 bowling.ball NULL,
  PRIMARY KEY (game_id, player_id, frame)
);

Not much different to the original, other than using those fresh new domain types.

The other approach I’ve used is to use more constraints, but make them simpler. I’m also relying on the fact that X + NULL => NULL, which means we can leave off a heap of the constraint clauses.

We’ll start by preventing the (non-final) frames from exceeding the number of available pins. In the case of final frame, we still only allow a spare unless we have a strike already.

ALTER TABLE bowling.frame
ADD CONSTRAINT max_spare_unless_frame_10_strike CHECK
(
  ball1 + ball2 <= 10 OR (frame = 10 AND ball1 = 10)
);

This is as simple as it can be. Because ball 2 may be null but ball 1 may not, and each ball must be no greater than 10, this is enough to encapsulate the requirement. There is one slightly incorrect circumstance: a value of (10, 0) would be valid, which is strictly incorrect (ball 2 was never bowled). In the case of the calculations it’s all correct, but if a 0 was bowled immediately after a strike, it may be possible to insert that as ball 2, which would be misleading.

ALTER TABLE bowling.frame
ADD CONSTRAINT ball_2_never_bowled_after_strike CHECK
(
  ball2 IS NULL OR ball1 < 10 OR frame = 10
);

We can now prevent ball 3 from being set unless we are on frame 10.

ALTER TABLE bowling.frame
ADD CONSTRAINT ball_3_only_in_frame_10 CHECK
(
  ball3 IS NULL OR frame = 10
);

A follow-up to the previous constraint: we only get to bowl ball 3 if we have bowled ball 2, and scored a strike or spare already. Note, we may have a strike on the first ball, which means we might have more than 10.

ALTER TABLE bowling.frame
ADD CONSTRAINT ball3_only_if_eligible CHECK
(
  ball3 IS NULL OR (ball2 IS NOT NULL AND ball1 + ball2 >= 10)
);

Finally, we have some specific allowable conditions for that last ball. We already know that we must have scored a strike or spare with the first two balls, but we need to know how many pins are available to us.

If we scored a spare or two strikes, then we may have any number up to 10 to score now. Otherwise, we have however many pins were left by ball 2 (which means ball 1 must have been a strike).

ALTER TABLE bowling.frame
ADD CONSTRAINT ball3_max_spare_or_strike CHECK
(
  ball2 + ball3 <= 10
  OR
  ball1 + ball2 = 20
  OR
  ball1 + ball2 = 10
);

I find those constraints much easier to read that the original ones.


I’ve written a view, that uses a couple of Common Table Expressions (CTEs), as well as the window functions Dwain discussed.

CREATE OR REPLACE VIEW bowling.frame_score AS (
  WITH pin_counts AS (
    SELECT
      game_id,
      player_id,
      frame,
      ball1, ball2, ball3,
      -- Get the first ball from the next frame.
      -- Used by strike and spare.
      LEAD(ball1, 1) OVER (
        PARTITION BY game_id, player_id
        ORDER BY frame
      ) AS next_ball_1,
      -- Get the second ball from the next frame.
      -- Used by strike.
      LEAD(ball2, 1) OVER (
        PARTITION BY game_id, player_id
        ORDER BY frame
      ) AS next_ball_2,
      -- Get the first ball from the next next frame.
      -- Used by double-strike.
      LEAD(ball1, 2) OVER (
        PARTITION BY game_id, player_id
        ORDER BY frame
      ) AS next_next_ball_1
    FROM bowling.frame
  ),
  frame_counts AS (
    SELECT
      game_id,
      player_id,
      frame,
      ball1, ball2, ball3,
      CASE
      -- We will start with frame 10: when we have a strike
      -- or spare, we get all three balls.
      WHEN frame = 10 AND ball1 + ball2 >= 10 THEN
        ball1 + ball2 + ball3
      -- On a strike, we get the next two balls. This could
      -- be from the next frame, or include the first ball
      -- of the frame after that. Note that in frame 9, we will
      -- also look at the second ball from fram 10, rather than
      -- looking for a non-existent frame 11.
      WHEN ball1 = 10 THEN
        ball1 + next_ball_1 + (
          CASE WHEN next_ball_1 = 10 AND frame < 9 THEN
            next_next_ball_1
          ELSE
            next_ball_2
          END
        )
      -- In the case of a spare, grab the next ball. We already
      -- handled a spare on frame 10 above.
      WHEN ball1 + ball2 = 10 THEN
        ball1 + ball2 + next_ball_1
      -- Otherwise, it's just the two balls we bowled in this frame.
      ELSE
        ball1 + ball2
      END AS score
    FROM pin_counts
  )

  -- We have everything we need in the previous CTE, except that
  -- shows us the frame score, rather than the running tally.
  -- We need to do that in another window function here, but
  -- only calculate a value when the frame's wave function has
  -- collapsed (ie, it's score is known).
  SELECT
    frame_counts.*,
    CASE WHEN score IS NOT NULL THEN
      SUM(score) OVER (
        PARTITION BY game_id, player_id
        ORDER BY frame
        ROWS UNBOUNDED PRECEDING
      )
    ELSE NULL END AS total
  FROM frame_counts
);

Again, I think this is a simpler query, and easier to read. But, I guess I wrote it.

We can insert the same data as used there, and look at our results:

-- Game 1
INSERT INTO bowling.frame VALUES
  (1, 1, 1, 7, 2, NULL),
  (1, 1, 2, 3, 7, NULL),
  (1, 1, 3, 6, 4, NULL),
  (1, 1, 4, 10, NULL, NULL),
  (1, 1, 5, 10, NULL, NULL),
  (1, 1, 6, 10, NULL, NULL),
  (1, 1, 7, 9, 1, NULL),
  (1, 1, 8, 10, NULL, NULL),
  (1, 1, 9, 8, 1, NULL),
  (1, 1, 10, 6, 3, NULL);

-- Game 2
INSERT INTO bowling.frame VALUES
  (2, 1, 1, 10, NULL, NULL),
  (2, 1, 2, 3, 7, NULL),
  (2, 1, 3, 10, NULL, NULL),
  (2, 1, 4, 6, 4, NULL),
  (2, 1, 5, 10, NULL, NULL),
  (2, 1, 6, 9, 1, NULL),
  (2, 1, 7, 10, NULL, NULL),
  (2, 1, 8, 8, 2, NULL),
  (2, 1, 9, 10, NULL, NULL),
  (2, 1, 10, 7, 3, 10);

-- Game 3
INSERT INTO bowling.frame VALUES
  (3, 1, 1, 10, NULL, NULL),
  (3, 1, 2, 10, NULL, NULL),
  (3, 1, 3, 10, NULL, NULL),
  (3, 1, 4, 10, NULL, NULL),
  (3, 1, 5, 10, NULL, NULL),
  (3, 1, 6, 10, NULL, NULL),
  (3, 1, 7, 10, NULL, NULL),
  (3, 1, 8, 10, NULL, NULL),
  (3, 1, 9, 10, NULL, NULL),
  (3, 1, 10, 10, 10, 10);
$ SELECT * FROM frame_score;

 game_id | player_id | frame | ball1 | ball2  | ball3  | score | total
---------+-----------+-------+-------+--------+--------+-------+-------
       1 |         1 |     1 |     7 |      2 | <NULL> |     9 |     9
       1 |         1 |     2 |     3 |      7 | <NULL> |    16 |    25
       1 |         1 |     3 |     6 |      4 | <NULL> |    20 |    45
       1 |         1 |     4 |    10 | <NULL> | <NULL> |    30 |    75
       1 |         1 |     5 |    10 | <NULL> | <NULL> |    29 |   104
       1 |         1 |     6 |    10 | <NULL> | <NULL> |    20 |   124
       1 |         1 |     7 |     9 |      1 | <NULL> |    20 |   144
       1 |         1 |     8 |    10 | <NULL> | <NULL> |    19 |   163
       1 |         1 |     9 |     8 |      1 | <NULL> |     9 |   172
       1 |         1 |    10 |     6 |      3 | <NULL> |     9 |   181
       2 |         1 |     1 |    10 | <NULL> | <NULL> |    20 |    20
       2 |         1 |     2 |     3 |      7 | <NULL> |    20 |    40
       2 |         1 |     3 |    10 | <NULL> | <NULL> |    20 |    60
       2 |         1 |     4 |     6 |      4 | <NULL> |    20 |    80
       2 |         1 |     5 |    10 | <NULL> | <NULL> |    20 |   100
       2 |         1 |     6 |     9 |      1 | <NULL> |    20 |   120
       2 |         1 |     7 |    10 | <NULL> | <NULL> |    20 |   140
       2 |         1 |     8 |     8 |      2 | <NULL> |    20 |   160
       2 |         1 |     9 |    10 | <NULL> | <NULL> |    20 |   180
       2 |         1 |    10 |     7 |      3 |     10 |    20 |   200
       3 |         1 |     1 |    10 | <NULL> | <NULL> |    30 |    30
       3 |         1 |     2 |    10 | <NULL> | <NULL> |    30 |    60
       3 |         1 |     3 |    10 | <NULL> | <NULL> |    30 |    90
       3 |         1 |     4 |    10 | <NULL> | <NULL> |    30 |   120
       3 |         1 |     5 |    10 | <NULL> | <NULL> |    30 |   150
       3 |         1 |     6 |    10 | <NULL> | <NULL> |    30 |   180
       3 |         1 |     7 |    10 | <NULL> | <NULL> |    30 |   210
       3 |         1 |     8 |    10 | <NULL> | <NULL> |    30 |   240
       3 |         1 |     9 |    10 | <NULL> | <NULL> |    30 |   270
       3 |         1 |    10 |    10 |     10 |     10 |    30 |   300
(30 rows)

Building the average scores for a player is likewise similar. Because I’m using a VIEW, I can jut reference that.

SELECT
  player_id,
  AVG(total) as average
FROM frame_score
WHERE frame=10
GROUP BY player_id;
 player_id |       average
-----------+----------------------
         1 | 227.0000000000000000
(1 row)

I’m fairly sure I’ve rewritten the constraints correctly, but may have missed some. Here are some of the condition tests that show invalid constraints:

$ INSERT INTO bowling.frame VALUES(1, 2, 0, 9, NULL, NULL);
ERROR:  value for domain frame_number violates check constraint "frame_number_check"
Time: 0.405 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 1, 11, NULL, NULL);
ERROR:  value for domain ball violates check constraint "ball_check"
Time: 0.215 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 1, -1, NULL, NULL);
ERROR:  value for domain ball violates check constraint "ball_check"
Time: 0.218 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 1, 8, 3, NULL);
ERROR:  new row for relation "frame" violates check constraint "max_spare_unless_frame_10_strike"
DETAIL:  Failing row contains (1, 2, 1, 8, 3, null).
Time: 0.332 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 1, 8, 1, 1);
ERROR:  new row for relation "frame" violates check constraint "ball3_only_if_eligible"
DETAIL:  Failing row contains (1, 2, 1, 8, 1, 1).
Time: 0.392 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 1, 8, 2, 1);
ERROR:  new row for relation "frame" violates check constraint "ball_3_only_in_frame_10"
DETAIL:  Failing row contains (1, 2, 1, 8, 2, 1).
Time: 0.327 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 10, 8, 3, 1);
ERROR:  new row for relation "frame" violates check constraint "max_spare_unless_frame_10_strike"
DETAIL:  Failing row contains (1, 2, 10, 8, 3, 1).
Time: 0.340 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 10, 8, 2, 11);
ERROR:  value for domain ball violates check constraint "ball_check"
Time: 0.200 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 10, 10, NULL, 10);
ERROR:  new row for relation "frame" violates check constraint "ball3_only_if_eligible"
DETAIL:  Failing row contains (1, 2, 10, 10, null, 10).
Time: 0.316 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 5, 10, 0, NULL);
ERROR:  new row for relation "frame" violates check constraint "ball_2_never_bowled_after_strike"
DETAIL:  Failing row contains (1, 2, 5, 10, 0, null).
Time: 0.307 ms
$ INSERT INTO bowling.frame VALUES(1, 2, 10, 10, 2, 9);
ERROR:  new row for relation "frame" violates check constraint "ball3_max_spare_or_strike"
DETAIL:  Failing row contains (1, 2, 10, 10, 2, 9).
Time: 0.323 ms

Finally, I rewrote the pretty printer. It’s not quite perfect (I don’t like how I get the plus signs at the newline character), but it is workable:

WITH symbols AS (
  SELECT
    game_id, player_id, frame,
    CASE WHEN ball1 = 10 THEN 'X' ELSE ball1::text END as ball1,
    CASE WHEN ball2 IS NULL THEN ' '
         WHEN ball1 + ball2 = 10 THEN '/'
         WHEN ball1 = 10 AND ball2 = 10 THEN 'X'
         ELSE ball2::text
         END as ball2,
    CASE WHEN ball3 IS NULL THEN ' '
    WHEN ball3 = 10 THEN 'X'
    WHEN ball3 + ball2 = 10 THEN '/'
    ELSE ball3::text
    END as ball3,
    lpad(total::text, 5, ' ') as total
  FROM
    frame_score
  ORDER BY game_id, player_id, frame
), grouped_data AS (
  SELECT
    game_id,
    player_id,
    array_agg(ball1) ball1,
    array_agg(ball2) ball2,
    array_agg(ball3) ball3,
    array_agg(total) total
  FROM
    symbols
  GROUP BY
    game_id, player_id
)
SELECT
  game_id,
  player_id,
  ball1[1] || ' | ' || ball2[1] || ' ' || chr(10) || total[1] AS "1",
  ball1[2] || ' | ' || ball2[2] || ' ' || chr(10) || total[2] AS "2",
  ball1[3] || ' | ' || ball2[3] || ' ' || chr(10) || total[3] AS "3",
  ball1[4] || ' | ' || ball2[4] || ' ' || chr(10) || total[4] AS "4",
  ball1[5] || ' | ' || ball2[5] || ' ' || chr(10) || total[5] AS "5",
  ball1[6] || ' | ' || ball2[6] || ' ' || chr(10) || total[6] AS "6",
  ball1[7] || ' | ' || ball2[7] || ' ' || chr(10) || total[7] AS "7",
  ball1[8] || ' | ' || ball2[8] || ' ' || chr(10) || total[8] AS "8",
  ball1[9] || ' | ' || ball2[9] || ' ' || chr(10) || total[9] AS "9",
  ball1[10] || ' | ' || ball2[10] || ' | ' || ball3[10] || ' ' || chr(10) || lpad(total[10], 9, ' ') AS "10"
FROM grouped_data;
 game_id | player_id |   1    |   2    |   3    |   4    |   5    |   6    |   7    |   8    |   9    |     10
---------+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+------------
       1 |         1 | 7 | 2 +| 3 | / +| 6 | / +| X |   +| X |   +| X |   +| 9 | / +| X |   +| 8 | 1 +| 6 | 3 |   +
         |           |     9  |    25  |    45  |    75  |   104  |   124  |   144  |   163  |   172  |       181
       2 |         1 | X |   +| 3 | / +| X |   +| 6 | / +| X |   +| 9 | / +| X |   +| 8 | / +| X |   +| 7 | / | X +
         |           |    20  |    40  |    60  |    80  |   100  |   120  |   140  |   160  |   180  |       200
       3 |         1 | X |   +| X |   +| X |   +| X |   +| X |   +| X |   +| X |   +| X |   +| X |   +| X | X | X +
         |           |    30  |    60  |    90  |   120  |   150  |   180  |   210  |   240  |   270  |       300
(3 rows)

That will do for now. Corrections welcome!

blog comments powered by Disqus