Fixed-board blunder enumeration


This note documents a benchmark that complements the usual “random blunder” trials. It fixes the opening position and enumerates every legal last draft pick for B (the optimal pick plus every deviation). For each pick it compares A’s payoff in two cases: A re-solves at the Exchange, or A commits to the principal-line gift from the original full-game solution.

It makes the mechanism behind “static vs re-solving” visible: which mistakes change the Crucibles enough that a frozen Exchange action is no longer best?

Two policies at the Exchange (after B’s last pick)

Timing. B’s “blunder” is a suboptimal last draft pick. The draft then finishes and play reaches the Poisoned Gift (the Exchange). Everything below is evaluated after that pick; the draft is not re-run from scratch.

Re-solving (A_rs): at the Exchange, A plays as the solver does on the true position, using an equilibrium of the simultaneous Exchange subgame given the actual Crucibles. In other words, A re-solves the Exchange after the blunder and recomputes the gift/twist choice for the state that actually arose.
Static (A_st): at the Exchange, A does not re-solve. A plays the gift (die index + facing for the opponent) taken from the principal line of the game solved from the opening roll. That gift was computed before B’s mistake was known, and B then best-responds to it. This models deployment: “I cached my Exchange move from the first full solve, and I execute it even though the opponent’s last pick changed the position.”

Why compare them? The gap is not “the solver never re-solves.” It is the difference between oracle-perfect Exchange play and a frozen, ex ante Exchange commitment once the state has moved off the line the cache assumed. Write st − rs for A_st − A_rs. When it is negative, sticking to the old gift is strictly worse for A than recomputing the Exchange on the real Crucibles.
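The following minimal Python sketch shows the two policies on a toy, zero-sum Exchange subgame. Everything in it is hypothetical: the payoff matrices, the helper names, and especially the simplifying assumption that the subgame has a pure-strategy saddle point. The real solver computes a full equilibrium of the simultaneous Exchange, and that equilibrium may be mixed. In the matrices, rows are A’s gift options, columns are B’s replies, and entries are A’s payoff.

```python
from typing import Sequence

# Hypothetical payoff matrix for A in a toy Exchange subgame:
# rows = A's gift options, columns = B's replies, entries = A's payoff.
Matrix = Sequence[Sequence[int]]


def best_gift(m: Matrix) -> int:
    """Index of a gift that maximises A's worst-case payoff (pure maximin)."""
    return max(range(len(m)), key=lambda i: min(m[i]))


def resolve_value(m: Matrix) -> int:
    """A_rs: value of the subgame actually reached.

    Simplifying assumption (not the real solver): the subgame has a pure
    saddle point, so pure maximin equals pure minimax. A real solver would
    compute a possibly mixed equilibrium instead.
    """
    maximin = max(min(row) for row in m)
    minimax = min(max(row[j] for row in m) for j in range(len(m[0])))
    if maximin != minimax:
        raise ValueError("no pure saddle point; a mixed-equilibrium solver is needed")
    return maximin


def static_value(m: Matrix, cached_gift: int) -> int:
    """A_st: A plays the cached gift and B best-responds (minimises A's payoff)."""
    return min(m[cached_gift])


# Principal line: the subgame A expected when it solved from the opening roll.
principal = [[1, 0], [-1, -1]]
cached = best_gift(principal)  # gift 0, frozen before B's last pick

# After a B blunder: same gift options, different Crucibles, different payoffs.
after_blunder = [[-1, 1], [1, 1]]

a_rs = resolve_value(after_blunder)
a_st = static_value(after_blunder, cached)
print(a_rs, a_st, a_st - a_rs)
```

Running it prints `1 -1 -2`: the cached gift loses, while re-solving finds a winning gift. The numbers are invented for illustration. In this toy, st − rs can never be positive. A fixed gift’s worst case, `min(m[g])`, is at most the maximin, and under the saddle-point assumption the maximin equals A_rs.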

Script

In the public psdg repository:

solvers/python/fixed_board_blunder_sweep.py

```bash
# from psdg repo root after clone
cd solvers/python

# One random open (seed controls board + crystals, same scheme as other blunder tests)
python3 fixed_board_blunder_sweep.py --seed 42 --dice 6

# Explicit histogram (counts for tops 1..6) and crystals
python3 fixed_board_blunder_sweep.py --board 0,1,1,0,3,1 --crystal-a 1,5 --crystal-b 4,1

# Aggregate: many opens, enumerate all B blunders on each (scale --batch to taste; large runs are slow)
python3 fixed_board_blunder_sweep.py --batch 2000 --seed 42 --dice 6
```

Columns: A_rs is A’s result if the Exchange is played in equilibrium after the draft. A_st is A’s result if A uses the static principal-line gift and B best-responds. A negative st − rs (A_st − A_rs) means freezing the line hurts A relative to re-solving on that branch.
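The per-open enumeration behind these columns can be sketched as a loop over B’s legal last picks. This is not the code in fixed_board_blunder_sweep.py. The `Row` type and the callables `value_resolving` and `value_static` are hypothetical stand-ins for the game engine and solver. Only the control flow follows the text: score every legal last pick (optimal plus deviations), flag the deviations, and compute st − rs.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List


@dataclass(frozen=True)
class Row:
    pick: Hashable    # B's last draft pick (hypothetical encoding)
    is_blunder: bool  # True for every deviation from B's optimal pick
    a_rs: int         # A's result if A re-solves the Exchange
    a_st: int         # A's result if A plays the cached principal-line gift

    @property
    def st_minus_rs(self) -> int:
        return self.a_st - self.a_rs


def sweep_open(
    legal_last_picks: Iterable[Hashable],
    optimal_pick: Hashable,
    value_resolving: Callable[[Hashable], int],
    value_static: Callable[[Hashable], int],
) -> List[Row]:
    """Score every legal last pick for B on one fixed open."""
    rows = []
    for pick in legal_last_picks:
        # Each callable evaluates the Exchange that follows this pick; the
        # earlier draft is held fixed, not re-run.
        rows.append(
            Row(
                pick=pick,
                is_blunder=(pick != optimal_pick),
                a_rs=value_resolving(pick),
                a_st=value_static(pick),
            )
        )
    return rows


# Toy example with made-up values for three candidate picks ("d1" is optimal).
toy_rs = {"d1": -1, "d2": 1, "d3": -1}
toy_st = {"d1": -1, "d2": -1, "d3": -1}
rows = sweep_open(list(toy_rs), "d1", toy_rs.__getitem__, toy_st.__getitem__)
for r in rows:
    print(r.pick, r.is_blunder, r.a_rs, r.a_st, r.st_minus_rs)
```

In the toy output only the `d2` row separates the two policies (st − rs = −2). That is the kind of branch the sweep is designed to surface.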

Representative results (6 dice)

On 2000 random opens (seeds 42–2041, six dice, full enumeration of B’s last-pick blunders), about 5% of blunder rows have st − rs < 0, and about 13% of opens have at least one such row. Smaller pilot runs showed the same order of magnitude. The full command-line summary is at benchmark/output/fixed_board_blunder_batch2000_seed42.txt in psdg. The aim here is to expose the mechanism, meaning the point where the frozen principal-line gift stops matching the true subgame. It is not meant as a second headline metric alongside the main blunder tables.
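The two summary rates can be computed from per-row results as follows. This is a hedged sketch. The `(seed, a_rs, a_st)` row layout and the helper names are assumptions, not the format of the psdg output file. The sketch also assumes a batch run uses consecutive seeds starting at `--seed`, which matches `--batch 2000 --seed 42` covering seeds 42–2041. An open is counted once if any of its blunder rows has st − rs < 0.

```python
from typing import Iterable, Sequence, Tuple

# One blunder row: (seed of the open, A_rs, A_st). Hypothetical layout.
BlunderRow = Tuple[int, int, int]


def seeds_for_batch(seed: int, batch: int) -> range:
    """Seeds used by a batch run, assuming consecutive seeds from --seed."""
    return range(seed, seed + batch)


def summarize(rows: Iterable[BlunderRow], seeds: Sequence[int]) -> Tuple[float, float]:
    """Return (share of blunder rows with st - rs < 0,
               share of opens with at least one such row)."""
    n_rows = 0
    n_bad_rows = 0
    bad_opens = set()
    for seed, a_rs, a_st in rows:
        n_rows += 1
        if a_st - a_rs < 0:
            n_bad_rows += 1
            bad_opens.add(seed)  # an open counts once, however many bad rows it has
    row_rate = n_bad_rows / n_rows if n_rows else 0.0
    open_rate = len(bad_opens) / len(seeds) if len(seeds) else 0.0
    return row_rate, open_rate


seeds = seeds_for_batch(42, 2000)
print(seeds[0], seeds[-1], len(seeds))  # 42 2041 2000

# Made-up rows for two opens: one bad row out of four, on one of two opens.
toy_rows = [(42, -1, -1), (42, 1, -1), (43, 0, 0), (43, 1, 1)]
print(summarize(toy_rows, seeds_for_batch(42, 2)))  # (0.25, 0.5)
```

The denominator for the open rate is the full seed range, not just the opens that produced rows. This choice is an assumption, made so that an open with no separating blunder still counts as an open.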

Illustrative single open (seed 48): the game value from the roll is −1 (B wins under optimal play). After B’s last pick, one blunder branch gives A_rs = +1 and A_st = −1. Re-solving lets A win, while static A still loses. The other blunders on that open do not separate the two policies.

So the effect is not rare noise. A material fraction of positions (about 13% of opens in the batch above) admit a last-pick mistake that turns a frozen Exchange plan into a strict mistake relative to re-solving.

How this fits the research story

The headline benchmark remains aggregate rates (e.g. blundering B vs static A over thousands of trials).
Enumeration shows what is going wrong. Hold the tableau fixed and vary only the mistake, and you see exactly when commitment to the ex ante line stops matching optimal play in the true subgame.
For game-theory and deployment/alignment framing, it supports the claim that “oracle value” and “always play the first computed gift” are different objects once opponents can move the state off the principal line.
