Optimal A vs random legal B (baseline)
Standalone stress test: Player A uses the exact solver’s optimal draft move at every decision (assuming optimal continuation thereafter), then the benchmark Exchange protocol below. Player B picks uniformly at random among all legal draft moves and, in the default exchange mode, offers a uniformly random legal Poisoned Gift at the Exchange; A best-responds to B’s gift. This is not the simultaneous Nash Exchange subgame for B: B is intentionally weak everywhere, constrained only by legality.
Contrast: this script mainly probes general competence against noise (legal but unstructured play). The 5,000-game blunder suite is different: B follows the principal line until a deliberate last-pick deviation, which stresses deployment (static vs re-solving at the Gift, Exchange timing). That suite measures off-equilibrium fragility, which is not the same statistic as win rate vs random.
Published log (public repo): benchmark/output/optimal_vs_random_legal_batch10000_seed42.txt in psdg.
Driver: the batch was produced with optimal_vs_random_legal.js (Node). That script lives only in the internal repository layout, under private/psdg/benchmark/ at the root of that checkout. The public psdg repo contains only the Python solver and benchmarks; it does not include private/ or this script.
What the 31 draws mean (and do not mean)
Yes — about the opening, not about B “being optimal.”
For all 31 draws in the 10k run, the opening (board + crystals) had oracle value 0: if both sides follow optimal play from that roll under the solver embedding, the outcome class is a draw. So those positions are theoretically drawn under optimal-vs-optimal.
No — the dumb player is not “just as good as optimal.”
The realized games were A optimal vs B uniformly random (legal). Of the 1698 opens with oracle value 0 (draw), optimal A still beat random B in 1667 (~98.2%). Only 31 times did random B’s actual choices (draft + random gift), together with A’s optimal replies, still land in a draw instead of an A win.
So:
- Optimal B from those opens could force the draw (value says neither side can improve unilaterally under the embedding).
- Random B usually drifts into a loss against optimal A; rarely the random line stays “in the drawing region” ex post.
That is not the same as “random causes a draw as well as optimal.” It is: “from a draw-valued open, noise sometimes doesn’t throw the game away.”
10,000 trials (six board dice, seed 42…10041, random-br)
| Outcome | Count | Rate |
|---|---|---|
| A wins | 9965 | 99.65% |
| Draw | 31 | 0.31% |
| B wins | 4 | 0.04% |
Opening value from solveFromRoll (both sides optimal from the roll — Nash-modeled Exchange at the end of the solver tree):
| Oracle value (A perspective) | Count | Rate |
|---|---|---|
| +1 (A wins under optimal) | 8289 | 82.89% |
| 0 (draw under optimal) | 1698 | 16.98% |
| −1 (B wins under optimal) | 13 | 0.13% |
Cross-checks (same run):
- 0 trials with oracle +1 but B won (random never “stole” a win from an A-winning open).
- 4 trials with oracle −1 and B won — all B wins sit in that 13-trial slice (random B converted 4/13 of B-favored opens).
- Draws by opening oracle: 31 draws with oracle 0; 0 draws with oracle +1; 0 with oracle −1.
So the 31 draws are not “random luck from an A-favored board.” See above: they all come from oracle-0 opens, and even within that slice random B held the draw against optimal A only ~1.8% of the time (31 of 1698).
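The two tables and the cross-checks above pin down a full oracle-by-outcome contingency table. The sketch below rebuilds it and checks that its marginals reproduce both published tables. The names `JOINT`, `marginals` and `rate_within_slice` are illustrative and not part of the psdg code; two cells (A wins at oracle +1 and at oracle −1) are not logged directly and are filled in here by subtraction.

```python
from collections import Counter

# Joint counts keyed by (oracle value, realized outcome), from A's perspective.
# Stated in the text: 1667 A wins and 31 draws at oracle 0; 4 B wins at oracle -1;
# no draws or B wins at oracle +1; no draws at oracle -1.
# Derived by subtraction (an assumption of this sketch, not a logged figure):
# 8289 A wins at oracle +1 and 13 - 4 = 9 A wins at oracle -1.
JOINT = {
    (+1, "A"): 8289, (+1, "draw"): 0,  (+1, "B"): 0,
    (0,  "A"): 1667, (0,  "draw"): 31, (0,  "B"): 0,
    (-1, "A"): 9,    (-1, "draw"): 0,  (-1, "B"): 4,
}


def marginals(joint):
    """Sum the joint table along each axis."""
    by_oracle, by_outcome = Counter(), Counter()
    for (oracle, outcome), count in joint.items():
        by_oracle[oracle] += count
        by_outcome[outcome] += count
    return by_oracle, by_outcome


def rate_within_slice(joint, oracle, outcome):
    """Share of one realized outcome among trials with a given oracle value."""
    slice_total = sum(n for (o, _), n in joint.items() if o == oracle)
    return joint[(oracle, outcome)] / slice_total


BY_ORACLE, BY_OUTCOME = marginals(JOINT)

# The marginals must reproduce both published tables.
assert dict(BY_OUTCOME) == {"A": 9965, "draw": 31, "B": 4}
assert dict(BY_ORACLE) == {1: 8289, 0: 1698, -1: 13}

print(f"draw rate within oracle 0:    {rate_within_slice(JOINT, 0, 'draw'):.2%}")
print(f"A-win rate within oracle 0:   {rate_within_slice(JOINT, 0, 'A'):.2%}")
print(f"B-win rate within oracle -1:  {rate_within_slice(JOINT, -1, 'B'):.2%}")
```

Keeping the counts in a single joint table makes the distinction explicit: the 0.31% overall draw rate and the ~1.8% draw rate within the oracle-0 slice are the same 31 games divided by different denominators.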
How to read “amazing” vs “setup”
- Board + crystals fix the opening’s +1 / 0 / −1 under optimal-vs-optimal (draw semantics).
- A realized draw in this harness is (i) oracle 0 and (ii) random B’s trajectory did not collapse to an A win under optimal A.
More than 10,000 trials: useful if you want tighter confidence on very rare cells (e.g. B wins at 0.04%). For draws at ~0.31%, 10k already gives a rough order of magnitude; doubling trials shrinks Monte Carlo noise but will not change the qualitative story unless the script or seed regime changes.
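To make "tighter confidence on very rare cells" concrete, here is a minimal sketch that computes Wilson score intervals for the B-win and draw counts. The choice of the Wilson interval and the 95% level (z = 1.96) are assumptions of this sketch, not something the benchmark log reports, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


TRIALS = 10_000
for label, count in [("B wins", 4), ("Draw", 31)]:
    lo, hi = wilson_interval(count, TRIALS)
    print(f"{label:7s} {count:3d}/{TRIALS}: {lo:.4%} to {hi:.4%}")

# Hypothetical doubled run with the same observed rates, only to show how the
# interval narrows; these are not measured results.
for label, count in [("B wins", 8), ("Draw", 62)]:
    lo, hi = wilson_interval(count, 2 * TRIALS)
    print(f"{label:7s} {count:3d}/{2 * TRIALS}: {lo:.4%} to {hi:.4%}")
```

Under these assumptions the B-win interval at 10k trials runs from about 0.02% to about 0.10%, wider than the 0.04% point estimate itself, while the draw interval is about 0.22% to 0.44%. That matches the reading above: more trials mostly help the rarest cell.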
Relation to other site notes
- Game theory — pinned definitions of optimal and protocol.
- Blunder suite / static vs re-solving — different embedding (B blunders off the principal line; emphasis on commitment and Exchange timing).
- This page — policy contrast: optimal A vs uniform random legal B, to separate “how hard is the game to lose against noise?” from blunder exploitation rates.
