Illegal-upgrade stress test

Counterfactual benchmark—not legal play. In a real match a verifier would reject the move.


Some stress tests stay within the rules: blunders are legal but suboptimal. This one steps off the spec on purpose. It is oracle vs oracle, except that A’s second draft pick is recorded as top 6 when the true board die was lower. That illegal upgrade changes duplicates, Exchange eligibility, tumble geometry, and tiebreak reach.

Why run it? In a naive picture, extra visible strength should help. Under PSDG’s coupled rules, optimal B on the true tree responds to the resulting position; A’s win rate need not rise—and can fall when the board is large enough.
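The non-monotonic effect can be shown with a toy two-ply game. This is a minimal sketch and not PSDG's rules: the payoff numbers, the `payoffToA` table, and both function names are made up for illustration. It shows only that a move which looks stronger against a passive opponent can be worth less once the opponent best-responds.

```javascript
// Toy zero-sum game, NOT PSDG. All payoffs are invented for illustration.
// payoffToA[face] lists A's payoff for each reply available to B.
// Index 0 is B's "passive" reply; other entries are alternative replies.
const payoffToA = {
  4: [0.5, 0.6], // modest face: no reply by B hurts A much
  6: [0.9, 0.2], // top face: great versus a passive B, poor if B exploits it
};

// Naive picture: B always plays the passive reply.
function valueAgainstPassiveB(face) {
  return payoffToA[face][0];
}

// Exact opponent: B picks the reply that minimizes A's payoff.
function valueAgainstOptimalB(face) {
  return Math.min(...payoffToA[face]);
}
```

In this toy table, face 6 beats face 4 against the passive reply but scores lower against the minimizing reply, which is the qualitative shape of the backfire described above.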

Published numbers (same random-crystal style as other suites; 500 trials each in the reference script):

| Board dice | Effect on A win rate (percentage points) | Qualitative note |
| --- | --- | --- |
| 4 | about +1.4 | Small headline gain; draws collapse and many games become B wins. |
| 6 | about −21.2 | Strong backfire: illegal 6 reshapes the subgame B exploits. |
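An effect in percentage points can be computed from two batches of trial outcomes, as in the sketch below. It is not taken from the reference script: `makeOutcomes`, `winRateA`, and `effectInPoints` are hypothetical names, the outcome labels are assumed, and the standard error assumes the legal and illegal batches are independent runs.

```javascript
// Hypothetical helpers; not the code in cheat_test_second_draft.js.
// Each outcome is assumed to be one of "A", "B", or "draw".
function makeOutcomes(aWins, bWins, draws) {
  return [
    ...Array(aWins).fill("A"),
    ...Array(bWins).fill("B"),
    ...Array(draws).fill("draw"),
  ];
}

// Fraction of trials won by A (draws and B wins both count as non-wins).
function winRateA(outcomes) {
  const wins = outcomes.filter((o) => o === "A").length;
  return wins / outcomes.length;
}

// Change in A's win rate (illegal minus legal), in percentage points,
// with a standard error for the difference of two independent proportions.
function effectInPoints(legalOutcomes, illegalOutcomes) {
  const pLegal = winRateA(legalOutcomes);
  const pIllegal = winRateA(illegalOutcomes);
  const se = Math.sqrt(
    (pLegal * (1 - pLegal)) / legalOutcomes.length +
      (pIllegal * (1 - pIllegal)) / illegalOutcomes.length
  );
  return { points: 100 * (pIllegal - pLegal), standardError: 100 * se };
}

// Made-up counts for illustration only; these are not published results.
const legal = makeOutcomes(250, 200, 50);
const illegal = makeOutcomes(200, 290, 10);
const effect = effectInPoints(legal, illegal);
```

Reporting the standard error alongside the point estimate helps judge whether a small shift at 500 trials per batch is distinguishable from sampling noise.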

Implementation and tables live only in the internal repository. The Node script is private/psdg/benchmark/cheat_test_second_draft.js (at the root of that checkout; not in public psdg). The strategy note is Cheaters never prosper (public/docs/strategies/cheaters-never-prosper.md, mirrored under private/docs/strategies/). Those paths are not on psdg.pages.dev and not in the psdg artifact repo. In consumer-facing copy we prefer neutral language (illegal move, constraint violation, misreported state) rather than moralizing about “cheating.”

Takeaway for ML / safety: a spec violation is just another state update for an exact opponent model; visible shortcuts can be non-monotonic when latent structure matters—here, dramatically so at six dice.