PSDG for game theory


Protocol-dependent optimality. In the PSDG benchmarks, “optimal” realised outcomes are not fixed by the abstract extensive form alone: they are joint with a deployment protocol P, which specifies whether the Exchange is simultaneous or sequential and whether A re-solves at the realised Gift node or commits to the principal-line Exchange action. The three headline blunder-suite rates in the empirical snapshot are different answer objects under different P, not three readings of a single unconstrained value.


PSDG is a finite, perfect-information (after setup) extensive-form game with a simultaneous-move subphase (the Exchange) in the canonical rules. The project’s exact solver supplies values and equilibria at that phase—so you can separate equilibrium analysis from implementation choices about commitment and re-optimization.

That formal classification coexists with a pedagogical point the site names explicitly elsewhere: Twists can force commitment before legibility—Phase‑2 and tiebreak consequences are fixed before the Exchange and Tumble make them obvious from a coarse “tops only” summary—without introducing hidden information after setup (Technical report — Commitment before legibility · FAQ — “Really” PI?).

Sequential Exchange in the benchmark suite is still the same game tree for draft, eligibility, tumble, and scoring: the ML and robustness story—proxy vs latent structure, static vs re-solving, regret off the principal line—does not depend on simultaneity alone. Simultaneity changes the information structure at the Exchange (and the numbers in the snapshot), but it is not the only source of “hard structure.”

Deployment gap {#deployment-gap} — shorthand for the wedge between what was optimal on the assumed principal storyline and ex-post payoffs once play is off that path, under a fixed deployment rule P (e.g. static vs re-solving at the Exchange, sequential vs simultaneous timing). The blunder table quantifies slices of that gap; it is not the same object as “suboptimal play from the opening roll” in unconstrained minimax.


Extensive form, observables, and latent commitments

After setup, PSDG is perfect information in the usual sense: there are no hidden draws mid-game. Even so, a description that collapses play to current-stage observables alone—for example, treating the histogram of die tops on the board as “the state”—aliases distinct histories. The same visible board can come from different draft Twist commitments (facings), so the optimal continuation from one history need not be optimal from another. Any analysis (or learned policy) that acts as if “the board photo” is a sufficient summary is merging nodes that the rules keep distinct. Informal version: Home — In brief; mechanism skim: Core ideas.
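The aliasing argument above can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than the project's data model: the `Die` record, the `facing` labels, and the two summary functions are hypothetical names chosen only to show how a tops-only summary merges histories that a commitment-aware summary keeps apart.

```python
from collections import Counter
from typing import NamedTuple


class Die(NamedTuple):
    top: int      # face currently showing (what a "board photo" records)
    facing: str   # hypothetical label for the draft-time Twist commitment


def board_photo(dice):
    """Coarse observation: histogram of die tops only."""
    return tuple(sorted(Counter(d.top for d in dice).items()))


def full_state(dice):
    """Finer summary that also keeps each die's draft commitment."""
    return tuple(sorted(dice))


# Two histories with identical visible tops but different draft commitments.
history_a = [Die(6, "north"), Die(3, "east")]
history_b = [Die(6, "south"), Die(3, "east")]

aliased = board_photo(history_a) == board_photo(history_b)   # same photo
distinct = full_state(history_a) != full_state(history_b)    # different nodes
```

A policy keyed on `board_photo` must return the same action for both histories, even if the best continuations differ; a policy keyed on `full_state` is free to separate them.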


Representation, deployment, and simultaneity

If you keep the full game but remove simultaneous Exchange, you are not left with just one research thread. You still have both:

  1. Latent state / wrong representation — The board snapshot is still insufficient: draft-time facings still encode later consequences, and a learner can still alias distinct latent states.
  2. Deployment / protocol — Even with sequential Exchange, static principal-line commitment at the Gift still differs from re-solving after deviation. The 8.5% row in the empirical snapshot is already sequential; the “frozen commitment gets punished” story does not depend on simultaneity at the Gift.

What drops out is only the simultaneous-move packaging at the Exchange: no lockstep choice there, no Nash-at-a-simultaneous-node framing, and less emphasis on the 8.5% vs 6.9% split (that contrast is timing / information structure at the Gift, not the whole deployment layer).

Compact map:

| Setting | What is in play |
| --- | --- |
| Parable only | Representation failure (proxy vs true rule); no draft, no Exchange |
| Full game, sequential Exchange | Representation plus deployment fragility (static vs re-solve; off-path play) |
| Full game, simultaneous Exchange | Same, plus simultaneity effects at the Gift (game-theoretic node; 8.5% vs 6.9%) |

Same breakdown from a safety angle: AI safety — Simultaneous Exchange and the safety thesis. What to do about each layer (objective, protocol, irrevocability): AI safety — Three layers, three handles.


Pinned definitions: optimal, blunder, ex post payoff

These three phrases are easy to mix in English. On this site they mean specific things tied to the benchmark embedding and the protocol after a deviation.

  1. Optimal under the project embedding — The oracle returns values and actions under the project’s full-game solution concept and benchmark spec (principal line, equilibrium at the Exchange, and the rest of the rules as implemented). “Optimal” in tables and prose means optimal under that embedding—not an informal “best-looking move” without a convention.

  2. Blunder — In the blunder suite, a blunder is a move that is suboptimal under that embedding at the recorded decision (e.g. B’s last draft pick is not the continuation prescribed by the solver on the assumed storyline).

  3. Better ex post outcome (under protocol P) — After a deviation, realized play follows a deployment rule P: whether A commits to the Exchange action from the principal line or re-solves at the Exchange; simultaneous vs sequential Exchange; and so on. Ex post payoffs are evaluated on the realized path. A move that was oracle-suboptimal at the blunder node can still land in a part of the tree where, under P, another continuation would have been better for a player—because P is not the same object as “always recompute from the true node with no institutional commitment.” (See deployment gap above.)

Takeaway. The oracle is not “wrong” when it marks a move suboptimal under embedding X: it is consistent with X. The stress test is whether deployment (static vs re-solving, Exchange timing, etc.) tracks the realized extensive form. The empirical snapshot reports how often that gap appears in seeded suites—protocol-dependent rates (5.7% / 8.5% / 6.9% in the standard blunder table), not a single universal percentage. Read 5.7% as re-solving at the Gift (not as draft minimax failing); read 8.5% / 6.9% vs ~8.0% optimal-vs-optimal as the commitment contrast.
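One way to keep the three definitions apart is to treat the protocol P as an explicit record that travels with every reported number. The sketch below is a minimal illustration under stated assumptions: `Protocol`, `Timing`, `ExchangePolicy`, `exchange_action`, and the toy solver table are all hypothetical names, not the project's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Timing(Enum):
    SEQUENTIAL = "sequential"
    SIMULTANEOUS = "simultaneous"


class ExchangePolicy(Enum):
    STATIC = "static"      # commit to the principal-line Exchange action
    RESOLVE = "re-solve"   # recompute at the realised node


@dataclass(frozen=True)
class Protocol:
    timing: Timing          # recorded with the result; not consumed below
    policy: ExchangePolicy


def exchange_action(protocol: Protocol, realised_node: str,
                    principal_action: str,
                    solve: Callable[[str], str]) -> str:
    """Pick A's Exchange action under the given deployment rule."""
    if protocol.policy is ExchangePolicy.STATIC:
        return principal_action       # ignore where play actually ended up
    return solve(realised_node)       # re-optimise at the realised node


# Hypothetical stand-in for a solver: maps a node label to its best action.
toy_solver = {"on_path": "keep", "off_path": "swap"}.__getitem__

static_p = Protocol(Timing.SEQUENTIAL, ExchangePolicy.STATIC)
resolve_p = Protocol(Timing.SEQUENTIAL, ExchangePolicy.RESOLVE)

static_choice = exchange_action(static_p, "off_path", "keep", toy_solver)
resolve_choice = exchange_action(resolve_p, "off_path", "keep", toy_solver)
```

On the principal line both rules agree; they diverge only once the realised node differs from the assumed one, which is where the deployment gap is measured.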


What is unusually clear here

  • Equilibrium vs engineering: Having the optimal value V^* and optimal actions does not by itself specify whether a real player should re-solve after observing off-path opponent play or commit to a precomputed principal line. PSDG measures that gap.
  • Simultaneous vs sequential timing: The same strategic mistake is punished differently when the opponent can best-respond sequentially (observe, then act) vs must act in a simultaneous institution. Empirically (blunder suite): 8.5% vs 6.9% B wins for static A under sequential vs simultaneous Exchange. Information structure appears in the percentages, not only in payoff matrices.
  • Off-equilibrium play as first-class: Classical analysis often fixes an equilibrium path; this setup asks what happens when realized play leaves that path and commitments are (or are not) revised. With re-solving at the Exchange, ~5.7% (287/5000) B wins are not extra wins “stolen” from A-win or drawn roots under the same minimax embedding—they align with B-win openings and sit below optimal-vs-optimal B wins (399): last-draft blunders hurt B in part of the mass. The static rows show the distinct phenomenon: frozen principal-line Exchange can yield more B wins than optimal-vs-optimal play.
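The timing point in the list above can be illustrated with a toy matrix. This is not the PSDG Exchange: the payoff table `U_B`, the choice of `A_STATIC`, and the use of a pure maximin column as a stand-in for B's simultaneous play are all assumptions made for illustration (a real simultaneous node may call for a mixed equilibrium).

```python
def best_response_value(u_b, a):
    """Sequential timing: B observes A's committed row, then maximises."""
    return max(u_b[a])


def maximin_column(u_b):
    """Simultaneous timing (simplified): B picks the column with the best
    worst case, because it cannot condition on A's actual row."""
    n_cols = len(u_b[0])
    return max(range(n_cols), key=lambda b: min(row[b] for row in u_b))


def simultaneous_value(u_b, a):
    """B's payoff when A plays row `a` and B plays its maximin column."""
    return u_b[a][maximin_column(u_b)]


# Hypothetical payoffs to B (arbitrary units); rows are A's actions.
U_B = [
    [0, 2, 1],   # A plays action 0 (the frozen, mistaken commitment)
    [2, 0, 1],   # A plays action 1
]
A_STATIC = 0

seq_value = best_response_value(U_B, A_STATIC)   # B exploits the commitment
sim_value = simultaneous_value(U_B, A_STATIC)    # B's punishment is capped
```

For any fixed row, the sequential value is a maximum over B's columns, so it can never fall below the value at whichever single column B plays simultaneously. That inequality is the sense in which simultaneity caps punishment in this toy.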

Conceptual map

| Object | Role in PSDG |
| --- | --- |
| Oracle / re-solving solver | Evaluates positions assuming correct continuation at Exchange |
| Static principal-line policy | Commits to Exchange action from ex ante optimal line; may be suboptimal on realized off-path node |
| Simultaneous Exchange | Restricts how much B can exploit A’s mistaken commitment |
| Sequential Exchange | Allows best response after relevant information; higher B win rate in the static regime |

Figure: deployment at the Exchange (schematic)

The Poisoned Gift and related eligibility rules narrow the feasible set of exchanges, that is, what is legal from a given realized Crucible (see Poisonous System Gift for why that constraint is system-imposed, not only adversarial). The research stress test, however, is different from algorithmic “pruning” in search. The issue here is which continuation rule is wired in after play leaves the principal line (re-solving versus static commitment) and whether the Exchange is run sequentially or simultaneously. Those choices change ex post outcomes under protocol P even when the oracle is consistent with a fixed embedding.

Diagram: from a realized off-path node, branch to re-solve at Exchange versus static principal-line commitment; lower panels contrast sequential and simultaneous Exchange timing.
Same realized node; different deployment and timing at the Exchange. Numbers in the empirical snapshot (e.g. 8.5% vs 6.9% B wins for static A) are tied to this geometry—not to a fixed fraction of branches “cut” by the gift rules.

Relation to “standard” theory

PSDG does not overturn basic equilibrium concepts; it instantiates tensions that textbook abstractions sometimes leave implicit:

  • Subgame perfection vs computational / institutional commitment to a single rolling plan.
  • Trembling-hand and noise in opponent play as stress on policies that assume always-on-path opponents.
  • Simultaneous vs sequential protocol choice as robustness technology (caps on punishment vs observability).

Rule-staged commitment games — a taxonomy aside

Named classes, imported intuition. Textbook groupings each come with a familiar toolkit: finite perfect-information (combinatorial / Zermelo-style) play, repeated games, cooperative or coalitional models, mechanism design, and poker- or Harsanyi-type incomplete information. PSDG, after the opening roll, is still a finite two-player zero-sum extensive-form game with a simultaneous subphase. Nothing here claims it falls outside that mathematics: you can always write the tree. The pedagogical point is that no single off-the-shelf banner matches the bundle this project measures: commitment under the rules before later phases make the payoff of those commitments obvious in a coarse observation; state aliasing under full visibility; and optimal realised outcomes that are joint with a protocol P (deployment gap, empirical snapshot). The usual inherited one-liners for a neighboring class (beliefs, stationarity, “the board is the state,” “equilibrium is one object,” …) are easy to get wrong here, not because the foundations are new, but because the default picture is a bad fit for the stress the benchmark is built to make legible.

Candidate working name (not a new formalism): call games with that compositional profile rule-staged commitment games if you need a label: the rules stage when payoff-relevant distinctions become legible in a summary a policy (or a human) actually uses, after some moves are already irrevocable; “staged” points at rule-given ordering, not private exogenous draws. (Same spine as commitment before legibility in different words.) This is a naming convenience for talks and tables, not a claim that the literature has no related formal threads—only that no one headline class contains the whole bundle the site treats as first-class. Short signpost: FAQ — Where does PSDG sit in the usual named classes?.


Irrevocable draft and the ~5.7% re-solving row

The ~5.7% cell is not “minimax beaten from A-win or drawn roots”: under the published draft backup, B wins in that row almost all occur at openings the benchmark already scores as B-win—a full value cross-join found 284/287 with value == -1 (see empirical snapshot). The 399 → 287 gap is largely B’s random last-draft error costing wins, not A losing from a non-losing root because search was weak. What is structurally load-bearing is irrevocable draft: twists cannot be un-picked, so the realised tree is fixed before the Exchange; re-solving at the Gift restores part of what subgame-perfection-style reasoning wants, while static principal-line play at the Gift can push B wins above optimal-vs-optimal (8.5% / 6.9% vs ~8.0%)—that is the clean commitment contrast.
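The counts quoted on this page can be checked with a few lines of arithmetic. The sketch uses only figures stated above (287, 399, 284, and 5000); it assumes the optimal-vs-optimal count of 399 is out of the same 5000 trials, which matches the quoted ~8.0% but is not stated explicitly. The static rows are omitted because their raw counts are not given here.

```python
TRIALS = 5000            # suite size quoted for the re-solving row
B_WINS_RESOLVE = 287     # B wins with re-solving at the Exchange
B_WINS_OPTIMAL = 399     # B wins under optimal-vs-optimal play (assumed /5000)
B_WIN_ROOTS = 284        # re-solving B wins at openings with value == -1


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


resolve_rate = pct(B_WINS_RESOLVE, TRIALS)        # re-solving row
optimal_rate = pct(B_WINS_OPTIMAL, TRIALS)        # optimal-vs-optimal baseline
root_share = pct(B_WIN_ROOTS, B_WINS_RESOLVE)     # share at B-win openings
wins_lost = B_WINS_OPTIMAL - B_WINS_RESOLVE       # size of the 399 -> 287 gap
off_root_wins = B_WINS_RESOLVE - B_WIN_ROOTS      # wins not at B-win openings
```

Under those assumptions the rates come out at 5.7% and 8.0%, about 99% of the re-solving B wins sit at openings already scored as B-win, and the gap to optimal-vs-optimal is 112 wins.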


Closing emphasis. PSDG is a prompt to move from computing a single equilibrium under idealised on-path play toward measuring protocol-dependent outcome distributions under implementation constraints (what is frozen vs recomputed, who moves when), off-path stress (the blunder suite), and state that must encode historical commitments—not only current-stage observables.


Related entry points

For rules detail and benchmark definitions, see the repository rules and benchmark spec when linked from this site.