
PSDG for ML — At a Glance


What happens when a policy looks strong on the **visible** game but tracks the wrong **state**, the wrong **target**, or the wrong **continuation rule**? PSDG is an exact two-player benchmark in which the offline **solver is not the learner**: the solver **rules** on regret, legality, and outcome splits, so disagreements are about deployment and representation, not about what counts as ground truth.

Learn basic play in about 3 minutes (Watch on YouTube). Tiebreak / Immortal takes a few more minutes (demo & script).

The result: With a fixed oracle-static baseline (A) and a strictly worse policy (B), win rate and regret move with what the benchmark holds fixed in the rollout, not with “smarter gradients.” Two choices drive this: freezing the midgame swap plan vs allowing replanning after branches resolve, and running the episode sequentially vs simultaneously. Those choices shift how often (B) still wins: across the suite, between ~5.7% and ~8.5%, with headline rows near ~8.5%, ~6.9%, and ~5.7%. Report metrics at the same pin; “strong-looking” summaries that skip the freeze/replan fork blur incomparable rows together. For the formal (P) names, see Game theory framing and Empirical snapshot.
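One way to make "report at the same pin" concrete is to key every result by its protocol pin and refuse to pool across pins. This is a minimal sketch under stated assumptions: `ProtocolPin`, `Episode`, `win_rate_by_pin`, and `pooled_win_rate` are hypothetical names, not part of any PSDG release, and the two pin fields simply encode the freeze/replan and sequential/simultaneous choices described in the paragraph.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical encoding of a protocol pin: the two rollout choices the text names.
@dataclass(frozen=True)
class ProtocolPin:
    replan_after_branches: bool  # False: midgame swap plan frozen; True: replanning allowed
    timing: str                  # "sequential" or "simultaneous"

    def __post_init__(self):
        if self.timing not in ("sequential", "simultaneous"):
            raise ValueError(f"unknown timing: {self.timing!r}")


@dataclass(frozen=True)
class Episode:
    pin: ProtocolPin
    b_won: bool  # did the strictly worse policy (B) win this episode?


def win_rate_by_pin(episodes):
    """Win rate of (B), reported separately for each protocol pin."""
    wins, totals = defaultdict(int), defaultdict(int)
    for ep in episodes:
        totals[ep.pin] += 1
        wins[ep.pin] += int(ep.b_won)
    return {pin: wins[pin] / totals[pin] for pin in totals}


def pooled_win_rate(episodes):
    """A single headline number, allowed only when every episode shares one pin."""
    pins = {ep.pin for ep in episodes}
    if len(pins) != 1:
        raise ValueError(f"refusing to pool {len(pins)} protocol pins into one win rate")
    return sum(ep.b_won for ep in episodes) / len(episodes)


# Usage with toy counts (illustrative only, not the suite's data).
frozen_seq = ProtocolPin(replan_after_branches=False, timing="sequential")
replan_sim = ProtocolPin(replan_after_branches=True, timing="simultaneous")
episodes = [Episode(frozen_seq, i < 2) for i in range(20)]
episodes += [Episode(replan_sim, i < 1) for i in range(20)]
print(win_rate_by_pin(episodes))
```

The design choice is to make pooling an explicit error rather than a silent average, so a summary that mixes frozen and replanning rows cannot be produced by accident.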

The Playmat

Figure: PSDG playmat (empty board).

Figure: the complete board after initial setup, with six gray dice in the draft pool row. Deterministic play begins here.

The mechanism: Value and regret are defined under the project embedding: the principal line, the equilibrium at the Exchange when relevant, static vs re-solving play, and sequential vs simultaneous timing. A board snapshot aliases distinct histories: two histories can produce the same board yet differ in draft facings and eligibility, both of which belong to the true Markov state. A policy trained on the salient Phase‑1 signal can stay optimal on that proxy yet be wrong under oracle value once the latent rules bite.
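A minimal sketch of the aliasing argument, assuming a hypothetical state layout (`TrueState` with `board`, `draft_facings`, and `eligibility` fields) rather than PSDG's real encoding. It groups true states by their board snapshot and computes a lower bound on the worst-case error of any value function that sees only the snapshot: within an aliased group, the best single value is the midpoint of the oracle values, so the error is at least half their spread.

```python
from dataclasses import dataclass


# Hypothetical state layout for illustration; not PSDG's actual schema.
@dataclass(frozen=True)
class TrueState:
    board: tuple            # what a board snapshot shows
    draft_facings: tuple    # latent: how drafted dice are faced
    eligibility: frozenset  # latent: which items are still eligible


def board_snapshot(state):
    """Lossy observation that keeps only the visible board."""
    return state.board


def aliased_groups(states):
    """Map each snapshot shared by two or more distinct true states to that set of states."""
    groups = {}
    for s in states:
        groups.setdefault(board_snapshot(s), set()).add(s)
    return {obs: group for obs, group in groups.items() if len(group) > 1}


def snapshot_value_error_floor(states, oracle_value):
    """Lower bound on the worst-case error of ANY snapshot-only value function.

    Such a function assigns one number per snapshot; the best single number for an
    aliased group is the midpoint of its oracle values, so the worst-case error is
    at least half the spread of those values.
    """
    floor = 0.0
    for group in aliased_groups(states).values():
        values = [oracle_value(s) for s in group]
        floor = max(floor, (max(values) - min(values)) / 2)
    return floor


# Toy example: a and b share a board but differ in a latent field.
a = TrueState(("d1", "d2"), (3, 5), frozenset({"d1"}))
b = TrueState(("d1", "d2"), (3, 5), frozenset())
c = TrueState(("d3",), (1,), frozenset())
toy_oracle = {a: 1.0, b: -1.0, c: 0.0}
print(snapshot_value_error_floor([a, b, c], toy_oracle.get))  # → 1.0
```

No amount of training on snapshot-only inputs can push the error below this floor, which is the sense in which the failure is about representation rather than optimization.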

Why this matters to ML research:

  • Oracle-grounded metrics: legal-move rate, regret (delta vs oracle), principal-line rate, and (P)‑labeled outcome splits are more informative than win rate alone for this failure class.
  • Representation is load-bearing: compression that drops facings or commitments pushes the learner toward the wrong sufficient statistics, and rollouts under the same schema can increase confidence without improving alignment.
  • Deployment / protocol is measured: what stays frozen vs re-solved at the Exchange is a reporting surface, not a modeling afterthought.
  • Credit assignment hurts: final payoffs entangle your draft with the opponent’s Poisoned Gift, and disentangling them without oracle-grade structure is hard.
  • More scale on the wrong objective sharpens the mistake rather than fixing it: the failure is misspecification, not weak optimization.
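The oracle-grounded metrics in the first bullet can be computed from per-decision logs roughly as follows. This is a sketch under assumptions: the `Decision` record, its field names, and the placeholder protocol labels are hypothetical, and excluding illegal moves from regret (counting them only in the legal-move rate) is one convention among several.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


# Hypothetical per-decision log record; field names are assumptions.
@dataclass(frozen=True)
class Decision:
    legal: bool              # learner's move is legal according to the oracle
    best_value: float        # oracle value of the best move in this state
    chosen_value: float      # oracle value of the learner's move (unused if illegal)
    on_principal_line: bool  # learner's move matches the oracle's principal line


def oracle_metrics(decisions):
    """Legal-move rate, mean regret over legal moves, and principal-line rate."""
    n = len(decisions)
    legal = [d for d in decisions if d.legal]
    # Regret = oracle best value minus the oracle value of the chosen move.
    regrets = [d.best_value - d.chosen_value for d in legal]
    return {
        "legal_move_rate": len(legal) / n,
        "mean_regret": sum(regrets) / len(regrets) if regrets else float("nan"),
        "principal_line_rate": sum(d.on_principal_line for d in decisions) / n,
    }


def outcome_splits(results):
    """Count outcomes per protocol label from (label, outcome) pairs."""
    splits = defaultdict(Counter)
    for label, outcome in results:
        splits[label][outcome] += 1
    return {label: dict(counts) for label, counts in splits.items()}
```

Because regret is measured against the oracle rather than against the opponent, a policy can post a decent win rate while its regret and principal-line rate expose the misspecification.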

See: ML pillars · Credit / adversarial injection · Parable

If you take one thing:

When you report win rate alone, what deployment protocol are you implicitly assuming — and does your actual deployment match it?

Detailed report: Technical report (summary)