PSDG for ML — At a Glance
What happens when a policy looks strong on the **visible** game yet tracks the wrong **state**, the wrong **target**, or the wrong **continuation rule**? PSDG is an exact two-player benchmark in which the offline **solver is not the learner**: it **adjudicates** regret, legality, and outcome splits, so the argument is about deployment and representation rather than about what counts as ground truth.
Learn basic play in about 3 minutes — Watch on YouTube. Tiebreak / Immortal takes a few more minutes — demo & script.
The result: With a fixed oracle-static baseline (A) and a strictly worse policy (B), win rate and regret move with what the benchmark holds fixed in the rollout, not with “smarter gradients.” Two choices matter: freeze the midgame swap plan or allow replanning after branches resolve, and run the episode sequentially or simultaneously. Those choices shift how often (B) still wins: across the suite, between ~5.7% and ~8.5%, with headline rows near ~8.5%, ~6.9%, and ~5.7%. Report metrics at the same pin; “strong-looking” summaries that skip the freeze/replan fork pool incomparable rows. Formal names ((P)): Game theory framing · Empirical snapshot.
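A minimal Python sketch of reporting at the pin, assuming each episode is logged with its two protocol choices and whether (B) won. `Pin`, `win_rate_by_pin`, `pooled_win_rate`, and `make_episodes` are hypothetical names, and the episode counts are synthetic, not the suite's results. It shows why a pooled win rate is not a property of the policies alone: with the per-pin rates held fixed, changing only the mix of pins in the evaluation set moves the pooled figure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """Hypothetical protocol pin: the rollout choices the benchmark holds fixed."""
    plan: str    # "freeze" (static swap plan) or "replan" (re-solve after branches resolve)
    timing: str  # "sequential" or "simultaneous"


def win_rate_by_pin(episodes):
    """Map each Pin to the fraction of its episodes that (B) won; never pools across pins."""
    wins, totals = defaultdict(int), defaultdict(int)
    for pin, b_won in episodes:
        totals[pin] += 1
        wins[pin] += int(b_won)
    return {pin: wins[pin] / totals[pin] for pin in totals}


def pooled_win_rate(episodes):
    """Single headline number; it silently depends on how many episodes each pin contributed."""
    return sum(b_won for _, b_won in episodes) / len(episodes)


def make_episodes(pin, n, n_b_wins):
    """Synthetic log: n episodes at one pin, the first n_b_wins of which (B) won."""
    return [(pin, i < n_b_wins) for i in range(n)]


frozen = Pin("freeze", "sequential")
replanned = Pin("replan", "simultaneous")

# Same per-pin behavior, different mix of pins in the evaluation set.
mix_1 = make_episodes(frozen, 100, 9) + make_episodes(replanned, 100, 5)
mix_2 = make_episodes(frozen, 300, 27) + make_episodes(replanned, 100, 5)

print(win_rate_by_pin(mix_1) == win_rate_by_pin(mix_2))  # True
print(pooled_win_rate(mix_1), pooled_win_rate(mix_2))    # 0.07 0.08
```

Keying results by the pin, rather than by policy alone, makes it impossible to average a frozen-plan row with a replanned row by accident.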

The Playmat

The complete board after initial setup; deterministic play begins here.
The mechanism: Value and regret are defined under the project's embedding (the principal line, the equilibrium at the Exchange when relevant, static vs re-solving play, and sequential vs simultaneous timing). A board snapshot aliases distinct histories, because draft facings and eligibility belong to the true Markov state but the snapshot does not capture them. A policy trained on the salient Phase‑1 signal can therefore stay optimal on the proxy yet be wrong under oracle value once the latent rules bite.
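A toy sketch of the aliasing argument, assuming a full state made of the visible board plus latent draft facings and an eligibility set. `FullState`, `snapshot`, `oracle_best_action`, and the actions `"swap"` / `"hold"` are hypothetical stand-ins, not the PSDG rules or its solver. Two histories that produce the same snapshot can have different oracle answers, so any policy that conditions only on the snapshot is wrong on at least one of them, however well it is trained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FullState:
    """Hypothetical Markov state: the visible board plus latent commitments."""
    board: tuple          # what a board snapshot shows
    draft_facings: tuple  # hypothetical latent field; kept here only to show what the snapshot drops
    eligible: frozenset   # hypothetical: which continuations are still legal


def snapshot(state: FullState) -> tuple:
    """Compressed observation: keeps the board, drops facings and eligibility."""
    return state.board


def oracle_best_action(state: FullState) -> str:
    """Toy stand-in for the solver: the best move depends on a latent field."""
    return "swap" if "swap" in state.eligible else "hold"


h1 = FullState(("A", "B"), ("up", "down"), frozenset({"swap", "hold"}))
h2 = FullState(("A", "B"), ("down", "down"), frozenset({"hold"}))

assert snapshot(h1) == snapshot(h2)                      # same board snapshot
assert oracle_best_action(h1) != oracle_best_action(h2)  # different oracle answers

# A snapshot-keyed policy must give h1 and h2 the same action, so it errs on one of them.
policy_table = {snapshot(h1): "swap"}  # e.g. fit to the more frequent history
errors = [s for s in (h1, h2) if policy_table[snapshot(s)] != oracle_best_action(s)]
print(len(errors))  # 1
```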
Why this matters to ML research:
- Oracle-grounded metrics: legal-move rate, regret (delta vs oracle), principal-line rate, and (P)‑labeled outcome splits expose this failure class where win rate alone does not.
- Representation is load-bearing: compression that drops facings or commitments pushes the learner toward the wrong sufficient statistics, and rollouts of the same schema can raise confidence without improving alignment.
- Deployment / protocol is measured: what stays frozen vs re-solved at the Exchange is a reporting surface, not a modeling afterthought.
- Credit assignment hurts: final payoffs entangle your draft with the opponent’s Poisoned Gift, and disentangling them without oracle-grade structure is hard.
- More scale on the wrong objective sharpens the mistake rather than fixing it: the failure is misspecification, not weak optimization.
See: ML pillars · Credit / adversarial injection · Parable
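A minimal sketch of the first three oracle-grounded metrics, assuming each decision is logged with the chosen action, oracle values for every legal action under the pinned protocol, and the oracle's principal-line move. `MoveRecord` and `oracle_metrics` are hypothetical names, and (P)-labeled outcome splits are left out because they depend on the benchmark's own labels. Regret here is the oracle value of the best legal action minus that of the chosen action, averaged over legal moves.

```python
from dataclasses import dataclass


@dataclass
class MoveRecord:
    """Hypothetical per-decision log entry, labeled offline by the oracle."""
    chosen: str          # action the learned policy emitted
    oracle_values: dict  # legal action -> oracle value under the pinned protocol
    principal: str       # the oracle's principal-line move at this decision


def oracle_metrics(records):
    n = len(records)
    # Assumption of this sketch: an action is legal iff the oracle assigned it a value.
    legal = [r for r in records if r.chosen in r.oracle_values]
    # Regret: best legal oracle value minus the oracle value of the chosen move.
    regrets = [max(r.oracle_values.values()) - r.oracle_values[r.chosen] for r in legal]
    return {
        "legal_move_rate": len(legal) / n,
        # Defined only over legal moves; illegal moves are counted by the rate above.
        "mean_regret": sum(regrets) / len(regrets) if regrets else float("nan"),
        "principal_line_rate": sum(r.chosen == r.principal for r in records) / n,
    }


records = [
    MoveRecord("a", {"a": 1.0, "b": 0.0}, "a"),   # optimal and on the principal line
    MoveRecord("b", {"a": 1.0, "b": 0.25}, "a"),  # legal but costs 0.75 vs the oracle
    MoveRecord("c", {"a": 0.0, "b": 0.0}, "a"),   # illegal move
    MoveRecord("b", {"a": 0.0, "b": 0.0}, "a"),   # zero regret, yet off the principal line
]
print(oracle_metrics(records))
# {'legal_move_rate': 0.75, 'mean_regret': 0.25, 'principal_line_rate': 0.25}
```

The last record is the kind of case win rate cannot see: the move is as good as the oracle's by value, yet it leaves the principal line, which matters when downstream analysis assumes the principal line.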
Deeper Dive:
- ML — full analysis — pillar map, metrics table, heuristic pilot, open questions
- Q-learning / bandit demo — runnable minimal agent, regret numbers
- Parable (~60 seconds) — proxy vs latent rule
- Game theory framing — optimal, blunder, (P), deployment gap
- Empirical snapshot — reproducible table
- Solver + benchmarks (GitHub) — reproduce runs
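The proxy-vs-latent-rule point, and the claim that more scale sharpens the mistake, can be illustrated with a toy two-arm bandit. This is not the linked Q-learning / bandit demo: the arm names, means, and the `train_greedy` helper are hypothetical. The learner sees only a proxy reward whose best arm is oracle-suboptimal; as the sample count grows, its estimate of the proxy gap tightens and its choice settles on that arm, while the oracle regret of the choice stays at 0.5 instead of shrinking.

```python
import random

# Hypothetical two-arm setup. The learner observes only the proxy reward;
# the oracle value is what the offline solver would score. They disagree on purpose.
PROXY_MEAN = {"salient": 0.6, "latent_aware": 0.4}
ORACLE_VALUE = {"salient": 0.2, "latent_aware": 0.7}


def train_greedy(n_samples_per_arm, seed=0):
    """Estimate each arm's proxy mean from Bernoulli samples; pick the best estimate."""
    rng = random.Random(seed)
    est = {}
    for arm, p in PROXY_MEAN.items():
        hits = sum(rng.random() < p for _ in range(n_samples_per_arm))
        est[arm] = hits / n_samples_per_arm
    chosen = max(est, key=est.get)
    return chosen, est["salient"] - est["latent_aware"]


for n in (10, 100, 10_000):
    arm, gap = train_greedy(n)
    regret = max(ORACLE_VALUE.values()) - ORACLE_VALUE[arm]
    # Larger n: the proxy gap estimate tightens around 0.2, the choice settles on
    # "salient", and the oracle regret of that choice is 0.5 rather than shrinking.
    print(f"n={n:>6}  chosen={arm:<12}  proxy gap={gap:+.3f}  oracle regret={regret:.2f}")
```

More samples here only reduce estimation noise; they cannot repair a reward that ranks the arms in the wrong order.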
If you take one thing:
When you report win rate alone, what deployment protocol are you implicitly assuming — and does your actual deployment match it?
Detailed report: Technical report (summary)
