
FAQ

The first answer orients how to read the project (three reviewer buckets). § 2 — Novelty & scope matches the framing on Home — In brief (edit both when that note changes). After that: short responses to common objections. One-line pitch for colleagues (single canonical copy): How to cite this. Deeper arguments: Technical report (summary) · Game theory · ML · AI safety.

Before diving in: How to read PSDG — methodological framing (benchmark vs theorem vs analogy vs instrument).

ML reviewers: if your first question is where the flagship trained baseline on draft → Exchange → full Reckoning is, see § 13; the methodological answer (“oracle as instrument,” not RL placeholder) is on How to read PSDG (especially bucket 1).

One benchmark, many doors: PSDG sits at the intersection of game theory, ML evaluation, and alignment. These are not separate stories, but different routes into the same benchmark under the same oracle. Entry points by audience: ML · AI safety · Game theory · Home — Audience routes.


1. How to read this project

Readers often sort PSDG into one of three buckets — trained-agent benchmark, game-theoretic restatement, or AI safety analogy. Each captures part of the story:

  • Benchmark lens: expects flagship learned baselines (§ 13); but the oracle is measuring equipment, not a placeholder waiting for RL to replace it — How to read PSDG unpacks why.
  • Game theory lens: themes are familiar; the payoff is oracle-grounded counts under pinned deployment protocol (P), not a single new impossibility theorem.
  • Safety analogy lens: proves nothing about frontier models directly; it isolates structure that is easier to measure exactly here than in the wild.

Full write-up (buckets in prose, microscope metaphor, evaluation checklist): How to read PSDG →


2. Novelty & scope — same copy as the home researcher note

The block below matches the Home — In brief NOTE on purpose (maintenance: edit both places or keep them in sync).

PSDG is not presented as an entirely new catalog of failure modes. Proxy misspecification, latent structure, and deployment brittleness are each familiar in isolation. What is distinctive is that all three live in one small, exactly solved, reproducible environment with a shared oracle—the parable for proxy / representation stress, the full game for latent commitment structure, and the seeded benchmarks for measurable deployment brittleness—so their interaction is concrete and checkable rather than spread across separate papers, noisy demos, or toy setups. That yields a tight conceptual counterexample (e.g. exact value does not imply deployment-safe play) and an exact diagnostic benchmark.

One implication may be novel as a demonstrated result: the failure survives not only perfect optimization but perfect rule knowledge and complete visibility—the conditions under which human-in-the-loop oversight is often supposed to work. The problem is structural, not a limitation of attention, scale, or compute.

Public artifact (to our knowledge). PSDG is, to our knowledge, the first public, compact, perfect-information (after setup), exactly solved environment where static vs re-solving (and related blunder / deployment) win-rate splits under the published benchmark protocol are reproducible from published seeds and an exact reference implementation: oracle-grounded values and optimal play, not approximate equilibria or learned policies. The phenomenon is not new in the abstract; what is distinctive is a clean, checkable, public measurement tied to exact play. Corrections welcome.

In one line: not brand-new ingredients in isolation; yes to the exact package, what it measures, and the oversight implication under ideal observability.


3. How to cite this to colleagues

Paste into email or Slack when someone asks what PSDG is:

Small deterministic game, full oracle, seeded 5k suite—proxy policies fail against ground truth; misspecification demo is “optimally wrong”; even optimal machinery is fragile if you freeze the wrong commitments.

(This is the only place this blurb is kept on the docs site, so it stays easy to edit in one file.)


4. What is the difference between “oracle correctness” and “static A”?

The oracle (solver) defines legal moves, values, and the principal line under the published embedding. Static A is a deployment choice at the Exchange: replay the principal-line gift on the realized crucibles instead of re-solving the Gift on that node. The oracle is not “wrong” when static play loses; the protocol no longer matches continued re-optimisation. One-line signpost above the blunder table: Home — Empirical snapshot.

Skimmers sometimes read “8.5% B wins” as “the solver was wrong.” The site means the opposite: the oracle is still the reference; static replay at the Exchange is a different object from continued re-optimisation on the realized node.
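A minimal sketch of that distinction on a hand-made three-ply tree. This is not PSDG itself: the tree, the move names, and the helpers solve and play are illustrative assumptions, not the reference solver's API. Values are from A's perspective; A maximizes and B minimizes. The solver's root value and principal line stay correct throughout; only the deployment rule changes what happens once B leaves the line.

```python
from typing import Any, List, Tuple

# Hypothetical toy tree: A moves, B replies, A makes the final move.
# Leaves are values for A: +1 A wins, 0 draw, -1 B wins.
GAME = {
    "a1": {
        "b1": {"x": 0, "y": -1},   # on-path node: best final move is x (draw)
        "b2": {"x": -1, "y": 1},   # off-path node: x now loses, y wins
    },
    "a2": {"b1": {"x": -1}},
}


def solve(node: Any, a_to_move: bool) -> Tuple[int, List[str]]:
    """Return the minimax value for A and one principal line from this node."""
    if isinstance(node, int):
        return node, []
    best_value, best_line = None, []
    for move, child in node.items():
        value, line = solve(child, not a_to_move)
        improves = (
            best_value is None
            or (a_to_move and value > best_value)
            or (not a_to_move and value < best_value)
        )
        if improves:
            best_value, best_line = value, [move] + line
    return best_value, best_line


def play(root: Any, b_reply: str, resolve: bool) -> int:
    """A opens with the plan; B plays b_reply; A finishes by replay or re-solve."""
    _, plan = solve(root, True)          # ex ante plan: [A move, B move, A move]
    node = root[plan[0]][b_reply]        # realized node after B's reply
    if resolve:
        _, line = solve(node, True)      # re-optimise on the realized node
        last_move = line[0]
    else:
        last_move = plan[2]              # replay the frozen principal-line move
    return node[last_move]               # assumes the planned move is legal here


root_value, principal_line = solve(GAME, True)
print("root value:", root_value, "principal line:", principal_line)
print("static, B on path:   ", play(GAME, "b1", resolve=False))
print("static, B deviates:  ", play(GAME, "b2", resolve=False))
print("re-solve, B deviates:", play(GAME, "b2", resolve=True))
```

On this toy tree the root value is 0 with principal line a1, b1, x. When B deviates to b2, replaying x ends at -1 while re-solving ends at +1, even though the solver's values never changed.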


5. Reference solver and benchmark integrity

The Python reference solver is the source of published values and principal lines. A full cross-join (blunder trial vs benchmark value) is documented in benchmark/output/blunder_resolving_vs_root_value_5000_2026-04-05.txt in the psdg repository (in a full development checkout, the same file may also appear under private/psdg/benchmark/output/; psdg has benchmark/output/ only): 284/287 B wins pair with value == -1. Three seeds (4167, 4359, 4402) have value == +1 but the blunder trial still records a B win; there the principal line from solve_from_roll disagrees with solve_from_position at B’s last draft. Treat this as an implementation / principal-line selection follow-up, not as benchmark noise and not as a refutation of draft minimax. Details: Reading the 5.7% row on the home snapshot.

“Exact oracle” framing: For those three openings, the stored root value and the principal line from solve_from_roll still agree—with each other and with the benchmark JSON—but solve_from_position at B’s final draft node points to a different minimizing move than the line’s last B twist, so a strict mid-game subgame check and the rolled-forward line diverge. That is 3/287 re-solving B wins (~1% of that cell) and 3/5000 trials (~0.06% of the suite)—not an ambiguity in the static vs re-solving story or in the 284/287 bulk cross-join. It is a known reference-solver consistency item (principal-line witness), flagged so nobody mistakes it for “minimax is optional.”
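The proportions quoted in this section can be rechecked with a few lines of arithmetic. This is only a sanity sketch over the counts stated above; the constant names and the pct helper are illustrative, and nothing here reads the benchmark output file.

```python
from fractions import Fraction

SUITE_TRIALS = 5000                  # seeded trials in the blunder suite
B_WINS_RESOLVING = 287               # B wins in the re-solving cell
CONSISTENT_B_WINS = 284              # B wins whose stored root value is -1
FLAGGED_SEEDS = (4167, 4359, 4402)   # root value +1, yet the trial records a B win


def pct(numerator: int, denominator: int) -> float:
    """Exact ratio expressed as a percentage."""
    return float(Fraction(numerator, denominator) * 100)


flagged = len(FLAGGED_SEEDS)
# The bulk cross-join and the flagged seeds must account for the whole cell.
assert CONSISTENT_B_WINS + flagged == B_WINS_RESOLVING

print(f"re-solving B-win rate:  {pct(B_WINS_RESOLVING, SUITE_TRIALS):.2f}%")
print(f"flagged share of cell:  {pct(flagged, B_WINS_RESOLVING):.2f}%")
print(f"flagged share of suite: {pct(flagged, SUITE_TRIALS):.2f}%")
```

The three printed figures correspond to the 5.7% row, the roughly 1% share of that cell, and the roughly 0.06% share of the suite.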


6. Why benchmark static principal-line play—isn’t that a strawman?

The project makes no claim that “everyone deploys exactly this.” Static A is a deliberate counterfactual next to re-solving on the same seeds: it isolates the cost of freezing an ex ante plan when play goes off-path. Real systems still lean on cached responses, stale plans, or infrequent replanning; the table is a clean measurement of protocol (P), not a portrait of industry defaults. The game-theoretic tension (commitment vs subgame-perfect re-optimization) is standard; the payoff here is explicit protocol plus exact counts. See § 7 (“Static vs re-solving is already in game theory”) and the deployment gap section on the Game theory page.


7. “Static vs re-solving is already in game theory.”

The theme is not new. The payoff here is tight packaging, numbers on a seeded suite, and a playable toy where protocol is explicit—so “optimal under P” is not confused with “safe under every deployment rule.” See Game theory (opening framing and deployment gap) and the empirical snapshot. For how PSDG relates to familiar named classes (and a working label for the feature bundle), see § 21 · Game theory — Rule-staged commitment games.


8. “The proxy is misspecified by construction.”

Partly fair for the parable. The bandit setup deliberately trains on a visible proxy so you can isolate proxy vs latent structure when a tiebreaker turns on—by design, not as an accidental bug. The full game adds draft, Exchange, and richer state; the point is still measurement and mechanism, not “surprise that a wrong reward exists.” See ML and Core ideas on the home page. For mesa-optimization vocabulary—speculative only—see § 9.


9. Could PSDG help detect mesa-optimization?

Speculative — future work. PSDG does not instantiate an inner optimizer or a validated mesa detector. It can be read as a tractable analogue of the proxy vs base objective tension mesa-optimization discussions often centre on—except both objectives are fully specified, and the mismatch is explicit and measurable against an oracle.

The parable is the closest fit: learning can converge on behaviour that scores well under a visible proxy while failing under true game value once latent tiebreakers matter. The full game stacks deployment and representation stresses discussed on ML and in the empirical snapshot.

That combination might eventually motivate tests for characteristic patterns—oracle regret when latent rules activate, brittle frozen plans after deviation—but no mesa-focused protocol has been designed or validated here. For the proxy framing without mesa vocabulary, see § 8.


10. “Facings are visible on the table—so this is just bad feature engineering.”

Partly fair as a coding critique, wrong as a dismissal. Twists are visible to humans, but the benchmark story is what happens when a learned model or training setup never requires those degrees of freedom in the state—so the system can still alias histories the rules keep distinct. That matches real pipelines where the information exists in principle yet never enters the representation the policy optimizes. See Core ideas and ML.


11. Is PSDG hidden-information or partially observable?

Not in the usual sense. After random setup it is a perfect-information game: nothing is hidden by the rules of play; every legal move is deterministic and both players can see the mat. The load-bearing caveat is that the true game state is richer than a naive board-only summary—what you need for optimal play is not just “what scores now.” The reference solver tracks drafted dice and crystals as (top, facing) pairs (among other fields) because facings encode Phase 2, eligibility, and tiebreak-relevant structure the same way the rules do.

So why do people call it “partially observable”? Because a board snapshot (or any compressed observation the agent is given) can omit variables the rules treat as distinct. Then the input is lossy even though the full configuration is knowable in principle—that is state compression / aliasing, not poker-style hidden information. The wrong reflex is “more sensing fixes it”: here, complete physical visibility already holds; the diagnostic is insufficient state for the task in the model or protocol (see § 10. “Facings are visible…” and Core ideas). For ordinary strategic uncertainty vs protocol multiplicity (sequential variant), see § 12.


12. “Fully sequential PSDG is only ordinary strategic uncertainty.”

Partly fair: after setup it is a finite extensive game, so forward-looking reasoning about branches is standard.

Where it misses: the suite compares different deployment protocols (P) on the same seeds—static principal-line Exchange vs re-solving at the realized node, with sequential vs simultaneous Exchange timing (8.5% / 6.9% / 5.7% B wins in the standard blunder table, snapshot). That spread is which rule is wired in after deviation, not secret noise or imperfect observability after setup. The oracle fixes values under a pinned embedding; (P) is outside that unless analysis models it explicitly (oracle vs static A, deployment gap).

Simultaneous Exchange: breaks strict sequentiality at one node and is not load-bearing for the core representation/deployment story (AI safety — simultaneous Exchange thesis).


13. Why aren't there flagship trained agents on the full game?

Short answer: PSDG leads with an oracle-backed evaluation harness (exact regret, legality, deployment splits on published seeds). That harness is the measuring instrument, not an interim until a generic trained policy replaces it — see How to read PSDG (bucket 1 and “instrument”). Flagship learned baselines on the full pipeline remain possible future work, not load-bearing for the structural claims already pinned by the parable and tables below.

Concretely: the site does not yet centre a flagship trained RL/LLM baseline on draft → Exchange → full Reckoning the way it centres oracle-backed tables and the parable toy. That is not because the project argues learning is impossible: ML lays out serious representation, objective, and credit-assignment challenges (and § 14 rejects a general impossibility read). Training a policy that reliably inherits full-rule structure remains hard enough that we lead with exact evaluation harnesses first.

What is true: the repo is built so agents can be evaluated against the oracle (regret, legality, outcome splits under protocols)—see ML and the solver / benchmark section. Showing that a specific RL or LLM pipeline discovers latent structure is not required for the structural claims the parable and exact tables already support. A small full-game heuristic pilot (not learned) is documented in § 19.


14. Does PSDG prove that scale, architecture, or oversight cannot fix this class of problem in general?

No. The honest claim is safety-relevant structure in a tiny exact domain and rhymes with larger systems—not a general impossibility theorem. See AI safety for scoped language; numbers are embedding-specific (snapshot). For why small and exact are methodological strengths in this project—and how that differs from a dismissive “toy” read—see § 16 — Harder games already solved… (subsection Why so small…).


15. Won’t better tools and mature oversight solve this in a few years?

For many failure modes, yes—better tooling helps when the problem is “the system did something it wasn’t supposed to” under noisy monitoring.

PSDG highlights a different shape of failure: correct optimization and full observability can still collide with deployment—because commitment can precede clear retrograde knowledge of what should have been chosen. Dashboards and HIL tighten execution; they do not by themselves move where in the tree a plan stops being revocable. Closing that gap is structural (what gets frozen, when, and under which protocol), not only a maturity or investment story. See Human oversight and the researcher note on the home page.


16. Harder games are already solved—why should PSDG matter?

PSDG is solved too—that is the point. The published suite is not “the model can’t learn the game.” It is: even with an exact oracle, a static deployment rule at the Exchange can still lose to a weaker opponent under protocol (P) (snapshot).

Poker is the analogy many readers reach for: the breakthrough was belief over hidden information. PSDG, after setup, has no hidden dice: everything is visible. The stressor is representation / aliasing—histories the rules distinguish can look the same in a coarse “board photo.” More inference over hidden state is the wrong fix for a game where the gap is state collapse under full observability. Solving the abstract game and safely deploying a line under a concrete protocol are different targets.

If the game is fully solved, isn’t the science already done?

No—solved is where a large class of noise ends and the measurement begins. A fully specified game plus a reference implementation of value and legality (the oracle / solver on the home page) means you are not arguing in the dark about the rules, illegal moves, or “did we find the right equilibrium.” The remaining research questions in this project are representation, protocol, and deployment: what state does a policy actually use, what stays frozen when play goes off-path, and do outcome splits under (P) line up with re-solving? That is what the empirical snapshot is for. The oracle is a ground-truth interface (values, legality, counterfactuals)—not a “superplayer” and not a substitute for stating what embedding a deployed policy used.

Why so small—toy bias, scale, and “bigger simulators”?

Compact, deterministic, and exactly analyzable is a methodological choice: it holds other factors fixed so that alias / proxy and static vs re-solving are not swamped by unmodeled noise. A useful image is a vacuum chamber for a narrow question, not a claim that the only important systems are “10‑minute” games.

  • “It’s a toy.” Small state + public rules + shared solver keep regret and splits checkable; that is a strength for a diagnostic benchmark, not a promise that the only risks are tabletop-sized. The parable and full benchmarks are the project’s way of separating structural claims from “this one cute game is the world.”
  • “Scale will fix it.” Bigger models or more compute do not by themselves repair a wrong state for the true game or a static plan at a node that should be re-solved—those are not standard lack-of-capacity stories in the usual sense. The benchmark does not prove that no large system can help in any situation; it isolates a class of failures that is not automatically resolved by scale alone. (See § 14 for the honest non‑impossibility claim.)
  • “Why not a more complex simulator?” Richer environments add useful realism—and often more authoring surface, idiosyncratic noise, and disagreement about what the true rules are. PSDG is tight so the one object everyone shares is the rules + solver + seeds; the disagreements the tables surface are not “we can’t run the world model” but “the agent did not carry the right state or commitment rule at deployment.”

For rhymes in real domains when no oracle exists, see § 17.


17. Can PSDG-like structure be detected in real-world domains?

Partially — and the limits are the point.

A structural signature (irrevocable early commitments, latent conditional rules, entangled evaluation) can often be argued from specs, contracts, regulations, and physical constraints without an oracle. A careful analyst can flag where several of these forces overlap and say informally that an area “looks like a PSDG pocket.” That remains a judgment call; it is easy to produce false positives without a stress test or a shared model.

What cannot be done without an oracle is measure the cost the way the empirical snapshot does. You can hypothesize the structure; you cannot show that a specific agent’s representation misses it with oracle-grounded regret, because there is no canonical truth interface in the wild.

Epistemic gap: In PSDG the rules are fully published; latent structure is visible in principle but may still be absent from the agent’s state (see § 10 and § 11). In many real domains the complete rule set is not known—interactions and edge cases outrun what any single author wrote down; physical couplings appear under conditions that were never fully specified. “Latent rules” can be emergent rather than documented. That makes both detection and measurement harder than in the benchmark.

Three levels of difficulty:

  1. Known rules + oracle (PSDG). Detect “pockets,” measure deployment / representation failure against truth, verify claims—this is the benchmark’s job.
  2. Known (or seriously modeled) rules, no oracle. Argue that a pocket might exist, design mitigations and stress tests—no PSDG-style exact split without a truth interface everyone accepts.
  3. Incomplete rules, no oracle. You may still run red teams, counterfactual reviews, and audits that surface candidates—but you cannot close the loop with regret against a canonical solver. Unknown rules mean the “pocket” may stay ill-defined until reality teaches you.

PSDG demonstrates the phenomenon in the one setting where it is exactly measurable; the real world is where measurement is hardest and specification is least complete. That gap is not a limitation of the project—it is why deployment remains a safety problem, not only a learning problem. See § 14 and AI safety. For the reflex that noise must average out PSDG-shaped gaps because the benchmark is deterministic after setup, see § 22.


18. Physics: Cauchy surfaces, representation, superposition?

Q: Has any thought been given to the implications of PSDG for physics — for example, Cauchy surfaces, representational sufficiency, or superposition?

A: Yes — though the project’s research focus is ML, AI safety, and game theory, PSDG may offer a pedagogical lens for physicists. The commitment topology that drives proxy misspecification, deployment fragility, and state aliasing in the benchmark also supports informal rhymes with three themes physics encountered across the 20th century: when a chosen “surface” or summary fails to support the inferences you want (a Cauchy-flavored question in GR, but not the same mathematics), representational insufficiency (the kind of shift that forced richer formalisms such as matrix mechanics — not a claim that PSDG is quantum theory), and reasoning about multiple live branches before an irreversible act (a decision-theoretic echo of superposition’s role in calculation, not ontology or interference).

All three rhymes appear in PSDG without manifolds, complex amplitudes, or laboratory measurement — which suggests that some of the epistemic structure may be generic to rule-governed systems with irreversible commitments, while physics still supplies the substantive equations and empirical content. This is a lens, not new physics: PSDG adds no predictions or fundamental equations; it only provides a minimal, exact system where the three themes coexist and the benchmark claims remain checkable on the rest of the site.

Draft essay (informal, strong caveats at top): “Three lessons for physics: From Cauchy Surfaces to Superposition”. It is linked only from this FAQ entry so casual readers are not routed there from the main research spine.


19. Did you benchmark a simple full-game heuristic against the oracle?

A (pilot, not the 5k blunder suite): Yes—one hand-pinned policy as a sanity / gap-filler, not a claim about all “naive” agents.

What was run: 500 consecutive six-dice rows from the same source as the public suite (benchmark_5000_6d.json, seeds 42–541 inclusive). A always plays optimal draft moves (re-solved from the current node). B follows “facing 6 toward the player when legal” on each draft twist: among legal (top, facing) pairs, B must use facing 6 whenever 6 is a legal side face for that top; if no die on the board allows that (only tops 1 and 6 remain), B falls back to any legal twist. When several tops still admit facing 6, B breaks ties with the same ordering heuristic the reference solver uses for move sorting (prefer tops in {6, crystal top}, then higher top). At the Exchange, B’s gift orientation is restricted the same way (facing 6 when legal for the gifted die’s top); A is unrestricted at the Gift. The joint Exchange outcome is then the usual maximin over A’s legal gifts and B’s restricted set.
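B's twist rule can be sketched as a filter over a list of legal (top, facing) pairs. This is a hedged illustration, not the pilot driver: the names side_faces and facing6_choice are hypothetical, a standard die (opposite faces sum to 7) is assumed, legality is taken as an input rather than computed from the rules, and "any legal twist" in the fallback is resolved here by taking the first one.

```python
from typing import Iterable, List, Optional, Tuple

Twist = Tuple[int, int]  # (top, facing)


def side_faces(top: int) -> List[int]:
    """Side faces of a standard die: everything except the top and the bottom (7 - top)."""
    return [face for face in range(1, 7) if face not in (top, 7 - top)]


def facing6_choice(legal: Iterable[Twist], crystal_top: Optional[int] = None) -> Twist:
    """Pick a twist with facing 6 when one is legal; otherwise fall back."""
    legal = list(legal)
    if not legal:
        raise ValueError("no legal twist available")
    with_six = [twist for twist in legal if twist[1] == 6]
    if not with_six:
        # Fallback when facing 6 is unavailable (e.g. only tops 1 and 6 remain).
        # Assumption: "any legal twist" is taken to be the first in the list.
        return legal[0]

    def rank(twist: Twist) -> Tuple[bool, int]:
        top, _ = twist
        # Tie-break as described: tops in {6, crystal top} first, then higher top.
        return (top in {6, crystal_top}, top)

    return max(with_six, key=rank)


print(side_faces(1))                                        # 6 is the bottom, so no facing 6
print(facing6_choice([(2, 3), (3, 6), (5, 6)]))
print(facing6_choice([(2, 3), (3, 6), (5, 6)], crystal_top=3))
print(facing6_choice([(1, 2), (6, 3)]))
```

With top 1 or 6, the face 6 is either the bottom or the top itself, which is why those tops trigger the fallback.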

Pilot outcomes: A wins 454 (90.8%), draws 42 (8.4%), B wins 4 (0.8%). On those same 500 opens, the stored oracle root values are 377 A-win, 99 draw, 24 B-win—so this one heuristic often throws away B-favoured and drawn roots against an unrestricted optimal opponent (e.g. 20 / 24 B-win roots still end as A wins; 57 / 99 drawn roots become A wins).
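The quoted counts are enough to reconstruct the full root-value to outcome table, under one assumption: an optimal A never finishes below the root value, so A-win roots stay A wins and drawn roots cannot become B wins. The sketch below derives the remaining cells and checks that they reproduce the pilot totals; the variable names are illustrative.

```python
# Counts quoted in the pilot paragraph (500 openings, seeds 42 to 541).
ROOT_VALUES = {"A": 377, "draw": 99, "B": 24}   # stored oracle root values
OUTCOMES = {"A": 454, "draw": 42, "B": 4}       # pilot results
B_ROOTS_WON_BY_A = 20
DRAW_ROOTS_WON_BY_A = 57

# Assumption: optimal A guarantees at least the root value against any B.
draw_roots_drawn = ROOT_VALUES["draw"] - DRAW_ROOTS_WON_BY_A
b_roots_drawn = OUTCOMES["draw"] - draw_roots_drawn
b_roots_won_by_b = ROOT_VALUES["B"] - B_ROOTS_WON_BY_A - b_roots_drawn

# (root value, realized outcome) -> number of openings
transitions = {
    ("A", "A"): ROOT_VALUES["A"],
    ("draw", "A"): DRAW_ROOTS_WON_BY_A,
    ("draw", "draw"): draw_roots_drawn,
    ("B", "A"): B_ROOTS_WON_BY_A,
    ("B", "draw"): b_roots_drawn,
    ("B", "B"): b_roots_won_by_b,
}


def outcome_total(result: str) -> int:
    """Sum all openings that ended with the given result."""
    return sum(count for (_, end), count in transitions.items() if end == result)


for result, expected in OUTCOMES.items():
    assert outcome_total(result) == expected
assert sum(transitions.values()) == sum(ROOT_VALUES.values()) == 500

for cell, count in transitions.items():
    print(cell, count)
```

The totals close exactly: 377 + 57 + 20 = 454 A wins, all 42 draws come from drawn roots, and the 4 B wins are the B-favoured roots the heuristic did not throw away.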

How to read it: One authored rule is not the whole class of heuristics or learned policies—but it does show that a simple twist rule can be far from optimal play even when the opening is sometimes value-drawn or B-favoured under optimal-vs-optimal. A full 5,000 run is optional polish for tighter percentages; the pilot is already strongly directional. Reproducible driver (development tree): private/psdg/benchmark/pilot_heuristic_facing6_vs_oracle.py. More context: ML — Hand-authored heuristic pilot.


20. Is PSDG “really” a deterministic perfect-information game—and can it still feel like poker?

A: Yes to the formal classification after setup, with one standard nuance; only as metaphor for “poker-like” pressure.

Deterministic + perfect information (after setup). Once dice and crystals are fixed, PSDG has no further chance moves and no private information: every legal continuation is a public function of the full mat. That is exactly the setting in which an exact oracle is well-defined. The simultaneous Poisoned Gift in v1.13 is modeled as a simultaneous-move node in an extensive form: players do not observe each other’s simultaneous gift choice before picking. Strictly, that one node is imperfect information in the information-set sense (the standard nuance), but it is not incomplete information in the Harsanyi / hidden-type sense (there is nothing on the table you are not allowed to look at).

What people mean when they say it doesn’t “fit” chess-like PI. Chess trains you to treat the visible position as close to a sufficient state summary. PSDG breaks that habit: histories the rules distinguish can look the same in a “board photo,” and Twists commit you before later phases make the payoff of those facings legible from salient tops alone. That is the site’s ready–fire–aim point (Home)—commitment before legibility—not a claim that PSDG escapes Zermelo-style analysis.

“Uncertainty staged by the rules.” Useful if uncertainty is read as which contingent structure will bite and what the right Markov state is for the true objective—not as “there are secret random draws mid-game.”

Poker. A fair pedagogical rhyme: both can involve committing before you fully feel the consequence tree. A misleading identification if anyone hears “like poker” as private cards or Bayesian beliefs—PSDG after setup has neither. For ordinary strategic uncertainty vs deployment protocol splits (sequential framing), see § 12.

Longer framing: Technical report — Commitment before legibility · Game theory — Rule-staged commitment games · Game theory (overview).


21. Where does PSDG sit in the usual named game-theory classes?

In the mathematics, it is a finite (post-setup) extensive form, not an exotic object class the literature cannot parse. The misfit is pedagogical and methodological: the headline toolkits for Zermelo-style PI, poker- / Harsanyi-style incomplete information, repeated games, cooperation, or mechanism design each foreground a different set of levers, and none of those banners lines up with the whole bundle this project’s tables are about, namely commitment before legibility in a summary, static vs re-solving at deployment, and protocol (P)-dependent “optimal” under a pinned oracle. So readers who reach for a default one-liner from a neighboring class (beliefs, stationarity, “the visible board is the state,” a single off-the-shelf notion of “equilibrium,” …) can get a clean but misleading picture even when the formal tree is standard.

Working label (convenience, not a new theorem class): rule-staged commitment games (full aside on the game-theory page). Same theme as § 7: familiar themes; tight, numerical instantiation here.


22. Doesn't real-world noise average out the gaps PSDG shows?

No—for the substantive claim PSDG isolates. Noise can hide a failure pattern, not cancel it. After setup PSDG deliberately removes randomness to hold variance fixed and show whether the wedge is structural (representation, objective, deployment protocol) rather than sampling luck. That methodological move is unpacked under § 16 — Why so small… (“vacuum chamber”): compact and deterministic is a strength for measurement, not a claim that messy domains have no noise.

In stochastic real deployments, the same aliasing and proxy misspecification rarely “washes away” just because averages look fine on ordinary traffic—it often stays inactive in the bulk until a latent rule or regime switches on. A system optimizing the wrong objective can look successful on most draws while tail risk tied to missing structure accumulates unseen. That is not a packaged statistical theorem; it is an informal pattern PSDG names clearly because the benchmark has an oracle and pinned protocols.

One useful informal picture (not a formal safety floor/certificate): real-world variance can raise the ceiling on how bad undetected bleed can become before monitors notice; averaging does not by itself impose a floor guaranteeing safety when the embedding or protocol is wrong.

See § 17 (rhymes without an oracle), Home — In brief (researcher note on structural failure under visibility), and AI safety — conclusion (stakes in deployment language).


23. Is the detector problem in PSDG related to Rice’s Theorem?

In shape, not by formal reduction. Rice’s Theorem (1953) says any non-trivial semantic property of programs—specified by what the program computes, not how it is written—is undecidable in the general Turing-complete setting.

The property “this commitment configuration is already a trap under the PSDG embedding” is semantic in flavour: answering it faithfully looks past a surface snapshot to consequences in the extensive form and counterfactual branches. Where it is genuinely non-trivial (sometimes yes, sometimes no), that matches the Rice shape: behaviour-defined predicates are not reliably recoverable from a thin observable interface.

PSDG’s game is finite and solved, so the oracle decides trap-hood exactly here—Rice’s Theorem does not formally apply inside this benchmark.

The lesson carries anyway: wishing for a lightweight, oracle-free monitor that reliably reads “already baked in” from coarse summaries parallels the ambition Rice ruled out in general (surface inspection ↔ behaviour-defined truth). PSDG instantiates a small measurable version of that friction; Rice explains why that kind of barrier is often conceptually deep, not a quirk of dice. Full argument: the AI safety page, under PSDG detectors without the oracle.


24. Why does a deterministic game produce stochastic-looking behavior?

Q: PSDG is deterministic after setup, yet outcome distributions and protocol-dependent win rates resemble what you’d expect from a stochastic environment. Is that a contradiction?

A: No—but the resolution is worth naming explicitly.

PSDG is formally deterministic once dice and crystals are fixed: no chance moves mid-episode, and under the sequential Exchange variant no simultaneous hidden Gift choice at that node (§ 11 · § 20). Randomness in published suites enters through which opening is sampled, not through noise inside each trajectory (§ 22 separates “deterministic isolation” from real-world variance). With the full extensive-form bookkeeping the oracle uses, there is nothing to ‘draw’: outcomes follow from moves and protocol (P) (pinned definitions).

Yet for an agent whose representation is too coarse, the same summarized state–action pair can land in different futures because the summary aliases histories the rules distinguish (distinct facings and commitments collapsed into one board photo). From inside that interface the MDP can look stochastic even though the world model is deterministic: the apparent randomness is induced by the observation map, not generated by dice after setup.
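The effect can be reproduced in a few lines on a toy system. Everything below is an assumption for illustration and not a PSDG rule: states are (top, facing) pairs on a standard die, there is a single hypothetical action cash_in, and the outcome is a win only when the facing is 6. The dynamics are fully deterministic; the only thing that changes between the two runs is the observation map.

```python
from collections import defaultdict
from itertools import product
from typing import Callable, Dict, Hashable, Set, Tuple

State = Tuple[int, int]  # (top, facing)


def step(state: State, action: str) -> str:
    """Deterministic toy dynamics: the result depends on the facing, not only the top."""
    _, facing = state
    if action == "cash_in":
        return "win" if facing == 6 else "loss"
    return "continue"


def outcomes_by_observation(
    observe: Callable[[State], Hashable],
) -> Dict[Tuple[Hashable, str], Set[str]]:
    """Group the outcomes of cash_in by what the agent observes before acting."""
    seen: Dict[Tuple[Hashable, str], Set[str]] = defaultdict(set)
    for top, facing in product(range(1, 7), repeat=2):
        if facing in (top, 7 - top):
            continue  # a facing must be a side face of the die
        state = (top, facing)
        seen[(observe(state), "cash_in")].add(step(state, "cash_in"))
    return seen


full = outcomes_by_observation(lambda state: state)       # agent sees (top, facing)
coarse = outcomes_by_observation(lambda state: state[0])  # agent sees the top only

print("full state is deterministic:", all(len(v) == 1 for v in full.values()))
for (obs, action), results in sorted(coarse.items()):
    print(f"top={obs} {action} -> {sorted(results)}")
```

Under the full state every (state, action) pair has exactly one outcome. Under the top-only summary, tops 2 through 5 each map to both win and loss, which is what a stochastic transition would look like from inside that interface.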

Deployment protocol sharpens this for frozen principals: static commitment behaves as though off-path deviations were ambient shocks—nothing in its interface records “(B) left the principal line here.” Re-solving avoids that mistake by recomputing on the realized Gift node (§ 12, deployment gap). In the headline six-dice blunder suite, static sequential vs re-solving A pins ~8.5% vs ~5.7% B wins (~6.9% for simultaneous static is a separate timing pin), all on deterministic play (snapshot).

Deterministic latent dynamics + coarse observation is standard non-Markov / POMDP‑like vocabulary in ML—not a PSDG-exclusive theorem—but here the “latent” facings/eligibility patterns are typically historical: choices you stopped tracking remain load-bearing (endogenous structure).

Two one-liners:

  • PSDG adds no randomness after setup; it can still produce stochastic-looking trajectories when the abstraction is non-Markov or when a frozen plan meets an unexpected realized node.
  • The game is deterministic for the oracle, but stochastic-looking for a policy that mistakes the board snapshot for the state.

The concept is known; the site’s payoff is a small exact place where the cost of that gap is oracle-measurable as regret and protocol splits. See ML — deterministic play vs stochastic-looking experience.

