diff --git a/docs/superpowers/specs/2026-05-21-phase-a6-indoor-physics-fidelity-design.md b/docs/superpowers/specs/2026-05-21-phase-a6-indoor-physics-fidelity-design.md
new file mode 100644
index 0000000..e460963
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-21-phase-a6-indoor-physics-fidelity-design.md
@@ -0,0 +1,409 @@
+# Phase A6 — Indoor physics fidelity (cdb-driven) — design
+
+**Status:** Active — brainstormed + approved 2026-05-21. Implementation
+starts at A6.P1 (cdb probe spike).
+
+**Milestone:** M1.5 — "Indoor world feels right." Active per
+[`docs/plans/2026-05-12-milestones.md`](../../plans/2026-05-12-milestones.md).
+
+**Pre-empted by:** Phase O — DatPath Unification (shipped `2256006`,
+2026-05-21). A6 resumes from M1.5 baseline + Phase O state. Phase O only
+touched rendering / dat / scenery code; indoor physics code is unchanged.
+
+**Related:**
+- [`docs/plans/2026-04-11-roadmap.md`](../../plans/2026-04-11-roadmap.md) — A6 entry under "Phases ahead".
+- [`docs/research/2026-05-20-m15-kickoff-handoff.md`](../../research/2026-05-20-m15-kickoff-handoff.md) — M1.5 baseline + workaround inventory.
+- [`CLAUDE.md`](../../../CLAUDE.md) — "Retail debugger toolchain" section (cdb workflow).
+
+---
+
+## 1. Phase shape and inputs
+
+### 1.1 Goal
+
+Identify and fix the underlying BSP collision response divergence(s) that
+produce the family of indoor symptoms (wall walk-through, CellId
+ping-pong, vibration, multi-Z falling, stairs walk-through) at Holtburg
+inn, cottages, and the Holtburg Sewer dungeon. Remove the two known
+workarounds (#90 sphere-overlap stickiness in
+`PhysicsEngine.ResolveCellId`; `Transition.TryFindIndoorWalkablePlane`
+per-frame ContactPlane synthesis) as part of the same phase.
+
+### 1.2 Hypothesis (single-cause)
+
+Our `BSPQuery.AdjustSphereToPlane` — or one of its callers in the 6-path
+dispatcher (`BSPQuery.FindCollisions`) — over-corrects by more than retail
+when resolving wall collisions. The over-correction pushes the sphere
+center OUT of the cell on partial collisions, which causes (a) ping-pong
+[#90 workaround masks], (b) ContactPlane invalidation
+[`TryFindIndoorWalkablePlane` workaround masks], (c) vibration [#88 —
+multi-tick push-back oscillation], and (d) walk-through [large pushes
+overshoot the wall]. If true, one surgical fix in A6.P3 closes all four
+symptoms and A6.P4 removes both workarounds.
+
+### 1.3 Hypothesis (multi-cause backup)
+
+The symptoms have distinct causes — one in the cell-list / `find_cell_list`
+pipeline (the question retail's point-in-cell raises for our sphere-overlap
+workaround), one in the BSP correction paths, one in sub-step state
+mutation. A6.P1's wide-net capture surfaces all of them; A6.P3 ships
+one PR per identified bug. **A6 is structured to hold regardless of
+which hypothesis the data validates** — the probe spike methodology and
+acceptance criteria don't change based on whether one or N fixes land.
+
+### 1.4 Inputs
+
+| Input | Source |
+|---|---|
+| Retail oracle (Sept 2013 EoR build, full PDB symbols) | `docs/research/named-retail/acclient_2013_pseudo_c.txt` |
+| Matching retail binary (v11.4186) | `C:\Turbine\Asheron's Call\acclient.exe`, GUID-verified vs `refs/acclient.pdb` |
+| cdb toolchain | `C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\cdb.exe`. Workflow documented in [`CLAUDE.md`](../../../CLAUDE.md#retail-debugger-toolchain-live-runtime-trace) |
+| Existing acdream probes | `[indoor-bsp]`, `[cell-transit]`, `[cell-cache]`, `[cp-write]`, `[walk-miss]` — all in `PhysicsDiagnostics` |
+| New probe (to ship in A6.P1) | `[push-back]` — three sites in `BSPQuery` + `Transition` |
+| Retail decomp anchors (read during brainstorm) | `check_other_cells` (272717), `step_up` (273099), `transitional_insert` Collide branch (273193), `find_cell_list` Position-variant (308742), `sphere_intersects_cell` (317666), `adjust_sphere_to_plane` (322032), `find_collisions` 6-path dispatcher (323725), `find_walkable` (326211), `set_collide` (321594) |
+
+---
+
+## 2. Slice structure
+
+| Slice | Duration | Outputs |
+|---|---|---|
+| **A6.P1 — cdb probe spike + `[push-back]` build** | ~3 days | New `[push-back]` probe shipped; cdb script committed at `tools/cdb/a6-probe.cdb`; 9 paired captures (retail+acdream) at `docs/research/2026-05-21-a6-captures/scenN-/{retail,acdream}.log`; `docs/research/2026-05-21-a6-cdb-capture-findings.md` (the quantitative findings table). |
+| **A6.P2 — Analysis report** | ~1 day | Findings doc lists each bug candidate with: retail decomp anchor, our suspect code site, divergence quantified (e.g. "push-back at site X: retail mean=0.4 mm, ours mean=18 mm; 45× over"), proposed fix sketch. |
+| **A6.P3 — Surgical fixes** | ~3–5 days | One PR per identified bug. Each PR ships: the fix to the suspect code site, a unit test using golden values from A6.P1 captures as the regression anchor, visual verification at the scenario that surfaced the bug. |
+| **A6.P4 — Workaround removal + acceptance** | ~1 day | Pure-revert #90 commit (`4ca3596`). Delete `Transition.TryFindIndoorWalkablePlane` + its call site. Run all 9 scenarios as acceptance walks. Update CLAUDE.md baseline paragraph. Update milestones doc M1.5 partial-progress writeup. |
+
+Total estimated phase duration: **~8–11 days focused work**.
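
The "divergence quantified" figure in the A6.P2 row comes down to a small calculation over paired captures. The sketch below is illustrative only: the function names and the in-memory sample shape are hypothetical (the real `[push-back]` and cdb log formats are not defined in this spec), and it assumes positions are in metres so that deltas convert to millimetres by a factor of 1000.

```python
import math
from statistics import fmean


def push_back_deltas_mm(samples):
    """Per-call push-back magnitude in mm: |output_center - input_center|.

    `samples` is a hypothetical list of (input_center, output_center) pairs,
    each an (x, y, z) tuple in metres (an assumption, not a documented format).
    """
    return [math.dist(before, after) * 1000.0 for before, after in samples]


def divergence_ratio(retail_samples, acdream_samples):
    """Return (retail mean mm, acdream mean mm, acdream/retail ratio).

    Both sample lists must be non-empty; fmean raises StatisticsError otherwise.
    """
    retail_mean = fmean(push_back_deltas_mm(retail_samples))
    acdream_mean = fmean(push_back_deltas_mm(acdream_samples))
    # A zero retail mean would make the ratio undefined; report it as infinite.
    ratio = acdream_mean / retail_mean if retail_mean > 0 else math.inf
    return retail_mean, acdream_mean, ratio


# Made-up values chosen only to mirror the spec's "0.4 mm vs 18 mm" example;
# they are not captured data.
retail = [((0.0, 0.0, 0.0), (0.0004, 0.0, 0.0))]
acdream = [((0.0, 0.0, 0.0), (0.018, 0.0, 0.0))]
retail_mm, acdream_mm, ratio = divergence_ratio(retail, acdream)
print(f"retail mean={retail_mm:.1f} mm, ours mean={acdream_mm:.1f} mm; {ratio:.0f}x over")
```

A mean alone can hide bursts, so a real report would probably also carry a high percentile per site.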
+
+---
+
+## 3. A6.P1 — cdb probe spike methodology
+
+### 3.1 cdb breakpoint set (the wide-net script)
+
+Seven breakpoints. All actions are non-blocking (`gc` after dump). Auto-detach
+via `qd` at hit threshold to avoid manual cleanup. Per-breakpoint action
+output stays under 200 bytes to mitigate retail-side lag (CLAUDE.md
+gotcha — high BP hit rates trigger ACE timeout).
+
+| # | Symbol | Captures | Hit-rate estimate |
+|---|---|---|---|
+| 1 | `acclient!CTransition::transitional_insert` | Sub-step loop entry: `eax_2` (sub-step count), `sphere_path.check_pos` (target), `sphere_path.curr_pos` (current), `sphere_path.insert_type` | ~30 Hz × ~3 sub-steps/tick = ~100/sec |
+| 2 | `acclient!CTransition::step_up` | Path 5 step-up entry: `arg2` (step-up normal), `sphere_path.walkable_allowance` (cone angle) | Burst-only during stair ascent |
+| 3 | `acclient!SPHEREPATH::set_collide` | Wall-collision halt: `arg2` (collision normal), `this->backup_check_pos` | Burst during wall touches |
+| 4 | `acclient!BSPTREE::find_collisions` | 6-path dispatcher: `arg3` (walkable_allowance), `eax->sphere_path.collide`, `state` flag, `insert_type`. Dumps return value via `gu;r eax` to identify which path fired. | ~150/sec under motion |
+| 5 | `acclient!CPolygon::adjust_sphere_to_plane` | **The over-correction suspect.** Input: `arg3->center` (pre-push-back), `this->plane.N`, `this->plane.d`, `arg2->walk_interp`. Output (after `gu`): `arg3->center` (post-push-back), `arg2->walk_interp`. Yields per-call push-back delta. | Burst during collisions |
+| 6 | `acclient!CTransition::validate_walkable` | Ground-plane verdict: `arg2`, `sphere_path.walkable`, return value | ~30 Hz when grounded |
+| 7 | `acclient!CollisionInfo::set_contact_plane` | CP writes: `arg2` (plane), `arg3` (water flag), one-frame backtrace for caller | ~30 Hz |
+
+Auto-detach threshold: 50,000 total hits across all breakpoints OR
+scenario-end marker (user runs over a specific spot — e.g. a torch). cdb
+logs to scenario-tagged file via `.logopen ` at start.
+
+### 3.2 acdream-side mirror — the new `[push-back]` probe
+
+`AcDream.Core.Physics.PhysicsDiagnostics.ProbePushBackEnabled` (env:
+`ACDREAM_PROBE_PUSH_BACK=1`, DebugPanel mirror under "Diagnostics").
+
+Fires at three sites:
+
+| Site | Location | Per-call line fields |
+|---|---|---|
+| `BSPQuery.AdjustSphereToPlane` | `BSPQuery.cs:332` | `siteId=adjust_sphere`, input `center`, `plane.N`, `plane.D`, `radius`, `walkInterp` (pre), `dpPos`, `dpMove`, `iDist`; output `center` (post), `walkInterp` (post), `applied` (true/false), `cellId`, `polyId` (from `BSPQuery.LastBspHitPoly` side-channel). Direct comparison to retail BP #5. |
+| `BSPQuery.FindCollisions` entry | `BSPQuery.cs:1550` + `BSPQuery.cs:1895` | `siteId=dispatch`, `state` flags, `pathTaken` (1-7 mapped from return value), `walkInterp` (entry), `collide`, `insertType`, return state. Direct comparison to retail BP #4. |
+| `Transition.CheckOtherCells` → `ApplyOtherCellResult` | `TransitionTypes.cs:1614+` | `siteId=other_cell`, off-cell transition outcome, multi-cell BSP iteration result. Paired with A4's multi-cell BSP. |
+
+All three respect existing logging conventions (timestamp prefix, off
+when disabled, zero-cost when off — checked via `if (!ProbePushBackEnabled) return;` early-out at each site).
+
+### 3.3 Capture pairing protocol per scenario
+
+For each of the 9 scenarios:
+
+1. **Setup phase** (~30 sec):
+   - User opens retail client, navigates to the scenario start point
+     (e.g., outside Holtburg inn, facing the doorway). Stops.
+   - cdb attaches via `tools\cdb\a6-probe.cdb` with scenario-tagged log
+     (`scen1_inn_doorway.log`).
+   - Separately, user launches acdream with `ACDREAM_PROBE_PUSH_BACK=1` +
+     `ACDREAM_PROBE_INDOOR_BSP=1` + `ACDREAM_PROBE_CELL=1` +
+     `ACDREAM_PROBE_CELL_CACHE=1` + `ACDREAM_PROBE_CONTACT_PLANE=1`.
+   - Navigates `+Acdream` to the same scenario start. Stops.
+
+2. **Capture phase** (~30 sec per pass):
+   - User performs the SAME scripted walk in BOTH clients (per-scenario
+     scripts in the A6.P1 plan). E.g. "walk forward 2 meters, sidestep
+     right 1 meter, walk forward 2 meters."
+   - cdb fires breakpoints, logs to scenario file. acdream emits
+     `[push-back]` lines to `launch.log`.
+
+3. **Teardown phase**:
+   - cdb auto-detaches at hit threshold OR user triggers scenario-end
+     marker.
+   - Both logs filed to
+     `docs/research/2026-05-21-a6-captures/scenN-/{retail,acdream}.log`.
+
+**Total time estimate:** ~5 min/scenario × 9 = 45 min user time at retail.
+Plus ~30 min capture-setup overhead = ~75 min single session. Captures
+can be split across days; each scenario's pair is self-contained.
+
+### 3.4 The 9 scenarios
+
+| # | Scenario | Location | Walk script |
+|---|---|---|---|
+| 1 | Inn doorway entry | Holtburg town inn front door | Walk forward through door, stop just inside |
+| 2 | Inn stairs ascent | Holtburg inn interior, stairs to 2nd floor | Walk up 4 steps, stop on landing |
+| 3 | Inn 2nd-floor walking | Holtburg inn 2nd floor | Walk forward 3 m, sidestep 1 m, walk back |
+| 4 | Cottage cellar entry | Holtburg cottage with cellar | Walk to cellar opening, descend 2 steps |
+| 5 | Sewer entry portal | Holtburg sewer entrance (in-town building stab) | Walk into portal, then walk 2 m forward inside |
+| 6 | Sewer first stair descent | First stair after entry portal | Walk down full stair flight |
+| 7 | Inter-room portal transition | Between any two sewer rooms via portal | Walk through portal, stop 1 m past |
+| 8 | Open central chamber (multi-Z) | Sewer's multi-Z room | Walk in, traverse center, walk out other side |
+| 9 | Dark corridor | Sewer narrow corridor | Walk full length end-to-end |
+
+Order matters: 1→4 are buildings (smaller cells, simpler geometry), 5→9
+are dungeons (larger cells, more portals, multi-Z). Capturing buildings
+first lets us verify the probe is producing usable data before
+committing to the dungeon traversal.
+
+---
+
+## 4. A6.P2 — analysis pipeline
+
+Output: `docs/research/2026-05-21-a6-cdb-capture-findings.md`. Single
+document, four mandatory tables plus a per-scenario narrative.
+
+### 4.1 Table 1 — Per-site push-back delta (the smoking gun)
+
+| Site | Scenario | Retail mean delta (mm) | Retail p99 (mm) | acdream mean (mm) | acdream p99 (mm) | Ratio |
+|---|---|---|---|---|---|---|
+
+Rows = (site × scenario) cross-product. Delta computed as
+`‖output_center − input_center‖` per call. **If our ratio is > 3× retail
+anywhere, that's the bug candidate.** Surfaces over-correction in a
+single column.
+
+### 4.2 Table 2 — Path-frequency diff
+
+| Scenario | Path | Retail count | acdream count | Diff % |
+|---|---|---|---|---|
+
+Paths labeled 1–7 per the `find_collisions` dispatcher (PLACEMENT_INSERT,
+check_walkable, step_down, collide_with_pt, set_collide+slid,
+step_sphere_up, find_walkable). Surfaces divergent path selection (e.g.
+"we fire Path 5 step_up where retail fires Path 6 set_collide").
+
+### 4.3 Table 3 — ContactPlane lifecycle diff
+
+| Scenario | Retail CP writes/sec | acdream CP writes/sec | Retail CP-restore-from-LKCP/sec | acdream CP-restore/sec |
+|---|---|---|---|---|
+
+Surfaces the per-frame CP-resynthesis pattern that
+`TryFindIndoorWalkablePlane` is masking.
+
+### 4.4 Table 4 — Sub-step state mutations
+
+| Scenario | Field | Retail mutations/sec | acdream mutations/sec |
+|---|---|---|---|
+
+Fields = `cell_array_valid`, `hits_interior_cell`, `walk_interp`,
+`walkable`, `collide`. Surfaces stale-state across sub-steps (vibration
+/ #88 family).
+
+### 4.5 Per-scenario narrative
+
+For each scenario, one paragraph describing what the trace shows
+frame-by-frame. Include a side-by-side trace excerpt at the sub-step
+where the divergence is sharpest.
+
+### 4.6 Findings section
+
+Numbered bug candidates. Each entry contains:
+
+- **Title**
+- **Retail decomp anchor** (line in `acclient_2013_pseudo_c.txt`)
+- **Our suspect code site** (file + line)
+- **Divergence quantified** (e.g. "push-back at site X: retail
+  mean=0.4 mm, ours mean=18 mm; 45× over")
+- **Proposed fix sketch** (1-3 paragraphs)
+- **Scenarios affected** (which of the 9 reproduce this bug)
+
+### 4.7 Acceptance for A6.P2
+
+Every M1.5-in-scope symptom (#83, #88, #90, stairs walk-through,
+2nd-floor walking, cellar descent, `TryFindIndoorWalkablePlane` MISS)
+maps to **at least one** bug candidate, OR is explicitly flagged as
+"not surfaced by this capture — defer to A8 / promote scope".
+
+---
+
+## 5. A6.P3 — fix surface
+
+### 5.1 Sequencing
+
+**Multi-PR per bug.** Each PR ships independently with its own commit
+message + visual verification gate. Reasoning: per-bug attribution is
+clearer; bisecting future regressions is easier; one PR rollback doesn't
+take all fixes down.
+
+Order: **highest-confidence single-cause fix first** (probably
+`AdjustSphereToPlane` if Table 1 confirms over-correction). Re-run the
+9-scenario probe spike AFTER each PR to verify (a) the targeted bug is
+closed, (b) no other symptom is worse. If the re-run shows multiple
+symptoms closed by the same fix, that's evidence for the single-cause
+hypothesis — file the other planned PRs as N/A and proceed to A6.P4.
+
+### 5.2 Per-PR shape
+
+Each PR ships:
+
+- The fix to the suspect code site (surgical change, narrow scope).
+- A unit test using **golden values captured during A6.P1** as the
+  regression anchor. Example: "`AdjustSphereToPlane` at plane
+  N=(0,0,1) d=−100, sphere center=(50,50,99.5), movement=(0,0,−0.5),
+  radius=0.6 → output center.Z = 99.4 ± 0.1 mm matching retail capture
+  line 47".
+- Visual verification at the scenario that surfaced the bug.
+
+### 5.3 Likely-touched files
+
+Based on hypothesis + decomp reading. The actual fix surface is whatever
+A6.P2 surfaces.
+
+| File | Likely touch |
+|---|---|
+| `src/AcDream.Core/Physics/BSPQuery.cs` | `AdjustSphereToPlane`, 6-path dispatcher entry, Path 5 (step_up) / Path 6 (set_collide) branches |
+| `src/AcDream.Core/Physics/TransitionTypes.cs` | `Transition.FindEnvCollisions` indoor branch, `ApplyOtherCellResult` state reset |
+| `src/AcDream.Core/Physics/CellTransit.cs` | `FindCellList` if path-selection diverges from retail's point-in-cell |
+| `src/AcDream.Core/Physics/PhysicsEngine.cs` | `ResolveCellId` only via A6.P4 removal of the #90 workaround — no functional change in A6.P3 |
+
+### 5.4 Commit message convention
+
+`fix(physics): A6.P3 — `. Body
+references the A6.P2 findings doc by anchor (`§ findings.bug-1` etc.).
+
+---
+
+## 6. A6.P4 — workaround removal + acceptance
+
+### 6.1 Workaround 1 — Issue #90 sphere-overlap stickiness
+
+**Location:** `src/AcDream.Core/Physics/PhysicsEngine.cs:285-300`.
+
+**Removal:** pure revert of `4ca3596`. The `BSPQuery.SphereIntersectsCellBsp`
+helper itself **stays** — it's also used by #89's `CheckBuildingTransit`
+which IS retail-faithful per the `sphere_intersects_cell` decomp at
+`acclient_2013_pseudo_c.txt:317666`.
+
+**Verification:** with A6.P3's fix(es) landed, walk Holtburg inn doorway
+— observe NO `[cell-transit]` ping-pong, no walk-through.
+
+**Tripwire:** add a regression test at
+`tests/AcDream.Core.Tests/Physics/PhysicsEngineResolveCellIdTests.cs`
+that constructs a sphere at the inn-doorway geometry post-push-back, calls
+`ResolveCellId`, and asserts the indoor CellId is preserved. Golden values
+captured from A6.P1 scenario #1.
+
+### 6.2 Workaround 2 — `TryFindIndoorWalkablePlane` synthesis
+
+**Location:** `src/AcDream.Core/Physics/TransitionTypes.cs:1294-1373`
+(method body) + caller at `:1519`.
+
+**Removal:** delete method body + caller block. The three CP-retention
+mechanisms — A (Path 6 land write at `acclient_2013_pseudo_c.txt:323924`),
+B (`validate_transition` LKCP proximity restore at `:272565`), C (post-OK
+step-down probe at `:273242`) — must catch the player without the
+synthesis.
+
+**Verification:** walk Holtburg cottage doorway threshold (the case that
+broke Bug A's revert on 2026-05-20). Walk Holtburg sewer first stair
+descent. With A6.P3's fix(es) landed, observe NO free-fall through
+doorway threshold, NO falling-stuck on stair descent.
+
+**Tripwire:** existing `walk-miss` probe stays enabled. MISS rate must
+drop to <5% (current: 99.87% per
+[`docs/research/2026-05-21-walk-miss-capture-findings.md`](../../research/2026-05-21-walk-miss-capture-findings.md)).
+
+### 6.3 M1.5-physics acceptance criteria
+
+All five must hold for A6 to be marked complete:
+
+1. All 9 scenarios walk cleanly with NO probe warnings.
+2. `walk-miss` MISS rate < 5% across a 60-sec wander.
+3. `[cell-transit]` log shows zero ping-pong events in the 60-sec
+   wander.
+4. **Holtburg Sewer end-to-end walk** (entry → 5–7 rooms → exit) without
+   getting stuck, without walking through any wall, without falling
+   through any stair. This is the M1.5 physics acceptance criterion.
+5. M0 + M1 outdoor regression: walk Holtburg outdoor for 60 sec,
+   observe no regressions in outdoor cell handling, no FPS drop,
+   baseline test suite still 1147 + 8 (or whatever post-Phase O
+   baseline is — to be re-measured at A6.P1 start).
+
+### 6.4 Three-failed-verifications policy
+
+Per [`CLAUDE.md`](../../../CLAUDE.md) Visual verification rule: if three
+consecutive visual verifications fail at the same scenario after
+attempted fixes, **stop and write a handoff doc**. Do not attempt a
+fourth fix on the same symptom in the same session. Hand off with full
+reproduction notes + probe captures + the failed-fix code diffs.
+
+---
+
+## 7. Out of scope
+
+- **A7 (Indoor lighting fidelity)** — separate phase, separate
+  methodology (RenderDoc + retail-decomp), follows A6. Per
+  [`CLAUDE.md`](../../../CLAUDE.md), do NOT mix lighting work into A6.
+  If A6.P1 reveals a lighting cause for an apparent physics symptom,
+  file as an A7 issue and continue A6.
+- **A2 — `polygon_hits_sphere_slow_but_sure` tangent-epsilon rejection.**
+  Separate issue; A6.P4's `walk-miss` MISS rate target tolerates the
+  residual A2-class events as long as they're < 5% of all calls.
+- **Outdoor physics regression** — A6 is indoor-focused. Outdoor walks
+  appear in acceptance criteria only as a regression backstop, not as
+  a fix surface. Any outdoor-physics findings in A6.P1 capture get
+  filed as separate issues for post-A6 work.
+- **Combat / animation / movement networking** — completely orthogonal.
+  M2's prerequisites.
+
+---
+
+## 8. References
+
+### 8.1 Retail decomp anchors
+
+All in `docs/research/named-retail/acclient_2013_pseudo_c.txt`:
+
+- `:272717-272798` — `CTransition::check_other_cells` (A4 oracle, already
+  ported)
+- `:273099-273133` — `CTransition::step_up`
+- `:273193-273239` — `CTransition::transitional_insert` Collide branch
+- `:308742-308819` — `CObjCell::find_cell_list` Position-variant (the
+  hysteresis question for #90's root cause)
+- `:317657-317671` — `CCellStruct::point_in_cell` + `sphere_intersects_cell`
+- `:321594-321607` — `SPHEREPATH::set_collide`
+- `:322032-322077` — `CPolygon::adjust_sphere_to_plane` (suspected
+  over-correction site)
+- `:322974-322993` — `CPolygon::pos_hits_sphere` (front-face culling)
+- `:323725-323939` — `BSPTREE::find_collisions` (full 6-path dispatcher)
+- `:326211-326242` — `BSPNODE::find_walkable`
+
+### 8.2 Our suspect code sites
+
+- `src/AcDream.Core/Physics/BSPQuery.cs:332` — `AdjustSphereToPlane`
+- `src/AcDream.Core/Physics/BSPQuery.cs:1550`, `:1895` — `FindCollisions`
+  (two overloads)
+- `src/AcDream.Core/Physics/PhysicsEngine.cs:285-300` — `ResolveCellId`
+  #90 workaround block
+- `src/AcDream.Core/Physics/TransitionTypes.cs:1294-1373` —
+  `TryFindIndoorWalkablePlane` synthesis (workaround)
+- `src/AcDream.Core/Physics/PhysicsDiagnostics.cs` — probe infrastructure
+  (where `[push-back]` lives)
+
+### 8.3 Prior handoffs
+
+- [`docs/research/2026-05-20-m15-kickoff-handoff.md`](../../research/2026-05-20-m15-kickoff-handoff.md) — M1.5 baseline + workaround inventory.
+- [`docs/research/2026-05-20-phase-a4-shipped-cell-pingpong-finding.md`](../../research/2026-05-20-phase-a4-shipped-cell-pingpong-finding.md) — A4 ship + #90 ping-pong investigation.
+- [`docs/research/2026-05-21-walk-miss-capture-findings.md`](../../research/2026-05-21-walk-miss-capture-findings.md) — `TryFindIndoorWalkablePlane` MISS rate evidence (99.87%).
+- [`docs/research/2026-05-20-indoor-walking-bug-a-handoff.md`](../../research/2026-05-20-indoor-walking-bug-a-handoff.md) — Bug A's tried-and-reverted synthesis removal story.