docs(spec): Phase A6 — Indoor physics fidelity (cdb-driven) — design

Brainstormed + approved 2026-05-21 for M1.5 milestone work. Designs
the cdb probe spike methodology (7 retail breakpoints + new
[push-back] probe) to capture retail's per-tick BSP collision
response state at 9 indoor scenarios (4 buildings + 5 dungeon sites)
and compare against acdream. Working hypothesis: BSPQuery.AdjustSphereToPlane
or its callers over-correct vs retail, producing the family of
indoor symptoms (walking through walls, ping-pong, vibration, multi-Z
falling) plus driving the existing #90 + TryFindIndoorWalkablePlane
workarounds. A6 ships in 4 slices: P1 probe spike, P2 analysis,
P3 surgical fixes, P4 workaround removal + acceptance.

Phase O (DatPath Unification) pre-empted M1.5 and shipped 2026-05-21;
A6 resumes from Phase O state. Phase O only touched rendering/dat
code; indoor physics design is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Erik 2026-05-21 17:57:48 +02:00
parent 2256006cb7
commit f9214433c3


@ -0,0 +1,409 @@
# Phase A6 — Indoor physics fidelity (cdb-driven) — design
**Status:** Active — brainstormed + approved 2026-05-21. Implementation
starts at A6.P1 (cdb probe spike).
**Milestone:** M1.5 — "Indoor world feels right." Active per
[`docs/plans/2026-05-12-milestones.md`](../../plans/2026-05-12-milestones.md).
**Pre-empted by:** Phase O — DatPath Unification (shipped `2256006`,
2026-05-21). A6 resumes from M1.5 baseline + Phase O state. Phase O only
touched rendering / dat / scenery code; indoor physics code is unchanged.
**Related:**
- [`docs/plans/2026-04-11-roadmap.md`](../../plans/2026-04-11-roadmap.md) — A6 entry under "Phases ahead".
- [`docs/research/2026-05-20-m15-kickoff-handoff.md`](../../research/2026-05-20-m15-kickoff-handoff.md) — M1.5 baseline + workaround inventory.
- [`CLAUDE.md`](../../../CLAUDE.md) — "Retail debugger toolchain" section (cdb workflow).
---
## 1. Phase shape and inputs
### 1.1 Goal
Identify and fix the underlying BSP collision response divergence(s) that
produce the family of indoor symptoms (walking through walls, CellId
ping-pong, vibration, multi-Z falling, stairs walk-through) at Holtburg
inn, cottages, and the Holtburg Sewer dungeon. Remove the two known
workarounds (#90 sphere-overlap stickiness in
`PhysicsEngine.ResolveCellId`; `Transition.TryFindIndoorWalkablePlane`
per-frame ContactPlane synthesis) as part of the same phase.
### 1.2 Hypothesis (single-cause)
Our `BSPQuery.AdjustSphereToPlane` — or one of its callers in the 6-path
dispatcher (`BSPQuery.FindCollisions`) — over-corrects by more than retail
when resolving wall collisions. The over-correction pushes the sphere
center OUT of the cell on partial collisions, which causes (a) ping-pong
[#90 workaround masks], (b) ContactPlane invalidation
[`TryFindIndoorWalkablePlane` workaround masks], (c) vibration [#88
multi-tick push-back oscillation], and (d) walk-through [large pushes
overshoot the wall]. If true, one surgical fix in A6.P3 closes all four
symptoms and A6.P4 removes both workarounds.
### 1.3 Hypothesis (multi-cause backup)
The symptoms have distinct causes: one in the cell-list / `find_cell_list`
pipeline (where retail's point-in-cell test calls our sphere-overlap
workaround into question), one in the BSP correction paths, one in sub-step state
mutation. A6.P1's wide-net capture surfaces all of them; A6.P3 ships
one PR per identified bug. **A6 is structured to work regardless of
which hypothesis the data validates**: the probe spike methodology and
acceptance criteria don't change based on whether one or N fixes land.
### 1.4 Inputs
| Input | Source |
|---|---|
| Retail oracle (Sept 2013 EoR build, full PDB symbols) | `docs/research/named-retail/acclient_2013_pseudo_c.txt` |
| Matching retail binary (v11.4186) | `C:\Turbine\Asheron's Call\acclient.exe`, GUID-verified vs `refs/acclient.pdb` |
| cdb toolchain | `C:\Program Files (x86)\Windows Kits\10\Debuggers\x86\cdb.exe`. Workflow documented in [`CLAUDE.md`](../../../CLAUDE.md#retail-debugger-toolchain-live-runtime-trace) |
| Existing acdream probes | `[indoor-bsp]`, `[cell-transit]`, `[cell-cache]`, `[cp-write]`, `[walk-miss]` — all in `PhysicsDiagnostics` |
| New probe (to ship in A6.P1) | `[push-back]` — three sites in `BSPQuery` + `Transition` |
| Retail decomp anchors (read during brainstorm) | `check_other_cells` (272717), `step_up` (273099), `transitional_insert` Collide branch (273193), `find_cell_list` Position-variant (308742), `sphere_intersects_cell` (317666), `adjust_sphere_to_plane` (322032), `find_collisions` 6-path dispatcher (323725), `find_walkable` (326211), `set_collide` (321594) |
---
## 2. Slice structure
| Slice | Duration | Outputs |
|---|---|---|
| **A6.P1 — cdb probe spike + `[push-back]` build** | ~3 days | New `[push-back]` probe shipped; cdb script committed at `tools/cdb/a6-probe.cdb`; 9 paired captures (retail+acdream) at `docs/research/2026-05-21-a6-captures/scenN-<scenario>/{retail,acdream}.log`; `docs/research/2026-05-21-a6-cdb-capture-findings.md` (the quantitative findings table). |
| **A6.P2 — Analysis report** | ~1 day | Findings doc lists each bug candidate with: retail decomp anchor, our suspect code site, divergence quantified (e.g. "push-back at site X: retail mean=0.4 mm, ours mean=18 mm; 45× over"), proposed fix sketch. |
| **A6.P3 — Surgical fixes** | ~3–5 days | One PR per identified bug. Each PR ships: the fix to the suspect code site, a unit test using golden values from A6.P1 captures as the regression anchor, visual verification at the scenario that surfaced the bug. |
| **A6.P4 — Workaround removal + acceptance** | ~1 day | Pure-revert #90 commit (`4ca3596`). Delete `Transition.TryFindIndoorWalkablePlane` + its call site. Run all 9 scenarios as acceptance walks. Update CLAUDE.md baseline paragraph. Update milestones doc M1.5 partial-progress writeup. |
Total estimated phase duration: **~8–11 days of focused work**.
---
## 3. A6.P1 — cdb probe spike methodology
### 3.1 cdb breakpoint set (the wide-net script)
Seven breakpoints. All actions are non-blocking (`gc` after dump). Auto-detach
via `qd` at hit threshold to avoid manual cleanup. Per-breakpoint action
output stays under 200 bytes to mitigate retail-side lag (CLAUDE.md
gotcha — high BP hit rates trigger ACE timeout).
| # | Symbol | Captures | Hit-rate estimate |
|---|---|---|---|
| 1 | `acclient!CTransition::transitional_insert` | Sub-step loop entry: `eax_2` (sub-step count), `sphere_path.check_pos` (target), `sphere_path.curr_pos` (current), `sphere_path.insert_type` | ~30 Hz × ~3 sub-steps/tick = ~100/sec |
| 2 | `acclient!CTransition::step_up` | Path 5 step-up entry: `arg2` (step-up normal), `sphere_path.walkable_allowance` (cone angle) | Burst-only during stair ascent |
| 3 | `acclient!SPHEREPATH::set_collide` | Wall-collision halt: `arg2` (collision normal), `this->backup_check_pos` | Burst during wall touches |
| 4 | `acclient!BSPTREE::find_collisions` | 6-path dispatcher: `arg3` (walkable_allowance), `eax->sphere_path.collide`, `state` flag, `insert_type`. Dumps return value via `gu;r eax` to identify which path fired. | ~150/sec under motion |
| 5 | `acclient!CPolygon::adjust_sphere_to_plane` | **The over-correction suspect.** Input: `arg3->center` (pre-push-back), `this->plane.N`, `this->plane.d`, `arg2->walk_interp`. Output (after `gu`): `arg3->center` (post-push-back), `arg2->walk_interp`. Yields per-call push-back delta. | Burst during collisions |
| 6 | `acclient!CTransition::validate_walkable` | Ground-plane verdict: `arg2`, `sphere_path.walkable`, return value | ~30 Hz when grounded |
| 7 | `acclient!CollisionInfo::set_contact_plane` | CP writes: `arg2` (plane), `arg3` (water flag), one-frame backtrace for caller | ~30 Hz |
Auto-detach threshold: 50,000 total hits across all breakpoints OR
scenario-end marker (user runs over a specific spot — e.g. a torch). cdb
logs to scenario-tagged file via `.logopen <path>` at start.
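
As a rough sanity check on the 50,000-hit threshold, the steady-state hit-rate estimates from the table above can be summed to see how long a capture can run before auto-detach. This is only a sketch: it assumes the table's estimates hold, and it ignores the burst-only breakpoints (#2, #3, #5), so the real budget is somewhat shorter. The names `STEADY_RATES_HZ` and `seconds_until_detach` are illustrative, not part of the toolchain.

```python
# Steady-state hit-rate estimates (hits/sec) copied from the breakpoint table.
# Burst-only breakpoints (#2, #3, #5) are left out, so the result is an upper
# bound on capture time, not a prediction.
STEADY_RATES_HZ = {
    "transitional_insert": 100,
    "find_collisions": 150,
    "validate_walkable": 30,
    "set_contact_plane": 30,
}
HIT_THRESHOLD = 50_000


def seconds_until_detach(rates_hz, threshold):
    """Seconds of continuous motion before the total-hit threshold trips."""
    return threshold / sum(rates_hz.values())


budget = seconds_until_detach(STEADY_RATES_HZ, HIT_THRESHOLD)
print(f"{sum(STEADY_RATES_HZ.values())} hits/sec, ~{budget:.0f} s before auto-detach")
```

Under these assumptions the threshold allows roughly 160 seconds of continuous motion, several times the ~30 sec capture pass planned per scenario.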
### 3.2 acdream-side mirror — the new `[push-back]` probe
`AcDream.Core.Physics.PhysicsDiagnostics.ProbePushBackEnabled` (env:
`ACDREAM_PROBE_PUSH_BACK=1`, DebugPanel mirror under "Diagnostics").
Fires at three sites:
| Site | Location | Per-call line fields |
|---|---|---|
| `BSPQuery.AdjustSphereToPlane` | `BSPQuery.cs:332` | `siteId=adjust_sphere`, input `center`, `plane.N`, `plane.D`, `radius`, `walkInterp` (pre), `dpPos`, `dpMove`, `iDist`; output `center` (post), `walkInterp` (post), `applied` (true/false), `cellId`, `polyId` (from `BSPQuery.LastBspHitPoly` side-channel). Direct comparison to retail BP #5. |
| `BSPQuery.FindCollisions` entry | `BSPQuery.cs:1550` + `BSPQuery.cs:1895` | `siteId=dispatch`, `state` flags, `pathTaken` (1-7 mapped from return value), `walkInterp` (entry), `collide`, `insertType`, return state. Direct comparison to retail BP #4. |
| `Transition.CheckOtherCells` / `ApplyOtherCellResult` | `TransitionTypes.cs:1614+` | `siteId=other_cell`, off-cell transition outcome, multi-cell BSP iteration result. Paired with A4's multi-cell BSP. |
All three respect existing logging conventions (timestamp prefix, off
when disabled, zero-cost when off — checked via `if (!ProbePushBackEnabled) return;` early-out at each site).
### 3.3 Capture pairing protocol per scenario
For each of the 9 scenarios:
1. **Setup phase** (~30 sec):
- User opens retail client, navigates to the scenario start point
(e.g., outside Holtburg inn, facing the doorway). Stops.
- cdb attaches via `tools\cdb\a6-probe.cdb` with scenario-tagged log
(`scen1_inn_doorway.log`).
- Separately, user launches acdream with `ACDREAM_PROBE_PUSH_BACK=1`
+ `ACDREAM_PROBE_INDOOR_BSP=1` + `ACDREAM_PROBE_CELL=1`
+ `ACDREAM_PROBE_CELL_CACHE=1` + `ACDREAM_PROBE_CONTACT_PLANE=1`.
- Navigates `+Acdream` to the same scenario start. Stops.
2. **Capture phase** (~30 sec per pass):
- User performs the SAME scripted walk in BOTH clients (per-scenario
scripts in the A6.P1 plan). E.g. "walk forward 2 meters, sidestep
right 1 meter, walk forward 2 meters."
- cdb fires breakpoints, logs to scenario file. acdream emits
`[push-back]` lines to `launch.log`.
3. **Teardown phase**:
- cdb auto-detaches at hit threshold OR user triggers scenario-end
marker.
- Both logs filed to
`docs/research/2026-05-21-a6-captures/scenN-<scenario>/{retail,acdream}.log`.
**Total time estimate:** ~5 min/scenario × 9 = 45 min user time at retail.
Plus ~30 min capture-setup overhead = ~75 min single session. Captures
can be split across days; each scenario's pair is self-contained.
### 3.4 The 9 scenarios
| # | Scenario | Location | Walk script |
|---|---|---|---|
| 1 | Inn doorway entry | Holtburg town inn front door | Walk forward through door, stop just inside |
| 2 | Inn stairs ascent | Holtburg inn interior, stairs to 2nd floor | Walk up 4 steps, stop on landing |
| 3 | Inn 2nd-floor walking | Holtburg inn 2nd floor | Walk forward 3 m, sidestep 1 m, walk back |
| 4 | Cottage cellar entry | Holtburg cottage with cellar | Walk to cellar opening, descend 2 steps |
| 5 | Sewer entry portal | Holtburg sewer entrance (in-town building stab) | Walk into portal, then walk 2 m forward inside |
| 6 | Sewer first stair descent | First stair after entry portal | Walk down full stair flight |
| 7 | Inter-room portal transition | Between any two sewer rooms via portal | Walk through portal, stop 1 m past |
| 8 | Open central chamber (multi-Z) | Sewer's multi-Z room | Walk in, traverse center, walk out other side |
| 9 | Dark corridor | Sewer narrow corridor | Walk full length end-to-end |
Order matters: 1→4 are buildings (smaller cells, simpler geometry), 5→9
are dungeons (larger cells, more portals, multi-Z). Capturing buildings
first lets us verify the probe is producing usable data before
committing to the dungeon traversal.
---
## 4. A6.P2 — analysis pipeline
Output: `docs/research/2026-05-21-a6-cdb-capture-findings.md`. Single
document, four mandatory tables plus a per-scenario narrative.
### 4.1 Table 1 — Per-site push-back delta (the smoking gun)
| Site | Scenario | Retail mean delta (mm) | Retail p99 (mm) | acdream mean (mm) | acdream p99 (mm) | Ratio |
|---|---|---|---|---|---|---|
Rows = (site × scenario) cross-product. Delta computed as
`‖output_center - input_center‖` per call. **If our ratio is > 3× retail
anywhere, that's the bug candidate.** Surfaces over-correction in a
single column.
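
A minimal sketch of how one Table 1 row could be computed once the paired logs have been parsed into lists of `(input_center, output_center)` tuples. Everything here is an assumption for illustration: the function names, the record shape, positions being logged in meters, the nearest-rank p99 definition, and the ratio being taken over the means.

```python
import math
from statistics import mean


def push_back_mm(center_in, center_out):
    """Distance between pre- and post-push-back centers, in millimeters.

    Assumes the capture records positions in meters.
    """
    return math.dist(center_in, center_out) * 1000.0


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]


def table1_row(retail_calls, acdream_calls, flag_ratio=3.0):
    """Build one (site x scenario) row from non-empty lists of
    (center_in, center_out) pairs, one pair per call."""
    retail = [push_back_mm(a, b) for a, b in retail_calls]
    ours = [push_back_mm(a, b) for a, b in acdream_calls]
    retail_mean, our_mean = mean(retail), mean(ours)
    if retail_mean > 0:
        ratio = our_mean / retail_mean
    else:
        # Retail never pushed back: any push-back on our side is unbounded.
        ratio = math.inf if our_mean > 0 else 1.0
    return {
        "retail_mean_mm": retail_mean,
        "retail_p99_mm": p99(retail),
        "acdream_mean_mm": our_mean,
        "acdream_p99_mm": p99(ours),
        "ratio": ratio,
        "bug_candidate": ratio > flag_ratio,
    }
```

Feeding it a retail call that moves the center by 0.4 mm and an acdream call that moves it by 18 mm gives a ratio of about 45, which is above the 3× flag.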
### 4.2 Table 2 — Path-frequency diff
| Scenario | Path | Retail count | acdream count | Diff % |
|---|---|---|---|---|
Paths labeled 1–7 per the `find_collisions` dispatcher (PLACEMENT_INSERT,
check_walkable, step_down, collide_with_pt, set_collide+slid,
step_sphere_up, find_walkable). Surfaces divergent path selection (e.g.
"we fire Path 5 step_up where retail fires Path 6 set_collide").
### 4.3 Table 3 — ContactPlane lifecycle diff
| Scenario | Retail CP writes/sec | acdream CP writes/sec | Retail CP-restore-from-LKCP/sec | acdream CP-restore/sec |
|---|---|---|---|---|
Surfaces the per-frame CP-resynthesis pattern that
`TryFindIndoorWalkablePlane` is masking.
### 4.4 Table 4 — Sub-step state mutations
| Scenario | Field | Retail mutations/sec | acdream mutations/sec |
|---|---|---|---|
Fields = `cell_array_valid`, `hits_interior_cell`, `walk_interp`,
`walkable`, `collide`. Surfaces stale-state across sub-steps (vibration
/ #88 family).
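
One way the mutation counts could be derived, sketched under the assumption that each sub-step has already been parsed into a dict keyed by the five field names, in capture order. The helper name and the record shape are hypothetical; a "mutation" is taken here to mean any change in value between consecutive sub-steps.

```python
FIELDS = ("cell_array_valid", "hits_interior_cell", "walk_interp", "walkable", "collide")


def mutations_per_sec(snapshots, duration_sec, fields=FIELDS):
    """Count value changes between consecutive snapshots, per field, per second.

    snapshots: list of dicts in capture order, one per sub-step.
    Floats such as walk_interp are compared exactly, so logging precision
    must match between the two clients for the counts to be comparable.
    """
    counts = dict.fromkeys(fields, 0)
    for prev, curr in zip(snapshots, snapshots[1:]):
        for field in fields:
            if prev[field] != curr[field]:
                counts[field] += 1
    return {field: n / duration_sec for field, n in counts.items()}
```

Running the same function over the retail and acdream snapshots for a scenario yields the two rate columns of the table.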
### 4.5 Per-scenario narrative
For each scenario, one paragraph describing what the trace shows
frame-by-frame. Include a side-by-side trace excerpt at the sub-step
where the divergence is sharpest.
### 4.6 Findings section
Numbered bug candidates. Each entry contains:
- **Title**
- **Retail decomp anchor** (line in `acclient_2013_pseudo_c.txt`)
- **Our suspect code site** (file + line)
- **Divergence quantified** (e.g. "push-back at site X: retail
mean=0.4 mm, ours mean=18 mm; 45× over")
- **Proposed fix sketch** (1-3 paragraphs)
- **Scenarios affected** (which of the 9 reproduce this bug)
### 4.7 Acceptance for A6.P2
Every M1.5-in-scope symptom (#83, #88, #90, stairs walk-through,
2nd-floor walking, cellar descent, `TryFindIndoorWalkablePlane` MISS)
maps to **at least one** bug candidate, OR is explicitly flagged as
"not surfaced by this capture — defer to A8 / promote scope".
---
## 5. A6.P3 — fix surface
### 5.1 Sequencing
**One PR per bug.** Each PR ships independently with its own commit
message + visual verification gate. Reasoning: per-bug attribution is
clearer; bisecting future regressions is easier; one PR rollback doesn't
take all fixes down.
Order: **highest-confidence single-cause fix first** (probably
`AdjustSphereToPlane` if Table 1 confirms over-correction). Re-run the
9-scenario probe spike AFTER each PR to verify (a) the targeted bug is
closed, (b) no other symptom is worse. If the re-run shows multiple
symptoms closed by the same fix, that's evidence for the single-cause
hypothesis — file the other planned PRs as N/A and proceed to A6.P4.
### 5.2 Per-PR shape
Each PR ships:
- The fix to the suspect code site (surgical change, narrow scope).
- A unit test using **golden values captured during A6.P1** as the
regression anchor. Example: "`AdjustSphereToPlane` at plane
N=(0,0,1) d=100, sphere center=(50,50,99.5), movement=(0,0,0.5),
radius=0.6 → output center.Z = 99.4 ± 0.1 mm matching retail capture
line 47".
- Visual verification at the scenario that surfaced the bug.
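
The golden-value comparison mixes units (positions in world units, tolerance in millimeters), so it may help to pin the conversion down in one place. The sketch below is illustrative only: `matches_golden` is a hypothetical helper, it assumes positions are in meters, and the real regression tests would live in the C# test project rather than in Python.

```python
def matches_golden(actual, golden, tol_mm=0.1):
    """True when every component of `actual` is within tol_mm of `golden`.

    Both vectors are assumed to be in meters; the tolerance is in millimeters.
    """
    if len(actual) != len(golden):
        raise ValueError("vectors must have the same length")
    tol_m = tol_mm / 1000.0
    return all(abs(a - g) <= tol_m for a, g in zip(actual, golden))
```

A test would pass the fixed routine's output center as `actual` and the center recorded in the matching retail capture line as `golden`.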
### 5.3 Likely-touched files
Based on hypothesis + decomp reading. The actual fix surface is whatever
A6.P2 surfaces.
| File | Likely touch |
|---|---|
| `src/AcDream.Core/Physics/BSPQuery.cs` | `AdjustSphereToPlane`, 6-path dispatcher entry, Path 5 (step_up) / Path 6 (set_collide) branches |
| `src/AcDream.Core/Physics/TransitionTypes.cs` | `Transition.FindEnvCollisions` indoor branch, `ApplyOtherCellResult` state reset |
| `src/AcDream.Core/Physics/CellTransit.cs` | `FindCellList` if path-selection diverges from retail's point-in-cell |
| `src/AcDream.Core/Physics/PhysicsEngine.cs` | `ResolveCellId` only via A6.P4 removal of the #90 workaround — no functional change in A6.P3 |
### 5.4 Commit message convention
`fix(physics): A6.P3 — <bug candidate N> — <one-line summary>`. Body
references the A6.P2 findings doc by anchor (`§ findings.bug-1` etc.).
---
## 6. A6.P4 — workaround removal + acceptance
### 6.1 Workaround 1 — Issue #90 sphere-overlap stickiness
**Location:** `src/AcDream.Core/Physics/PhysicsEngine.cs:285-300`.
**Removal:** pure revert of `4ca3596`. The `BSPQuery.SphereIntersectsCellBsp`
helper itself **stays** — it's also used by #89's `CheckBuildingTransit`
which IS retail-faithful per the `sphere_intersects_cell` decomp at
`acclient_2013_pseudo_c.txt:317666`.
**Verification:** with A6.P3's fix(es) landed, walk Holtburg inn doorway
— observe NO `[cell-transit]` ping-pong, no walk-through.
**Tripwire:** add a regression test at
`tests/AcDream.Core.Tests/Physics/PhysicsEngineResolveCellIdTests.cs`
that constructs a sphere at the inn-doorway geometry post-push-back, calls
`ResolveCellId`, and asserts the indoor CellId is preserved. Golden values
are captured from A6.P1 scenario #1.
### 6.2 Workaround 2 — `TryFindIndoorWalkablePlane` synthesis
**Location:** `src/AcDream.Core/Physics/TransitionTypes.cs:1294-1373`
(method body) + caller at `:1519`.
**Removal:** delete method body + caller block. The three CP-retention
mechanisms — A (Path 6 land write at `acclient_2013_pseudo_c.txt:323924`),
B (`validate_transition` LKCP proximity restore at `:272565`), C (post-OK
step-down probe at `:273242`) — must catch the player without the
synthesis.
**Verification:** walk Holtburg cottage doorway threshold (the case that
broke Bug A's revert on 2026-05-20). Walk Holtburg sewer first stair
descent. With A6.P3's fix(es) landed, observe NO free-fall through
doorway threshold, NO falling-stuck on stair descent.
**Tripwire:** existing `walk-miss` probe stays enabled. MISS rate must
drop to <5% (current: 99.87% per
[`docs/research/2026-05-21-walk-miss-capture-findings.md`](../../research/2026-05-21-walk-miss-capture-findings.md)).
### 6.3 M1.5-physics acceptance criteria
All five must hold for A6 to be marked complete:
1. All 9 scenarios walk cleanly with NO probe warnings.
2. `walk-miss` MISS rate < 5% across a 60-sec wander.
3. `[cell-transit]` log shows zero ping-pong events in the 60-sec
wander.
4. **Holtburg Sewer end-to-end walk** (entry → 5–7 rooms → exit) without
getting stuck, without walking through any wall, without falling
through any stair. This is the M1.5 physics acceptance criterion.
5. M0 + M1 outdoor regression: walk Holtburg outdoor for 60 sec,
observe no regressions in outdoor cell handling, no FPS drop,
baseline test suite still 1147 + 8 (or whatever post-Phase O
baseline is — to be re-measured at A6.P1 start).
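
Criteria 2 and 3 are mechanical enough to check from parsed logs. The sketch below assumes the `walk-miss` outcomes have been reduced to a list of `"HIT"` / `"MISS"` strings and the `[cell-transit]` log to the sequence of cell ids entered; both shapes, the labels, and the function names are hypothetical. It also assumes a ping-pong event means an immediate A, B, A reversal, which would miscount a player who deliberately steps back across a boundary.

```python
def miss_rate(outcomes):
    """Fraction of walk-miss probe outcomes that are 'MISS'."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o == "MISS") / len(outcomes)


def count_ping_pong(cell_ids):
    """Count A -> B -> A reversals in a sequence of entered cell ids."""
    return sum(
        1
        for a, b, c in zip(cell_ids, cell_ids[1:], cell_ids[2:])
        if a == c and a != b
    )


def acceptance_ok(outcomes, cell_ids, max_miss_rate=0.05):
    """Criteria 2 and 3: MISS rate under the limit and zero ping-pong events."""
    return miss_rate(outcomes) < max_miss_rate and count_ping_pong(cell_ids) == 0
```

The scripted walks never reverse direction across a cell boundary mid-pass except where the script says so, so any reversal outside those points is worth inspecting by hand before it is counted as a failure.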
### 6.4 Three-failed-verifications policy
Per [`CLAUDE.md`](../../../CLAUDE.md) Visual verification rule: if three
consecutive visual verifications fail at the same scenario after
attempted fixes, **stop and write a handoff doc**. Do not attempt a
fourth fix on the same symptom in the same session. Hand off with full
reproduction notes + probe captures + the failed-fix code diffs.
---
## 7. Out of scope
- **A7 (Indoor lighting fidelity)** — separate phase, separate
methodology (RenderDoc + retail-decomp), follows A6. Per
[`CLAUDE.md`](../../../CLAUDE.md), do NOT mix lighting work into A6.
If A6.P1 reveals a lighting cause for an apparent physics symptom,
file as an A7 issue and continue A6.
- **A2 — `polygon_hits_sphere_slow_but_sure` tangent-epsilon rejection.**
Separate issue; A6.P4's `walk-miss` MISS rate target tolerates the
residual A2-class events as long as they're < 5% of all calls.
- **Outdoor physics regression** — A6 is indoor-focused. Outdoor walks
appear in acceptance criteria only as a regression backstop, not as
a fix surface. Any outdoor-physics findings in A6.P1 capture get
filed as separate issues for post-A6 work.
- **Combat / animation / movement networking** — completely orthogonal.
M2's prerequisites.
---
## 8. References
### 8.1 Retail decomp anchors
All in `docs/research/named-retail/acclient_2013_pseudo_c.txt`:
- `:272717-272798` – `CTransition::check_other_cells` (A4 oracle, already
ported)
- `:273099-273133` – `CTransition::step_up`
- `:273193-273239` – `CTransition::transitional_insert` Collide branch
- `:308742-308819` – `CObjCell::find_cell_list` Position-variant (the
hysteresis question for #90's root cause)
- `:317657-317671` – `CCellStruct::point_in_cell` + `sphere_intersects_cell`
- `:321594-321607` – `SPHEREPATH::set_collide`
- `:322032-322077` – `CPolygon::adjust_sphere_to_plane` (suspected
over-correction site)
- `:322974-322993` – `CPolygon::pos_hits_sphere` (front-face culling)
- `:323725-323939` – `BSPTREE::find_collisions` (full 6-path dispatcher)
- `:326211-326242` – `BSPNODE::find_walkable`
### 8.2 Our suspect code sites
- `src/AcDream.Core/Physics/BSPQuery.cs:332` – `AdjustSphereToPlane`
- `src/AcDream.Core/Physics/BSPQuery.cs:1550`, `:1895` – `FindCollisions`
(two overloads)
- `src/AcDream.Core/Physics/PhysicsEngine.cs:285-300` – `ResolveCellId`
#90 workaround block
- `src/AcDream.Core/Physics/TransitionTypes.cs:1294-1373` –
`TryFindIndoorWalkablePlane` synthesis (workaround)
- `src/AcDream.Core/Physics/PhysicsDiagnostics.cs` — probe infrastructure
(where `[push-back]` lives)
### 8.3 Prior handoffs
- [`docs/research/2026-05-20-m15-kickoff-handoff.md`](../../research/2026-05-20-m15-kickoff-handoff.md) — M1.5 baseline + workaround inventory.
- [`docs/research/2026-05-20-phase-a4-shipped-cell-pingpong-finding.md`](../../research/2026-05-20-phase-a4-shipped-cell-pingpong-finding.md) — A4 ship + #90 ping-pong investigation.
- [`docs/research/2026-05-21-walk-miss-capture-findings.md`](../../research/2026-05-21-walk-miss-capture-findings.md) — `TryFindIndoorWalkablePlane` MISS rate evidence (99.87%).
- [`docs/research/2026-05-20-indoor-walking-bug-a-handoff.md`](../../research/2026-05-20-indoor-walking-bug-a-handoff.md) — Bug A's tried-and-reverted synthesis removal story.