Bug B (indoor BSP world-origin fix) shipped today at de8ffde. Bug A (delete per-frame walkable-plane synthesis) attempted and reverted at 0a7ce8f.

Real bug is deeper than scoped: indoor cell floor polys don't cover the player's full XY range when crossing thresholds (doorways). Step-down probes miss past the floor edge, Mechanism C (post-OK step-down) can't catch the player, ContactPlane invalidates, gravity pulls them through the void. We have all three retail CP retention mechanisms (A, B, C). The defect is geometry, not retention. Either dat-decoder missing some floor polys, or cell-transition timing too late, or some retail mechanism we haven't traced.

Handoff includes:
- State of every commit on this branch + KEEP/REMOVE recommendation
- Bug B evidence and recommendation to ship to main
- Bug A failure analysis with probe data
- Mechanisms A/B/C location in our code vs retail decomp anchors
- 5 prioritized investigation targets for fresh session
- Anti-patterns to avoid (don't repeat Bug A approach)
- Lessons learned (probe-first discipline, risk-as-falsification, 3-fails-in-a-session stop signal, Matrix4x4.Decompose idiom, binary-timestamp paranoia)

Recommendation: merge Bug B alone, leave the rest for fresh session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Indoor walking — Bug A wrong-scope handoff (2026-05-20)
Status: Bug B shipped (de8ffde). Bug A attempted + reverted (9f874f4 → 0a7ce8f). The real bug is deeper than scoped and needs a fresh session with full context. ISSUES #83 remains OPEN.
This doc captures everything learned today so the next session picks up clean.
TL;DR
I went into today expecting to land "ContactPlane retention" as a 2-slice phase:
- Slice 1 (Bug B): indoor BSP world-origin fix. SHIPPED at de8ffde. Closed a real corruption (320 corrupt CP writes/session with D≈0 instead of world floor Z).
- Slice 2 (Bug A): delete the per-frame TryFindIndoorWalkablePlane synthesis on the indoor OK path. REVERTED. Caused worse regression (player fell through ground when crossing thresholds).
The probe + decomp study revealed Slice 2's premise was wrong:
- Retail's BSPTREE::find_collisions Path 5B (grounded mover) does NOT call find_walkable either. It only checks for walls. So Bug A's "delete the synthesis and trust the BSP" had nothing to fall back on for the no-step-down case.
- Retail keeps grounded movement coherent via THREE interacting mechanisms: A (Path 6 land), B (LKCP proximity restore), C (post-OK step-down probe). We have all three in our code already.
- The actual failure mode is when the player crosses a threshold (doorway) and the step-down probe finds no floor poly at the new XY. Step-down returns OK without writing CP, Mechanism B's proximity check fails because the player moved laterally past the cached plane, oi.Contact clears, player goes airborne, gravity wins.
This is a cell geometry / cell-transition problem, not a CP retention problem. Outside Bug A's scope.
What's on main / what's on this branch
Branch: claude/sad-aryabhata-2d2479 (worktree, not merged).
Commits ahead of main (in order):
| SHA | Subject | Status |
|---|---|---|
| 66de00d | feat(physics): [cp-write] probe for ContactPlane retention spike | KEEP: invaluable for next session |
| 865634f | docs(spec): indoor BSP world-origin / world-rotation fix (Bug B) | KEEP: describes the shipped fix |
| 56816fc | docs(plan): indoor BSP world-origin fix implementation plan | KEEP |
| 39d4e65 | test(physics): BSPQuery.FindCollisions writes world-space plane... | KEEP: regression test for Bug B |
| de8ffde | fix(physics): pass cell world-transform to indoor BSP collision | KEEP: the Bug B fix |
| 3bec18f | docs(spec): remove per-frame indoor walkable-plane synthesis (Bug A) | KEEP but mark wrong-approach |
| 686f27f | docs(plan): remove per-frame indoor walkable-plane synthesis (Bug A) | KEEP but mark wrong-approach |
| 9f874f4 | fix(physics): remove per-frame indoor walkable-plane synthesis | REVERTED by next commit |
| 0a7ce8f | Revert "fix(physics): remove per-frame indoor walkable-plane synthesis" | The revert. Brings back pre-Bug-A behavior. |
The branch is in a self-consistent post-Bug-B state: world-origin fix shipped, synthesis re-instated as it was before the session.
Decision for next session: merging Bug B to main is safe (closes a real corruption with strong probe evidence). The Bug A spec/plan + revert can stay on this branch as a tried-and-reverted record, or get cleaned up before merging.
What Bug B actually fixed (slice 1, shipped)
The defect
Indoor cell BSP queries at TransitionTypes.cs:1442 invoked BSPQuery.FindCollisions with Quaternion.Identity + defaulted Vector3.Zero for worldOrigin. Inside the BSP, Path 3 (step_sphere_down) and Path 4 (land-on-surface) use those args via TransformVertices + BuildWorldPlane to produce a world-space ContactPlane. With both args defaulted, the produced plane was in cell-LOCAL space — D ≈ 0 instead of D = -world_floor_Z (e.g., -94.02 for Holtburg cottages).
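To make the D ≈ 0 versus D = -world_floor_Z difference concrete, here is a minimal sketch of re-expressing a cell-local plane in world space. It is not the code in BSPQuery (TransformVertices and BuildWorldPlane are not reproduced here); `PlaneSpaceDemo` and `LocalToWorld` are hypothetical names, and the sketch assumes the plane convention N·p + D = 0 with local points mapped to world as rotate-then-translate.

```csharp
using System;
using System.Numerics;

public static class PlaneSpaceDemo
{
    // Re-express a plane (N·p + D = 0) given in cell-local space in world space,
    // assuming a local point maps to world as: world = Rotate(local, rotation) + origin.
    public static Plane LocalToWorld(Plane local, Quaternion rotation, Vector3 origin)
    {
        // The normal only rotates; translation does not affect direction.
        Vector3 worldNormal = Vector3.Transform(local.Normal, rotation);

        // Substituting local = Rotate^-1(world - origin) into N·p + D = 0 gives
        // D_world = D_local - N_world · origin.
        float worldD = local.D - Vector3.Dot(worldNormal, origin);
        return new Plane(worldNormal, worldD);
    }
}
```

For a floor at local Z = 0 (normal +Z, D = 0), passing `Quaternion.Identity` and `Vector3.Zero` leaves D at 0, which is the corrupt value the probe saw. Passing an origin whose Z is 94.02 yields D of about -94.02, matching the Holtburg cottage example above.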
The fix (de8ffde)
Quaternion cellRotation;
Vector3 cellOrigin;
if (!Matrix4x4.Decompose(cellPhysics.WorldTransform, out _, out cellRotation, out cellOrigin))
{
    Console.WriteLine($"[indoor-bsp] WARN cellPhysics.WorldTransform did not decompose ...");
    cellRotation = Quaternion.Identity;
    cellOrigin = cellPhysics.WorldTransform.Translation;
}
var cellState = BSPQuery.FindCollisions(
    cellPhysics.BSP.Root, cellPhysics.Resolved, this,
    localSphere, localSphere1, localCurrCenter,
    Vector3.UnitZ, 1.0f,
    cellRotation,
    engine,
    worldOrigin: cellOrigin);
Mirrors the existing correct pattern at TransitionTypes.cs:1808 (object BSP via FindObjCollisions).
Evidence (probe-driven)
Pre-fix (launch-cp-probe.log): 320 [cp-write] caller=BSPQuery.StepSphereDown:1123 writes producing D=-0.000 instead of D=-94.020.
Post-fix (launch-cp-probe-postfix-v2.log): step-down writes show D=-94.020, D=-66.020, D=-158.994, D=-159.129 — all matching the cell's actual world floor Z. The 2 remaining D=0.000 outliers are either polygons legitimately at world Z=0 or marginal edge cases.
Tests
- Unit test added: BSPQueryTests.FindCollisions_StepDown_TranslatedWorldOrigin_WritesWorldSpacePlane, which verifies BSPQuery writes world-space CP when called with a translated worldOrigin.
- 8-failure physics baseline holds (no new regressions).
Recommendation
Ship Bug B alone. The Bug A spec/plan + revert can stay or get cleaned. The probe (66de00d) is worth keeping in tree until the deeper investigation is complete.
What Bug A tried and why it failed (slice 2, reverted)
The hypothesis
Per the previous handoff (docs/research/2026-05-19-indoor-walkable-plane-bsp-port-shipped-handoff.md) and the subagent's first decomp study, retail's BSPTREE::find_collisions does NOT call find_walkable on the OK path. ContactPlane is retained across OK frames from the prior tick's seed (our equivalent: PhysicsEngine.ResolveWithTransition:583, the init_contact_plane analogue). The synthesis we added in Phase 2 (eb0f772 2026-05-19) was an unfaithful stop-gap that runs every frame, 99.87% MISSES due to tangent epsilon rejection in walkable_hits_sphere, and falls through to outdoor terrain → wrong CP plane.
Proposed fix: delete the synthesis call + outdoor fallthrough from the indoor OK path. Just return TransitionState.OK; after the indoor BSP returns OK. Let CP retain via the seed and let BSP Path 3/4 refresh it during step-down or landing.
The fix (9f874f4)
Replaced TryFindIndoorWalkablePlane(...) → ValidateWalkable(...) → fallthrough to outdoor terrain with return TransitionState.OK;. Deleted the method + constant + 9 tests. -491 lines.
The regression
User report: "I could not get out of the building, I had to jump out of the door, then I fell through the ground."
Probe data (launch-buga-v2.log):
[indoor-bsp] cell=0xA9B40125 wpos=(96.880,159.403,61.536) result=OK
[indoor-bsp] cell=0xA9B40125 wpos=(96.800,159.603,61.336) result=OK
[indoor-bsp] cell=0xA9B40125 wpos=(96.720,159.803,61.130) result=OK
...continues until wpos=(67,233,-262) ~350m below cell floor
The player's Z decreased ~0.2m per tick (gravity step), inside an indoor cell, with the BSP returning OK every frame (no walls below them). No step-down probe lines firing during the fall — oi.Contact had cleared.
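The free-fall signature above (Z dropping every tick while the BSP keeps returning OK) can be spotted mechanically. The following is a rough sketch that assumes the `[indoor-bsp]` line format shown in the excerpt; `IndoorBspFallDetector` and `LongestOkDescent` are hypothetical names, not part of the codebase.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class IndoorBspFallDetector
{
    // Assumed line shape: "[indoor-bsp] cell=0x... wpos=(x,y,z) result=OK".
    private static readonly Regex LinePattern = new Regex(
        @"\[indoor-bsp\].*wpos=\([^,]+,[^,]+,([^)]+)\)\s+result=(\w+)",
        RegexOptions.Compiled);

    // Returns the longest run of consecutive probe lines where result=OK
    // and Z is strictly lower than on the previous probe line.
    public static int LongestOkDescent(IEnumerable<string> lines)
    {
        int best = 0, run = 0;
        float? prevZ = null;
        foreach (string line in lines)
        {
            Match m = LinePattern.Match(line);
            if (!m.Success) continue; // ignore unrelated log lines

            float z = float.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            bool ok = m.Groups[2].Value == "OK";

            run = (ok && prevZ.HasValue && z < prevZ.Value) ? run + 1 : 0;
            best = Math.Max(best, run);
            prevZ = z;
        }
        return best;
    }
}
```

A long run (hundreds of steps) inside a single indoor cell is the pattern seen in launch-buga-v2.log. What counts as "too long" is a judgment call and would need a threshold chosen against a known-good run.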
Why Mechanisms A/B/C didn't catch this
The full decomp study (in this session's subagent transcript, file
C:\Users\erikn\AppData\Local\Temp\claude\C--Users-erikn-source-repos-acdream--claude-worktrees-sad-aryabhata-2d2479\cd9bbcf4-a861-4797-99e3-8c1c623ff66e\tasks\a88c5ab14446853ea.output)
mapped retail's three CP retention mechanisms:
| Mechanism | Retail location | acdream location | Status |
|---|---|---|---|
| A: Path 6 collide-path land (set_contact_plane) | acclient_2013_pseudo_c.txt:323924 | BSPQuery.cs:1615 (Path 4) | Present, works |
| B: validate_transition LKCP proximity restore | :272565-272578 | TransitionTypes.cs:2618-2662 | Present, has proximity-guard |
| C: transitional_insert post-OK step-down probe | :273242-273307 | TransitionTypes.cs:896-933 | Present, gated on oi.Contact && !ci.ContactPlaneValid && oi.StepDown |
All three exist in our code. The failure was that they're all gated on conditions that fail in the doorway-crossing case:
- Step-down probe (Mech C) fires correctly: log shows ~209 successful Adjusted results from BSPQuery.StepSphereDown.
- Player walks toward the cottage doorway. Sub-step moves lpos.Y from -5.994 to -6.398 (past the cottage floor edge).
- At new position, step-down probe BSP returns OK + poly=n/a (no floor poly at this XY), and the same for Z probes at -0.75, -1.5, -2.25. The cottage's indoor cell has no floor poly extending past the doorway threshold.
- Step-down returns OK without writing CP. ci.ContactPlaneValid stays false.
- Mechanism B (LKCP proximity) checks distance from sphere to cached plane: sphere moved ~0.4m laterally, the prior plane is at the prior XY, but the proximity check is radius + EPSILON > |angle| where angle = N·sphere + D. For a horizontal floor, angle = sphere.Z - cached_floor_Z. If sphere.Z hasn't moved much vertically, this should pass...
  - Actually: I didn't fully trace this. Mech B might fire correctly. Need next-session probe to confirm.
- Either way: by the time the player has traveled a few sub-steps with no floor underneath, oi.Contact clears via the ValidateTransition else-branch (line 2664-2666). Mech C stops firing (it requires oi.Contact).
- Player free-falls. Path 5 stops firing (no Contact). Path 6 fires for airborne movement. No CP gets re-established.
Why the previous "stuck-falling on 2nd-floor edge" symptom is the same bug
The user's PRIOR symptom (pre-Bug-A): "Walking up the stairs, if I sort of just touch the floor on top of me I get stuck in falling animation."
That's the same root cause manifesting in a different geometry: step-down probe doesn't find a floor poly at the 2nd-floor edge → Mechanism C can't catch → some path along the synthesis → wrong CP → ValidateWalkable marks airborne → falling animation never recovers.
The Phase 2 synthesis (TryFindIndoorWalkablePlane) was a duct-tape over this — it tried to find a "best-guess" floor poly via XY scan. When the scan returned the wrong poly (rare HIT case) or missed (99% case) the player got stuck. But it didn't make them free-fall through the void because the fallthrough to outdoor terrain at least gave them SOMETHING (just slightly below the cottage floor).
Bug A removed the duct-tape. With nothing replacing it, the player free-falls.
Key insight: the duct-tape was hiding a deeper bug
The Phase 2 synthesis (eb0f772) was patching over a real defect: indoor cell floor polygons don't extend to cover the player's full possible XY range when crossing thresholds. Either:
- (a) Retail's indoor cells have floor polys that extend further than ours do (dat-decoder bug?).
- (b) Retail's cell-transition timing moves the player into the outdoor cell BEFORE they step past the indoor floor poly edge, so the indoor BSP query at the threshold always has a floor under the sphere.
- (c) Retail has a mechanism we haven't found yet that handles "no floor poly at this XY" gracefully (e.g., extending the search to neighbor cells via portals).
- (d) Retail's player-collision-sphere is sized differently so the player physically can't reach the edge of the cottage floor.
Without further investigation, I can't say which. The next session needs to figure this out.
State of the [cp-write] probe
Committed at 66de00d. Converts 8 CollisionInfo fields (CP + LKCP groups, 4 sub-fields each) from public fields to public properties with logging setters. Logging is gated on PhysicsDiagnostics.ProbeContactPlaneEnabled (env var ACDREAM_PROBE_CONTACT_PLANE=1, also runtime-toggleable). When the flag is off, the property accessors are inlined to direct field access by the JIT — zero cost.
Keep this in tree — the next session will need it to validate any new hypothesis before designing the fix. The Bug B + Bug A specs both say "remove the probe when the retention fix lands"; that's not yet, defer the removal.
The probe surfaces:
- Each write site's source line (PhysicsEngine.cs:583, BSPQuery.cs:1123, BSPQuery.cs:1615, TransitionTypes.cs:663 etc).
- Old value → new value, only logged when actually changed (value-equality suppression in the setter).
- Plane Normal + D, CellId, IsWater, Valid flags.
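As an illustration of the field-to-property conversion described above, here is a minimal sketch of one logging setter with the flag gate and value-equality suppression. It is not the code from 66de00d: `ProbeFlags` stands in for PhysicsDiagnostics, `CollisionInfoSketch` and its `Log` sink are hypothetical, and caller-line capture is left out.

```csharp
using System;
using System.Numerics;

public static class ProbeFlags
{
    // Stand-in for PhysicsDiagnostics.ProbeContactPlaneEnabled (env var read once,
    // but left as a mutable field so it can also be toggled at runtime).
    public static bool ProbeContactPlaneEnabled =
        Environment.GetEnvironmentVariable("ACDREAM_PROBE_CONTACT_PLANE") == "1";
}

public sealed class CollisionInfoSketch
{
    private Plane _contactPlane;

    // Hypothetical sink so the sketch is testable; defaults to the console.
    public Action<string> Log { get; set; } = Console.WriteLine;

    public Plane ContactPlane
    {
        get => _contactPlane;
        set
        {
            // Log only when the probe is on AND the value actually changes.
            if (ProbeFlags.ProbeContactPlaneEnabled && value != _contactPlane)
            {
                Log($"[cp-write] N={_contactPlane.Normal} D={_contactPlane.D:F3} -> " +
                    $"N={value.Normal} D={value.D:F3}");
            }
            _contactPlane = value;
        }
    }
}
```

Writing the same plane twice produces one log line, which is what keeps a per-tick seed from flooding the log with no-op writes.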
Caller distribution from the failed Bug A run is in launch-buga-v2.utf8.log:
| Count | Caller | Role |
|---|---|---|
| 57,144 | PhysicsEngine.ResolveWithTransition:583 | Per-tick seed (init_contact_plane) |
| 607 | Transition.FindTransitionalPosition:663 | Sub-step CPV=0 reset |
| 341 | Transition.ValidateWalkable:1488 | Outdoor terrain on-surface |
| 217 | BSPQuery.StepSphereDown:1123 | Path 3 step-down (Mechanism C fires) |
| 19 | Transition.ValidateWalkable:1511 | Outdoor terrain below-surface |
| 0 | Transition.ValidateWalkable (indoor) | Bug A removed the indoor path |
| 0 | [indoor-walkable] lines | Bug A removed the probe |
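A caller distribution like the one above can be regenerated from any probe log with a small tally. This sketch assumes each probe line contains `[cp-write] caller=<Type.Method>:<line>`, as in the Bug B evidence; `CpWriteTally` is a hypothetical helper, not an existing tool in the repo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CpWriteTally
{
    // Assumed probe line shape: "... [cp-write] caller=BSPQuery.StepSphereDown:1123 ...".
    private static readonly Regex CallerPattern =
        new Regex(@"\[cp-write\]\s+caller=(\S+)", RegexOptions.Compiled);

    // Counts probe lines per caller and returns them sorted by descending count.
    public static List<KeyValuePair<string, int>> CountByCaller(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            Match m = CallerPattern.Match(line);
            if (!m.Success) continue; // not a [cp-write] line

            string caller = m.Groups[1].Value;
            counts[caller] = counts.TryGetValue(caller, out int n) ? n + 1 : 1;
        }
        return counts.OrderByDescending(kv => kv.Value).ToList();
    }
}
```

Typical use would be `CpWriteTally.CountByCaller(File.ReadLines("launch-buga-v2.utf8.log"))`, then comparing the result against a later run to see which write sites appear or disappear.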
Investigation targets for next session
If picking this up, the priority order:
1. Confirm the doorway-edge hypothesis with cdb on retail. The retail debugger toolchain (CLAUDE.md "Retail debugger toolchain" section) lets us attach to a live retail client. Set a breakpoint at BSPLEAF::find_walkable and walk the same cottage threshold. Capture the polygons the floor BSP iterates over. Either:
   - Retail's cell has more floor polys covering the threshold → our dat-decoder is missing some polys.
   - Retail's cell-id changes BEFORE the sphere reaches the edge → our cell-transition timing lags.
   - Retail does something we haven't seen yet.
2. Cross-reference with WorldBuilder. The CLAUDE.md "Reference hierarchy by domain" table says WB is the production base for EnvCell geometry. Look at WorldBuilder/EnvCellRenderManager.cs and WorldBuilder/PortalRenderManager.cs for how WB handles cell boundaries.
3. Add a probe that logs each indoor cell's floor poly count + extent. Diagnostic-only. When the player enters an indoor cell, dump the cell's floor polys + their XY bounding boxes. Compare to the player's eventual XY position when step-down misses. Tells us whether the floor poly genuinely doesn't extend that far OR whether something else is wrong.
4. Look at Phase 2 cell-transition work. The [cell-transit] probe + the portal-graph traversal in CellTransit.FindCellList were shipped 2026-05-19 (commits 1969c55 through eb0f772). Whether they fire in time at the cottage doorway is unclear.
5. Don't repeat the Bug A approach. "Just delete the synthesis and trust BSP" doesn't work because the BSP genuinely has no floor poly at the threshold. Some replacement is needed; the question is what.
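The floor-extent probe could start from something like the sketch below. It does not use the real cell or polygon types: polygons are plain vertex arrays, `MinWalkableNormalZ` is a made-up cutoff, and the normal is taken from the first three vertices assuming counter-clockwise winding when viewed from above. Bounding-box containment is only a necessary condition, so a "no box contains the point" result is the informative one.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class FloorExtentProbe
{
    // Hypothetical walkability cutoff on the normal's Z component; not taken from the codebase.
    public const float MinWalkableNormalZ = 0.66f;

    // Returns the XY bounding box of every polygon whose normal points mostly up.
    public static List<(Vector2 Min, Vector2 Max)> FloorBounds(IEnumerable<Vector3[]> polys)
    {
        var result = new List<(Vector2 Min, Vector2 Max)>();
        foreach (Vector3[] verts in polys)
        {
            if (verts.Length < 3) continue;

            Vector3 n = Vector3.Normalize(
                Vector3.Cross(verts[1] - verts[0], verts[2] - verts[0]));

            // Skips walls, ceilings, steep slopes, and degenerate polys (NaN normal).
            if (!(n.Z >= MinWalkableNormalZ)) continue;

            var min = new Vector2(float.MaxValue);
            var max = new Vector2(float.MinValue);
            foreach (Vector3 v in verts)
            {
                var xy = new Vector2(v.X, v.Y);
                min = Vector2.Min(min, xy);
                max = Vector2.Max(max, xy);
            }
            result.Add((min, max));
        }
        return result;
    }

    // True if at least one floor bounding box covers the given XY position.
    public static bool AnyFloorBoxContains(
        List<(Vector2 Min, Vector2 Max)> boxes, float x, float y)
    {
        foreach (var (min, max) in boxes)
        {
            if (x >= min.X && x <= max.X && y >= min.Y && y <= max.Y) return true;
        }
        return false;
    }
}
```

Run against the cottage cell, the interesting comparison is the two lpos.Y values from the failure trace: if -5.994 is covered and -6.398 is not, the floor poly really does end at the threshold and the question shifts to hypotheses (a) through (d) above.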
Anti-patterns the next session should avoid
- Don't trust the previous handoff's recommendation blindly. The 2026-05-19 handoff said "remove TryFindIndoorWalkablePlane" — that recommendation was based on incomplete decomp analysis. The proper fix requires understanding cell geometry, not just CP retention.
- Don't design a fix before the probe data points at the right code path. I designed Bug A's spec on a "Mechanism C will catch us" assumption that the data didn't validate.
- Don't fix two related bugs in one session. Bug B + Bug A were both indoor-CP issues but they had different root causes. Slicing them was the right call; what went wrong was Bug A's design.
Things to definitely KEEP from today's work
- Bug B fix (de8ffde): closes a real corruption.
- The [cp-write] probe (66de00d).
- The [indoor-bsp] probe (pre-existing, from Phase 1).
- BSPQuery regression test (39d4e65).
- Spec + plan docs for Bug B (good engineering artifacts).
- This handoff doc.
Things to consider removing on next session
- Bug A spec/plan docs (3bec18f / 686f27f): they document a wrong approach. Optional to delete; they're useful as a "tried this, didn't work, here's why" record.
How to start a fresh session
Copy this into a new Claude Code session in the acdream worktree:
Pick up the acdream indoor walking issue (ISSUES #83). Read
docs/research/2026-05-20-indoor-walking-bug-a-handoff.md FIRST. The
prior session today shipped Bug B (BSP world-origin fix, de8ffde) but
attempted-and-reverted Bug A. The real bug is deeper than scoped — see
the handoff for the full diagnosis and investigation targets.
1. Don't try Bug A again ("just delete TryFindIndoorWalkablePlane and
trust retention"). That was today's wrong approach; data showed
Mechanism C can't catch when there's no floor poly past the
threshold.
2. The probe (66de00d) + [indoor-bsp] probe should stay in tree until
the proper fix lands.
3. Investigation targets are in the handoff's "Investigation targets
for next session" section. The most useful first move is probably
attaching cdb to retail at the same cottage threshold and watching
what BSPLEAF::find_walkable iterates over.
4. CLAUDE.md rules apply. No workarounds, no band-aids. Visual
verification is the acceptance test.
5. M2 critical path candidates remain (F.2 / F.3 / F.5a / L.1c /
L.1b). If this investigation looks like it'll burn a phase or two
to nail down, consider whether the user wants you to pivot to M2
work and address indoor walking in M7 polish.
State the milestone + chosen phase in the first action you take.
Or just say "Read docs/research/2026-05-20-indoor-walking-bug-a-handoff.md and start a fresh session."
Lessons from today (for future Claude)
- The user's pickup-prompt language was right: probe-first, design-second. I did the probe spike for Bug B and that worked great. For Bug A I didn't do an equivalent spike for "will Mechanism C catch the no-floor case?" before deleting the synthesis. The R1 risk I called out in the spec was the actual failure mode.
- A spec's "Out of scope" + "Risks" sections can lie. I wrote them after the design was decided, and they reflected the design's blind spots, not actual blind spots. Next time: list risks BEFORE writing the design, treat them as falsification tests, validate them with the probe before shipping.
- "Three failed visual verifications in a session" is the stop signal. I got to two and pushed for a third (which triggered the revert decision via the user's "Got stuck falling in the staircase" report + "I had to jump out of the door, then I fell through the ground" report). The third should have been the trigger to stop and write the handoff. Instead I dispatched another subagent and dug deeper. That additional dig was useful (it surfaced the doorway-edge insight) but it would have happened in the fresh session too with a fresher context budget.
- Matrix4x4.Decompose works fine for cell transforms. Bug B's mechanical fix landed cleanly. The pattern (decompose once at the call site, pass rotation + origin to a function that previously took defaults) is a clean idiom for places where we have a Matrix and the API wants a Quaternion + Vector3.
- Test build + binary timestamp paranoia is real. During Bug B's first visual verification, my test passed but I'd accidentally rebuilt the AcDream.Core DLL from un-stashed code, so the launched client didn't have the fix. The mismatch was only caught by checking the binary mtime against the source mtime. After every code change to be tested in the client, verify the build is fresh.
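The mtime comparison from the last lesson is easy to automate before launching the client. This is a minimal sketch under stated assumptions: `BuildFreshness` is a hypothetical helper, it only looks at *.cs files, and a fresh timestamp does not prove the binary was built from the intended working tree (a rebuild from stashed or un-stashed code also produces a new mtime).

```csharp
using System;
using System.IO;
using System.Linq;

public static class BuildFreshness
{
    // True when the binary was written at or after the newest *.cs file under sourceDir.
    // A missing binary reports an ancient timestamp and therefore counts as stale.
    public static bool IsBinaryFresh(string binaryPath, string sourceDir)
    {
        DateTime binaryTime = File.GetLastWriteTimeUtc(binaryPath);

        DateTime newestSource = Directory
            .EnumerateFiles(sourceDir, "*.cs", SearchOption.AllDirectories)
            .Select(path => File.GetLastWriteTimeUtc(path))
            .DefaultIfEmpty(DateTime.MinValue)
            .Max();

        return binaryTime >= newestSource;
    }
}
```

A pre-launch script could call this with the AcDream.Core DLL path and the project's source directory, and refuse to start a visual verification when it returns false.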
Recommendation: merge Bug B to main. Keep the rest of this branch around as a learning artifact. Start fresh on the deeper investigation in a new session with this handoff as the starting brief.