acdream/docs/research/2026-05-24-door-bug-apparatus-shipped-findings.md
Erik 28cd97be62 fix(phys): A6.P4 door bug — AddAllOutsideCells coord convention + replay apparatus
CellTransit.AddAllOutsideCells assumed sphere coords were absolute world
coords (subtracting lbXf = 0xA9 * 192 = 32448 from the sphere position).
Production has used landblock-local coords since Phase A.1
(streaming-center landblock at world origin), so the subtraction
produced localX = -32316, gridX = -1346 → out-of-range → early return
→ ZERO outdoor cells added.

For outdoor primary cells the bug was masked by GetNearbyObjects's
radial sweep. For indoor primary cells (where #98 gates the outdoor
sweep), the door's outdoor cell 0xA9B40029 never reached
portalReachableCells, the door's BSP was never queried, and the player
walked through Holtburg cottage doors unimpeded.

Fix: AddAllOutsideCells treats worldSphereCenter as landblock-local
directly. Matches retail CLandCell::add_all_outside_cells which uses
the per-cell 6-byte landblock-relative position struct.

Existing CellTransitAddAllOutsideCellsTests + CellTransitFindCellSetTests
updated to use landblock-local sphere coords (they were the only callers
using the world-coord convention; production never did).

Apparatus shipped:
- DoorBugTrajectoryReplayTests — live-capture-driven replay harness
  that pinpointed the bug per-field at unit-test speed (<500ms iteration)
- AddAllOutsideCells_LandblockLocalSphere_AddsDoorOutdoorCell — direct
  unit test that demonstrates the fix
- FindTransitCellsSphere_IndoorExitPortal_AddsOutsideForCapturedSpherePos
  — verifies cell-portal traversal at the captured sphere position
- DoorSetupGfxObjInspectionTests.HoltburgCottage_CellPortals_DatInspection
  — dat-direct EnvCell + Environment.Cells + portal-poly inspector
- Fixture: tests/AcDream.Core.Tests/Fixtures/door-bug/live-capture.jsonl
  (tick 13558 walkthrough + tick 22760 outdoor block)

Visual verification (user-driven at Holtburg cottage door, ~50cm off-center):
- outside→inside RUN: now BLOCKS (was: walks through)
- outside→inside WALK: presumed blocks (not retested)
- inside→outside RUN: PARTIAL — body intersects door, sphere slides through
- inside→outside WALK: same partial behavior

The remaining inside→outside asymmetry is a SEPARATE bug in BSP
collision response for two-sided polygons. The [bsp-test] probe now
fires 245 times for the door entity from indoor (was 0 pre-fix) —
door IS being queried; the BSP polygon-level collision response is
the new bug. Handoff at
docs/research/2026-05-25-door-bug-partial-fix-shipped.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 07:53:34 +02:00


Door collision — apparatus replay shipped, root cause identified

2026-05-24 (continuation of the door-collision investigation)

SUPERSEDED 2026-05-25 by docs/research/2026-05-25-door-bug-partial-fix-shipped.md. The root-cause analysis here was correct in direction (cell-portal traversal is upstream of BSP query) but missed the specific bug: CellTransit.AddAllOutsideCells silently failed for landblock-local sphere coords (production's convention) because it subtracted an absolute-world lbXf offset. Diagnosis + fix in the 2026-05-25 doc.
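The failure mode named in the note above can be reproduced with plain arithmetic. The sketch below is illustrative Python, not the engine's C#. It assumes 24 m outdoor cells on an 8×8 grid (inferred from the 192-unit landblock width and the gridX = -1346 figure in the commit message), integer truncation toward zero, and the `x * 8 + y + 1` cell-index layout; the function names are hypothetical.

```python
CELL_SIZE = 24.0                 # assumption: 192-unit landblock / 8 cells per side
CELLS_PER_SIDE = 8
LB_X_OFFSET = 0xA9 * 192         # 32448, the absolute-world offset the old code subtracted


def grid_index(local_coord):
    """Truncate toward zero (like an integer cast); None when outside 0..7."""
    idx = int(local_coord / CELL_SIZE)
    return idx if 0 <= idx < CELLS_PER_SIDE else None


def outdoor_cell_low_word(local_x, local_y):
    """Low 16 bits of the outdoor cell id, or None when out of range."""
    gx, gy = grid_index(local_x), grid_index(local_y)
    if gx is None or gy is None:
        return None                       # mirrors the silent early return
    return gx * CELLS_PER_SIDE + gy + 1   # assumed cell-index layout


# Sphere position from tick 13558, already landblock-local.
sphere_x, sphere_y = 132.36, 16.81

# Old behaviour: subtracting the world offset pushes X far out of range.
buggy = outdoor_cell_low_word(sphere_x - LB_X_OFFSET, sphere_y)

# Fixed behaviour: use the landblock-local position directly.
fixed = outdoor_cell_low_word(sphere_x, sphere_y)
```

Under these assumptions the subtracted coordinate yields no cell at all, while the landblock-local coordinate lands in low word 0x29, which is consistent with the door's outdoor cell 0xA9B40029.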

TL;DR

The trajectory-replay apparatus is wired and useful. Run the diagnostic test for the failing tick and the engine's full [step-walk] trace prints, naming the divergence per-field.

The bug: CellTransit.FindCellSet does not surface outdoor cell 0xA9B40029 (where the door is registered) from indoor primary cell 0xA9B40150. With issue #98's indoor-cell gate on the outdoor radial sweep, the door is therefore invisible to GetNearbyObjects and the BSP slab is never tested. The player walks through unimpeded.

cn=(0,-1,0) from the harness is not the door; it is the seeded walkable polygon's south edge being treated as a wall when the sphere falls off it. The harness reproduces production's "door not queried" behavior, just with an apparatus artifact in place of a clean walkthrough.

What was shipped

  1. Live capture (door-walkthrough.jsonl, 24,310 records ≈ 45 MB). The capture was driven via ACDREAM_CAPTURE_RESOLVE + the existing [entity-source] + [bsp-test] probes. One record per PhysicsEngine.ResolveWithTransition call with full PhysicsBody snapshots before/after.

  2. Fixture extraction (tests/AcDream.Core.Tests/Fixtures/door-bug/live-capture.jsonl, 4 KB). Two representative ticks pulled from the JSONL:

    • Tick 13558: the walkthrough. Player at (132.36, 16.81, 94) in indoor cell 0xA9B40150, target (132.43, 17.20, 94). Live result.Position = target with collisionNormalValid = false. Door centered at world XY (132.57, 16.99), BSP radius 1.975, state 0x00010008 = PERSISTENT_PS | 0x8 (no ETHEREAL_PS = 0x4, so CLOSED).
    • Tick 22760 — the working block. Player at (133.14, 18.02, 94) in outdoor cell 0xA9B40029, target (133.10, 17.60, 94). Live blocks at Y=18.018 with cn=(0, +1, 0). Same door, different primary cell type.
  3. Replay harness (DoorBugTrajectoryReplayTests.cs): loads tick fixtures, hydrates door GfxObj 0x010044B5 from real dat (DatCollection.Get<GfxObj>), registers a synthetic door via ShadowObjectRegistry.RegisterMultiPart at the captured BSP world center ((132.57, 16.99, 95.36)) with cellScope=0u (mirrors production registration at GameWindow.cs:3158-3167). AssertCallMatchesCapture replays the call and prints the first per-field divergence. Diagnostic variant enables every PhysicsDiagnostics.Probe*Enabled and dumps the full engine trace.
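The "first per-field divergence" idea can be sketched independently of the engine. This is a minimal Python illustration, not the actual AssertCallMatchesCapture implementation; the dictionary keys and the tolerance are hypothetical, and the values are the tick 13558 summary figures reported in the diagnostic output later in this document.

```python
import math


def first_divergence(expected, actual, tol=1e-2):
    """Return (field, expected, actual) for the first mismatching field, else None."""
    for field, want in expected.items():
        got = actual.get(field)
        if isinstance(want, tuple):
            # Compare vectors component-wise with an absolute tolerance.
            same = (
                got is not None
                and len(got) == len(want)
                and all(math.isclose(a, b, abs_tol=tol) for a, b in zip(want, got))
            )
        else:
            same = want == got
        if not same:
            return field, want, got
    return None


# Tick 13558: live capture vs. harness replay (field names are illustrative).
live = {"pos": (132.43, 17.20, 94.0), "cn_valid": False,
        "on_ground": True, "cell": 0xA9B40150}
harness = {"pos": (132.36, 16.81, 94.0), "cn_valid": True,
           "on_ground": True, "cell": 0xA9B40150}

divergence = first_divergence(live, harness)
```

Reporting only the first mismatch keeps the failure message short; the full engine trace from the diagnostic variant covers the rest.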

Chronology (from door-walkthrough.launch.log)

Confirmed the door state at the time of every walkthrough:

| Log line | Event |
| --- | --- |
| 10796 | [setstate] door state → 0x0001000C (PERSISTENT + ETHEREAL = OPEN) |
| 10993 | [setstate] door state → 0x00010008 (PERSISTENT, NOT ethereal = CLOSED) |
| 10995–11071 | First and last [bsp-test] line on door 0x000F4246. All state=0x00010008 |

So every [bsp-test] hit on the door, and every walkthrough event in the JSONL, is against the closed door. The bug is real, not an ETHEREAL pass-through.
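The open/closed reading of those state words comes down to one bit test. The sketch below is illustrative Python; ETHEREAL_PS = 0x4 is taken from this document, while the PERSISTENT_PS value is only inferred from "0x00010008 = PERSISTENT_PS | 0x8" and the helper names are hypothetical.

```python
ETHEREAL_PS = 0x4            # from this document
PERSISTENT_PS = 0x00010000   # assumption: inferred from 0x00010008 = PERSISTENT_PS | 0x8


def is_ethereal(state):
    """True when the ETHEREAL bit is set, i.e. the door does not collide."""
    return bool(state & ETHEREAL_PS)


def door_label(state):
    return "OPEN" if is_ethereal(state) else "CLOSED"


open_state = 0x0001000C      # log line 10796
closed_state = 0x00010008    # log line 10993 and every [bsp-test] line
labels = (door_label(open_state), door_label(closed_state))
```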

What the diagnostic test prints (tick 13558)

=== Replay tick 13558 (the walkthrough) ===
[step-walk] site=find-start cur=(132.36,16.81,94) ... walkPoly=True
[step-walk-adjust] branch=into-plane input=(0.07,0.39,0.00) output=(0.07,0.39,0.00) zGain=0
[step-walk] site=before-insert ... delta=(0.0744,0.3928,0) cell=0xA9B40150 ... walkPoly=True
[step-walk] site=stepdown-enter ... delta=(0.0744,0.3928,0) stepDown=True walkableZ=0.6642
[step-walk] site=stepdown-after-offset ... delta=(0.0744,0.3928,-0.75) ... walkPoly=True
... (probes down by 0.75, then 1.5; all OK; walkPoly=True)
[step-walk] site=stepdown-enter ... delta=(0.0744,0.0000,0) ... hit=(0,-1,0) walkPoly=False
... (probes down again; hit stays (0,-1,0); walkPoly=False throughout)
[step-walk] site=after-insert state=Collided ... hit=(0,-1,0) walkPoly=False
[step-walk] site=after-validate state=OK ... position back to input
[resolve] in=(132.360,16.811,94) cell=0xA9B40150 tgt=(132.435,17.204,94)
          out=(132.360,16.811,94) cell=0xA9B40150 ok=True
          hit=yes n=(0,-1,0) walkable=True
=== Harness: pos=(132.36,16.81,94) cn=(0,-1,0) cnValid=True onGround=True cell=0xA9B40150
=== Live:    pos=(132.43,17.20,94) cn=(0,0,0)  cnValid=False onGround=True cell=0xA9B40150

No [bsp-test] line fires. The door's BSP is never queried. The hit (0, -1, 0) is the engine's "sliding off the south edge of the seeded walkable polygon" response — not a door collision.

This matches production: at indoor primary cell 0xA9B40150, GetNearbyObjects returns ZERO shadows because:

  1. The captured cellId's low 16 bits (0x0150) are >= 0x0100 → indoor → issue #98's gate at ShadowObjectRegistry.cs:480 skips the outdoor radial sweep.
  2. portalReachableCells (built by CellTransit.FindCellSet) lacks outdoor cell 0xA9B40029. In the harness, this is because we register no cell fixture for 0xA9B40150 and the indoor branch at CellTransit.cs:403-407 early-returns with empty candidates. In production, the cell IS in cache but the traversal still doesn't produce 0xA9B40029 — the cell's exit portal (OtherCellId=0xFFFF) either doesn't fire exitOutside=true at the sphere's position, or AddAllOutsideCells isn't computing the right outdoor cell.
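The two conditions above combine into a simple visibility rule, sketched here in Python. It is a deliberate simplification, not the ShadowObjectRegistry code: it assumes the outdoor radial sweep always finds a door that is within range, and the function names are hypothetical.

```python
def is_indoor_cell(cell_id):
    """Indoor cells have a low 16-bit word of 0x0100 or above."""
    return (cell_id & 0xFFFF) >= 0x0100


def door_is_queried(primary_cell, portal_reachable_cells, door_cell):
    """Simplified model of whether the door's shadow reaches the BSP test."""
    if not is_indoor_cell(primary_cell):
        # Outdoor primary cell: the radial sweep runs (assumed to cover the door).
        return True
    # Indoor primary cell: the sweep is gated off, so only portal traversal counts.
    return door_cell in portal_reachable_cells


INDOOR_CELL = 0xA9B40150
DOOR_OUTDOOR_CELL = 0xA9B40029

walkthrough = door_is_queried(INDOOR_CELL, set(), DOOR_OUTDOOR_CELL)
outdoor_block = door_is_queried(DOOR_OUTDOOR_CELL, set(), DOOR_OUTDOOR_CELL)
after_traversal = door_is_queried(INDOOR_CELL, {DOOR_OUTDOOR_CELL}, DOOR_OUTDOOR_CELL)
```

In this model the tick 13558 case is the only one where the door goes unqueried, which is why the portal traversal result is the thing to inspect.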

Next investigation move

Dump cell 0xA9B40150 from the dat and inspect its portal list. Two ways:

a) Dat-direct read in a test (preferred — no live launch). Pattern from DoorSetupGfxObjInspectionTests: dats.Get<EnvCell>(0xA9B40150u), then iterate envCell.CellPortals and print each portal's OtherCellId, PolygonId, Flags. If no portal with OtherCellId == 0xFFFF, exitOutside can never be true → bug is in the cell's portal-graph loading (or the cottage doesn't connect via 0xFFFF exit portals; it might use the building-shell path via BuildingPhysics.CheckBuildingTransit instead).

b) Live ACDREAM_DUMP_CELLS=0xA9B40150,0xA9B4013F,0xA9B40154. Less preferred: it costs another launch cycle, and the dat-direct read already provides everything needed.

The dat-direct read can be a new test method in DoorSetupGfxObjInspectionTests (it's the natural home for this class of dat-introspection checks).
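The decision that the portal dump feeds can be written down ahead of time. The Python sketch below uses a hypothetical CellPortal record that mirrors the three fields named above; it does not call the real dat reader, and the portal values in the usage lines are placeholders rather than data read from the dat.

```python
from dataclasses import dataclass

EXIT_OUTSIDE = 0xFFFF  # OtherCellId value this document associates with exit portals


@dataclass
class CellPortal:
    """Hypothetical stand-in for one entry of envCell.CellPortals."""
    other_cell_id: int
    polygon_id: int
    flags: int


def exit_portals(portals):
    """Portals that lead outside rather than to another indoor cell."""
    return [p for p in portals if p.other_cell_id == EXIT_OUTSIDE]


def next_suspect(portals):
    """Where to look next, following the reasoning in option (a)."""
    if exit_portals(portals):
        return "exit-portal detection or AddAllOutsideCells"
    return "portal-graph loading or building-shell path"


# Placeholder inputs only; real values must come from the dat read.
no_exit = [CellPortal(other_cell_id=0x013F, polygon_id=0, flags=0)]
with_exit = no_exit + [CellPortal(other_cell_id=EXIT_OUTSIDE, polygon_id=1, flags=0)]
suspects = (next_suspect(no_exit), next_suspect(with_exit))
```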

What NOT to do next

  1. Don't speculate on the fix. We have the right replay apparatus now; the next move is to read the dat and determine the cell's actual portal structure. Then we'll know whether the bug is in the dat data, the portal loading, the exit-portal detection in FindTransitCellsSphere, or AddAllOutsideCells's grid math.

  2. Don't modify the replay test to mask the walkable-polygon edge artifact. The artifact is harmless (it documents that, given a single isolated walkable poly, the engine treats its boundary as a wall — true regardless of the door bug). The interesting finding is "no [bsp-test] line"; the edge artifact just happens to fill the collision slot.

  3. Don't re-do the registration shape. Multi-part registration + dedup fix + Task 7 wiring are correct. Verified by the harness's ability to query the door registration (it just isn't reached at indoor primary cells).

Files touched this session

Committed: none yet — pending commit at session end.

Uncommitted:

  • tests/AcDream.Core.Tests/Fixtures/door-bug/live-capture.jsonl — 2 captured ResolveWithTransition records (tick 13558 walkthrough + tick 22760 outdoor block)
  • tests/AcDream.Core.Tests/Physics/DoorBugTrajectoryReplayTests.cs — apparatus: 2 LiveCompare tests + 1 Diagnostic dump
  • docs/research/2026-05-24-door-bug-apparatus-shipped-findings.md — this doc

Pickup prompt for the next session

A6.P4 door bug — apparatus replay shipped. DoorBugTrajectoryReplayTests
loads tick 13558 (walkthrough) and 22760 (block) from a captured fixture
and replays through the engine. Door 0x000F4246 (closed, state=0x00010008,
BSP world (132.57, 16.99, 95.36) radius 1.975) IS registered correctly
in the harness, BUT the engine never queries it from indoor primary cell
0xA9B40150 — no [bsp-test] line fires. Root cause located:
CellTransit.FindCellSet's portal traversal does not surface outdoor cell
0xA9B40029 from indoor cell 0xA9B40150.

  Read docs/research/2026-05-24-door-bug-apparatus-shipped-findings.md

  State both altitudes:
    Currently working toward: M1.5 — Indoor world feels right
    Current phase: A6.P4 door bug — cell-portal investigation.
                   Apparatus shipped; next step is to dump cell
                   0xA9B40150's portal list (from the dat) and
                   determine why FindTransitCellsSphere doesn't
                   add outdoor cell 0xA9B40029 to candidates.

  First move: add a test to DoorSetupGfxObjInspectionTests (or a new
  CellPortalDatInspectionTests file) that reads EnvCell 0xA9B40150 from
  the real dat and prints every portal's OtherCellId, PolygonId, Flags.
  Then read 0xA9B4013F (player's other indoor cell from JSONL) and
  0xA9B40029 (door's outdoor cell) for cross-comparison. The portal
  structure will reveal whether cottages use 0xFFFF exit portals
  (FindTransitCellsSphere path) or building-shell portals
  (CheckBuildingTransit path). If 0xFFFF exit portals exist but
  exitOutside isn't firing, the bug is in the sphere-vs-plane test
  at CellTransit.cs:99-112. If they don't exist, the building-shell
  path is misconfigured for indoor-primary calls.

  DO NOT:
    - Modify the replay test to mask the walkable-polygon-edge artifact
    - Re-do the registration shape (correct)
    - Speculate on the fix without dat evidence