acdream/docs/research/2026-05-20-phase-a4-shipped-cell-pingpong-finding.md
Erik 1534990102 docs(roadmap): A4 shipped + #90 cell-tracking ping-pong filed
Phase A4 (multi-cell BSP iteration) ships in three commits (e6369e2,
493c5e5, 691493e — with revert 3add110 + reapply during visual
verification that proved A4 is not the cause of the issue surfaced).
1139 + 8 baseline maintained. 10 new unit tests pass. Wires retail's
CTransition::check_other_cells (acclient_2013_pseudo_c.txt:272717-272798)
into Transition.FindEnvCollisions.

Visual verification at the Holtburg inn vestibule surfaced a separate,
pre-existing M2 blocker (filed as #90): CellId ping-pongs between
outdoor 0xA9B40022 and indoor 0xA9B40164 on every wall push-back
because the push-back exits the indoor CellBSP volume, causing the
resolver to flip back to outdoor and bypass walls on outdoor ticks.
Indoor BSP results (Collided/Adjusted/Slid all firing) prove walls ARE
detected when the player is indoor; the aggregate "walls walk through"
appearance comes from CellId classification instability, not from
collision detection.

Bug reproduces fully with A4 reverted (launch-revert2.log captured 18
cell-id flips between 0xA9B40022 ↔ 0xA9B40164, 11 inside=True
building-transit events, 61 indoor-bsp queries firing the full
result distribution). A4 is correct and tested but dormant in
practice until #90 is fixed.

Updates:
  - docs/research/2026-05-20-phase-a4-shipped-cell-pingpong-finding.md (new)
  - docs/plans/2026-04-11-roadmap.md (A4 shipped row added)
  - CLAUDE.md (Indoor walking Phase A4 paragraph + next-step pointer
    to #90 with retail oracle anchor at acclient_2013_pseudo_c.txt:308742-308783)
  - docs/ISSUES.md (#90 filed, HIGH severity, M2-blocker)
  - docs/research/2026-05-21-open-items-pickup-prompt.md (landscape
    table updated — A4 closed, #90 promoted to top blocker)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 20:10:29 +02:00


Phase A4 shipped + cell-tracking ping-pong finding — 2026-05-20

Status: A4 (multi-cell BSP iteration) shipped in 3 commits + 1 revert + 1 reapply + 1 doc. Build green; 1139 passing + 8 pre-existing failures (same as the pre-A4 baseline). A4 is dormant in practice because of a separate, pre-existing cell-tracking bug at the inn doorway that prevents the player from stably remaining in an indoor cell.

TL;DR

  • A4 ports retail's CTransition::check_other_cells (acclient_2013_pseudo_c.txt:272717-272798). After the primary cell's BSP returns OK, every other cell the foot-sphere overlaps is queried via BSPQuery.FindCollisions. Halt on first Collided/Adjusted/Slid; Slid clears the contact-plane fields. Matches retail exactly.
  • 10 new unit tests pass; full test suite holds at the prior 8-failure baseline. Three commits land the slices (FindCellSet overload → CheckOtherCells helper → FindEnvCollisions wire-up).
  • Visual verification surfaced a different bug: walking into the Holtburg inn ping-pongs the player's CellId between indoor 0xA9B40164 and outdoor 0xA9B40022 rapidly. Indoor BSP DOES detect walls (Collided / Adjusted / Slid all fire on push-back), but the push-back moves the sphere outside the indoor CellBSP's volume → ResolveCellId reclassifies the player as outdoor → next tick bypasses indoor BSP entirely → player advances freely → re-enters → repeats.
  • Because the player never STAYS in an indoor cell, A4's multi-cell pass is rarely (if ever) actually exercised in production. The user's reported "walls walk through everywhere in the inn" reproduces fully with A4 wire-up reverted, confirming A4 is not the cause.
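The halt semantics described in the TL;DR can be sketched as a small Python model. This is not the project's code: `check_other_cells`, `CollisionInfo`, its two fields, and the `query` callback are hypothetical stand-ins chosen for illustration, and the cell ids in the usage example are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Optional, Tuple


class TransitionState(Enum):
    OK = "OK"
    COLLIDED = "Collided"
    ADJUSTED = "Adjusted"
    SLID = "Slid"


@dataclass
class CollisionInfo:
    # Hypothetical stand-ins for the "contact-plane fields" the note mentions.
    contact_plane: Optional[Tuple[float, float, float, float]] = None
    contact_plane_cell_id: int = 0


def check_other_cells(
    primary_cell_id: int,
    overlapped_cell_ids: Iterable[int],
    query: Callable[[int], TransitionState],
    info: CollisionInfo,
) -> TransitionState:
    """Query every overlapped cell except the primary; halt on first non-OK."""
    for cell_id in overlapped_cell_ids:
        if cell_id == primary_cell_id:
            continue  # the primary cell was already queried by the caller
        state = query(cell_id)
        if state is TransitionState.OK:
            continue
        if state is TransitionState.SLID:
            # Slid clears the contact-plane fields before halting.
            info.contact_plane = None
            info.contact_plane_cell_id = 0
        return state  # Collided / Adjusted / Slid all halt the iteration
    return TransitionState.OK


# Usage with made-up cell ids: the third cell is never queried.
results = {0x0102: TransitionState.OK,
           0x0103: TransitionState.SLID,
           0x0104: TransitionState.COLLIDED}
queried = []


def fake_query(cell_id: int) -> TransitionState:
    queried.append(cell_id)
    return results[cell_id]


info = CollisionInfo(contact_plane=(0.0, 0.0, 1.0, 0.0), contact_plane_cell_id=0x0101)
final_state = check_other_cells(0x0101, [0x0101, 0x0102, 0x0103, 0x0104], fake_query, info)
```

In this sketch the iteration order is simply the order of `overlapped_cell_ids`; whether retail imposes a particular order is not stated in this note.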

What shipped

| SHA | Phase | Description |
|---|---|---|
| b100d54 | A4 spec | docs/superpowers/specs/2026-05-20-phase-a4-multi-cell-bsp-design.md |
| a8a0366 | A4 plan | docs/superpowers/plans/2026-05-20-phase-a4-multi-cell-bsp.md |
| e6369e2 | A4 slice 1 | CellTransit.FindCellSet overload + 3 unit tests |
| 493c5e5 | A4 slice 2 | Transition.CheckOtherCells + ApplyOtherCellResult + 6 unit tests |
| 967d065 | A4 slice 3 | Wire CheckOtherCells into FindEnvCollisions + 1 integration test |
| 3add110 | A4 revert | Temporary revert of slice 3 to confirm A4 wasn't the cause |
| 691493e | A4 reapply | Restored slice 3 after revert test proved A4 not the cause |

Total: ~380 LOC added (3 new test files + helper methods); 1139 + 8 baseline maintained throughout.

Visual verification — what we tested

Launched twice with the light-probe set (ACDREAM_PROBE_INDOOR_BSP, ACDREAM_PROBE_CELL, ACDREAM_PROBE_CELL_CACHE).

Launch 1 — A4 wire-up active (launch-a4.log, 782 lines)

User walked from spawn toward the Holtburg inn. Log captured:

  • Player CellId stayed at outdoor 0xA9B4002A the entire session.
  • 0 indoor-bsp probes fired.
  • 0 other-cells probes fired (A4 wire-up only runs for indoor cells).
  • User reported "all interior walls in the inn can be walked through; going from indoor to outdoor broken."

However, the A4 wire-up only fires when cellLow >= 0x0100 (an indoor cell), and the player never reached that state in this session. A4 therefore cannot have caused the behavior reported in Launch 1.
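As a quick illustration of that gate, the sketch below classifies the cell ids seen in the logs. It assumes `cellLow` is the low 16 bits of the 32-bit CellId; the helper names are hypothetical and not taken from the codebase.

```python
INDOOR_THRESHOLD = 0x0100  # threshold quoted in this note


def cell_low(cell_id: int) -> int:
    """Assumption: cellLow is the low 16 bits of the full CellId."""
    return cell_id & 0xFFFF


def is_indoor(cell_id: int) -> bool:
    return cell_low(cell_id) >= INDOOR_THRESHOLD


# Cell ids taken from the probe logs discussed in this note.
for cid in (0xA9B4002A, 0xA9B40022, 0xA9B40164, 0xA9B40162):
    print(f"0x{cid:08X} low=0x{cell_low(cid):04X} indoor={is_indoor(cid)}")
```

Under that assumption, 0xA9B4002A (the only cell seen in Launch 1) is outdoor, so the indoor branch and the A4 pass behind it would not run.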

Launch 2 — A4 wire-up reverted (launch-revert2.log, 18490 lines)

User walked into and out of the inn multiple times. Log captured:

  • 18 cell-transit events: outdoor cells 0xA9B40021 / 0xA9B40022 / 0xA9B4002A ping-ponging with indoor cell 0xA9B40164 (vestibule).
  • 11 [check-bldg] inside=True events — player crossed the building threshold.
  • 61 indoor-bsp queries against 0xA9B40164 (58) and 0xA9B40162 (3).
  • Indoor BSP results: 40 OK + 7 Adjusted + 7 Collided + 7 Slid.
  • User confirmed: "walls still walked through (same bug)" with A4 reverted.

The bug reproduces with A4 reverted, proving A4 is not responsible.

The actual bug — cell-tracking ping-pong at doorway threshold

The repeating cycle observed in the revert log:

  1. Player at outdoor cell 0xA9B40022, walking toward inn door.
  2. CheckBuildingTransit returns inside=True for portal to 0xA9B40164.
  3. ResolveCellId promotes CellId to 0xA9B40164.
  4. Next tick: indoor branch of FindEnvCollisions fires. BSP query against 0xA9B40164's walls returns Adjusted/Collided/Slid. The sphere is pushed back (Adjusted/Slid) or halted (Collided).
  5. The push-back moves the sphere's world position BACK toward the outdoor side, beyond the indoor CellBSP's volume.
  6. ResolveCellId re-evaluates: indoor CellBSP no longer contains the sphere center → falls through to outdoor resolution → returns 0xA9B40022.
  7. CellId flips back to outdoor. Next tick: indoor BSP not queried, player keeps advancing.
  8. Player re-crosses the building threshold → goto 2.

Net effect: the player visually moves through the doorway zone, walls intermittently push them back, but most ticks classify them as OUTDOOR and those ticks bypass wall collision entirely. The aggregate behavior LOOKS LIKE "walls walk through" even though wall hits are firing.
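The cycle can be reproduced in a deliberately simplified one-dimensional toy model. Everything here is an assumption made for illustration: the indoor volume is an interval on the x axis, the wall position and step size are invented, and only the sphere radius (~0.48 m) comes from this note. It is not a port of ResolveCellId or FindEnvCollisions.

```python
RADIUS = 0.48                    # sphere radius quoted in this note
CELL_MIN, CELL_MAX = 0.0, 3.0    # hypothetical indoor containment interval
WALL_X = 0.4                     # hypothetical wall face just inside the door
STEP = 0.3                       # hypothetical per-tick advance


def cell_contains(x: float) -> bool:
    """Toy stand-in for 'the indoor CellBSP volume contains the sphere center'."""
    return CELL_MIN <= x <= CELL_MAX


def simulate(ticks: int, start_x: float = -0.5):
    x = start_x
    indoor = False
    flips = 0
    unchecked_penetrations = 0
    for _ in range(ticks):
        x += STEP
        if indoor:
            # Indoor tick: the wall is tested and the sphere is pushed back.
            limit = WALL_X - RADIUS
            if x > limit:
                x = limit
        elif x + RADIUS > WALL_X:
            # Outdoor tick: the wall is never tested, so the overlap goes unnoticed.
            unchecked_penetrations += 1
        now_indoor = cell_contains(x)  # naive re-classification every tick
        if now_indoor != indoor:
            flips += 1
        indoor = now_indoor
    return flips, unchecked_penetrations


flips, unchecked = simulate(ticks=10)
print(f"cell-id flips: {flips}, unchecked wall overlaps: {unchecked}")
```

In the toy, the push-back lands at x = WALL_X - RADIUS, which is outside the containment interval, so every indoor tick is followed by an outdoor tick on which the wall is skipped.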

Why A4 doesn't help here

A4 multi-cell iteration only runs when the primary cell BSP returns OK. In the ping-pong cycle, the primary cell BSP returns NON-OK (Collided/Adjusted/Slid) on most indoor frames — so A4 short-circuits early at the existing if (cellState != TransitionState.OK) return cellState; path. A4 would help if the player were STABLY indoor (cellLow >= 0x100) AND the primary cell's BSP had sparse geometry that missed walls in adjacent cells. The ping-pong prevents both conditions.
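The gating described above can be sketched as follows. This is a hedged model of the control flow only: `find_env_collisions`, its callbacks, and the returned flag are hypothetical, the outdoor branch is reduced to a placeholder, and `cellLow` is assumed to be the low 16 bits of the CellId.

```python
from enum import Enum
from typing import Callable, Tuple


class TransitionState(Enum):
    OK = "OK"
    COLLIDED = "Collided"
    ADJUSTED = "Adjusted"
    SLID = "Slid"


def find_env_collisions(
    cell_id: int,
    query_primary: Callable[[], TransitionState],
    other_cells_pass: Callable[[], TransitionState],
) -> Tuple[TransitionState, bool]:
    """Return (state, ran_other_cells_pass) for one tick of the toy model."""
    if (cell_id & 0xFFFF) < 0x0100:
        # Outdoor cell: the indoor branch is skipped. Real outdoor handling
        # is out of scope here, so the toy simply reports OK.
        return TransitionState.OK, False
    cell_state = query_primary()
    if cell_state is not TransitionState.OK:
        # Existing early return: a non-OK primary result ends the tick.
        return cell_state, False
    # Only a stably indoor player with an OK primary result reaches A4.
    return other_cells_pass(), True


outdoor = find_env_collisions(0xA9B40022,
                              lambda: TransitionState.COLLIDED,
                              lambda: TransitionState.SLID)
blocked = find_env_collisions(0xA9B40164,
                              lambda: TransitionState.COLLIDED,
                              lambda: TransitionState.SLID)
reached = find_env_collisions(0xA9B40164,
                              lambda: TransitionState.OK,
                              lambda: TransitionState.SLID)
```

Only the third call reaches the multi-cell pass, which is the situation the ping-pong keeps from arising in practice.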

Why this is a Bug A cousin

The 2026-05-20 Bug A investigation (docs/research/2026-05-20-indoor-walking-bug-a-handoff.md) documented a similar doorway-edge problem: indoor cell floor polys don't extend past the doorway threshold, causing free-fall when stepping out.

The current ping-pong is the same family of bug, different symptom: the indoor CellBSP volume doesn't extend past the doorway threshold either, so the push-back from a wall collision exits the cell's containment volume, and the cell-id resolver bounces the player back to outdoor.

Hypothesis: the inn's vestibule cell 0xA9B40164 has a CellBSP that's tightly bounded to the room's interior volume. The doorway threshold is right at the boundary. Walking against an interior wall pushes the foot-sphere back toward the boundary → exits CellBSP → outdoor classification.

Next steps (not blocking A4 ship)

Two paths to investigate the ping-pong, both out of A4's scope:

  1. CellBSP-volume retention. Match retail's behaviour: once a player enters an indoor cell, don't flip back to outdoor until they cross the EXIT portal plane, not merely because a push-back moved them outside the CellBSP volume. Likely a ResolveCellId modification that prefers the previous indoor classification when the sphere is "close enough" to the indoor CellBSP volume.

  2. CellBSP-volume expansion. Pad the indoor cell's CellBSP volume by the sphere radius (~0.48m) on all sides. The push-back stays within the padded volume. Risk: may incorrectly classify nearby outdoor positions as indoor.
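The two candidate directions can be compared in a one-dimensional toy model. This is a sketch under stated assumptions, not a proposed implementation: the containment interval, the function names, and the use of the sphere radius as the "close enough" margin are all hypothetical, and neither function is derived from the retail pseudo-code.

```python
RADIUS = 0.48                    # sphere radius quoted in this note
CELL_MIN, CELL_MAX = 0.0, 3.0    # hypothetical indoor containment interval


def contains(x: float, pad: float = 0.0) -> bool:
    return CELL_MIN - pad <= x <= CELL_MAX + pad


def resolve_plain(x: float, was_indoor: bool) -> bool:
    """Current behaviour in the toy: classify by containment alone."""
    return contains(x)


def resolve_sticky(x: float, was_indoor: bool, margin: float = RADIUS) -> bool:
    """Option 1 (retention): keep an indoor classification while the sphere
    stays within a margin of the volume; entering still needs real containment."""
    if was_indoor:
        return contains(x, pad=margin)
    return contains(x)


def resolve_padded(x: float, was_indoor: bool) -> bool:
    """Option 2 (expansion): pad the volume by the sphere radius for everyone."""
    return contains(x, pad=RADIUS)


pushed_back_x = -0.08   # hypothetical position just outside after a push-back
approaching_x = -0.30   # hypothetical player still outside, walking toward the door

for name, resolve in (("plain", resolve_plain),
                      ("sticky", resolve_sticky),
                      ("padded", resolve_padded)):
    print(name,
          "after push-back:", resolve(pushed_back_x, was_indoor=True),
          "| approaching from outside:", resolve(approaching_x, was_indoor=False))
```

In this toy, both options keep the pushed-back sphere indoor, but only the padded variant also classifies the approaching outdoor position as indoor, which is the risk noted for option 2.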

The retail oracle for cell-id stickiness is at acclient_2013_pseudo_c.txt:308742-308783 (CObjCell::find_cell_list, Position-variant) and the cell-array hysteresis logic around it. It has not yet been ported in detail.

Why ship A4 anyway

  • Correctness. A4 matches retail's check_other_cells exactly. The 10 new tests (9 unit tests plus 1 integration test, per the commit table) pin the halt semantics and verify the wire-up. Pure port, no design improvisation.
  • No regressions. 1139-passing + 8-pre-existing-failing baseline holds. All A1 / A1.5 / A1.6 / A1.7 / Bug B fixes remain green.
  • Foundation for A3. A3 (synthesis removal) is unblocked by A4 being in place — it can rely on multi-cell BSP coverage for floor synthesis once the ping-pong is fixed and players stay indoor long enough.
  • Reverting it would lose work. A4 is correct and tested. The dormant state is caused by an unrelated bug. Reverting would just make the unrelated bug harder to investigate (no multi-cell foundation to build on).

What this is NOT

This is NOT a fix for the user's "walls walk through" report. That bug is pre-existing, caused by cell-tracking instability at doorway thresholds.

This is NOT a regression introduced by A4. The bug reproduces fully with A4's wire-up reverted (verified by launch-revert2.log).

This is NOT the same as Bug A (synthesis removal). Bug A's symptom was free-fall on doorway exit; this is wall walk-through due to CellId classification flipping back to outdoor on each push-back.

Code anchors

Probe captures

  • launch-a4.log (782 lines) — A4 active, player stayed outdoor (didn't reach inn). Confirms A4's indoor branch never fired in that session.
  • launch-revert.log (1.2M lines) — A4 reverted, player parked at outdoor cell with 400K+ [check-bldg] probes all returning inside=False. Player never moved.
  • launch-revert2.log (18490 lines) — A4 reverted, player walked into inn multiple times. Captured the ping-pong cycle. Indoor BSP results breakdown: 40 OK + 7 Adjusted + 7 Collided + 7 Slid. 11 inside=True building-transit events.

How to start a fresh session

Open a new Claude Code session, then:

Pick up the cell-tracking ping-pong investigation that blocked Phase A4
from being exercised in practice.

1. Read docs/research/2026-05-20-phase-a4-shipped-cell-pingpong-finding.md
   FIRST. It documents A4 ship (correct, dormant) + the ping-pong bug it
   surfaced.

2. A4 is shipped (3 commits at e6369e2, 493c5e5, 691493e). Don't touch it.
   1139 + 8 baseline holds.

3. The real M2 blocker: at the Holtburg inn doorway, CellId ping-pongs
   between 0xA9B40022 (outdoor) and 0xA9B40164 (vestibule) every few ticks
   because indoor BSP push-back exits the indoor CellBSP volume → outdoor
   reclassification → walls bypassed on outdoor ticks.

4. Investigate cell-id hysteresis. Retail oracle:
   acclient_2013_pseudo_c.txt:308742-308783 (CObjCell::find_cell_list
   Position-variant). Look for the cell-array stickiness logic that retail
   uses to prevent ping-pong.

5. CLAUDE.md rules: no workarounds, retail-faithful, probe-first.

State M2 as the milestone, "cell-tracking ping-pong fix" as the phase.