acdream/docs/research/2026-06-11-tower-stairs-fundamental-handoff.md
Erik d82f070b88 docs: tower-stairs fundamental handoff - the broken-state log kills all mesh-absence theories
The user's final broken-state session (user-session-capture2.log,
standing in front of broken stairs) reports meshMissing=0 and
entSeen==entDrawn: the staircase is DRAWN WRONG, not missing. The
handoff records the 8 verified fixes shipped today (none was this bug),
the ranked hypothesis space (H-A hydration-time MeshRef corruption via
SetupMesh.Flatten identity fallback - predicts the barrel IS the
collapsed staircase; H-B Tier-1 partial-batch cache; H-C draw compose),
the decisive one-launch probe design, the polyClipFinish/cdstW port
spec for the climb strobes + top flap (read done, constant pinned), the
apparatus inventory, and the paste-ready pickup prompt.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:46:35 +02:00


The AAB3 tower "broken stairs + water barrel" — fundamental handoff (2026-06-11 late)

Branch: claude/thirsty-goldberg-51bb9b. Nothing on main. Suites green: App 242 + 1 skip / Core 1422 + 2 skips / UI 420 / Net 294. The user logged out INSIDE the tower in the BROKEN state — the next login restores there (claim 0xAAB30107 validates cleanly now), so the broken state is one login away for the next session.

0. The one fact that reframes everything (READ THIS FIRST)

user-session-capture2.log — the user standing IN the tower, broken stairs + barrel visible on screen — final dispatcher diagnostics:

[WB-DIAG] entSeen=11808510 entDrawn=11808510 meshMissing=0 ...

meshMissing=0. entSeen == entDrawn. Every referenced mesh is loaded and every walked entity is drawn — while the user SEES broken stairs. The staircase is NOT missing from the pipeline. It is being drawn wrong (wrong transforms, wrong batches, or a stale partial classification). Every "the mesh didn't load" theory is now DEAD for the persistent symptom. (226 ids — including stair part 0x01000E2A — DID go transiently missing during the login churn and were self-healed by the new point-of-use re-arm; the question is what got built or cached wrong during that window and STAYS wrong.)
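The reading of that diagnostics line can be mechanized so later captures are triaged the same way. This is a minimal sketch, assuming the line keeps the `key=value` shape shown above; `parse_wb_diag` and `symptom_class` are hypothetical helper names, not part of acdream.

```python
import re

def parse_wb_diag(line: str) -> dict[str, int]:
    """Extract the integer key=value counters from a [WB-DIAG] line."""
    if "[WB-DIAG]" not in line:
        raise ValueError("not a WB-DIAG line")
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}

def symptom_class(counters: dict[str, int]) -> str:
    """Classify a session in which the user reports a visible fault."""
    if counters["meshMissing"] > 0:
        return "mesh-absent"      # some referenced mesh never loaded
    if counters["entSeen"] != counters["entDrawn"]:
        return "entity-skipped"   # walked but not drawn
    return "drawn-wrong"          # pipeline complete, so the fault is in WHAT is drawn

line = "[WB-DIAG] entSeen=11808510 entDrawn=11808510 meshMissing=0 ..."
print(symptom_class(parse_wb_diag(line)))  # drawn-wrong
```

The third branch is only meaningful together with the user's visual report: the counters alone cannot tell a clean session from a drawn-wrong one.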

1. Symptom + reproduction (user-verified, multiple sessions)

  • The AAB3 tower (building[1], model 0x01001117, cells 0x0107..0x010A; user pinned it by logging out inside — [snap] claim=0xAAB30107).
  • Its spiral staircase = ONE cell static: Setup 0x020003F2, 43 parts (5 platforms 0x01000E2A + 38 steps 0x01000E2B/2C/2D/2F/31/32), placement frames spiral z 0.35→15.15 (dat-proven, Issue119TowerDumpTests). Four 0x020005D8 statics (part 0x01001774, barrel-shaped — REAL water-barrel models per the user's screenshot) sit at wall positions.
  • Broken state (reproduces on teleport-heavy / run-back logins, ~3 of 4 attempts): stairs render PARTIALLY or not at all (collision intact — the invisible stairs are walkable to the top), and "a water barrel" shows near the floor. Clean state (reproduced twice, screenshots in worktree: tower-rearm-verify.png, tower-selfheal-verify.png): full spiral staircase, no complaints. Same build can produce both — the divergence is SESSION-SHAPED (what happened during login/streaming), and once broken it stays broken for the session.
  • User axiom: the barrel is NOT in the tower in retail. (It IS in the dat's 0x0107.StaticObjects. Unresolved tension — but note H-A below predicts the "barrel" may not be the dat barrel at all.)

2. What was FIXED today (all verified, all committed — do not re-litigate)

| Fix | Commit | Verified by |
|---|---|---|
| #118 house-exit vanish (seal vs dynamics order) | 5a80a2e | user gate "Yes solved" |
| #120 flood ping-pong (CellView containment) | dede7e4 | 0 [pv-ERROR] since (was 24/session); #122 cured |
| #121 portals invisible (dynamics-owner particle pass) | c446473 | user gate "Yes" |
| #125 WB_DIAG GL-error cascade (query ring begun-flags) | fcade06 | 0 [wb-error] under diag |
| Render lift leaking into the visibility graph | f35cb8b | captured-frame replay both arms (CapturedTopOfStairs_*) |
| #126 restore re-derived Z (now commits server Z; retail SetPositionInternal 0x00515bd0 shape) | 120aeff | clean VALIDATED snaps since |
| #128 first-ever-only Prepare gate | re-arm commit | [mesh-miss] self-heal observed live (226 ids re-requested) |
| Point-of-use re-request (mesh absence now impossible) | last commit | final meshMissing=0 in the broken session, which is exactly what KILLED the absence theory |

Each fix was real; none was THE tower bug. The user's "running in circles" critique stands: the persistent symptom survives all of them.

3. The live hypothesis space (ranked — design the probe, don't guess)

H-A — hydration-time MeshRef corruption (top suspect). The staircase entity's 43 MeshRefs are built ONCE at landblock hydration (GameWindow.BuildInteriorEntitiesForStreaming → SetupMesh.Flatten(setup), GameWindow ~5611-5627). SetupMesh.Flatten returns per-part frames from setup.PlacementFrames (Resting → Default → first) and falls back to an identity transform for any part the placement-frame lookup does not cover (SetupMesh.cs:57-61: i < defaultAnim.Frames.Count, else identity). If, during the login burst, the Setup object or its frames read DEGRADED (a dat-race or partially-hydrated object; see feedback_phase_a1_hotfix_saga: DatCollection thread-safety + "objects can cache half-parsed"), Flatten yields identity (or partial) transforms. All 43 parts then draw stacked at the entity origin: a barrel-shaped pile ("the water barrel"!!) with a few parts elsewhere ("broken stairs"). MeshRefs are never rebuilt, so the entity stays broken all session. PREDICTS: the "barrel" the user sees may be the collapsed staircase, not the dat barrel; meshMissing=0; entity drawn. Probe: dump the live entity's MeshRefs (count + per-part transform translations) in the broken state. If the translations are ~zero/identity, H-A is confirmed and the fix is hydration-side (retry/validate Flatten inputs, or rebuild MeshRefs when degraded).
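The failure mode H-A describes can be modelled in a few lines. This is a toy model, not the real SetupMesh.Flatten: it assumes only the fallback rule quoted above (part i uses frame i when it exists, otherwise identity). The frame x/y values are invented; only the z range 0.35 to 15.15 and the part count 43 come from the dat facts in §1.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
IDENTITY: Vec3 = (0.0, 0.0, 0.0)  # translation of an identity transform

@dataclass
class MeshRef:
    part_index: int
    translation: Vec3

def flatten(part_count: int, frames: list[Vec3]) -> list[MeshRef]:
    """Toy fallback rule: part i takes frames[i] if present, else identity."""
    return [
        MeshRef(i, frames[i] if i < len(frames) else IDENTITY)
        for i in range(part_count)
    ]

def collapsed_count(refs: list[MeshRef]) -> int:
    """Number of parts that would draw stacked at the entity origin."""
    return sum(1 for r in refs if r.translation == IDENTITY)

# Hypothetical spiral frames: z rises linearly from 0.35 to 15.15 over 43 parts.
good_frames = [(0.0, 0.0, 0.35 + i * (15.15 - 0.35) / 42) for i in range(43)]

healthy = flatten(43, good_frames)        # full frame list read
degraded = flatten(43, good_frames[:5])   # short read during the login burst

print(collapsed_count(healthy), collapsed_count(degraded))  # 0 38
```

A short read of five frames leaves 38 parts at the origin and five in place, which is the shape of "a pile near the floor plus a few stray steps". Because nothing rebuilds the list, the degraded result persists.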

H-B — Tier-1 classification cache served a partial/stale batch set. The entity classified during the transient-miss window; some path caches an incomplete batch set that the cache hit then serves forever (static entity → fast path → never re-classified). The known #53 vetoes (null renderData per MeshRef; null Setup part — both patched) read correct, but a batch-level partial (renderData present, batches not yet complete during atlas staging) may not be vetoed. PREDICTS: drawn entity, wrong/missing batches; fixable by invalidating the cache entry when any of the entity's ids finishes loading AFTER the classification. Probe: EntityClassificationCache dump for the staircase entity id in the broken state (batch count vs the clean session's).
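The invalidation rule proposed for H-B can be sketched as a cache that remembers which mesh ids each entry depends on. This is an illustration under assumptions: `ClassificationCache` here is a hypothetical stand-in, not the real EntityClassificationCache, and the entity id and batch labels are invented.

```python
class ClassificationCache:
    """Toy Tier-1 cache: entries are dropped when a mesh they use finishes loading later."""

    def __init__(self) -> None:
        self._entries: dict[int, tuple[str, ...]] = {}  # entity id -> cached batches
        self._by_mesh: dict[int, set[int]] = {}         # mesh id -> dependent entity ids

    def put(self, entity_id: int, mesh_ids: list[int], batches: list[str]) -> None:
        self._entries[entity_id] = tuple(batches)
        for mesh_id in mesh_ids:
            self._by_mesh.setdefault(mesh_id, set()).add(entity_id)

    def get(self, entity_id: int):
        return self._entries.get(entity_id)

    def on_mesh_loaded(self, mesh_id: int) -> list[int]:
        """Invalidate every entry classified before this mesh completed loading."""
        dropped = []
        for entity_id in self._by_mesh.pop(mesh_id, set()):
            if self._entries.pop(entity_id, None) is not None:
                dropped.append(entity_id)
        return dropped

cache = ClassificationCache()
STAIRS = 1  # hypothetical entity id
# Classified during the transient-miss window: only the step batch was ready.
cache.put(STAIRS, [0x01000E2A, 0x01000E2B], ["steps-0x01000E2B"])
print(cache.on_mesh_loaded(0x01000E2A))  # [1]
print(cache.get(STAIRS))                 # None
```

After the late load the next draw misses the cache and re-classifies with the complete mesh set. The rule is deliberately conservative: it may drop an entry that was already complete, which costs one re-classification rather than a session-long stale batch set.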

H-C — draw-side transform composition (ComposePartWorldMatrix / meshRef.PartTransform path) — least likely (the same code draws the clean sessions), but the per-part dump from H-A's probe exonerates or implicates it for free.

4. The decisive next step (ONE probe, one launch)

Add a one-shot diagnostic (env-gated, e.g. ACDREAM_DUMP_ENTITY=0x020003F2): at first draw of any entity whose SourceGfxObjOrSetupId matches, print:

  • MeshRefs.Count (expect 43),
  • per MeshRef: GfxObjId + PartTransform.Translation (expect the dat spiral: platforms at (0,3,1.55)…(3,3,11.95), steps ascending — compare Issue119TowerDumpTests.DumpTowerStairSetups output),
  • whether the Tier-1 cache has an entry + its batch count.
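The one-shot, env-gated behaviour of that probe could look roughly like this. It is a sketch of the gating logic only: the `[dump-entity]` tag, the class name, and the `on_draw` signature are assumptions made for illustration, and the real hook would live in the renderer's draw path.

```python
import os

def parse_dump_target(env=os.environ):
    """Read ACDREAM_DUMP_ENTITY (hex, e.g. 0x020003F2); None disables the probe."""
    raw = env.get("ACDREAM_DUMP_ENTITY")
    return int(raw, 16) if raw else None

class OneShotEntityDump:
    def __init__(self, target_id, sink=print):
        self.target_id = target_id
        self._sink = sink
        self._done = set()  # entity ids already dumped

    def on_draw(self, entity_id, source_id, mesh_refs, tier1_batches):
        """Dump once per matching entity; mesh_refs is [(gfx_id, (x, y, z)), ...]."""
        if self.target_id is None or source_id != self.target_id:
            return False
        if entity_id in self._done:
            return False
        self._done.add(entity_id)
        self._sink(f"[dump-entity] ent={entity_id:#x} src={source_id:#010x} "
                   f"meshRefs={len(mesh_refs)} tier1Batches={tier1_batches}")
        for gfx_id, (x, y, z) in mesh_refs:
            self._sink(f"[dump-entity]   gfx={gfx_id:#010x} t=({x:.2f},{y:.2f},{z:.2f})")
        return True

lines = []
probe = OneShotEntityDump(parse_dump_target({"ACDREAM_DUMP_ENTITY": "0x020003F2"}), lines.append)
probe.on_draw(7, 0x020003F2, [(0x01000E2A, (0.0, 3.0, 1.55))], None)
probe.on_draw(7, 0x020003F2, [(0x01000E2A, (0.0, 3.0, 1.55))], None)  # ignored: already dumped
print("\n".join(lines))
```

Keying the one-shot set on the entity id rather than the Setup id means a second staircase instance built from the same Setup would still be dumped once.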

Launch (the user's save restores INSIDE the tower; the broken state is probable on first login), read the dump:

  • identity/collapsed translations → H-A: fix at hydration (validate Flatten's inputs; rebuild MeshRefs on degraded reads; likely also explains "barrel" as the collapsed pile).
  • correct translations + small/odd batch count → H-B: cache invalidation on late load completion.
  • correct everything → H-C: instrument ComposePartWorldMatrix for this id.
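The three-way reading of the dump can be written down as a small decision function so the verdict does not depend on eyeballing. This is a sketch: the name `decide`, the tolerance, and the "two or more parts at the origin" threshold are assumptions, and the sample translations are invented rather than taken from the dat.

```python
def decide(translations, batch_count, clean_batch_count, expected_parts=43, eps=1e-3):
    """Map one probe dump to the hypothesis it supports."""
    at_origin = sum(1 for t in translations if all(abs(c) < eps for c in t))
    # Wrong part count, or parts stacked on the entity origin: degraded Flatten inputs.
    if len(translations) != expected_parts or at_origin >= 2:
        return "H-A"
    # Transforms look right but the cached batch set differs from the clean session's.
    if batch_count != clean_batch_count:
        return "H-B"
    # Inputs and cache both look right: the fault must be in draw-side composition.
    return "H-C"

# Hypothetical dumps for illustration only.
spiral = [(0.0, 3.0, 0.35 + 0.35 * i) for i in range(43)]
pile = [(0.0, 0.0, 0.0)] * 43

print(decide(pile, 10, 10))    # H-A
print(decide(spiral, 3, 10))   # H-B
print(decide(spiral, 10, 10))  # H-C
```

The order of the checks mirrors the ranking in §3: a transform fault is tested first because it would also distort any batch comparison made afterwards.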

5. Also pinned today, port pending (the SECOND remaining tower artifact)

The climb strobes + top-of-tower roof/floor flap while TURNING (user: "the roof and floor up top still flaps when turning") = the knife-edge in-plane portal clip family, mechanically pinned by capture + replay:

  • The eye riding/crossing HORIZONTAL portal planes (spiral climbs, the deck) → side test allows (in-plane window) but OUR clip collapses the portal to EMPTY → the cell behind drops ([viewer-diff] removed=[0xAAB30107,0x010A] at the top; mid-climb removed=[0x0108/0x0109]).
  • Retail has no hole: ACRender::polyClipFinish (0x006b6d00, pc:702749) — read today: a homogeneous Sutherland-Hodgman whose FIRST pass clips the polygon at W=0 (the eye plane) with full intersection emission (pc:702889-702978: scans vertex W, runs the W-clip pass, REQUIRES ≥3 output verts, THEN clips against the portal-view edges in homogeneous space, < 3 → return per edge). cdstW = 0.000199999995, PINNED at 0x007247d5 (where it's consumed still to be mapped — grep reads of the global 0x008fb788; landPolysDraw at 0x006b7040 uses 0.0002 inline for plane side tests).
  • THE PORT: match PortalProjection.ProjectToClip's near-eye behavior (currently EyePlaneW=1e-4 + empty-collapse) to polyClipFinish's W=0-clip semantics; then DELETE the EyeInsidePortalOpening rescue (the documented cdstW-gap compensation, T2 ledger) and re-run the full harness suite (CornerFloodReplay + Issue120 + TowerAscent + the captured-frame pins).
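The W-clip pass described for polyClipFinish is a standard homogeneous Sutherland-Hodgman step. The sketch below is a generic textbook version, not a transcription of the retail routine or of PortalProjection.ProjectToClip; the function names and the sample quad are assumptions. It clips against w >= 0 and emits the intersection vertex on every crossing edge.

```python
def clip_w(poly, w_min=0.0):
    """Clip a polygon of (x, y, z, w) vertices to the half-space w >= w_min."""
    out = []
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        a_in, b_in = a[3] >= w_min, b[3] >= w_min
        if a_in:
            out.append(a)
        if a_in != b_in:
            # Edge crosses the plane: emit the intersection point.
            t = (w_min - a[3]) / (b[3] - a[3])
            out.append(tuple(a[k] + t * (b[k] - a[k]) for k in range(4)))
    return out

def w_pass(poly):
    """First pass: W-clip, then require at least 3 vertices before edge clipping."""
    clipped = clip_w(poly)
    return clipped if len(clipped) >= 3 else None

# A portal quad the eye plane cuts in half: two vertices in front, two behind.
quad = [(-1.0, -1.0, 0.0, 1.0), (1.0, -1.0, 0.0, 1.0),
        (1.0, 1.0, 0.0, -1.0), (-1.0, 1.0, 0.0, -1.0)]
front = w_pass(quad)
print(len(front), min(v[3] for v in front))  # 4 0.0
```

The straddling quad survives as a four-vertex polygon instead of collapsing to empty, which is the property the port needs for in-plane portals. The later clip against the portal-view edges has to stay in homogeneous space, because the emitted vertices sit at w = 0 and cannot be divided through.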

6. Apparatus inventory (new this session — use, don't rebuild)

| Tool | Where | Purpose |
|---|---|---|
| [viewer] probe | ACDREAM_PROBE_VIEWER=1 | print-on-change root/flood/outPolys/pCell + eye@mm + fwd |
| [viewer-diff] | same flag | names cells entering/leaving the flood per change |
| [mesh-miss] | ACDREAM_WB_DIAG=1 | once-per-id missing-mesh naming + point-of-use re-request |
| HouseExitWalkReplayTests | App.Tests | #118 pins (cone + seal-depth + straddle) |
| Issue120ReciprocalPingPongTests | App.Tests | #120 pins + LoadAllInteriorCells helper |
| TowerAscentReplayTests | App.Tests | captured-frame replay + lift canary + gate-by-gate diagnostic |
| Issue127FloodFlipReplayTests | App.Tests | outdoor flood replay (stable; flood math exonerated for the 4 cm pair) |
| Issue119TowerDumpTests / Issue119UpNullGfxObjDumpTests | Core.Tests | tower dat truth / no-draw GfxObj class |
| Session logs (worktree root) | user-session-capture2.log (THE broken-state evidence), tower-rearm-gate.log, flap-diff-capture.log, screenshots | capture record |

7. Open issues ledger (post-session state)

  • #119/#128 tower stairs: OPEN — the drawn-but-wrong layer (§3-4). THE priority.
  • knife-edge clip port (§5): OPEN — second priority; kills climb strobes, top flap, and retires the rescue + #120's window class.
  • #124 far-building back walls (interior-root look-in floods missing — lead documented in ISSUES), #127 distant-building churn (narrowed), #108-residual cellar grass band, #112 hill-cottage transparency, #113 phantom stairs: all OPEN with leads in ISSUES.md.
  • The user axiom stands: barrel not in retail — re-evaluate after H-A resolves (the "barrel" may be the collapsed staircase).

8. Paste-ready pickup prompt

Pick up acdream as a SENIOR 3D ENGINE DEVELOPER on the AAB3 tower
"broken stairs + water barrel" bug. Worktree branch
claude/thirsty-goldberg-51bb9b. Nothing goes to main.

READ FIRST (in order):
1. docs/research/2026-06-11-tower-stairs-fundamental-handoff.md — THE
   handoff. Its §0 fact reframes the bug: in the user's broken-state
   session (user-session-capture2.log) the dispatcher reports
   meshMissing=0 / entSeen==entDrawn WHILE broken stairs are on screen —
   the staircase is DRAWN WRONG, not missing. All mesh-absence theories
   are dead (8 real fixes shipped today; none was this).
2. Memory digests: project_render_pipeline_digest +
   project_physics_collision_digest (DO-NOT-RETRY tables apply).

DO NEXT — the decisive probe (handoff §4): add ACDREAM_DUMP_ENTITY-style
one-shot diag printing the staircase entity's (SourceGfxObjOrSetupId
0x020003F2) MeshRefs count + per-part transform translations + Tier-1
cache state at first draw. The user's save restores INSIDE the tower;
the broken state reproduces on teleport-heavy logins (~3 of 4). One
launch + the dump decides H-A (hydration-time MeshRef corruption via
SetupMesh.Flatten identity fallback — top suspect; predicts the "barrel"
IS the collapsed staircase) vs H-B (Tier-1 cache partial batch set) vs
H-C (draw-side compose). Fix the confirmed branch ROOT-CAUSE-FIRST (no
band-aids; the user has explicitly demanded the fundamental fix).

THEN — the knife-edge clip port (handoff §5): match
PortalProjection.ProjectToClip's near-eye clip to retail
ACRender::polyClipFinish (0x006b6d00, pc:702749; cdstW=0.0002 pinned at
0x007247d5): W=0 eye-plane clip with intersection emission, never
empty-collapse for in-plane portals; then DELETE the
EyeInsidePortalOpening rescue and re-run the full harness suite. This
kills the climb strobes + the top-of-tower roof/floor flap while
turning (the user's other standing report).

The user's reports are AXIOMS. Visual gates are the acceptance tests.
Build + test green per commit (App 242+1skip / Core 1422+2skip / UI 420
/ Net 294). When launching for the user: launch, hand over, do NOT
foreground/screenshot the window while they play; read logs when told.