diff --git a/docs/research/2026-06-11-tower-stairs-fundamental-handoff.md b/docs/research/2026-06-11-tower-stairs-fundamental-handoff.md new file mode 100644 index 00000000..372e567d --- /dev/null +++ b/docs/research/2026-06-11-tower-stairs-fundamental-handoff.md @@ -0,0 +1,214 @@ +# The AAB3 tower "broken stairs + water barrel" — fundamental handoff (2026-06-11 late) + +**Branch:** `claude/thirsty-goldberg-51bb9b`. **Nothing on main.** Suites green: +App 242 + 1 skip / Core 1422 + 2 skips / UI 420 / Net 294. +**The user logged out INSIDE the tower in the BROKEN state** — the next login +restores there (claim `0xAAB30107` validates cleanly now), so the broken state +is one login away for the next session. + +## 0. The one fact that reframes everything (READ THIS FIRST) + +`user-session-capture2.log` — the user standing IN the tower, broken stairs + +barrel visible on screen — final dispatcher diagnostics: + +``` +[WB-DIAG] entSeen=11808510 entDrawn=11808510 meshMissing=0 ... +``` + +**meshMissing=0. entSeen == entDrawn. Every referenced mesh is loaded and every +walked entity is drawn — while the user SEES broken stairs.** The staircase is +NOT missing from the pipeline. It is being **drawn wrong** (wrong transforms, +wrong batches, or a stale partial classification). Every "the mesh didn't +load" theory is now DEAD for the persistent symptom. (226 ids — including +stair part `0x01000E2A` — DID go transiently missing during the login churn +and were self-healed by the new point-of-use re-arm; the question is what got +built or cached wrong during that window and STAYS wrong.) + +## 1. Symptom + reproduction (user-verified, multiple sessions) + +- The AAB3 tower (building[1], model `0x01001117`, cells `0x0107..0x010A`; + user pinned it by logging out inside — `[snap] claim=0xAAB30107`). 
+- Its spiral staircase = ONE cell static: **Setup `0x020003F2`, 43 parts** + (5 platforms `0x01000E2A` + 38 steps `0x01000E2B/2C/2D/2F/31/32`), + placement frames spiral z 0.35→15.15 (dat-proven, + `Issue119TowerDumpTests`). Four `0x020005D8` statics (part `0x01001774`, + barrel-shaped — REAL water-barrel models per the user's screenshot) sit at + wall positions. +- **Broken state** (reproduces on teleport-heavy / run-back logins, ~3 of 4 + attempts): stairs render PARTIALLY or not at all (collision intact — the + invisible stairs are walkable to the top), and "a water barrel" shows near + the floor. **Clean state** (reproduced twice, screenshots in worktree: + `tower-rearm-verify.png`, `tower-selfheal-verify.png`): full spiral + staircase, no complaints. Same build can produce both — the divergence is + SESSION-SHAPED (what happened during login/streaming), and once broken it + stays broken for the session. +- User axiom: **the barrel is NOT in the tower in retail.** (It IS in the + dat's `0x0107.StaticObjects`. Unresolved tension — but note H-A below + predicts the "barrel" may not be the dat barrel at all.) + +## 2. 
What was FIXED today (all verified, all committed — do not re-litigate) + +| Fix | Commit | Verified by | +|---|---|---| +| #118 house-exit vanish (seal vs dynamics order) | `5a80a2e` | user gate "Yes solved" | +| #120 flood ping-pong (CellView containment) | `dede7e4` | 0 `[pv-ERROR]` since (was 24/session); #122 cured | +| #121 portals invisible (dynamics-owner particle pass) | `c446473` | user gate "Yes" | +| #125 WB_DIAG GL-error cascade (query ring begun-flags) | `fcade06` | 0 `[wb-error]` under diag | +| Render lift leaking into the visibility graph | `f35cb8b` | captured-frame replay both arms (`CapturedTopOfStairs_*`) | +| #126 restore re-derived Z (now commits server Z — retail SetPositionInternal 0x00515bd0 shape) | `120aeff` | clean `VALIDATED` snaps since | +| #128 first-ever-only Prepare gate | re-arm commit | `[mesh-miss]` self-heal observed live (226 ids re-requested) | +| Point-of-use re-request (mesh absence now impossible) | last commit | final `meshMissing=0` in the broken session — which is exactly what KILLED the absence theory | + +Each fix was real; none was THE tower bug. The user's "running in circles" +critique stands: the persistent symptom survives all of them. + +## 3. The live hypothesis space (ranked — design the probe, don't guess) + +**H-A — hydration-time MeshRef corruption (top suspect).** The staircase +entity's 43 MeshRefs are built ONCE at landblock hydration +(`GameWindow.BuildInteriorEntitiesForStreaming` → `SetupMesh.Flatten(setup)`, +GameWindow ~5611-5627). `SetupMesh.Flatten` falls back to **identity +transforms** when the placement-frame lookup comes up short +(`SetupMesh.cs:57-61`: `i < defaultAnim.Frames.Count` else identity), and +returns per-part frames from `setup.PlacementFrames` (Resting → Default → +first). 
If, during the login burst, the Setup object or its frames read +DEGRADED (a dat-race / partially-hydrated object — see +`feedback_phase_a1_hotfix_saga`: DatCollection thread-safety + "objects can +cache half-parsed"), Flatten yields identity (or partial) transforms → **all +43 parts draw stacked at the entity origin = a barrel-shaped pile** ("the +water barrel"!!) with a few parts elsewhere ("broken stairs"). MeshRefs are +never rebuilt → broken all session. PREDICTS: the "barrel" the user sees may +be the collapsed staircase, not the dat barrel; meshMissing=0; entity drawn. +**Probe:** dump the live entity's MeshRefs (count + per-part transform +translations) in the broken state — if translations are ~zero/identity, H-A +is confirmed and the fix is hydration-side (retry/validate Flatten inputs, or +rebuild MeshRefs when degraded). + +**H-B — Tier-1 classification cache served a partial/stale batch set.** The +entity classified during the transient-miss window; some path caches an +incomplete batch set that the cache hit then serves forever (static entity → +fast path → never re-classified). The known #53 vetoes (null renderData per +MeshRef; null Setup part — both patched) read correct, but a batch-level +partial (renderData present, batches not yet complete during atlas staging) +may not be vetoed. PREDICTS: drawn entity, wrong/missing batches; fixable by +invalidating the cache entry when any of the entity's ids finishes loading +AFTER the classification. **Probe:** `EntityClassificationCache` dump for the +staircase entity id in the broken state (batch count vs the clean session's). + +**H-C — draw-side transform composition** (`ComposePartWorldMatrix` / +`meshRef.PartTransform` path) — least likely (the same code draws the clean +sessions), but the per-part dump from H-A's probe exonerates or implicates it +for free. + +## 4. The decisive next step (ONE probe, one launch) + +Add a one-shot diagnostic (env-gated, e.g. 
`ACDREAM_DUMP_ENTITY=0x020003F2`): +at first draw of any entity whose `SourceGfxObjOrSetupId` matches, print: +- `MeshRefs.Count` (expect 43), +- per MeshRef: `GfxObjId` + `PartTransform.Translation` (expect the dat + spiral: platforms at (0,3,1.55)…(3,3,11.95), steps ascending — compare + `Issue119TowerDumpTests.DumpTowerStairSetups` output), +- whether the Tier-1 cache has an entry + its batch count. + +Launch (the user's save restores INSIDE the tower; the broken state is +probable on first login), read the dump: +- identity/collapsed translations → **H-A**: fix at hydration (validate + Flatten's inputs; rebuild MeshRefs on degraded reads; likely also explains + "barrel" as the collapsed pile). +- correct translations + small/odd batch count → **H-B**: cache + invalidation on late load completion. +- correct everything → H-C: instrument ComposePartWorldMatrix for this id. + +## 5. Also pinned today, port pending (the SECOND remaining tower artifact) + +The climb strobes + top-of-tower roof/floor flap while TURNING (user: "the +roof and floor up top still flaps when turning") = the knife-edge in-plane +portal clip family, mechanically pinned by capture + replay: +- The eye riding/crossing HORIZONTAL portal planes (spiral climbs, the deck) + → side test allows (in-plane window) but OUR clip collapses the portal to + EMPTY → the cell behind drops ([viewer-diff] `removed=[0xAAB30107,0x010A]` + at the top; mid-climb `removed=[0x0108/0x0109]`). +- Retail has no hole: `ACRender::polyClipFinish` (0x006b6d00, pc:702749) — + read today: a homogeneous Sutherland-Hodgman whose FIRST pass clips the + polygon at **W=0 (the eye plane)** with full intersection emission + (pc:702889-702978: scans vertex W, runs the W-clip pass, REQUIRES ≥3 + output verts, THEN clips against the portal-view edges in homogeneous + space, `< 3 → return` per edge). 
**cdstW = 0.000199999995, PINNED at + 0x007247d5** (where it's consumed still to be mapped — grep reads of the + global 0x008fb788; `landPolysDraw` at 0x006b7040 uses 0.0002 inline for + plane side tests). +- THE PORT: match `PortalProjection.ProjectToClip`'s near-eye behavior + (currently `EyePlaneW=1e-4` + empty-collapse) to polyClipFinish's W=0-clip + semantics; then DELETE the `EyeInsidePortalOpening` rescue (the documented + cdstW-gap compensation, T2 ledger) and re-run the full harness suite + (CornerFloodReplay + Issue120 + TowerAscent + the captured-frame pins). + +## 6. Apparatus inventory (new this session — use, don't rebuild) + +| Tool | Where | Purpose | +|---|---|---| +| `[viewer]` probe | `ACDREAM_PROBE_VIEWER=1` | print-on-change root/flood/outPolys/pCell + eye@mm + fwd | +| `[viewer-diff]` | same flag | names cells entering/leaving the flood per change | +| `[mesh-miss]` | `ACDREAM_WB_DIAG=1` | once-per-id missing-mesh naming + point-of-use re-request | +| `HouseExitWalkReplayTests` | App.Tests | #118 pins (cone + seal-depth + straddle) | +| `Issue120ReciprocalPingPongTests` | App.Tests | #120 pins + `LoadAllInteriorCells` helper | +| `TowerAscentReplayTests` | App.Tests | captured-frame replay + lift canary + gate-by-gate diagnostic | +| `Issue127FloodFlipReplayTests` | App.Tests | outdoor flood replay (stable — flood math exonerated for the 4 cm pair) | +| `Issue119TowerDumpTests` / `Issue119UpNullGfxObjDumpTests` | Core.Tests | tower dat truth / no-draw GfxObj class | +| Session logs (worktree root) | `user-session-capture2.log` (THE broken-state evidence), `tower-rearm-gate.log`, `flap-diff-capture.log`, screenshots | capture record | + +## 7. Open issues ledger (post-session state) + +- **#119/#128 tower stairs**: OPEN — the drawn-but-wrong layer (§3-4). THE + priority. +- **knife-edge clip port** (§5): OPEN — second priority; kills climb strobes, + top flap, and retires the rescue + #120's window class. 
+- **#124** far-building back walls (interior-root look-in floods missing — + lead documented in ISSUES), **#127** distant-building churn (narrowed), + **#108-residual** cellar grass band, **#112** hill-cottage transparency, + **#113** phantom stairs: all OPEN with leads in ISSUES.md. +- The user axiom stands: **barrel not in retail** — re-evaluate after H-A + resolves (the "barrel" may be the collapsed staircase). + +## 8. Paste-ready pickup prompt + +``` +Pick up acdream as a SENIOR 3D ENGINE DEVELOPER on the AAB3 tower +"broken stairs + water barrel" bug. Worktree branch +claude/thirsty-goldberg-51bb9b. Nothing goes to main. + +READ FIRST (in order): +1. docs/research/2026-06-11-tower-stairs-fundamental-handoff.md — THE + handoff. Its §0 fact reframes the bug: in the user's broken-state + session (user-session-capture2.log) the dispatcher reports + meshMissing=0 / entSeen==entDrawn WHILE broken stairs are on screen — + the staircase is DRAWN WRONG, not missing. All mesh-absence theories + are dead (8 real fixes shipped today; none was this). +2. Memory digests: project_render_pipeline_digest + + project_physics_collision_digest (DO-NOT-RETRY tables apply). + +DO NEXT — the decisive probe (handoff §4): add ACDREAM_DUMP_ENTITY-style +one-shot diag printing the staircase entity's (SourceGfxObjOrSetupId +0x020003F2) MeshRefs count + per-part transform translations + Tier-1 +cache state at first draw. The user's save restores INSIDE the tower; +the broken state reproduces on teleport-heavy logins (~3 of 4). One +launch + the dump decides H-A (hydration-time MeshRef corruption via +SetupMesh.Flatten identity fallback — top suspect; predicts the "barrel" +IS the collapsed staircase) vs H-B (Tier-1 cache partial batch set) vs +H-C (draw-side compose). Fix the confirmed branch ROOT-CAUSE-FIRST (no +band-aids; the user has explicitly demanded the fundamental fix). 
+ +THEN — the knife-edge clip port (handoff §5): match +PortalProjection.ProjectToClip's near-eye clip to retail +ACRender::polyClipFinish (0x006b6d00, pc:702749; cdstW=0.0002 pinned at +0x007247d5): W=0 eye-plane clip with intersection emission, never +empty-collapse for in-plane portals; then DELETE the +EyeInsidePortalOpening rescue and re-run the full harness suite. This +kills the climb strobes + the top-of-tower roof/floor flap while +turning (the user's other standing report). + +The user's reports are AXIOMS. Visual gates are the acceptance tests. +Build + test green per commit (App 242+1skip / Core 1422+2skip / UI 420 +/ Net 294). When launching for the user: launch, hand over, do NOT +foreground/screenshot the window while they play; read logs when told. +```
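
Illustrative note on the §5 port target. The first pass that §5 attributes to `polyClipFinish` (clip the portal polygon at the eye plane, emit an intersection vertex on every crossing edge, require at least 3 output vertices) is one Sutherland-Hodgman pass against a W plane. The sketch below shows that general technique only. It is not the retail routine and not `PortalProjection.ProjectToClip`; the names `clip_polygon_at_w` and `w_min` are hypothetical, Python is used purely for illustration, and the later per-edge clip against the portal view is not covered.

```python
from typing import List, Tuple

# Homogeneous clip-space vertex (x, y, z, w). Assumed layout, for illustration.
Vec4 = Tuple[float, float, float, float]


def clip_polygon_at_w(poly: List[Vec4], w_min: float = 0.0) -> List[Vec4]:
    """One Sutherland-Hodgman pass keeping the part of `poly` with w >= w_min.

    An intersection vertex is emitted on every edge that crosses the plane.
    Returns [] when fewer than 3 vertices survive (degenerate result).
    """
    out: List[Vec4] = []
    n = len(poly)
    for i in range(n):
        a = poly[i]
        b = poly[(i + 1) % n]
        a_in = a[3] >= w_min
        b_in = b[3] >= w_min
        if a_in:
            out.append(a)
        if a_in != b_in:
            # The edge crosses the plane, so a.w != b.w and t is well defined.
            t = (w_min - a[3]) / (b[3] - a[3])
            out.append(tuple(pa + t * (pb - pa) for pa, pb in zip(a, b)))
    return out if len(out) >= 3 else []


# A quad straddling the eye plane: two vertices in front (w > 0), two behind.
straddling = [
    (-1.0, 0.0, 0.0, 1.0),
    (1.0, 0.0, 0.0, 1.0),
    (1.0, 0.0, 0.0, -1.0),
    (-1.0, 0.0, 0.0, -1.0),
]
clipped = clip_polygon_at_w(straddling)

# A quad entirely behind the eye plane: nothing survives.
behind = [(x, y, 0.0, -1.0) for x, y in [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]]
rejected = clip_polygon_at_w(behind)
```

With this shape a portal that straddles the eye plane keeps its front part as a polygon instead of becoming empty, and only a polygon with fewer than 3 surviving vertices is dropped. Whether the W threshold belongs at 0 or at a small epsilon is exactly the open question §5 raises around `cdstW`, so `w_min` is left as a parameter here.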