docs: tower-stairs fundamental handoff - the broken-state log kills all mesh-absence theories

The user's final broken-state session (user-session-capture2.log,
standing in front of broken stairs) reports meshMissing=0 and
entSeen==entDrawn: the staircase is DRAWN WRONG, not missing. The
handoff records the 8 verified fixes shipped today (none was this bug),
the ranked hypothesis space (H-A hydration-time MeshRef corruption via
SetupMesh.Flatten identity fallback - predicts the barrel IS the
collapsed staircase; H-B Tier-1 partial-batch cache; H-C draw compose),
the decisive one-launch probe design, the polyClipFinish/cdstW port
spec for the climb strobes + top flap (read done, constant pinned), the
apparatus inventory, and the paste-ready pickup prompt.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Erik 2026-06-11 20:46:35 +02:00
parent 7bbb169c6c
commit d82f070b88


@@ -0,0 +1,214 @@
# The AAB3 tower "broken stairs + water barrel" — fundamental handoff (2026-06-11 late)
**Branch:** `claude/thirsty-goldberg-51bb9b`. **Nothing on main.** Suites green:
App 242 + 1 skip / Core 1422 + 2 skips / UI 420 / Net 294.
**The user logged out INSIDE the tower in the BROKEN state** — the next login
restores there (claim `0xAAB30107` validates cleanly now), so the broken state
is one login away for the next session.
## 0. The one fact that reframes everything (READ THIS FIRST)
`user-session-capture2.log` — the user standing IN the tower, broken stairs +
barrel visible on screen — final dispatcher diagnostics:
```
[WB-DIAG] entSeen=11808510 entDrawn=11808510 meshMissing=0 ...
```
**meshMissing=0. entSeen == entDrawn. Every referenced mesh is loaded and every
walked entity is drawn — while the user SEES broken stairs.** The staircase is
NOT missing from the pipeline. It is being **drawn wrong** (wrong transforms,
wrong batches, or a stale partial classification). Every "the mesh didn't
load" theory is now DEAD for the persistent symptom. (226 ids — including
stair part `0x01000E2A` — DID go transiently missing during the login churn
and were self-healed by the new point-of-use re-arm; the question is what got
built or cached wrong during that window and STAYS wrong.)
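
The two conditions above can be checked mechanically against any captured log. A minimal sketch in Python (used here purely for illustration, not project code), assuming the `[WB-DIAG]` line keeps the `key=integer` shape shown in the excerpt; the helper names `parse_wb_diag` and `absence_ruled_out` are hypothetical:

```python
import re


def parse_wb_diag(line: str) -> dict[str, int]:
    """Extract the integer key=value counters from a [WB-DIAG] log line."""
    if "[WB-DIAG]" not in line:
        raise ValueError("not a [WB-DIAG] line")
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def absence_ruled_out(counters: dict[str, int]) -> bool:
    """True when no mesh is missing and every walked entity was drawn."""
    return (
        counters["meshMissing"] == 0
        and counters["entSeen"] == counters["entDrawn"]
    )


# The line quoted above from user-session-capture2.log (trailing fields elided).
line = "[WB-DIAG] entSeen=11808510 entDrawn=11808510 meshMissing=0 ..."
counters = parse_wb_diag(line)
print(absence_ruled_out(counters))
```

A `True` result only says the pipeline believes it drew everything; it says nothing about whether the transforms or batches it drew were right, which is the question §3 takes up.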
## 1. Symptom + reproduction (user-verified, multiple sessions)
- The AAB3 tower (building[1], model `0x01001117`, cells `0x0107..0x010A`;
user pinned it by logging out inside — `[snap] claim=0xAAB30107`).
- Its spiral staircase = ONE cell static: **Setup `0x020003F2`, 43 parts**
(5 platforms `0x01000E2A` + 38 steps `0x01000E2B/2C/2D/2F/31/32`),
placement frames spiral z 0.35→15.15 (dat-proven,
`Issue119TowerDumpTests`). Four `0x020005D8` statics (part `0x01001774`,
barrel-shaped — REAL water-barrel models per the user's screenshot) sit at
wall positions.
- **Broken state** (reproduces on teleport-heavy / run-back logins, ~3 of 4
attempts): stairs render PARTIALLY or not at all (collision intact — the
invisible stairs are walkable to the top), and "a water barrel" shows near
the floor. **Clean state** (reproduced twice, screenshots in worktree:
`tower-rearm-verify.png`, `tower-selfheal-verify.png`): full spiral
staircase, no complaints. Same build can produce both — the divergence is
SESSION-SHAPED (what happened during login/streaming), and once broken it
stays broken for the session.
- User axiom: **the barrel is NOT in the tower in retail.** (It IS in the
dat's `0x0107.StaticObjects`. Unresolved tension — but note H-A below
predicts the "barrel" may not be the dat barrel at all.)
## 2. What was FIXED today (all verified, all committed — do not re-litigate)
| Fix | Commit | Verified by |
|---|---|---|
| #118 house-exit vanish (seal vs dynamics order) | `5a80a2e` | user gate "Yes solved" |
| #120 flood ping-pong (CellView containment) | `dede7e4` | 0 `[pv-ERROR]` since (was 24/session); #122 cured |
| #121 portals invisible (dynamics-owner particle pass) | `c446473` | user gate "Yes" |
| #125 WB_DIAG GL-error cascade (query ring begun-flags) | `fcade06` | 0 `[wb-error]` under diag |
| Render lift leaking into the visibility graph | `f35cb8b` | captured-frame replay both arms (`CapturedTopOfStairs_*`) |
| #126 restore re-derived Z (now commits server Z — retail SetPositionInternal 0x00515bd0 shape) | `120aeff` | clean `VALIDATED` snaps since |
| #128 first-ever-only Prepare gate | re-arm commit | `[mesh-miss]` self-heal observed live (226 ids re-requested) |
| Point-of-use re-request (mesh absence now impossible) | last commit | final `meshMissing=0` in the broken session — which is exactly what KILLED the absence theory |
Each fix was real; none was THE tower bug. The user's "running in circles"
critique stands: the persistent symptom survives all of them.
## 3. The live hypothesis space (ranked — design the probe, don't guess)
**H-A — hydration-time MeshRef corruption (top suspect).** The staircase
entity's 43 MeshRefs are built ONCE at landblock hydration
(`GameWindow.BuildInteriorEntitiesForStreaming` → `SetupMesh.Flatten(setup)`,
GameWindow ~5611-5627). `SetupMesh.Flatten` falls back to **identity
transforms** when the placement-frame lookup comes up short
(`SetupMesh.cs:57-61`: `i < defaultAnim.Frames.Count` else identity), and
returns per-part frames from `setup.PlacementFrames` (Resting → Default →
first). If, during the login burst, the Setup object or its frames read
DEGRADED (a dat-race / partially-hydrated object — see
`feedback_phase_a1_hotfix_saga`: DatCollection thread-safety + "objects can
cache half-parsed"), Flatten yields identity (or partial) transforms → **all
43 parts draw stacked at the entity origin = a barrel-shaped pile** ("the
water barrel"!!) with a few parts elsewhere ("broken stairs"). MeshRefs are
never rebuilt → broken all session. PREDICTS: the "barrel" the user sees may
be the collapsed staircase, not the dat barrel; meshMissing=0; entity drawn.
**Probe:** dump the live entity's MeshRefs (count + per-part transform
translations) in the broken state — if translations are ~zero/identity, H-A
is confirmed and the fix is hydration-side (retry/validate Flatten inputs, or
rebuild MeshRefs when degraded).
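
To make the H-A failure mode concrete, here is a toy model in Python of an identity fallback and of the check the probe would run. It is a sketch under stated assumptions, not the project's `SetupMesh.Flatten`: transforms are reduced to translations, the part ids are stand-in indices, and the x/y spiral coordinates are invented (only the z range 0.35→15.15 and the part count 43 come from this document).

```python
import math

Vec3 = tuple[float, float, float]
IDENTITY_TRANSLATION: Vec3 = (0.0, 0.0, 0.0)


def flatten(part_ids: list[int], frames: list[Vec3]) -> list[tuple[int, Vec3]]:
    """Pair each part with its placement translation.

    Mirrors the fallback described above: when the frame list is shorter
    than the part list, the remaining parts silently get identity.
    """
    return [
        (pid, frames[i] if i < len(frames) else IDENTITY_TRANSLATION)
        for i, pid in enumerate(part_ids)
    ]


def collapsed_parts(mesh_refs: list[tuple[int, Vec3]], eps: float = 1e-4) -> list[int]:
    """Return the ids of parts whose translation is (near) zero."""
    return [pid for pid, t in mesh_refs if all(abs(c) <= eps for c in t)]


def spiral_frames(n: int, z0: float = 0.35, z1: float = 15.15) -> list[Vec3]:
    """Hypothetical spiral: invented x/y, z rising linearly from z0 to z1."""
    frames = []
    for i in range(n):
        angle = i * (2 * math.pi / 12)
        z = z0 + (z1 - z0) * i / (n - 1)
        frames.append((3.0 * math.cos(angle), 3.0 * math.sin(angle), z))
    return frames


part_ids = list(range(43))                            # stand-ins for the 43 parts
healthy = flatten(part_ids, spiral_frames(43))
degraded = flatten(part_ids, spiral_frames(43)[:5])   # simulated short read
print(len(collapsed_parts(healthy)), len(collapsed_parts(degraded)))
```

In this toy, the healthy read reports no collapsed parts while the short read leaves 38 of 43 parts at the origin with 5 placed correctly, which is the "pile plus a few stray parts" shape H-A predicts. A hydration-side fix could use the same kind of check to refuse or retry a degraded read instead of caching it.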
**H-B — Tier-1 classification cache served a partial/stale batch set.** The
entity classified during the transient-miss window; some path caches an
incomplete batch set that the cache hit then serves forever (static entity →
fast path → never re-classified). The known #53 vetoes (null renderData per
MeshRef; null Setup part — both patched) read correct, but a batch-level
partial (renderData present, batches not yet complete during atlas staging)
may not be vetoed. PREDICTS: drawn entity, wrong/missing batches; fixable by
invalidating the cache entry when any of the entity's ids finishes loading
AFTER the classification. **Probe:** `EntityClassificationCache` dump for the
staircase entity id in the broken state (batch count vs the clean session's).
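
The invalidation rule H-B calls for can be sketched as follows. This Python toy is an assumption-laden illustration, not the project's `EntityClassificationCache`: it models one batch per loaded mesh, keys the entry by a stand-in entity id, and uses hypothetical method names.

```python
class ClassificationCache:
    """Toy Tier-1 cache: classify once, then serve the cached result."""

    def __init__(self) -> None:
        self._loaded: set[int] = set()
        self._entries: dict[int, tuple[frozenset[int], int]] = {}

    def classify(self, entity_id: int, mesh_ids: set[int]) -> int:
        """Return the batch count, classifying only on a cache miss."""
        hit = self._entries.get(entity_id)
        if hit is not None:
            return hit[1]  # fast path: whatever was cached is served as-is
        batch_count = len(mesh_ids & self._loaded)  # one batch per loaded mesh
        self._entries[entity_id] = (frozenset(mesh_ids), batch_count)
        return batch_count

    def on_mesh_loaded(self, mesh_id: int, invalidate: bool = True) -> None:
        """Record a finished load; optionally drop entries that reference it."""
        self._loaded.add(mesh_id)
        if not invalidate:
            return
        stale = [eid for eid, (ids, _) in self._entries.items() if mesh_id in ids]
        for eid in stale:
            del self._entries[eid]


STAIRS = 0x020003F2                  # stand-in key for the staircase entity
PARTS = {0x01000E2A, 0x01000E2B}     # two of the stair part ids, for brevity

for invalidate in (False, True):
    cache = ClassificationCache()
    cache.on_mesh_loaded(0x01000E2B, invalidate)
    first = cache.classify(STAIRS, PARTS)    # classified while one part is missing
    cache.on_mesh_loaded(0x01000E2A, invalidate)
    second = cache.classify(STAIRS, PARTS)   # after the late load completes
    print(invalidate, first, second)
```

Without invalidation the partial count from the transient-miss window is served on every later hit; with it, the late load evicts the entry and the next classification sees the full set. Any load event handled while an entry exists is by construction "after the classification", so no timestamps are needed in this simplified model.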
**H-C — draw-side transform composition** (`ComposePartWorldMatrix` /
`meshRef.PartTransform` path) — least likely (the same code draws the clean
sessions), but the per-part dump from H-A's probe exonerates or implicates it
for free.
## 4. The decisive next step (ONE probe, one launch)
Add a one-shot diagnostic (env-gated, e.g. `ACDREAM_DUMP_ENTITY=0x020003F2`):
at first draw of any entity whose `SourceGfxObjOrSetupId` matches, print:
- `MeshRefs.Count` (expect 43),
- per MeshRef: `GfxObjId` + `PartTransform.Translation` (expect the dat
spiral: platforms at (0,3,1.55)…(3,3,11.95), steps ascending — compare
`Issue119TowerDumpTests.DumpTowerStairSetups` output),
- whether the Tier-1 cache has an entry + its batch count.
Launch (the user's save restores INSIDE the tower; the broken state is
probable on first login), read the dump:
- identity/collapsed translations → **H-A**: fix at hydration (validate
Flatten's inputs; rebuild MeshRefs on degraded reads; likely also explains
"barrel" as the collapsed pile).
- correct translations + small/odd batch count → **H-B**: cache
invalidation on late load completion.
- correct everything → H-C: instrument ComposePartWorldMatrix for this id.
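
The three-way read of the dump can be written down as a small triage function so the verdict does not depend on eyeballing. A minimal sketch in Python, assuming the dump has already been parsed into translations plus a batch count; `triage`, the tolerance, and treating a MeshRef count mismatch as H-A are assumptions of this sketch, and the two sample platform positions are the ones quoted in the list above.

```python
Vec3 = tuple[float, float, float]


def near(a: Vec3, b: Vec3, tol: float) -> bool:
    """Component-wise comparison within a tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def triage(
    translations: list[Vec3],
    expected: list[Vec3],
    batch_count: int,
    clean_batch_count: int,
    tol: float = 0.01,
) -> str:
    """Map a parsed entity dump onto the hypotheses of section 3."""
    # Wrong part count or wrong placements: the MeshRefs themselves are bad.
    if len(translations) != len(expected):
        return "H-A"
    if not all(near(got, want, tol) for got, want in zip(translations, expected)):
        return "H-A"
    # Placements match the dat but the cached batch set differs from a clean run.
    if batch_count != clean_batch_count:
        return "H-B"
    # Inputs all look right, so suspicion moves to draw-side composition.
    return "H-C"


expected = [(0.0, 3.0, 1.55), (3.0, 3.0, 11.95)]   # platform positions from the dat
collapsed = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

print(triage(collapsed, expected, batch_count=2, clean_batch_count=2))
print(triage(expected, expected, batch_count=1, clean_batch_count=2))
print(triage(expected, expected, batch_count=2, clean_batch_count=2))
```

The `clean_batch_count` input has to come from a clean-state session's dump of the same entity, so the probe is worth running once in each state.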
## 5. Also pinned today, port pending (the SECOND remaining tower artifact)
The climb strobes + top-of-tower roof/floor flap while TURNING (user: "the
roof and floor up top still flaps when turning") = the knife-edge in-plane
portal clip family, mechanically pinned by capture + replay:
- The eye riding/crossing HORIZONTAL portal planes (spiral climbs, the deck)
→ side test allows (in-plane window) but OUR clip collapses the portal to
EMPTY → the cell behind drops ([viewer-diff] `removed=[0xAAB30107,0x010A]`
at the top; mid-climb `removed=[0x0108/0x0109]`).
- Retail has no hole: `ACRender::polyClipFinish` (0x006b6d00, pc:702749) —
read today: a homogeneous Sutherland-Hodgman whose FIRST pass clips the
polygon at **W=0 (the eye plane)** with full intersection emission
(pc:702889-702978: scans vertex W, runs the W-clip pass, REQUIRES ≥3
output verts, THEN clips against the portal-view edges in homogeneous
space, `< 3 → return` per edge). **cdstW = 0.000199999995, PINNED at
0x007247d5** (where it is consumed is still to be mapped: grep for reads of the
global 0x008fb788; `landPolysDraw` at 0x006b7040 uses 0.0002 inline for
plane side tests).
- THE PORT: match `PortalProjection.ProjectToClip`'s near-eye behavior
(currently `EyePlaneW=1e-4` + empty-collapse) to polyClipFinish's W=0-clip
semantics; then DELETE the `EyeInsidePortalOpening` rescue (the documented
cdstW-gap compensation, T2 ledger) and re-run the full harness suite
(CornerFloodReplay + Issue120 + TowerAscent + the captured-frame pins).
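
For orientation, the first pass described above (clip at the eye plane with intersection emission, then require at least 3 vertices) looks like this as a generic Sutherland-Hodgman pass in homogeneous coordinates. This is a textbook sketch in Python, not a transcription of `ACRender::polyClipFinish` or of `PortalProjection.ProjectToClip`; the inclusive `w >= w_min` test and the parameter name `w_min` are assumptions, and the later per-edge passes are omitted.

```python
from typing import Optional

Vec4 = tuple[float, ...]  # homogeneous clip-space vertex (x, y, z, w)


def clip_w(poly: list[Vec4], w_min: float = 0.0) -> list[Vec4]:
    """One Sutherland-Hodgman pass keeping the w >= w_min half-space.

    Every edge that crosses the plane emits its intersection point, so a
    polygon straddling the eye plane is trimmed rather than dropped.
    """
    out: list[Vec4] = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]  # i == 0 wraps to the last vertex, closing the loop
        cur_in = cur[3] >= w_min
        prev_in = prev[3] >= w_min
        if cur_in != prev_in:
            # The w values lie on opposite sides, so the divisor is non-zero.
            t = (w_min - prev[3]) / (cur[3] - prev[3])
            out.append(tuple(p + t * (c - p) for p, c in zip(prev, cur)))
        if cur_in:
            out.append(cur)
    return out


def clip_or_reject(poly: list[Vec4], w_min: float = 0.0) -> Optional[list[Vec4]]:
    """Apply the W pass and require at least 3 surviving vertices."""
    clipped = clip_w(poly, w_min)
    return clipped if len(clipped) >= 3 else None


# A portal quad straddling the eye plane: two vertices in front, two behind.
straddling = [
    (-1.0, -1.0, 0.0, 1.0),
    (1.0, -1.0, 0.0, 1.0),
    (1.0, 1.0, 0.0, -1.0),
    (-1.0, 1.0, 0.0, -1.0),
]
behind = [(x, y, z, -1.0) for x, y, z, _ in straddling]

print(clip_or_reject(straddling))
print(clip_or_reject(behind))
```

The straddling quad survives as a 4-vertex polygon whose new vertices sit on the clip plane, while the fully-behind quad is rejected. The contrast with an empty-collapse is that only the second case yields nothing, so the cell behind an in-plane portal stays in the flood.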
## 6. Apparatus inventory (new this session — use, don't rebuild)
| Tool | Where | Purpose |
|---|---|---|
| `[viewer]` probe | `ACDREAM_PROBE_VIEWER=1` | print-on-change root/flood/outPolys/pCell + eye@mm + fwd |
| `[viewer-diff]` | same flag | names cells entering/leaving the flood per change |
| `[mesh-miss]` | `ACDREAM_WB_DIAG=1` | once-per-id missing-mesh naming + point-of-use re-request |
| `HouseExitWalkReplayTests` | App.Tests | #118 pins (cone + seal-depth + straddle) |
| `Issue120ReciprocalPingPongTests` | App.Tests | #120 pins + `LoadAllInteriorCells` helper |
| `TowerAscentReplayTests` | App.Tests | captured-frame replay + lift canary + gate-by-gate diagnostic |
| `Issue127FloodFlipReplayTests` | App.Tests | outdoor flood replay (stable — flood math exonerated for the 4 cm pair) |
| `Issue119TowerDumpTests` / `Issue119UpNullGfxObjDumpTests` | Core.Tests | tower dat truth / no-draw GfxObj class |
| Session logs (worktree root) | `user-session-capture2.log` (THE broken-state evidence), `tower-rearm-gate.log`, `flap-diff-capture.log`, screenshots | capture record |
## 7. Open issues ledger (post-session state)
- **#119/#128 tower stairs**: OPEN — the drawn-but-wrong layer (§3-4). THE
priority.
- **knife-edge clip port** (§5): OPEN — second priority; kills climb strobes,
top flap, and retires the rescue + #120's window class.
- **#124** far-building back walls (interior-root look-in floods missing —
lead documented in ISSUES), **#127** distant-building churn (narrowed),
**#108-residual** cellar grass band, **#112** hill-cottage transparency,
**#113** phantom stairs: all OPEN with leads in ISSUES.md.
- The user axiom stands: **barrel not in retail** — re-evaluate after H-A
resolves (the "barrel" may be the collapsed staircase).
## 8. Paste-ready pickup prompt
```
Pick up acdream as a SENIOR 3D ENGINE DEVELOPER on the AAB3 tower
"broken stairs + water barrel" bug. Worktree branch
claude/thirsty-goldberg-51bb9b. Nothing goes to main.
READ FIRST (in order):
1. docs/research/2026-06-11-tower-stairs-fundamental-handoff.md — THE
handoff. Its §0 fact reframes the bug: in the user's broken-state
session (user-session-capture2.log) the dispatcher reports
meshMissing=0 / entSeen==entDrawn WHILE broken stairs are on screen —
the staircase is DRAWN WRONG, not missing. All mesh-absence theories
are dead (8 real fixes shipped today; none was this).
2. Memory digests: project_render_pipeline_digest +
project_physics_collision_digest (DO-NOT-RETRY tables apply).
DO NEXT — the decisive probe (handoff §4): add ACDREAM_DUMP_ENTITY-style
one-shot diag printing the staircase entity's (SourceGfxObjOrSetupId
0x020003F2) MeshRefs count + per-part transform translations + Tier-1
cache state at first draw. The user's save restores INSIDE the tower;
the broken state reproduces on teleport-heavy logins (~3 of 4). One
launch + the dump decides H-A (hydration-time MeshRef corruption via
SetupMesh.Flatten identity fallback — top suspect; predicts the "barrel"
IS the collapsed staircase) vs H-B (Tier-1 cache partial batch set) vs
H-C (draw-side compose). Fix the confirmed branch ROOT-CAUSE-FIRST (no
band-aids; the user has explicitly demanded the fundamental fix).
THEN — the knife-edge clip port (handoff §5): match
PortalProjection.ProjectToClip's near-eye clip to retail
ACRender::polyClipFinish (0x006b6d00, pc:702749; cdstW=0.0002 pinned at
0x007247d5): W=0 eye-plane clip with intersection emission, never
empty-collapse for in-plane portals; then DELETE the
EyeInsidePortalOpening rescue and re-run the full harness suite. This
kills the climb strobes + the top-of-tower roof/floor flap while
turning (the user's other standing report).
The user's reports are AXIOMS. Visual gates are the acceptance tests.
Build + test green per commit (App 242+1skip / Core 1422+2skip / UI 420
/ Net 294). When launching for the user: launch, hand over, do NOT
foreground/screenshot the window while they play; read logs when told.
```