From a1b12dff405d54347217a6120f6010b5586c1370 Mon Sep 17 00:00:00 2001
From: Erik
Date: Tue, 9 Jun 2026 18:44:33 +0200
Subject: [PATCH] docs(render): R-A2b shipped + flap residual (sec 4) + texture red-herring handoff

R-A2b (485e44d) killed the 0171<->0173 churn (maxPop 16->1, measured).
Visible flap residual is sec 4 (edge-on openings render-side + corner
camera-seal). Camera-damping tried+failed+reverted.

The white-walls scare was a RED HERRING: heavy per-frame probes
(ACDREAM_PROBE_FLAP) starve the thread-unsafe dat-reader so texture-decode
loses the race -> white; a clean launch (no probes) fixes it. The dat-reader
thread-safety bug is the real underlying issue (filed). Repo clean at HEAD.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...residual-and-texture-redherring-handoff.md | 219 ++++++++++++++++++
 ...2026-06-09-portal-flood-r-a2b-side-cull.md |   8 +
 2 files changed, 227 insertions(+)
 create mode 100644 docs/research/2026-06-09-r-a2b-shipped-flap-residual-and-texture-redherring-handoff.md

diff --git a/docs/research/2026-06-09-r-a2b-shipped-flap-residual-and-texture-redherring-handoff.md b/docs/research/2026-06-09-r-a2b-shipped-flap-residual-and-texture-redherring-handoff.md
new file mode 100644
index 00000000..302293ce
--- /dev/null
+++ b/docs/research/2026-06-09-r-a2b-shipped-flap-residual-and-texture-redherring-handoff.md
@@ -0,0 +1,219 @@
+# HANDOFF — R-A2b shipped (churn killed) · flap residual is §4 · the texture RED-HERRING
+
+**Date:** 2026-06-09. **Branch:** `claude/thirsty-goldberg-51bb9b`. **HEAD:** `485e44d`.
+**Milestone:** M1.5 (indoor world feels right). **Phase:** full retail render port (Option A) → R-A2b done; §4 open.
+
+> Read this top-to-bottom before any code. This session shipped a real, MEASURED fix (R-A2b) but the
+> VISIBLE flap is only partly resolved, and a **separate runtime bug (the dat-reader) + a self-inflicted
+> diagnostic mistake (heavy probes)** caused a long, confusing detour ("missing textures").
+> The §5 lesson at the bottom is the most important durable takeaway.
+
+---
+
+## 0. TL;DR
+
+1. **R-A2b SHIPPED + MEASURED (commit `485e44d`):** removed the `&& !eyeInsideOpening` bypass from the
+   portal-flood **side-cull** so back portals cull like retail's `PView::InitCell` side test. This kills
+   the `0171↔0173` re-enqueue **churn** — measured `maxPop` **16 → 1** (44 % of frames churning → 0 %,
+   across 1.3 M frames). The flood is now deterministic. 218 App tests green, including the void-fix and
+   #95 over-inclusion guards.
+2. **But R-A2b did NOT fix the VISIBLE flap.** The remaining grey flicker at doorways/windows is **§4**, a
+   *different* mechanism: the openings project **edge-on** from the 3rd-person eye and the clip collapses
+   (geometric — retail's clip collapses too; retail avoids it by keeping the eye head-on/collided). R-A2b
+   killed the *churn* layer; the *edge-on* layer remains.
+3. **The camera-damping attempt FAILED and was reverted** (laggy, no flap improvement). Do not re-try it.
+4. **The "missing textures" scare was a RED HERRING, not a code bug.** The cottage walls rendered WHITE
+   because the **heavy debug probes** (`ACDREAM_PROBE_FLAP=1` + per-frame `Tee` to multi-hundred-MB logs)
+   **starve the thread-unsafe dat-reader**, which then fails to decode the wall textures. **A CLEAN launch
+   (no probes) renders correctly** — user-confirmed. The underlying dat-reader thread-safety bug (the
+   `AccessViolation` crashes) is real and FILED.
+5. **Repo is clean at HEAD `485e44d`** (R-A2b in). Only uncommitted tracked change: the plan-doc PINNED
+   note (committed alongside this handoff). Throwaway `analyze_*.py` + `*.log` are untracked.
+
+---
+
+## 1. What shipped — R-A2b (the churn fix), commit `485e44d`
+
+**The change (one functional edit + a probe):** in `PortalVisibilityBuilder.cs`, the side-cull was
+`if (i < ClipPlanes.Count && !CameraOnInteriorSide(...) && !eyeInsideOpening) continue;`.
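For illustration only, here is a minimal Python model of that side-cull decision with and without the `!eyeInsideOpening` bypass. It is not the C# code. The `Portal` type, the plane sign convention (normal pointing into the owning cell), and the helper names are assumptions. Only the 1.75 m eye-in-opening radius comes from the capture notes. With the eye standing 1.6 m from the doorway inside `0171`, the bypass lets the back portal `0173→0171` through, which closes the `0171↔0173` cycle. Dropping the bypass culls that portal on the side test alone.

```python
import math
from dataclasses import dataclass

# From the capture notes: "eyeIn" means the eye is within 1.75 m of the shared doorway.
EYE_IN_OPENING_RADIUS = 1.75


@dataclass(frozen=True)
class Portal:
    src: str        # cell that owns the portal (hypothetical model, not the C# type)
    dst: str        # neighbour cell the portal leads into
    normal: tuple   # plane normal, ASSUMED to point into the owning cell
    offset: float   # plane: dot(normal, p) + offset == 0
    center: tuple   # centre of the doorway opening


def camera_on_interior_side(p: Portal, eye: tuple) -> bool:
    """Side test: is the eye on the owning cell's side of the portal plane?"""
    return sum(n * e for n, e in zip(p.normal, eye)) + p.offset >= 0.0


def eye_inside_opening(p: Portal, eye: tuple) -> bool:
    return math.dist(p.center, eye) <= EYE_IN_OPENING_RADIUS


def side_culled(p: Portal, eye: tuple, with_bypass: bool) -> bool:
    """Pre-fix (with_bypass=True) models `!CameraOnInteriorSide && !eyeInsideOpening`;
    post-fix (with_bypass=False) culls on the side test alone."""
    culled = not camera_on_interior_side(p, eye)
    if with_bypass:
        culled = culled and not eye_inside_opening(p, eye)
    return culled


def traversed(portals, eye, with_bypass):
    """Portals that survive the side-cull, as (src, dst) pairs."""
    return [(p.src, p.dst) for p in portals if not side_culled(p, eye, with_bypass)]


# Two rooms sharing a doorway in the plane x = 0: "0171" is x < 0, "0173" is x > 0.
doorway = (0.0, 0.0, 0.0)
portals = [
    Portal("0171", "0173", (-1.0, 0.0, 0.0), 0.0, doorway),  # forward portal
    Portal("0173", "0171", (1.0, 0.0, 0.0), 0.0, doorway),   # back portal
]
near_eye = (-1.6, 0.0, 0.0)   # inside 0171, within 1.75 m of the doorway
far_eye = (-2.32, 0.0, 0.0)   # inside 0171, beyond 1.75 m

print(traversed(portals, near_eye, with_bypass=True))   # [('0171', '0173'), ('0173', '0171')]
print(traversed(portals, near_eye, with_bypass=False))  # [('0171', '0173')]
print(traversed(portals, far_eye, with_bypass=True))    # [('0171', '0173')]
```

Farther from the doorway the bypass never fires. This is consistent with the back portal being traversed only when `eyeIn=True`.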
+
+R-A2b drops the `&& !eyeInsideOpening` clause in BOTH `Build` and `BuildFromExterior`. Retail's
+`PView::InitCell` side test (decomp `:432962`) culls a back-facing portal by the side test ALONE — there is
+no eye-in-opening bypass. The forward-portal clip-empty **void rescue** (`Build` ~`clippedRegion.Count == 0`
+branch, the 2026-06-05 fix) is a SEPARATE code path and is **untouched** — `Build_EyeStandingInInteriorPortal_FloodsNeighbour`
+stays green.
+
+**The pin (B1, not B2):** live capture `flap-sidechk.log` showed every back portal (`0173→0171`,
+`0172→0173`) with `camInterior=False` (our side test already agrees with retail — it WANTS to cull) and
+traversed **only when `eyeIn=True`** (eye within 1.75 m of the shared doorway). So the cycle was the
+`eyeInsideOpening` **bypass**, not a `CameraOnInteriorSide` convention bug. Forward portals showed
+`camInterior=True` (unaffected; void rescue preserved).
+
+**Measured result:**
+- `launch-churn-confirm.log` (pre-fix walk): `maxPop` worst **16**, **44 %** of frames `≥2`.
+- `flap-fix-verify.log` / `flap-structured.log` (post-fix): `maxPop` worst **1**, **0 %** `≥2` across
+  ~1.3 M frames. The back portal is now `skip=side` (culled) even at `eyeIn=True`. **Churn eliminated.**
+- New RED→GREEN test `Build_BackFacingPortal_EyeStandingInOpening_StillCulled`; full App suite 218 green.
+
+**Docs:** spec `docs/superpowers/specs/2026-06-09-portal-flood-bounded-propagation-r-a2b-design.md`
+(REVISION → Option B), plan `docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md` (with the
+PINNED B1 note). Commits: `3fd71a1` (spec) → `7b8a490` (plan) → `89a2032` (sidechk probe) → `485e44d` (fix).
+
+**R-A2b status / disposition:** committed at HEAD. It is a real, retail-faithful, measured improvement
+(no more churn → deterministic flood, better perf). It is **innocent of the texture issue** (§5). It does
+NOT, by itself, make the visible flap go away (that's §4).
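The churn figures in the measured result (worst `maxPop`, share of frames at `≥2`) can be summarized in a few lines of Python. This is a minimal sketch, not the untracked `analyze_churn_confirm.py`. It makes two assumptions about the log format: each churn-probe line carries a `maxPop=<n>` token, and each matching line is one frame's sample. The sketch also shows the UTF-16 pitfall. If a PowerShell `Tee-Object` capture is decoded as UTF-8, NUL characters end up interleaved with the text, so the pattern matches nothing.

```python
import codecs
import re

# ASSUMPTION: each churn-probe line contains a token like "maxPop=3".
MAXPOP_RE = re.compile(r"maxPop=(\d+)")


def decode_log(raw: bytes) -> str:
    """Decode a capture that may be UTF-16 with a BOM (PowerShell Tee-Object) or UTF-8."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
        return raw.decode("utf-16")
    # "utf-8-sig" also strips a UTF-8 BOM if one is present.
    return raw.decode("utf-8-sig", errors="replace")


def churn_stats(text: str) -> tuple[int, float]:
    """Return (worst maxPop, fraction of samples with maxPop >= 2)."""
    pops = [int(m.group(1)) for m in MAXPOP_RE.finditer(text)]
    if not pops:
        return 0, 0.0
    churning = sum(1 for p in pops if p >= 2)
    return max(pops), churning / len(pops)


# Synthetic sample (not capture data), encoded the way Tee-Object writes: UTF-16LE + BOM.
sample = "\n".join(f"[portal-churn] maxPop={n}" for n in (1, 16, 1, 2)) + "\n"
raw = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")

print(churn_stats(decode_log(raw)))                        # (16, 0.5)
print(churn_stats(raw.decode("utf-8", errors="replace")))  # (0, 0.0): naive decode finds nothing
```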
+
+**Recommendation: keep it.** Next session should do ONE clean (no-probe) launch to re-confirm that it
+renders and that the churn stays gone, then move to §4.
+(Plan Phase 3 — removing the now-dead `MaxReprocessPerCell` cap — was NOT done; it's optional cleanup,
+risky for the synthetic cyclic tests that have no ClipPlanes. Leave the cap as a harmless backstop, or do
+it carefully with fixture ClipPlanes.)
+
+---
+
+## 2. The VISIBLE flap residual = §4 (NOT fixed by R-A2b)
+
+Two DISTINCT sub-issues, both ending in "background where geometry should be." Do not conflate them.
+
+### 2a. Doorway / window grey flap (room↔room) — RENDER-side edge-on
+- Measured (`flap-structured.log`, the user's room→room pass): root `0171` `vis` oscillates **2↔3↔4**
+  (1,725 transitions); `OutsideView` `outPolys` oscillates **0↔1** (1,544 transitions) → the
+  outdoor-through-window region flips **empty(grey)↔drawn**; **8,574** frames have an edge-on `clip=0`.
+- The eye is **collision-correct** here: `viewerCell=0171` (the adjacent INTERIOR cell), ~1.7 m back, no
+  wall hit. So it is NOT penetration. The openings simply project **edge-on** from that legitimate
+  position → the clip collapses to <3 verts → grey.
+- **§5.3 (decomp) established the clip collapse is GEOMETRIC — retail's `polyClipFinish` collapses at
+  edge-on too. There is no clip robustness to port.** Retail avoids the flicker because its eye is
+  **collided/head-on 93 %** at the doorway (`flap-cam-measure.log`), so it rarely views openings edge-on.
+  Ours **floats** (`pulledIn~0`/`collNormValid=False` **97 %**).
+- So 2a is bounded by the camera, not the clip. But the camera-damping attempt to address it FAILED (§3).
+
+### 2b. Corner see-through — CAMERA-SEAL failure (eye penetrates wall)
+- Live-captured (`flap-corner.log` tail): pressing the camera into a corner → eye escapes to **outdoor
+  cell `0xA9B40031`** at `X=165.22` (well outside the cottage), `pulledIn=0`, `collNormValid=False` (NO
+  collision stopped the 2.61 m boom) → render roots **outside** → the ENTIRE interior drops → bluish
+  background. Retail "under no circumstances sees through the wall."
+- The viewer sweep **does** query the cottage exterior-shell GfxObj (`FindObjCollisions` passes
+  `isViewer`, `TransitionTypes.cs:2376`), so it's not a "shell not queried" gap. The boom slips through
+  somewhere (corner seam? the interior doorway is open so the boom legitimately enters the next room, but
+  in a corner it reaches the exterior). **Not fully root-caused.** This is the most CONCRETE, tractable §4
+  bug — fixing the camera seal (keep the eye inside, like retail) would also reduce 2a (head-on eye).
+
+---
+
+## 3. DO-NOT-RETRY (failed/refuted this session)
+
+- **Camera-damping spring-arm on the published eye** (a separate `_publishedEye` damped toward the
+  collision result, `RetailChaseCamera.cs`). TRIED, **FAILED**: laggy camera feel, zero flap improvement
+  (the forward-crossing flap has no collision to damp; the eye is already smooth). Fully reverted
+  (`git checkout`, verified). Do not re-try damping the published eye.
+- **"It's the corner" / "runtime-state, relaunch fixes it" / "disk full"** — all WRONG explanations I gave
+  for the white walls. The truth is §5 (probes starve the dat-reader). Don't repeat these.
+- Carried-over DO-NOTs (still valid): byte-stable eye; bounded-propagation/churn as the WHOLE story (it
+  was the churn LAYER, R-A2b fixed it); two-pipe split; PVS membership grounding retail lacks; the §3
+  "port retail's edge-on clip robustness" (retail has none).
+
+---
+
+## 4. WHAT TO WORK ON NEXT (recommendation)
+
+This flap has consumed many sessions for partial results.
+Honest options, in rough preference order:
+
+1. **Re-validate + keep R-A2b, then attack §2b (the corner camera-seal).** It's the most concrete,
+   measured-wrong behavior (eye demonstrably escapes to an outdoor cell). Root-cause WHY the viewer-sphere
+   sweep finds no collision when the boom reaches the exterior in a corner (the sweep queries the shell but
+   `collNormValid=False`). Fixing the seal (eye stays inside) is retail-faithful AND would dampen 2a (a
+   head-on eye doesn't view openings edge-on). Decomp oracle: `SmartBox::update_viewer` (`:92761`), the
+   `viewer_sphere` sweep + cell enclosure.
+2. **Accept/defer the flap and move to milestone work.** The §2a edge-on flicker is genuinely hard
+   (retail relies on the camera; our 3rd-person floating eye is the divergence) and the visible payoff per
+   session has been low. Per the milestone discipline, M2 ("kill a drudge": F.2/F.3/F.5a/L.1c/L.1b) or
+   other M1.5 issues may be a better use of effort than continued flap grinding. **This is a legitimate
+   call — flag it to the user.**
+3. **Fix the dat-reader thread-safety bug** (the real underlying cause of both the crashes and the
+   white-wall load failures). See §5. It's a correctness/stability win independent of the flap. Filed as a
+   background task this session.
+
+Do NOT silently keep grinding §2a with more camera tweaks — the camera-damping already failed. If pursuing
+the flap, §2b (seal) is the tractable lever; if not, say so and move on.
+
+---
+
+## 5. THE TEXTURE RED-HERRING — the durable lesson
+
+**Symptom:** mid-session, the cottage walls rendered WHITE (geometry present — windows/painting/floor/NPCs
+drew — but the wall surfaces were the clear/background color). I called it "missing textures."
+
+**My failure:** I thrashed — blamed the corner, then "runtime state," then disk, then implied my code,
+across ~5 messages, while the user (rightly) got furious. I should have checked **baseline reproduction**
+immediately.
+
+**Actual root cause (proven):**
+- The wall data wasn't loading. `[flap]` probe showed cell `0171` with **identity transform + zero
+  portals** = an unhydrated/empty cell (a dat-LOAD failure, not a flood/visibility bug — R-A2b only
+  *reads* cell data, it cannot make a cell load empty).
+- **It reproduced on the EXACT baseline** (`git checkout 8f879bd`, `diff` empty, rebuilt) — so NOT my code.
+- Disk fine (1074 GB free); dats intact (normal sizes); no corrupt cache.
+- **The trigger: the heavy per-frame probes** (`ACDREAM_PROBE_FLAP=1` writing `[flap]`/`[pv-trace]`/
+  `sidechk` every frame + `Tee-Object` to hundreds-of-MB logs) **load the render thread enough to skew the
+  timing of the background mesh/texture-decode thread**, which races on the **thread-unsafe dat-reader**
+  (`MemoryMappedBlockAllocator.ReadBlock` — the same code that throws `AccessViolation` and crashed several
+  launches). Under that timing the texture/cell load loses the race → empty/white.
+- **FIX: launch CLEAN (no probes).** `launch-final.log`: no load errors, render path + all atlases loaded,
+  **user confirmed the walls render.** This is also how the user normally runs the client.
+
+**DURABLE RULES (add to memory):**
+- **Never run normal play or a visual gate with `ACDREAM_PROBE_FLAP` (or other per-frame probes) on.**
+  They starve the thread-unsafe dat-reader → intermittent white-wall / empty-cell load failures (and raise
+  the `AccessViolation` crash rate). Use probes ONLY for short, targeted captures, then relaunch clean.
+- **For any render/load symptom, check baseline reproduction FIRST** (`git stash`/`git checkout ` +
+  relaunch) before theorizing or blaming a change. One reproduction test beats five wrong explanations.
+
+**The real bug (separate task, FILED):** `DatReaderWriter ... MemoryMappedBlockAllocator.ReadBlock`
+`AccessViolation` — the dat-reader is not thread-safe under the concurrent mesh-decode/streaming access.
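Below is a minimal Python analogy of that failure mode. It assumes a reader that seeks and then reads on one shared stream. `SharedCursorReader`, `stress`, `expected_block`, and `BLOCK_SIZE` are hypothetical and are not `DatReaderWriter` APIs. The sketch shows why a seek+read pair must be atomic, and how the simplest remedy, a lock held around each block read, fixes it. It does not reproduce the C# `AccessViolation`. The unlocked variant only returns wrong bytes, and only when thread timing happens to interleave.

```python
import io
import threading

BLOCK_SIZE = 16  # hypothetical block size for the demo


class SharedCursorReader:
    """Hypothetical block reader that shares ONE stream position between callers.

    read_block() is a seek followed by a read; without a lock, another thread can
    move the position in between and the caller gets some other block's bytes.
    """

    def __init__(self, data: bytes, locked: bool) -> None:
        self._stream = io.BytesIO(data)
        self._lock = threading.Lock() if locked else None

    def _seek_and_read(self, index: int) -> bytes:
        self._stream.seek(index * BLOCK_SIZE)
        return self._stream.read(BLOCK_SIZE)

    def read_block(self, index: int) -> bytes:
        if self._lock is None:
            return self._seek_and_read(index)
        with self._lock:  # the seek+read pair becomes atomic
            return self._seek_and_read(index)


def expected_block(index: int) -> bytes:
    return bytes([index % 256]) * BLOCK_SIZE


def stress(reader: SharedCursorReader, blocks: int, threads: int = 8, reads: int = 2000) -> int:
    """Read blocks from several threads at once; return how many reads returned wrong bytes."""
    errors = 0
    errors_lock = threading.Lock()

    def worker(seed: int) -> None:
        nonlocal errors
        bad = 0
        for k in range(reads):
            i = (seed * 7919 + k * 31) % blocks
            if reader.read_block(i) != expected_block(i):
                bad += 1
        with errors_lock:
            errors += bad

    pool = [threading.Thread(target=worker, args=(s,)) for s in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return errors


data = b"".join(expected_block(i) for i in range(256))
print("locked:", stress(SharedCursorReader(data, locked=True), 256))     # always 0
print("unlocked:", stress(SharedCursorReader(data, locked=False), 256))  # timing-dependent
```

A lock is the smallest change. The other remedies named for the C# reader (thread-local buffers, or a private `MemoryMappedViewAccessor` per reader) remove the shared position instead of serializing around it. They trade some memory for less contention.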
+
+Fix = serialize dat block reads behind a per-DatDatabase lock, or thread-local buffers, or a private
+`MemoryMappedViewAccessor` per reader. Files: `src/AcDream.App/Rendering/Wb/ObjectMeshManager.cs`,
+`DatDatabaseWrapper.cs`, the `DatReaderWriter` block allocator. This is the highest-value stability fix
+and is independent of the flap.
+
+---
+
+## 6. Apparatus (reuse; STRIP after §4 ships)
+
+- **Probes** (gated in `AcDream.Core.Rendering.RenderingDiagnostics`):
+  - `ACDREAM_PROBE_FLAP=1` → `[flap]` (root cell per-portal D/TRV/CULL/proj/clip + outPolys + vis),
+    `[flap-sweep]` (camera sweep: `pulledIn`/`collNormValid`/`viewerCell`/`in=`/`out=`),
+    `[pv-trace]` (per-cell flood decisions, signature-dedup'd ≤160), and the **`sidechk`** line added this
+    session (per-portal `camInterior`/`eyeIn`/`D`). **HEAVY — see §5; do not leave on for play.**
+  - `ACDREAM_PROBE_PORTAL_CHURN=1` → `[portal-churn]` (per-Build `maxPop` + reciprocal pre→post).
+- **Throwaway analyzers (untracked, in the worktree root):** `analyze_flap_vis.py` (same-root vs root-swap
+  split), `analyze_churn_confirm.py` (maxPop distribution + flap reproduction; takes ` [startByte]`),
+  `analyze_segment.py` (windowed segment: churn/cull/flap/outPolys/camera pull-in from a byte offset).
+  All auto-detect UTF-16 (PowerShell `Tee-Object` writes UTF-16LE — Python must BOM-detect or it reads 0
+  lines).
+- **cdb on retail:** `tools/cdb/flap-cam-measure.cdb` (retail eye + CameraManager; PDB `refs/acclient.pdb`
+  = MATCH). Exit code 5 = clean detach.
+- **The `sidechk` probe** (in `PortalVisibilityBuilder.Build`, after `eyeInsideOpening`) is throwaway —
+  strip it with the rest when §4 ships.
+
+---
+
+## 7. Repo state (exact)
+
+- **HEAD = `485e44d`** (R-A2b fix). Working tree `src/` matches HEAD (verified `git diff HEAD -- src/`
+  empty). RetailChaseCamera.cs is clean (camera-damping fully reverted — `grep _publishedEye` = 0).
+- **Uncommitted tracked:** `docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md` (the PINNED
+  B1 note) — committed with this handoff.
+- **Untracked (throwaway, gitignore/delete):** `analyze_*.py`, `flap-*.log`, `launch-*.log`, the older
+  `*.jsonl`/`*.png` from prior sessions.
+- **The user's last running client was the clean (no-probe) launch and renders correctly.** The source now
+  has R-A2b restored, so the next build will include R-A2b (rebuild + clean-launch to align).
+
+---
+
+## 8. First moves next session
+
+1. Read this doc + memory `project_indoor_flap_rootcause`.
+2. `git log --oneline -6` (HEAD `485e44d`); confirm `git diff HEAD -- src/` empty.
+3. **Clean launch — NO PROBES** (omit `ACDREAM_PROBE_FLAP`). Confirm the cottage renders (walls textured)
+   and, if you want, that R-A2b's churn stays gone (one short `ACDREAM_PROBE_PORTAL_CHURN=1` capture, then
+   relaunch clean). This re-validates R-A2b after the chaos.
+4. Decide §4 direction with the user (per §4): §2b corner-seal (tractable) vs defer-the-flap vs the
+   dat-reader thread-safety fix. Do NOT re-try camera-damping.
diff --git a/docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md b/docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md
index 941c8725..03d2c879 100644
--- a/docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md
+++ b/docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md
@@ -12,6 +12,14 @@
 **Build-while-running gotcha:** the running client locks the DLLs (MSB3027). Always close the client
 (graceful `CloseMainWindow`) before `dotnet build`.
 
+**PINNED (Phase 1 Task 2, 2026-06-09 capture `flap-sidechk.log`): B1 — the `eyeInsideOpening` bypass.**
+Every back portal (`0173→0171` at `D=-1.51…-1.70`; `0172→0173` at `D=1.71`) shows `camInterior=False` (our
+`CameraOnInteriorSide` already agrees with retail — it WANTS to cull) and is traversed **only when
+`eyeIn=True`** (eye within 1.75 m of the shared doorway).
+At `D=-2.32` (farther, `eyeIn=False`) the same back portal is correctly culled. So the cycle is the
+`&& !eyeInsideOpening` bypass. Forward portals (`0171→0173`) show `camInterior=True` (unaffected; the
+clip-empty void rescue is preserved). **Fix = Branch B1 (Task 4): drop `&& !eyeInsideOpening` from the
+side-cull.** B2 (side-test convention) is NOT needed.
+
 ---
 
 ## PHASE 1 — Pin the back-portal traversal mechanism (B1 vs B2)