docs(render): R-A2b shipped + flap residual (sec 4) + texture red-herring handoff

R-A2b (485e44d) killed the 0171<->0173 churn (maxPop 16->1, measured). Visible flap residual is sec 4 (edge-on openings render-side + corner camera-seal). Camera-damping tried+failed+reverted. The white-walls scare was a RED HERRING: heavy per-frame probes (ACDREAM_PROBE_FLAP) starve the thread-unsafe dat-reader so texture-decode loses the race -> white; a clean launch (no probes) fixes it. The dat-reader thread-safety bug is the real underlying issue (filed). Repo clean at HEAD.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Erik 2026-06-09 18:44:33 +02:00
parent 485e44d163
commit a1b12dff40
2 changed files with 227 additions and 0 deletions


@ -0,0 +1,219 @@
# HANDOFF — R-A2b shipped (churn killed) · flap residual is §4 · the texture RED-HERRING
**Date:** 2026-06-09. **Branch:** `claude/thirsty-goldberg-51bb9b`. **HEAD:** `485e44d`.
**Milestone:** M1.5 (indoor world feels right). **Phase:** full retail render port (Option A) → R-A2b done; §4 open.
> Read this top-to-bottom before any code. This session shipped a real, MEASURED fix (R-A2b) but the
> VISIBLE flap is only partly resolved, and a **separate runtime bug (the dat-reader) + a self-inflicted
> diagnostic mistake (heavy probes)** caused a long, confusing detour ("missing textures"). The §5 lesson
> at the bottom is the most important durable takeaway.
---
## 0. TL;DR
1. **R-A2b SHIPPED + MEASURED (commit `485e44d`):** removed the `&& !eyeInsideOpening` bypass from the
portal-flood **side-cull** so back portals cull like retail's `PView::InitCell` side test. This kills
the `0171↔0173` re-enqueue **churn** — measured `maxPop` **16 → 1** (44 % of frames churning → 0 %,
across 1.3 M frames). The flood is now deterministic. 218 App tests green, including the void-fix and
#95 over-inclusion guards.
2. **But R-A2b did NOT fix the VISIBLE flap.** The remaining grey flicker at doorways/windows is **§4**, a
*different* mechanism: the openings project **edge-on** from the 3rd-person eye and the clip collapses
(geometric — retail's clip collapses too; retail avoids it by keeping the eye head-on/collided). R-A2b
killed the *churn* layer; the *edge-on* layer remains.
3. **The camera-damping attempt FAILED and was reverted** (laggy, no flap improvement). Do not re-try it.
4. **The "missing textures" scare was a RED HERRING, not a code bug.** The cottage walls rendered WHITE
because the **heavy debug probes** (`ACDREAM_PROBE_FLAP=1` + per-frame `Tee` to multi-hundred-MB logs)
**starve the thread-unsafe dat-reader**, which then fails to decode the wall textures. **A CLEAN launch
(no probes) renders correctly** — user-confirmed. The underlying dat-reader thread-safety bug (the
`AccessViolation` crashes) is real and FILED.
5. **Repo is clean at HEAD `485e44d`** (R-A2b in). Only uncommitted tracked change: the plan-doc PINNED
note (committed alongside this handoff). Throwaway `analyze_*.py` + `*.log` are untracked.
---
## 1. What shipped — R-A2b (the churn fix), commit `485e44d`
**The change (one functional edit + a probe):** in `PortalVisibilityBuilder.cs`, the side-cull was
`if (i < ClipPlanes.Count && !CameraOnInteriorSide(...) && !eyeInsideOpening) continue;`. R-A2b drops the
`&& !eyeInsideOpening`, in BOTH `Build` and `BuildFromExterior`. Retail's `PView::InitCell` side test
(decomp `:432962`) culls a back-facing portal by the side test ALONE — there is no eye-in-opening bypass.
The forward-portal clip-empty **void rescue** (`Build` ~`clippedRegion.Count == 0` branch, the 2026-06-05
fix) is a SEPARATE code path and is **untouched**: `Build_EyeStandingInInteriorPortal_FloodsNeighbour`
stays green.
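
The predicate change can be modeled outside the engine. The sketch below is a Python stand-in, not the C# in `PortalVisibilityBuilder.cs`: the class and function names are hypothetical, and the three booleans only stand for `i < ClipPlanes.Count`, `CameraOnInteriorSide(...)`, and `eyeInsideOpening`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortalSideState:
    """Hypothetical per-portal inputs to the side-cull (mirrors the sidechk fields)."""
    has_clip_plane: bool      # stands in for `i < ClipPlanes.Count`
    cam_interior: bool        # stands in for CameraOnInteriorSide(...)
    eye_inside_opening: bool  # stands in for eyeInsideOpening


def culled_before(p: PortalSideState) -> bool:
    # Pre-R-A2b: an eye standing in the opening bypassed the side test.
    return p.has_clip_plane and not p.cam_interior and not p.eye_inside_opening


def culled_after(p: PortalSideState) -> bool:
    # R-A2b: the side test alone decides, with no eye-in-opening bypass.
    return p.has_clip_plane and not p.cam_interior


# A back portal seen while the eye stands in the doorway (the churn case).
back_portal = PortalSideState(has_clip_plane=True, cam_interior=False,
                              eye_inside_opening=True)
# A forward portal: camera on the interior side, so it is never side-culled.
forward_portal = PortalSideState(has_clip_plane=True, cam_interior=True,
                                 eye_inside_opening=True)
```

In this model the back portal is traversed before the change and culled after it, while the forward portal is unaffected either way, which matches the `sidechk` observations described in this section.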
**The pin (B1, not B2):** live capture `flap-sidechk.log` showed every back portal (`0173→0171`,
`0172→0173`) with `camInterior=False` (our side test already agrees with retail — it WANTS to cull) and
traversed **only when `eyeIn=True`** (eye within 1.75 m of the shared doorway). So the cycle was the
`eyeInsideOpening` **bypass**, not a `CameraOnInteriorSide` convention bug. Forward portals showed
`camInterior=True` (unaffected; void rescue preserved).
**Measured result:**
- `launch-churn-confirm.log` (pre-fix walk): `maxPop` worst **16**, **44 %** of frames `≥2`.
- `flap-fix-verify.log` / `flap-structured.log` (post-fix): `maxPop` worst **1**, **0 %** `≥2` across
~1.3 M frames. The back portal is now `skip=side` (culled) even at `eyeIn=True`. **Churn eliminated.**
- New RED→GREEN test `Build_BackFacingPortal_EyeStandingInOpening_StillCulled`; full App suite 218 green.
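
For reference, the two headline numbers (worst `maxPop` and the share of frames at `≥2`) reduce to a small computation. This is a minimal sketch, assuming the per-frame `maxPop` values have already been parsed into integers; the function name is hypothetical and the log parsing done by `analyze_churn_confirm.py` is not reproduced here.

```python
from typing import Iterable, Tuple


def summarize_max_pop(values: Iterable[int]) -> Tuple[int, float]:
    """Return (worst maxPop, percent of frames with maxPop >= 2)."""
    frames = list(values)
    if not frames:
        raise ValueError("no maxPop samples")
    churning = sum(1 for v in frames if v >= 2)
    return max(frames), 100.0 * churning / len(frames)
```

A frame counts as churning when any cell was popped more than once in that `Build`, so a post-fix capture should report a worst value of 1 and 0 % churning.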
**Docs:** spec `docs/superpowers/specs/2026-06-09-portal-flood-bounded-propagation-r-a2b-design.md`
(REVISION → Option B), plan `docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md` (with the
PINNED B1 note). Commits: `3fd71a1` (spec) → `7b8a490` (plan) → `89a2032` (sidechk probe) → `485e44d` (fix).
**R-A2b status / disposition:** committed at HEAD. It is a real, retail-faithful, measured improvement
(no more churn → deterministic flood, better perf). It is **innocent of the texture issue** (§5). It does
NOT, by itself, make the visible flap go away (that's §4). **Recommendation: keep it.** Next session
should do ONE clean (no-probe) launch to re-confirm it renders + the churn stays gone, then move to §4.
(Plan Phase 3 — removing the now-dead `MaxReprocessPerCell` cap — was NOT done; it's optional cleanup,
risky for the synthetic cyclic tests that have no ClipPlanes. Leave the cap as a harmless backstop, or do
it carefully with fixture ClipPlanes.)
---
## 2. The VISIBLE flap residual = §4 (NOT fixed by R-A2b)
Two DISTINCT sub-issues, both ending in "background where geometry should be." Do not conflate them.
### 2a. Doorway / window grey flap (room↔room) — RENDER-side edge-on
- Measured (`flap-structured.log`, the user's room→room pass): root `0171` `vis` oscillates **2↔3↔4**
(1,725 transitions); `OutsideView` `outPolys` oscillates **0↔1** (1,544 transitions) → the
outdoor-through-window region flips **empty(grey)↔drawn**; **8,574** frames have an edge-on `clip=0`.
- The eye is **collision-correct** here: `viewerCell=0171` (the adjacent INTERIOR cell), ~1.7 m back, no
wall hit. So it is NOT penetration. The openings simply project **edge-on** from that legitimate
position → the clip collapses to <3 verts → grey.
- **§5.3 (decomp) established the clip collapse is GEOMETRIC — retail's `polyClipFinish` collapses at
edge-on too. There is no clip robustness to port.** Retail avoids the flicker because its eye is
**collided/head-on 93 %** at the doorway (`flap-cam-measure.log`), so it rarely views openings edge-on.
Ours **floats** (`pulledIn~0`/`collNormValid=False` **97 %**).
- So 2a is bounded by the camera, not the clip. But the camera-damping attempt to address it FAILED (§3).
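
The geometric claim in 2a can be illustrated with a toy projection. This is a sketch under stated assumptions, not the engine's clip or retail's `polyClipFinish`: a rectangular opening is yawed about its vertical axis in front of a pinhole eye at the origin looking down +z, and the function (a hypothetical name, with made-up default dimensions) returns the screen-space area of the projected quad.

```python
import math


def projected_area(yaw_deg: float, distance: float = 3.0,
                   half_w: float = 0.5, half_h: float = 1.0) -> float:
    """Screen-space area of a yawed rectangle under a simple x/z, y/z projection."""
    yaw = math.radians(yaw_deg)
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        x = sx * half_w * math.cos(yaw)             # horizontal extent shrinks with yaw
        y = sy * half_h
        z = distance + sx * half_w * math.sin(yaw)  # one side nearer, one farther
        corners.append((x / z, y / z))
    # Shoelace formula over the projected quad.
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(corners, corners[1:] + corners[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0
```

At `yaw_deg=0` the opening is head-on and has full area; as the yaw approaches 90° the area tends to zero, so any polygon clipped against it degenerates regardless of how robust the clipper is. That is why the lever is the eye position rather than the clip.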
### 2b. Corner see-through — CAMERA-SEAL failure (eye penetrates wall)
- Live-captured (`flap-corner.log` tail): pressing the camera into a corner → eye escapes to **outdoor
cell `0xA9B40031`** at `X=165.22` (well outside the cottage), `pulledIn=0`, `collNormValid=False` (NO
collision stopped the 2.61 m boom) → render roots **outside** → the ENTIRE interior drops → bluish
background. Retail "under no circumstances sees through the wall."
- The viewer sweep **does** query the cottage exterior-shell GfxObj (`FindObjCollisions` passes
`isViewer`, `TransitionTypes.cs:2376`), so it's not a "shell not queried" gap. The boom slips through
somewhere (corner seam? the interior doorway is open so the boom legitimately enters the next room, but
in a corner it reaches the exterior). **Not fully root-caused.** This is the most CONCRETE, tractable §4
bug — fixing the camera seal (keep the eye inside, like retail) would also reduce 2a (head-on eye).
---
## 3. DO-NOT-RETRY (failed/refuted this session)
- **Camera-damping spring-arm on the published eye** (a separate `_publishedEye` damped toward the
collision result, `RetailChaseCamera.cs`). TRIED, **FAILED**: laggy camera feel, zero flap improvement
(the forward-crossing flap has no collision to damp; the eye is already smooth). Fully reverted
(`git checkout`, verified). Do not re-try damping the published eye.
- **"It's the corner" / "runtime-state, relaunch fixes it" / "disk full"** — all WRONG explanations I gave
for the white walls. The truth is §5 (probes starve the dat-reader). Don't repeat these.
- Carried-over DO-NOTs (still valid): byte-stable eye; bounded-propagation/churn as the WHOLE story (it
was the churn LAYER, R-A2b fixed it); two-pipe split; PVS membership grounding retail lacks; the §3
"port retail's edge-on clip robustness" (retail has none).
---
## 4. WHAT TO WORK ON NEXT (recommendation)
This flap has consumed many sessions for partial results. Honest options, in rough preference order:
1. **Re-validate + keep R-A2b, then attack §2b (the corner camera-seal).** It's the most concrete,
measured-wrong behavior (eye demonstrably escapes to an outdoor cell). Root-cause WHY the viewer-sphere
sweep finds no collision when the boom reaches the exterior in a corner (the sweep queries the shell but
`collNormValid=False`). Fixing the seal (eye stays inside) is retail-faithful AND would dampen 2a (a
head-on eye doesn't view openings edge-on). Decomp oracle: `SmartBox::update_viewer` (`:92761`), the
`viewer_sphere` sweep + cell enclosure.
2. **Accept/defer the flap and move to milestone work.** The §2a edge-on flicker is genuinely hard
(retail relies on the camera; our 3rd-person floating eye is the divergence) and the visible payoff per
session has been low. Per the milestone discipline, M2 ("kill a drudge": F.2/F.3/F.5a/L.1c/L.1b) or
other M1.5 issues may be a better use of effort than continued flap grinding. **This is a legitimate
call — flag it to the user.**
3. **Fix the dat-reader thread-safety bug** (the real underlying cause of both the crashes and the
white-wall load failures). See §5. It's a correctness/stability win independent of the flap. Filed as a
background task this session.
Do NOT silently keep grinding §2a with more camera tweaks — the camera-damping already failed. If pursuing
the flap, §2b (seal) is the tractable lever; if not, say so and move on.
---
## 5. THE TEXTURE RED-HERRING — the durable lesson
**Symptom:** mid-session, the cottage walls rendered WHITE (geometry present — windows/painting/floor/NPCs
drew — but the wall surfaces were the clear/background color). I called it "missing textures."
**My failure:** I thrashed — blamed the corner, then "runtime state," then disk, then implied my code,
across ~5 messages, while the user (rightly) got furious. I should have checked **baseline reproduction**
immediately.
**Actual root cause (proven):**
- The wall data wasn't loading. `[flap]` probe showed cell `0171` with **identity transform + zero
portals** = an unhydrated/empty cell (a dat-LOAD failure, not a flood/visibility bug — R-A2b only
*reads* cell data, it cannot make a cell load empty).
- **It reproduced on the EXACT baseline** (`git checkout 8f879bd`, `diff` empty, rebuilt) — so NOT my code.
- Disk fine (1074 GB free); dats intact (normal sizes); no corrupt cache.
- **The trigger: the heavy per-frame probes** (`ACDREAM_PROBE_FLAP=1` writing `[flap]`/`[pv-trace]`/
`sidechk` every frame + `Tee-Object` to hundreds-of-MB logs) **load the render thread enough to skew the
timing of the background mesh/texture-decode thread**, which races on the **thread-unsafe dat-reader**
(`MemoryMappedBlockAllocator.ReadBlock` — the same code that throws `AccessViolation` and crashed several
launches). Under that timing the texture/cell load loses the race → empty/white.
- **FIX: launch CLEAN (no probes).** `launch-final.log`: no load errors, render path + all atlases loaded,
**user confirmed the walls render.** This is also how the user normally runs the client.
**DURABLE RULES (add to memory):**
- **Never run normal play or a visual gate with `ACDREAM_PROBE_FLAP` (or other per-frame probes) on.**
They starve the thread-unsafe dat-reader → intermittent white-wall / empty-cell load failures (and raise
the `AccessViolation` crash rate). Use probes ONLY for short, targeted captures, then relaunch clean.
- **For any render/load symptom, check baseline reproduction FIRST** (`git stash`/`git checkout <baseline>`
+ relaunch) before theorizing or blaming a change. One reproduction test beats five wrong explanations.
**The real bug (separate task, FILED):** `DatReaderWriter ... MemoryMappedBlockAllocator.ReadBlock`
`AccessViolation` — the dat-reader is not thread-safe under the concurrent mesh-decode/streaming access.
Fix = serialize dat block reads behind a per-DatDatabase lock, or thread-local buffers, or a private
`MemoryMappedViewAccessor` per reader. Files: `src/AcDream.App/Rendering/Wb/ObjectMeshManager.cs`,
`DatDatabaseWrapper.cs`, the `DatReaderWriter` block allocator. This is the highest-value stability fix
and is independent of the flap.
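
The first option above (serialize block reads behind a per-database lock) has roughly the following shape. This is a Python model of the idea only: `LockedBlockReader` and the `read_block` callable are hypothetical, nothing here is the `DatReaderWriter` API, and the real fix would presumably use a C# `lock` around the equivalent read.

```python
import threading
from typing import Callable


class LockedBlockReader:
    """Wraps a non-thread-safe block read so only one caller runs it at a time."""

    def __init__(self, read_block: Callable[[int, int], bytes]) -> None:
        self._read_block = read_block   # the unsafe underlying read (assumed signature)
        self._lock = threading.Lock()   # one lock per wrapped database

    def read_block(self, offset: int, length: int) -> bytes:
        # Every thread (render, mesh-decode, streaming) funnels through this lock.
        with self._lock:
            return self._read_block(offset, length)
```

One lock per database keeps reads against different dats independent. The trade-off is that decode threads hitting the same dat now queue behind each other, which is the cost the thread-local-buffer and private-accessor options try to avoid.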
---
## 6. Apparatus (reuse; STRIP after §4 ships)
- **Probes** (gated in `AcDream.Core.Rendering.RenderingDiagnostics`):
- `ACDREAM_PROBE_FLAP=1` → `[flap]` (root cell per-portal D/TRV/CULL/proj/clip + outPolys + vis),
`[flap-sweep]` (camera sweep: `pulledIn`/`collNormValid`/`viewerCell`/`in=`/`out=`),
`[pv-trace]` (per-cell flood decisions, signature-dedup'd ≤160), and the **`sidechk`** line added this
session (per-portal `camInterior`/`eyeIn`/`D`). **HEAVY — see §5; do not leave on for play.**
- `ACDREAM_PROBE_PORTAL_CHURN=1` → `[portal-churn]` (per-Build `maxPop` + reciprocal pre→post).
- **Throwaway analyzers (untracked, in the worktree root):** `analyze_flap_vis.py` (same-root vs root-swap
split), `analyze_churn_confirm.py` (maxPop distribution + flap reproduction; takes `<log> [startByte]`),
`analyze_segment.py` (windowed segment: churn/cull/flap/outPolys/camera pull-in from a byte offset).
All auto-detect UTF-16 (PowerShell `Tee-Object` writes UTF-16LE — Python must BOM-detect or it reads 0
lines).
- **cdb on retail:** `tools/cdb/flap-cam-measure.cdb` (retail eye + CameraManager; PDB `refs/acclient.pdb`
= MATCH). Exit code 5 = clean detach.
- **The `sidechk` probe** (in `PortalVisibilityBuilder.Build`, after `eyeInsideOpening`) is throwaway —
strip it with the rest when §4 ships.
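
The UTF-16 note in the analyzers bullet is the easiest thing to get wrong when writing a new throwaway script. A minimal sketch of the BOM check, assuming the log is UTF-16 with a BOM, UTF-8 with a BOM, or plain UTF-8; the helper names are hypothetical and this is not copied from the existing `analyze_*.py` files.

```python
import codecs
from typing import List


def detect_log_encoding(head: bytes) -> str:
    """Pick a codec from the first bytes of a log file."""
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # this codec consumes the BOM and picks the byte order from it
    return "utf-8"       # assumption: anything without a BOM is UTF-8


def read_log_lines(path: str) -> List[str]:
    """Read a probe log as text lines regardless of how it was captured."""
    with open(path, "rb") as raw:
        head = raw.read(4)
    with open(path, "r", encoding=detect_log_encoding(head), errors="replace") as text:
        return [line.rstrip("\r\n") for line in text]
```

Sniffing the first few bytes and then reopening in text mode keeps the line-reading code identical for both encodings, so the same analyzer works on a `Tee-Object` capture and on a log written by another tool.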
---
## 7. Repo state (exact)
- **HEAD = `485e44d`** (R-A2b fix). Working tree `src/` matches HEAD (verified `git diff HEAD -- src/`
empty). `RetailChaseCamera.cs` is clean (camera-damping fully reverted; `grep _publishedEye` = 0).
- **Uncommitted tracked:** `docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md` (the PINNED
B1 note) — committed with this handoff.
- **Untracked (throwaway, gitignore/delete):** `analyze_*.py`, `flap-*.log`, `launch-*.log`, the older
`*.jsonl`/`*.png` from prior sessions.
- **The user's last running client was the clean (no-probe) launch and renders correctly.** The source now
has R-A2b restored, so the next build will include R-A2b (rebuild + clean-launch to align).
---
## 8. First moves next session
1. Read this doc + memory `project_indoor_flap_rootcause`.
2. `git log --oneline -6` (HEAD `485e44d`); confirm `git diff HEAD -- src/` empty.
3. **Clean launch — NO PROBES** (omit `ACDREAM_PROBE_FLAP`). Confirm the cottage renders (walls textured)
and, if you want, that R-A2b's churn stays gone (one short `ACDREAM_PROBE_PORTAL_CHURN=1` capture, then
relaunch clean). This re-validates R-A2b after the chaos.
4. Decide the next direction with the user (options in §4): §2b corner-seal (tractable) vs defer-the-flap vs the
dat-reader thread-safety fix. Do NOT re-try camera-damping.


@ -12,6 +12,14 @@
**Build-while-running gotcha:** the running client locks the DLLs (MSB3027). Always close the client (graceful `CloseMainWindow`) before `dotnet build`.
**PINNED (Phase 1 Task 2, 2026-06-09 capture `flap-sidechk.log`): B1 — the `eyeInsideOpening` bypass.**
Every back portal (`0173→0171` at `D=-1.51…-1.70`; `0172→0173` at `D=1.71`) shows `camInterior=False` (our
`CameraOnInteriorSide` already agrees with retail — it WANTS to cull) and is traversed **only when
`eyeIn=True`** (eye within 1.75 m of the shared doorway). At `D=-2.32` (farther, `eyeIn=False`) the same
back portal is correctly culled. So the cycle is the `&& !eyeInsideOpening` bypass. Forward portals
(`0171→0173`) show `camInterior=True` (unaffected; the clip-empty void rescue is preserved). **Fix = Branch
B1 (Task 4): drop `&& !eyeInsideOpening` from the side-cull.** B2 (side-test convention) is NOT needed.
---
## PHASE 1 — Pin the back-portal traversal mechanism (B1 vs B2)