docs(render): R-A2b shipped + flap residual (sec 4) + texture red-herring handoff
R-A2b (485e44d) killed the 0171<->0173 churn (maxPop 16->1, measured). Visible flap residual is sec 4 (edge-on openings render-side + corner camera-seal). Camera-damping tried+failed+reverted. The white-walls scare was a RED HERRING: heavy per-frame probes (ACDREAM_PROBE_FLAP) starve the thread-unsafe dat-reader so texture-decode loses the race -> white; a clean launch (no probes) fixes it. The dat-reader thread-safety bug is the real underlying issue (filed). Repo clean at HEAD.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 485e44d163
commit a1b12dff40
2 changed files with 227 additions and 0 deletions

@@ -0,0 +1,219 @@
# HANDOFF: R-A2b shipped (churn killed) · flap residual is §4 · the texture RED-HERRING

**Date:** 2026-06-09. **Branch:** `claude/thirsty-goldberg-51bb9b`. **HEAD:** `485e44d`.
**Milestone:** M1.5 (indoor world feels right). **Phase:** full retail render port (Option A) → R-A2b done; §4 open.

> Read this top-to-bottom before any code. This session shipped a real, MEASURED fix (R-A2b) but the
> VISIBLE flap is only partly resolved, and a **separate runtime bug (the dat-reader) + a self-inflicted
> diagnostic mistake (heavy probes)** caused a long, confusing detour ("missing textures"). The lesson
> in §5 is the most important durable takeaway.

---
## 0. TL;DR

1. **R-A2b SHIPPED + MEASURED (commit `485e44d`):** removed the `&& !eyeInsideOpening` bypass from the
   portal-flood **side-cull** so back portals cull like retail's `PView::InitCell` side test. This kills
   the `0171↔0173` re-enqueue **churn**: measured `maxPop` **16 → 1** (44 % of frames churning → 0 %,
   across 1.3 M frames). The flood is now deterministic. 218 App tests green, including the void-fix and
   #95 over-inclusion guards.
2. **But R-A2b did NOT fix the VISIBLE flap.** The remaining grey flicker at doorways/windows is **§4**, a
   *different* mechanism: the openings project **edge-on** from the 3rd-person eye and the clip collapses
   (geometric: retail's clip collapses too; retail avoids it by keeping the eye head-on/collided). R-A2b
   killed the *churn* layer; the *edge-on* layer remains.
3. **The camera-damping attempt FAILED and was reverted** (laggy, no flap improvement). Do not re-try it.
4. **The "missing textures" scare was a RED HERRING, not a code bug.** The cottage walls rendered WHITE
   because the **heavy debug probes** (`ACDREAM_PROBE_FLAP=1` + per-frame `Tee-Object` to
   multi-hundred-MB logs) **starve the thread-unsafe dat-reader**, which then fails to decode the wall
   textures. **A CLEAN launch (no probes) renders correctly** (user-confirmed). The underlying dat-reader
   thread-safety bug (the `AccessViolation` crashes) is real and FILED.
5. **Repo is clean at HEAD `485e44d`** (R-A2b in). Only uncommitted tracked change: the plan-doc PINNED
   note (committed alongside this handoff). Throwaway `analyze_*.py` + `*.log` are untracked.

---
## 1. What shipped: R-A2b (the churn fix), commit `485e44d`

**The change (one functional edit + a probe):** in `PortalVisibilityBuilder.cs`, the side-cull was
`if (i < ClipPlanes.Count && !CameraOnInteriorSide(...) && !eyeInsideOpening) continue;`. R-A2b drops the
`&& !eyeInsideOpening`, in BOTH `Build` and `BuildFromExterior`. Retail's `PView::InitCell` side test
(decomp `:432962`) culls a back-facing portal by the side test ALONE; there is no eye-in-opening bypass.
The forward-portal clip-empty **void rescue** (`Build` ~`clippedRegion.Count == 0` branch, the 2026-06-05
fix) is a SEPARATE code path and is **untouched**: `Build_EyeStandingInInteriorPortal_FloodsNeighbour`
stays green.
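
The side-cull decision before and after R-A2b can be modelled in a few lines. This is only a Python sketch of the boolean logic quoted above; the real code is C# in `PortalVisibilityBuilder.cs`, and the function and parameter names here are illustrative assumptions, not the project's API.

```python
def side_culled(has_clip_plane: bool,
                cam_on_interior_side: bool,
                eye_inside_opening: bool,
                *,
                legacy_bypass: bool) -> bool:
    """Return True when the portal is skipped by the side test.

    has_clip_plane        models `i < ClipPlanes.Count`
    cam_on_interior_side  models `CameraOnInteriorSide(...)`
    eye_inside_opening    models `eyeInsideOpening`
    legacy_bypass         True = pre-R-A2b behaviour, False = R-A2b
    """
    culled = has_clip_plane and not cam_on_interior_side
    if legacy_bypass and eye_inside_opening:
        # Pre-R-A2b: an eye standing in the opening overrode the side test,
        # so the back portal was traversed again.
        culled = False
    return culled


# Back portal (camInterior=False) with the eye standing in the opening:
before = side_culled(True, False, True, legacy_bypass=True)
after = side_culled(True, False, True, legacy_bypass=False)
```

Under these assumptions `before` is `False` (the back portal is traversed, which is the re-enqueue path) and `after` is `True` (culled by the side test alone). A forward portal (`cam_on_interior_side=True`) is never culled in either mode, matching the note that forward portals were unaffected.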

**The pin (B1, not B2):** live capture `flap-sidechk.log` showed every back portal (`0173→0171`,
`0172→0173`) with `camInterior=False` (our side test already agrees with retail: it WANTS to cull) and
traversed **only when `eyeIn=True`** (eye within 1.75 m of the shared doorway). So the cycle was the
`eyeInsideOpening` **bypass**, not a `CameraOnInteriorSide` convention bug. Forward portals showed
`camInterior=True` (unaffected; void rescue preserved).

**Measured result:**

- `launch-churn-confirm.log` (pre-fix walk): `maxPop` worst **16**, **44 %** of frames `≥2`.
- `flap-fix-verify.log` / `flap-structured.log` (post-fix): `maxPop` worst **1**, **0 %** `≥2` across
  ~1.3 M frames. The back portal is now `skip=side` (culled) even at `eyeIn=True`. **Churn eliminated.**
- New RED→GREEN test `Build_BackFacingPortal_EyeStandingInOpening_StillCulled`; full App suite 218 green.
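
For reference, the two headline figures (worst `maxPop` and the share of frames at `≥2`) reduce to a small computation over the per-frame samples. The sketch below assumes the samples have already been extracted into a list of integers; parsing is left out because the `[portal-churn]` line format is not reproduced in this doc, and `churn_stats` is a hypothetical helper name, not one of the existing analyzers.

```python
def churn_stats(max_pops):
    """Return (worst maxPop, percent of samples with maxPop >= 2).

    `max_pops` is a non-empty sequence of per-frame (per-Build) maxPop values.
    """
    if not max_pops:
        raise ValueError("no maxPop samples")
    worst = max(max_pops)
    churning = sum(1 for m in max_pops if m >= 2)
    return worst, 100.0 * churning / len(max_pops)


# Tiny made-up samples, only to show the shape of the result.
pre_fix = churn_stats([1, 16, 2, 1])
post_fix = churn_stats([1, 1, 1, 1])
```

A frame counts as "churning" when any cell was popped more than once in that Build, which is why the threshold is `>= 2`.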

**Docs:** spec `docs/superpowers/specs/2026-06-09-portal-flood-bounded-propagation-r-a2b-design.md`
(REVISION → Option B), plan `docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md` (with the
PINNED B1 note). Commits: `3fd71a1` (spec) → `7b8a490` (plan) → `89a2032` (sidechk probe) → `485e44d` (fix).

**R-A2b status / disposition:** committed at HEAD. It is a real, retail-faithful, measured improvement
(no more churn → deterministic flood, better perf). It is **innocent of the texture issue** (§5). It does
NOT, by itself, make the visible flap go away (that's §4). **Recommendation: keep it.** Next session
should do ONE clean (no-probe) launch to re-confirm it renders + the churn stays gone, then move to §4.
(Plan Phase 3, removing the now-dead `MaxReprocessPerCell` cap, was NOT done; it's optional cleanup,
risky for the synthetic cyclic tests that have no ClipPlanes. Leave the cap as a harmless backstop, or do
it carefully with fixture ClipPlanes.)

---
## 2. The VISIBLE flap residual = §4 (NOT fixed by R-A2b)

Two DISTINCT sub-issues, both ending in "background where geometry should be." Do not conflate them.

### 2a. Doorway / window grey flap (room↔room): RENDER-side edge-on

- Measured (`flap-structured.log`, the user's room→room pass): root `0171` `vis` oscillates **2↔3↔4**
  (1,725 transitions); `OutsideView` `outPolys` oscillates **0↔1** (1,544 transitions) → the
  outdoor-through-window region flips **empty(grey)↔drawn**; **8,574** frames have an edge-on `clip=0`.
- The eye is **collision-correct** here: `viewerCell=0171` (the adjacent INTERIOR cell), ~1.7 m back, no
  wall hit. So it is NOT penetration. The openings simply project **edge-on** from that legitimate
  position → the clip collapses to <3 verts → grey.
- **§5.3 (decomp) established the clip collapse is GEOMETRIC: retail's `polyClipFinish` collapses at
  edge-on too. There is no clip robustness to port.** Retail avoids the flicker because its eye is
  **collided/head-on 93 %** at the doorway (`flap-cam-measure.log`), so it rarely views openings edge-on.
  Ours **floats** (`pulledIn~0`/`collNormValid=False` **97 %**).
- So 2a is bounded by the camera, not the clip. But the camera-damping attempt to address it FAILED (§3).
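
Why the collapse is geometric rather than a clipper defect can be seen with a toy projection. The sketch below is not the project's clipper or retail's `polyClipFinish`; it just perspective-projects a made-up 1 m × 2 m doorway quad from two eye positions and measures the projected area. All coordinates and helper names are assumptions for illustration.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def projected_area(eye, quad, up=(0.0, 0.0, 1.0)):
    """Area of `quad` after perspective projection, with the eye looking at the quad's centre."""
    centre = tuple(sum(v[i] for v in quad) / len(quad) for i in range(3))
    fwd = _norm(_sub(centre, eye))
    right = _norm(_cross(fwd, up))
    true_up = _cross(right, fwd)
    pts = []
    for v in quad:
        d = _sub(v, eye)
        depth = _dot(d, fwd)  # positive for every vertex in the examples below
        pts.append((_dot(d, right) / depth, _dot(d, true_up) / depth))
    # Shoelace formula over the projected polygon.
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# Doorway lying in the plane x = 0.
DOOR = [(0.0, -0.5, 0.0), (0.0, 0.5, 0.0), (0.0, 0.5, 2.0), (0.0, -0.5, 2.0)]

head_on = projected_area((3.0, 0.0, 1.0), DOOR)  # eye in front of the door
edge_on = projected_area((0.0, 3.0, 1.0), DOOR)  # eye inside the door's plane
```

With the eye in the doorway's own plane every projected vertex lands on one line, so the polygon has zero area and any clipper has nothing left to draw. That is consistent with the finding that only the eye position (head-on vs edge-on) separates the two cases.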
### 2b. Corner see-through: CAMERA-SEAL failure (eye penetrates wall)

- Live-captured (`flap-corner.log` tail): pressing the camera into a corner → eye escapes to **outdoor
  cell `0xA9B40031`** at `X=165.22` (well outside the cottage), `pulledIn=0`, `collNormValid=False` (NO
  collision stopped the 2.61 m boom) → render roots **outside** → the ENTIRE interior drops → bluish
  background. Retail "under no circumstances sees through the wall."
- The viewer sweep **does** query the cottage exterior-shell GfxObj (`FindObjCollisions` passes
  `isViewer`, `TransitionTypes.cs:2376`), so it's not a "shell not queried" gap. The boom slips through
  somewhere (corner seam? the interior doorway is open so the boom legitimately enters the next room, but
  in a corner it reaches the exterior). **Not fully root-caused.** This is the most CONCRETE, tractable §4
  bug: fixing the camera seal (keep the eye inside, like retail) would also reduce 2a (head-on eye).

---
## 3. DO-NOT-RETRY (failed/refuted this session)

- **Camera-damping spring-arm on the published eye** (a separate `_publishedEye` damped toward the
  collision result, `RetailChaseCamera.cs`). TRIED, **FAILED**: laggy camera feel, zero flap improvement
  (the forward-crossing flap has no collision to damp; the eye is already smooth). Fully reverted
  (`git checkout`, verified). Do not re-try damping the published eye.
- **"It's the corner" / "runtime-state, relaunch fixes it" / "disk full"**: all WRONG explanations I gave
  for the white walls. The truth is §5 (probes starve the dat-reader). Don't repeat these.
- Carried-over DO-NOTs (still valid): byte-stable eye; bounded-propagation/churn as the WHOLE story (it
  was the churn LAYER, R-A2b fixed it); two-pipe split; PVS membership grounding retail lacks; the §3
  "port retail's edge-on clip robustness" (retail has none).

---
## 4. WHAT TO WORK ON NEXT (recommendation)

This flap has consumed many sessions for partial results. Honest options, in rough preference order:

1. **Re-validate + keep R-A2b, then attack §2b (the corner camera-seal).** It's the most concrete,
   measured-wrong behavior (eye demonstrably escapes to an outdoor cell). Root-cause WHY the viewer-sphere
   sweep finds no collision when the boom reaches the exterior in a corner (the sweep queries the shell but
   `collNormValid=False`). Fixing the seal (eye stays inside) is retail-faithful AND would reduce 2a (a
   head-on eye doesn't view openings edge-on). Decomp oracle: `SmartBox::update_viewer` (`:92761`), the
   `viewer_sphere` sweep + cell enclosure.
2. **Accept/defer the flap and move to milestone work.** The §2a edge-on flicker is genuinely hard
   (retail relies on the camera; our 3rd-person floating eye is the divergence) and the visible payoff per
   session has been low. Per the milestone discipline, M2 ("kill a drudge": F.2/F.3/F.5a/L.1c/L.1b) or
   other M1.5 issues may be a better use of effort than continued flap grinding. **This is a legitimate
   call; flag it to the user.**
3. **Fix the dat-reader thread-safety bug** (the real underlying cause of both the crashes and the
   white-wall load failures). See §5. It's a correctness/stability win independent of the flap. Filed as a
   background task this session.

Do NOT silently keep grinding §2a with more camera tweaks; the camera-damping already failed. If pursuing
the flap, §2b (seal) is the tractable lever; if not, say so and move on.

---
## 5. THE TEXTURE RED-HERRING: the durable lesson

**Symptom:** mid-session, the cottage walls rendered WHITE (geometry present: windows/painting/floor/NPCs
drew, but the wall surfaces were the clear/background color). I called it "missing textures."

**My failure:** I thrashed: blamed the corner, then "runtime state," then disk, then implied my code,
across ~5 messages, while the user (rightly) got furious. I should have checked **baseline reproduction**
immediately.

**Actual root cause (proven):**

- The wall data wasn't loading. `[flap]` probe showed cell `0171` with **identity transform + zero
  portals** = an unhydrated/empty cell (a dat-LOAD failure, not a flood/visibility bug; R-A2b only
  *reads* cell data, it cannot make a cell load empty).
- **It reproduced on the EXACT baseline** (`git checkout 8f879bd`, `diff` empty, rebuilt), so NOT my code.
- Disk fine (1074 GB free); dats intact (normal sizes); no corrupt cache.
- **The trigger: the heavy per-frame probes** (`ACDREAM_PROBE_FLAP=1` writing `[flap]`/`[pv-trace]`/
  `sidechk` every frame + `Tee-Object` to hundreds-of-MB logs) **load the render thread enough to skew the
  timing of the background mesh/texture-decode thread**, which races on the **thread-unsafe dat-reader**
  (`MemoryMappedBlockAllocator.ReadBlock`, the same code that throws `AccessViolation` and crashed several
  launches). Under that timing the texture/cell load loses the race → empty/white.
- **FIX: launch CLEAN (no probes).** `launch-final.log`: no load errors, render path + all atlases loaded,
  **user confirmed the walls render.** This is also how the user normally runs the client.

**DURABLE RULES (add to memory):**

- **Never run normal play or a visual gate with `ACDREAM_PROBE_FLAP` (or other per-frame probes) on.**
  They starve the thread-unsafe dat-reader → intermittent white-wall / empty-cell load failures (and raise
  the `AccessViolation` crash rate). Use probes ONLY for short, targeted captures, then relaunch clean.
- **For any render/load symptom, check baseline reproduction FIRST** (`git stash`/`git checkout <baseline>`
  + relaunch) before theorizing or blaming a change. One reproduction test beats five wrong explanations.

**The real bug (separate task, FILED):** `DatReaderWriter ... MemoryMappedBlockAllocator.ReadBlock`
`AccessViolation`: the dat-reader is not thread-safe under the concurrent mesh-decode/streaming access.
Fix = serialize dat block reads behind a per-DatDatabase lock, or thread-local buffers, or a private
`MemoryMappedViewAccessor` per reader. Files: `src/AcDream.App/Rendering/Wb/ObjectMeshManager.cs`,
`DatDatabaseWrapper.cs`, the `DatReaderWriter` block allocator. This is the highest-value stability fix
and is independent of the flap.
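
The first fix option (serialize block reads behind one lock per database) looks roughly like this. It is a Python sketch of the locking pattern only: the real fix would be C# inside `DatReaderWriter`, `io.BytesIO` stands in for the memory-mapped dat file, and `LockedBlockReader` / `read_block` are hypothetical names.

```python
import io
import threading


class LockedBlockReader:
    """Serializes seek+read pairs on a shared stream that is not thread-safe.

    One instance (and so one lock) per underlying database mirrors the
    "per-DatDatabase lock" option.
    """

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def read_block(self, offset, size):
        # seek() and read() share the stream's single cursor, so the pair
        # must be atomic with respect to other reader threads.
        with self._lock:
            self._stream.seek(offset)
            return self._stream.read(size)


# Stand-in "dat file": 16 KiB of a repeating byte pattern.
DATA = bytes(range(256)) * 64
reader = LockedBlockReader(io.BytesIO(DATA))
header = reader.read_block(0, 4)
```

The lock trades some decode parallelism for correctness. Thread-local buffers or a private accessor per reader, the other two options listed above, avoid that contention at the cost of more memory or handles.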

---

## 6. Apparatus (reuse; STRIP after §4 ships)

- **Probes** (gated in `AcDream.Core.Rendering.RenderingDiagnostics`):
  - `ACDREAM_PROBE_FLAP=1` → `[flap]` (root cell per-portal D/TRV/CULL/proj/clip + outPolys + vis),
    `[flap-sweep]` (camera sweep: `pulledIn`/`collNormValid`/`viewerCell`/`in=`/`out=`),
    `[pv-trace]` (per-cell flood decisions, signature-dedup'd ≤160), and the **`sidechk`** line added this
    session (per-portal `camInterior`/`eyeIn`/`D`). **HEAVY: see §5; do not leave on for play.**
  - `ACDREAM_PROBE_PORTAL_CHURN=1` → `[portal-churn]` (per-Build `maxPop` + reciprocal pre→post).
- **Throwaway analyzers (untracked, in the worktree root):** `analyze_flap_vis.py` (same-root vs root-swap
  split), `analyze_churn_confirm.py` (maxPop distribution + flap reproduction; takes `<log> [startByte]`),
  `analyze_segment.py` (windowed segment: churn/cull/flap/outPolys/camera pull-in from a byte offset).
  All auto-detect UTF-16 (PowerShell `Tee-Object` writes UTF-16LE; Python must BOM-detect or it reads 0
  lines).
- **cdb on retail:** `tools/cdb/flap-cam-measure.cdb` (retail eye + CameraManager; PDB `refs/acclient.pdb`
  = MATCH). Exit code 5 = clean detach.
- **The `sidechk` probe** (in `PortalVisibilityBuilder.Build`, after `eyeInsideOpening`) is throwaway;
  strip it with the rest when §4 ships.
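
Since the analyzers are throwaway and untracked, here is a minimal sketch of the BOM detection they rely on, in case they need to be rewritten. The helper names are hypothetical and this is not the code of the existing `analyze_*.py` scripts; UTF-32 logs are deliberately not handled.

```python
import codecs


def detect_encoding(path, default="utf-8"):
    """Pick a text encoding from the file's byte-order mark, if it has one."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        # The "utf-16" codec consumes the BOM and selects the byte order itself.
        return "utf-16"
    return default


def read_log_lines(path):
    """Return the log's lines without line endings, whatever encoding Tee-Object used."""
    with open(path, "r", encoding=detect_encoding(path), errors="replace") as f:
        return [line.rstrip("\r\n") for line in f]
```

Opening a UTF-16LE capture as UTF-8 yields text full of NUL characters, so none of the `[flap]` patterns match; detecting the BOM first avoids that silent "0 lines" result.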

---

## 7. Repo state (exact)

- **HEAD = `485e44d`** (R-A2b fix). Working tree `src/` matches HEAD (verified `git diff HEAD -- src/`
  empty). `RetailChaseCamera.cs` is clean (camera-damping fully reverted; `grep _publishedEye` = 0).
- **Uncommitted tracked:** `docs/superpowers/plans/2026-06-09-portal-flood-r-a2b-side-cull.md` (the PINNED
  B1 note), committed with this handoff.
- **Untracked (throwaway, gitignore/delete):** `analyze_*.py`, `flap-*.log`, `launch-*.log`, the older
  `*.jsonl`/`*.png` from prior sessions.
- **The user's last running client was the clean (no-probe) launch and renders correctly.** The source now
  has R-A2b restored, so the next build will include R-A2b (rebuild + clean-launch to align).

---
## 8. First moves next session

1. Read this doc + memory `project_indoor_flap_rootcause`.
2. `git log --oneline -6` (`485e44d` should be the latest code commit, with this docs-only handoff commit
   on top of it); confirm `git diff HEAD -- src/` empty.
3. **Clean launch, NO PROBES** (omit `ACDREAM_PROBE_FLAP`). Confirm the cottage renders (walls textured)
   and, if you want, that R-A2b's churn stays gone (one short `ACDREAM_PROBE_PORTAL_CHURN=1` capture, then
   relaunch clean). This re-validates R-A2b after the chaos.
4. Decide the flap (§4) direction with the user, using the options in section 4 of this doc: §2b
   corner-seal (tractable) vs defer-the-flap vs the dat-reader thread-safety fix. Do NOT re-try
   camera-damping.