acdream/docs/research/2026-05-29-a8f-visual-gate-failure-handoff.md
Erik cf3d49cbd7 docs: Phase A8.F visual-gate failure handoff + issue #103
A8.F (retail portal-frame port) shipped Tasks 0-8 but failed its visual gate:
indoor branch renders broadly wrong at runtime (terrain over walls, transparent/
invisible walls). Default game unaffected (branch gated behind
ACDREAM_A8_INDOOR_BRANCH). Two compounding root causes documented (OutsideView
under-produces; Job-A/B else-branch floods ungated terrain) + apparatus + a
first-fix hypothesis + pickup prompt. Filed #103.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 14:43:24 +02:00

# Phase A8.F — visual-gate failure + pickup handoff (2026-05-29)
## TL;DR
The retail portal-frame visibility port (**Phase A8.F**) shipped as code (Tasks 0-8,
committed) but **FAILED its visual gate**. With `ACDREAM_A8_INDOOR_BRANCH=1`, indoor
and outside-in rendering is broadly broken: cottage/cellar interiors are "covered in
outdoor terrain / transparent walls," and walls are invisible in other houses from
both inside and outside.
**The default game is UNAFFECTED.** `cameraInsideBuilding = a8IndoorBranchEnabled &&
(inside a building)` (GameWindow.cs:7343), so `RenderInsideOutAcdream` only runs with
the opt-in env var. Without it, rendering is the pre-A8 path (walls render; only the
old cellar flap remains). **Do not panic — normal play is fine; the A8.F branch is the
broken opt-in.**
The work is committed (not reverted): the GL-free CPU layer is solid and unit-tested;
the **integration** (CPU-built clipped NDC mask → stencil-gate all outdoor terrain/
scenery) is what fails at runtime. This doc has the root-cause analysis, the apparatus,
and a pickup prompt.
## What was built (the A8.F port)
Spec: [`docs/superpowers/specs/2026-05-29-phase-a8f-portal-frame-visibility-design.md`](../superpowers/specs/2026-05-29-phase-a8f-portal-frame-visibility-design.md)
Plan: [`docs/superpowers/plans/2026-05-29-phase-a8f-portal-frame-visibility.md`](../superpowers/plans/2026-05-29-phase-a8f-portal-frame-visibility.md)
Idea: port retail's `PView` recursive portal-clip (`ConstructView`/`ClipPortals`/
`GetClip`) — WB has NO such recursion, so the flat WB stencil can't fix the cellar flap;
retail clips each portal to its portal chain. We built a GL-free CPU builder that walks
the portal graph and produces `OutsideView` (a screen-space NDC region = exit portals
recursively clipped), then stencil-gate outdoor terrain/scenery to it.
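
The narrowing idea can be illustrated with a deliberately simplified sketch. It is written in Python rather than the project's C#, it uses axis-aligned NDC rectangles instead of the convex polygons the real builder clips, and every name in it (`Rect`, `intersect`, `narrow_along_chain`) is hypothetical. It only shows why a region clipped through a portal chain can never be wider than any single portal in that chain.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    """Axis-aligned region in NDC; a stand-in for a clipped view polygon."""
    x0: float
    y0: float
    x1: float
    y1: float


# The whole viewport in NDC: the starting region before any portal is applied.
FULL_SCREEN = Rect(-1.0, -1.0, 1.0, 1.0)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlap of two rectangles, or None when they do not overlap."""
    x0, y0 = max(a.x0, b.x0), max(a.y0, b.y0)
    x1, y1 = min(a.x1, b.x1), min(a.y1, b.y1)
    if x0 >= x1 or y0 >= y1:
        return None
    return Rect(x0, y0, x1, y1)


def narrow_along_chain(portals: Sequence[Rect]) -> Optional[Rect]:
    """Clip the full screen through each portal in order (camera cell first).

    None means nothing beyond the last portal is visible through this chain.
    """
    region: Optional[Rect] = FULL_SCREEN
    for portal in portals:
        region = intersect(region, portal)
        if region is None:
            return None
    return region
```

In this model, a window seen through a narrow stairwell opening yields a region no larger than the stairwell opening, which is the behaviour Finding 2 below reports as missing at runtime.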
Commits (on `claude/strange-albattani-3fc83c`, after baseline `5dc4140`):
- `bb903bc` Task 0 — strip ACDREAM_A8_DIAG_* flags.
- `406307e` Task 1 — ViewPolygon + CellView (GL-free data model). Unit-tested.
- `7f46c27` Task 2 — ScreenPolygonClip (Sutherland-Hodgman convex intersection). Unit-tested.
- `a28a176` + `9ec8330` Task 3 — PortalProjection (NDC + near-plane clip). Unit-tested.
(A near-plane bug was caught + fixed during impl: `w>=WEps` → `w+z>=0`.)
- `0ed462c` + `270c21f` Task 4 — PortalVisibilityBuilder (the BFS). Unit-tested.
(Known dungeon-scaling fast-follow filed as **issue #102**.)
- `d12892b` + `08f6a0c` + `d581f4c` Task 5 — IndoorCellStencilPipeline.MarkAndPunchNdc.
- `9e2eb90` Task 6 — RenderInsideOut rewrite: builder-driven mask + **Job-A/B decouple**.
- `1c02a01` + `5a012c0` Task 7 — wire-in #2 per-cell translucent clip on stencil bit 2.
(A DepthFunc-leak bug was caught + fixed by code review.)
- `e0051e0` + `452ee5b` Task 8 — wire-in #3 cross-building (ungated Step 5, clipped bit-1).
- `7c3ee43` — triage apparatus (this debugging session; see below).
All `dotnet build` + `dotnet test` green throughout (App baseline 108).
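
For readers unfamiliar with the Task 2 technique, here is a minimal Python sketch of Sutherland-Hodgman clipping of a polygon against a convex clip polygon. It is not the project's `ScreenPolygonClip` code: the function names are hypothetical, and the sketch assumes 2D points and a counter-clockwise clip polygon.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _side(a: Point, b: Point, p: Point) -> float:
    """Positive when p lies to the left of the directed edge a->b, zero on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def _lerp(p: Point, q: Point, t: float) -> Point:
    """Point at parameter t on the segment p->q."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def clip_convex(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Intersect `subject` with the convex, counter-clockwise polygon `clip`.

    Returns the clipped vertex list; an empty list means no overlap.
    """
    output = list(subject)
    for i in range(len(clip)):
        if not output:
            break  # already clipped away entirely
        a, b = clip[i], clip[(i + 1) % len(clip)]
        current, output = output, []
        for j in range(len(current)):
            cur, prev = current[j], current[j - 1]
            d_cur, d_prev = _side(a, b, cur), _side(a, b, prev)
            if d_cur >= 0:
                if d_prev < 0:
                    # Entering the half-plane: emit the crossing point first.
                    output.append(_lerp(prev, cur, d_prev / (d_prev - d_cur)))
                output.append(cur)
            elif d_prev >= 0:
                # Leaving the half-plane: emit only the crossing point.
                output.append(_lerp(prev, cur, d_prev / (d_prev - d_cur)))
    return output
```

Clipping against a full-screen square returns the subject unchanged, so a chain that only ever clips against full-screen regions cannot narrow anything.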
## The visual-gate failure — symptoms
With `ACDREAM_A8_INDOOR_BRANCH=1` at Holtburg cottages (camera = `+Acdream`):
1. Outside→in (looking into a cottage from outside): cellar entrance looked correct.
2. Inside the cellar: **covered in outdoor terrain; walls transparent (see-through).** Passable (render-only).
3. Looking out from inside (toward a window): looked roughly normal.
4. Passing inside→out: **buildings + ground disappear; only server-spawned things
(doors/NPCs/particles) remain.**
5. **Invisible walls in OTHER houses, both from inside and outside.**
## Root-cause analysis (evidence-based; see apparatus below)
**Finding 1 — the cell walls DO render.** `[opaque]` probe (opaque cell-render stats,
captured BEFORE the per-cell transparent loop overwrites them): `cells=7 tris=50/60`,
`cells=25 tris=108` in occupied cottage cells. `tris=0` only in transient frustum-culled
frames. So "transparent walls" is **NOT** walls failing to render — it's terrain drawn
*over* them. (NOTE: the older `[envcells]` probe reads stats AFTER the transparent loop,
so its `cells=1 tris=0` is a misleading artifact — ignore it.)
**Finding 2 — `OutsideView` is frequently EMPTY, and when non-empty it doesn't narrow.**
`[pv-dump] OUTSIDEVIEW polys=N`: `polys=0` in the majority of frames; `polys=1` sometimes.
When non-empty, the clipped region ≈ the full source window (e.g. from the cellar, the
`0xA9B40170` window passes through ~unclipped, not narrowed to the stairwell sliver). So
the recursive-clip — the entire point of A8.F — is **not constraining at runtime**.
**Finding 3 — projection/clip MATH is correct; the builder under-produces.** When a
`[pv-dump] EXIT` line fires, the local quad → NDC → clipped chain is sane (window quad
`local=[(5.55,-8.61,0)(7.45,-8.61,0)(7.45,-8.35,2.5)(5.55,-8.35,2.5)]` → reasonable NDC →
clipped region). The `viewProj` is a valid System.Numerics row-vector `view*proj`
(`M33≈M34` because far≫near makes `proj.M33≈-1`; `M44` varies with camera, expected).
`ProjectToNdc` matches the GPU convention (verified algebraically: `Vector4.Transform(v,M)`
== GPU `M*v` for transpose=false upload). **Projection is not the bug.** The bug is the
builder yielding empty/too-wide regions for most real camera positions: the exit-portal
clip produces an empty region. The cause needs a deeper trace (portal-side cull? FullScreen
clip producing an empty result? BFS not reaching exit portals from most positions?).
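
The convention claim in Finding 3 can be checked with a few lines of arithmetic. This Python sketch uses plain lists instead of System.Numerics types, and its function names are hypothetical. It models `Vector4.Transform(v, M)` as a row vector times `M`, flattens `M` in row-major order (the `M11, M12, ...` field order), and then reads that flat buffer the way a column-major consumer does when the upload is not transposed.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def row_vector_times_matrix(v: Sequence[float], m: Matrix) -> List[float]:
    """Row-vector convention: result[c] = sum over r of v[r] * m[r][c]."""
    return [sum(v[r] * m[r][c] for r in range(4)) for c in range(4)]


def flatten_row_major(m: Matrix) -> List[float]:
    """Lay the matrix out row by row, as a row-major struct sits in memory."""
    return [m[r][c] for r in range(4) for c in range(4)]


def column_major_matrix_times_vector(flat: Sequence[float], v: Sequence[float]) -> List[float]:
    """Column-vector convention on a buffer read as column-major.

    Element (row i, column j) of the matrix the consumer sees is flat[j * 4 + i].
    """
    return [sum(flat[j * 4 + i] * v[j] for j in range(4)) for i in range(4)]
```

Reading a row-major buffer as column-major hands the consumer the transpose, and the transpose applied to a column vector gives the same four numbers as the row vector applied to the original matrix, so both paths agree for any matrix.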
**Finding 4 — the Job-A/B decoupling floods terrain when `OutsideView` is empty (the
proximate cause of "transparent walls").** Task 6 made Step-4 terrain/scenery draw
UNCONDITIONALLY, with only the stencil *state* gated. When `OutsideView` is empty
(`didInsideStencil=false`), the `else` branch **disables the stencil and draws terrain
ungated** (GameWindow.cs ~11142). Combined with Finding 2 (empty most frames), terrain
floods over the (rendered) cell interior → "covered in terrain / transparent walls."
This is exactly the Opus Task-6 code-review **Minor #2** risk, realized at scale.
**Why WB doesn't hit this but we do:** in WB, `didInsideStencil = "inside a building"`
(always true indoors, because it marks the whole building's exit-portal set, which is
non-empty). WB never has the "inside + empty mask" case. Our builder produces empty masks
frequently, so the `else` branch (which WB effectively never exercises with an empty mask)
floods. The CPU-NDC-recursive-clip mask is far more fragile at runtime than WB's flat
building mask.
## The two compounding root causes (summary)
1. **`OutsideView` builder under-produces at runtime** — empty most frames; never narrows
recursively. (Builder/clip integration with real geometry; not the projection math.)
2. **Empty-`OutsideView` → ungated terrain flood** — the Job-A/B decoupling's `else` branch
draws terrain everywhere when the mask is empty, painting over the cell interior.
## Concrete first-fix hypothesis (try this first next session)
The `else` branch is wrong: **an empty `OutsideView` means "no outdoors visible from
here," not "all outdoors visible."** When inside a building with an empty mask, draw NO
outdoor terrain/scenery (or fall back to the pre-A8 "depth-clear-when-inside" behavior),
rather than ungated terrain. That alone should stop the flooding (walls become solid;
you temporarily lose terrain-through-portal until the builder is fixed, but the interior
renders correctly). This decouples the two bugs so each can be fixed independently.
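
The hypothesis amounts to turning a two-way decision into a three-way one. The following Python sketch is a model of the behaviour described in this document, not the `GameWindow.cs` code: `TerrainPass`, `terrain_pass_current`, and `terrain_pass_proposed` are hypothetical names, and the polygon count stands in for whatever the real code uses to set `didInsideStencil`.

```python
from enum import Enum


class TerrainPass(Enum):
    UNGATED = "ungated"              # stencil off, terrain drawn everywhere
    STENCIL_GATED = "stencil-gated"  # terrain drawn only inside the OutsideView mask
    SKIPPED = "skipped"              # no outdoor terrain/scenery drawn this frame


def terrain_pass_current(inside_building: bool, outside_view_polys: int) -> TerrainPass:
    """Model of the behaviour in Finding 4: an empty mask falls through to ungated."""
    if inside_building and outside_view_polys > 0:
        return TerrainPass.STENCIL_GATED
    return TerrainPass.UNGATED


def terrain_pass_proposed(inside_building: bool, outside_view_polys: int) -> TerrainPass:
    """First-fix hypothesis: inside with an empty mask means no outdoors is visible."""
    if not inside_building:
        return TerrainPass.UNGATED
    if outside_view_polys > 0:
        return TerrainPass.STENCIL_GATED
    return TerrainPass.SKIPPED
```

The two functions differ only for the inside-with-empty-mask input, which is the case Finding 2 says occurs in most frames.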
Then separately debug Finding 2 (why the builder yields empty/too-wide regions). The
`[pv-dump]` apparatus already traces local→NDC→clipped, and its EXIT-CULLED/EXIT-PROJ/
EXIT-CLIP lines log the side-test result and the per-stage vert counts for ALL exit
portals. Read them across many frames to see which gate kills the portals when `polys=0`.
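
Reading the dump across many frames is easier with a tally. The Python sketch below relies only on the tokens named in this document (`[pv-dump]`, `EXIT-CULLED`, `EXIT-PROJ`, `EXIT-CLIP`, `EXIT`, `OUTSIDEVIEW polys=N`). It assumes, without confirmation from the code, that each Build call prints its EXIT-family lines before its `OUTSIDEVIEW` line. The function name and the result layout are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable

_OUTSIDEVIEW = re.compile(r"OUTSIDEVIEW\s+polys=(\d+)")
# Longer tags come first so that a bare EXIT only matches when no suffix is present.
_STAGE = re.compile(r"\b(EXIT-CULLED|EXIT-PROJ|EXIT-CLIP|EXIT)\b")


def tally_pv_dump(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count EXIT-family tags per Build call, split by whether the mask ended up empty.

    ASSUMPTION: the stage lines of one Build call precede its OUTSIDEVIEW line.
    """
    totals: Dict[str, Counter] = {"empty": Counter(), "non_empty": Counter()}
    pending: Counter = Counter()
    for line in lines:
        if "[pv-dump]" not in line:
            continue
        summary = _OUTSIDEVIEW.search(line)
        if summary:
            key = "empty" if int(summary.group(1)) == 0 else "non_empty"
            totals[key].update(pending)
            totals[key]["builds"] += 1
            pending = Counter()
            continue
        stage = _STAGE.search(line)
        if stage:
            pending[stage.group(1)] += 1
    return totals
```

Feed it the lines of a captured log file. If the log was written by `Tee-Object` under Windows PowerShell, it may need to be opened as UTF-16. A high `EXIT-CULLED` count in the `empty` bucket would point at the side test, while empty builds with no EXIT-family lines at all would point at the BFS never reaching an exit portal.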
## The architectural question (escalate to the human before a big rewrite)
Is "CPU-build a recursively-clipped NDC region + stencil-gate ALL outdoor terrain/scenery
to it" viable in acdream's pipeline, or is it too fragile (Finding 2)? Options:
- (a) Fix the builder + the else-branch (incremental; the first-fix hypothesis above).
- (b) Reconsider enforcement — e.g., port retail's per-cell screen-space scissor more
literally, or keep WB's flat building mask (accept the cellar flap) and special-case
only the cellar. The user explicitly chose the faithful retail port (option A) at
brainstorm; revisit only if (a) proves intractable.
## Safety / current state
- **Default game safe**: indoor branch gated behind `ACDREAM_A8_INDOOR_BRANCH=1`
(`cameraInsideBuilding = a8IndoorBranchEnabled && inside`, GameWindow.cs:7343).
- Work is **committed, not reverted** (CPU layer is good; integration needs fixing).
- The old cellar flap (the original M1.5 blocker) is **still present** in the default
(pre-A8) path — A8.F did not fix it.
- Tree clean as of `7c3ee43`.
## Apparatus (all env-gated; require `ACDREAM_A8_INDOOR_BRANCH=1` to reach the code)
- `ACDREAM_A8_DUMP_PV=1` → `[pv-dump]` lines: per camera cell, the exit-portal
local→NDC→clipped geometry (EXIT-CULLED / EXIT-PROJ / EXIT-CLIP / EXIT) + `OUTSIDEVIEW
polys=N`. First 2 Build calls per distinct camera cell. (PortalVisibilityBuilder.cs.)
- `ACDREAM_PROBE_ENVCELL=1` → `[opaque]` line: opaque cell-render stats (cells/tris)
BEFORE the transparent loop overwrites `_envCellRenderer.Stats`. One-shot per camera
cell. (GameWindow.cs, after the Step-3 opaque render.)
- `ACDREAM_PROBE_VIS=1` → `[buildings]`/`[draworder]`/`[stencil]`/`[envcells]` (existing).
NOTE `[envcells]` is post-transparent-loop (misleading); `[stencil] verts` reflects the
OutsideView triangle count.
- `tools/A8CellAudit` — offline cell/portal dumper (`portals <cellId>` / `buildings <lb> <radius>`).
Launch (PowerShell), then walk `+Acdream` into a Holtburg cottage ground floor + cellar:
```powershell
$env:ACDREAM_DAT_DIR="$env:USERPROFILE\Documents\Asheron's Call"; $env:ACDREAM_LIVE="1"
$env:ACDREAM_TEST_HOST="127.0.0.1"; $env:ACDREAM_TEST_PORT="9000"
$env:ACDREAM_TEST_USER="testaccount"; $env:ACDREAM_TEST_PASS="testpassword"
$env:ACDREAM_A8_INDOOR_BRANCH="1"; $env:ACDREAM_A8_DUMP_PV="1"; $env:ACDREAM_PROBE_ENVCELL="1"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "a8f.log"
```
Cottage cells: `0xA9B40170` (ground floor, has window exit portal), `0xA9B40171` (cellar),
`0xA9B40174/175` (cellar rooms), building `0xA`. Inn vestibule: `0xA9B40164/162`.
## Code anchors
- `src/AcDream.App/Rendering/PortalVisibilityBuilder.cs` — the builder (Finding 2 lives here).
- `src/AcDream.App/Rendering/GameWindow.cs`: `RenderInsideOutAcdream` (~11012);
Step-4 `else` ungated-terrain branch (~11142, Finding 4); call-site gate (~7343, 7636).
- `src/AcDream.App/Rendering/IndoorCellStencilPipeline.cs` — MarkAndPunchNdc + bit-2 helpers.
- `src/AcDream.App/Rendering/PortalProjection.cs` / `ScreenPolygonClip.cs` / `PortalView.cs` — CPU layer (correct).
- `references/WorldBuilder/.../VisibilityManager.cs:73-239` — the WB reference (flat, no recursion).
- Retail oracle: `docs/research/named-retail/acclient_2013_pseudo_c.txt`: `PView::ConstructView` 433750, `ClipPortals` 433572, `GetClip` 432344.
## Pickup prompt
> Read `docs/research/2026-05-29-a8f-visual-gate-failure-handoff.md` and pick up the A8.F
> debugging. The default game is SAFE (indoor branch gated behind ACDREAM_A8_INDOOR_BRANCH).
> Use `superpowers:systematic-debugging`. Two compounding root causes are documented:
> (1) the OutsideView builder under-produces (empty most frames, never narrows); (2) the
> Job-A/B decoupling floods ungated terrain when OutsideView is empty. **Start with the
> first-fix hypothesis**: make the empty-OutsideView case draw NO outdoor terrain/scenery
> when inside (an empty mask = "no outdoors visible," not "all outdoors"), to stop the
> terrain-over-walls flood and isolate the two bugs. Verify via the apparatus
> (ACDREAM_A8_DUMP_PV / ACDREAM_PROBE_ENVCELL) — read the EXIT-CULLED/PROJ/CLIP lines across
> frames to learn which gate kills the exit portals when polys=0. Then fix the builder.
> If the builder proves intractable, escalate the architectural question (handoff §"The
> architectural question") to the user before any big rewrite — do NOT thrash. No
> speculative fixes without root cause (the Iron Law). The visual gate (user looking at a
> Holtburg cottage cellar) is the acceptance test.