acdream/docs/research/2026-05-29-a8f-visual-gate-failure-handoff.md
Erik cf3d49cbd7 docs: Phase A8.F visual-gate failure handoff + issue #103
A8.F (retail portal-frame port) shipped Tasks 0-8 but failed its visual gate:
indoor branch renders broadly wrong at runtime (terrain over walls, transparent/
invisible walls). Default game unaffected (branch gated behind
ACDREAM_A8_INDOOR_BRANCH). Two compounding root causes documented (OutsideView
under-produces; Job-A/B else-branch floods ungated terrain) + apparatus + a
first-fix hypothesis + pickup prompt. Filed #103.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 14:43:24 +02:00


Phase A8.F — visual-gate failure + pickup handoff (2026-05-29)

TL;DR

The retail portal-frame visibility port (Phase A8.F) shipped as code (Tasks 0-8, committed) but FAILED its visual gate. With ACDREAM_A8_INDOOR_BRANCH=1, indoor and outside-in rendering is broadly broken: cottage/cellar interiors are "covered in outdoor terrain / transparent walls," and walls are invisible in other houses from both inside and outside.

The default game is UNAFFECTED. cameraInsideBuilding = a8IndoorBranchEnabled && (inside a building) (GameWindow.cs:7343), so RenderInsideOutAcdream only runs with the opt-in env var. Without it, rendering is the pre-A8 path (walls render; only the old cellar flap remains). Do not panic — normal play is fine; the A8.F branch is the broken opt-in.

The work is committed (not reverted): the GL-free CPU layer is solid and unit-tested; the integration (CPU-built clipped NDC mask → stencil-gate all outdoor terrain/scenery) is what fails at runtime. This doc has the root-cause analysis, the apparatus, and a pickup prompt.

What was built (the A8.F port)

Spec: docs/superpowers/specs/2026-05-29-phase-a8f-portal-frame-visibility-design.md
Plan: docs/superpowers/plans/2026-05-29-phase-a8f-portal-frame-visibility.md

Idea: port retail's PView recursive portal-clip (ConstructView/ClipPortals/GetClip). WB has NO such recursion, so the flat WB stencil can't fix the cellar flap; retail clips each portal to its portal chain. We built a GL-free CPU builder that walks the portal graph and produces OutsideView (a screen-space NDC region = exit portals recursively clipped), and we then stencil-gate outdoor terrain/scenery to that region.
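
A minimal sketch of the recursive-clip idea, written in Python for brevity rather than the project's C#. It is not PortalVisibilityBuilder: regions are simplified to axis-aligned NDC rectangles instead of convex polygons, and every name here (Portal, intersect, build_outside_view, the cell ids) is hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in NDC

FULL_SCREEN: Rect = (-1.0, -1.0, 1.0, 1.0)


@dataclass(frozen=True)
class Portal:
    rect: Rect              # assumed: already-projected screen bounds of the opening
    target: Optional[str]   # neighbouring cell id, or None for an exit portal


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of two rectangles, or None when they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def build_outside_view(cells: dict[str, list[Portal]], camera_cell: str) -> list[Rect]:
    """Breadth-first walk; each portal is clipped to the region it is seen through."""
    outside: list[Rect] = []
    queue = deque([(camera_cell, FULL_SCREEN, frozenset([camera_cell]))])
    while queue:
        cell, region, seen = queue.popleft()
        for portal in cells.get(cell, []):
            clipped = intersect(region, portal.rect)
            if clipped is None:
                continue  # not visible through the chain so far
            if portal.target is None:
                outside.append(clipped)  # exit portal: contributes to the mask
            elif portal.target not in seen:
                queue.append((portal.target, clipped, seen | {portal.target}))
    return outside
```

With a cellar whose only portal is a narrow stairwell into a ground floor that has a wide window, the window's contribution seen from the cellar is the overlap of the stairwell and the window, not the whole window. An empty result means no exit portal survived clipping.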

Commits (on claude/strange-albattani-3fc83c, after baseline 5dc4140):

  • bb903bc Task 0 — strip ACDREAM_A8_DIAG_* flags.
  • 406307e Task 1 — ViewPolygon + CellView (GL-free data model). Unit-tested.
  • 7f46c27 Task 2 — ScreenPolygonClip (Sutherland-Hodgman convex intersection). Unit-tested.
  • a28a176 + 9ec8330 Task 3 — PortalProjection (NDC + near-plane clip). Unit-tested. (A near-plane bug was caught + fixed during impl: w>=WEps → w+z>=0.)
  • 0ed462c + 270c21f Task 4 — PortalVisibilityBuilder (the BFS). Unit-tested. (Known dungeon-scaling fast-follow filed as issue #102.)
  • d12892b + 08f6a0c + d581f4c Task 5 — IndoorCellStencilPipeline.MarkAndPunchNdc.
  • 9e2eb90 Task 6 — RenderInsideOut rewrite: builder-driven mask + Job-A/B decouple.
  • 1c02a01 + 5a012c0 Task 7 — wire-in #2 per-cell translucent clip on stencil bit 2. (A DepthFunc-leak bug was caught + fixed by code review.)
  • e0051e0 + 452ee5b Task 8 — wire-in #3 cross-building (ungated Step 5, clipped bit-1).
  • 7c3ee43 — triage apparatus (this debugging session; see below).
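
For reference, the Sutherland-Hodgman convex intersection named in Task 2 can be sketched as follows. This is an illustrative Python version, not the project's ScreenPolygonClip: the function names are hypothetical, and it assumes both polygons are 2D and the clip polygon is convex with counter-clockwise winding.

```python
Point = tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """Positive when p is left of the directed edge a->b, negative when right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject: list[Point], clip: list[Point]) -> list[Point]:
    """Clip `subject` against each edge of the convex CCW polygon `clip`."""
    output = list(subject)
    n = len(clip)
    for i in range(n):
        if not output:
            break  # already clipped away entirely
        a, b = clip[i], clip[(i + 1) % n]
        source, output = output, []
        for j, cur in enumerate(source):
            prev = source[j - 1]
            d_prev, d_cur = side(a, b, prev), side(a, b, cur)
            if (d_prev >= 0) != (d_cur >= 0):
                # The segment prev->cur crosses the edge line: emit the crossing.
                t = d_prev / (d_prev - d_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if d_cur >= 0:
                output.append(cur)  # cur is on the inside of this edge
    return output


def area(poly: list[Point]) -> float:
    """Shoelace area; 0.0 for an empty polygon."""
    return 0.5 * abs(sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
                         for i in range(len(poly))))
```

Clipping the full NDC square against a quad that overlaps only its upper-right quarter yields a four-vertex polygon of area 1; clipping two disjoint quads yields an empty list, which is the "empty region" case discussed in the findings.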

All dotnet build + dotnet test green throughout (App baseline 108).

The visual-gate failure — symptoms

With ACDREAM_A8_INDOOR_BRANCH=1 at Holtburg cottages (camera = +Acdream):

  1. Outside→in (looking into a cottage from outside): cellar entrance looked correct.
  2. Inside the cellar: covered in outdoor terrain; walls transparent (see-through). Passable (render-only).
  3. Looking out from inside (toward a window): looked roughly normal.
  4. Passing inside→out: buildings + ground disappear; only server-spawned things (doors/NPCs/particles) remain.
  5. Invisible walls in OTHER houses, both from inside and outside.

Root-cause analysis (evidence-based; see apparatus below)

Finding 1 — the cell walls DO render. [opaque] probe (opaque cell-render stats, captured BEFORE the per-cell transparent loop overwrites them): cells=7 tris=50/60, cells=25 tris=108 in occupied cottage cells. tris=0 only in transient frustum-culled frames. So "transparent walls" is NOT walls failing to render — it's terrain drawn over them. (NOTE: the older [envcells] probe reads stats AFTER the transparent loop, so its cells=1 tris=0 is a misleading artifact — ignore it.)

Finding 2 — OutsideView is frequently EMPTY, and when non-empty it doesn't narrow. [pv-dump] OUTSIDEVIEW polys=N: polys=0 in the majority of frames; polys=1 sometimes. When non-empty, the clipped region ≈ the full source window (e.g. from the cellar, the 0xA9B40170 window passes through ~unclipped, not narrowed to the stairwell sliver). So the recursive-clip — the entire point of A8.F — is not constraining at runtime.

Finding 3 — projection/clip MATH is correct; the builder under-produces. When a [pv-dump] EXIT line fires, the local quad → NDC → clipped chain is sane (window quad local=[(5.55,-8.61,0)(7.45,-8.61,0)(7.45,-8.35,2.5)(5.55,-8.35,2.5)] → reasonable NDC → clipped region). The viewProj is a valid System.Numerics row-vector view*proj (M33≈M34 because far≫near makes proj.M33≈-1; M44 varies with camera, expected). ProjectToNdc matches the GPU convention (verified algebraically: Vector4.Transform(v,M) == GPU M*v for transpose=false upload). Projection is not the bug. The bug is that the builder yields empty or too-wide regions for most real camera positions. Why the exit-portal clip comes back empty still needs a deeper trace: is it the portal-side cull, the FullScreen clip producing an empty result, or the BFS not reaching exit portals from most positions?
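
The algebraic check mentioned in Finding 3 can be reproduced with plain lists. This is a sketch in Python, not project code: it assumes the matrix is stored row-major (as System.Numerics.Matrix4x4 fields M11..M44 are) and that the shader reads the 16 uploaded floats column by column, which is what a transpose=false upload implies. Function names are hypothetical.

```python
def transform_row_vector(v, m):
    """v * M: the row-vector convention used by Vector4.Transform(v, M)."""
    return [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]


def flatten_row_major(m):
    """The 16 floats in memory order for a row-major 4x4 matrix."""
    return [m[i][j] for i in range(4) for j in range(4)]


def gpu_column_major_mul(flat, v):
    """M * v as a shader computes it when `flat` is read as column-major.

    Element (row r, column c) of the GPU-side matrix lives at flat[c * 4 + r],
    so the GPU effectively sees the transpose of the row-major matrix.
    """
    return [sum(flat[c * 4 + r] * v[c] for c in range(4)) for r in range(4)]


m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
v = [1, 2, 3, 1]

cpu = transform_row_vector(v, m)
gpu = gpu_column_major_mul(flatten_row_major(m), v)
```

Both paths give the same four components for any matrix, because reading row-major memory as column-major transposes it, and multiplying the transpose by a column vector is the same as multiplying a row vector by the original.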

Finding 4 — the Job-A/B decoupling floods terrain when OutsideView is empty (the proximate cause of "transparent walls"). Task 6 made Step-4 terrain/scenery draw UNCONDITIONALLY, with only the stencil state gated. When OutsideView is empty (didInsideStencil=false), the else branch disables the stencil and draws terrain ungated (GameWindow.cs ~11142). Combined with Finding 2 (empty most frames), terrain floods over the (rendered) cell interior → "covered in terrain / transparent walls." This is exactly the Opus Task-6 code-review Minor #2 risk, realized at scale.

Why WB doesn't hit this but we do: in WB, didInsideStencil = "inside a building" (always true indoors, because it marks the whole building's exit-portal set, which is non-empty). WB never has the "inside + empty mask" case. Our builder produces empty masks frequently, so the else branch (which WB effectively never exercises with an empty mask) floods. The CPU-NDC-recursive-clip mask is far more fragile at runtime than WB's flat building mask.

The two compounding root causes (summary)

  1. OutsideView builder under-produces at runtime — empty most frames; never narrows recursively. (Builder/clip integration with real geometry; not the projection math.)
  2. Empty-OutsideView → ungated terrain flood — the Job-A/B decoupling's else branch draws terrain everywhere when the mask is empty, painting over the cell interior.

Concrete first-fix hypothesis (try this first next session)

The else branch is wrong: an empty OutsideView means "no outdoors visible from here," not "all outdoors visible." When inside a building with an empty mask, draw NO outdoor terrain/scenery (or fall back to the pre-A8 "depth-clear-when-inside" behavior), rather than ungated terrain. That alone should stop the flooding (walls become solid; you temporarily lose terrain-through-portal until the builder is fixed, but the interior renders correctly). This decouples the two bugs so each can be fixed independently.
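
The proposed policy can be written as a small decision function. This is a sketch of the intent only, in Python rather than the project's C#; the enum, the function name, and its parameters are hypothetical and do not mirror the actual branch in GameWindow.cs. The pre-A8 depth-clear fallback is left out.

```python
from enum import Enum


class TerrainPass(Enum):
    UNGATED = "ungated"        # stencil disabled, terrain/scenery drawn everywhere
    STENCIL_GATED = "gated"    # terrain/scenery drawn only inside the OutsideView mask
    SKIP = "skip"              # no outdoor terrain/scenery drawn this frame


def choose_terrain_pass(camera_inside_building: bool, outside_view_polys: int) -> TerrainPass:
    """Pick the Step-4 mode; an empty mask indoors means 'no outdoors visible'."""
    if not camera_inside_building:
        return TerrainPass.UNGATED
    if outside_view_polys > 0:
        return TerrainPass.STENCIL_GATED
    # Indoors with an empty mask: the hypothesis is to draw nothing here,
    # instead of falling through to the ungated draw that floods the interior.
    return TerrainPass.SKIP
```

The key design choice is that the indoor/empty case never returns UNGATED, so a builder failure degrades to "no terrain through portals" rather than "terrain over walls".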

Then separately debug Finding 2 (why the builder yields empty/too-wide regions) — the [pv-dump] apparatus already traces local→NDC→clipped; extend it to log the side-test result and the per-stage vert counts for ALL exit portals (the current dump's EXIT-CULLED/EXIT-PROJ/EXIT-CLIP lines do this — read them across many frames to see which gate kills the portals when polys=0).
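
Reading those lines across many frames is easier with a small tally script. The sketch below is Python and rests on assumptions about the log layout that this document does not specify: that every dump line contains the literal [pv-dump], that the per-portal lines of one Build call precede its OUTSIDEVIEW polys=N line, and that the tags appear verbatim. Adjust the patterns to the real output.

```python
import re
from collections import Counter

# Longer tags come first so that "EXIT" does not match inside "EXIT-CULLED".
TAG = re.compile(r"\b(EXIT-CULLED|EXIT-PROJ|EXIT-CLIP|EXIT)\b")
POLYS = re.compile(r"OUTSIDEVIEW polys=(\d+)")


def tally_empty_builds(lines):
    """Count exit-portal tags seen in Build dumps that ended with polys=0.

    Returns (number of empty builds, Counter of tags within those builds).
    """
    totals = Counter()
    current = Counter()   # tags seen since the previous OUTSIDEVIEW line
    empty_builds = 0
    for line in lines:
        if "[pv-dump]" not in line:
            continue
        polys = POLYS.search(line)
        if polys:
            if int(polys.group(1)) == 0:
                totals.update(current)
                empty_builds += 1
            current = Counter()  # start collecting the next Build call
            continue
        tag = TAG.search(line)
        if tag:
            current[tag.group(1)] += 1
    return empty_builds, totals
```

Feed it the lines of a8f.log (the file written by the launch command in the Apparatus section). Depending on the PowerShell version, Tee-Object may write UTF-16, so the encoding passed to open() may need adjusting. If empty builds show mostly EXIT-CULLED, the side test is the gate to inspect; if they show EXIT-PROJ or EXIT-CLIP without a final EXIT, the projection or clip stage is discarding the portal.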

The architectural question (escalate to the human before a big rewrite)

Is "CPU-build a recursively-clipped NDC region + stencil-gate ALL outdoor terrain/scenery to it" viable in acdream's pipeline, or is it too fragile (Finding 2)? Options:

  • (a) Fix the builder + the else-branch (incremental; the first-fix hypothesis above).
  • (b) Reconsider enforcement — e.g., port retail's per-cell screen-space scissor more literally, or keep WB's flat building mask (accept the cellar flap) and special-case only the cellar. The user explicitly chose the faithful retail port (option A) at brainstorm; revisit only if (a) proves intractable.

Safety / current state

  • Default game safe: indoor branch gated behind ACDREAM_A8_INDOOR_BRANCH=1 (cameraInsideBuilding = a8IndoorBranchEnabled && inside, GameWindow.cs:7343).
  • Work is committed, not reverted (CPU layer is good; integration needs fixing).
  • The old cellar flap (the original M1.5 blocker) is still present in the default (pre-A8) path — A8.F did not fix it.
  • Tree clean as of 7c3ee43.

Apparatus (all env-gated; require ACDREAM_A8_INDOOR_BRANCH=1 to reach the code)

  • ACDREAM_A8_DUMP_PV=1 → [pv-dump] lines: per camera cell, the exit-portal local→NDC→clipped geometry (EXIT-CULLED / EXIT-PROJ / EXIT-CLIP / EXIT) + OUTSIDEVIEW polys=N. First 2 Build calls per distinct camera cell. (PortalVisibilityBuilder.cs.)
  • ACDREAM_PROBE_ENVCELL=1 → [opaque] line: opaque cell-render stats (cells/tris) BEFORE the transparent loop overwrites _envCellRenderer.Stats. One-shot per camera cell. (GameWindow.cs, after the Step-3 opaque render.)
  • ACDREAM_PROBE_VIS=1 → [buildings]/[draworder]/[stencil]/[envcells] (existing). NOTE [envcells] is post-transparent-loop (misleading); [stencil] verts reflects the OutsideView triangle count.
  • tools/A8CellAudit — offline cell/portal dumper (portals <cellId> / buildings <lb> <radius>).

Launch (PowerShell), then walk +Acdream into a Holtburg cottage ground floor + cellar:

$env:ACDREAM_DAT_DIR="$env:USERPROFILE\Documents\Asheron's Call"; $env:ACDREAM_LIVE="1"
$env:ACDREAM_TEST_HOST="127.0.0.1"; $env:ACDREAM_TEST_PORT="9000"
$env:ACDREAM_TEST_USER="testaccount"; $env:ACDREAM_TEST_PASS="testpassword"
$env:ACDREAM_A8_INDOOR_BRANCH="1"; $env:ACDREAM_A8_DUMP_PV="1"; $env:ACDREAM_PROBE_ENVCELL="1"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "a8f.log"

Cottage cells: 0xA9B40170 (ground floor, has window exit portal), 0xA9B40171 (cellar), 0xA9B40174/175 (cellar rooms), building 0xA. Inn vestibule: 0xA9B40164/162.

Code anchors

  • src/AcDream.App/Rendering/PortalVisibilityBuilder.cs — the builder (Finding 2 lives here).
  • src/AcDream.App/Rendering/GameWindow.cs: RenderInsideOutAcdream (~11012); Step-4 else ungated-terrain branch (~11142, Finding 4); call-site gate (~7343, 7636).
  • src/AcDream.App/Rendering/IndoorCellStencilPipeline.cs — MarkAndPunchNdc + bit-2 helpers.
  • src/AcDream.App/Rendering/PortalProjection.cs / ScreenPolygonClip.cs / PortalView.cs — CPU layer (correct).
  • references/WorldBuilder/.../VisibilityManager.cs:73-239 — the WB reference (flat, no recursion).
  • Retail oracle: docs/research/named-retail/acclient_2013_pseudo_c.txt: PView::ConstructView 433750, ClipPortals 433572, GetClip 432344.

Pickup prompt

Read docs/research/2026-05-29-a8f-visual-gate-failure-handoff.md and pick up the A8.F debugging. The default game is SAFE (indoor branch gated behind ACDREAM_A8_INDOOR_BRANCH). Use superpowers:systematic-debugging. Two compounding root causes are documented: (1) the OutsideView builder under-produces (empty most frames, never narrows); (2) the Job-A/B decoupling floods ungated terrain when OutsideView is empty. Start with the first-fix hypothesis: make the empty-OutsideView case draw NO outdoor terrain/scenery when inside (an empty mask = "no outdoors visible," not "all outdoors"), to stop the terrain-over-walls flood and isolate the two bugs. Verify via the apparatus (ACDREAM_A8_DUMP_PV / ACDREAM_PROBE_ENVCELL) — read the EXIT-CULLED/PROJ/CLIP lines across frames to learn which gate kills the exit portals when polys=0. Then fix the builder. If the builder proves intractable, escalate the architectural question (handoff §"The architectural question") to the user before any big rewrite — do NOT thrash. No speculative fixes without root cause (the Iron Law). The visual gate (user looking at a Holtburg cottage cellar) is the acceptance test.