acdream/docs/research/2026-05-28-a8-cellar-flap-root-cause.md
Erik 5dc4140c11 feat(render): Phase A8 — indoor visibility + streaming fixes batch
Lands the working A8 indoor-rendering and streaming fixes accumulated this
session. User has verified these visually to some degree (e.g. lifestone /
translucent meshes confirmed fine under the FrontFace flip; bridge / wall /
collision regressions confirmed fixed after travel); not every path has been
exhaustively gated. The cellar-flap defect remains OPEN and will be solved
the retail-faithful way via a dedicated brainstorm (see handoff docs).

Rendering core (reviewed, high confidence):
- EnvCellRenderer SSBO stride fix: upload packed Matrix4x4[] (64B) instead of
  the 80B CPU InstanceData struct the shader never expected — fixes the
  transform/texture "explosion" for any draw with >1 instance (cells that
  dedupe to a shared cellGeomId). Real root cause.
- WB-style global FrontFace(CW) + per-batch CullMode carried through the MDI
  layout (GroupKey + BuildIndirectArrays + DrawIndirectRange split into
  same-cull runs with absolute uDrawIDOffset per run).
- EntitySet partitioning (IndoorPass / OutdoorScenery / LiveDynamic) +
  WorldEntity.BuildingShellAnchorCellId so building shells scope to their
  dat-derived building cell instead of rendering everywhere.
- RenderOutsideInAcdream (look into buildings from outside) +
  CollectVisiblePortalBuildings frustum cull of portal bounds.
- Sky-when-inside-building + per-cell audit probe + GL-state probe.

Streaming / perf (test-covered; not independently code-reviewed this session):
- Near/far priority queues so near work wins over far; PromoteToNear carries
  full landblock + mesh data; LandblockEntriesWithoutAnimatedIndex avoids
  rebuilding the animated-lookup dict in the hot draw path. Fixes the
  bridge-not-appearing / missing-walls / broken-collision-after-travel
  regressions and improves post-transition FPS.

Tooling + docs:
- tools/A8CellAudit: offline dat cell/portal/building dumper (portals +
  buildings modes) — reproduces the cellar-flap investigation with no launch.
- docs/research cellar-flap root-cause + option-2 handoff (the didInsideStencil
  double-duty finding + the WB-recursive design decision + brainstorm prompt),
  entity-taxonomy, replan, issue-78 visibility investigation.

Diagnostics retained on purpose: ACDREAM_A8_DIAG_* gates, portal_stencil.vert
provisional pos.w clamp, and the probe families are kept (env-var gated, zero
cost when off) because the pending option-2 cellar-flap brainstorm needs them.
Strip in the option-2 ship commit.

Indoor branch stays behind ACDREAM_A8_INDOOR_BRANCH=1 (default off = pre-A8
visual). Build green; App tests + Core (streaming/dispatcher/loader) tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:14:50 +02:00


A8 cellar-flap — structured debugging root cause (2026-05-28 PM)

Method

Systematic-debugging Phases 1-3, all evidence gathered offline via the tools/A8CellAudit tool (extended with a portals mode) — no live launches needed. Deterministic, instant, reproducible.

Phase 1 — evidence

Scenario (from launch-a8-probe-normal-20260528-194536.out.log):

  • Camera in cell 0xA9B40171, inside=True really=True.
  • camBldgs=[0xA], visN=7 [0x16F,0x170,0x171,0x172,0x173,0x174,0x175].
  • Portal stencil mask = 12 verts (not the old over-punch case).
  • Bisection (prior session): the leaking pixels are written by Step 4 content; disabling the Step-2 punch does not fix it.

Offline audit findings:

Building grouping (A8CellAudit buildings 0xA9B40000):

buildingOrdinal=10 registryId=0xA model=0x01002232 portalCells=[0xA9B4016F,0xA9B40170]

Building 0xA's LandBlockInfo seed = {0x16F, 0x170}. BuildingLoader then BFS-expands through interior portals → all 7 cells (incl. the cellar). The BFS matches WB's PortalService (same algorithm), so the grouping is not the divergence.
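The seed-then-expand step can be illustrated with a minimal Python sketch. This is not BuildingLoader's code: the function name `expand_building` and the `PORTALS` dictionary are hypothetical, and the graph contains only the three portal links listed under Topology in this note, so the result is a subset of the 7 cells the real BFS reaches. Treating 0xFFFF as the exit marker is taken from the table header and is otherwise an assumption.

```python
from collections import deque

EXIT = 0xFFFF  # assumed exit-portal marker, per the "(0xFFFF)" table header

# Portal targets per cell. Only links documented in this note are included:
#   0x170.portal[0] -> EXIT, 0x170.portal[1] -> 0x171, 0x171.portal[0] -> 0x170.
# 0x16F's portals and 0x171's other interior portals are not listed here,
# so they are deliberately left out rather than guessed.
PORTALS = {
    0x16F: [],
    0x170: [EXIT, 0x171],
    0x171: [0x170],
}

def expand_building(seed, portals):
    """Breadth-first expansion from the seed cells through interior portals."""
    seen = set(seed)
    queue = deque(sorted(seed))
    while queue:
        cell = queue.popleft()
        for target in portals.get(cell, []):
            # Exit portals lead outside the building and are never followed.
            if target == EXIT or target in seen:
                continue
            seen.add(target)
            queue.append(target)
    return seen

cells = expand_building({0x16F, 0x170}, PORTALS)
print(sorted(hex(c) for c in cells))  # ['0x16f', '0x170', '0x171']
```

Even in this reduced graph the cellar (0x171) joins the building's cell set purely through the 0x170 interior portal, which is the behaviour the note attributes to both BuildingLoader and WB's PortalService.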

Exit-portal ownership (A8CellAudit portals ...):

| Cell | exit portals (0xFFFF) | interior | role |
|---|---|---|---|
| 0x16F | 1 | 1 | ground floor (window/door) |
| 0x170 | 1 | 1 | ground floor (window/door) |
| 0x171 (camera) | 0 | 3 | cellar |
| 0x172-0x175 | 0 | 1-2 | cellar rooms |

So the 12-vert mask = 0x16F exit (6v) + 0x170 exit (6v). The cellar camera (zero exit portals) is marking the two ground-floor windows.

Topology:

0x171.portal[0] -> 0x170  (stairwell/hatch, polyId 54)
0x170.portal[1] -> 0x171  (polyId 5)
0x170.portal[0] -> EXIT   (window/door to outside, polyId 4)

Cellar connects directly up to ground floor 0x170; 0x16F is one further hop.

Occluder geometry (A8CellAudit 0xA9B40170 / 0xA9B40171):

  • 0x170 floor poly 0x0002 (n.Z=+1) emits — the cellar's ceiling/occluder exists.
  • 0x171 has a ceiling 0x0003 (n.Z=-1, emits) AND three NoPos polys 0x0036/0x0037/0x0038 (surface 0x080000DF) that do not emit — 0x0038 is a ceiling-plane poly = the stairwell hole up to the ground floor.

Phase 2 — pattern vs WB

WB RenderInsideOut marks the building's exit portals (flat — same as us) and relies on Step-3 cell depth to occlude them: terrain only survives where the punched/cleared far-depth isn't overwritten by rendered cell geometry.

Our code matches that structure. The difference that produces the visible flap: WB's outside view through a portal is the world geometrically behind that portal; from a cellar, the only un-occluded opening is the stairwell hole (0x0038, not rendered). Through that hole, stencil=1 (ground-floor window marked) and depth=far → Step 4 draws the entire outdoor world (terrain + buildings) through the hole, not a window-sized sliver. The two ground-floor windows are 1-2 BFS hops above the camera (0x170 is one hop, 0x16F one further) and should contribute essentially nothing from the cellar, but their full silhouettes are marked.

"Disable Step-2 punch doesn't fix it" is explained: the leak pixels are the stairwell hole, which has cleared (far) depth regardless of the punch because no cell geometry covers it — terrain passes DepthFunc.Less either way.
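The argument above can be checked with a toy per-pixel model in Python. It is a sketch under stated assumptions, not the renderer: depth is a normalized float with 1.0 as the cleared/far value, the helper names are hypothetical, and the depth values 0.2 and 0.6 are made up for illustration.

```python
FAR = 1.0  # cleared depth value (hypothetical normalized depth)

def depth_after_step3(cell_geometry_depth, punch_enabled):
    """Depth at one stencil-marked pixel after the Step-2 punch and Step-3 cell pass."""
    depth = FAR                      # frame clear
    if punch_enabled:
        depth = FAR                  # Step-2 punch resets marked pixels to far
    if cell_geometry_depth is not None:
        depth = min(depth, cell_geometry_depth)  # Step 3: cell geometry writes depth
    return depth

def step4_draws(stencil, depth_buffer, outdoor_depth):
    """Step 4 outdoor content lands where stencil is marked and the Less test passes."""
    return stencil == 1 and outdoor_depth < depth_buffer

OUTDOOR = 0.6  # hypothetical depth of terrain seen through the opening
results = {}
for punch in (True, False):
    # Stairwell hole: no cell geometry is rendered there (poly 0x0038 does not emit).
    hole = step4_draws(1, depth_after_step3(None, punch), OUTDOOR)
    # Pixel covered by the cellar ceiling (hypothetical depth 0.2).
    ceiling = step4_draws(1, depth_after_step3(0.2, punch), OUTDOOR)
    results[punch] = (hole, ceiling)
    print(f"punch={punch}: hole leaks={hole}, ceiling leaks={ceiling}")
```

In this model the hole pixel leaks with the punch on or off, while the ceiling pixel never does, which matches the bisection result.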

Phase 3 — single hypothesis (root cause)

The inside-out exit-portal stencil mask is built by flat-marking the exit portals of every visibility-BFS-reached cell. From the cellar, the BFS reaches the ground-floor cells, whose windows get full-silhouette-marked. Where the cellar's stairwell hole leaves those silhouettes un-occluded, Step 4 paints the whole outdoor world through them. There is no constraint tying a deeper cell's exit portal to the portal chain (here: the narrow stairwell) through which its cell became visible.
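As a quick arithmetic check of the flat-marking hypothesis, the Python sketch below recomputes the mask size from the Phase 1 numbers. The exit-portal counts and the visible-cell list come from the audit table and probe log in this note; the constant of 6 vertices per exit portal is inferred from the "(6v)" figures for 0x16F and 0x170 and is assumed to hold for every exit portal. The function name is hypothetical.

```python
VERTS_PER_EXIT_PORTAL = 6  # inferred from "0x16F exit (6v) + 0x170 exit (6v)"

# Exit-portal count per cell, from the A8CellAudit portals table.
EXIT_PORTALS = {
    0x16F: 1, 0x170: 1, 0x171: 0,
    0x172: 0, 0x173: 0, 0x174: 0, 0x175: 0,
}

# visN=7 list from the probe log (camera in 0x171).
VISIBLE = [0x16F, 0x170, 0x171, 0x172, 0x173, 0x174, 0x175]

def flat_mask_vertex_count(visible_cells, exit_portals):
    """Flat marking: every exit portal of every BFS-reached cell is drawn into the stencil."""
    return sum(exit_portals[c] for c in visible_cells) * VERTS_PER_EXIT_PORTAL

flat = flat_mask_vertex_count(VISIBLE, EXIT_PORTALS)
camera_only = flat_mask_vertex_count([0x171], EXIT_PORTALS)
print(flat, camera_only)  # 12 0
```

The flat count reproduces the observed 12-vert mask, and all 12 vertices belong to cells other than the camera's own.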

This is a flat-vs-constrained masking gap. Not a depth bug (occluders emit and render), not the Step-2 punch, not the camera-side filter (the cellar camera is geometrically on the interior side of a ground-floor window's plane, so the per-portal filter passes it).

Phase 4 — fix options

  1. Camera-cell-scoped mask (minimal, conservative). Mark only the camera cell's own exit portals. Cellar (0 exit portals) → empty mask → no leak; windowed room → marks its own windows. Risk: loses daylight through an adjacent cell's window seen across a doorway in multi-cell ground-floor rooms (e.g. the inn) — a visible-but-minor regression, and the flat approach was wrong there anyway.

  2. Vertical-portal-aware scoping (targeted). Don't propagate exit-portal marking across a floor/ceiling portal, i.e. one whose polygon normal is vertical. The cellar→ground stairwell is such a portal (it lies in a horizontal plane); suppressing inheritance across it stops the cellar from marking ground-floor windows while preserving same-level multi-cell rooms. Needs per-portal polygon-normal classification.

  3. WB recursive/constrained portal masking (faithful, largest). Constrain each deeper portal's stencil to the screen region of the portal chain leading to it. Correct for all cases (cellar + multi-cell rooms) but a substantial port of WB's recursive RenderInsideOut.

Recommendation: option 2 is the best correctness/effort trade — it fixes the cellar without the inn regression risk of option 1, and is a principled scoping rule (don't inherit a different vertical level's exterior openings) rather than a band-aid. Option 3 remains the eventual faithful target if cross-level portal visibility ever needs to be exact.
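To make options 1 and 2 concrete next to the current flat behaviour, here is a minimal Python sketch. Everything in it is illustrative: the function names, the 0.7 normal threshold, and the portal normals are assumptions (the audit gives polyIds, not normals), and the graph holds only the three portal links documented under Topology, so 0x16F is omitted.

```python
from collections import deque

EXIT = 0xFFFF  # assumed exit-portal marker, per the "(0xFFFF)" table header

def is_floor_or_ceiling(normal, threshold=0.7):
    """Classify a portal polygon as floor/ceiling if its normal is mostly vertical.
    The threshold is a hypothetical choice."""
    return abs(normal[2]) >= threshold

UP, DOWN, SIDE = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)

# (target, assumed polygon normal) per portal, documented links only.
PORTALS = {
    0x170: [(EXIT, SIDE), (0x171, DOWN)],  # window/door, then stairwell down
    0x171: [(0x170, UP)],                  # stairwell up
}

def own_exits(cell, portals):
    """(cell, portal index) pairs for the cell's own exit portals."""
    return [(cell, i) for i, (t, _) in enumerate(portals.get(cell, [])) if t == EXIT]

def reachable(camera, portals, may_cross):
    """BFS from the camera cell, following only portals accepted by may_cross."""
    seen, queue = {camera}, deque([camera])
    while queue:
        cell = queue.popleft()
        for target, normal in portals.get(cell, []):
            if target == EXIT or target in seen or not may_cross(normal):
                continue
            seen.add(target)
            queue.append(target)
    return seen

def flat_mask(camera, portals):
    """Current behaviour: mark exits of every reached cell."""
    cells = reachable(camera, portals, lambda n: True)
    return sorted(e for c in cells for e in own_exits(c, portals))

def option1_mask(camera, portals):
    """Option 1: mark only the camera cell's own exits."""
    return own_exits(camera, portals)

def option2_mask(camera, portals):
    """Option 2: do not inherit exits across floor/ceiling portals."""
    cells = reachable(camera, portals, lambda n: not is_floor_or_ceiling(n))
    return sorted(e for c in cells for e in own_exits(c, portals))

for cam in (0x171, 0x170):
    print(hex(cam), flat_mask(cam, PORTALS), option1_mask(cam, PORTALS), option2_mask(cam, PORTALS))
```

With the camera in the cellar (0x171), the flat mask still contains 0x170's exit portal, while both option 1 and option 2 yield an empty mask. With the camera in 0x170, all three mark its own window. The sketch cannot show where options 1 and 2 differ, since that needs a same-level multi-cell room such as the inn, which is not part of the documented topology.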

Reproduction / verification assets

  • tools/A8CellAudit portals mode (added this session) dumps any cell's CellPortals offline. A8CellAudit buildings <lb> <radius> dumps building→cell grouping. These make the whole investigation re-runnable in seconds with zero launches.