acdream/docs/research/2026-05-28-a8-cellar-flap-root-cause.md
Erik 5dc4140c11 feat(render): Phase A8 — indoor visibility + streaming fixes batch
Lands the working A8 indoor-rendering and streaming fixes accumulated this
session. User has verified these visually to some degree (e.g. lifestone /
translucent meshes confirmed fine under the FrontFace flip; bridge / wall /
collision regressions confirmed fixed after travel); not every path has been
exhaustively gated. The cellar-flap defect remains OPEN and will be solved
the retail-faithful way via a dedicated brainstorm (see handoff docs).

Rendering core (reviewed, high confidence):
- EnvCellRenderer SSBO stride fix: upload packed Matrix4x4[] (64B) instead of
  the 80B CPU InstanceData struct the shader never expected — fixes the
  transform/texture "explosion" for any draw with >1 instance (cells that
  dedupe to a shared cellGeomId). Real root cause.
- WB-style global FrontFace(CW) + per-batch CullMode carried through the MDI
  layout (GroupKey + BuildIndirectArrays + DrawIndirectRange split into
  same-cull runs with absolute uDrawIDOffset per run).
- EntitySet partitioning (IndoorPass / OutdoorScenery / LiveDynamic) +
  WorldEntity.BuildingShellAnchorCellId so building shells scope to their
  dat-derived building cell instead of rendering everywhere.
- RenderOutsideInAcdream (look into buildings from outside) +
  CollectVisiblePortalBuildings frustum cull of portal bounds.
- Sky-when-inside-building + per-cell audit probe + GL-state probe.

Streaming / perf (test-covered; not independently code-reviewed this session):
- Near/far priority queues so near work wins over far; PromoteToNear carries
  full landblock + mesh data; LandblockEntriesWithoutAnimatedIndex avoids
  rebuilding the animated-lookup dict in the hot draw path. Fixes the
  bridge-not-appearing / missing-walls / broken-collision-after-travel
  regressions and improves post-transition FPS.

Tooling + docs:
- tools/A8CellAudit: offline dat cell/portal/building dumper (portals +
  buildings modes) — reproduces the cellar-flap investigation with no launch.
- docs/research cellar-flap root-cause + option-2 handoff (the didInsideStencil
  double-duty finding + the WB-recursive design decision + brainstorm prompt),
  entity-taxonomy, replan, issue-78 visibility investigation.

Diagnostics retained on purpose: ACDREAM_A8_DIAG_* gates, portal_stencil.vert
provisional pos.w clamp, and the probe families are kept (env-var gated, zero
cost when off) because the pending option-2 cellar-flap brainstorm needs them.
Strip in the option-2 ship commit.

Indoor branch stays behind ACDREAM_A8_INDOOR_BRANCH=1 (default off = pre-A8
visual). Build green; App tests + Core (streaming/dispatcher/loader) tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:14:50 +02:00

# A8 cellar-flap — structured debugging root cause (2026-05-28 PM)
## Method
Systematic-debugging Phases 1-3, all evidence gathered **offline** via the
`tools/A8CellAudit` tool (extended with a `portals` mode) — no live launches
needed. Deterministic, instant, reproducible.
## Phase 1 — evidence
Scenario (from `launch-a8-probe-normal-20260528-194536.out.log`):
- Camera in cell `0xA9B40171`, `inside=True really=True`.
- `camBldgs=[0xA]`, `visN=7 [0x16F,0x170,0x171,0x172,0x173,0x174,0x175]`.
- Portal stencil mask = 12 verts (not the old over-punch case).
- Bisection (prior session): writer is **Step 4 content**; disabling Step-2
punch does **not** fix it.

Offline audit findings:

**Building grouping** (`A8CellAudit buildings 0xA9B40000`):
```
buildingOrdinal=10 registryId=0xA model=0x01002232 portalCells=[0xA9B4016F,0xA9B40170]
```
Building `0xA`'s LandBlockInfo seed = `{0x16F, 0x170}`. `BuildingLoader` then
BFS-expands through interior portals → all 7 cells (incl. the cellar). The
BFS matches WB's `PortalService` (same algorithm), so the grouping is not the
divergence.
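
A minimal sketch of that expansion, not the actual `BuildingLoader` code: a breadth-first walk from the seed cells that follows interior portals and never follows exit portals. The seed set `{0x16F, 0x170}` comes from the audit output above. The rest of the `PORTALS` adjacency and the names `expand_building` and `PORTALS` are illustrative placeholders, chosen only so that the walk reaches seven cells.

```python
from collections import deque

EXIT = 0xFFFF  # portal target meaning "leads outside"


def expand_building(seed_cells, portals):
    """BFS from the seed cells through interior portals; exit portals are skipped."""
    seen = set(seed_cells)
    queue = deque(seed_cells)
    while queue:
        cell = queue.popleft()
        for target in portals.get(cell, ()):
            if target == EXIT or target in seen:
                continue
            seen.add(target)
            queue.append(target)
    return seen


# Placeholder adjacency (short cell ids). Only the seeds are taken from the
# audit; the individual links are assumptions for illustration.
PORTALS = {
    0x16F: [EXIT, 0x170],
    0x170: [EXIT, 0x171],
    0x171: [0x170, 0x172, 0x173],
    0x172: [0x171, 0x174],
    0x173: [0x171],
    0x174: [0x172, 0x175],
    0x175: [0x174],
}

cells = expand_building([0x16F, 0x170], PORTALS)
print(sorted(hex(c) for c in cells))
```

Because the walk ignores portal orientation, a two-cell ground-floor seed is enough to pull in every cellar cell, which is why the grouping alone cannot distinguish levels.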
**Exit-portal ownership** (`A8CellAudit portals ...`):

| Cell | Exit portals (`0xFFFF`) | Interior portals | Role |
|------|----|----|------|
| `0x16F` | **1** | 1 | ground floor (window/door) |
| `0x170` | **1** | 1 | ground floor (window/door) |
| `0x171` (camera) | **0** | 3 | cellar |
| `0x172`–`0x175` | **0** | 1–2 | cellar rooms |

So the 12-vert mask = `0x16F` exit (6v) + `0x170` exit (6v). **The cellar
camera (zero exit portals) is marking the two ground-floor windows.**
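
The arithmetic behind the 12-vert mask can be checked with a small sketch. The exit-portal counts and the visible-cell list are copied from the audit table and the probe log. The helper `flat_mask` and the uniform `VERTS_PER_EXIT = 6` are assumptions for illustration (the audit only confirms 6 verts for these two portals).

```python
# Exit-portal counts per cell, copied from the audit table (short cell ids).
EXIT_PORTALS = {0x16F: 1, 0x170: 1, 0x171: 0, 0x172: 0,
                0x173: 0, 0x174: 0, 0x175: 0}
VERTS_PER_EXIT = 6  # assumption: every exit portal contributes 6 verts


def flat_mask(visible_cells):
    """Flat marking: every visible cell contributes all of its exit portals."""
    contributors = [c for c in visible_cells if EXIT_PORTALS[c] > 0]
    verts = sum(EXIT_PORTALS[c] * VERTS_PER_EXIT for c in contributors)
    return contributors, verts


VISIBLE = [0x16F, 0x170, 0x171, 0x172, 0x173, 0x174, 0x175]  # visN=7
CAMERA = 0x171

contributors, verts = flat_mask(VISIBLE)
print([hex(c) for c in contributors], verts)
print("camera contributes:", CAMERA in contributors)
```

Every vertex in the mask comes from a cell other than the camera's own, which is the observation the rest of the analysis builds on.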
**Topology**:
```
0x171.portal[0] -> 0x170 (stairwell/hatch, polyId 54)
0x170.portal[1] -> 0x171 (polyId 5)
0x170.portal[0] -> EXIT (window/door to outside, polyId 4)
```
Cellar connects directly up to ground floor `0x170`; `0x16F` is one further hop.

**Occluder geometry** (`A8CellAudit 0xA9B40170` / `0xA9B40171`):
- `0x170` floor poly `0x0002` (n.Z=+1) **emits** — the cellar's ceiling/occluder exists.
- `0x171` has a ceiling `0x0003` (n.Z=-1, emits) AND three `NoPos` polys
`0x0036/0x0037/0x0038` (surface `0x080000DF`) that do **not** emit —
`0x0038` is a ceiling-plane poly = the **stairwell hole** up to the ground floor.
## Phase 2 — pattern vs WB
WB `RenderInsideOut` marks the building's exit portals (flat — same as us) and
relies on **Step-3 cell depth** to occlude them: terrain only survives where the
punched/cleared far-depth isn't overwritten by rendered cell geometry.

Our code matches that structure. The difference that produces the visible flap:
WB's outside view through a portal is the world geometrically behind that
portal; from a cellar, the only un-occluded opening is the **stairwell hole**
(`0x0038`, not rendered). Through that hole, stencil=1 (ground-floor window
marked) and depth=far → **Step 4 draws the entire outdoor world (terrain +
buildings) through the hole**, not a window-sized sliver. The two ground-floor
windows are 1–2 BFS hops above the camera and should contribute essentially
nothing from the cellar, but their full silhouettes are marked.

"Disable Step-2 punch doesn't fix it" is explained: the leak pixels are the
stairwell hole, which has **cleared (far) depth** regardless of the punch
because no cell geometry covers it — terrain passes `DepthFunc.Less` either way.
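
A one-pixel model makes the argument concrete. It is a deliberately simplified sketch, not the renderer: depth is normalised with `FAR = 1.0`, the pixel is assumed to lie inside a marked window silhouette (stencil = 1), and the depth values and function names are made up for illustration.

```python
FAR = 1.0  # cleared depth-buffer value (normalised depth; assumption)


def depth_after_cells(covered_by_cell_geometry, cell_depth, punch_enabled):
    """Depth at one pixel after the optional punch and the cell-geometry pass."""
    depth = FAR                          # start from the cleared buffer
    if punch_enabled:
        depth = FAR                      # the punch can only reset depth to far
    if covered_by_cell_geometry:
        depth = min(depth, cell_depth)   # cell geometry writes a nearer depth
    return depth


def step4_draws(stencil, depth_buffer, outdoor_depth):
    """An outdoor fragment survives if the pixel is marked and passes a Less test."""
    return stencil == 1 and outdoor_depth < depth_buffer


OUTDOOR_DEPTH = 0.9  # made-up terrain depth
CEILING_DEPTH = 0.2  # made-up cellar-ceiling depth

for punch in (True, False):
    hole = depth_after_cells(False, CEILING_DEPTH, punch)    # stairwell hole
    ceiling = depth_after_cells(True, CEILING_DEPTH, punch)  # emitted ceiling
    print(punch,
          step4_draws(1, hole, OUTDOOR_DEPTH),
          step4_draws(1, ceiling, OUTDOOR_DEPTH))
```

In this model the hole pixel leaks with the punch on or off, while a pixel covered by the emitted ceiling never does. The only variable that changes the outcome at the hole is the stencil bit.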
## Phase 3 — single hypothesis (root cause)
**The inside-out exit-portal stencil mask is built by flat-marking the exit
portals of every visibility-BFS-reached cell. From the cellar, the BFS reaches
the ground-floor cells, whose windows get full-silhouette-marked. Where the
cellar's stairwell hole leaves those silhouettes un-occluded, Step 4 paints the
whole outdoor world through them. There is no constraint tying a deeper cell's
exit portal to the portal chain (here: the narrow stairwell) through which its
cell became visible.**

This is a flat-vs-constrained masking gap. Not a depth bug (occluders emit and
render), not the Step-2 punch, not the camera-side filter (the cellar camera is
geometrically on the interior side of a ground-floor window's plane, so the
per-portal filter passes it).
## Phase 4 — fix options
1. **Camera-cell-scoped mask (minimal, conservative).** Mark only the camera
cell's own exit portals. Cellar (0 exit portals) → empty mask → no leak;
windowed room → marks its own windows. **Risk:** loses daylight through an
*adjacent* cell's window seen across a doorway in multi-cell ground-floor
rooms (e.g. the inn) — a visible-but-minor regression, and the flat approach
was wrong there anyway.
2. **Vertical-portal-aware scoping (targeted).** Don't propagate exit-portal
marking across a floor/ceiling portal (one whose polygon normal is vertical).
The cellar→ground stairwell is such a portal: it lies in a horizontal plane.
Suppressing inheritance across it stops the cellar from marking ground-floor
windows while preserving same-level multi-cell rooms. Needs per-portal
polygon-normal classification.
3. **WB recursive/constrained portal masking (faithful, largest).** Constrain
each deeper portal's stencil to the screen region of the portal chain leading
to it. Correct for all cases (cellar + multi-cell rooms) but a substantial
port of WB's recursive RenderInsideOut.

**Recommendation:** option 2 is the best correctness/effort trade-off: it fixes the
cellar without the inn regression risk of option 1, and is a principled scoping
rule (don't inherit a different vertical level's exterior openings) rather than a
band-aid. Option 3 remains the eventual faithful target if cross-level portal
visibility ever needs to be exact.
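
A rough sketch of the option-2 scoping rule, under stated assumptions: portals carry a polygon normal, a portal counts as floor/ceiling when the normal's Z component dominates (the `0.7` cutoff is arbitrary), and exit-portal marking is inherited only from cells reachable without crossing such a portal. The adjacency, the normals, and the names `is_floor_or_ceiling_portal` and `cells_marking_exits` are hypothetical and trimmed to four cells.

```python
from collections import deque

EXIT = 0xFFFF  # portal target meaning "leads outside"


def is_floor_or_ceiling_portal(normal, threshold=0.7):
    """True when the portal polygon's normal is mostly vertical (assumed cutoff)."""
    nx, ny, nz = normal
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    return length > 0 and abs(nz) / length >= threshold


def cells_marking_exits(camera_cell, portals):
    """Cells whose exit portals get marked: reachable from the camera without
    crossing a floor/ceiling portal, and owning at least one exit portal."""
    same_level = {camera_cell}
    queue = deque([camera_cell])
    while queue:
        cell = queue.popleft()
        for target, normal in portals.get(cell, ()):
            if target == EXIT or target in same_level:
                continue
            if is_floor_or_ceiling_portal(normal):
                continue  # do not inherit another level's exterior openings
            same_level.add(target)
            queue.append(target)
    return sorted(c for c in same_level
                  if any(t == EXIT for t, _ in portals.get(c, ())))


# Placeholder data: (target, portal normal) per cell, short cell ids.
PORTALS = {
    0x16F: [(EXIT, (1, 0, 0)), (0x170, (0, 1, 0))],
    0x170: [(EXIT, (0, 1, 0)), (0x171, (0, 0, -1))],
    0x171: [(0x170, (0, 0, 1)), (0x172, (1, 0, 0))],
    0x172: [(0x171, (-1, 0, 0))],
}

print([hex(c) for c in cells_marking_exits(0x171, PORTALS)])  # cellar camera
print([hex(c) for c in cells_marking_exits(0x16F, PORTALS)])  # ground-floor camera
```

With these placeholder normals the cellar camera marks nothing, while a camera in `0x16F` still marks both ground-floor exits, which is the behaviour option 2 is meant to preserve for same-level multi-cell rooms.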
## Reproduction / verification assets
- `tools/A8CellAudit` `portals` mode (added this session) dumps any cell's
`CellPortals` offline. `A8CellAudit buildings <lb> <radius>` dumps
building→cell grouping. These make the whole investigation re-runnable in
seconds with zero launches.