acdream/docs/research/2026-05-28-a8-cellar-flap-option2-handoff.md
Erik 5dc4140c11 feat(render): Phase A8 — indoor visibility + streaming fixes batch
Lands the working A8 indoor-rendering and streaming fixes accumulated this
session. User has verified these visually to some degree (e.g. lifestone /
translucent meshes confirmed fine under the FrontFace flip; bridge / wall /
collision regressions confirmed fixed after travel); not every path has been
exhaustively gated. The cellar-flap defect remains OPEN and will be solved
the retail-faithful way via a dedicated brainstorm (see handoff docs).

Rendering core (reviewed, high confidence):
- EnvCellRenderer SSBO stride fix: upload packed Matrix4x4[] (64B) instead of
  the 80B CPU InstanceData struct the shader never expected — fixes the
  transform/texture "explosion" for any draw with >1 instance (cells that
  dedupe to a shared cellGeomId). Real root cause.
- WB-style global FrontFace(CW) + per-batch CullMode carried through the MDI
  layout (GroupKey + BuildIndirectArrays + DrawIndirectRange split into
  same-cull runs with absolute uDrawIDOffset per run).
- EntitySet partitioning (IndoorPass / OutdoorScenery / LiveDynamic) +
  WorldEntity.BuildingShellAnchorCellId so building shells scope to their
  dat-derived building cell instead of rendering everywhere.
- RenderOutsideInAcdream (look into buildings from outside) +
  CollectVisiblePortalBuildings frustum cull of portal bounds.
- Sky-when-inside-building + per-cell audit probe + GL-state probe.

Streaming / perf (test-covered; not independently code-reviewed this session):
- Near/far priority queues so near work wins over far; PromoteToNear carries
  full landblock + mesh data; LandblockEntriesWithoutAnimatedIndex avoids
  rebuilding the animated-lookup dict in the hot draw path. Fixes the
  bridge-not-appearing / missing-walls / broken-collision-after-travel
  regressions and improves post-transition FPS.

Tooling + docs:
- tools/A8CellAudit: offline dat cell/portal/building dumper (portals +
  buildings modes) — reproduces the cellar-flap investigation with no launch.
- docs/research cellar-flap root-cause + option-2 handoff (the didInsideStencil
  double-duty finding + the WB-recursive design decision + brainstorm prompt),
  entity-taxonomy, replan, issue-78 visibility investigation.

Diagnostics retained on purpose: ACDREAM_A8_DIAG_* gates, portal_stencil.vert
provisional pos.w clamp, and the probe families are kept (env-var gated, zero
cost when off) because the pending option-2 cellar-flap brainstorm needs them.
Strip in the option-2 ship commit.

Indoor branch stays behind ACDREAM_A8_INDOOR_BRANCH=1 (default off = pre-A8
visual). Build green; App tests + Core (streaming/dispatcher/loader) tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:14:50 +02:00


# A8 cellar-flap — option-2 handoff + brainstorming kickoff (2026-05-28 PM)
## Purpose
The cellar flap is the **last** A8 indoor-rendering defect. Its root cause is
fully understood (offline-confirmed). The targeted fix (option 1) was tried,
**failed**, and the failure revealed a deeper architectural coupling. The
decision is to fix it the **retail-faithful way (option 2: WB-style recursive
portal visibility)** via a fresh `superpowers:brainstorming` session. This doc
is the single pickup point for that session.
## Current tree state (do NOT reset)
- Worktree: `.claude/worktrees/strange-albattani-3fc83c/`, branch
`claude/strange-albattani-3fc83c`, tip `e415bb3` — **all A8 work is
uncommitted** in a dirty tree.
- Build green. App tests pass (90 baseline; the 3 option-1 tests were removed).
- The option-1 code (`PortalMeshBuilder.CollectSameLevelPortalCells` +
`IsVerticalPortal` + 3 tests + the GameWindow call-site change) has been
**reverted/removed** — tree is back to the working-with-cellar-flap baseline.
- `tools/A8CellAudit` gained a `portals` mode this session (offline cell/portal
dumper) — **kept**, it's the investigation workhorse.
### What WORKS in the dirty tree (the valuable A8 batch — keep)
- EnvCellRenderer SSBO **stride fix** (mat4 upload, not 80-byte InstanceData).
- WB-style global `FrontFace(CW)` + per-batch `CullMode` through MDI.
- `EntitySet` partitioning (IndoorPass / OutdoorScenery / LiveDynamic) +
`BuildingShellAnchorCellId` scoping.
- `RenderOutsideInAcdream` (look into buildings from outside).
- `CollectVisiblePortalBuildings` frustum cull of portal bounds.
- Streaming near/far priority queues + `PromoteToNear` + the
`LandblockEntriesWithoutAnimatedIndex` hot-path fix (fixed bridge/wall/collision
regressions after travel).
- Temporary `ACDREAM_A8_DIAG_*` flags (strip before any commit).
### What DOESN'T work
- **Cellar flap** + the broader inside-out fragility (see the coupling below).
> **Decision point for the human:** the working A8 batch is large and
> uncommitted. Consider committing it (after stripping the `ACDREAM_A8_DIAG_*`
> flags) so the option-2 work starts from a clean baseline. Deferred per
> "don't commit yet," but flagged.
## The cellar flap — root cause (confirmed)
Full evidence: [`docs/research/2026-05-28-a8-cellar-flap-root-cause.md`](2026-05-28-a8-cellar-flap-root-cause.md).

Short version: the inside-out stencil mask flat-marks the exit portals
(windows/doors) of **every** visibility-BFS-reached cell. From the cellar
(`0xA9B40171`, **zero** exit portals), the BFS reaches the ground-floor cells
(`0x16F`, `0x170`) up the stairwell and marks **their** windows. Step 4 then
paints the whole outdoor world through those silhouettes wherever the cellar's
stairwell hole leaves them un-occluded. There is no constraint tying a deeper
cell's exit portal to the portal chain (the narrow stairwell) it was reached
through.
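
The flat-marking behaviour can be reproduced on a toy graph. The sketch below is an illustration only, not the acdream code: the cell ids expand the short forms above into the cellar's landblock prefix (an assumption), and the portal names and connectivity are made up. It shows that a BFS from the cellar, which has no exit portals of its own, still collects the ground-floor windows.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical ids: 0x16F / 0x170 expanded with the cellar's 0xA9B4 prefix.
const uint Cellar = 0xA9B40171, GroundA = 0xA9B4016F, GroundB = 0xA9B40170;

// Interior portals (cell -> neighbouring cells). Illustrative, not dat-derived.
var interiorLinks = new Dictionary<uint, uint[]>
{
    [Cellar]  = new[] { GroundA },            // the stairwell
    [GroundA] = new[] { Cellar, GroundB },
    [GroundB] = new[] { GroundA },
};

// Exit portals (windows/doors to the outdoors). Names are made up.
var exitPortals = new Dictionary<uint, string[]>
{
    [Cellar]  = Array.Empty<string>(),        // sealed cellar: zero exit portals
    [GroundA] = new[] { "0x16F/window", "0x16F/door" },
    [GroundB] = new[] { "0x170/window" },
};

// Flat marking: every exit portal of every BFS-reached cell enters the mask,
// with no record of which portal chain the cell was reached through.
List<string> FlatMask(uint cameraCell)
{
    var seen = new HashSet<uint> { cameraCell };
    var queue = new Queue<uint>();
    queue.Enqueue(cameraCell);
    var mask = new List<string>();
    while (queue.Count > 0)
    {
        uint cell = queue.Dequeue();
        mask.AddRange(exitPortals[cell]);
        foreach (uint next in interiorLinks[cell])
            if (seen.Add(next)) queue.Enqueue(next);
    }
    return mask;
}

var cellarMask = FlatMask(Cellar);
Console.WriteLine(string.Join(", ", cellarMask));
// prints: 0x16F/window, 0x16F/door, 0x170/window
```

The mask for the cellar is non-empty even though the cellar itself contributes nothing to it, which is the flap.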
## ⭐ THE KEY FINDING — `didInsideStencil` double-duty coupling
This is the expensive lesson; do not re-pay it.

`RenderInsideOutAcdream` Step 4 (`GameWindow.cs` ~11167) wraps **both** the
terrain draw **and** the entire `OutdoorScenery` dispatcher draw (which includes
neighbor **building shells**, scenery, and the depth-repair pass) in:
```csharp
if (didInsideStencil)
{
    // ... terrain draw + OutdoorScenery dispatcher draw (shells, scenery, depth-repair) ...
}
```
where `didInsideStencil == (camera-side-filtered exit-portal mask is non-empty)`.
So the portal mask is doing **two jobs at once**:
- **Job A (intended):** gate "paint terrain/sky *through* the portal openings."
- **Job B (accidental):** decide "draw exterior geometry (shells/scenery/depth-repair) **at all**."

**Why option 1 failed:** option 1 correctly shrank the mask (same-level cells
only) so the cellar's mask went empty → `didInsideStencil=false` → **Step 4
skipped entirely** → exterior shells + terrain vanished → "walls transparent,
sky behind, terrain gone." The old flat mask (all visN cells) *papered over*
this by almost always keeping the mask non-empty.

**Consequence for ANY fix:** correctly scoping/clipping the portal mask is not
enough on its own: it will empty the mask in legitimate cases (looking at an
interior wall, sealed cellar) and kill exterior rendering. **Job A and Job B
must be decoupled** so exterior geometry draws regardless of whether any portal
is currently visible. This is true for option 2 as much as option 1.
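
One possible shape of the decoupling, as a minimal sketch rather than a decided design: the two functions below only record which draws would be issued, and every name except `didInsideStencil` is hypothetical. Draw order and GL state are deliberately left out, since those are open questions for the brainstorm.

```csharp
using System;
using System.Collections.Generic;

// Current behaviour: one flag gates both Job A and Job B.
List<string> Step4Coupled(bool didInsideStencil)
{
    var draws = new List<string>();
    if (didInsideStencil)
    {
        draws.Add("terrain (through portal stencil)");          // Job A
        draws.Add("OutdoorScenery (shells, scenery, repair)");  // Job B
    }
    return draws;
}

// Decoupled sketch: Job B always runs, only Job A consults the mask.
List<string> Step4Decoupled(bool didInsideStencil)
{
    var draws = new List<string>();
    draws.Add("OutdoorScenery (shells, scenery, repair)");      // Job B
    if (didInsideStencil)
        draws.Add("terrain (through portal stencil)");          // Job A
    return draws;
}

// Sealed cellar with a correctly scoped (empty) mask:
Console.WriteLine(Step4Coupled(false).Count);    // prints: 0
Console.WriteLine(Step4Decoupled(false).Count);  // prints: 1
```

With an empty mask the coupled version issues nothing, which matches the "walls transparent, terrain gone" symptom, while the decoupled version still draws the exterior geometry.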
## Decision: option 2 (WB-faithful recursive portal visibility)
Chosen over option 1 (decouple-only) because:
- The project mandate is faithful WB/retail porting; option 1 is a structural
deviation from WB's RenderInsideOut, and prior "cleaner redesign" deviations
were reverted.
- Option 2 handles every case (cellar, stacked floors, deep dungeons) without
per-case special-casing.
- It is large enough to deserve design-first (brainstorm), not a mid-session patch.

Note: option 2 still has to solve the Job-A/Job-B decoupling above; it's not
optional.
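
For orientation, here is the generic idea behind "clip each exit portal to the chain it was reached through", sketched with axis-aligned screen rectangles. This is a textbook portal-clipping illustration under stated assumptions, not WB's or retail's actual mechanism (which still has to be verified against source); the cell ids, portal names, and rectangle values are all made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Portal: source cell, name, screen-space bounds, target cell (null = exit).
var portals = new List<(uint From, string Name, float X0, float Y0, float X1, float Y1, uint? To)>
{
    (0x171u, "stairwell", 0.40f, 0.60f, 0.60f, 0.90f, 0x16Fu),
    (0x16Fu, "back-down", 0.40f, 0.60f, 0.60f, 0.90f, 0x171u),
    (0x16Fu, "window-A",  0.45f, 0.70f, 0.55f, 0.80f, null),  // overlaps the stairwell
    (0x16Fu, "window-B",  0.05f, 0.10f, 0.20f, 0.30f, null),  // does not overlap it
};

var visible = new List<(string Name, float X0, float Y0, float X1, float Y1)>();

// Recurse through interior portals, shrinking the clip rectangle at each hop.
void Visit(uint cell, float cx0, float cy0, float cx1, float cy1, HashSet<uint> path)
{
    if (!path.Add(cell)) return;               // never walk back along the chain
    foreach (var p in portals.Where(p => p.From == cell))
    {
        float x0 = Math.Max(cx0, p.X0), y0 = Math.Max(cy0, p.Y0);
        float x1 = Math.Min(cx1, p.X1), y1 = Math.Min(cy1, p.Y1);
        if (x0 >= x1 || y0 >= y1) continue;    // nothing visible through the chain
        if (p.To is uint next) Visit(next, x0, y0, x1, y1, path);
        else visible.Add((p.Name, x0, y0, x1, y1));
    }
    path.Remove(cell);
}

Visit(0x171u, 0f, 0f, 1f, 1f, new HashSet<uint>());
Console.WriteLine(string.Join(", ", visible.Select(v => v.Name)));
// prints: window-A
```

Unlike flat marking, only the exit portal that overlaps the stairwell opening survives, and a camera cell whose chain reaches no exit portal ends up with an empty list. That empty case is where the Job-A/Job-B coupling bites.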
## Open design questions for the brainstorm (resolve BEFORE coding)
1. **Does WB even render a sealed sub-cell (cellar) via inside-out?** Check how
WB derives `_buildingsWithCurrentCell` (VisibilityManager.PrepareVisibility +
PortalRenderManager.GetBuildingPortalsByCellId). If WB *excludes* a cell with
no exit portals from the inside-out path, the "fix" may be a classification
change, not recursion. **Verify against WB source — don't assume recursion exists.**
2. **How does WB ACTUALLY constrain per-portal visibility?** Re-read
`VisibilityManager.cs` (RenderInsideOut/RenderOutsideIn) + `PortalRenderManager.cs`
end-to-end. Is the clipping (a) recursive portal traversal, (b) the 3-bit
stencil Step-5 pipeline, (c) pure Step-3 depth occlusion, or (d) the BSP
portal-graph in PrepareVisibility? Our port copied the *flat* Steps 1-4; the
constraint mechanism may live in code we didn't port.
3. **Job-A/Job-B decoupling.** Design how exterior geometry (shells + scenery +
depth-repair) draws independent of the portal mask, while terrain-through-portal
stays stencil-gated. This must land regardless of the recursion design.
4. **Stencil-bit budget + occlusion-query lifecycle** if the full WB Step-5
cross-building path is adopted (currently gated off via `ACDREAM_A8_STEP5`).
## Key source references for the brainstorm
WB (the algorithm being ported):
- `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/VisibilityManager.cs`
`PrepareVisibility` (47-71), `RenderInsideOut` (73-239), `RenderOutsideIn` (241+).
- `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/PortalRenderManager.cs`
`RenderBuildingStencilMask`, `GetVisibleBuildingPortals`, `GetBuildingPortalsByCellId`.
- `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/EnvCellRenderManager.cs`.

acdream (the current port):
- `src/AcDream.App/Rendering/GameWindow.cs`: `RenderInsideOutAcdream`
(~11012), `RenderOutsideInAcdream` (~196), the Step-4 `didInsideStencil` gate
(~11167). **This is where the Job-A/Job-B coupling lives.**
- `src/AcDream.App/Rendering/IndoorCellStencilPipeline.cs`: `PortalMeshBuilder`
(camera-side filter), `RenderBuildingStencilMask`, `DrawUploadedPortalMesh`.
- `src/AcDream.App/Rendering/Wb/BuildingLoader.cs` — building cell-set BFS
(mirrors WB PortalService; cellar IS expanded into building 0xA).

Retail oracle (if WB is ambiguous):
- `docs/research/named-retail/acclient_2013_pseudo_c.txt`: `CObjCell::find_visible_child_cell`
(≈311397), `PView::DrawCells` (≈432709). Retail uses screen-space polygon-clip
scissor recursion, the conceptual ancestor of "clip each portal to the chain."

Offline tooling:
- `tools/A8CellAudit`: `dotnet run -- portals <cellId...>` dumps a cell's
CellPortals (exit vs interior); `-- buildings <lb> <radius>` dumps
building→cell grouping. Reproduces the whole investigation in seconds, no launch.
## Brainstorming kickoff prompt (copy-paste into a fresh session)
> Use the `superpowers:brainstorming` skill. We're designing the retail-faithful
> fix for the A8 "cellar flap" — the last A8 indoor-rendering defect.
>
> Read first, in order:
> 1. `docs/research/2026-05-28-a8-cellar-flap-option2-handoff.md` (this doc) —
> current state, the confirmed root cause, the `didInsideStencil` double-duty
> finding, the decision, and the open design questions.
> 2. `docs/research/2026-05-28-a8-cellar-flap-root-cause.md` — the offline evidence.
> 3. The WB + acdream source references listed in the handoff.
>
> The goal: design how acdream's indoor visibility should render outside-through-
> portals **correctly clipped to the portal chain** (so a sealed cellar shows no
> terrain, a windowed room shows its own windows, deep rooms show only the sliver
> visible through the doorway chain) **AND** decouple "draw exterior geometry at
> all" from "is the portal mask non-empty" (the coupling that made the targeted
> fix regress).
>
> Brainstorm MUST resolve the 4 open design questions in the handoff before any
> code — especially Q1 (does WB even render a sealed cellar inside-out?) and Q2
> (what is WB's ACTUAL per-portal clipping mechanism — verify against source,
> don't assume recursion). Output a written design/plan; do not start coding
> until the design is agreed.
>
> Process rules still in force: no workarounds/band-aids; faithful WB/retail
> port; one visual gate only when a complete fix is ready; the broken indoor
> branch is behind `ACDREAM_A8_INDOOR_BRANCH=1` (default off = pre-A8 visual).
> The dirty tree has valuable uncommitted A8 work — decide whether to commit it
> (strip `ACDREAM_A8_DIAG_*` first) before starting.