acdream/docs/research/2026-05-25-issue-78-visibility-culling-investigation.md
Erik 5dc4140c11 feat(render): Phase A8 — indoor visibility + streaming fixes batch
Lands the working A8 indoor-rendering and streaming fixes accumulated this
session. User has verified these visually to some degree (e.g. lifestone /
translucent meshes confirmed fine under the FrontFace flip; bridge / wall /
collision regressions confirmed fixed after travel); not every path has been
exhaustively gated. The cellar-flap defect remains OPEN and will be solved
the retail-faithful way via a dedicated brainstorm (see handoff docs).

Rendering core (reviewed, high confidence):
- EnvCellRenderer SSBO stride fix: upload packed Matrix4x4[] (64B) instead of
  the 80B CPU InstanceData struct the shader never expected — fixes the
  transform/texture "explosion" for any draw with >1 instance (cells that
  dedupe to a shared cellGeomId). Real root cause.
- WB-style global FrontFace(CW) + per-batch CullMode carried through the MDI
  layout (GroupKey + BuildIndirectArrays + DrawIndirectRange split into
  same-cull runs with absolute uDrawIDOffset per run).
- EntitySet partitioning (IndoorPass / OutdoorScenery / LiveDynamic) +
  WorldEntity.BuildingShellAnchorCellId so building shells scope to their
  dat-derived building cell instead of rendering everywhere.
- RenderOutsideInAcdream (look into buildings from outside) +
  CollectVisiblePortalBuildings frustum cull of portal bounds.
- Sky-when-inside-building + per-cell audit probe + GL-state probe.

Streaming / perf (test-covered; not independently code-reviewed this session):
- Near/far priority queues so near work wins over far; PromoteToNear carries
  full landblock + mesh data; LandblockEntriesWithoutAnimatedIndex avoids
  rebuilding the animated-lookup dict in the hot draw path. Fixes the
  bridge-not-appearing / missing-walls / broken-collision-after-travel
  regressions and improves post-transition FPS.

Tooling + docs:
- tools/A8CellAudit: offline dat cell/portal/building dumper (portals +
  buildings modes) — reproduces the cellar-flap investigation with no launch.
- docs/research cellar-flap root-cause + option-2 handoff (the didInsideStencil
  double-duty finding + the WB-recursive design decision + brainstorm prompt),
  entity-taxonomy, replan, issue-78 visibility investigation.

Diagnostics retained on purpose: ACDREAM_A8_DIAG_* gates, portal_stencil.vert
provisional pos.w clamp, and the probe families are kept (env-var gated, zero
cost when off) because the pending option-2 cellar-flap brainstorm needs them.
Strip in the option-2 ship commit.

Indoor branch stays behind ACDREAM_A8_INDOOR_BRANCH=1 (default off = pre-A8
visual). Build green; App tests + Core (streaming/dispatcher/loader) tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 10:14:50 +02:00

206 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Issue #78 + cellar-stairs visibility culling — investigation report
**Date:** 2026-05-25 PM (continuation session)
**Status:** REPORT-ONLY. Awaiting user (a) camera-rotation falsification test and (b) approach selection before any code work.
**Predecessor handoff:** [`docs/research/2026-05-25-issue-100-shipped-and-culling-handoff.md`](2026-05-25-issue-100-shipped-and-culling-handoff.md)
---
## Symptom
Two visible defects share one root cause:
1. **Cellar-stairs (observed 2026-05-25 PM, evidence for #78):** standing in a Holtburg cottage cellar with the camera at certain angles, the outdoor terrain mesh renders as a sharp-edged grass rectangle covering the cellar stair geometry. **Clears when camera moves closer** (cottage walls + stair treads geometrically occlude). Gameplay unaffected — player can walk up/down normally.
2. **Inn-wall stabs (#78, filed 2026-05-19):** standing inside the Holtburg Inn looking at the floor or walls, the user sees other buildings in the distance at their correct world position + scale, visible THROUGH the floor and walls.
The user has NOT yet run the camera-rotation falsification test (Phase 1a of the handoff). Until they do, the diagnosis below is "high confidence" but not certain.
Sibling: **#95** (dungeon portal-graph blowup) is the same visibility subsystem but a different specific failure (over-inclusion). scen5 log shows `visibleCells` per cell reaching **295** (worse than the 135-145 filed).
---
## Hypotheses (ranked)
### H1 — Indoor-camera gate missing on outdoor render passes (HIGH confidence)
**Mechanism:** `TerrainModernRenderer.Draw` and `WbDrawDispatcher` render outdoor geometry unconditionally regardless of whether the camera is inside an EnvCell. Retail and WorldBuilder both gate the outdoor passes by the indoor portal-walk result. acdream does neither.
**Evidence FOR (strong):**
- Retail anchor verified: `PView::DrawCells` at `acclient_2013_pseudo_c.txt:432709` gates `LScape::draw` (outdoor terrain dispatch) by `if (outside_view.view_count > 0)`. `outside_view.view_count` is only incremented during the indoor portal BFS (`PView::ConstructView`) when a portal targets `other_cell_id == 0xFFFFFFFF` (outdoor sentinel). When no portal sees outside, the entire outdoor pass is skipped.
- Retail's per-mesh draw (`RenderDeviceD3D::DrawMesh` line 429245) iterates `Render::PortalList->view_count` and skips meshes that straddle 0 sub-views. **No stencil** — retail uses screen-space polygon clipping via `PView::GetClip`.
- WB anchor verified: `VisibilityManager.RenderInsideOut` (lines 73-239) uses **stencil**: mark current-building portals stencil=1, punch portal regions to far depth, draw EnvCells unconditionally, then `terrain/scenery/statics` gated by `glStencilFunc(Equal, 1, 0x01)`. The top-level loop already skips the unconditional terrain draw via `if (!isInside) terrainManager.Render(...)` at GameScene.cs:965.
- acdream audit verified the gate is missing: `WbDrawDispatcher.cs:360-362` gates by `entity.ParentCellId.HasValue && !visibleCellIds.Contains(...)`. When `ParentCellId == null` (outdoor stabs, scenery, live-spawned entities), the boolean short-circuits to `cellInVis = true` — the entity passes regardless of `visibleCellIds`.
- `TerrainModernRenderer.Draw` (lines 191-208) only does per-slot frustum cull. No `visibleCellIds` parameter, no indoor-camera awareness.
- Patch geometry size (~24 m × 24 m rectangle) matches a terrain cell footprint — that's a polygon, not a precision artifact.
- "Clears when closer" matches geometric occlusion: cottage walls + stair treads come to occlude the offending terrain cells screen-space as the camera approaches. A 1 cm depth-buffer Z-fight (#100's nudge) at 2-5 m camera distance with 24-bit depth has sub-millimeter resolving power; precision is not the bottleneck.
**Evidence AGAINST:**
- User has not yet run the camera-rotation test. If the patch flickers/shimmers when rotating the camera in place, the diagnosis pivots to Z-precision.
**How to falsify:** Stand at the spot showing the cellar-stairs artifact, look at the grass patch, rotate the camera slowly without moving the character. Polygon-stable edges that track predictably with the view = culling (H1). Flickering / shimmering = Z-precision (H2).
### H2 — Residual Z-fight from #100's nudge (LOW confidence)
The 1 cm shader nudge from issue #100 might be insufficient at certain Z values or with shader precision quirks.
**Evidence FOR:** Same code area was just touched.
**Evidence AGAINST:** Predecessor research already established 1 cm @ 24-bit depth has sub-mm resolving at gameplay camera distances. Patch is rectangular polygon, not thin Z-fight strip. "Clears when closer" reverses precision direction.
**How to falsify:** Same camera-rotation test.
### H3 — #95 portal-traversal blowup is independent of H1 (HIGH confidence it IS independent)
**Mechanism:** `CellVisibility.GetVisibleCells` BFS over portals lacks termination/cap-depth logic. Network hubs expose 100+ outbound portals to disconnected dungeons, all marked visible. scen5 log shows up to 295 cells in one visible set.
**Evidence FOR independence:**
- H1 is an **asymmetric over-render** (outdoor passes ignore indoor state).
- H3 is a **symmetric over-inclusion** (BFS doesn't terminate properly).
- A fix to H1 would gate WHEN to render outdoor; H3's fix is to bound WHICH indoor cells the BFS includes.
- Different code paths: H1 lives in `TerrainModernRenderer.Draw` + `WbDrawDispatcher`; H3 lives in `CellVisibility.GetVisibleCells`.
**Conclusion:** H1 and H3 should be **separate fixes**. Closing H1 will close cellar-stairs + the outdoor-stab side of #78 but NOT close #95. The next phase should plan H1 in scope and decide whether H3 fits in the same milestone (M1.5).
---
## What we've ruled out
- **It's not the #100 cell-collapse bug returning.** `hiddenTerrainCells` plumbing was fully removed in `a64e6f2`; terrain mesh now correctly renders everywhere on the landblock per retail. The new artifact's mechanism is "outdoor geometry visible at all when indoor," not "incorrect terrain mesh shape."
- **It's not a depth-precision issue (high confidence, pending falsification).** Patch shape + "clears closer" both contradict Z-fight.
- **It's not a `ParentCellId` propagation bug.** Audit confirmed that interior cell static objects (`GameWindow.BuildInteriorEntitiesForStreaming:5476`) and cell-mesh entities (line 5416) both receive non-null `ParentCellId = envCellId`. The dispatcher's existing filter already correctly culls them when the camera is in a different building. The bug is the OPPOSITE direction (outdoor entities w/ `ParentCellId == null` always pass).
- **It's not WB extraction divergence.** Phase O extracted ~33 WB files into `src/AcDream.App/Rendering/Wb/` but the `VisibilityManager` / `RenderInsideOut` pipeline was NOT extracted — that code never existed in our tree.
- **It's not a missing camera-cell signal at the render layer.** `cameraInsideCell`, `visibility.VisibleCellIds`, and `visibility.HasExitPortalVisible` are all already computed in `GameWindow.cs:6970-6984` and live in scope at the two `Draw` call sites (lines 7074 + 7110). No new plumbing required.
---
## Approach options for the fix
Three viable approaches, with tradeoffs:
### Approach A — WB-style stencil (recommended for first ship)
Port `VisibilityManager.RenderInsideOut`'s stencil pipeline to acdream. Two-pass render: (1) mark current-building portal silhouettes in stencil, (2) gate outdoor passes by `glStencilFunc(Equal, 1, 0x01)`.
**Pros:**
- Closest to acdream's existing modern GL pipeline (we already use stencil for nothing else; adding one stencil bit is cheap).
- WB is acdream's documented rendering base (per CLAUDE.md). Cross-reference checked against retail confirms WB's intent matches retail's, just via a different mechanism.
- Handles the "see outside through open door" case correctly — terrain renders through portal silhouettes only.
- Reusable for both outdoor terrain AND outdoor entities (single stencil gate applies to all subsequent draws).
**Cons:**
- Multi-pass render adds GPU cost (small — one stencil pass per current-building's portals).
- Requires a portal-mesh upload pipeline (WB has one in `PortalRenderManager.cs:488-628`; we'd port it).
- More LOC than Approach C.
**Estimated scope:** 4-6 tasks, 1-2 weeks of implementation + verification.
### Approach B — Retail-faithful polygon-clip sub-views
Port `PView::ConstructView` + `PView::GetClip` + `Render::PortalList` from retail. Per-mesh viewport set to clipped portal polygon.
**Pros:**
- 100% retail-faithful.
**Cons:**
- Requires per-draw viewport scissor changes — current rendering uses bindless + MDI with one viewport per pass. Wedging per-mesh viewport in would break the modern pipeline's batching.
- Multi-week port. Out of scope for one session.
**Estimated scope:** 8-12 tasks, 4-6 weeks. Defer to a future milestone if needed.
### Approach C — Ship-now binary gate
When `cameraInsideCell && !visibility.HasExitPortalVisible`, skip outdoor terrain pass entirely and gate `WbDrawDispatcher` to exclude `ParentCellId == null` entities.
**Pros:**
- Smallest change. ~2-3 tasks. Closes the cellar-stairs symptom and the sealed-interior side of #78 immediately.
- All required state already computed (`HasExitPortalVisible` from `CellVisibility.GetVisibleCells` line 404).
**Cons:**
- Under-renders when player can see outside through an open door/window (renders nothing instead of clipping correctly). This is regressive vs. today for the doorway-view case.
- Per CLAUDE.md "no workarounds": this *is* a symptom-gate rather than a root-cause fix. **Would need explicit user approval.** Approach A is the correct shape; Approach C is a temporary patch.
**Estimated scope:** 2-3 tasks, 1-2 days.
---
## Recommended next step
1. **User runs the camera-rotation falsification test (~60 seconds).** Spawn at Holtburg, walk into a cottage cellar, find the angle showing the grass patch, rotate the camera in place without moving. Report what happens.
- Polygon-stable → confirms H1, proceed.
- Flickering → pivots to H2, this report needs major revision.
2. **If H1 confirmed: user picks Approach A vs C.** Recommendation: **Approach A (WB-style stencil)**. Per CLAUDE.md's "no workarounds" rule, the right thing is to port the stencil pipeline, not gate at the symptom site. Approach C is offered only if the user wants to close cellar-stairs immediately and defer doorway-view correctness as known-incomplete; that's an explicit workaround that needs user sign-off.
3. **#95 should NOT be in scope for this work.** Different mechanism, different code path. File continues as separate work in M1.5.
4. **Phase identifier:** the handoff proposes A8 (visibility) alongside A6 (physics) and A7 (lighting). I'll defer naming to the user.
5. **CLAUDE.md update for #100 ship:** the handoff calls this out as pending. Recommendation: add a brief #100 ship entry mentioning the cellar-stairs finding linked to #78. Out of scope for investigate mode; will happen at the start of the implementation session.
---
## What this is NOT
This is NOT a #100 regression. The terrain Z-nudge ship works correctly; the new artifact has a different root cause (indoor-camera gate on outdoor passes was already missing pre-#100#100 just made it more visible by removing the terrain-cell hide mechanism that incidentally masked it inside building footprints).
This is NOT a depth-precision fix. The 1cm nudge is correctly sized; larger nudges would break coplanar-floor disambiguation elsewhere.
This is NOT a `ParentCellId` data fix. Interior entities are correctly tagged.
This is NOT covered by Phase O's WB extraction. The visibility-management code was deliberately NOT extracted.
---
## Reference appendix
### Retail anchors (acclient_2013_pseudo_c.txt)
| Line | Symbol | Role |
|---|---|---|
| 92635 | `SmartBox::RenderNormalMode` | Per-frame top-level dispatcher (indoor vs outdoor branch) |
| 267912 | `LScape::draw` | Outdoor terrain dispatch |
| 311397 | `CEnvCell::find_visible_child_cell` | Point-in-visible-cell query |
| 311878 | `CEnvCell::grab_visible_cells` | Loads outdoor on `seen_outside` |
| 427843 | `RenderDeviceD3D::DrawInside` | Indoor entry point |
| 429245 | `RenderDeviceD3D::DrawMesh` | **Per-mesh portal-sub-view loop** |
| 430027 | `RenderDeviceD3D::DrawBlock` | Outdoor landblock dispatch |
| 432709 | **`PView::DrawCells`** | **The `outside_view.view_count > 0` gate** |
| 433750 | `PView::ConstructView` | BFS portal walk |
### WorldBuilder anchors
| File:Line | Role |
|---|---|
| `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/VisibilityManager.cs:73-239` | `RenderInsideOut` — full stencil pipeline |
| Same file:241-359 | `RenderOutsideIn` — outdoor branch |
| Same file:47-71 | `PrepareVisibility` — visible cell set |
| `references/WorldBuilder/Chorizite.OpenGLSDLBackend/GameScene.cs:880-1008` | Main render dispatch (lines 965, 988 are the gates) |
| `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/PortalRenderManager.cs:488-628` | Portal mesh upload |
| `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/CameraController.cs:142-174` | Camera-cell tracking (portal raycasts) |
| `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Shaders/PortalStencil.frag:7-16` | Stencil shader (writes `gl_FragDepth = 1.0`) |
### acdream extension points (audit-verified)
| File:Line | Current behavior | Extension required |
|---|---|---|
| `src/AcDream.App/Rendering/CellVisibility.cs:222-232` | Returns `VisibilityResult` with `VisibleCellIds`, `HasExitPortalVisible`, `CameraCell` | None — state already in place |
| `src/AcDream.App/Rendering/GameWindow.cs:6970-6984` | Computes `cameraInsideCell` and `playerInsideCell` per frame | None — values already in scope |
| `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs:360-374` | Gates by `ParentCellId ∈ visibleCellIds`; outdoor entities (null) always pass | Add second gate: when `cameraInsideCell == true` and entity is outdoor (`ParentCellId == null`), require stencil pass or skip entirely |
| `src/AcDream.App/Rendering/TerrainModernRenderer.cs:191-208` | Frustum-only cull; renders all loaded landblocks | Add parameter for stencil pass / indoor-camera state |
| `src/AcDream.App/Rendering/GameWindow.cs:7074` | `_terrain?.Draw(camera, frustum, neverCullLandblockId: playerLb)` | Add `cameraInsideCell` (or equivalent) parameter |
| `src/AcDream.App/Rendering/GameWindow.cs:7110` | `WbDrawDispatcher.Draw(... visibleCellIds: visibility?.VisibleCellIds, ...)` | Add `cameraInsideCell` parameter |
| `src/AcDream.Core/Rendering/RenderingDiagnostics.cs:75-77` | Existing probe flag registry (mirror of `PhysicsDiagnostics`) | Add `ProbeVisibilityEnabled` from `ACDREAM_PROBE_VIS=1` |
### Issues family map
| ID | Symptom | Closes with H1 fix? |
|---|---|---|
| #78 | Outdoor stabs visible through inn floor/walls | YES (same root cause) |
| Cellar-stairs (NEW) | Outdoor terrain visible inside cottage cellar | YES (same root cause; new evidence for #78) |
| #95 | Portal-graph visibility blowup (visibleCells up to 295) | NO — independent (different code path) |
| #79/#80/#81/#93/#94 | Indoor lighting bugs | Maybe — #93 explicitly suspects "indoor visibility culling for lights" sub-cause; lighting subsystem may share infrastructure with visibility-gate but not directly impacted |
### Workflow notes (from CLAUDE.md "How to operate")
- "No workarounds without explicit approval" — Approach C is a workaround; Approach A is the correct shape.
- Visual verification is the user's job; can't be automated.
- Phase ID for visibility work is undecided. User picks at implementation-session start.
- Per the milestones doc, this is M1.5 scope; cellar-stairs is on the M1.5 critical path because it blocks the building/cellar half of the M1.5 demo.