acdream/docs/research/2026-06-10-flap-outdoor-fullworld-CLOSED-depthmask-leak.md
2026-06-10 09:22:02 +02:00


# CLOSED — §4 outdoor FULL-WORLD flap: EnvCellRenderer DepthMask(false) leak no-oped the frame depth clear
**Date:** 2026-06-10. **Branch:** `claude/thirsty-goldberg-51bb9b`.
**Commits:** `682cba3` ([clip-route] probe apparatus), `c4df241` (the fix).
**Status:** FIXED — user visual gate passed + probe-verified (0 leaked frames in the
38k-line verification capture). Closes the investigation opened by
`2026-06-09-flap-outdoor-fullworld-building-flood-merge-handoff.md`.
---
## 1. Root cause
`EnvCellRenderer.RenderModernMDIInternal` established the Transparent pass state,
`Enable(GL_BLEND)` + `glDepthMask(false)`, **before** the batch pass-filter loop. A
flooded cell whose batches are **all opaque** (a plain cottage interior: walls only, no
transparent surfaces) produced `totalDraws == 0` in the Transparent pass and hit the
early `return` **without ever reaching the end-of-pass restore**. The frame ended with
`dmask=0 blend=1`.
The kill follows from a single GL rule: **`glClear(GL_DEPTH_BUFFER_BIT)` honors
`glDepthMask`.** With the mask left false, the next frame's depth clear silently no-oped.
The depth buffer kept the previous frame's values, and every world fragment — terrain,
entities, the player, the sky — z-tested `GL_LESS` against its **own previous-frame
ghost** at virtually identical depths → never strictly closer → rejected. The screen
showed only the color clear, which is set to the fog color. Hence: whole world replaced
by fog-tinted clear.
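
The mechanism can be reproduced in a few lines. The following is a minimal sketch, assuming a one-pixel toy model of the depth buffer; `DepthModel`, `render_frame`, and the depth values are hypothetical illustrations, not the project's renderer or a real GL binding. It shows only the two rules at work: a depth clear is skipped while depth writes are masked, and `GL_LESS` rejects a fragment whose depth equals the stored value.

```python
from dataclasses import dataclass

FAR = 1.0  # depth value a working clear writes (the usual default clear depth)


@dataclass
class DepthModel:
    """Toy one-pixel depth buffer (hypothetical; not the project's code)."""
    depth_mask: bool = True
    depth: float = FAR

    def clear_depth(self) -> None:
        # Models glClear(GL_DEPTH_BUFFER_BIT): skipped while depth writes are masked.
        if self.depth_mask:
            self.depth = FAR

    def draw(self, z: float) -> bool:
        # Models GL_LESS: pass only if strictly closer than the stored depth.
        if z < self.depth:
            if self.depth_mask:
                self.depth = z
            return True
        return False


def render_frame(gl: DepthModel, z: float) -> bool:
    """Clear, then draw one fragment at depth z; True if the fragment lands."""
    gl.clear_depth()
    return gl.draw(z)


gl = DepthModel()
healthy = render_frame(gl, 0.4)  # clear works, 0.4 < 1.0, fragment lands
gl.depth_mask = False            # simulate the state leaked at end of frame
stale = render_frame(gl, 0.4)    # clear no-ops, 0.4 < 0.4 is false
sky = DepthModel(depth_mask=False, depth=FAR).draw(FAR)  # 1.0 < 1.0 is false
print(healthy, stale, sky)       # True False False
```

In the real frame the new depths are "virtually identical" rather than exactly equal, so the model simplifies that part; the strict-less comparison is what matters.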
A second early-out of the same shape (`globalVao == 0`, after the SSBO uploads) could
leak identically; it was fixed in the same commit.
This is the **4th instance** of the `feedback_render_self_contained_gl_state` bug class —
in the **same function** that carried the 1st (the U.4 blend/depth-mask establish was
itself the fix for instance #3; its early-out paths were the gap).
## 2. Why the symptoms looked the way they did
| Symptom | Mechanism |
|---|---|
| Onset frame-exact at the building-flood merge | The merge is the first frame the flooded building shell draws → first run of the empty Transparent pass → leak arms. |
| Strobes at onset | The flood flickers in/out at the boundary while the eye settles → alternating no-op / working depth clears. |
| HOLDS (one capture: 145,238 consecutive frames) | The leak re-arms every merged frame; the depth buffer stays stale indefinitely. |
| Camera rotation recovers instantly | The doorway clips out of view → cell drops from the flood → the leaking pass stops running → frame ends `dmask=1` → next clear works. |
| Whole world INCLUDING sky | Sky also depth-tests; old-sky pixels hold depth 1.0 and `1.0 < 1.0` fails `GL_LESS`. Everything dies uniformly. |
| "Parts of the screen flash while running past cottages" / cottage enter/exit artifacts | Same family: every brief merge = a 1-frame no-op depth clear. |
| The 9×21 px doorway scissor box in `[gl-state]` | Fingerprint only — `DrawRetailPViewCellParticles`' `BeginDoorwayScissor` leftover (test off, harmless). It marked "cell passes ran this frame", never the kill. |
## 3. The evidence chain (one probe run decided it)
The `[clip-route]` apparatus (`682cba3`, gate `ACDREAM_PROBE_CLIPROUTE=1`) instrumented
all three surviving suspects from the 2026-06-09 handoff in one repro run
(`flap-cliproute-capture.log`, user walk-through at the Holtburg south-slope anchor):
- **`[clip-route]`** — outside slice slot + NDC AABB + planes, CellIdToSlot, region-SSBO
bytes decoded at the routed slot, terrain-UBO head as uploaded: **full-screen planes on
both sides of every merge transition** (slot repacks 2↔3 with correct content). Suspect
(b) UBO content — exonerated.
- **`[clip-route-disp]`** — per-slot instance histogram as staged for binding=3: all
41,373 instances tracked the repacked outdoor slot exactly, `cullEnt=0` throughout.
Suspect (a) clip-slot routing — exonerated.
- **`[clip-route-scis]`** — actual GL scissor box for the landscape pass: full-screen the
entire run (printed once). Scissor — exonerated.
- **`[gl-state]`** — the answer: frames entered with `dmask=0 blend=1` for **exactly** the
merged stretches (armed at frame 239 ≈ the cottage shell finishing its streaming after
spawn-in, stable 145,238 frames through the held window, flipping in lockstep with each
end-of-run strobe/recover cycle).
Draw inputs provably correct + fragments not landing ⇒ the GPU rejected them at the
depth test ⇒ the only anomaly (`dmask=0` entering frames) was the cause. Code reading
then pinned the exact early-out: `RenderModernMDIInternal`'s `totalDraws == 0` return
between the state-set and the restore.
## 4. The fix (`c4df241`) — all paths root-cause
1. **EnvCellRenderer:** the pass-state establish moved **below** the `totalDraws == 0`
early-out; the `globalVao == 0` check hoisted above the state-set. Mutated GL state is
now established only on a path that always reaches the end-of-pass restore —
set→restore is return-free.
2. **GameWindow frame clear:** asserts `glDepthMask(true)` immediately before the
`glClear` — the clear *depends on* the depth write mask, so per the project's
self-contained-GL-state rule it sets the state it consumes rather than inheriting it.
The `[gl-state]` tripwire still detects any future leak (blend etc.).
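
The shape of both changes can be sketched as follows. This is an illustration under stated assumptions: `FakeGL`, the function names, the `global_vao` parameter, and the string-tagged batches are all hypothetical stand-ins, and the real renderer is not written in Python. The sketch contrasts the pre-fix ordering (state set before the early-out) with the post-fix ordering (every early-out above the state set), and adds a clear site that sets the mask it consumes.

```python
class FakeGL:
    """Stand-in for the GL state involved (hypothetical, not a real binding)."""

    def __init__(self) -> None:
        self.depth_mask = True
        self.blend = False
        self.depth_cleared = False

    def clear_depth(self) -> None:
        # A depth clear only takes effect while depth writes are enabled.
        self.depth_cleared = self.depth_mask


def transparent_pass_leaky(gl: FakeGL, batches: list[str]) -> int:
    # Pre-fix shape: pass state is established before the filter loop.
    gl.blend, gl.depth_mask = True, False
    draws = [b for b in batches if b == "transparent"]
    if not draws:
        return 0  # early-out skips the restore below and leaks the state
    gl.blend, gl.depth_mask = False, True
    return len(draws)


def transparent_pass_fixed(gl: FakeGL, batches: list[str], global_vao: int = 1) -> int:
    # Post-fix shape: every early-out sits above the state set.
    if global_vao == 0:
        return 0
    draws = [b for b in batches if b == "transparent"]
    if not draws:
        return 0  # nothing has been mutated yet
    gl.blend, gl.depth_mask = True, False
    # ... draws would be issued here; no return between set and restore ...
    gl.blend, gl.depth_mask = False, True
    return len(draws)


def begin_frame(gl: FakeGL) -> None:
    gl.depth_mask = True  # the clear site sets the state it consumes
    gl.clear_depth()


opaque_cell = ["opaque", "opaque"]  # walls only, no transparent surfaces

gl = FakeGL()
transparent_pass_leaky(gl, opaque_cell)
leaked = (gl.depth_mask, gl.blend)  # (False, True): the dmask=0 blend=1 signature
begin_frame(gl)                     # guarded clear still works despite the leak
recovered = gl.depth_cleared

gl2 = FakeGL()
transparent_pass_fixed(gl2, opaque_cell)
clean = (gl2.depth_mask, gl2.blend)
print(leaked, recovered, clean)
```

Either change alone removes the symptom in this model. Keeping both matches the rule the section cites: the pass owns its exit paths, and the clear does not inherit state.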
## 5. Verification
- Build green; 218 app-layer tests green (294+218+420 suite baseline unchanged).
- **Probe gate:** `flap-fix-gate-capture.log` (38,116 lines, same spawn-by-the-cottage
conditions): **zero** `dmask=0` `[gl-state]` frames — vs the broken run where the leak
armed by frame 239. Frames stay `dmask=1 blend=0` with the cottage shell drawing.
- **Visual gate (user, 2026-06-10):** forward walk through the trigger zone + running
past cottages — no strobe, no fog hold; "seems to work."
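
A probe-gate check like the one above could be scripted. The following is a rough sketch, assuming each `[gl-state]` line carries `dmask=<0|1>` and `blend=<0|1>` tokens; the exact line layout, the `frame=` field, and the function name are assumptions for illustration, since this note does not reproduce the probe's output format.

```python
import re
from typing import Iterable

# Assumed line shape (hypothetical): "[gl-state] frame=239 dmask=0 blend=1"
GL_STATE = re.compile(r"\[gl-state\].*?\bdmask=(\d)\b")


def count_leaked_frames(lines: Iterable[str]) -> int:
    """Count [gl-state] lines that report the depth write mask as off."""
    leaked = 0
    for line in lines:
        match = GL_STATE.search(line)
        if match and match.group(1) == "0":
            leaked += 1
    return leaked


# Made-up sample lines in the assumed format, not taken from a real capture.
sample = [
    "[gl-state] frame=238 dmask=1 blend=0",
    "[gl-state] frame=239 dmask=0 blend=1",
    "[clip-route] unrelated probe line with dmask=0 in it",
    "[gl-state] frame=240 dmask=0 blend=1",
]
leaked_count = count_leaked_frames(sample)
print(leaked_count)  # 2
```

Against a real capture, the same function would take an open file handle, for example `count_leaked_frames(open("flap-fix-gate-capture.log"))`, and the gate passes when the count is zero.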
## 6. What remains in §4 + apparatus inventory
- **(2a) edge-on doorway grey** and **(2b) corner camera-seal** remain open as the §4
siblings — **re-validate both against this fixed baseline first**: the no-op depth
clear may have inflated their apparent severity (any 1-frame merge during those repros
produced full-frame artifacts unrelated to their own mechanisms).
- Probes kept in-tree (env-gated, zero cost when off):
`ACDREAM_PROBE_CLIPROUTE` → `[clip-route]` / `[clip-route-disp]` / `[clip-route-scis]`;
`ACDREAM_PROBE_GLSTATE` → `[gl-state]` tripwire; `ACDREAM_PROBE_PVINPUT` → `[pv-input]`.
Strip once §4 (2a)/(2b) are resolved.
- Captures (worktree root, untracked): `flap-cliproute-capture.log` (the deciding run),
`flap-fix-gate-capture.log` (the verification run).
## 7. Durable lessons (memory updated)
- `feedback_render_self_contained_gl_state` — instance #4 recorded, with two new
corollaries: (a) GL-state ownership includes **exit paths** — establish mutated state
as late as possible, after every early-out; (b) **`glClear` is a state consumer** —
depth clears are gated by `glDepthMask`, so clear sites must assert the masks they
depend on.
- New symptom→cause mapping: *whole world drops to the clear color at an event boundary,
holds, recovers when the event stops* → leaked `DepthMask(false)` no-oping the frame
depth clear.