acdream/docs/research/2026-06-10-flap-outdoor-fullworld-CLOSED-depthmask-leak.md
2026-06-10 09:22:02 +02:00

CLOSED — §4 outdoor FULL-WORLD flap: EnvCellRenderer DepthMask(false) leak no-oped the frame depth clear

Date: 2026-06-10. Branch: claude/thirsty-goldberg-51bb9b. Commits: 682cba3 ([clip-route] probe apparatus), c4df241 (the fix). Status: FIXED — user visual gate passed + probe-verified (0 leaked frames in the 38k-line verification capture). Closes the investigation opened by 2026-06-09-flap-outdoor-fullworld-building-flood-merge-handoff.md.


1. Root cause

EnvCellRenderer.RenderModernMDIInternal established the Transparent pass state (Enable(GL_BLEND) + glDepthMask(false)) before the batch pass-filter loop. A flooded cell whose batches are all opaque (a plain cottage interior: walls only, no transparent surfaces) produced totalDraws == 0 in the Transparent pass and hit the early return without ever reaching the end-of-pass restore. The frame ended with dmask=0 blend=1.

The kill is one GL semantic away: glClear(GL_DEPTH_BUFFER_BIT) honors glDepthMask. With the mask left false, the next frame's depth clear silently no-oped. The depth buffer kept the previous frame's values, and every world fragment — terrain, entities, the player, the sky — z-tested GL_LESS against its own previous-frame ghost at virtually identical depths → never strictly closer → rejected. The screen showed only the color clear, which is set to the fog color. Hence: whole world replaced by fog-tinted clear.
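The mechanism can be reproduced in a toy one-pixel model. This is only a sketch of the GL semantics described above (a depth clear gated by the depth mask, and a strict GL_LESS test); the class and function names are hypothetical and nothing here is taken from the renderer's code.

```python
from dataclasses import dataclass

FAR = 1.0  # depth value written by a successful depth clear


@dataclass
class DepthModel:
    """Toy one-pixel model of the depth mask, the depth clear, and GL_LESS."""
    depth_mask: bool = True
    depth: float = FAR

    def clear_depth(self) -> None:
        # Like glClear(GL_DEPTH_BUFFER_BIT): writes only if the mask is true.
        if self.depth_mask:
            self.depth = FAR

    def draw(self, z: float) -> bool:
        # GL_LESS: pass only if strictly closer than the stored depth.
        passed = z < self.depth
        if passed and self.depth_mask:
            self.depth = z
        return passed


def render_frame(gl: DepthModel, z: float, leak_mask: bool) -> bool:
    """Clear, draw one fragment at depth z, optionally leak the mask."""
    gl.clear_depth()
    visible = gl.draw(z)
    if leak_mask:
        gl.depth_mask = False  # a pass exits without restoring the mask
    return visible


gl = DepthModel()
results = [render_frame(gl, 0.5, leak_mask=True) for _ in range(3)]
print(results)  # [True, False, False]
```

The first frame draws normally; from the second frame on, the clear is a no-op and the same fragment loses the strict comparison against its own previous-frame depth.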

A second early-out of the same shape (globalVao == 0, after the SSBO uploads) could leak identically; it was fixed in the same commit.
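The shape of the leak, reduced to a state dictionary, looks roughly like the following. It is a hedged illustration of "state set, early return, restore never reached"; the function name and dictionary keys are hypothetical, not the actual EnvCellRenderer code.

```python
def transparent_pass_leaky(state: dict, total_draws: int) -> None:
    """Illustrative pass whose early return skips the end-of-pass restore."""
    # Pass state is established first...
    state["blend"] = True
    state["depth_mask"] = False

    if total_draws == 0:
        return  # ...so this early-out leaves the mutated state behind

    # (draw calls would be issued here)

    # End-of-pass restore, reached only when something was drawn.
    state["blend"] = False
    state["depth_mask"] = True


leaked = {"blend": False, "depth_mask": True}
transparent_pass_leaky(leaked, total_draws=0)
print(leaked)  # {'blend': True, 'depth_mask': False}

clean = {"blend": False, "depth_mask": True}
transparent_pass_leaky(clean, total_draws=3)
print(clean)  # {'blend': False, 'depth_mask': True}
```

The all-opaque cell corresponds to the `total_draws=0` call: the frame ends with the equivalent of dmask=0 blend=1.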

This is the 4th instance of the feedback_render_self_contained_gl_state bug class — in the same function that carried the 1st (the U.4 blend/depth-mask establish was itself the fix for instance #3; its early-out paths were the gap).

2. Why the symptoms looked the way they did

| Symptom | Mechanism |
| --- | --- |
| Onset frame-exact at the building-flood merge | The merge is the first frame the flooded building shell draws → first run of the empty Transparent pass → leak arms. |
| Strobes at onset | The flood flickers in/out at the boundary while the eye settles → alternating no-op / working depth clears. |
| HOLDS (one capture: 145,238 consecutive frames) | The leak re-arms every merged frame; the depth buffer stays stale indefinitely. |
| Camera rotation recovers instantly | The doorway clips out of view → cell drops from the flood → the leaking pass stops running → frame ends dmask=1 → next clear works. |
| Whole world INCLUDING sky | Sky also depth-tests; old-sky pixels hold depth 1.0 and 1.0 < 1.0 fails GL_LESS. Everything dies uniformly. |
| "Parts of the screen flash while running past cottages" / cottage enter/exit artifacts | Same family: every brief merge = a 1-frame no-op depth clear. |
| The 9×21 px doorway scissor box in [gl-state] | Fingerprint only: DrawRetailPViewCellParticles' BeginDoorwayScissor leftover (test off, harmless). It marked "cell passes ran this frame", never the kill. |

3. The evidence chain (one probe run decided it)

The [clip-route] apparatus (682cba3, gate ACDREAM_PROBE_CLIPROUTE=1) instrumented all three surviving suspects from the 2026-06-09 handoff in one repro run (flap-cliproute-capture.log, user walk-through at the Holtburg south-slope anchor):

  • [clip-route] — outside slice slot + NDC AABB + planes, CellIdToSlot, region-SSBO bytes decoded at the routed slot, terrain-UBO head as uploaded: full-screen planes on both sides of every merge transition (slot repacks 2↔3 with correct content). Suspect (b) UBO content — exonerated.
  • [clip-route-disp] — per-slot instance histogram as staged for binding=3: all 41,373 instances tracked the repacked outdoor slot exactly, cullEnt=0 throughout. Suspect (a) clip-slot routing — exonerated.
  • [clip-route-scis] — actual GL scissor box for the landscape pass: full-screen the entire run (printed once). Scissor — exonerated.
  • [gl-state] — the answer: frames entered with dmask=0 blend=1 for exactly the merged stretches (armed at frame 239 ≈ the cottage shell finishing its streaming after spawn-in, stable 145,238 frames through the held window, flipping in lockstep with each end-of-run strobe/recover cycle).

Draw inputs provably correct + fragments not landing ⇒ the GPU rejected them at the depth test ⇒ the only anomaly (dmask=0 entering frames) was the cause. Code reading then pinned the exact early-out: RenderModernMDIInternal's totalDraws == 0 return between the state-set and the restore.

4. The fix (c4df241) — all paths root-cause

  1. EnvCellRenderer: the pass-state establish moved below the totalDraws == 0 early-out; the globalVao == 0 check hoisted above the state-set. Mutated GL state is now established only on a path that always reaches the end-of-pass restore — set→restore is return-free.
  2. GameWindow frame clear: sets glDepthMask(true) immediately before the glClear. The clear depends on the depth write mask, so per the project's self-contained-GL-state rule it sets the state it consumes rather than inheriting it. The [gl-state] tripwire still detects any future leak (blend etc.).
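Both parts of the fix can be sketched on the same toy state model. This is an assumption-laden illustration: the function names, the dictionary keys, and the relative order of the two early-outs are hypothetical; only the principle (every early-out precedes the state-set, and the clear sets the mask it consumes) comes from the list above.

```python
FAR = 1.0  # depth value written by a successful depth clear


def transparent_pass_fixed(state: dict, total_draws: int, global_vao: int) -> None:
    """Every early-out sits above the state-set, so set -> restore is return-free."""
    if global_vao == 0:
        return
    if total_draws == 0:
        return

    state["blend"] = True
    state["depth_mask"] = False
    # (draw calls would be issued here)
    state["blend"] = False
    state["depth_mask"] = True


def frame_clear(state: dict, depth: list) -> None:
    """The clear sets the depth mask it depends on instead of inheriting it."""
    state["depth_mask"] = True
    depth[0] = FAR


# An all-opaque cell no longer touches GL state at all.
state = {"blend": False, "depth_mask": True}
transparent_pass_fixed(state, total_draws=0, global_vao=7)
print(state)  # {'blend': False, 'depth_mask': True}

# Even if some other pass leaks the mask, the frame clear still works.
dirty = {"blend": True, "depth_mask": False}
depth = [0.5]
frame_clear(dirty, depth)
print(dirty["depth_mask"], depth)  # True [1.0]
```

Note that `frame_clear` deliberately leaves `blend` alone in this sketch: the clear only owns the state it consumes, which is why a separate tripwire is still useful for other leaked state.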

5. Verification

  • Build green; 218 app-layer tests green (294+218+420 suite baseline unchanged).
  • Probe gate: flap-fix-gate-capture.log (38,116 lines, same spawn-by-the-cottage conditions): zero dmask=0 [gl-state] frames — vs the broken run where the leak armed by frame 239. Frames stay dmask=1 blend=0 with the cottage shell drawing.
  • Visual gate (user, 2026-06-10): forward walk through the trigger zone + running past cottages — no strobe, no fog hold; "seems to work."
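A probe-gate check of this kind could be scripted along the following lines. The exact [gl-state] line layout is not given in this note, so the sample lines and the `dmask=<digit>` pattern below are assumptions made for illustration only; adapt the regular expression to the real capture format.

```python
import re

# Assumed token shape: "dmask=0" or "dmask=1" somewhere on a [gl-state] line.
DMASK = re.compile(r"\bdmask=(\d)")


def count_leaked_frames(lines) -> int:
    """Count [gl-state] lines that report the depth mask as 0."""
    leaked = 0
    for line in lines:
        if "[gl-state]" not in line:
            continue  # ignore the other probe tags
        match = DMASK.search(line)
        if match and match.group(1) == "0":
            leaked += 1
    return leaked


# Made-up sample lines, not taken from either capture.
sample = [
    "[gl-state] frame=238 dmask=1 blend=0",
    "[gl-state] frame=239 dmask=0 blend=1",
    "[clip-route] slot=2",
    "[gl-state] frame=240 dmask=0 blend=1",
]
print(count_leaked_frames(sample))  # 2
```

For a capture on disk, the same function accepts an open file object, since it only iterates over lines.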

6. What remains in §4 + apparatus inventory

  • (2a) edge-on doorway grey and (2b) corner camera-seal remain open as the §4 siblings — re-validate both against this fixed baseline first: the no-op depth clear may have inflated their apparent severity (any 1-frame merge during those repros produced full-frame artifacts unrelated to their own mechanisms).
  • Probes kept in-tree (env-gated, zero cost when off): ACDREAM_PROBE_CLIPROUTE ([clip-route] / [clip-route-disp] / [clip-route-scis]); ACDREAM_PROBE_GLSTATE ([gl-state] tripwire); ACDREAM_PROBE_PVINPUT ([pv-input]). Strip once §4 (2a)/(2b) are resolved.
  • Captures (worktree root, untracked): flap-cliproute-capture.log (the deciding run), flap-fix-gate-capture.log (the verification run).

7. Durable lessons (memory updated)

  • feedback_render_self_contained_gl_state — instance #4 recorded, with two new corollaries: (a) GL-state ownership includes exit paths — establish mutated state as late as possible, after every early-out; (b) glClear is a state consumer — depth clears are gated by glDepthMask, so clear sites must assert the masks they depend on.
  • New symptom→cause mapping: whole world drops to the clear color at an event boundary, holds, recovers when the event stops → leaked DepthMask(false) no-oping the frame depth clear.