# Phase A8 — EnvCellRenderer line-by-line audit findings (2026-05-28 PM) ## TL;DR The post-Wave-5 visual chaos (flickering colors, missing walls, GPU 100%, ~10 FPS) is caused by **two interconnected pool-management bugs** in `src/AcDream.App/Rendering/Wb/EnvCellRenderer.cs`. Both were introduced by the WB port author's "simplification" — dropping `PostPreparePoolIndex` from the snapshot (calling it scenery-only) and dropping `list.Clear()` from `GetPooledList`. Either bug alone is enough to corrupt rendering; together they explain every reported symptom precisely. Neither bug was found by the five post-Wave-5 speculative fixes because none of them inspected the pool-management code path. The bug is in code the subagent wrote that nobody had read line-by-line against WB. The audit pattern in the handoff doc (Process retrospective §3 "Trust-but-verify on subagent work") found it in under 30 minutes once actually applied. A third minor bug in `PopulateRecursive` is also documented for completeness. ## Bug #1 — `GetPooledList` is missing `list.Clear()` ### Code as shipped `src/AcDream.App/Rendering/Wb/EnvCellRenderer.cs:959-969`: ```csharp private List GetPooledList() { lock (_listPool) { if (_poolIndex < _listPool.Count) return _listPool[_poolIndex++]; // ← MISSING Clear() var fresh = new List(); _listPool.Add(fresh); _poolIndex++; return fresh; } } ``` ### WB canonical implementation `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectRenderManagerBase.cs:1221-1233`: ```csharp protected List GetPooledList() { lock (_listPool) { if (_poolIndex < _listPool.Count) { var list = _listPool[_poolIndex++]; list.Clear(); // ← CRITICAL: clears stale data from prior frames return list; } var newList = new List(); _listPool.Add(newList); _poolIndex++; return newList; } } ``` ### Effect of the bug `PrepareRenderBatches`'s merge phase pattern (`EnvCellRenderer.cs:494-501`) is: ```csharp if (!gfxDict.TryGetValue(gfxKvp.Key, out var list)) { list = GetPooledList(); gfxDict[gfxKvp.Key] = list; } list.AddRange(gfxKvp.Value); // ← appends to whatever was in list ``` Each frame, when `GetPooledList` returns a previously-used pool list: - That list STILL contains prior frames' instance data - `list.AddRange(gfxKvp.Value)` appends the new frame's data on top - Lists grow unbounded across frames - The GPU receives buffers with N frames' worth of instance data After ~50 frames at 60Hz, every batch's instance count is ~50× what it should be. The GPU processes 50× more vertices per draw. Hence GPU 100% + low FPS. Worse: the per-instance transforms in the appended data are STALE — they represent where instances were in prior frames, not now. Hence "flickering colors" (transforms appear/disappear randomly) and "missing walls" (instances whose latest data has been buried under stale entries). ## Bug #2 — `_poolIndex = snapshot.BatchedByCell.Count` instead of `snapshot.PostPreparePoolIndex` ### Code as shipped `src/AcDream.App/Rendering/Wb/EnvCellRenderer.cs:614`: ```csharp lock (_renderLock) { var snapshot = _activeSnapshot; _shader.Use(); _poolIndex = snapshot.BatchedByCell.Count; // reset point (mirrors WB line 405) ← WRONG ``` And `EnvCellVisibilitySnapshot.cs:11-13` (the snapshot definition): ```csharp /// The scenery-side VisibleGroups / VisibleGfxObjIds / IntersectingLandblocks /// / PostPreparePoolIndex are dropped — we render scenery through /// WbDrawDispatcher, not through this snapshot. ``` ### WB canonical implementation `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/EnvCellRenderManager.cs:405`: ```csharp _poolIndex = snapshot.PostPreparePoolIndex; ``` And `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/VisibilitySnapshot.cs:31`: ```csharp public int PostPreparePoolIndex { get; init; } ``` Stored at `EnvCellRenderManager.cs:367` during the Prepare→snapshot swap: ```csharp _activeSnapshot = new VisibilitySnapshot { BatchedByCell = newBatchedByCell, ... PostPreparePoolIndex = _poolIndex // ← captured before _poolIndex reset }; ``` ### Effect of the bug The snapshot author's reasoning ("PostPreparePoolIndex is scenery-side, we don't need it") is **wrong**. PostPreparePoolIndex has nothing to do with scenery — it's the pool-index high-water mark from Prepare's merge phase, which tells Render's filter-path where the pool's safe region begins. Specifically: - After Prepare, `_listPool[0..K-1]` hold the data referenced by `snapshot.BatchedByCell`. K = `_poolIndex` after merge. - For Render to call `GetPooledList` safely (without trampling snapshot data), `_poolIndex` must be set to K, so new pool calls return `_listPool[K..]` — past the snapshot's region. Our broken code sets `_poolIndex = BatchedByCell.Count`, which has NO RELATION to the pool's high-water mark. For a typical indoor scene at Holtburg with ~5-20 cells visible, `BatchedByCell.Count` ≈ 18, but Prepare may have used 50-100 pool lists (lists per unique (cellId, gfxObjId) combo). When Render's filter-path (`EnvCellRenderer.cs:684`) calls `GetPooledList`, it returns a list at index 18 — which IS inside `snapshot.BatchedByCell`. With Bug #1 fixed (Clear added), `Clear()` wipes that snapshot list. With Bug #1 unfixed, `AddRange` corrupts it. Either way: snapshot data destroyed mid-Render. ## Combined effect — why both bugs together explain the symptoms | Symptom | Bug #1 alone | Bug #2 alone | Both together | |---|---|---|---| | GPU 100%, ~10 FPS | YES (lists grow per frame) | partial | YES (compounding) | | Flickering colors | YES (stale + new transforms mixed) | YES (snapshot lists overwritten mid-render) | YES (both effects) | | Missing walls | YES (new instances buried under stale) | YES (snapshot data wiped before draw) | YES (both effects) | | Black diagonal sliver (final report) | YES (lists hit memory pressure / driver limits) | YES (snapshot corruption produces degenerate transforms) | YES | The "ColorMask alpha-bit off" / "cull-cache stale" / "double terrain" / "missing IndoorPass" fixes from the five post-Wave-5 attempts may all be real (some are real bugs, just not THE bug). They didn't help visually because the pool aliasing was always the dominant cause. ## Bug #3 (minor) — `PopulateRecursive` always passes `isSetup: false` ### Code as shipped `src/AcDream.App/Rendering/Wb/EnvCellRenderer.cs:358`: ```csharp foreach (var (partId, partTransform) in rd.SetupParts) { var combined = partTransform * transform; PopulateRecursive(group, partId, isSetup: false, combined, cellId); // ← always false } ``` ### WB canonical implementation `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectRenderManagerBase.cs:813`: ```csharp foreach (var (partId, partTransform) in renderData.SetupParts) { PopulateRecursive(groups, partId, (partId >> 24) == 0x02, partTransform * transform, cellId, flags); } ``` WB detects nested Setups by checking `(partId >> 24) == 0x02` (Setup IDs use the 0x02 high byte in retail). Our code always treats child parts as plain GfxObjs. ### Effect For most cells in Holtburg, this doesn't matter — typical cells reference simple GfxObj parts (high byte 0x01). But if any Setup contains a nested Setup, our code adds a stub InstanceData with the nested Setup's ID into the group dict, then `TryGetRenderData(nestedSetupId)` returns a Setup record, and Render's `if (renderData != null && !renderData.IsSetup)` check fails → instance silently dropped. Some renderings missing in specific cases but not the catastrophic chaos. Fixed as a defense-in-depth measure to match WB exactly. ## Fix plan ### Required (closes Bugs #1 + #2): 1. **`EnvCellRenderer.cs:959-969`** — Add `list.Clear()` before return in `GetPooledList` reuse branch. 2. **`EnvCellVisibilitySnapshot.cs`** — Add `public int PostPreparePoolIndex { get; init; }`. 3. **`EnvCellRenderer.cs:523-534`** — Capture `_poolIndex` into snapshot's `PostPreparePoolIndex` field during the atomic swap. 4. **`EnvCellRenderer.cs:614`** — Read `snapshot.PostPreparePoolIndex` instead of `snapshot.BatchedByCell.Count`. ### Recommended (closes Bug #3): 5. **`EnvCellRenderer.cs:358`** — `isSetup: (partId >> 24) == 0x02` to correctly detect nested Setups. ### Apparatus (safety net for any unidentified bug): 6. **GL state assertion probe extension** — extend `EmitDrawOrderProbe` to log full GL state at each step boundary and compare against WB-expected state. Lifts cost of finding the NEXT bug (if any) from a multi-hour speculation cycle to a one-launch evidence cycle. ## Regression coverage The pool aliasing bug is hard to unit-test without a GL context (Render is the mutation site, and Render requires a shader and global mesh buffer). A targeted test for the simpler invariant — `GetPooledList` returns a cleared list — is sufficient: it directly verifies Bug #1's fix and provides a guard against future regressions. Bug #2 is verified by the visual gate. ## Process retrospective The "trust-but-verify subagent work" item in the handoff doc's Process retrospective §3 names exactly this failure mode: the subagent wrote `EnvCellRenderer.cs` (Wave 2, ~1013 LOC), build-test-green checks passed, but the audit never happened. The five post-Wave-5 speculative fixes were chasing symptoms because the audit step was skipped. The cost of the missed audit: ~1 hour of speculation across five fix attempts, plus the user's five failed visual gates. The cost of the audit, when actually performed: ~30 minutes to find both root cause bugs via line-by-line comparison. **Rule for subagent-written code touching production paths:** always read the diff line-by-line against the cited WB / retail source before merging. Bugs in subagent code that "build clean and pass tests" can still be 100%-wrong on the algorithm — the test coverage was not designed to catch them.