# Phase N.6 slice 1 Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Fix the broken `gpu_us` diagnostic in `WbDrawDispatcher` (vendor-neutral OpenGL query ring) and produce one authoritative perf baseline document at Holtburg radius=12 so the next-phase decision (slice 2 vs C.1.5 vs Tier 2) is grounded in real numbers.

**Architecture:** Two commits. Commit 1 changes only `WbDrawDispatcher.cs`: it replaces the two `uint` GL query handles with ring-of-3 arrays and moves the result read to *before* the next frame overwrites the slot (read frame N-3's queries, then overwrite). Commit 2 adds an env-gated surface-format histogram dump in `TextureCache.cs` (with a per-frame call site in `GameWindow.cs`), captures the actual measurement, writes the baseline doc, and amends the roadmap entry. No new automated tests: the GPU-timing fix has no observable behavior in tests, and the dump path is env-gated diagnostic only; verification is manual launch-and-look.

**Tech Stack:** C# / .NET 10, Silk.NET (OpenGL 4.3+), `dotnet build` / `dotnet test` from PowerShell, live ACE on `127.0.0.1:9000` for in-world verification.

**Spec:** [docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md](../specs/2026-05-11-phase-n6-slice1-design.md) (committed at `05d590c`).

---

## File Structure

| File | Action | Responsibility |
|---|---|---|
| [`src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`](../../../src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs) | Modify | Replace 2 `uint` query handles with ring-of-3 arrays; move query result read to before next-frame overwrite. |
| [`src/AcDream.App/Rendering/TextureCache.cs`](../../../src/AcDream.App/Rendering/TextureCache.cs) | Modify | Add upload-time dimension/format tracking + env-gated `TickSurfaceHistogramDumpIfEnabled()` method that fires once at frame 600. |
| [`src/AcDream.App/Rendering/GameWindow.cs`](../../../src/AcDream.App/Rendering/GameWindow.cs) | Modify | Call `_textureCache.TickSurfaceHistogramDumpIfEnabled()` once per frame in `OnRender`. |
| `docs/plans/2026-05-11-phase-n6-perf-baseline.md` | Create | Baseline measurement doc: setup, numbers at radii 4/8/12 (standstill + walking), surface histogram summary, conclusion paragraph recommending next phase. |
| [`docs/plans/2026-04-11-roadmap.md`](../../plans/2026-04-11-roadmap.md) lines 690-705 | Modify | Amend N.6 entry to reflect the slice 1 / slice 2 split. |

---

## Task 1: GPU query ring buffering (commit 1)

**Files:**
- Modify: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`

The six edit zones (Steps 1.1-1.6) are well-isolated by exact strings. Apply them in order and do NOT reorder. Intermediate states will not compile, because Step 1.1 changes the field types that the later steps consume, so build only once all six are applied (Step 1.7); keeping the documented order also makes the resulting code easier to review.

- [ ] **Step 1.1: Replace the field declarations (~line 155)**

Use Edit to replace the existing field block:

**old_string:**
```csharp
    private uint _gpuQueryOpaque;
    private uint _gpuQueryTransparent;
    private readonly long[] _gpuSamples = new long[256]; // microseconds
    private int _gpuSampleCursor;
    private bool _gpuQueriesInitialized;
```

**new_string:**
```csharp
    // GPU timing uses a ring of 3 query-pair slots so the read of frame N-3's
    // result lands when the GPU has finished (~50ms after issue on a typical
    // 60fps frame). Ring of 3 is the vendor-neutral choice: NVIDIA drivers with
    // triple-buffering+vsync can queue ~3 frames ahead, AMD typically 1-2,
    // Intel iGPUs vary. ResultAvailable is the safety guard if the GPU is
    // still working when we try to read.
    private const int GpuQueryRingDepth = 3;
    private readonly uint[] _gpuQueryOpaque = new uint[GpuQueryRingDepth];
    private readonly uint[] _gpuQueryTransparent = new uint[GpuQueryRingDepth];
    private int _gpuQueryFrameIndex;
    private readonly long[] _gpuSamples = new long[256]; // microseconds
    private int _gpuSampleCursor;
    private bool _gpuQueriesInitialized;
```

- [ ] **Step 1.2: Replace the init block (~line 347)**

**old_string:**
```csharp
        if (diag && !_gpuQueriesInitialized)
        {
            _gpuQueryOpaque = _gl.GenQuery();
            _gpuQueryTransparent = _gl.GenQuery();
            _gpuQueriesInitialized = true;
        }
```

**new_string:**
```csharp
        if (diag && !_gpuQueriesInitialized)
        {
            for (int i = 0; i < GpuQueryRingDepth; i++)
            {
                _gpuQueryOpaque[i] = _gl.GenQuery();
                _gpuQueryTransparent[i] = _gl.GenQuery();
            }
            _gpuQueriesInitialized = true;
        }
```

- [ ] **Step 1.3: Insert the read-before-overwrite block + compute slot just before the opaque query begin (~line 774)**

This step replaces the existing single-line `BeginQuery` for opaque with a block that first computes the slot, reads the slot's frame N-3 result (gated on having completed one ring), then issues the new query into the same slot.

**old_string:**
```csharp
        _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);

        if (diag && _gpuQueriesInitialized)
            _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque);
```

**new_string:**
```csharp
        _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);

        // GPU timing: compute this frame's ring slot. We read frame N-3's
        // result (the oldest data in the ring) before overwriting it with
        // frame N's queries. See spec §3 Q1/Q2 + §4 in
        // docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md.
        int gpuQuerySlot = _gpuQueryFrameIndex % GpuQueryRingDepth;
        if (_gpuQueriesInitialized && _gpuQueryFrameIndex >= GpuQueryRingDepth)
        {
            _gl.GetQueryObject(_gpuQueryOpaque[gpuQuerySlot], QueryObjectParameterName.ResultAvailable, out int avail);
            if (avail != 0)
            {
                _gl.GetQueryObject(_gpuQueryOpaque[gpuQuerySlot], QueryObjectParameterName.Result, out ulong opaqueNs);
                _gl.GetQueryObject(_gpuQueryTransparent[gpuQuerySlot], QueryObjectParameterName.Result, out ulong transNs);
                long gpuUs = (long)((opaqueNs + transNs) / 1000UL);
                _gpuSamples[_gpuSampleCursor] = gpuUs;
                _gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length;
            }
            // If avail==0 the sample is dropped silently. MedianMicros
            // computes over the non-zero subset, so dropped samples don't
            // poison the median.
        }

        if (diag && _gpuQueriesInitialized)
            _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque[gpuQuerySlot]);
```

- [ ] **Step 1.4: Update the transparent query begin to use the same slot (~line 823)**

**old_string:**
```csharp
        if (diag && _gpuQueriesInitialized)
            _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryTransparent);
```

**new_string:**
```csharp
        if (diag && _gpuQueriesInitialized)
            _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryTransparent[gpuQuerySlot]);
```

- [ ] **Step 1.5: Replace the buggy in-frame read block + increment frame counter (~line 849)**

**old_string:**
```csharp
        // Read GPU samples non-blocking; the result for the previous frame's
        // queries should be ready by now. If not, drop the sample (don't stall
        // the CPU waiting for the GPU).
        if (_gpuQueriesInitialized)
        {
            _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.ResultAvailable, out int avail);
            if (avail != 0)
            {
                _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.Result, out ulong opaqueNs);
                _gl.GetQueryObject(_gpuQueryTransparent, QueryObjectParameterName.Result, out ulong transNs);
                long gpuUs = (long)((opaqueNs + transNs) / 1000UL);
                _gpuSamples[_gpuSampleCursor] = gpuUs;
                _gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length;
            }
        }

        _drawsIssued += _opaqueDrawCount + _transparentDrawCount;
```

**new_string:**
```csharp
        // GPU sample read happens BEFORE issuing the next frame's queries
        // (see step 1.3 above). Increment the frame counter here so the
        // next call computes a fresh slot.
        if (_gpuQueriesInitialized) _gpuQueryFrameIndex++;

        _drawsIssued += _opaqueDrawCount + _transparentDrawCount;
```

- [ ] **Step 1.6: Update Dispose to delete the full ring (~line 1140)**

**old_string:**
```csharp
        if (_gpuQueriesInitialized)
        {
            _gl.DeleteQuery(_gpuQueryOpaque);
            _gl.DeleteQuery(_gpuQueryTransparent);
        }
```

**new_string:**
```csharp
        if (_gpuQueriesInitialized)
        {
            for (int i = 0; i < GpuQueryRingDepth; i++)
            {
                _gl.DeleteQuery(_gpuQueryOpaque[i]);
                _gl.DeleteQuery(_gpuQueryTransparent[i]);
            }
        }
```

- [ ] **Step 1.7: Build**

Run from the worktree root:

```powershell
dotnet build
```

Expected: build succeeds with no new warnings or errors. If the build fails, the most likely cause is a missed string in one of the steps above. Re-grep `_gpuQueryOpaque` and `_gpuQueryTransparent` in `WbDrawDispatcher.cs` and confirm every reference outside the field declarations uses the array-indexed form `[gpuQuerySlot]` or `[i]`.

- [ ] **Step 1.8: Run the test suite**

```powershell
dotnet test --no-build
```

Expected: same pass/fail baseline as before the change (~1688 passing, ~8 pre-existing physics/input failures unchanged). No new failures.
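The slot arithmetic in Steps 1.3-1.5 can be sanity-checked without a GL context. The sketch below is a standalone simulation, not dispatcher code: `SimulateRing` and its `latencyFrames` parameter are hypothetical names, and the "GPU" is modelled as a fixed number of frames of delay before a query result becomes readable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the GPU: a query issued at frame F is assumed
// to become readable at frame F + latencyFrames. Not part of WbDrawDispatcher.
const int RingDepth = 3;

List<(int ReadAtFrame, int IssuedAtFrame)> SimulateRing(int totalFrames, int latencyFrames)
{
    var issuedAt = new int[RingDepth];   // frame that last wrote each slot
    var reads = new List<(int, int)>();
    for (int frame = 0; frame < totalFrames; frame++)
    {
        int slot = frame % RingDepth;
        // Read-before-overwrite, only after one full pass around the ring.
        if (frame >= RingDepth)
        {
            bool available = frame - issuedAt[slot] >= latencyFrames;
            if (available) reads.Add((frame, issuedAt[slot]));
            // Otherwise the sample is dropped, mirroring the avail == 0 path.
        }
        issuedAt[slot] = frame;          // issue this frame's query into the slot
    }
    return reads;
}

var samples = SimulateRing(totalFrames: 8, latencyFrames: 2);
foreach (var (readAt, issued) in samples)
    Console.WriteLine($"frame {readAt} read the query issued at frame {issued}");
```

Under these assumptions every read from frame 3 onward returns the query issued exactly three frames earlier, and a simulated latency above the ring depth produces no samples at all rather than a stall. That matches the drop-silently behavior the plan relies on.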
- [ ] **Step 1.9: Manual verification (launch live and confirm `gpu_us` reports non-zero)**

```powershell
$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call"
$env:ACDREAM_LIVE = "1"
$env:ACDREAM_TEST_HOST = "127.0.0.1"
$env:ACDREAM_TEST_PORT = "9000"
$env:ACDREAM_TEST_USER = "testaccount"
$env:ACDREAM_TEST_PASS = "testpassword"
$env:ACDREAM_WB_DIAG = "1"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "task1-verify.log"
```

In-world: walk Holtburg for ~30 seconds. Close the window when done.

Verification check on `task1-verify.log`:

```powershell
Select-String -Path task1-verify.log -Pattern "\[WB-DIAG\]" | Select-Object -Last 5
```

Expected output: at least one `[WB-DIAG]` line where `gpu_us=Xm/Yp95` has X > 0 (typically tens to low-hundreds of microseconds at radius=4-12 on a modern GPU). If `gpu_us=0m/0p95` persists for the entire run, the fix didn't take; check whether the build actually rebuilt (try `dotnet build -c Debug` then re-launch).

Also confirm: no visible regression in the client. Entities render, animations play, sky cycles. Close the client cleanly.

- [ ] **Step 1.10: Commit**

```powershell
git add src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs
git commit -m @'
feat(perf): Phase N.6 slice 1 — fix gpu_us double-buffering in WbDrawDispatcher

The dispatcher's GPU TimeElapsed queries were polled in the same frame as
the indirect draw, so glGetQueryObject(ResultAvailable) always returned 0
and gpu_us in [WB-DIAG] was stuck at 0m/0p95.

Replace the 2 single-handle queries with ring-of-3 arrays and move the
result read to BEFORE issuing the next frame's queries into the same
slot — at frame N we read slot N%3 which holds frame N-3's queries
(oldest in the ring, ~50ms old at 60fps and definitely done across all
desktop GL drivers). Vendor-neutral: AMD/NVIDIA/Intel desktop GL all work
without driver-specific code.

No new tests — the change is purely a diagnostic readout fix, no
observable behavior in the rendering path. Manual verification:
[WB-DIAG] now reports non-zero gpu_us at Holtburg radius=12.

Spec: docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md (§4).

Co-Authored-By: Claude Opus 4.7 (1M context)
'@
git status
```

Expected: clean working tree after commit. Note the new commit SHA; it is needed for the baseline doc's "measured against" reference.

---

## Task 2: Surface-format histogram dump path (part of commit 2 setup)

**Files:**
- Modify: `src/AcDream.App/Rendering/TextureCache.cs`
- Modify: `src/AcDream.App/Rendering/GameWindow.cs`

This task adds the env-gated one-shot dump infrastructure. It does NOT commit; the commit happens in Task 4 after the baseline document is also ready.

- [ ] **Step 2.1: Add upload-time metadata tracking in `TextureCache.cs`**

Add a new private dictionary that records `(width, height, formatLabel)` keyed by GL texture name. This lets `DumpSurfaceHistogram` emit dimension/format data without re-querying GL.

Use Edit to insert the field right after the existing bindless cache fields (~line 41, just after `_bindlessByPalette`):

**old_string:**
```csharp
    private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), (uint Name, ulong Handle)> _bindlessByPalette = new();

    public TextureCache(GL gl, DatCollection dats, Wb.BindlessSupport? bindless = null)
```

**new_string:**
```csharp
    private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), (uint Name, ulong Handle)> _bindlessByPalette = new();

    // Phase N.6 slice 1 (2026-05-11): per-upload metadata for the
    // ACDREAM_DUMP_SURFACES=1 histogram dump path. Populated at upload
    // time so the dump method doesn't have to query GL state. Keyed by
    // GL texture name (same key used in cache value tuples). Format
    // label is "RGBA8_DECODED" for the post-decode upload (all uploads
    // currently land as RGBA8 regardless of source format).
    private readonly Dictionary<uint, (int Width, int Height, string Format)> _uploadMetadata = new();

    // Frame counter for the one-shot ACDREAM_DUMP_SURFACES=1 trigger.
    // Increments per Tick call; fires the dump once at frame index 600
    // and never again for the session. See spec §5.
    private int _dumpFrameCounter;
    private bool _surfaceHistogramAlreadyDumped;

    public TextureCache(GL gl, DatCollection dats, Wb.BindlessSupport? bindless = null)
```

- [ ] **Step 2.2: Find the `UploadRgba8AsLayer1Array` method and record metadata there**

Locate the method using Grep:

```
pattern: "UploadRgba8AsLayer1Array"
path: src/AcDream.App/Rendering/TextureCache.cs
output_mode: content
-n: true
```

Read the method body (typically ~30-50 lines) to find the exact `return name;` line. The decoded texture has `decoded.Width`, `decoded.Height`, and `decoded.Rgba8` available.

For each `return name;` in `UploadRgba8AsLayer1Array(DecodedTexture decoded)`, insert this line immediately before it:

```csharp
        _uploadMetadata[name] = (decoded.Width, decoded.Height, "RGBA8_DECODED");
```

If the method has only one `return name;` near its end, that's a single Edit. Use the surrounding 2-3 lines of context in `old_string` to make the Edit unique.

- [ ] **Step 2.3: Also record metadata in the legacy `UploadRgba8` (non-bindless) path**

Locate the method:

```
pattern: "private uint UploadRgba8\b"
path: src/AcDream.App/Rendering/TextureCache.cs
output_mode: content
-n: true
```

Apply the same `_uploadMetadata[name] = (decoded.Width, decoded.Height, "RGBA8_DECODED");` insertion before each `return name;` in `UploadRgba8(DecodedTexture decoded)`. This ensures the dump captures both legacy and modern uploads.

- [ ] **Step 2.4: Add the `TickSurfaceHistogramDumpIfEnabled` public method to `TextureCache.cs`**

Locate `HashPaletteOverride` using Grep:

```
pattern: "internal static ulong HashPaletteOverride"
path: src/AcDream.App/Rendering/TextureCache.cs
output_mode: content
-n: true
-A: 20
```

Identify its closing brace.
Use Edit with surrounding context to insert the new methods immediately after.

**old_string:** (the last few lines of `HashPaletteOverride`):
```csharp
        foreach (var sp in p.SubPalettes)
        {
            h = (h ^ sp.SubPaletteId) * prime;
            h = (h ^ sp.Offset) * prime;
            h = (h ^ sp.Length) * prime;
        }
        return h;
    }
```

**new_string:**
```csharp
        foreach (var sp in p.SubPalettes)
        {
            h = (h ^ sp.SubPaletteId) * prime;
            h = (h ^ sp.Offset) * prime;
            h = (h ^ sp.Length) * prime;
        }
        return h;
    }

    /// <summary>
    /// Phase N.6 slice 1: one-shot surface-format histogram dump for the
    /// atlas-opportunity audit. Activated by ACDREAM_DUMP_SURFACES=1; fires
    /// once at frame 600 of the session (~10s at 60fps, ~3s at 200fps —
    /// both well past streaming settle at radius≤12). Output goes to
    /// %LOCALAPPDATA%\acdream\n6-surfaces.txt. Zero cost when off.
    /// See spec §5 in docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md.
    /// </summary>
    public void TickSurfaceHistogramDumpIfEnabled()
    {
        if (_surfaceHistogramAlreadyDumped) return;
        if (!string.Equals(Environment.GetEnvironmentVariable("ACDREAM_DUMP_SURFACES"), "1", StringComparison.Ordinal))
            return;
        _dumpFrameCounter++;
        if (_dumpFrameCounter < 600) return;
        DumpSurfaceHistogram();
        _surfaceHistogramAlreadyDumped = true;
    }

    private void DumpSurfaceHistogram()
    {
        var localAppData = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);
        var outDir = System.IO.Path.Combine(localAppData, "acdream");
        System.IO.Directory.CreateDirectory(outDir);
        var outPath = System.IO.Path.Combine(outDir, "n6-surfaces.txt");

        var sb = new System.Text.StringBuilder();
        sb.AppendLine($"# acdream surface-format histogram — generated {DateTime.UtcNow:yyyy-MM-ddTHH:mm:ssZ}");
        sb.AppendLine("# Per-entry: surfaceId(hex), width, height, format, byteCount");
        sb.AppendLine();

        // Walk every cached entry across the 6 caches, dedupe by GL name.
        var seen = new HashSet<uint>();
        long totalBytes = 0;
        var bucketsByDim = new Dictionary<(int W, int H), int>();
        var bucketsByFormat = new Dictionary<string, int>();
        var bucketsByTriple = new Dictionary<(int W, int H, string F), int>();

        void Emit(uint surfaceId, uint name)
        {
            if (!seen.Add(name)) return;
            if (!_uploadMetadata.TryGetValue(name, out var meta)) return;
            int bytes = meta.Width * meta.Height * 4;
            totalBytes += bytes;
            sb.AppendLine($"0x{surfaceId:X8}, {meta.Width}, {meta.Height}, {meta.Format}, {bytes}");
            var dimKey = (meta.Width, meta.Height);
            bucketsByDim[dimKey] = bucketsByDim.GetValueOrDefault(dimKey) + 1;
            bucketsByFormat[meta.Format] = bucketsByFormat.GetValueOrDefault(meta.Format) + 1;
            var tripleKey = (meta.Width, meta.Height, meta.Format);
            bucketsByTriple[tripleKey] = bucketsByTriple.GetValueOrDefault(tripleKey) + 1;
        }

        foreach (var kv in _handlesBySurfaceId) Emit(kv.Key, kv.Value);
        foreach (var kv in _handlesByOverridden) Emit(kv.Key.surfaceId, kv.Value);
        foreach (var kv in _handlesByPalette) Emit(kv.Key.surfaceId, kv.Value);
        foreach (var kv in _bindlessBySurfaceId) Emit(kv.Key, kv.Value.Name);
        foreach (var kv in _bindlessByOverridden) Emit(kv.Key.surfaceId, kv.Value.Name);
        foreach (var kv in _bindlessByPalette) Emit(kv.Key.surfaceId, kv.Value.Name);

        sb.AppendLine();
        sb.AppendLine("# Rollups");
        sb.AppendLine($"# Total unique GL textures: {seen.Count}");
        sb.AppendLine($"# Total bytes (sum of W*H*4): {totalBytes}");
        sb.AppendLine("# Top 10 (W,H) dimension buckets:");
        foreach (var kv in bucketsByDim.OrderByDescending(kv => kv.Value).Take(10))
            sb.AppendLine($"# {kv.Key.W}x{kv.Key.H}: {kv.Value}");
        sb.AppendLine("# Format buckets:");
        foreach (var kv in bucketsByFormat.OrderByDescending(kv => kv.Value))
            sb.AppendLine($"# {kv.Key}: {kv.Value}");
        sb.AppendLine("# Top 10 (W,H,format) triples — atlas-opportunity input:");
        foreach (var kv in bucketsByTriple.OrderByDescending(kv => kv.Value).Take(10))
            sb.AppendLine($"# {kv.Key.W}x{kv.Key.H} {kv.Key.F}: {kv.Value}");

        System.IO.File.WriteAllText(outPath, sb.ToString());
        Console.WriteLine($"[N6-DUMP] Surface histogram written to {outPath} ({seen.Count} textures, {totalBytes} bytes)");
    }
```

- [ ] **Step 2.5: Confirm `using System.Linq;` is present in `TextureCache.cs`**

Read the file's `using` section (top of file). If `using System.Linq;` is NOT present, add it. The `OrderByDescending` and `Take` calls in `DumpSurfaceHistogram` need it.

Pattern:

```
pattern: "^using System\.Linq"
path: src/AcDream.App/Rendering/TextureCache.cs
output_mode: count
```

If count is 0, add `using System.Linq;` in alphabetical order with the other usings at the top of the file.

- [ ] **Step 2.6: Add the per-frame call site in `GameWindow.cs`**

Find a stable insertion point near the top of `OnRender` (starts at line 6288). Use Grep:

```
pattern: "_gl!\.Clear\("
path: src/AcDream.App/Rendering/GameWindow.cs
output_mode: content
-n: true
-A: 3
```

This finds the `Clear` call(s) in or near `OnRender`. The first one after line 6288 is where you want to insert. Read 5 lines of context around it, then Edit to insert the dump tick on the line immediately after the `Clear` call returns.

The insertion (one Edit):

**old_string:** (find the `Clear` call in `OnRender` and capture 1-2 lines of its context; the exact text varies, but a common pattern is `_gl!.Clear(ClearBufferMask.ColorBufferBit | ClearBufferMask.DepthBufferBit);` followed by the next line of `OnRender` work).

**new_string:** the same `Clear` call followed by:

```csharp
        // Phase N.6 slice 1: one-shot surface-format histogram dump under
        // ACDREAM_DUMP_SURFACES=1. Zero cost when off.
        _textureCache?.TickSurfaceHistogramDumpIfEnabled();
```

If `OnRender` has multiple `Clear` calls, place the tick after the first one inside the method body. The call must run exactly once per frame, before any rendering work; placing it right after `Clear` accomplishes both.

- [ ] **Step 2.7: Build**

```powershell
dotnet build
```

Expected: build succeeds with no new warnings.
If the compiler reports that `OrderByDescending` (or `Take`) has no definition on the dictionary type (typically error CS1061), Step 2.5 was missed: add `using System.Linq;` and rebuild.

- [ ] **Step 2.8: Run the test suite**

```powershell
dotnet test --no-build
```

Expected: same pass/fail baseline (~1688 passing, ~8 pre-existing failures). No new failures.

- [ ] **Step 2.9: Manual verification (confirm the dump file appears)**

Launch with the dump env var on:

```powershell
$env:ACDREAM_DUMP_SURFACES = "1"
$env:ACDREAM_WB_DIAG = "1"
# Other env vars same as Task 1 Step 1.9
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "task2-verify.log"
```

Wait ~15 seconds after the window appears, then close it. Check the file:

```powershell
Get-Content "$env:LOCALAPPDATA\acdream\n6-surfaces.txt" | Select-Object -First 30
```

Expected: a non-empty file with the header, per-entry rows, and rollup sections. Also confirm one `[N6-DUMP] Surface histogram written to ...` line in `task2-verify.log` (just before window close).

If the file is empty or missing:
- Check the launch log for the `[N6-DUMP]` line.
- If it's not there, `_dumpFrameCounter` didn't reach 600 because the window was closed too early. Re-run and wait longer.
- If it's there but the file lookup fails, the path output in the log should show what was actually written; investigate that path.

**Do not commit yet.** Continue to Task 3.

---

## Task 3: Capture baseline measurements

**Files:**
- Create: `docs/plans/2026-05-11-phase-n6-perf-baseline.md` (final content lands in Task 4; this task just collects the numbers).

This is the manual measurement task. Each step launches the client, runs a specific scenario, and captures the diagnostic output. Save each log separately for the final write-up. Total expected time: ~30-45 min.

Setup once per session:

```powershell
$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call"
$env:ACDREAM_LIVE = "1"
$env:ACDREAM_TEST_HOST = "127.0.0.1"
$env:ACDREAM_TEST_PORT = "9000"
$env:ACDREAM_TEST_USER = "testaccount"
$env:ACDREAM_TEST_PASS = "testpassword"
$env:ACDREAM_WB_DIAG = "1"
```

For each measurement run, set `ACDREAM_STREAM_RADIUS` before launch. Use the `QualityPreset=High` default (no overrides). All runs at Holtburg with `+Acdream` at clear midday (cycle weather with F10 → Clear, time with F7 → Noon).

Per run, after ~30 seconds at the target condition, close the window and grep the log for the last 3 `[WB-DIAG]` lines; those have the steady-state numbers.

- [ ] **Step 3.1: Capture radius=4 standstill**

```powershell
$env:ACDREAM_STREAM_RADIUS = "4"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline-r4-stand.log"
```

In-world: enter world, do not move, hold position for 30 seconds. Close.

```powershell
Select-String -Path baseline-r4-stand.log -Pattern "\[WB-DIAG\]" | Select-Object -Last 3
```

Record from the median of the last 3 lines: `cpu_us`, `gpu_us`, `entSeen`, `entDrawn`, `groups`. Also note the window-title FPS shown during the test.

- [ ] **Step 3.2: Capture radius=4 walking**

```powershell
$env:ACDREAM_STREAM_RADIUS = "4"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline-r4-walk.log"
```

In-world: enter world, Tab to player mode, walk N→E→S→W across one landblock over ~30 seconds. Close. Capture same numbers as 3.1.

- [ ] **Step 3.3: Capture radius=8 standstill**

```powershell
$env:ACDREAM_STREAM_RADIUS = "8"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline-r8-stand.log"
```

Same procedure as 3.1. Wait ~40 seconds before recording (streaming takes longer to settle).
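The "median of the last 3 lines" readout used in each capture step can also be computed mechanically. The sketch below assumes the `gpu_us=Xm/Yp95` token form quoted in Step 1.9; `ParseGpuUs` and `MedianOfLastThree` are hypothetical helper names, and the sample log lines are illustrative rather than real client output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Assumed token layout: gpu_us=<median>m/<p95>p95, as quoted in Step 1.9.
(long Median, long P95)? ParseGpuUs(string line)
{
    var m = Regex.Match(line, @"gpu_us=(\d+)m/(\d+)p95");
    if (!m.Success) return null;
    return (long.Parse(m.Groups[1].Value), long.Parse(m.Groups[2].Value));
}

// Middle value of the gpu_us medians on the last three [WB-DIAG] lines
// (upper-middle if fewer than three lines are available).
long MedianOfLastThree(IEnumerable<string> logLines)
{
    var medians = logLines
        .Where(l => l.Contains("[WB-DIAG]"))
        .Select(l => ParseGpuUs(l))
        .Where(p => p.HasValue)
        .Select(p => p.Value.Median)
        .TakeLast(3)
        .OrderBy(v => v)
        .ToList();
    if (medians.Count == 0)
        throw new InvalidOperationException("no [WB-DIAG] samples found");
    return medians[medians.Count / 2];
}

// Illustrative lines only; a real run would pass
// System.IO.File.ReadLines("baseline-r4-stand.log") instead.
var demoLines = new[]
{
    "[WB-DIAG] gpu_us=120m/300p95",
    "[WB-DIAG] gpu_us=80m/210p95",
    "unrelated log line",
    "[WB-DIAG] gpu_us=150m/320p95",
    "[WB-DIAG] gpu_us=100m/250p95",
};
Console.WriteLine(MedianOfLastThree(demoLines));
```

The same pattern would extend to `cpu_us` if that field uses the same `Xm/Yp95` layout, which this plan does not state explicitly.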
- [ ] **Step 3.4: Capture radius=8 walking**

```powershell
$env:ACDREAM_STREAM_RADIUS = "8"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline-r8-walk.log"
```

Same procedure as 3.2.

- [ ] **Step 3.5: Capture radius=12 standstill**

```powershell
$env:ACDREAM_STREAM_RADIUS = "12"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline-r12-stand.log"
```

Same procedure as 3.1. Wait ~60 seconds before recording. This is the headline measurement; pay attention to whether `gpu_us` p95 is well below 16.6 ms (16,600 µs, the 60 fps frame budget) or pushing it.

- [ ] **Step 3.6: Capture radius=12 walking**

```powershell
$env:ACDREAM_STREAM_RADIUS = "12"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline-r12-walk.log"
```

Same procedure as 3.2 (walking across one landblock, ~30 seconds of motion within the 60s+ window).

- [ ] **Step 3.7: Capture the surface histogram**

```powershell
$env:ACDREAM_STREAM_RADIUS = "12"
$env:ACDREAM_DUMP_SURFACES = "1"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline-surfaces.log"
```

In-world: enter world at Holtburg, do nothing for ~30 seconds (let the dump fire at frame 600). Close.

Copy the file:

```powershell
Copy-Item "$env:LOCALAPPDATA\acdream\n6-surfaces.txt" -Destination "baseline-surfaces.txt"
```

Inspect:

```powershell
Get-Content baseline-surfaces.txt | Select-Object -Last 40
```

Record the rollup section (total textures, total bytes, top 10 dimension buckets, format distribution, top 10 (W,H,format) triples).
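The rollup recorded in Step 3.7 feeds the atlas-opportunity score in the baseline document, defined there as the share of surfaces that fall into the top-3 (W, H, format) triples. A minimal sketch of that arithmetic follows; `AtlasOpportunityPercent` is a hypothetical helper and the counts are made-up illustration values, not measurements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Share (in percent) of all counted textures that land in the topN most
// populated (W, H, format) buckets. Returns 0 for an empty histogram.
double AtlasOpportunityPercent(
    IReadOnlyDictionary<(int W, int H, string F), int> countsByTriple,
    int topN = 3)
{
    int total = countsByTriple.Values.Sum();
    if (total == 0) return 0.0;
    int top = countsByTriple.Values.OrderByDescending(c => c).Take(topN).Sum();
    return 100.0 * top / total;
}

// Made-up counts for illustration only: 100 textures in total, with the
// three largest buckets holding 40 + 30 + 15 = 85 of them.
var exampleCounts = new Dictionary<(int W, int H, string F), int>
{
    [(64, 64, "RGBA8_DECODED")] = 40,
    [(128, 128, "RGBA8_DECODED")] = 30,
    [(256, 256, "RGBA8_DECODED")] = 10,
    [(32, 32, "RGBA8_DECODED")] = 15,
    [(512, 512, "RGBA8_DECODED")] = 5,
};
double score = AtlasOpportunityPercent(exampleCounts);
Console.WriteLine(score);
```

Note that the dump only lists the top 10 triples, so the denominator should come from the "Total unique GL textures" rollup line rather than from summing the listed buckets.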
- [ ] **Step 3.8: Clean up the env vars and the local app data dump**

```powershell
Remove-Item Env:\ACDREAM_DUMP_SURFACES -ErrorAction SilentlyContinue
Remove-Item Env:\ACDREAM_STREAM_RADIUS -ErrorAction SilentlyContinue
# Optional: clean up the source file so a future re-measurement isn't confused by stale data
Remove-Item "$env:LOCALAPPDATA\acdream\n6-surfaces.txt" -ErrorAction SilentlyContinue
```

All log files (`baseline-r*-*.log`, `baseline-surfaces.log`, `baseline-surfaces.txt`) remain in the worktree root for Task 4. They will NOT be committed; they're scratch.

---

## Task 4: Write baseline doc + amend roadmap + ship commit 2

**Files:**
- Create: `docs/plans/2026-05-11-phase-n6-perf-baseline.md`
- Modify: `docs/plans/2026-04-11-roadmap.md` lines 690-705

- [ ] **Step 4.1: Write the baseline document**

Use Write to create `docs/plans/2026-05-11-phase-n6-perf-baseline.md` with this content (substitute real numbers from Task 3 captures into every blank placeholder: the commit SHA, the table cells, and the histogram bullets; do NOT leave any unfilled):

```markdown
# Phase N.6 slice 1 — perf baseline at Holtburg

**Created:** 2026-05-11.
**Spec:** [docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md](../superpowers/specs/2026-05-11-phase-n6-slice1-design.md)
**Measured against commit:** 
**Purpose:** Capture authoritative CPU+GPU dispatch numbers so the next-phase decision (slice 2 vs C.1.5 vs Tier 2) rests on real data.

---

## §1. Setup

- **Hardware:** Radeon RX 9070 XT
- **Resolution:** 1440p (2560×1440)
- **Quality preset:** High (default)
- **Connection:** live ACE at `127.0.0.1:9000`
- **Character:** `+Acdream` at Holtburg
- **Sky / time:** clear midday (F7 → Noon, F10 → Clear)
- **Build:** Debug
- **Date measured:** 2026-05-11
- **Environment overrides:** `ACDREAM_WB_DIAG=1`, `ACDREAM_STREAM_RADIUS=`

## §2. Dispatch CPU / GPU numbers

Each cell records the median of the last 3 `[WB-DIAG]` lines from a ~30s stable window. `entSeen / entDrawn / groups` are also from those lines.
FPS read from the window title.

| Radius | Motion | cpu_us median | cpu_us p95 | gpu_us median | gpu_us p95 | FPS | entSeen | entDrawn | groups |
|---|---|---|---|---|---|---|---|---|---|
| 4 | standstill | | | | | | | | |
| 4 | walking | | | | | | | | |
| 8 | standstill | | | | | | | | |
| 8 | walking | | | | | | | | |
| 12| standstill | | | | | | | | |
| 12| walking | | | | | | | | |

## §3. Surface-format histogram

From `ACDREAM_DUMP_SURFACES=1` at radius=12, ~30s after enter-world.

- **Total unique GL textures:** 
- **Total bytes (sum of W*H*4):** 
- **Top 10 (W, H) dimension buckets:**
  - `x`: 
  - ... (paste from baseline-surfaces.txt rollup)
- **Format distribution:**
  - ``: 
- **Top 10 (W, H, format) triples — atlas-opportunity input:**
  - `x `: 
  - ...

**Atlas-opportunity score:** % of surfaces fall into the top-3 (W, H, format) triples. (A score >30% means atlas consolidation could meaningfully reduce sampler switches + memory overhead; <15% means scattered content and atlas is not worth the slice-2 effort.)

## §4. Conclusion + next-phase recommendation

= 14000 µs: GPU-saturated, persistent-mapped buffers and compute cull help.
3. Does the atlas score justify slice-2 atlas work?
4. Given (1)-(3), which is the right next phase?
   - CPU-bound + low atlas score: pivot to C.1.5 (visible content, perf already comfortable).
   - GPU-bound + high atlas score: do N.6 slice 2 (atlas + persistent buffers).
   - Either-bound + headroom + low atlas score: do C.1.5 first.
   - GPU saturated + need for more headroom: escalate to Tier 2.>

## §5. Raw logs

Scratch logs from this measurement run (not committed):
- `baseline-r4-stand.log`, `baseline-r4-walk.log`
- `baseline-r8-stand.log`, `baseline-r8-walk.log`
- `baseline-r12-stand.log`, `baseline-r12-walk.log`
- `baseline-surfaces.log`, `baseline-surfaces.txt`
```

Fill in every blank placeholder and the conclusion paragraph with the real values from Task 3.
**Do NOT leave any placeholder unfilled.** If a measurement is missing, re-run that step from Task 3 before continuing.

- [ ] **Step 4.2: Read the current roadmap N.6 entry**

```
Read offset 685, limit 25 from docs/plans/2026-04-11-roadmap.md
```

Confirm the bullet starts with `- **N.6 — Perf polish.** **Planned (post-A.5 polish takes priority).**` and ends with `Plan + spec written when work begins. **Estimate: 1-2 weeks.**`. Capture the exact text verbatim for Step 4.3's `old_string`.

- [ ] **Step 4.3: Amend the roadmap entry**

Use Edit. The change splits N.6 into slice 1 (shipping with this commit) and slice 2 (deferred until after C.1.5).

**old_string:** the exact N.6 bullet copied from the Read in Step 4.2.

**new_string:**
```markdown
- **N.6 slice 1 — GPU timing fix + radius=12 perf baseline.** **SHIPPED 2026-05-11.** Fixed the gpu_us double-buffering bug in `WbDrawDispatcher` (ring-of-3 query slots, read-before-overwrite, vendor-neutral across AMD/NVIDIA/Intel desktop GL). Added env-gated surface-format histogram dump in `TextureCache` for atlas-opportunity audit. Captured authoritative baseline at Holtburg radii 4 / 8 / 12 (standstill + walking) with the now-working `gpu_us` diagnostic. Plan + spec at `docs/superpowers/{specs,plans}/2026-05-11-phase-n6-slice1-*.md`. Baseline numbers + next-phase recommendation at [docs/plans/2026-05-11-phase-n6-perf-baseline.md](2026-05-11-phase-n6-perf-baseline.md).
- **N.6 slice 2 — Perf polish cleanup.** **Planned — deferred until after C.1.5 (PES emitter wiring) per the baseline doc's recommendation.** Builds on slice 1's measurement.
  Scope: retire the legacy `Texture2D`/`sampler2D` path in `TextureCache` (currently kept for Sky + Debug + particle paths now that Terrain has migrated); delete orphan `mesh.frag` (verify zero callers post-N.5 amendment); decide bindless-everywhere vs legacy-island for the remaining `sampler2D` consumers; conditionally adopt WB atlas if the slice-1 histogram shows a real opportunity; conditionally adopt persistent-mapped buffers if the slice-1 baseline shows `BufferSubData` as a hot spot; GPU compute culling remains out-of-scope (that's Tier 3 of the perf-tiers roadmap, gated on Tier 2 first). Plan + spec written when work begins. **Estimate: 1-2 weeks once C.1.5 lands.**
```

- [ ] **Step 4.4: Build (sanity check: only docs touched in this task, but be safe)**

```powershell
dotnet build
```

Expected: build succeeds. (No code touched in Task 4; this just confirms nothing was accidentally edited in src/.)

- [ ] **Step 4.5: Commit 2**

```powershell
git add src/AcDream.App/Rendering/TextureCache.cs `
        src/AcDream.App/Rendering/GameWindow.cs `
        docs/plans/2026-05-11-phase-n6-perf-baseline.md `
        docs/plans/2026-04-11-roadmap.md
git commit -m @'
docs(perf): Phase N.6 slice 1 — radius=12 baseline + surface dump path

Capture authoritative CPU+GPU dispatch numbers at Holtburg with the
gpu_us diagnostic now working (commit ). Three radii (4/8/12) × two
motion modes (standstill/walking) + a surface-format histogram from
ACDREAM_DUMP_SURFACES=1.

Adds env-gated one-shot dump path
(TextureCache.TickSurfaceHistogramDumpIfEnabled, called from
GameWindow.OnRender) that fires once at frame 600 of the session —
zero cost when off, writes to %LOCALAPPDATA%\acdream\n6-surfaces.txt.

Baseline document at docs/plans/2026-05-11-phase-n6-perf-baseline.md
closes with a recommendation paragraph for the next phase. Roadmap entry
amended to reflect the slice 1 / slice 2 split.

Spec: docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md (§5, §6).
Co-Authored-By: Claude Opus 4.7 (1M context)
'@
git status
```

Expected: clean working tree, apart from any untracked scratch logs.

- [ ] **Step 4.6: Final sanity sweep**

```powershell
git log -3 --oneline
```

Expected: two new commits from this slice (the GPU timing fix from Task 1.10, then this docs/perf commit), on top of the spec commit `05d590c`.

Also confirm the scratch `baseline-r*.log` and `baseline-surfaces.*` files are still NOT in the commit (they were not staged):

```powershell
git status
```

Expected: nothing modified or staged. If the scratch logs show as untracked, that's fine — they can be deleted manually:

```powershell
Remove-Item baseline-r*.log, baseline-surfaces.log, baseline-surfaces.txt, task1-verify.log, task2-verify.log -ErrorAction SilentlyContinue
```

---

## Acceptance check (spec §9)

After Task 4 commits, walk through the spec's acceptance criteria and confirm each one. This is a paper-walk, not a re-run — the steps above produce the conditions.

- [ ] **A1: `[WB-DIAG]` reports non-zero `gpu_us` at radius=12.** Verified in Task 1.9 (initial check) and Task 3.5-3.6 (full baseline run). Confirm by re-grepping `baseline-r12-stand.log`:

```powershell
Select-String -Path baseline-r12-stand.log -Pattern "gpu_us=[1-9]"
```

Should return at least one line.

- [ ] **A2: Vendor-neutral.** No `GL_*_NV` or `GL_*_AMD` or `GL_*_INTEL` extension references in the change. Re-grep:

```powershell
Select-String -Path src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs -Pattern "NV_|AMD_|INTEL_|GL_NV|GL_AMD|GL_INTEL"
```

Expected: no matches in the new code (matches elsewhere in the file from unrelated existing code don't count).

- [ ] **A3: Baseline doc has real numbers + conclusion.** Open `docs/plans/2026-05-11-phase-n6-perf-baseline.md` and visually confirm no ``, ``, `TBD`, or empty conclusion section.

- [ ] **A4: Roadmap split shipped.**

```powershell
Select-String -Path docs/plans/2026-04-11-roadmap.md -Pattern "N\.6 slice"
```

Expected: two matches (slice 1 + slice 2 bullets).
- [ ] **A5: `dotnet build` green, no new warnings.**

```powershell
dotnet build
```

Expected: succeeds. Note any new warnings vs the build output before the slice started.

- [ ] **A6: `dotnet test` green at baseline (~1688 passing, ~8 pre-existing failures).**

```powershell
dotnet test --no-build
```

Expected: pass count unchanged from before the slice started; failure list unchanged.

- [ ] **A7: No visible regression.** Confirmed during Task 1.9 and Task 3 measurements — the user was in-world repeatedly and didn't observe any rendering issue. If anything looked off during measurement, file it as an issue and decide whether it blocks slice 1 acceptance.

If any acceptance criterion fails, return to the relevant task and re-do it. Do not declare slice 1 complete with failing acceptance.

---

## After slice 1 lands

The baseline document's conclusion paragraph (§4) determines the next phase:

- **If conclusion recommends C.1.5:** brainstorm C.1.5 spec next, using [docs/plans/2026-04-27-phase-c1-pes-particles.md:285-295](../../plans/2026-04-27-phase-c1-pes-particles.md) as the starting scope.
- **If conclusion recommends N.6 slice 2:** brainstorm slice 2 spec next, addressing legacy `TextureCache` cleanup + atlas + persistent-mapped buffers based on the histogram data.
- **If conclusion recommends Tier 2:** consult [docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md](../../plans/2026-05-10-perf-tiers-2-3-roadmap.md) and brainstorm a Tier 2 spec.

The choice is data-driven; the recommendation paragraph is the contract. Don't re-litigate the decision once the numbers are in.