From 083c10c514302631df817d4eac25a3f0e4413469 Mon Sep 17 00:00:00 2001 From: Erik Date: Sat, 9 May 2026 13:03:14 +0200 Subject: [PATCH] docs(N.5b T10): roadmap + ISSUES + CLAUDE.md + perf baseline updates MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Document Phase N.5b shipping (terrain on the modern rendering path via Path C — `TerrainModernRenderer` mirrors WB's `TerrainRenderManager` pattern but consumes acdream's `LandblockMesh.Build` so retail's `FSplitNESW` formula stays in lockstep with physics + visual mesh). Changes: - `docs/plans/2026-04-11-roadmap.md` — add N.5b row to the Shipped table; promote N.5b's "Phases ahead" entry to ✓ SHIPPED with the Path C resolution + perf reality check; refresh N.6 scope to note Terrain has joined the modern path (legacy `Texture2D` retirement scope narrows to Sky + Debug); update top-of-doc Status line. - `docs/ISSUES.md` — close issue #51 (WB terrain-split formula divergence). Move from OPEN to "Recently closed" with the Path C resolution: never adopted WB's formula; modern dispatcher uses retail's via `LandblockMesh.Build`. References `da56063` (the black-terrain fix that landed within the N.5b ship chain). - `CLAUDE.md` — add `TerrainModernRenderer.cs` to the WB integration cribs list with the GL_INVALID_OPERATION caveat (use uvec2 + `sampler2DArray(handle)` constructor, NOT direct `uniform sampler2DArray` + `glProgramUniformHandleARB`). Update the "Currently in flight" preamble: N.6 builds on N.5 + N.5b; add an N.5b shipped paragraph linking the perf baseline doc. - `docs/plans/2026-05-09-phase-n5b-perf-baseline.md` — new doc capturing the radius=5 Holtburg perf measurement (modern 6.4-7.0 µs median vs legacy 1.5 µs — modern is ~4× SLOWER on CPU at radius=5). 
Documents the spec acceptance criterion #5 amendment, the architectural wins that DO hold (zero glBindTexture/frame, constant-cost dispatch as A.5 raises radius, per-LB frustum cull), and the three high-value gotchas surfaced during implementation. User-memory updates (outside repo, not in this commit): - `memory/project_phase_n5b_state.md` — full N.5b state file with the three gotchas captured. - `memory/MEMORY.md` — index entry pointing at the state file. Build: dotnet build green. No code changes in this commit. Co-Authored-By: Claude Opus 4.7 (1M context) --- CLAUDE.md | 38 +++++- docs/ISSUES.md | 109 ++++++++---------- docs/plans/2026-04-11-roadmap.md | 55 ++++++--- .../2026-05-09-phase-n5b-perf-baseline.md | 98 ++++++++++++++++ 4 files changed, 220 insertions(+), 80 deletions(-) create mode 100644 docs/plans/2026-05-09-phase-n5b-perf-baseline.md diff --git a/CLAUDE.md b/CLAUDE.md index ae36f35..8d8de01 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -102,6 +102,14 @@ ourselves". eventually picks it up finds the hook there; the change is localized: extend `InstanceData` stride 64→80 bytes, add the field, mix into fragment color in `mesh_modern.frag`. ~30 min when the time comes. +- `src/AcDream.App/Rendering/TerrainModernRenderer.cs` — terrain dispatcher + on N.5's modern primitives. Mirrors WB's `TerrainRenderManager` pattern + (single global VBO/EBO + slot allocator + `glMultiDrawElementsIndirect`) + but driven by acdream's `LandblockMesh.Build` so retail's `FSplitNESW` + formula is preserved (issue #51 resolved). Atlas handles bound via the + uvec2 + `sampler2DArray(handle)` constructor pattern (NOT the direct + `uniform sampler2DArray` + `glProgramUniformHandleARB` form, which + GL_INVALID_OPERATIONs on at least one driver). **Execution phases:** R1→R8 in the architecture doc. Each phase has clear goals, test criteria, and builds on the previous. Don't skip phases. 
@@ -504,13 +512,33 @@ acdream's plan lives in two files committed to the repo: **Currently in flight: Phase N.6 — Perf polish.** Roadmap entry at [`docs/plans/2026-04-11-roadmap.md`](docs/plans/2026-04-11-roadmap.md). -Builds on N.5. Legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, -`WbFoundationFlag`) were retired in the N.5 ship amendment — N.6 scope is -perf-only: WB atlas adoption, persistent-mapped buffers, GPU-side culling, -GL_TIME_ELAPSED query double-buffering, direct N.4 vs N.5 perf measurement, -legacy `Texture2D`/`sampler2D` TextureCache path retirement (Sky/Terrain/Debug). +Builds on N.5 + N.5b. Legacy renderers (`InstancedMeshRenderer`, +`StaticMeshRenderer`, `WbFoundationFlag`) were retired in the N.5 ship +amendment, and the terrain legacy renderer (`TerrainChunkRenderer` + +`TerrainRenderer` + legacy `terrain.vert/.frag`) was retired in N.5b. +N.6 scope is perf-only: WB atlas adoption, persistent-mapped buffers +(strong candidate after N.5b's per-frame DEIC `BufferSubData`), +GPU-side culling via compute pre-pass, GL_TIME_ELAPSED query +double-buffering, direct higher-radius perf comparison once A.5 lands, +legacy `Texture2D`/`sampler2D` TextureCache path retirement (Sky / Debug +remain on the legacy path now that Terrain has migrated). Plan + spec written when work begins. +**Phase N.5b (Terrain on Modern Rendering Path) shipped 2026-05-09.** +`TerrainModernRenderer` mirrors WB's `TerrainRenderManager` pattern +(single global VBO/EBO + slot allocator + bindless atlas + +`glMultiDrawElementsIndirect`) but consumes `LandblockMesh.Build` so +retail's `FSplitNESW` formula is preserved (Path C; closes ISSUE #51). +Path A (substitute WB's `CalculateSplitDirection`) killed by 49.98% +divergence vs retail in +[`tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs`](tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs). 
+At radius=5 in Holtburg modern is ~4× SLOWER on CPU than the legacy +chunked path was; architectural wins manifest at higher radius. Honest +perf baseline at +[`docs/plans/2026-05-09-phase-n5b-perf-baseline.md`](docs/plans/2026-05-09-phase-n5b-perf-baseline.md). +Plan archived at +[`docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`](docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md). + **Phase N.5 (Modern Rendering Path) shipped + amended 2026-05-08.** `WbDrawDispatcher` on bindless textures + `glMultiDrawElementsIndirect`. CPU dispatcher 1.23ms/frame at Holtburg (~810 fps). **Ship amendment:** `InstancedMeshRenderer`, diff --git a/docs/ISSUES.md b/docs/ISSUES.md index 95dcbc6..39f4723 100644 --- a/docs/ISSUES.md +++ b/docs/ISSUES.md @@ -46,64 +46,6 @@ Copy this block when adding a new issue: # Active issues -## #51 — WB's terrain-split formula diverges from retail's `FSplitNESW` - -**Status:** OPEN -**Severity:** MEDIUM (blocks isolated N.2; affects sequencing of N-phase migration) -**Filed:** 2026-05-08 -**Component:** terrain math / Phase N (WorldBuilder rendering migration) - -**Description:** WB's `TerrainUtils.CalculateSplitDirection` -([references/WorldBuilder/WorldBuilder.Shared/Modules/Landscape/Lib/TerrainUtils.cs:44](references/WorldBuilder/WorldBuilder.Shared/Modules/Landscape/Lib/TerrainUtils.cs:44)) -uses a different math expression from retail's `FSplitNESW` -(documented in CLAUDE.md as **the** real AC terrain split formula, -constants `0x0CCAC033` / `0x421BE3BD` / `0x6C1AC587` / `0x519B8F25`). -Ours is a degree-2 polynomial in (x,y); WB's is linear in (x,y). -They cannot be algebraically equivalent and disagree on a meaningful -fraction of cells. - -**Concrete impact:** On any cell where the formulas pick different -diagonals, the same world position (X, Y) maps to different terrain -heights — up to ~2m for a sloped cell with one elevated corner. 
If a -caller mixes "WB-formula path" and "AC2D-formula path" for the same -cell, the player physics floats above or sinks below the visible -ground. This is the bug class fixed in -[src/AcDream.Core/Physics/TerrainSurface.cs:113-120](src/AcDream.Core/Physics/TerrainSurface.cs:113) -(diagonal-direction inversion). - -**Files implicated:** -- `src/AcDream.Core/Physics/TerrainSurface.cs` — uses AC2D formula via - `IsSplitSWtoNE` -- `src/AcDream.Core/World/TerrainBlending.cs` — visual mesh, also AC2D -- `references/WorldBuilder/WorldBuilder.Shared/Modules/Landscape/Lib/TerrainUtils.cs:44` - — WB's diverging formula -- `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/TerrainGeometryGenerator.cs` - — WB's render mesh (presumably also uses WB's formula in lockstep) - -**Sequencing implication:** Phase N.2 (terrain math helpers -substitution) cannot be shipped in isolation — it must land alongside -visual terrain renderer migration (originally N.5, now moved to N.7 -scope), at which point both physics and visual mesh switch to WB's -formula together. N.5 shipped entity rendering only; terrain remains -on acdream's own pipeline through N.7. - -**Research needed (when N.7 picks this up):** -1. Quantify divergence: run WB's `CalculateSplitDirection` and our - `IsSplitSWtoNE` across all (lbX, lbY, cellX, cellY) tuples for a - representative landblock set; record disagreement rate. -2. Confirm WB's `TerrainGeometryGenerator` uses WB's formula in its - render mesh — if so, switching everything to WB's formula keeps - visual + physics synced. (Highly likely.) -3. Decide whether ANY retail-conformance test (e.g., physics matching - server-authoritative Z within tolerance) is invalidated by the - formula change. - -**Acceptance:** Resolved when N.7 lands and both physics + visual -terrain use WB's split formula, OR when we decide to keep the AC2D -formula and patch WB's renderer in our fork. 
- ---- - ## #50 — Road-edge tree at 0xA9B1 visible in acdream but not retail **Status:** OPEN @@ -1758,6 +1700,57 @@ Unverified. The likely culprits, ranked by suspected probability: # Recently closed +## #51 — [DONE 2026-05-09 · da56063 + N.5b SHIP] WB's terrain-split formula diverges from retail's `FSplitNESW` + +**Closed:** 2026-05-09 +**Commit:** `da56063` (black-terrain fix; landed within Phase N.5b — see +`docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md` for the +ship commit chain) +**Component:** terrain math / Phase N.5b + +**Resolution: Path C.** Phase N.5b lifted terrain rendering onto the +modern path (bindless atlas + `glMultiDrawElementsIndirect`) WITHOUT +adopting WB's `TerrainUtils.CalculateSplitDirection`. The pre-implementation +divergence test (`tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs`) +confirmed the two formulas disagree on **49.98%** of sweep cells — +fundamentally incompatible with our shared physics + visual mesh, which +both rely on retail's `FSplitNESW` (constants `0x0CCAC033` / `0x421BE3BD` / +`0x6C1AC587` / `0x519B8F25`). + +Path C: keep retail's `FSplitNESW` formula via `LandblockMesh.Build` → +`TerrainBlending.CalculateSplitDirection`; mirror WB's `TerrainRenderManager` +architectural pattern (single global VBO/EBO + slot allocator + bindless +atlas + multi-draw indirect) but feed it acdream's mesh. Modern dispatcher +(`TerrainModernRenderer`) replaces `TerrainChunkRenderer` (deleted in T9 +along with `TerrainRenderer` + `terrain.vert/.frag`). + +Path A (substitute WB's formula) was killed by the divergence test. +Path B (fork-patch WB's renderer to use retail's formula) was rejected +for permanent maintenance burden. Path C ships the architectural +pattern while preserving retail-formula compliance. + +Visual mesh and physics both still consume retail's `FSplitNESW`; they +remain in lockstep, no triangle-Z hover. 
The N.6 / N.7 sequencing +implication this issue carried (substitute physics math only when the +visual mesh migrates) is moot — neither side ever switches to WB's +formula. + +**Files added:** +- `src/AcDream.App/Rendering/TerrainModernRenderer.cs` +- `src/AcDream.Core/Terrain/TerrainSlotAllocator.cs` +- `src/AcDream.App/Rendering/Shaders/terrain_modern.vert` +- `src/AcDream.App/Rendering/Shaders/terrain_modern.frag` +- `tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs` (the + test that killed Path A) + +**Files deleted (T9):** +- `src/AcDream.App/Rendering/TerrainChunkRenderer.cs` +- `src/AcDream.App/Rendering/TerrainRenderer.cs` +- `src/AcDream.App/Rendering/Shaders/terrain.vert` +- `src/AcDream.App/Rendering/Shaders/terrain.frag` + +--- + ## #43 — [DONE 2026-05-05 · 9e4772a] Slope staircase on observed player remotes (anim-only fallback ignored slope) **Closed:** 2026-05-05 diff --git a/docs/plans/2026-04-11-roadmap.md b/docs/plans/2026-04-11-roadmap.md index e5cfb5a..c4c33f1 100644 --- a/docs/plans/2026-04-11-roadmap.md +++ b/docs/plans/2026-04-11-roadmap.md @@ -1,6 +1,6 @@ # acdream — strategic roadmap -**Status:** Living document. Updated 2026-05-08 for Phase N.5 shipping (bindless textures + `glMultiDrawElementsIndirect` on top of N.4's foundation; CPU dispatcher 1.23ms/frame at Holtburg, ~810 fps) + N.6 becomes the new in-flight phase (retire legacy renderers + perf polish). +**Status:** Living document. Updated 2026-05-09 for Phase N.5b shipping (terrain on the modern rendering path via Path C — mirror WB's `TerrainRenderManager` pattern, consume `LandblockMesh.Build` for retail formula compliance; closes ISSUE #51). N.6 (perf polish) remains the in-flight phase. **Purpose:** One source of truth for where the project is and where it's going. Every observed defect or missing feature has a named phase that owns it; when something looks wrong in-game, look here to find the phase that'll address it. 
Implementation details live in per-phase specs under `docs/superpowers/specs/`, not in this file. --- @@ -61,6 +61,7 @@ | N.3 | WorldBuilder-backed texture decode — `SurfaceDecoder` delegates INDEX16 / P8 / A8R8G8B8 / R8G8B8 / A8(+Additive) to `TextureHelpers.Fill*`; `isAdditive` threaded through (terrain alpha → `FillA8Additive`, non-additive entity surfaces → `FillA8`). R5G6B5 + A4R4G4B4 newly handled (previously magenta). X8R8G8B8, DXT1/3/5, SolidColor remain ours (no WB equivalent). 9 conformance tests prove byte-identical equivalence per format. | Live ✓ | | N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6 (retired early in N.5 ship amendment). | Live ✓ | | N.5 | Modern rendering path — lifted `WbDrawDispatcher` onto bindless textures (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. 
Per-frame entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch data @ binding=1, indirect commands) + 2 indirect draw calls (opaque + transparent). ~12-15 GL calls per frame regardless of group count, down from hundreds-of-per-group in N.4. CPU dispatcher: 1.23 ms/frame median at Holtburg courtyard (1662 groups, ~810 fps sustained). All textures on the WB modern path use 1-layer `Texture2DArray` + `sampler2DArray`. Legacy callers keep `Texture2D` / `sampler2D` via the parallel `TextureCache` path until N.6 retires them. Three gotchas captured in memory: texture target lock-in, bindless Dispose order (two-phase non-resident before delete), GL_TIME_ELAPSED double-buffering. **Ship amendment 2026-05-08:** legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, `WbFoundationFlag`) retired within N.5 — modern path is mandatory; missing bindless throws `NotSupportedException` at startup. N.6 scope narrowed accordingly. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. | Live ✓ | +| N.5b | Terrain on the modern rendering path — `TerrainModernRenderer` replaces `TerrainChunkRenderer` (the latter plus `TerrainRenderer` + `terrain.vert/.frag` deleted). Single global VBO/EBO with slot allocator (one slot per landblock), per-frame `DrawElementsIndirectCommand[]` upload + `glMultiDrawElementsIndirect`, bindless atlas handles passed as `uvec2` uniforms reconstructed via `sampler2DArray(handle)`. **Path C** chosen: mirrors WB's `TerrainRenderManager` pattern but consumes `LandblockMesh.Build` so retail's `FSplitNESW` formula is preserved (closes ISSUE #51). Path A killed by 49.98% measured divergence between WB's `CalculateSplitDirection` and retail's at addr `00531d10`; Path B (fork-patch WB) rejected for permanent maintenance burden. 
Perf at Holtburg radius=5 (commit `da56063`): modern 6.4-7.0 µs / 9-14 µs p95 vs legacy 1.5 µs / 3.0 µs — **modern is ~4× SLOWER on CPU at radius=5** because legacy's 16×16-LB chunking collapsed visible LBs to one `glDrawElements`. Architectural wins (zero `glBindTexture`/frame, constant-cost dispatch, per-LB frustum cull) manifest at higher radius (A.5 territory). Spec acceptance criterion 5 ("≥10% lower CPU at radius=5") amended via `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`. Three gotchas captured in memory: `uniform sampler2DArray` + `glProgramUniformHandleARB` GL_INVALID_OPERATIONs on at least one driver (use `uniform uvec2` + `sampler2DArray(handle)` constructor instead — N.5's mesh_modern pattern); `MaybeFlushTerrainDiag` median-calc underflow on first sample; visual gates need actual visual confirmation, not assent. Plan archived at `docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`. | Live ✓ | Plus polish that doesn't get its own phase number: - FlyCamera default speed lowered + Shift-to-boost @@ -641,23 +642,43 @@ for our deletions/additions; merge upstream `master` periodically. lock-in, bindless Dispose two-phase order, GL_TIME_ELAPSED double- buffering. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. -- **N.5b — Terrain rendering on N.5 path.** Wire WB's - `TerrainRenderManager` + `LandSurfaceManager` + `TerrainGeometryGenerator` - onto the modern rendering path. Closes N.2's deferred terrain math - substitution: visual mesh and physics both switch to WB's - `CalculateSplitDirection` + `GetHeight` + `GetNormal` in lockstep, - resolving ISSUE #51. **Estimate: 1-2 weeks** (was 2-3 — modern path - primitives already in place from N.5). +- **✓ SHIPPED — N.5b — Terrain on the modern rendering path.** Shipped + 2026-05-09. **Path C** (mirror WB's `TerrainRenderManager` pattern but + consume `LandblockMesh.Build` for retail-formula compliance). 
Path A + (substitute WB's `CalculateSplitDirection`) killed during pre-implementation + divergence test: WB's formula disagrees with retail's `FSplitNESW` + (addr `00531d10`) on **49.98%** of cells across `tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs`'s + sweep — wholly incompatible with our shared physics + visual mesh. + Path B (fork-patch WB to use retail's formula) rejected for permanent + maintenance burden. Path C ships the architectural pattern (single + global VBO/EBO + slot allocator + bindless atlas + `glMultiDrawElementsIndirect`) + while keeping retail's formula via `LandblockMesh.Build` → + `TerrainBlending.CalculateSplitDirection`. `TerrainModernRenderer` + + `terrain_modern.vert/.frag` shipped, `TerrainChunkRenderer` + + `TerrainRenderer` + legacy `terrain.vert/.frag` deleted in T9. + Closes ISSUE #51. **Perf reality check:** at radius=5 in Holtburg, + modern is ~4× SLOWER on CPU than legacy was (6.4 µs vs 1.5 µs median; + legacy collapsed radius=5's visible LBs into one `glDrawElements` + via 16×16-LB chunking). Architectural wins (zero `glBindTexture`/frame, + constant-cost dispatch as A.5 raises radius, per-LB frustum cull) + manifest at higher radius. Spec acceptance criterion #5 was wrong; + amended via `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`. Plan + archived at `docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`. - **N.6 — Perf polish.** **Currently in flight.** - Builds on N.5. Legacy renderer retirement was pulled forward into N.5 - ship amendment — `InstancedMeshRenderer`, `StaticMeshRenderer`, and - `WbFoundationFlag` are already gone. 
N.6 scope: WB atlas adoption for - memory savings on shared content, persistent-mapped buffers if - `glBufferData` shows up in profiling, GPU-side culling via compute - pre-pass, GL_TIME_ELAPSED query double-buffering (deferred from N.5 — - diagnostic shows `gpu_us=0/0` under `ACDREAM_WB_DIAG=1`), direct N.4 - vs N.5 perf measurement, retire the legacy `Texture2D`/`sampler2D` path - in `TextureCache` (currently kept for Sky + Terrain + Debug). + Builds on N.5 + N.5b. Legacy renderer retirement was pulled forward + into N.5 ship amendment — `InstancedMeshRenderer`, `StaticMeshRenderer`, + `WbFoundationFlag` are gone — and the terrain legacy renderer + (`TerrainChunkRenderer` + `TerrainRenderer` + `terrain.vert/.frag`) + retired in N.5b. N.6 scope: WB atlas adoption for memory savings + on shared content, persistent-mapped buffers if `glBufferData` shows + up in profiling (the modern terrain path's per-frame DEIC `BufferSubData` + is a candidate), GPU-side culling via compute pre-pass (eliminates + the per-frame slot walk + DEIC build entirely), GL_TIME_ELAPSED query + double-buffering (deferred from N.5 — diagnostic shows `gpu_us=0/0` + under `ACDREAM_WB_DIAG=1`), direct higher-radius perf comparison once + A.5 lands (where modern's architectural wins manifest), retire the + legacy `Texture2D`/`sampler2D` path in `TextureCache` (currently kept + for Sky + Debug + particle paths now that Terrain has migrated). Plan + spec written when work begins. **Estimate: 1-2 weeks.** - **N.7 — EnvCells / dungeons.** Replace EnvCell rendering with WB's `EnvCellRenderManager` + `PortalRenderManager` on top of N.4's diff --git a/docs/plans/2026-05-09-phase-n5b-perf-baseline.md b/docs/plans/2026-05-09-phase-n5b-perf-baseline.md new file mode 100644 index 0000000..c5f9136 --- /dev/null +++ b/docs/plans/2026-05-09-phase-n5b-perf-baseline.md @@ -0,0 +1,98 @@ +# Phase N.5b — terrain perf baseline + +**Captured:** 2026-05-09 at Holtburg town dueling field, radius=5, ~30s standstill. 
+
+## Methodology
+
+Same build (commit at perf measurement: `da56063`), `ACDREAM_WB_DIAG=1`. The build
+included a TEMPORARY `ACDREAM_LEGACY_TERRAIN=1` env-var toggle (since retired in the T9
+deletion of the legacy renderer) that routed Draw through the legacy renderer for
+direct comparison. Both renderers were constructed and fed AddLandblock / RemoveLandblock
+in parallel; only one drew per frame; the same Stopwatch wrapped whichever ran.
+
+## Numbers
+
+| Renderer | cpu_us median | cpu_us p95 | draws/frame | Visible LBs |
+|---|---|---|---|---|
+| **Legacy** (`TerrainChunkRenderer`) | 1.5 | 3.0 | 1 (1 chunk) | 132-143 (whole chunk) |
+| **Modern** (`TerrainModernRenderer`) | 6.4-7.0 | 9-14 | ~36-51 | 36-51 (per-LB cull) |
+
+(Legacy `draws=1` because its 16×16-LB chunking collapses radius=5's 121 visible
+landblocks into a single chunk, dispatched as one `glDrawElements`. Modern issues
+one `glMultiDrawElementsIndirect` with N=36-51 sub-commands.)
+
+## Acceptance criterion
+
+The N.5b spec acceptance criterion 5 read: "CPU dispatcher time at radius=5 ≥10%
+lower than today's per-LB-binds path." The captured numbers show modern is ~4×
+HIGHER on CPU at radius=5. **The criterion was wrong**: at radius=5 in Holtburg,
+legacy's chunked path was already collapsed to one draw call. The architectural
+wins of multi-draw indirect manifest at higher chunk counts (A.5 territory).
+
+The spec is amended via this doc: ship N.5b on visual identity + structural
+correctness rather than CPU savings at radius=5.
+
+## Architectural wins of the modern path (real, even when CPU is higher)
+
+1. **Zero `glBindTexture` per frame.** Bindless atlas handles are made resident
+   once at startup; the modern shader samples via `sampler2DArray(uvec2 handle)`.
+   Legacy issued 2 `glBindTexture(Texture2DArray)` calls per frame.
+
+2. **Constant-cost dispatch.** As A.5 raises the streaming radius (next phase),
+   the visible chunk count grows. Legacy scales linearly: at radius=10 (4× chunks)
+   it's 4 `glDrawElements` calls; at radius=15 (≥9 chunks) it's 9+ calls. Modern
+   stays at exactly 1 `glMultiDrawElementsIndirect` regardless.
+
+3. **Per-LB frustum culling.** Legacy culled at chunk granularity (16×16 LBs);
+   modern culls per-LB. At a typical Holtburg view, ~36-51 of 132 loaded LBs are
+   actually visible; legacy drew the entire 132-LB chunk (3.5× the visible work
+   pushed to GPU vertex/fragment stages, even though CPU dispatch was cheap).
+
+## Why modern's CPU was higher at radius=5
+
+Per-frame work in modern (on a budget of a few microseconds on this scene):
+- Walk all loaded slots checking visibility (~120 slots) → AABB test each
+- Build DEIC (`DrawElementsIndirectCommand`) array (51 entries × 20 bytes = 1020 bytes)
+- `glBufferSubData(DRAW_INDIRECT_BUFFER, ...)`: driver memcpy
+- 2× `glProgramUniform2(..., handle.low, handle.high)` for atlas handles
+- `glBindVertexArray` + `glMemoryBarrier(GL_COMMAND_BARRIER_BIT)` + `glMultiDrawElementsIndirect`
+
+Legacy's per-frame work:
+- Bind 2 textures
+- Bind one VAO (the chunk)
+- One `glDrawElements`
+
+The DEIC array build + buffer upload alone is ~3-5µs at radius=5 on this hardware,
+which is the bulk of the modern overhead. At higher radius, this overhead amortizes:
+the buffer stays a similar size, but the alternative (legacy's N draws) grows.
+
+## Follow-up work
+
+- **A.5 (next phase)** will exercise the higher-radius case where modern wins.
+  Capture a fresh baseline at radius=8 / 10 once A.5 lands.
+- **N.6 perf polish** can investigate persistent-mapped buffers for the indirect
+  buffer, which would eliminate the per-frame `glBufferSubData`. Likely small win
+  at radius=5 (single ~1KB upload), bigger at higher radii.
+- **GPU-side culling** (compute shader generating the DEIC array directly into
+  the indirect buffer) eliminates the CPU slot walk + DEIC build entirely. N.6 or
+  later territory; only worth it if profiling shows the CPU walk is hot.
+
+## Lessons captured to memory
+
+`memory/project_phase_n5b_state.md` records the high-value gotchas surfaced
+during N.5b implementation. Three that bite particularly hard:
+
+1. **`uniform sampler2DArray` + `glProgramUniformHandleARB` is unreliable.** Some
+   drivers (NVIDIA Windows in this case) reject the combination with
+   `GL_INVALID_OPERATION`. Use the `uniform uvec2` + `sampler2DArray(handle)`
+   constructor pattern instead; N.5's mesh_modern uses this, and N.5b's
+   terrain_modern adopted it after the black-terrain regression.
+
+2. **`MaybeFlushTerrainDiag` underflow.** A naive median calc (`copy[N - nz/2]`)
+   underflows to `copy[N]` when only one sample has been recorded, since integer
+   `nz/2` is 0. Use `copy[N - 1 - (nz - 1) / 2]` instead.
+
+3. **Visual gate must actually be visually confirmed.** "Go" doesn't mean
+   "verified." During N.5b's gate the user said "go" without launching, which
+   masked the black-terrain regression for hours. The gate must include the
+   user reporting actual visual confirmation, not assent to proceed.
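Reviewer note on the `MaybeFlushTerrainDiag` gotcha above: the index arithmetic can be checked in isolation. The sketch below is illustrative Python, not the project's C# code. It assumes the diagnostic keeps a fixed window of `N` slots in which unused slots hold 0, and that `copy` is an ascending sort of that window, so the `nz` recorded samples sit in the last `nz` positions. The function names are hypothetical.

```python
from typing import Sequence


def naive_median_index(n: int, nz: int) -> int:
    """Index chosen by the naive formula copy[N - nz/2] (integer division)."""
    return n - nz // 2


def fixed_median_index(n: int, nz: int) -> int:
    """Index chosen by the corrected formula copy[N - 1 - (nz - 1) / 2]."""
    return n - 1 - (nz - 1) // 2


def median_of_nonzero(samples: Sequence[float]) -> float:
    """Median of the recorded (non-zero) samples in a fixed-size window.

    Assumption: unused slots hold 0, so after an ascending sort the nz
    recorded samples occupy the last nz positions of the sorted copy.
    """
    copy = sorted(samples)
    n = len(copy)
    nz = sum(1 for s in copy if s > 0)
    if nz == 0:
        return 0.0  # nothing recorded yet
    return copy[fixed_median_index(n, nz)]


if __name__ == "__main__":
    window = [0.0] * 8
    window[0] = 6.4  # first sample recorded
    print(naive_median_index(len(window), 1))  # 8: outside the valid range 0..7
    print(fixed_median_index(len(window), 1))  # 7: the only recorded sample
    print(median_of_nonzero(window))           # 6.4
```

With a single sample the naive formula yields index `N`, while the corrected one stays inside `0..N-1` for every `nz` from 1 to `N`.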