docs(N.5b T10): roadmap + ISSUES + CLAUDE.md + perf baseline updates

Document Phase N.5b shipping (terrain on the modern rendering path via
Path C — `TerrainModernRenderer` mirrors WB's `TerrainRenderManager`
pattern but consumes acdream's `LandblockMesh.Build` so retail's
`FSplitNESW` formula stays in lockstep with physics + visual mesh).

Changes:

- `docs/plans/2026-04-11-roadmap.md` — add N.5b row to the Shipped
  table; promote N.5b's "Phases ahead" entry to ✓ SHIPPED with the
  Path C resolution + perf reality check; refresh N.6 scope to note
  Terrain has joined the modern path (legacy `Texture2D` retirement
  scope narrows to Sky + Debug); update top-of-doc Status line.

- `docs/ISSUES.md` — close issue #51 (WB terrain-split formula
  divergence). Move from OPEN to "Recently closed" with the Path C
  resolution: never adopted WB's formula; modern dispatcher uses
  retail's via `LandblockMesh.Build`. References `da56063` (the
  black-terrain fix that landed within the N.5b ship chain).

- `CLAUDE.md` — add `TerrainModernRenderer.cs` to the WB integration
  cribs list with the GL_INVALID_OPERATION caveat (use uvec2 +
  `sampler2DArray(handle)` constructor, NOT direct
  `uniform sampler2DArray` + `glProgramUniformHandleARB`). Update
  the "Currently in flight" preamble: N.6 builds on N.5 + N.5b;
  add an N.5b shipped paragraph linking the perf baseline doc.

- `docs/plans/2026-05-09-phase-n5b-perf-baseline.md` — new doc
  capturing the radius=5 Holtburg perf measurement (modern 6.4-7.0
  µs median vs legacy 1.5 µs — modern is ~4× SLOWER on CPU at
  radius=5). Documents the spec acceptance criterion #5 amendment,
  the architectural wins that DO hold (zero glBindTexture/frame,
  constant-cost dispatch as A.5 raises radius, per-LB frustum cull),
  and the three high-value gotchas surfaced during implementation.

User-memory updates (outside repo, not in this commit):
- `memory/project_phase_n5b_state.md` — full N.5b state file with
  the three gotchas captured.
- `memory/MEMORY.md` — index entry pointing at the state file.

Build: dotnet build green. No code changes in this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Erik 2026-05-09 13:03:14 +02:00
parent 7dfa2af6c0
commit 083c10c514
4 changed files with 220 additions and 80 deletions


@ -102,6 +102,14 @@ ourselves".
eventually picks it up finds the hook there; the change is localized:
extend `InstanceData` stride 64→80 bytes, add the field, mix into
fragment color in `mesh_modern.frag`. ~30 min when the time comes.
- `src/AcDream.App/Rendering/TerrainModernRenderer.cs` — terrain dispatcher
on N.5's modern primitives. Mirrors WB's `TerrainRenderManager` pattern
(single global VBO/EBO + slot allocator + `glMultiDrawElementsIndirect`)
but driven by acdream's `LandblockMesh.Build` so retail's `FSplitNESW`
formula is preserved (issue #51 resolved). Atlas handles bound via the
uvec2 + `sampler2DArray(handle)` constructor pattern (NOT the direct
`uniform sampler2DArray` + `glProgramUniformHandleARB` form, which
GL_INVALID_OPERATIONs on at least one driver).
**Execution phases:** R1→R8 in the architecture doc. Each phase has clear
goals, test criteria, and builds on the previous. Don't skip phases.
@ -504,13 +512,33 @@ acdream's plan lives in two files committed to the repo:
**Currently in flight: Phase N.6 — Perf polish.**
Roadmap entry at [`docs/plans/2026-04-11-roadmap.md`](docs/plans/2026-04-11-roadmap.md).
-Builds on N.5. Legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`,
-`WbFoundationFlag`) were retired in the N.5 ship amendment — N.6 scope is
-perf-only: WB atlas adoption, persistent-mapped buffers, GPU-side culling,
-GL_TIME_ELAPSED query double-buffering, direct N.4 vs N.5 perf measurement,
-legacy `Texture2D`/`sampler2D` TextureCache path retirement (Sky/Terrain/Debug).
+Builds on N.5 + N.5b. Legacy renderers (`InstancedMeshRenderer`,
+`StaticMeshRenderer`, `WbFoundationFlag`) were retired in the N.5 ship
+amendment, and the terrain legacy renderer (`TerrainChunkRenderer` +
+`TerrainRenderer` + legacy `terrain.vert/.frag`) was retired in N.5b.
+N.6 scope is perf-only: WB atlas adoption, persistent-mapped buffers
+(strong candidate after N.5b's per-frame DEIC `BufferSubData`),
+GPU-side culling via compute pre-pass, GL_TIME_ELAPSED query
+double-buffering, direct higher-radius perf comparison once A.5 lands,
+legacy `Texture2D`/`sampler2D` TextureCache path retirement (Sky / Debug
+remain on the legacy path now that Terrain has migrated).
Plan + spec written when work begins.
**Phase N.5b (Terrain on Modern Rendering Path) shipped 2026-05-09.**
`TerrainModernRenderer` mirrors WB's `TerrainRenderManager` pattern
(single global VBO/EBO + slot allocator + bindless atlas +
`glMultiDrawElementsIndirect`) but consumes `LandblockMesh.Build` so
retail's `FSplitNESW` formula is preserved (Path C; closes ISSUE #51).
Path A (substitute WB's `CalculateSplitDirection`) killed by 49.98%
divergence vs retail in
[`tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs`](tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs).
At radius=5 in Holtburg modern is ~4× SLOWER on CPU than the legacy
chunked path was; architectural wins manifest at higher radius. Honest
perf baseline at
[`docs/plans/2026-05-09-phase-n5b-perf-baseline.md`](docs/plans/2026-05-09-phase-n5b-perf-baseline.md).
Plan archived at
[`docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`](docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md).
**Phase N.5 (Modern Rendering Path) shipped + amended 2026-05-08.** `WbDrawDispatcher`
on bindless textures + `glMultiDrawElementsIndirect`. CPU dispatcher 1.23ms/frame
at Holtburg (~810 fps). **Ship amendment:** `InstancedMeshRenderer`,


@ -46,64 +46,6 @@ Copy this block when adding a new issue:
# Active issues
## #51 — WB's terrain-split formula diverges from retail's `FSplitNESW`
**Status:** OPEN
**Severity:** MEDIUM (blocks isolated N.2; affects sequencing of N-phase migration)
**Filed:** 2026-05-08
**Component:** terrain math / Phase N (WorldBuilder rendering migration)
**Description:** WB's `TerrainUtils.CalculateSplitDirection`
([references/WorldBuilder/WorldBuilder.Shared/Modules/Landscape/Lib/TerrainUtils.cs:44](references/WorldBuilder/WorldBuilder.Shared/Modules/Landscape/Lib/TerrainUtils.cs:44))
uses a different math expression from retail's `FSplitNESW`
(documented in CLAUDE.md as **the** real AC terrain split formula,
constants `0x0CCAC033` / `0x421BE3BD` / `0x6C1AC587` / `0x519B8F25`).
Ours is a degree-2 polynomial in (x,y); WB's is linear in (x,y).
They cannot be algebraically equivalent and disagree on a meaningful
fraction of cells.
**Concrete impact:** On any cell where the formulas pick different
diagonals, the same world position (X, Y) maps to different terrain
heights — up to ~2m for a sloped cell with one elevated corner. If a
caller mixes "WB-formula path" and "AC2D-formula path" for the same
cell, the player physics floats above or sinks below the visible
ground. This is the bug class fixed in
[src/AcDream.Core/Physics/TerrainSurface.cs:113-120](src/AcDream.Core/Physics/TerrainSurface.cs:113)
(diagonal-direction inversion).
**Files implicated:**
- `src/AcDream.Core/Physics/TerrainSurface.cs` — uses AC2D formula via
`IsSplitSWtoNE`
- `src/AcDream.Core/World/TerrainBlending.cs` — visual mesh, also AC2D
- `references/WorldBuilder/WorldBuilder.Shared/Modules/Landscape/Lib/TerrainUtils.cs:44`
— WB's diverging formula
- `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/TerrainGeometryGenerator.cs`
— WB's render mesh (presumably also uses WB's formula in lockstep)
**Sequencing implication:** Phase N.2 (terrain math helpers
substitution) cannot be shipped in isolation — it must land alongside
visual terrain renderer migration (originally N.5, now moved to N.7
scope), at which point both physics and visual mesh switch to WB's
formula together. N.5 shipped entity rendering only; terrain remains
on acdream's own pipeline through N.7.
**Research needed (when N.7 picks this up):**
1. Quantify divergence: run WB's `CalculateSplitDirection` and our
`IsSplitSWtoNE` across all (lbX, lbY, cellX, cellY) tuples for a
representative landblock set; record disagreement rate.
2. Confirm WB's `TerrainGeometryGenerator` uses WB's formula in its
render mesh — if so, switching everything to WB's formula keeps
visual + physics synced. (Highly likely.)
3. Decide whether ANY retail-conformance test (e.g., physics matching
server-authoritative Z within tolerance) is invalidated by the
formula change.
**Acceptance:** Resolved when N.7 lands and both physics + visual
terrain use WB's split formula, OR when we decide to keep the AC2D
formula and patch WB's renderer in our fork.
---
## #50 — Road-edge tree at 0xA9B1 visible in acdream but not retail
**Status:** OPEN
@ -1758,6 +1700,57 @@ Unverified. The likely culprits, ranked by suspected probability:
# Recently closed
## #51 — [DONE 2026-05-09 · da56063 + N.5b SHIP] WB's terrain-split formula diverges from retail's `FSplitNESW`
**Closed:** 2026-05-09
**Commit:** `da56063` (black-terrain fix; landed within Phase N.5b — see
`docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md` for the
ship commit chain)
**Component:** terrain math / Phase N.5b
**Resolution: Path C.** Phase N.5b lifted terrain rendering onto the
modern path (bindless atlas + `glMultiDrawElementsIndirect`) WITHOUT
adopting WB's `TerrainUtils.CalculateSplitDirection`. The pre-implementation
divergence test (`tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs`)
confirmed the two formulas disagree on **49.98%** of sweep cells —
fundamentally incompatible with our shared physics + visual mesh, which
both rely on retail's `FSplitNESW` (constants `0x0CCAC033` / `0x421BE3BD` /
`0x6C1AC587` / `0x519B8F25`).
Path C: keep retail's `FSplitNESW` formula via `LandblockMesh.Build`
`TerrainBlending.CalculateSplitDirection`; mirror WB's `TerrainRenderManager`
architectural pattern (single global VBO/EBO + slot allocator + bindless
atlas + multi-draw indirect) but feed it acdream's mesh. Modern dispatcher
(`TerrainModernRenderer`) replaces `TerrainChunkRenderer` (deleted in T9
along with `TerrainRenderer` + `terrain.vert/.frag`).
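The "single global VBO/EBO + slot allocator" half of the pattern can be illustrated with a minimal sketch. This is not the contents of `TerrainSlotAllocator.cs`: the class name, method names, and the equal-sized-slot assumption below are hypothetical, and Python is used only to keep the sketch short.

```python
from typing import Dict, List


class SlotAllocator:
    """Hypothetical fixed-capacity allocator: one equal-sized slot per landblock."""

    def __init__(self, capacity: int) -> None:
        # Reverse order so pop() hands out the lowest free slot first.
        self._free: List[int] = list(range(capacity - 1, -1, -1))
        self._slot_of: Dict[int, int] = {}

    def allocate(self, landblock_id: int) -> int:
        """Return the slot for a landblock, assigning a free one if needed."""
        if landblock_id in self._slot_of:
            return self._slot_of[landblock_id]
        if not self._free:
            raise MemoryError("no free terrain slots")
        slot = self._free.pop()
        self._slot_of[landblock_id] = slot
        return slot

    def release(self, landblock_id: int) -> None:
        """Return a landblock's slot to the free list for reuse."""
        self._free.append(self._slot_of.pop(landblock_id))

    def byte_offset(self, landblock_id: int, slot_bytes: int) -> int:
        """Offset of the landblock's region inside the shared buffer."""
        return self._slot_of[landblock_id] * slot_bytes
```

Because every slot has the same size in this sketch, unloading a landblock never fragments the shared buffer: the freed slot is simply reused by the next landblock that streams in.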
Path A (substitute WB's formula) was killed by the divergence test.
Path B (fork-patch WB's renderer to use retail's formula) was rejected
for permanent maintenance burden. Path C ships the architectural
pattern while preserving retail-formula compliance.
Visual mesh and physics both still consume retail's `FSplitNESW`; they
remain in lockstep, no triangle-Z hover. The N.6 / N.7 sequencing
implication this issue carried (substitute physics math only when the
visual mesh migrates) is moot — neither side ever switches to WB's
formula.
**Files added:**
- `src/AcDream.App/Rendering/TerrainModernRenderer.cs`
- `src/AcDream.Core/Terrain/TerrainSlotAllocator.cs`
- `src/AcDream.App/Rendering/Shaders/terrain_modern.vert`
- `src/AcDream.App/Rendering/Shaders/terrain_modern.frag`
- `tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs` (the
test that killed Path A)
**Files deleted (T9):**
- `src/AcDream.App/Rendering/TerrainChunkRenderer.cs`
- `src/AcDream.App/Rendering/TerrainRenderer.cs`
- `src/AcDream.App/Rendering/Shaders/terrain.vert`
- `src/AcDream.App/Rendering/Shaders/terrain.frag`
---
## #43 — [DONE 2026-05-05 · 9e4772a] Slope staircase on observed player remotes (anim-only fallback ignored slope)
**Closed:** 2026-05-05


@ -1,6 +1,6 @@
# acdream — strategic roadmap
-**Status:** Living document. Updated 2026-05-08 for Phase N.5 shipping (bindless textures + `glMultiDrawElementsIndirect` on top of N.4's foundation; CPU dispatcher 1.23ms/frame at Holtburg, ~810 fps) + N.6 becomes the new in-flight phase (retire legacy renderers + perf polish).
+**Status:** Living document. Updated 2026-05-09 for Phase N.5b shipping (terrain on the modern rendering path via Path C — mirror WB's `TerrainRenderManager` pattern, consume `LandblockMesh.Build` for retail formula compliance; closes ISSUE #51). N.6 (perf polish) remains the in-flight phase.
**Purpose:** One source of truth for where the project is and where it's going. Every observed defect or missing feature has a named phase that owns it; when something looks wrong in-game, look here to find the phase that'll address it. Implementation details live in per-phase specs under `docs/superpowers/specs/`, not in this file.
---
@ -61,6 +61,7 @@
| N.3 | WorldBuilder-backed texture decode — `SurfaceDecoder` delegates INDEX16 / P8 / A8R8G8B8 / R8G8B8 / A8(+Additive) to `TextureHelpers.Fill*`; `isAdditive` threaded through (terrain alpha → `FillA8Additive`, non-additive entity surfaces → `FillA8`). R5G6B5 + A4R4G4B4 newly handled (previously magenta). X8R8G8B8, DXT1/3/5, SolidColor remain ours (no WB equivalent). 9 conformance tests prove byte-identical equivalence per format. | Live ✓ |
| N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6 (retired early in N.5 ship amendment). | Live ✓ |
| N.5 | Modern rendering path — lifted `WbDrawDispatcher` onto bindless textures (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch data @ binding=1, indirect commands) + 2 indirect draw calls (opaque + transparent). ~12-15 GL calls per frame regardless of group count, down from hundreds-of-per-group in N.4. CPU dispatcher: 1.23 ms/frame median at Holtburg courtyard (1662 groups, ~810 fps sustained). All textures on the WB modern path use 1-layer `Texture2DArray` + `sampler2DArray`. Legacy callers keep `Texture2D` / `sampler2D` via the parallel `TextureCache` path until N.6 retires them. Three gotchas captured in memory: texture target lock-in, bindless Dispose order (two-phase non-resident before delete), GL_TIME_ELAPSED double-buffering. **Ship amendment 2026-05-08:** legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, `WbFoundationFlag`) retired within N.5 — modern path is mandatory; missing bindless throws `NotSupportedException` at startup. N.6 scope narrowed accordingly. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. | Live ✓ |
| N.5b | Terrain on the modern rendering path — `TerrainModernRenderer` replaces `TerrainChunkRenderer` (the latter plus `TerrainRenderer` + `terrain.vert/.frag` deleted). Single global VBO/EBO with slot allocator (one slot per landblock), per-frame `DrawElementsIndirectCommand[]` upload + `glMultiDrawElementsIndirect`, bindless atlas handles passed as `uvec2` uniforms reconstructed via `sampler2DArray(handle)`. **Path C** chosen: mirrors WB's `TerrainRenderManager` pattern but consumes `LandblockMesh.Build` so retail's `FSplitNESW` formula is preserved (closes ISSUE #51). Path A killed by 49.98% measured divergence between WB's `CalculateSplitDirection` and retail's at addr `00531d10`; Path B (fork-patch WB) rejected for permanent maintenance burden. Perf at Holtburg radius=5 (commit `da56063`): modern 6.4-7.0 µs / 9-14 µs p95 vs legacy 1.5 µs / 3.0 µs — **modern is ~4× SLOWER on CPU at radius=5** because legacy's 16×16-LB chunking collapsed visible LBs to one `glDrawElements`. Architectural wins (zero `glBindTexture`/frame, constant-cost dispatch, per-LB frustum cull) manifest at higher radius (A.5 territory). Spec acceptance criterion 5 ("≥10% lower CPU at radius=5") amended via `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`. Three gotchas captured in memory: `uniform sampler2DArray` + `glProgramUniformHandleARB` GL_INVALID_OPERATIONs on at least one driver (use `uniform uvec2` + `sampler2DArray(handle)` constructor instead — N.5's mesh_modern pattern); `MaybeFlushTerrainDiag` median-calc underflow on first sample; visual gates need actual visual confirmation, not assent. Plan archived at `docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`. | Live ✓ |
Plus polish that doesn't get its own phase number:
- FlyCamera default speed lowered + Shift-to-boost
@ -641,23 +642,43 @@ for our deletions/additions; merge upstream `master` periodically.
lock-in, bindless Dispose two-phase order, GL_TIME_ELAPSED double-
buffering. Plan archived at
`docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`.
-- **N.5b — Terrain rendering on N.5 path.** Wire WB's
-`TerrainRenderManager` + `LandSurfaceManager` + `TerrainGeometryGenerator`
-onto the modern rendering path. Closes N.2's deferred terrain math
-substitution: visual mesh and physics both switch to WB's
-`CalculateSplitDirection` + `GetHeight` + `GetNormal` in lockstep,
-resolving ISSUE #51. **Estimate: 1-2 weeks** (was 2-3 — modern path
-primitives already in place from N.5).
+- **✓ SHIPPED — N.5b — Terrain on the modern rendering path.** Shipped
+2026-05-09. **Path C** (mirror WB's `TerrainRenderManager` pattern but
+consume `LandblockMesh.Build` for retail-formula compliance). Path A
+(substitute WB's `CalculateSplitDirection`) killed during pre-implementation
+divergence test: WB's formula disagrees with retail's `FSplitNESW`
+(addr `00531d10`) on **49.98%** of cells across `tests/AcDream.Core.Tests/Terrain/SplitFormulaDivergenceTest.cs`'s
+sweep — wholly incompatible with our shared physics + visual mesh.
+Path B (fork-patch WB to use retail's formula) rejected for permanent
+maintenance burden. Path C ships the architectural pattern (single
+global VBO/EBO + slot allocator + bindless atlas + `glMultiDrawElementsIndirect`)
+while keeping retail's formula via `LandblockMesh.Build`
+`TerrainBlending.CalculateSplitDirection`. `TerrainModernRenderer` +
+`terrain_modern.vert/.frag` shipped, `TerrainChunkRenderer` +
+`TerrainRenderer` + legacy `terrain.vert/.frag` deleted in T9.
+Closes ISSUE #51. **Perf reality check:** at radius=5 in Holtburg,
+modern is ~4× SLOWER on CPU than legacy was (6.4 µs vs 1.5 µs median;
+legacy collapsed radius=5's visible LBs into one `glDrawElements`
+via 16×16-LB chunking). Architectural wins (zero `glBindTexture`/frame,
+constant-cost dispatch as A.5 raises radius, per-LB frustum cull)
+manifest at higher radius. Spec acceptance criterion #5 was wrong;
+amended via `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`. Plan
+archived at `docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`.
- **N.6 — Perf polish.** **Currently in flight.** - **N.6 — Perf polish.** **Currently in flight.**
-Builds on N.5. Legacy renderer retirement was pulled forward into N.5
-ship amendment — `InstancedMeshRenderer`, `StaticMeshRenderer`, and
-`WbFoundationFlag` are already gone. N.6 scope: WB atlas adoption for
-memory savings on shared content, persistent-mapped buffers if
-`glBufferData` shows up in profiling, GPU-side culling via compute
-pre-pass, GL_TIME_ELAPSED query double-buffering (deferred from N.5 —
-diagnostic shows `gpu_us=0/0` under `ACDREAM_WB_DIAG=1`), direct N.4
-vs N.5 perf measurement, retire the legacy `Texture2D`/`sampler2D` path
-in `TextureCache` (currently kept for Sky + Terrain + Debug).
+Builds on N.5 + N.5b. Legacy renderer retirement was pulled forward
+into N.5 ship amendment — `InstancedMeshRenderer`, `StaticMeshRenderer`,
+`WbFoundationFlag` are gone — and the terrain legacy renderer
+(`TerrainChunkRenderer` + `TerrainRenderer` + `terrain.vert/.frag`)
+retired in N.5b. N.6 scope: WB atlas adoption for memory savings
+on shared content, persistent-mapped buffers if `glBufferData` shows
+up in profiling (the modern terrain path's per-frame DEIC `BufferSubData`
+is a candidate), GPU-side culling via compute pre-pass (eliminates
+the per-frame slot walk + DEIC build entirely), GL_TIME_ELAPSED query
+double-buffering (deferred from N.5 — diagnostic shows `gpu_us=0/0`
+under `ACDREAM_WB_DIAG=1`), direct higher-radius perf comparison once
+A.5 lands (where modern's architectural wins manifest), retire the
+legacy `Texture2D`/`sampler2D` path in `TextureCache` (currently kept
+for Sky + Debug + particle paths now that Terrain has migrated).
Plan + spec written when work begins. **Estimate: 1-2 weeks.**
- **N.7 — EnvCells / dungeons.** Replace EnvCell rendering with WB's
`EnvCellRenderManager` + `PortalRenderManager` on top of N.4's


@ -0,0 +1,98 @@
# Phase N.5b — terrain perf baseline
**Captured:** 2026-05-09 at Holtburg town dueling field, radius=5, ~30s standstill.
## Methodology
Same build (commit at perf measurement: `da56063`), `ACDREAM_WB_DIAG=1`. The build
included a TEMPORARY `ACDREAM_LEGACY_TERRAIN=1` env-var toggle (since retired in T9
deletion of the legacy renderer) that routed Draw through the legacy renderer for
direct comparison. Both renderers were constructed and fed AddLandblock / RemoveLandblock
in parallel; only one drew per frame; the same Stopwatch wrapped whichever ran.
## Numbers
| Renderer | cpu_us median | cpu_us p95 | draws/frame | Visible LBs |
|---|---|---|---|---|
| **Legacy** (`TerrainChunkRenderer`) | 1.5 | 3.0 | 1 (1 chunk) | 132-143 (whole chunk) |
| **Modern** (`TerrainModernRenderer`) | 6.4-7.0 | 9-14 | ~36-51 | 36-51 (per-LB cull) |
(Legacy `draws=1` because its 16×16-LB chunking collapses radius=5's 121 visible
landblocks into a single chunk, dispatched as one `glDrawElements`. Modern issues
one `glMultiDrawElementsIndirect` with N=36-51 sub-commands.)
## Acceptance criterion
The N.5b spec acceptance criterion 5 read: "CPU dispatcher time at radius=5 ≥10%
lower than today's per-LB-binds path." The captured numbers show modern is ~4×
HIGHER on CPU at radius=5. **The criterion was wrong** — at radius=5 in Holtburg,
legacy's chunked path was already collapsed to one draw call. The architectural
wins of multi-draw indirect manifest at higher chunk counts (A.5 territory).
The spec is amended via this doc: ship N.5b on visual identity + structural
correctness rather than CPU savings at radius=5.
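The gap between the criterion and the measurement is easy to check by hand. The sketch below only re-derives the "~4×" figure from the medians in the table above; the helper name `meets_criterion` is hypothetical and the 10% threshold is taken from the criterion's wording.

```python
# Medians from the "Numbers" table, in microseconds.
LEGACY_MEDIAN_US = 1.5
MODERN_MEDIAN_US = (6.4, 7.0)  # low and high ends of the observed range


def meets_criterion(modern_us: float, legacy_us: float,
                    required_saving: float = 0.10) -> bool:
    """True when modern is at least `required_saving` cheaper than legacy."""
    return modern_us <= legacy_us * (1.0 - required_saving)


# Slowdown factor at both ends of the modern range.
slowdown = tuple(m / LEGACY_MEDIAN_US for m in MODERN_MEDIAN_US)

for modern_us, factor in zip(MODERN_MEDIAN_US, slowdown):
    verdict = "pass" if meets_criterion(modern_us, LEGACY_MEDIAN_US) else "fail"
    print(f"modern {modern_us} us vs legacy {LEGACY_MEDIAN_US} us: "
          f"{factor:.2f}x, criterion {verdict}")
```

Both ends of the range land between 4× and 5× the legacy median, so the criterion fails by a wide margin rather than by measurement noise.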
## Architectural wins of the modern path (real, even when CPU is higher)
1. **Zero `glBindTexture` per frame.** Bindless atlas handles are made resident
once at startup; the modern shader samples via `sampler2DArray(uvec2 handle)`.
Legacy issued 2 `glBindTexture(Texture2DArray)` calls per frame.
2. **Constant-cost dispatch.** As A.5 raises the streaming radius (next phase),
the visible chunk count grows. Legacy scales linearly: at radius=10 (4× chunks)
it's 4 `glDrawElements` calls; at radius=15 (≥9 chunks) it's 9+ calls. Modern
stays at exactly 1 `glMultiDrawElementsIndirect` regardless.
3. **Per-LB frustum culling.** Legacy culled at chunk granularity (16×16 LBs);
modern culls per-LB. At a typical Holtburg view, ~36-51 of 132 loaded LBs are
actually visible; legacy drew the entire 132-LB chunk (roughly 2.6-3.7× the
visible work pushed to GPU vertex/fragment stages, even though CPU dispatch was cheap).
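Win 3 boils down to one bounding-box test per loaded landblock. Below is a minimal sketch of a conservative AABB-versus-frustum-plane test. It is an illustration, not the renderer's actual code: the `Plane` and `Aabb` types, the inward-facing plane convention, and the function names are all assumptions.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class Plane(NamedTuple):
    normal: Vec3  # assumed to point into the frustum
    d: float      # a point p is inside when dot(normal, p) + d >= 0


class Aabb(NamedTuple):
    lo: Vec3
    hi: Vec3


def aabb_visible(box: Aabb, planes: Sequence[Plane]) -> bool:
    """Conservative test: reject only if the box is fully behind some plane."""
    for plane in planes:
        # "Positive vertex": the box corner farthest along the plane normal.
        corner = tuple(
            box.hi[i] if plane.normal[i] >= 0.0 else box.lo[i] for i in range(3)
        )
        dist = sum(plane.normal[i] * corner[i] for i in range(3)) + plane.d
        if dist < 0.0:
            return False
    return True


def visible_slots(slot_bounds: Dict[int, Aabb],
                  planes: Sequence[Plane]) -> List[int]:
    """Walk every loaded slot and keep the ones whose bounds pass the test."""
    return [slot for slot, box in sorted(slot_bounds.items())
            if aabb_visible(box, planes)]
```

The test can keep a box that sits just outside a frustum corner, which is acceptable for culling: a false positive costs one extra sub-command, while a false negative would punch a hole in the terrain.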
## Why modern's CPU was higher at radius=5
Per-frame work in modern (a single-digit-microsecond budget on this scene):
- Walk all loaded slots checking visibility (~120 slots) → AABB test each
- Build DEIC array (51 entries × 20 bytes = 1020 bytes)
- `glBufferSubData(DRAW_INDIRECT_BUFFER, ...)` — driver memcpy
- 2× `glProgramUniform2(..., handle.low, handle.high)` for atlas handles
- `glBindVertexArray` + `glMemoryBarrier(GL_COMMAND_BARRIER_BIT)` + `glMultiDrawElementsIndirect`
Legacy's per-frame work:
- Bind 2 textures
- Bind one VAO (the chunk)
- One `glDrawElements`
The DEIC array build + buffer upload alone is ~3-5 µs at radius=5 on this hardware,
which is the bulk of the modern overhead. At higher radius this overhead amortizes:
the upload stays a single small buffer (20 bytes per visible landblock), while the
alternative (legacy's N draws) grows with the chunk count.
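The 20-byte entry size and the `handle.low` / `handle.high` split can be made concrete with a minimal sketch. The field order follows OpenGL's `DrawElementsIndirectCommand` layout (four unsigned 32-bit fields plus a signed `baseVertex`). The function names, the one-instance-per-landblock choice, and the slot-relative `firstIndex` / `baseVertex` scheme are assumptions, not the renderer's actual layout.

```python
import struct
from typing import Iterable, Tuple

# count, instanceCount, firstIndex, baseVertex (signed), baseInstance
DEIC = struct.Struct("<IIIiI")


def build_indirect_buffer(visible_slots: Iterable[int],
                          indices_per_slot: int,
                          vertices_per_slot: int) -> bytes:
    """Pack one indirect command per visible slot into a contiguous blob."""
    out = bytearray()
    for slot in visible_slots:
        out += DEIC.pack(
            indices_per_slot,          # count: indices drawn for this landblock
            1,                         # instanceCount: one instance per landblock
            slot * indices_per_slot,   # firstIndex: slot's region in the shared EBO
            slot * vertices_per_slot,  # baseVertex: slot's region in the shared VBO
            0,                         # baseInstance: unused in this sketch
        )
    return bytes(out)


def split_handle(handle: int) -> Tuple[int, int]:
    """Split a 64-bit bindless handle into (low, high) 32-bit halves for a uvec2."""
    return handle & 0xFFFFFFFF, (handle >> 32) & 0xFFFFFFFF
```

With 51 visible landblocks the packed blob is 51 × 20 = 1020 bytes, matching the figure above; this is the payload the per-frame `glBufferSubData` copies.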
## Follow-up work
- **A.5 (next phase)** will exercise the higher-radius case where modern wins.
Capture a fresh baseline at radius=8 / 10 once A.5 lands.
- **N.6 perf polish** can investigate persistent-mapped buffers for the indirect
buffer, which would eliminate the per-frame `glBufferSubData`. Likely small win
at radius=5 (single ~1KB upload), bigger at higher radii.
- **GPU-side culling** (compute shader generating the DEIC array directly into
the indirect buffer) eliminates the CPU slot walk + DEIC build entirely. N.6 or
later territory; only worth it if profiling shows the CPU walk is hot.
## Lessons captured to memory
`memory/project_phase_n5b_state.md` records the high-value gotchas surfaced
during N.5b implementation. Three that are particularly likely to bite:
1. **`uniform sampler2DArray` + `glProgramUniformHandleARB` is unreliable.** Some
drivers (NVIDIA Windows in this case) reject the combination with
`GL_INVALID_OPERATION`. Use the `uniform uvec2` + `sampler2DArray(handle)`
constructor pattern instead — N.5's mesh_modern uses this, and N.5b's
terrain_modern adopted it after the black-terrain regression.
2. **`MaybeFlushTerrainDiag` underflow.** A naive median calc (`copy[N - nz/2]`)
indexes one past the end of the array (`copy[N]`) when only one sample has been
recorded, because `nz/2` truncates to 0. Use `copy[N - 1 - (nz - 1) / 2]` instead.
3. **Visual gate must actually be visually confirmed.** "Go" doesn't mean
"verified." During N.5b's gate the user said "go" without launching, which
masked the black-terrain regression for hours. The gate must include the
user reporting actual visual confirmation, not assent to proceed.
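Gotcha 2 is easy to reproduce in isolation. The sketch below assumes, since the doc does not spell it out, that `copy` is an ascending-sorted copy of a fixed-size sample buffer of length `N`, that unfilled entries are 0, and that `nz` counts the non-zero samples. All function names are hypothetical.

```python
from typing import Sequence


def naive_index(n: int, nz: int) -> int:
    """Index used by the buggy median: n - nz/2 (integer division)."""
    return n - nz // 2


def fixed_index(n: int, nz: int) -> int:
    """Corrected index: always lands inside the non-zero tail [n - nz, n - 1]."""
    return n - 1 - (nz - 1) // 2


def median_of_recorded(samples: Sequence[float]) -> float:
    """Median of the non-zero samples in a partially filled buffer."""
    copy = sorted(samples)  # zeros (unfilled entries) sort to the front
    n = len(copy)
    nz = sum(1 for s in copy if s != 0)
    if nz == 0:
        return 0.0  # nothing recorded yet
    return copy[fixed_index(n, nz)]
```

With an 8-entry buffer and one recorded sample, the naive index is 8, one past the last valid index of 7, while the corrected index is 7. For an even number of samples the corrected form picks the upper of the two middle values, which is sufficient for a diagnostic readout.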