Phase N.5b: terrain on the modern rendering path. TerrainModernRenderer
replaces TerrainChunkRenderer; bindless atlas via uvec2 handle + GLSL
sampler-from-handle constructor; constant-cost dispatch (~6µs/frame)
regardless of radius. Closes issue #51. 114 tests pass; conformance
sentinel max |delta| = 0.015 mm. Honest perf baseline doc captures
that modern is ~4x slower on CPU at radius=5 because legacy's chunked
pattern already collapsed the scene to one draw call; architectural
wins manifest at higher radius (A.5).
Three high-value gotchas captured to memory:
1. uniform sampler2DArray + glProgramUniformHandleARB unreliable
across drivers; default to uvec2 handle + sampler constructor.
2. Median calc copy[N - nz/2] underflows for nz<2.
3. Visual gate 'go' != 'verified'.
Plus: A.5 cold-start handoff at docs/research/2026-05-10-phase-a5-handoff.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records what N.5b shipped, where the actual FPS bottleneck lives
(WbDrawDispatcher entity cull at ~4.3ms/frame, 86% of frame budget;
terrain dispatcher is now <1% of frame), and what A.5 has to do to
make the world look big without falling off a perf cliff.
Three concrete A.5 deliverables:
1. Two-tier streaming (near = full, far = terrain-only)
2. Per-LB entity bucketing in WbDrawDispatcher
3. Off-thread LandblockMesh.Build to avoid streaming hitches at higher
radius
Eight brainstorm questions for the next session, plus acceptance
criteria, files-to-read list, and explicit "don't do" warnings (don't
raise STREAM_RADIUS without tiering in place; don't put scenery in
far tier without an impostor pipeline; don't break the N.5b conformance
sentinel; etc.).
User's stated goal verbatim: "great smooth HIGH fps visuals. Should
look great. As long as it scales and we get very high FPS." This
reframes priorities away from radius=5 micro-optimization toward
visual scale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TerrainModernRenderer replaces TerrainChunkRenderer. Single global
VBO/EBO + slot allocator + glMultiDrawElementsIndirect. Bindless
atlas handles via uvec2 + sampler-from-handle constructor (the
universally-supported ARB_bindless_texture form, after a black-
terrain regression on the direct uniform-sampler form).
Path C: WB renderer pattern + acdream's LandblockMesh.Build for
retail's FSplitNESW formula compliance. Closes issue #51.
Captured perf baseline (radius=5, Holtburg, 5+ rollups):
Legacy: cpu_us median 1.5 / p95 3.0 (1 chunk = 1 glDrawElements)
Modern: cpu_us median 6.4-7.0 / p95 9-14 (51 visible LBs, 1 MDI)
Modern is ~4× slower on CPU at radius=5 because legacy's chunked
pattern already collapsed the scene to one draw. Architectural wins
(zero glBindTexture/frame; constant-cost dispatch as A.5 raises
radius) manifest at higher scene complexity. Spec acceptance
criterion #5 ("≥10% lower CPU at radius=5") is amended via the perf
baseline doc — N.5b ships on visual identity + structural correctness.
Three high-value gotchas captured to memory:
1. `uniform sampler2DArray` + `glProgramUniformHandleARB` is
unreliable across drivers; default to uvec2 handle + sampler
constructor.
2. Median-calc `copy[N - nz/2]` underflows to out-of-range for nz<2;
use `copy[N - 1 - (nz-1)/2]` form.
3. Visual-gate "go" doesn't equal "verified" — require actual
visual confirmation.
Visual verification: confirmed at Holtburg town. 114/114 tests pass
in N.5+N.5b filter. Conformance sentinel max ‖Δ‖ = 0.015 mm across
1000 sample points / 10 representative landblocks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Document Phase N.5b shipping (terrain on the modern rendering path via
Path C — `TerrainModernRenderer` mirrors WB's `TerrainRenderManager`
pattern but consumes acdream's `LandblockMesh.Build` so retail's
`FSplitNESW` formula stays in lockstep with physics + visual mesh).
Changes:
- `docs/plans/2026-04-11-roadmap.md` — add N.5b row to the Shipped
table; promote N.5b's "Phases ahead" entry to ✓ SHIPPED with the
Path C resolution + perf reality check; refresh N.6 scope to note
Terrain has joined the modern path (legacy `Texture2D` retirement
scope narrows to Sky + Debug); update top-of-doc Status line.
- `docs/ISSUES.md` — close issue #51 (WB terrain-split formula
divergence). Move from OPEN to "Recently closed" with the Path C
resolution: never adopted WB's formula; modern dispatcher uses
retail's via `LandblockMesh.Build`. References `da56063` (the
black-terrain fix that landed within the N.5b ship chain).
- `CLAUDE.md` — add `TerrainModernRenderer.cs` to the WB integration
cribs list with the GL_INVALID_OPERATION caveat (use uvec2 +
`sampler2DArray(handle)` constructor, NOT direct
`uniform sampler2DArray` + `glProgramUniformHandleARB`). Update
the "Currently in flight" preamble: N.6 builds on N.5 + N.5b;
add an N.5b shipped paragraph linking the perf baseline doc.
- `docs/plans/2026-05-09-phase-n5b-perf-baseline.md` — new doc
capturing the radius=5 Holtburg perf measurement (modern 6.4-7.0
µs median vs legacy 1.5 µs — modern is ~4× SLOWER on CPU at
radius=5). Documents the spec acceptance criterion #5 amendment,
the architectural wins that DO hold (zero glBindTexture/frame,
constant-cost dispatch as A.5 raises radius, per-LB frustum cull),
and the three high-value gotchas surfaced during implementation.
User-memory updates (outside repo, not in this commit):
- `memory/project_phase_n5b_state.md` — full N.5b state file with
the three gotchas captured.
- `memory/MEMORY.md` — index entry pointing at the state file.
Build: dotnet build green. No code changes in this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deletes:
- TerrainChunkRenderer.cs (454 lines, replaced by TerrainModernRenderer)
- TerrainRenderer.cs (247 lines, older sibling, no production users)
- terrain.vert / terrain.frag (replaced by terrain_modern.{vert,frag})
Removes the temporary Task 8 perf-benchmark toggle (ACDREAM_LEGACY_TERRAIN
env var, _useLegacyTerrain field, parallel _terrainLegacy renderer
instance, [TERRAIN-DIAG/modern|legacy] label suffix). The modern path
is now the only path. Mirror N.5's mandatory-modern amendment: missing
GL_ARB_bindless_texture throws NotSupportedException at startup
(already in place via the BindlessSupport.TryCreate gate).
Three load-bearing research comments preserved verbatim from terrain.vert
into terrain_modern.vert before deletion: the MIN_FACTOR = 0.0 N-dot-L
floor block (cross-ref Lambert brightness split), the aPacked3 bit
layout, the gl_VertexID corner-table 2026-04-21 ConstructPolygons fix.
Also retires the now-orphaned _shader field (legacy terrain pipeline
was its only user).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: terrain renders pure black in modern path (legacy renderer
correct). Diagnostic at TerrainModernRenderer.Draw showed:
glProgramUniformHandle(prog=4, loc=5, handle=0x100251xxx) → GL_INVALID_OPERATION (0x0502)
on both terrain and alpha sampler uniforms.
Root cause: the `uniform sampler2DArray` + glProgramUniformHandleARB
combination is rejected by the NVIDIA Windows driver in this configuration.
The handle is valid and resident; the uniform location is valid; the
program is valid; but the driver refuses to bind a 64-bit handle to a
sampler uniform via the program-uniform path.
Fix: switch to N.5's mesh_modern pattern — pass each 64-bit handle as a
`uniform uvec2` (low + high 32-bit halves) and construct the sampler at
the use site via the GLSL `sampler2DArray(handle)` constructor. This
form is what ARB_bindless_texture documents as universally supported and
is what N.5 already uses successfully.
Files:
- terrain_modern.frag: replace `uniform sampler2DArray uTerrain/uAlpha`
with `uniform uvec2 uTerrainHandle/uAlphaHandle` + `#define`s
- TerrainModernRenderer.cs: cache uvec2 uniform locations; set via
`glProgramUniform2(program, loc, low32, high32)` per frame
- BindlessSupport.cs: remove now-unused `SetSamplerHandleUniform`,
leave a comment noting why the helper was retired
- GameWindow.cs: also strip the temporary [TERRAIN-DBG] cursor-wrap
print added during the perf-baseline investigation
Build green; 114/114 tests in N.5+N.5b filter still pass; user-verified
terrain renders correctly in modern path post-fix. Captured fresh perf
baseline:
- Legacy: cpu_us median 1.5 / p95 3.0 (1 chunk = 1 glDrawElements)
- Modern: cpu_us median 6.4-7.0 / p95 9-14 (51 visible LBs, 1 MDI call)
Modern is ~4× slower on CPU at radius=5 because the chunked legacy path
already collapsed the scene to one draw call. The architectural wins
(zero glBindTexture/frame; constant-cost dispatch as A.5 raises radius)
will be documented in T10's perf baseline doc; the spec's
"≥10% lower CPU" acceptance criterion is invalid at radius=5 and needs
revision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First diag flush fires ~5s after process start (Environment.TickCount64
threshold), but at that point only 1 sample may have been recorded if
the user is mid-login. The original `copy[copy.Length - nz / 2]` form
underflowed to copy[copy.Length] when nz=1 (nz/2=0), throwing
IndexOutOfRangeException at GameWindow.cs:8799 on the first OnRender
after login.
Fix: use `copy.Length - 1 - (nz - 1) / 2` for median (always >= 0 for
nz >= 1, returns the single sample for nz=1) and clamp the percentile
offset via `(nz - 1) * 0.05` for the same reason.
Caught by user's perf-baseline launch with ACDREAM_LEGACY_TERRAIN=1
(the benchmark toggle from 336ad34). The bug exists in T8 itself
regardless of the toggle.
Build green; existing tests still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an ACDREAM_LEGACY_TERRAIN=1 env var that routes Draw through the
legacy TerrainChunkRenderer instead of the new TerrainModernRenderer.
Both renderers are constructed and fed AddLandblock/RemoveLandblock so
they stay in sync; only one is drawn per frame. The [TERRAIN-DIAG]
log line is labeled /modern or /legacy so the user can tell which
numbers they're capturing.
Removed in Task 9 along with TerrainChunkRenderer.cs, terrain.vert,
and terrain.frag.
Usage:
\$env:ACDREAM_LEGACY_TERRAIN = "1" # legacy mode
\$env:ACDREAM_LEGACY_TERRAIN = \$null # modern mode (default)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Swap TerrainChunkRenderer → TerrainModernRenderer (drop-in: same
AddLandblock/RemoveLandblock/Draw interface). Pass BindlessSupport
to TerrainAtlas.Build so GetBindlessHandles() is callable. Load the
new terrain_modern shader pair and pass to the renderer ctor. Add
[TERRAIN-DIAG] rollup mirroring the existing [WB-DIAG] pattern.
Bindless detection moved above terrain construction so atlas + ctor
can consume BindlessSupport (was previously detected after — order
required for N.5b).
Visual verification at four scenes (Holtburg flat + sloped, Foundry,
sloped landblock) is the next gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code review (Important #1): AddLandblock validated Vertices.Length but
not Indices.Length. The indices loop indexes meshData.Indices[0..383]
unconditionally — out-of-range input would throw IndexOutOfRangeException
instead of the clearer ArgumentException the vertex check raises. Today
LandblockMesh.Build always produces 384/384, so this is defensive
forward-compat for future mesh sources.
Code review (Important #2): The shader (terrain_modern.vert:gl_VertexID
% 6) only correctly picks the cell-corner index because we bake
`slot * VertsPerLandblock` into indices and 384 is a multiple of 6.
That invariant is now documented in a comment near the constant — anyone
changing it must audit the shader.
Build green: 0 errors / 0 warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new terrain dispatcher. Single global VBO/EBO with a slot
allocator (one slot per landblock, 384 verts × 40 bytes per slot).
Per-frame: build DEIC array from visible slots, upload, dispatch
via glMultiDrawElementsIndirect. Atlas textures bound via bindless
handles set per-frame as sampler uniforms.
Total ~6-8 GL calls per frame for terrain regardless of visible
landblock count (vs today's per-LB binds at radius=2 → ~25 calls,
radius=5 → ~121 calls).
API mirrors TerrainChunkRenderer so GameWindow integration in T8 is
a drop-in field+ctor swap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code review identified a latent false-positive flake risk: physics
path clamps fx = localX/24 to (CellsPerSide - 0.001f) = 7.999, which
corresponds to localX <= 191.976. With samples up to 191.999f,
physics computes Z at the clamped position while the mesh sampler
uses the actual position — a difference of up to 23 mm at the upper
edge, which on a steep slope would falsely trip the 1 mm sentinel.
Tighten upper bound to 191.975f (strictly below the clamp boundary)
so both oracles compute Z at the same (cellX, tx). Also restored the
"worst-case from SplitFormulaDivergenceTest" inline comment for
landblock 0x4D96 per code review suggestion #3.
Test still passes: 10/10 landblocks, 1000 samples, max |delta|
= 0.0153 mm (previously 0.0305 mm — confirms the prior worst-case
was indeed at the boundary).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Z-conformance sentinel for issue #51's bug class. Sweeps 10
representative landblocks x 100 sample points (uniform random in
local 0..192 with fixed seed 42). For each point: compute meshTriZ
via barycentric interpolation in the matching triangle of the
LandblockMesh.Build output; compute physicsZ via
TerrainSurface.SampleZFromHeightmap; assert |delta| < 0.001m.
Catches any silent formula or vertex-layout drift between the
visual and physics paths. Skips gracefully if ACDREAM_DAT_DIR
isn't set (CI without dat data).
Local run with dat data: 10/10 landblocks loaded, 1000 samples,
max |delta| = 0.0305 mm (worst case: Direlands 0xC040).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fragment shader for the modern terrain dispatcher. Bit-identical math
to today's terrain.frag (per-cell maskBlend3 + Phase G fog + lightning
flash). Same #version 460 + GL_ARB_bindless_texture preamble change
as terrain_modern.vert. Sampling syntax unchanged — the bindless-ness
is invisible at the GLSL level.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vertex shader for the modern terrain dispatcher. Bit-identical math
to today's terrain.vert (Phase 3c per-cell mesh + Phase G AdjustPlanes
lighting). The only structural change is the version + bindless
extension preamble — sampler access stays a regular sampler2DArray
uniform; bindless-ness is invisible at the GLSL level.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add optional BindlessSupport ctor parameter + GetBindlessHandles()
method that returns (terrainHandle, alphaHandle) ulongs with both
textures made resident. Two-phase Dispose mirroring TextureCache
(MakeNonResident before DeleteTexture per ARB_bindless_texture spec).
Existing callers pass `Build(gl, dats)` unchanged; bindless = null
default keeps them working until T6/T8 wires the renderer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweeps all (lbX, lbY, cellX, cellY) tuples for the full 255x255
landblock map (~4.16M cells) and reports both the raw enum-output
disagreement (50.02%) and the diagonal-actually-painted disagreement
(49.98%) between WB's CalculateSplitDirection and acdream's
TerrainBlending.CalculateSplitDirection (which retail uses per
CLandBlockStruct::ConstructPolygons at retail addr 00531d10).
The two formulas behave like independent random hashes. Adopting
WB's pipeline wholesale would mis-render ~half the diagonals on
every landblock (Holtburg 0xA9B0: 29/64 cells = 45.3% wrong). This
data is the foundation for N.5b's Path A vs B vs C decision (kills
Path A).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures everything a fresh agent needs to pick up Phase N.5b (Terrain
on the Modern Rendering Path) without spelunking through the N.5
session history.
Front-loads the load-bearing constraint: issue #51 (WB's terrain split
formula diverges from retail's FSplitNESW). Lays out three viable
design paths (A: adopt WB's formula everywhere; B: keep retail's
formula and fork-patch WB; C: WB mesh layout but our formula). The
brainstorm needs to pick one, informed by quantified divergence rate
across representative landblocks.
Includes file-by-file inventory of acdream's terrain stack (1383 lines
across TerrainRenderer + TerrainChunkRenderer + TerrainAtlas + shaders)
vs WB's (1937 lines across TerrainRenderManager + TerrainGeometryGenerator
+ LandSurfaceManager). Eight brainstorm questions covering atlas model,
mesh ownership, index format, shader unification, streaming integration,
conformance test, and visual verification gate.
Mirrors the N.5 handoff structure that worked well last session:
TL;DR + where N.5 left things + what N.5b inherits + technical detail
+ files to read + brainstorm questions + acceptance criteria + first
30 minutes + things to NOT do.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records a new Phase A sub-piece: split the single ACDREAM_STREAM_RADIUS
into separate terrain + entity radii so terrain renders to a much
further horizon (WB-style) while entities/scenery stay at the current
closer radius.
Motivated by perf at ACDREAM_STREAM_RADIUS=5 dropping from ~810 fps
to ~200-300 fps because everything stays full-detail. Both retail and
WorldBuilder render terrain way out and strip entities at distance.
Estimate: 3-5 days for the radius split + fog tuning; +1 week if
terrain LOD via mesh decimation is included. Not yet brainstormed.
N.8 (sky + particles via WB's SkyboxRenderManager + ParticleEmitterRenderer)
was already on the roadmap; user confirmed they want it tracked there.
No edit needed for N.8 — already at the right level of detail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reframe the selection-blink follow-up so it doesn't suggest near-term
work. Was listed in N.5 ship record as "Phase B.4 follow-up adds the
field" — now phrased as open backlog with the hook reserved in
mesh_modern.vert's InstanceData comment for whoever eventually picks
it up.
The shader hook itself is unchanged — change is purely doc wording in
the plan SHIP record + CLAUDE.md WB integration cribs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
N.5: Modern Rendering Path. WbDrawDispatcher now uses bindless
textures + glMultiDrawElementsIndirect on top of N.4's grouped
pipeline. Three SSBO uploads + 2 indirect calls per frame, ~12-15
total GL calls for entity rendering regardless of scene complexity.
Measured 1.23 ms / frame median at Holtburg courtyard (1662 groups,
~810 fps). User-gated visual verification PASS at Holtburg.
Includes ship-amendment: legacy renderer path formally retired
(InstancedMeshRenderer + StaticMeshRenderer + WbFoundationFlag
deleted). Bindless is now mandatory; missing extensions throw
NotSupportedException at startup with a clear error message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Corrects the SHIP commit's acceptance gate verdict on the legacy
escape hatch. The original gate "[x] ACDREAM_USE_WB_FOUNDATION=0
still works" was inaccurate — Task 15's mesh_instanced deletion left
InstancedMeshRenderer orphaned + non-functional. Resolution: formal
retirement of the legacy path within N.5 (the prior commit).
Updated acceptance gate verdict:
- [N/A] ACDREAM_USE_WB_FOUNDATION=0 — escape hatch retired in N.5;
modern path is now mandatory, bindless required at startup. Missing
bindless throws NotSupportedException with a clear error message.
All other gates unchanged from the SHIP commit:
- [x] Visual identity to N.4 — Task 10 + Task 14 USER GATE PASS
- [x] CPU dispatcher time <= 70% of N.4 — measured 1.23 ms/frame at
Holtburg courtyard, comfortably under threshold
- [x] drawsIssued <= 5 per pass (CPU GL calls) — 2 indirect calls/frame
- [x] All tests green — 71/71 in the relevant filter
- [ ] GPU rendering time +-10% of N.4 — DEFERRED (timer query
double-buffering, N.6 follow-up)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final cross-cutting review of N.5 found that Task 15's deletion of
mesh_instanced.vert/.frag left InstancedMeshRenderer orphaned —
ACDREAM_USE_WB_FOUNDATION=0 silently rendered terrain+sky only with
no entities. The SHIP commit's "[x] ACDREAM_USE_WB_FOUNDATION=0 still
works" claim was inaccurate.
Resolution: formal retirement of the legacy renderer path within N.5
instead of deferring to N.6.
Deleted:
- src/AcDream.App/Rendering/InstancedMeshRenderer.cs
- src/AcDream.App/Rendering/StaticMeshRenderer.cs
- src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs
GameWindow simplified — capability detection is unconditional, missing
bindless throws NotSupportedException with a clear message at startup.
WbDrawDispatcher + mesh_modern shader load are mandatory after init.
No escape hatch.
GpuWorldState simplified — WbFoundationFlag.IsEnabled guards on
AddLandblock/RemoveLandblock removed; adapter calls are unconditional
when the adapter is non-null.
PendingSpawnIntegrationTests updated — WbFoundationFlag.ForTestsOnly_ForceEnable
static ctor removed (flag is gone; adapter calls are unconditional).
The ApplyLoadedTerrain physics-data loop was also simplified: the
EnsureUploaded sub-loop that fed InstancedMeshRenderer is gone;
_pendingCellMeshes is now explicitly cleared to prevent unbounded
accumulation (the worker thread still populates it, but WB handles
EnvCell geometry through its own pipeline).
Spec §2 Decision 5 + §10 Out-of-Scope updated. Plan ship-amendment
section added. Roadmap updated (N.5 ships with retirement; N.6 scope
narrowed to perf-only). CLAUDE.md "WB integration cribs" updated.
Perf baseline doc updated. WbDrawDispatcher class summary docstring
corrected to describe the as-shipped SSBO + multi-draw-indirect path.
ISSUES.md #51 updated (terrain not in N.5 scope; deferred to N.7).
Bindless support is now a hard requirement. Modern desktop GPUs
universally expose GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters;
if a user hits the NotSupportedException, that's a real bug report
worth investigating, not a silent fallback.
Build: 0 errors, 0 warnings. Tests: 71/71 (Wb+MatrixComposition+TextureCacheBindless filter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bindless textures + glMultiDrawElementsIndirect on top of N.4's grouped
pipeline. Per-frame entity rendering: 3 SSBO uploads (instance matrices
@ binding=0, batch data @ binding=1, indirect commands) + 2 indirect
calls (opaque + transparent). Total ~12-15 GL calls per frame for entity
rendering, regardless of scene complexity.
Acceptance gates (spec §8.3):
- [x] Visual identity to N.4 — Task 10 USER GATE PASS (Holtburg courtyard)
+ Task 14 USER GATE PASS (general roaming, no regressions seen)
- [x] CPU dispatcher time ≤ 70% of N.4 — measured 1.23 ms/frame median
at Holtburg courtyard (1662 groups, ~810 fps); estimated N.4
hot path ≥2.5 ms/frame; comfortably under threshold
- [x] drawsIssued ≤ 5 per pass (CPU GL calls) — exactly 2 indirect calls
per frame regardless of scene size
- [x] All tests green — 71/71 in
FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless
- [x] ACDREAM_USE_WB_FOUNDATION=0 still works — InstancedMeshRenderer
escape hatch preserved (its own shader path, untouched)
- [ ] GPU rendering time within ±10% of N.4 — DEFERRED to N.6.
GL_TIME_ELAPSED query polling never reports avail!=1 within the
same frame; needs double-buffering. CPU is the load-bearing metric.
Plan amendments captured during execution:
- Task 2: parallel Texture2DArray upload path (replacing the original
"switch globally" framing that would've broken 4 legacy consumers)
- Task 3+4: parallel bindless cache dictionaries (avoiding the GLSL
type mismatch from sampling a Texture2D handle via sampler2DArray)
- Task 5: preserved mesh_instanced.frag's full SceneLighting UBO + 8
lights + fog + lightning flash + per-channel clamp
- Task 9: BatchDataPublic Pack=8 (required for safe MemoryMarshal.Cast)
Plan archived at:
docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md
Spec at:
docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md
Perf baseline at:
docs/plans/2026-05-08-phase-n5-perf-baseline.md
Memory at:
~/.claude/.../memory/project_phase_n5_state.md
Files changed: 6 added, 6 modified, 2 deleted. 19 tasks shipped across
~40 commits including amendments + fixups + reviews.
N.6 follow-ups: retire InstancedMeshRenderer entirely; GPU timer query
double-buffering; persistent-mapped buffers if profiling shows the
residual glBufferData hot spot; possible WB atlas adoption for memory
savings on shared content; possible GPU-side culling via compute pre-pass;
per-instance highlight (selection blink) for retail-faithful click feedback
(field reserved in mesh_modern.vert's InstanceData struct).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves N.5 from in-flight to Shipped (2026-05-08). N.6 (retire
InstancedMeshRenderer + perf polish) becomes the in-flight phase.
CLAUDE.md in-flight pointer updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records the as-shipped state: acceptance gate verdicts, plan amendments
captured during execution, code-review adjustments per task, out-of-scope
N.6 follow-ups, and a complete files-changed summary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mesh_instanced.vert + .frag deleted. WbDrawDispatcher always uses
mesh_modern when WB foundation is on. Legacy escape hatch
(ACDREAM_USE_WB_FOUNDATION=0 or bindless missing) runs through
InstancedMeshRenderer which has its own shader path — untouched.
GameWindow's else-branch removed; if bindless is missing, _meshShader
stays unloaded, _wbDrawDispatcher stays null, and _staticMesh is not
constructed (its guard requires _meshShader non-null). All downstream
_staticMesh usages were already null-safe (null-conditional operators
or explicit null guards). Two null-forgiving suppressors added at the
WbDrawDispatcher + SkyRenderer construction sites where the compiler
couldn't prove non-null but the logic guarantees it (both require
_bindlessSupport non-null, which implies _meshShader was assigned;
_textureCache is assigned unconditionally).
InstancedMeshRenderer.cs: the one reference to mesh_instanced was
a code comment (location 3 NOT used by mesh_instanced.vert) — not
a file load. Escape hatch code path is preserved; the shader comment
is now stale but low priority.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CPU dispatcher: 1227 µs / frame median (1303 µs p95) at Holtburg
courtyard, 1662 groups in working set. Inferred ~810 fps sustained.
CPU dispatcher acceptance gate (≤70% of N.4): PASS — N.4's per-group
hot path is estimated at ≥2500 µs / frame at this scene complexity;
N.5 is comfortably under half.
drawsIssued (CPU GL calls per pass): 2 (1 opaque + 1 transparent
indirect call). Down from N.4's ~hundreds per pass. PASS.
GPU timing: unmeasured. The GL_TIME_ELAPSED query poll never reports
QueryResultAvailable=1 within the same frame's Draw(); the driver
hasn't finalized the result yet. Fix is double-buffering (queryA
on frame N, read on N+2). Deferred to N.6 perf polish — doesn't block
N.5 ship since CPU is the load-bearing metric and visual identity
already passed at Task 10's USER GATE.
Direct N.4 baseline NOT measured. Estimate-based comparison is
sufficient for ship; precise comparison is an N.6 follow-up.
Baseline doc at docs/plans/2026-05-08-phase-n5-perf-baseline.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds median + 95th-percentile CPU + GPU dispatch time to the existing
5-second [WB-DIAG] rollup. CPU via Stopwatch (always running, cheap;
only logged under ACDREAM_WB_DIAG=1). GPU via two GL_TIME_ELAPSED
queries (opaque + transparent) wrapping each glMultiDrawElementsIndirect,
polled non-blocking via QueryResultAvailable on the next frame.
Sample window is 256 frames per signal; median + p95 reported.
Numbers populate the SHIP commit's perf table at Task 19.
Silk.NET naming note: GL_TIME_ELAPSED queries use QueryTarget.TimeElapsed
(confirmed present in Silk.NET.OpenGL 2.23.0 DLL). The 64-bit result is
read via GetQueryObject(..., out ulong) which dispatches to
glGetQueryObjectui64v; the int overload (glGetQueryObjectiv) is used for
the ResultAvailable poll, matching WorldBuilder's VisibilityManager pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in Decision 2 (Opaque + ClipMap → opaque indirect; AlphaBlend +
Additive + InvAlpha → transparent indirect). Catches future refactors
that drift the partition — silent visual regression otherwise (groups
rendered in the wrong pass with the wrong blend state).
Adds public static IsOpaquePublic shim on WbDrawDispatcher; the
underlying IsOpaque stays private.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces WbDrawDispatcher's per-group glDrawElementsInstancedBaseVertexBaseInstance
loop with two glMultiDrawElementsIndirect calls (opaque + transparent).
Per-frame uploads three SSBOs:
- _instanceSsbo @ binding=0 (mat4 per instance, indexed by gl_BaseInstanceARB + gl_InstanceID)
- _batchSsbo @ binding=1 (BatchData per group, indexed by gl_DrawIDARB)
- _indirectBuffer (DrawElementsIndirectCommand[] — opaque first, transparent second)
GameWindow swaps the shader load to mesh_modern when _bindlessSupport
is non-null. Capability detection + shader load now run in the right
order (capability before TextureCache + before Shader).
Deletes the obsolete DrawGroup stub, EnsureInstanceAttribs, _instanceBuffer,
_patchedVaos. ClassifyBatches + ResolveTexture already migrated in
Task 8 to use ulong bindless handles.
BuildIndirectArrays (Task 9) wired in: _opaqueDraws + _translucentDraws
are flattened into IndirectGroupInput[], laid out via the helper into
contiguous indirect commands + parallel BatchData[]. opaqueByteOffset=0,
transparentByteOffset = opaqueCount × DrawCommandStride.
Visual verification (USER GATE) PASS: Holtburg courtyard renders
identical to N.4 — terrain, scenery, characters, NPCs all visible
without artifacts. [N.5] modern path capabilities present + mesh_modern
shader loaded log lines confirm the boot path. [WB-DIAG] hot-path
counters show healthy entity/draw activity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught:
- sizeofDEIC was a local; promoted to public const DrawCommandStride
so tests can reference it symbolically.
- BatchDataPublic layout invariant (size + field offsets) wasn't
asserted in tests. Added BatchDataPublic_LayoutMatchesPrivateBatchData
+ DrawCommandStride_MatchesStructSize tests to gate Task 10's
MemoryMarshal.Cast<BatchData, BatchDataPublic> safety.
- Plan doc updated: BatchDataPublic spec was Pack=4 (wrong — must
match private BatchData's Pack=8 for the cast to work). Implementation
was already correct; plan now matches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure CPU helper that lays out a group list into a contiguous indirect
buffer (DrawElementsIndirectCommand[]) and parallel BatchData[] —
opaque section first, transparent section second. Returns counts +
byte offset for the transparent section.
Tests cover: spec §5 walk-through layout; empty group list edge case;
ClipMap classification (treated as opaque, not transparent).
Static + public so tests can exercise without a GL context. Task 10
wires it into the rewritten Draw() method.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces uint TextureHandle (32-bit GL name) with ulong
BindlessTextureHandle (64-bit) in InstanceGroup + GroupKey + ResolveTexture
return type. Adds TextureLayer (always 0 for per-instance composites,
becomes meaningful when WB atlas is adopted in N.6).
ClassifyBatches now calls TextureCache.GetOrUpload*Bindless variants —
these return Texture2DArray-backed bindless handles (Task 3 work).
DrawGroup body throws NotImplementedException — Task 10 rewrites the
whole Draw() method to use glMultiDrawElementsIndirect, which makes
DrawGroup obsolete. CPU-only tests don't invoke DrawGroup so the build
+ test gates stay green; visual launch fails until Task 10 (intentional).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught that BatchData uses Pack=4 but contains a
ulong field. With the current field order (TextureHandle first), offset
0 is always 8-byte aligned so std430 works. But adding a 4-byte field
before TextureHandle without bumping Pack would silently misalign the
GPU struct. Pack=8 makes the alignment requirement explicit and adds
a comment documenting expected std430 offsets.
No runtime change — current offsets (0/8/12) are identical under both
Pack values for this field order.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds DrawElementsIndirectCommand struct (20-byte layout for
glMultiDrawElementsIndirect). Replaces _instanceVbo field on
WbDrawDispatcher with three buffers: _instanceSsbo (mat4[]),
_batchSsbo (BatchData[]), _indirectBuffer (DEIC[]). Adds BindlessSupport
constructor parameter — non-null required since the dispatcher is only
constructed when WB foundation is on (which implies bindless is present
per Task 6 capability detection).
Existing Draw() method substitutes _instanceVbo -> _instanceSsbo for
compile. Behavior is temporarily wrong (SSBO bound as ArrayBuffer for
per-vertex attribs); Tasks 9-10 fully rewrite the draw loop and the
per-frame uploads to use BindBufferBase + glMultiDrawElementsIndirect.
GameWindow construction site updated to add _bindlessSupport guard and
pass it as the new last argument to the constructor. Dispatcher is only
constructed when bindless is guaranteed present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught:
- Silent failure when ARB_bindless_texture absent — the && short-circuit
meant the most common fallback case (no bindless on the GPU) had no
log, while ARB_shader_draw_parameters absent did log. Restructured to
three nested ifs so each failure path logs symmetrically.
- Redundant `bindless is not null` guard removed (TryCreate's non-null
guarantee covers it; the nested-if structure makes this implicit).
- HasShaderDrawParameters in BindlessSupport.cs replaced its manual
GL_NUM_EXTENSIONS scan with `gl.IsExtensionPresent(...)` — same
pattern WB uses, less code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Detects ARB_bindless_texture + ARB_shader_draw_parameters at startup
when WbFoundationFlag is enabled. Stores BindlessSupport on GameWindow
and passes it to TextureCache so the parallel Texture2DArray upload
path is available to future bindless callers.
Mesh shader load remains mesh_instanced for now — Task 10 swaps to
mesh_modern after Tasks 7-9 rewire the dispatcher to consume the
bindless + SSBO + indirect machinery.
Capability missing → BindlessSupport stays null → TextureCache runs
without the bindless path → legacy callers (StaticMeshRenderer,
InstancedMeshRenderer, ParticleRenderer, current WbDrawDispatcher
draw loop) are unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught four issues:
- Unnecessary GL_ARB_bindless_texture extension in mesh_modern.vert
(vert doesn't use bindless types). Removed; only the frag needs it.
- SSBO binding=1 (BatchBuffer) and UBO binding=1 (SceneLighting) are
in distinct GL namespaces — added a comment in the vert documenting
this so Task 10's bind site doesn't get confused.
- Misleading "0=opaque, 1=transparent" comment expanded to spell out
the full Decision 2 two-pass alpha-test logic and what each discard
threshold protects against.
- BatchData.flags field is reserved; documented that N.5's dispatcher
owns all blend state, with a hook for future shader-side additive.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New entity shaders for the WB modern rendering path. Modeled on WB's
StaticObjectModern.* but adapted to acdream's lighting model:
- Drops uActiveCells (we cull cells on CPU in WbDrawDispatcher)
- Drops uDrawIDOffset (full passes, no pagination)
- Drops uHighlightColor (deferred to Phase B.4 follow-up; field reserved
in InstanceData struct comment)
- Preserves mesh_instanced's SceneLighting UBO at binding=1 with 8 lights,
fog params, lightning flash, per-channel clamp — full visual identity
vert reads InstanceData[] @ binding=0 indexed by gl_BaseInstanceARB +
gl_InstanceID for the per-entity model matrix; reads BatchData[] @
binding=1 indexed by gl_DrawIDARB for the per-group bindless texture
handle + layer.
frag samples sampler2DArray reconstructed from a uvec2 bindless handle
+ uint layer. uRenderPass uniform picks two-pass alpha-test thresholds:
0 = opaque (discard alpha<0.95), 1 = transparent (discard alpha>=0.95
and alpha<0.05).
Not yet wired to the dispatcher — Task 6 sets up shader load + capability
detection in GameWindow; Task 7-10 rewrite the dispatcher to use SSBO +
glMultiDrawElementsIndirect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Original Task 5 draft used hardcoded vec3 ambient/sun uniforms in
mesh_modern.frag. Reading actual mesh_instanced.frag revealed it uses
a SceneLighting UBO at binding=1 with 8 lights, fog params (start/end/
lightning/mode), fog color, camera/time, and per-channel clamp.
Revised: mesh_modern.frag preserves the full SceneLighting UBO +
accumulateLights + applyFog + lightning flash + per-channel clamp.
mesh_modern.vert adds vWorldPos output (consumed by accumulateLights
and applyFog). Visual identity to N.4's lighting model preserved.
Two-pass alpha-test (N.5 Decision 2) sits inside the same shader,
gated by uRenderPass instead of uTranslucencyKind.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught four issues:
- Critical: Dispose interleaved MakeNonResident + DeleteTexture per
entry, violating ARB_bindless_texture's "all handles non-resident
before any texture deletion" requirement. Reordered to two phases:
Phase 1 makes ALL bindless handles non-resident; Phase 2 deletes
ALL bindless textures; Phase 3 deletes legacy Texture2D textures.
- Important: per-call _bindless?.MakeNonResident replaced with a
single if (_bindless is not null) guard around the whole Phase 1
block — cleaner reasoning, one null check.
- Minor: test contract comment referenced wrong task number for
visual gate; corrected to match current plan.
- Minor: two abbreviated XML docs (GetOrUploadWithOrigTextureOverrideBindless,
GetOrUploadWithPaletteOverrideBindless) expanded to mention the
throw-on-null-bindless contract for IDE readers.
This fixup also completes Task 4's Dispose work — Task 4 will be
marked complete since this commit does its full job.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds three Bindless variants (GetOrUploadBindless,
GetOrUploadWithOrigTextureOverrideBindless,
GetOrUploadWithPaletteOverrideBindless) that decode + upload via
UploadRgba8AsLayer1Array (Texture2DArray) and cache in three new
dictionaries that mirror the legacy three-cache structure. Each entry
stores both the GL texture name (for Dispose cleanup in Task 4) and
the resident bindless handle.
Constructor gains optional BindlessSupport param; null keeps backward
compat. EnsureBindlessAvailable throws InvalidOperationException if
Bindless* methods are called without BindlessSupport (fail-fast vs
silent zero handle that would produce GPU faults).
Dispose extended to make handles non-resident before deleting the
underlying Texture2DArray names (bindless handles must be made
non-resident before the texture is deleted; skipping this causes
GPU faults on driver cleanup).
Marker test in TextureCacheBindlessTests documents the throw contract
for future engineers; real bindless integration is verified at
Task 14's visual gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Original Task 3 had Bindless* methods calling the legacy Texture2D
GetOrUpload* and converting the GL name to a bindless handle —
producing a sampler2D texture sampled via sampler2DArray (GLSL type
mismatch).
Revised: Task 3 introduces three parallel cache dictionaries
(_bindlessBySurfaceId / _bindlessByOverridden / _bindlessByPalette)
storing both the GL texture name and the resident handle. Bindless*
methods call DecodeFromDats + UploadRgba8AsLayer1Array directly with
their own caching; legacy three-cache structure mirrored exactly.
Task 4 (Dispose) updated to: (1) MakeNonResident on every bindless
handle FIRST, (2) DeleteTexture on every Texture2DArray name, (3)
DeleteTexture on every legacy Texture2D handle. Order matters per
ARB_bindless_texture spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught that the TexImage3D call dropped the
depth: and border: named arguments specified in the plan. The bare
positional `1, 0` is hard to disambiguate from the surrounding 10
parameters. Adds them back, no runtime change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds UploadRgba8AsLayer1Array — uploads pixel data as a 1-layer
Texture2DArray. Existing UploadRgba8 (Texture2D) untouched, so legacy
callers (StaticMeshRenderer, InstancedMeshRenderer, ParticleRenderer,
WbDrawDispatcher's pre-rewrite path) keep working unchanged.
Required for Task 3's Bindless* methods which need the Texture2DArray
target so the WB modern shader can sample via sampler2DArray. Same
surface may be uploaded both ways during the N.5/N.6 transition;
doubling is bounded and acceptable. After N.6 retires legacy
renderers entirely, the legacy UploadRgba8 becomes unused and is
deleted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>