acdream/docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md
Erik dcae2b6b94 phase(N.5): retirement amendment — InstancedMeshRenderer + StaticMeshRenderer + WbFoundationFlag deleted
Final cross-cutting review of N.5 found that Task 15's deletion of
mesh_instanced.vert/.frag left InstancedMeshRenderer orphaned —
ACDREAM_USE_WB_FOUNDATION=0 silently rendered terrain+sky only with
no entities. The SHIP commit's "[x] ACDREAM_USE_WB_FOUNDATION=0 still
works" claim was inaccurate.

Resolution: formal retirement of the legacy renderer path within N.5
instead of deferring to N.6.

Deleted:
- src/AcDream.App/Rendering/InstancedMeshRenderer.cs
- src/AcDream.App/Rendering/StaticMeshRenderer.cs
- src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs

GameWindow simplified — capability detection is unconditional, missing
bindless throws NotSupportedException with a clear message at startup.
WbDrawDispatcher + mesh_modern shader load are mandatory after init.
No escape hatch.

GpuWorldState simplified — WbFoundationFlag.IsEnabled guards on
AddLandblock/RemoveLandblock removed; adapter calls are unconditional
when the adapter is non-null.

PendingSpawnIntegrationTests updated — WbFoundationFlag.ForTestsOnly_ForceEnable
static ctor removed (flag is gone; adapter calls are unconditional).

The ApplyLoadedTerrain physics-data loop was also simplified: the
EnsureUploaded sub-loop that fed InstancedMeshRenderer is gone;
_pendingCellMeshes is now explicitly cleared to prevent unbounded
accumulation (the worker thread still populates it, but WB handles
EnvCell geometry through its own pipeline).

Spec §2 Decision 5 + §10 Out-of-Scope updated. Plan ship-amendment
section added. Roadmap updated (N.5 ships with retirement; N.6 scope
narrowed to perf-only). CLAUDE.md "WB integration cribs" updated.
Perf baseline doc updated. WbDrawDispatcher class summary docstring
corrected to describe the as-shipped SSBO + multi-draw-indirect path.
ISSUES.md #51 updated (terrain not in N.5 scope; deferred to N.7).

Bindless support is now a hard requirement. Modern desktop GPUs
universally expose GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters;
if a user hits the NotSupportedException, that's a real bug report
worth investigating, not a silent fallback.

Build: 0 errors, 0 warnings. Tests: 71/71 (Wb+MatrixComposition+TextureCacheBindless filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:01:36 +02:00


# Phase N.5 — Modern Rendering Path — Design Spec
**Status:** Draft (brainstormed 2026-05-08, not yet implemented).
**Author:** acdream lead engineer + Claude.
**Builds on:** Phase N.4 (`WbDrawDispatcher`, shipped 2026-05-08).
**Predecessor docs:**
- `docs/research/2026-05-08-phase-n5-handoff.md` (cold-start briefing).
- `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md` (N.4 plan; Adjustments 7-10 are required reading).
- `docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md` (N.4 spec).
---
## 1. Problem statement
N.4 collapsed entity rendering from O(entities × batches) per-draw GL calls to O(unique GfxObj × surface × translucency) grouped instanced draws. The remaining hot path still does, per group:
```
glActiveTexture(0)
glBindTexture(2D, texHandle)
glBindBuffer(EBO, batchIbo)
glDrawElementsInstancedBaseVertexBaseInstance(...)
```
Across a typical Holtburg-courtyard scene that's still ~100-300 GL calls per frame for entities. Modern GPUs and our drivers (GL 4.3 + bindless, gated by WB's `_useModernRendering`) support patterns that eliminate ALL of those per-group calls:
- **Bindless textures** (`GL_ARB_bindless_texture`) — texture handles are 64-bit tokens that don't require `glBindTexture` to use; the shader samples from a handle read out of buffer data.
- **Multi-draw indirect** (`glMultiDrawElementsIndirect`) — one GL call dispatches N draws from a `DrawElementsIndirectCommand` buffer; the driver issues all of them with no CPU-side per-draw work.
N.5 lifts `WbDrawDispatcher` onto these primitives. Target: ≥30% reduction in CPU dispatcher time, draw call count down to ~5/frame, no visual regression vs N.4.
---
## 2. Decisions log
This section records the brainstorm outcomes that the rest of the doc relies on.
| # | Decision | Choice | Reason |
|---|---|---|---|
| 1 | Texture sampler model | **`sampler2DArray`** for ALL textures (1-layer wrapping for per-instance composites) | Matches WB's modern shader exactly; future-proofs for atlas adoption in N.6+; avoids two shader files. ~50 lines of TextureCache change. |
| 2 | Translucent rendering | **WB's two-pass alpha-test** (opaque pass discards `α<0.95`, transparent pass discards `α≥0.95`) | Single blend mode per pass enables one indirect call per pass. Loses native `Additive` blend on GfxObj surfaces; sky + particles have own renderers and aren't affected. Falsifiable at visual verification — if we see a regression, add an additive sub-pass (~30-min fix). |
| 3 | Per-instance + per-draw data delivery | **All-SSBO**: `Instances[]` at binding=0 (mat4 per instance), `Batches[]` at binding=1 (texture handle + layer + flags per group) | Matches WB's modern shader. SSBOs avoid the 16-attrib stride limit, scale to large instance counts, give clean per-draw indexing via `gl_DrawIDARB`. |
| 4 | Bindless handle residency | **Resident on upload, never release** | acdream's content set is bounded (~1-5K unique textures per session). Handles persist for process lifetime; no eviction code in N.5. Diagnostic logging of handle count under `ACDREAM_WB_DIAG=1` to spot growth. |
| 5 | Escape hatch | **Modern path mandatory (N.5 ship amendment)**. `WbFoundationFlag` and `ACDREAM_USE_WB_FOUNDATION` env var have been deleted. Missing `GL_ARB_bindless_texture` or `GL_ARB_shader_draw_parameters` throws `NotSupportedException` at startup with a clear error message. No fallback. | Escape hatch was never exercised after N.4 ship. Legacy `InstancedMeshRenderer` + `StaticMeshRenderer` deleted in the N.5 retirement commit. N.6 scope narrowed accordingly. |
| 6 | Perf measurement | **CPU stopwatch + GL timer queries** logged via `[WB-DIAG]` | Captures both CPU dispatcher time and GPU rendering time. Acceptance gate compares before/after numbers in fixed Holtburg/Foundry scenes. |
| 7 | Persistent-mapped buffers | **Defer to N.6** | Bindless+indirect win is 70-80% of achievable savings. Persistent-mapped + ring + sync is the last 5-10% with non-trivial sync-fence complexity; not worth the risk in N.5's 2-3 week budget. Add post-N.5 if profiling shows residual `glBufferData` cost. |
| 8 | Per-instance highlight (selection blink) | **Defer to a Phase B.4 follow-up** | Retail pulses click targets as visual confirmation; the right mechanism is per-instance highlight color (NOT WB's global `uHighlightColor` which would tint everything in our single-indirect-call design). Field is reserved in design (extend `InstanceData` to include `vec4 highlightColor`); N.5 ships without the field, future phase plumbs it without shader rewrite. |
---
## 3. Architecture overview
### What changes
`WbDrawDispatcher.Draw` swaps its inner loop. Phases 1-3 (entity walk, group bucketing, matrix layout) stay intact. Phases 5-6 (per-group GL calls) are replaced by a single `glMultiDrawElementsIndirect` per pass, fed by SSBO-resident per-instance and per-draw data.
### What's preserved from N.4
- Group bucketing pipeline (entity AABB cull, palette hash memo, group key dictionary).
- `AcSurfaceMetadataTable` for translucency classification.
- `EntitySpawnAdapter` / `LandblockSpawnAdapter` (mesh lifecycle bridge).
- `WbMeshAdapter` (the seam over WB's `ObjectMeshManager`).
- Front-to-back sort of opaque groups (depth-test reject of overdrawn fragments).
- Per-entity 5m AABB frustum cull.
### What's new
- `TextureCache` uploads as 1-layer `Texture2DArray` instead of `Texture2D`. Generates 64-bit bindless handles at upload, makes them resident.
- New shader pair `mesh_modern.vert/.frag` modeled on WB's `StaticObjectModern` but adapted (see §6).
- Three new GPU buffers in the dispatcher:
  - `_instanceSsbo`: `std430` layout, `mat4[]`, all visible matrices.
  - `_batchSsbo`: `std430` layout, `BatchData[]`, one entry per group.
  - `_indirectBuffer`: `DrawElementsIndirectCommand[]`, one per group.
- Two diagnostic measurements in `[WB-DIAG]`: CPU stopwatch span around `Draw()`; GPU `GL_TIME_ELAPSED` query around the indirect dispatch.
### What gets deleted
- `WbDrawDispatcher.DrawGroup` (replaced by indirect).
- `WbDrawDispatcher.EnsureInstanceAttribs` (no more vertex attribs at locations 3-6).
- Per-blend-mode `glBlendFunc` switch in the translucent loop.
- `mesh_instanced.vert/.frag` (replaced by `mesh_modern.*`).
### What stays under the escape hatch
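Nothing, as of the N.5 ship amendment (Decision 5). The original plan left `InstancedMeshRenderer` untouched behind `ACDREAM_USE_WB_FOUNDATION=0` until N.6 retired it. Instead, `InstancedMeshRenderer`, `StaticMeshRenderer`, and `WbFoundationFlag` were deleted in the N.5 retirement commit, and the modern path is mandatory.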
---
## 4. Component changes
### 4.1 `TextureCache`
Texture upload path becomes Texture2DArray with depth=1:
```csharp
// `unsafe` is required for the `fixed` pointer below.
private unsafe uint UploadRgba8AsLayer1Array(DecodedTexture decoded)
{
    uint tex = _gl.GenTexture();
    _gl.BindTexture(TextureTarget.Texture2DArray, tex);
    // Pin the managed pixel array for the duration of the upload.
    fixed (byte* p = decoded.Rgba8)
        _gl.TexImage3D(
            TextureTarget.Texture2DArray, 0, InternalFormat.Rgba8,
            (uint)decoded.Width, (uint)decoded.Height, depth: 1,
            border: 0, PixelFormat.Rgba, PixelType.UnsignedByte, p);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat);
    _gl.BindTexture(TextureTarget.Texture2DArray, 0);
    return tex;
}
```
Bindless handle generation, eager + resident-on-upload, parallel cache:
```csharp
private readonly Dictionary<uint, ulong> _bindlessHandlesByGlName = new();

private ulong MakeResidentHandle(uint glTextureName)
{
    // Cache hit: the handle was created and made resident on first upload.
    if (_bindlessHandlesByGlName.TryGetValue(glTextureName, out var h))
        return h;
    h = _bindless.GetTextureHandleARB(glTextureName);
    _bindless.MakeTextureHandleResidentARB(h);
    _bindlessHandlesByGlName[glTextureName] = h;
    return h;
}
```
Three new methods returning `ulong` bindless handles, paralleling the existing `uint` GL-name methods:
```csharp
public ulong GetOrUploadBindless(uint surfaceId);
public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId);
public ulong GetOrUploadWithPaletteOverrideBindless(uint surfaceId, uint? overrideOrigTextureId, PaletteOverride paletteOverride, ulong precomputedPaletteHash);
```
Each delegates to its existing `uint` sibling to populate the underlying GL texture, then calls `MakeResidentHandle` and returns the 64-bit handle.
The `uint`-returning methods stay (used by `SkyRenderer`, `TerrainAtlas`, anything outside the WB modern path).
`Dispose` releases bindless handles BEFORE deleting their textures: iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`, then `glDeleteTextures` proceeds as today.
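The caching and disposal-order rules above can be exercised without a GL context. The following is a minimal sketch, assuming the driver calls are injected as delegates; `ResidentHandleRegistry` and its constructor parameters are hypothetical names for illustration, not part of `TextureCache`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: driver calls are injected so the ordering rules
// can be checked on the CPU. Not the actual TextureCache implementation.
public sealed class ResidentHandleRegistry : IDisposable
{
    private readonly Dictionary<uint, ulong> _handlesByGlName = new();
    private readonly Func<uint, ulong> _getHandle;   // stands in for glGetTextureHandleARB
    private readonly Action<ulong> _makeResident;    // stands in for glMakeTextureHandleResidentARB
    private readonly Action<ulong> _makeNonResident; // stands in for glMakeTextureHandleNonResidentARB
    private readonly Action<uint> _deleteTexture;    // stands in for glDeleteTextures

    public ResidentHandleRegistry(
        Func<uint, ulong> getHandle, Action<ulong> makeResident,
        Action<ulong> makeNonResident, Action<uint> deleteTexture)
    {
        _getHandle = getHandle;
        _makeResident = makeResident;
        _makeNonResident = makeNonResident;
        _deleteTexture = deleteTexture;
    }

    // Value to report in the diagnostic log (Decision 4).
    public int ResidentCount => _handlesByGlName.Count;

    // Creates the handle and makes it resident exactly once per GL texture name.
    public ulong GetOrMakeResident(uint glName)
    {
        if (_handlesByGlName.TryGetValue(glName, out ulong h))
            return h;
        h = _getHandle(glName);
        _makeResident(h);
        _handlesByGlName[glName] = h;
        return h;
    }

    // All handles go non-resident first; only then are textures deleted.
    public void Dispose()
    {
        foreach (ulong h in _handlesByGlName.Values)
            _makeNonResident(h);
        foreach (uint name in _handlesByGlName.Keys)
            _deleteTexture(name);
        _handlesByGlName.Clear();
    }
}
```

Because no code path returns a handle without first making it resident, the registry keeps the "no non-resident handle ever escapes" property described in §7.3.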
### 4.2 `WbDrawDispatcher`
Three new GPU buffers (replacing `_instanceVbo`):
```csharp
private uint _instanceSsbo; // binding=0, std430, mat4[]
private uint _batchSsbo; // binding=1, std430, BatchData[]
private uint _indirectBuffer; // GL_DRAW_INDIRECT_BUFFER, DEIC[]
```
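For the batch SSBO, the CPU-side element has to match the shader's `BatchData { uvec2; uint; uint }` under `std430`, which is 16 bytes per entry. A minimal sketch of such a mirror struct follows; `BatchDataCpu` is a hypothetical name, and the split into low/high words assumes the usual convention that `uvec2.x` carries the low 32 bits of the 64-bit handle.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical CPU mirror of the shader's BatchData (std430: 16 bytes, 8-byte aligned).
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct BatchDataCpu
{
    public uint HandleLo;     // uvec2.x: low 32 bits of the bindless handle
    public uint HandleHi;     // uvec2.y: high 32 bits of the bindless handle
    public uint TextureLayer; // always 0 in N.5
    public uint Flags;        // reserved

    public static BatchDataCpu From(ulong handle, uint layer) => new BatchDataCpu
    {
        HandleLo = (uint)(handle & 0xFFFFFFFFUL),
        HandleHi = (uint)(handle >> 32),
        TextureLayer = layer,
        Flags = 0,
    };

    // Reassembles the 64-bit handle, useful for debug assertions.
    public ulong Handle => ((ulong)HandleHi << 32) | HandleLo;
}
```

Splitting the handle explicitly keeps the layout independent of host endianness, at the cost of two shifts per group.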
`InstanceGroup` becomes:
```csharp
private sealed class InstanceGroup
{
    public uint Ibo;
    public uint FirstIndex;
    public int BaseVertex;
    public int IndexCount;
    public ulong BindlessTextureHandle; // 64-bit (was uint TextureHandle in N.4)
    public uint TextureLayer;           // always 0 in N.5 (per-instance composites are 1-layer arrays)
    public TranslucencyKind Translucency;
    public int FirstInstance;
    public int InstanceCount;
    public float SortDistance;
    public readonly List<Matrix4x4> Matrices = new();
}
```
`GroupKey` adds the layer:
```csharp
private readonly record struct GroupKey(
uint Ibo, uint FirstIndex, int BaseVertex, int IndexCount,
ulong BindlessTextureHandle, uint TextureLayer, TranslucencyKind Translucency);
```
Per-frame draw flow:
1. **Walk entities → build `_groups` dict** (unchanged from N.4).
2. **Lay matrices contiguously, split opaque/transparent, sort opaque** (unchanged).
3. **Build per-group BatchData and DEIC arrays.** One `BatchData` per group `(handle, layer, flags=0)`. One DEIC per group `(count = IndexCount, instanceCount = InstanceCount, firstIndex = FirstIndex, baseVertex = BaseVertex, baseInstance = FirstInstance)`. Indirect commands are laid out contiguously: opaque section first (sorted front-to-back), transparent section second. `_opaqueDrawCount` and `_transparentDrawCount` track section sizes; `_transparentByteOffset = _opaqueDrawCount * sizeof(DEIC)`.
4. **Three `glBufferData` uploads** to `_instanceSsbo`, `_batchSsbo`, `_indirectBuffer` (single buffer, both sections).
5. **Bind global VAO once** (preserved from N.4 — modern rendering shares one VAO).
6. **Bind SSBOs once** via `glBindBufferBase(SHADER_STORAGE_BUFFER, 0, _instanceSsbo)` and `... 1, _batchSsbo`.
7. **Opaque pass.** Set `uRenderPass = 0`. `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)0, drawcount=_opaqueDrawCount, stride=sizeof(DEIC))`.
8. **Transparent pass.** Set `uRenderPass = 1`. `glEnable(BLEND)` + `glBlendFunc(SrcAlpha, OneMinusSrcAlpha)` + `glDepthMask(false)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)_transparentByteOffset, drawcount=_transparentDrawCount, stride=sizeof(DEIC))`.
9. **Restore state.** `glDepthMask(true)` + `glDisable(BLEND)` + `glBindVertexArray(0)`.
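Step 3 of the flow above is pure CPU work and can be sketched in isolation. The following assumes a trimmed-down stand-in for `InstanceGroup` (`GroupSketch`, hypothetical) and shows one way to emit the opaque section sorted front-to-back, followed by the transparent section, along with the byte offset used by the second multi-draw call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Field order matches the GL DrawElementsIndirectCommand layout (5 x 32 bits = 20 bytes).
[StructLayout(LayoutKind.Sequential)]
public struct DrawElementsIndirectCommand
{
    public uint Count;
    public uint InstanceCount;
    public uint FirstIndex;
    public int BaseVertex;
    public uint BaseInstance;
}

// Hypothetical stand-in for InstanceGroup, reduced to what the builder needs.
public sealed record GroupSketch(
    uint IndexCount, uint InstanceCount, uint FirstIndex, int BaseVertex,
    uint FirstInstance, bool Transparent, float SortDistance);

public static class IndirectBuilderSketch
{
    public static (DrawElementsIndirectCommand[] Commands, int OpaqueDrawCount,
                   int TransparentDrawCount, int TransparentByteOffset)
        Build(IEnumerable<GroupSketch> groups)
    {
        // Empty groups produce no draw.
        var live = groups.Where(g => g.InstanceCount > 0).ToList();
        // Opaque: nearest first (OrderBy is a stable sort). Transparent: classification order.
        var opaque = live.Where(g => !g.Transparent).OrderBy(g => g.SortDistance).ToList();
        var transparent = live.Where(g => g.Transparent).ToList();

        var commands = opaque.Concat(transparent)
            .Select(g => new DrawElementsIndirectCommand
            {
                Count = g.IndexCount,
                InstanceCount = g.InstanceCount,
                FirstIndex = g.FirstIndex,
                BaseVertex = g.BaseVertex,
                BaseInstance = g.FirstInstance,
            })
            .ToArray();

        int stride = Marshal.SizeOf<DrawElementsIndirectCommand>();
        return (commands, opaque.Count, transparent.Count, opaque.Count * stride);
    }
}
```

Any per-draw array indexed by draw ID would need to be emitted in the same order as `Commands`.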
Diagnostic timing (under `ACDREAM_WB_DIAG=1`):
- CPU: `Stopwatch` started at the top of `Draw()`, stopped at the bottom. Median + 95th-percentile flushed in the 5-second `[WB-DIAG]` rollup.
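- GPU: `glGenQueries` creates two query objects (one for opaque, one for transparent). `glBeginQuery(TIME_ELAPSED)` / `glEndQuery` bracket each `glMultiDrawElementsIndirect`. The result is polled at the start of the next frame with `GL_QUERY_RESULT_NO_WAIT` (core in GL 4.4, or via `GL_ARB_query_buffer_object`; on a plain 4.3 context, check `GL_QUERY_RESULT_AVAILABLE` first). If the result is not ready, drop the sample and try again.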
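The median and 95th-percentile rollup can be sketched as a small accumulator. This is an illustration only: `TimingRollupSketch` is a hypothetical name, and the nearest-rank percentile definition is an assumption, since the spec does not say which definition `[WB-DIAG]` uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical accumulator for one timing channel (CPU or GPU), in microseconds.
public sealed class TimingRollupSketch
{
    private readonly List<double> _samplesUs = new();

    public int Count => _samplesUs.Count;

    public void Add(double microseconds) => _samplesUs.Add(microseconds);

    // Nearest-rank percentile: the ceil(q * n)-th smallest sample (1-based).
    public static double Percentile(IReadOnlyList<double> sorted, double q)
    {
        if (sorted.Count == 0)
            throw new InvalidOperationException("no samples");
        int rank = (int)Math.Ceiling(q * sorted.Count);
        return sorted[Math.Clamp(rank, 1, sorted.Count) - 1];
    }

    // Returns the rollup for the elapsed window and resets the accumulator.
    public (double Median, double P95) Flush()
    {
        var sorted = new List<double>(_samplesUs);
        sorted.Sort();
        _samplesUs.Clear();
        return (Percentile(sorted, 0.50), Percentile(sorted, 0.95));
    }
}
```

A caller would `Add` one sample per frame (skipping frames where the GPU query was not ready) and call `Flush` once per rollup window, guarding against an empty window.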
### 4.3 New shader files
`src/AcDream.App/Shaders/mesh_modern.vert`:
```glsl
#version 430 core
#extension GL_ARB_bindless_texture : require
#extension GL_ARB_shader_draw_parameters : require

layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;

struct InstanceData {
    mat4 transform;
    // Reserved for Phase B.4 follow-up (selection-blink retail-faithful highlight):
    // vec4 highlightColor; // RGBA; when alpha is non-zero, fragment shader mixes into output.
    // Add field here, increase stride to 80 bytes, and read at fragment via flat varying.
};

struct BatchData {
    uvec2 textureHandle; // bindless handle for sampler2DArray
    uint textureLayer;   // layer index (always 0 for per-instance composites)
    uint flags;          // reserved for future use
};

layout(std430, binding = 0) readonly buffer InstanceBuffer {
    InstanceData Instances[];
};
layout(std430, binding = 1) readonly buffer BatchBuffer {
    BatchData Batches[];
};
layout(std140, binding = 1) uniform LightingUbo {
    vec4 uAmbient;
    vec4 uSunDir;
    vec4 uSunColor;
    // matches existing acdream lighting UBO; do not change layout
};

uniform mat4 uViewProjection;
uniform int uRenderPass; // 0=opaque, 1=transparent (consumed in fragment shader)

out vec3 vNormal;
out vec2 vTexCoord;
out flat uvec2 vTextureHandle;
out flat uint vTextureLayer;

void main() {
    int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;
    mat4 model = Instances[instanceIndex].transform;
    vec4 worldPos = model * vec4(aPosition, 1.0);
    gl_Position = uViewProjection * worldPos;
    vNormal = normalize(mat3(model) * aNormal);
    vTexCoord = aTexCoord;
    BatchData b = Batches[gl_DrawIDARB];
    vTextureHandle = b.textureHandle;
    vTextureLayer = b.textureLayer;
}
```
`src/AcDream.App/Shaders/mesh_modern.frag`:
```glsl
#version 430 core
#extension GL_ARB_bindless_texture : require

in vec3 vNormal;
in vec2 vTexCoord;
in flat uvec2 vTextureHandle;
in flat uint vTextureLayer;

layout(std140, binding = 1) uniform LightingUbo {
    vec4 uAmbient;
    vec4 uSunDir;
    vec4 uSunColor;
};

uniform int uRenderPass;

out vec4 FragColor;

void main() {
    sampler2DArray tex = sampler2DArray(vTextureHandle);
    vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer)));
    if (uRenderPass == 0) {
        // Opaque pass: discard soft pixels (alpha cutout), write to depth
        if (color.a < 0.95) discard;
    } else {
        // Transparent pass: discard hard pixels (already drawn opaque), no depth write
        if (color.a >= 0.95) discard;
        if (color.a < 0.05) discard; // skip totally-empty fragments (perf for large transparent overdraw)
    }
    // Diffuse lighting (preserved from acdream's existing lighting model)
    vec3 N = normalize(vNormal);
    vec3 L = normalize(uSunDir.xyz);
    float diff = max(dot(N, L), 0.0);
    vec3 lit = uAmbient.rgb + uSunColor.rgb * diff;
    color.rgb *= clamp(lit, 0.0, 1.0);
    FragColor = color;
}
```
Differences from WB's `StaticObjectModern.*`:
- Drops `uActiveCells[]` cell-filtering (acdream culls cells on CPU).
- Drops `uDrawIDOffset` (acdream issues full passes, no pagination).
- Drops `uHighlightColor` (deferred to Phase B.4 follow-up; reserved as per-instance `highlightColor` field, not a global uniform).
- Adapts the lighting model to acdream's existing UBO at binding=1 instead of WB's `SceneData` UBO.
- Uses 1-layer `sampler2DArray` for ALL textures (WB uses multi-layer atlases — same shader works for both shapes).
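The per-fragment discard rule in `mesh_modern.frag` can be mirrored on the CPU to check its thresholds. The sketch below is illustrative (`AlphaPassSketch` is a hypothetical name); it reproduces the 0.95 and 0.05 cut-offs and shows that no alpha value survives both passes.

```csharp
using System;

// Hypothetical CPU mirror of the fragment shader's discard logic.
public static class AlphaPassSketch
{
    public const float OpaqueThreshold = 0.95f; // at or above: hard pixel
    public const float EmptyThreshold = 0.05f;  // below: effectively empty

    // renderPass: 0 = opaque, 1 = transparent (same meaning as uRenderPass).
    public static bool IsDiscarded(int renderPass, float alpha)
    {
        if (renderPass == 0)
            return alpha < OpaqueThreshold;
        return alpha >= OpaqueThreshold || alpha < EmptyThreshold;
    }

    // Number of passes (0 or 1) whose discard test keeps a fragment with this alpha.
    public static int PassesKeeping(float alpha) =>
        (IsDiscarded(0, alpha) ? 0 : 1) + (IsDiscarded(1, alpha) ? 0 : 1);
}
```

The rule only decides which pass keeps a fragment; the fragment is actually drawn only if its group is submitted to that pass.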
---
## 5. Per-frame data flow walk-through
A concrete trace. Visible work for frame N:
| Group | GfxObj | Surface | Translucency | Instances |
|---|---|---|---|---|
| 0 | oak tree | bark | Opaque | 12 |
| 1 | oak tree | leaves | AlphaBlend | 12 |
| 2 | drudge | skin (palette override) | Opaque | 1 |
| 3 | drudge | eyes | Opaque | 1 |
**Instance SSBO** (binding=0), 26 entries (each batch contributes its own copy of the entity matrix):
```
[0..11] = oak instance matrices (group 0 — bark)
[12..23] = oak instance matrices (group 1 — leaves)
[24] = drudge instance matrix (group 2 — skin)
[25] = drudge instance matrix (group 3 — eyes)
```
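**Batch SSBO** (binding=1), 4 entries indexed by `gl_DrawIDARB`, so they follow indirect-buffer draw order (opaque section first, then transparent) rather than the group numbering in the table above:
```
Batches[0] = (oak_bark_handle, layer=0, flags=0)
Batches[1] = (drudge_skin_handle_with_palette, layer=0, flags=0)
Batches[2] = (drudge_eyes_handle, layer=0, flags=0)
Batches[3] = (oak_leaves_handle, layer=0, flags=0)
```
`gl_DrawIDARB` restarts at 0 for every `glMultiDrawElementsIndirect` call, so the transparent pass must offset its batch lookup by `_opaqueDrawCount` (for example with a draw-ID offset uniform); otherwise the leaves draw would read `Batches[0]`.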
**Indirect buffer** (single buffer, two sections):
```
_indirectBuffer[0..2] = opaque section (3 entries, sorted front-to-back)
  [0] = (count=oakBarkIdx, instanceCount=12, firstIndex=oakBarkFI, baseVertex=oakBV, baseInstance=0)
  [1] = (count=drudgeSkinIdx, instanceCount=1, firstIndex=drudgeSkinFI, baseVertex=drudgeBV, baseInstance=24)
  [2] = (count=drudgeEyesIdx, instanceCount=1, firstIndex=drudgeEyesFI, baseVertex=drudgeBV, baseInstance=25)
_indirectBuffer[3] = transparent section (1 entry)
  [3] = (count=oakLeavesIdx, instanceCount=12, firstIndex=oakLeavesFI, baseVertex=oakBV, baseInstance=12)

sizeof(DEIC) = 5 * 4 bytes = 20
_opaqueDrawCount = 3; _transparentDrawCount = 1; _transparentByteOffset = 3 * sizeof(DEIC) = 60.
```
**Shader access pattern** (per vertex):
```glsl
int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; // unique per (group, instance) pair
mat4 model = Instances[instanceIndex].transform;
BatchData b = Batches[gl_DrawIDARB]; // shared across all verts in this draw
sampler2DArray tex = sampler2DArray(b.textureHandle);
vec4 color = texture(tex, vec3(aTexCoord, float(b.textureLayer)));
```
**Per-frame CPU GL calls** (entity rendering, total):
- 3× `glBufferData` (instance SSBO, batch SSBO, indirect buffer).
- 1× `glBindVertexArray(globalVAO)`.
- 2× `glBindBufferBase` (SSBOs at bindings 0 + 1).
- 1× `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`.
- 2× `glMultiDrawElementsIndirect` (one opaque, one transparent).
- ~5 state changes (blend, depth mask, render pass uniform).
Total: ~15-20 GL calls per frame for entity rendering, regardless of group count. N.4 baseline is "few hundred."
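The `baseInstance` values in the trace come from a running offset over the groups in layout order. A minimal sketch of that computation follows, assuming only per-group instance counts as input; `InstanceLayoutSketch` is a hypothetical helper, not dispatcher code.

```csharp
using System;
using System.Collections.Generic;

public static class InstanceLayoutSketch
{
    // Assigns each group its first slot in the instance SSBO and
    // returns the total number of matrices to upload.
    public static (int[] FirstInstance, int Total) Layout(IReadOnlyList<int> instanceCounts)
    {
        var first = new int[instanceCounts.Count];
        int cursor = 0;
        for (int i = 0; i < instanceCounts.Count; i++)
        {
            first[i] = cursor;           // group i starts where the previous one ended
            cursor += instanceCounts[i];
        }
        return (first, cursor);
    }
}
```

For the four groups in the trace (12, 12, 1, 1 instances) this yields first-instance offsets 0, 12, 24, 25 and 26 SSBO entries, matching the instance SSBO listing above.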
---
## 6. Translucent rendering detail
Per Decision 2: WB's two-pass alpha-test pattern.
**Group classification.** `ClassifyBatches` puts groups into one of two arrays:
- **Opaque indirect:** `TranslucencyKind.Opaque` and `TranslucencyKind.ClipMap`.
- **Transparent indirect:** `TranslucencyKind.AlphaBlend`, `Additive`, `InvAlpha` all merged. Per Decision 2, additive renders as alpha-blend; falsifiable at visual verification.
Opaque groups stay sorted front-to-back by `SortDistance` (preserved from N.4 — depth-test reject of overdrawn fragments is a meaningful win on dense scenes).
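The classification rule reduces to a small total mapping. The sketch below uses stand-in enums (`TranslucencyKindSketch`, `RenderPassSketch`, both hypothetical; the member order of the real `TranslucencyKind` is not specified here) to show the two-bucket split.

```csharp
using System;

// Stand-in for the real TranslucencyKind; member order is an assumption.
public enum TranslucencyKindSketch { Opaque, ClipMap, AlphaBlend, Additive, InvAlpha }

// Values match uRenderPass: 0 = opaque, 1 = transparent.
public enum RenderPassSketch { Opaque = 0, Transparent = 1 }

public static class PassClassifierSketch
{
    public static RenderPassSketch Classify(TranslucencyKindSketch kind) => kind switch
    {
        TranslucencyKindSketch.Opaque or TranslucencyKindSketch.ClipMap
            => RenderPassSketch.Opaque,
        // Per Decision 2, Additive and InvAlpha are merged into the alpha-blend pass.
        TranslucencyKindSketch.AlphaBlend or TranslucencyKindSketch.Additive
            or TranslucencyKindSketch.InvAlpha
            => RenderPassSketch.Transparent,
        _ => throw new ArgumentOutOfRangeException(nameof(kind)),
    };
}
```

If the additive sub-pass from the fallback plan is ever needed, a third enum value and one extra switch arm would be the natural place to split `Additive` out.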
**Pass GL state:**
```csharp
// Opaque pass
_gl.Disable(EnableCap.Blend);
_gl.DepthMask(true);
_gl.Enable(EnableCap.CullFace);
_gl.CullFace(TriangleFace.Back);
_gl.FrontFace(FrontFaceDirection.Ccw);
_shader.SetInt("uRenderPass", 0);
_gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
    indirect: (void*)0, drawcount: _opaqueDrawCount, stride: (uint)sizeof(DEIC));

// Transparent pass
_gl.Enable(EnableCap.Blend);
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
_gl.DepthMask(false);
_shader.SetInt("uRenderPass", 1);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
    indirect: (void*)_transparentByteOffset, drawcount: _transparentDrawCount, stride: (uint)sizeof(DEIC));

// Cleanup
_gl.DepthMask(true);
_gl.Disable(EnableCap.Blend);
_gl.BindVertexArray(0);
```
**Visual verification gate (additive fallback plan).** During Week 2-3 visual verification, look at:
- Holtburg courtyard, dungeon entrance — confirm scenery + characters identical.
- Foundry interior — magic-themed content with potentially additive-flagged surfaces.
- Any glowing weapon decals, magical aura effects, or self-luminous textures observed.
If a visible regression appears (faded glow, missing additive bloom): amend spec to add a third indirect call within the transparent pass with `glBlendFunc(SrcAlpha, One)`. Group classification splits Additive into its own bucket. ~30-min change.
---
## 7. Error handling and fallback
### 7.1 GPU capability detection
WB's `OpenGLGraphicsDevice` already detects:
- `HasOpenGL43` (required for SSBOs, multi-draw indirect, `gl_BaseInstanceARB`).
- `HasBindless` (required for bindless texture handles).
As originally designed, `WbDrawDispatcher` was only constructed when `WbFoundationFlag.Enabled` was true, which gated on `_useModernRendering = HasOpenGL43 && HasBindless`. Per the N.5 ship amendment (Decision 5), `WbFoundationFlag` is deleted and capability detection is unconditional: a missing `GL_ARB_bindless_texture` throws `NotSupportedException` at startup with a clear error message.
**Additional check:** `GL_ARB_shader_draw_parameters` (for `gl_BaseInstanceARB`, `gl_DrawIDARB`). Standard on GL 4.6, available as an extension on 4.3+. It is part of N.5's capability check; if it is missing, startup throws the same `NotSupportedException`. There is no fallback to `InstancedMeshRenderer`.
### 7.2 Shader compile failure
If `mesh_modern.vert/.frag` fails to compile (driver bug, GLSL version mismatch, extension issue): log the GLSL info log + GPU vendor/renderer string ONCE. The original design flipped `WbFoundationFlag.Enabled = false` for the session and fell back to `InstancedMeshRenderer`; both were deleted in the N.5 ship amendment (Decision 5), and the `mesh_modern` shader load is mandatory after init, so there is no legacy path to fall back to.
### 7.3 Non-resident handle (the bindless foot-gun)
Sampling a non-resident handle causes undefined behavior (driver-dependent: black texture, GPU fault, device-lost).
Mitigation in code: `TextureCache.MakeResidentHandle` is the only API that produces a handle, and it makes the handle resident in the same call. There is no API surface that produces a non-resident handle. Defense-in-depth: dispatcher asserts `BindlessTextureHandle != 0` before queuing a draw (zero handles get filtered out, same as zero `surfaceId` does today).
### 7.4 Indirect command corruption
`count`, `firstIndex`, and `baseVertex` come from WB's `ObjectRenderBatch` (WB-internal, never user input). `instanceCount` is `grp.Matrices.Count` and `baseInstance` is `grp.FirstInstance`; we control both, and the latter is computed cumulatively. The bug class is therefore "WB-internal corruption plus our cumulative-offset bug", the same surface area N.4 already trusts for `BaseInstance`. Add a debug-build assertion: `baseInstance` values must be strictly increasing in matrix-layout order. They are not monotonic in indirect-buffer order once the opaque section is sorted; the §5 trace carries 0, 24, 25, 12.
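The debug-build assertion can be sketched as a pure function. This version checks a slightly stronger property than "strictly increasing": that the groups' instance ranges, taken in matrix-layout order, tile the instance buffer with no gaps or overlaps. `BaseInstanceCheckSketch` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public static class BaseInstanceCheckSketch
{
    // groups: (FirstInstance, InstanceCount) pairs in matrix-layout order.
    // Returns true when the ranges are back-to-back and cover exactly totalInstances slots.
    public static bool IsContiguous(
        IReadOnlyList<(int FirstInstance, int InstanceCount)> groups, int totalInstances)
    {
        int expected = 0;
        foreach (var (first, count) in groups)
        {
            // Empty groups should have been skipped before this point.
            if (count <= 0 || first != expected)
                return false;
            expected += count;
        }
        return expected == totalInstances;
    }
}
```

In a debug build this could be wrapped in `Debug.Assert` right after the matrix layout step, before any sorting reorders the groups.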
### 7.5 Disposal order
Bindless handles must be made non-resident before their underlying textures are deleted (driver UB otherwise). `TextureCache.Dispose` owns this ordering:
1. Iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`.
2. Optionally call a batch helper such as `_glExtensions.MakeAllNonResidentARB` if the wrapper provides one (`GL_ARB_bindless_texture` itself defines only the per-handle call, so step 1 is always required).
3. Then `glDeleteTextures` proceeds as today.
`WbDrawDispatcher.Dispose` cleans up only its own buffers (`_instanceSsbo`, `_batchSsbo`, `_indirectBuffer`) via `glDeleteBuffers`.
### 7.6 Persistent first-failure diagnostic
If shader compile fails OR an extension check fails OR `glMultiDrawElementsIndirect` returns `GL_INVALID_OPERATION` on first frame: log ONCE with GPU vendor/renderer string + GLSL info log. Don't spam. User pastes the line into a bug report; we know exactly where to look.
---
## 8. Testing and acceptance
### 8.1 Unit / conformance tests
- **`TextureCacheBindlessTests`** — for each `Bindless`-suffixed `GetOrUpload*`: returns non-zero `ulong`, returns same handle for same key (cache hit), distinct keys yield distinct handles, returned handle is resident per GL state query.
- **`WbDrawDispatcherIndirectBuilderTests`** — pure CPU test: given a fixture of `(entity, mesh, batch)` tuples, verify the indirect buffer layout: `count` / `firstIndex` / `baseVertex` / `baseInstance` per group, opaque section sorted front-to-back, transparent section in classification order (no sort — back-to-front sort can be added in a follow-up if measured useful).
- **`WbDrawDispatcherTranslucencyTests`** — verify groups land in correct indirect buffer (opaque vs transparent) per `TranslucencyKind`. `Additive`/`InvAlpha` go to transparent. `ClipMap` goes to opaque. Empty groups skipped.
- **Existing N.4 tests stay green.** All 60 tests captured by `FullyQualifiedName~Wb|MatrixComposition` filter remain at 60/0.
### 8.2 Visual verification
Same gate as N.4 used. Live ACE + retail dat, in-world testing.
- **Holtburg courtyard** — characters + scenery + buildings render identically to N.4. No missing entities, no z-fighting, no exploded parts.
- **Foundry interior** — dense static-object scene, stress-tests indirect call count and translucency classification.
- **Indoor → outdoor cell transition** — confirms cell visibility filtering still works (we cull on CPU; dispatcher should never see invisible-cell entities).
- **Drudge / character close-up** — confirms Issue #47 close-detail mesh preservation.
- **Magic content (additive fallback check)** — Foundry runes, glowing weapons if observable, boss models with luminous decals. Trigger spec amendment if regression spotted.
User-confirms each. These are visual identity checks against the running N.4 behavior (use `git stash` of N.5 changes + relaunch as the comparison baseline).
### 8.3 Perf measurement (the win gate)
`[WB-DIAG]` augmented:
```
[WB-DIAG] entSeen=N entDrawn=M ... drawsIssued=K groups=G (existing)
[WB-DIAG] cpu_us=Xmedian/Y95p gpu_us=Zmedian/W95p (new)
```
Capture before/after numbers in fixed scenes/cameras:
| Scene | Camera position | Metric |
|---|---|---|
| Holtburg courtyard | 30m elevated, looking SW | `cpu`, `gpu`, `drawsIssued` |
| Foundry interior | character spawn, default heading | `cpu`, `gpu`, `drawsIssued` |
| Open landscape | terrain wander, no entities | `cpu`, `gpu`, `drawsIssued` (sanity) |
**Acceptance gates** (paste into SHIP commit message):
- Visual identity to N.4 — confirmed via §8.2.
- CPU dispatcher time ≤ 70% of N.4 in Holtburg courtyard (target: ≥30% reduction).
- GPU rendering time within ±10% of N.4 (sanity: no regression).
- `drawsIssued ≤ 5 per pass` (down from "few hundred per pass").
- All tests green — 60+ Wb tests + new bindless/indirect tests.
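- ~~`ACDREAM_USE_WB_FOUNDATION=0` still works: `InstancedMeshRenderer` fallback runs and renders correctly.~~ **Retired in the N.5 ship amendment** (Decision 5): the legacy renderer path and the env var were deleted, so this gate no longer applies.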
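The numeric gates above can be written down as predicates so the SHIP check is mechanical. This is a sketch under stated assumptions: `SceneSampleSketch` and `AcceptanceGateSketch` are hypothetical names, and the inputs are taken to be the median values from the `[WB-DIAG]` rollup for one fixed scene.

```csharp
using System;

// Hypothetical container for one scene's measurements (median values assumed).
public readonly record struct SceneSampleSketch(double CpuUs, double GpuUs, int DrawsIssuedPerPass);

public static class AcceptanceGateSketch
{
    // CPU dispatcher time at most 70% of the N.4 baseline (a reduction of at least 30%).
    public static bool CpuGate(SceneSampleSketch n4, SceneSampleSketch n5) =>
        n5.CpuUs <= 0.70 * n4.CpuUs;

    // GPU time within +/-10% of the N.4 baseline.
    public static bool GpuGate(SceneSampleSketch n4, SceneSampleSketch n5) =>
        Math.Abs(n5.GpuUs - n4.GpuUs) <= 0.10 * n4.GpuUs;

    // No more than five draws issued per pass.
    public static bool DrawGate(SceneSampleSketch n5) =>
        n5.DrawsIssuedPerPass <= 5;

    public static bool Passes(SceneSampleSketch n4, SceneSampleSketch n5) =>
        CpuGate(n4, n5) && GpuGate(n4, n5) && DrawGate(n5);
}
```

The visual-identity and test-suite gates are not numeric and stay manual.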
### 8.4 Long-session sanity check
Hour-long session with `ACDREAM_WB_DIAG=1`. Watch resident-handle count grow. Expected: bounded plateau under 5K once content set is fully traversed. If unbounded growth, residency policy revisit required in N.6.
---
## 9. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Driver bug in bindless residency | Low (mature in 2025+ drivers) | Crash / black textures | One-time logging on first failure; no legacy fallback since the N.5 ship amendment (Decision 5), so treat as a bug report |
| Driver bug in `glMultiDrawElementsIndirect` | Low | GL_INVALID_OPERATION | Capability check + first-failure logging; no fallback (Decision 5) |
| Resident handle count exceeds driver limit in long session | Low (acdream content is bounded) | Cumulative GPU memory pressure → eventual eviction surprises | `[WB-DIAG]` resident-count log; revisit eviction in N.6 if it grows unbounded |
| Shader compile fails on weird GPU | Medium-low | First-launch failure | Compile-error catch + one-time GLSL info log; no `InstancedMeshRenderer` fallback (deleted per Decision 5) |
| Additive fidelity regression on rare GfxObj surfaces | Medium | Subtle visual difference | Visual verification at magic-themed content; spec amendment for additive sub-pass if found |
| `gl_BaseInstanceARB` fields not advancing per-instance attribs we still use | Low (we drop attribs entirely) | Wrong matrices | All instance data via SSBO; no vertex attrib at locations 3-6 to misalign |
| SSBO indexing GPU cost worse than uniform-array | Low (well-optimized in modern drivers) | Possible GPU time regression | GL timer queries detect; if observed, fall back to uniform array of bounded size |
| Persistent-mapped buffer foot-guns (chosen NOT to use in N.5) | n/a | n/a | Decision 7 defers to N.6 |
| Per-instance highlight (selection blink) feature creep | Low | Scope grows | Decision 8 defers; field reserved in design doc |
---
## 10. Out of scope (explicitly)
The following are NOT N.5 work. They become possible follow-ons.
- **WB's `TextureAtlasManager` adoption for atlas tier.** N.5 keeps acdream's `TextureCache` as the texture owner for everything. Atlas adoption is N.6+ if memory pressure shows up.
- **Persistent-mapped buffer ring with sync fences.** Decision 7. N.6 candidate if profiling shows residual `glBufferData` cost.
- **GPU-side culling (compute pre-pass).** Future phase.
- **Texture array repacking for multi-layer per-instance composites.** Future, if many palette-overrides actually share dimensions and could be packed.
- **Selection-blink highlight color.** Decision 8. Phase B.4 follow-up. Field reserved in `InstanceData` design (extend stride to 80 bytes when implementing).
- ~~**Deletion of legacy `InstancedMeshRenderer`.** N.6.~~ **Done in N.5 ship amendment:** `InstancedMeshRenderer`, `StaticMeshRenderer`, and `WbFoundationFlag` were deleted in the retirement commit.
- **Terrain wiring through WB.** Future.
---
## 11. Open questions
None outstanding. All 8 brainstorm questions resolved + 1 clarification on highlight semantics. Ready for plan.
---
*End of design.*