Final cross-cutting review of N.5 found that Task 15's deletion of mesh_instanced.vert/.frag left InstancedMeshRenderer orphaned: ACDREAM_USE_WB_FOUNDATION=0 silently rendered terrain+sky only, with no entities. The SHIP commit's "[x] ACDREAM_USE_WB_FOUNDATION=0 still works" claim was inaccurate.

Resolution: formal retirement of the legacy renderer path within N.5 instead of deferring to N.6.

Deleted:
- src/AcDream.App/Rendering/InstancedMeshRenderer.cs
- src/AcDream.App/Rendering/StaticMeshRenderer.cs
- src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs

GameWindow simplified: capability detection is unconditional, and missing bindless throws NotSupportedException with a clear message at startup. WbDrawDispatcher + mesh_modern shader load are mandatory after init. No escape hatch.

GpuWorldState simplified: WbFoundationFlag.IsEnabled guards on AddLandblock/RemoveLandblock removed; adapter calls are unconditional when the adapter is non-null.

PendingSpawnIntegrationTests updated: WbFoundationFlag.ForTestsOnly_ForceEnable static ctor removed (flag is gone; adapter calls are unconditional).

The ApplyLoadedTerrain physics-data loop was also simplified: the EnsureUploaded sub-loop that fed InstancedMeshRenderer is gone; _pendingCellMeshes is now explicitly cleared to prevent unbounded accumulation (the worker thread still populates it, but WB handles EnvCell geometry through its own pipeline).

Spec §2 Decision 5 + §10 Out-of-Scope updated. Plan ship-amendment section added. Roadmap updated (N.5 ships with retirement; N.6 scope narrowed to perf-only). CLAUDE.md "WB integration cribs" updated. Perf baseline doc updated. WbDrawDispatcher class summary docstring corrected to describe the as-shipped SSBO + multi-draw-indirect path. ISSUES.md #51 updated (terrain not in N.5 scope; deferred to N.7).

Bindless support is now a hard requirement. Modern desktop GPUs universally expose GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters; if a user hits the NotSupportedException, that's a real bug report worth investigating, not a silent fallback.

Build: 0 errors, 0 warnings. Tests: 71/71 (Wb+MatrixComposition+TextureCacheBindless filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase N.5 — Modern Rendering Path — Design Spec
Status: Draft (brainstormed 2026-05-08, not yet implemented).
Author: acdream lead engineer + Claude.
Builds on: Phase N.4 (WbDrawDispatcher, shipped 2026-05-08).
Predecessor docs:
- `docs/research/2026-05-08-phase-n5-handoff.md` (cold-start briefing).
- `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md` (N.4 plan; Adjustments 7-10 are required reading).
- `docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md` (N.4 spec).
1. Problem statement
N.4 collapsed entity rendering from O(entities × batches) per-draw GL calls to O(unique GfxObj × surface × translucency) grouped instanced draws. The remaining hot path still does, per group:
glActiveTexture(0)
glBindTexture(2D, texHandle)
glBindBuffer(EBO, batchIbo)
glDrawElementsInstancedBaseVertexBaseInstance(...)
Across a typical Holtburg-courtyard scene that's still ~100-300 GL calls per frame for entities. Modern GPUs and our drivers (GL 4.3 + bindless, gated by WB's _useModernRendering) support patterns that eliminate ALL of those per-group calls:
- Bindless textures (`GL_ARB_bindless_texture`): texture handles are 64-bit tokens that don't require `glBindTexture` to use; the shader samples from a handle read out of buffer data.
- Multi-draw indirect (`glMultiDrawElementsIndirect`): one GL call dispatches N draws from a `DrawElementsIndirectCommand` buffer; the driver issues all of them with no CPU-side per-draw work.
N.5 lifts WbDrawDispatcher onto these primitives. Target: ≥30% reduction in CPU dispatcher time, draw call count down to ~5/frame, no visual regression vs N.4.
2. Decisions log
This section records the brainstorm outcomes that the rest of the doc relies on.
| # | Decision | Choice | Reason |
|---|---|---|---|
| 1 | Texture sampler model | `sampler2DArray` for ALL textures (1-layer wrapping for per-instance composites) | Matches WB's modern shader exactly; future-proofs for atlas adoption in N.6+; avoids two shader files. ~50 lines of `TextureCache` change. |
| 2 | Translucent rendering | WB's two-pass alpha-test (opaque pass discards α<0.95, transparent pass discards α≥0.95) | Single blend mode per pass enables one indirect call per pass. Loses native Additive blend on GfxObj surfaces; sky + particles have own renderers and aren't affected. Falsifiable at visual verification: if we see a regression, add an additive sub-pass (~30-min fix). |
| 3 | Per-instance + per-draw data delivery | All-SSBO: `Instances[]` at binding=0 (mat4 per instance), `Batches[]` at binding=1 (texture handle + layer + flags per group) | Matches WB's modern shader. SSBOs avoid the 16-attrib stride limit, scale to large instance counts, give clean per-draw indexing via `gl_DrawIDARB`. |
| 4 | Bindless handle residency | Resident on upload, never release | acdream's content set is bounded (~1-5K unique textures per session). Handles persist for process lifetime; no eviction code in N.5. Diagnostic logging of handle count under `ACDREAM_WB_DIAG=1` to spot growth. |
| 5 | Escape hatch | Modern path mandatory (N.5 ship amendment). `WbFoundationFlag` and the `ACDREAM_USE_WB_FOUNDATION` env var have been deleted. Missing `GL_ARB_bindless_texture` or `GL_ARB_shader_draw_parameters` throws `NotSupportedException` at startup with a clear error message. No fallback. | Escape hatch was never exercised after N.4 ship. Legacy `InstancedMeshRenderer` + `StaticMeshRenderer` deleted in the N.5 retirement commit. N.6 scope narrowed accordingly. |
| 6 | Perf measurement | CPU stopwatch + GL timer queries logged via `[WB-DIAG]` | Captures both CPU dispatcher time and GPU rendering time. Acceptance gate compares before/after numbers in fixed Holtburg/Foundry scenes. |
| 7 | Persistent-mapped buffers | Defer to N.6 | Bindless+indirect win is 70-80% of achievable savings. Persistent-mapped + ring + sync is the last 5-10% with non-trivial sync-fence complexity; not worth the risk in N.5's 2-3 week budget. Add post-N.5 if profiling shows residual `glBufferData` cost. |
| 8 | Per-instance highlight (selection blink) | Defer to a Phase B.4 follow-up | Retail pulses click targets as visual confirmation; the right mechanism is per-instance highlight color (NOT WB's global `uHighlightColor`, which would tint everything in our single-indirect-call design). Field is reserved in design (extend `InstanceData` to include `vec4 highlightColor`); N.5 ships without the field, future phase plumbs it without shader rewrite. |
3. Architecture overview
What changes
WbDrawDispatcher.Draw swaps its inner loop. Phases 1-3 (entity walk, group bucketing, matrix layout) stay intact. Phases 5-6 (per-group GL calls) are replaced by a single glMultiDrawElementsIndirect per pass, fed by SSBO-resident per-instance and per-draw data.
What's preserved from N.4
- Group bucketing pipeline (entity AABB cull, palette hash memo, group key dictionary).
- `AcSurfaceMetadataTable` for translucency classification.
- `EntitySpawnAdapter` / `LandblockSpawnAdapter` (mesh lifecycle bridge).
- `WbMeshAdapter` (the seam over WB's `ObjectMeshManager`).
- Front-to-back sort of opaque groups (depth-test reject of overdrawn fragments).
- Per-entity 5m AABB frustum cull.
What's new
- `TextureCache` uploads as 1-layer `Texture2DArray` instead of `Texture2D`. Generates 64-bit bindless handles at upload, makes them resident.
- New shader pair `mesh_modern.vert/.frag` modeled on WB's `StaticObjectModern` but adapted (see §4.3).
- Three new GPU buffers in the dispatcher:
  - `_instanceSsbo`: `std430` layout, `mat4[]`, all visible matrices.
  - `_batchSsbo`: `std430` layout, `BatchData[]`, one entry per group.
  - `_indirectBuffer`: `DrawElementsIndirectCommand[]`, one per group.
- Two diagnostic measurements in `[WB-DIAG]`: CPU stopwatch span around `Draw()`; GPU `GL_TIME_ELAPSED` query around the indirect dispatch.
What gets deleted
- `WbDrawDispatcher.DrawGroup` (replaced by indirect).
- `WbDrawDispatcher.EnsureInstanceAttribs` (no more vertex attribs at locations 3-6).
- Per-blend-mode `glBlendFunc` switch in the translucent loop.
- `mesh_instanced.vert/.frag` (replaced by `mesh_modern.*`).
What stays under the escape hatch
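Nothing, as shipped. The original draft left `InstancedMeshRenderer` untouched, kept `ACDREAM_USE_WB_FOUNDATION=0` routing to it, and deferred its retirement to N.6. The N.5 ship amendment (Decision 5) replaced that plan: `InstancedMeshRenderer`, `StaticMeshRenderer`, and `WbFoundationFlag` were deleted, and the env var no longer exists.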
4. Component changes
4.1 TextureCache
Texture upload path becomes Texture2DArray with depth=1:
private unsafe uint UploadRgba8AsLayer1Array(DecodedTexture decoded) // `fixed` + byte* below need an unsafe context
{
uint tex = _gl.GenTexture();
_gl.BindTexture(TextureTarget.Texture2DArray, tex);
fixed (byte* p = decoded.Rgba8)
_gl.TexImage3D(
TextureTarget.Texture2DArray, 0, InternalFormat.Rgba8,
(uint)decoded.Width, (uint)decoded.Height, depth: 1,
border: 0, PixelFormat.Rgba, PixelType.UnsignedByte, p);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat);
_gl.BindTexture(TextureTarget.Texture2DArray, 0);
return tex;
}
Bindless handle generation, eager + resident-on-upload, parallel cache:
private readonly Dictionary<uint, ulong> _bindlessHandlesByGlName = new();
private ulong MakeResidentHandle(uint glTextureName)
{
if (_bindlessHandlesByGlName.TryGetValue(glTextureName, out var h))
return h;
h = _bindless.GetTextureHandleARB(glTextureName);
_bindless.MakeTextureHandleResidentARB(h);
_bindlessHandlesByGlName[glTextureName] = h;
return h;
}
Three new methods returning ulong bindless handles, paralleling the existing uint GL-name methods:
public ulong GetOrUploadBindless(uint surfaceId);
public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId);
public ulong GetOrUploadWithPaletteOverrideBindless(uint surfaceId, uint? overrideOrigTextureId, PaletteOverride paletteOverride, ulong precomputedPaletteHash);
Each delegates to its existing uint sibling to populate the underlying GL texture, then calls MakeResidentHandle and returns the 64-bit handle.
The uint-returning methods stay (used by SkyRenderer, TerrainAtlas, anything outside the WB modern path).
Dispose releases bindless handles BEFORE deleting their textures: iterate _bindlessHandlesByGlName.Values, call glMakeTextureHandleNonResidentARB(handle), then glDeleteTextures proceeds as today.
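The caching and teardown ordering described above can be exercised without a GL context. The following is a minimal sketch, assuming a hypothetical `IBindlessBackend` seam in place of the real bindless extension wrapper; `HandleCacheSketch` and `RecordingBackend` are illustrative names, not acdream types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical seam over the GL calls; the real code talks to the driver.
public interface IBindlessBackend
{
    ulong GetTextureHandle(uint glName);
    void MakeResident(ulong handle);
    void MakeNonResident(ulong handle);
    void DeleteTexture(uint glName);
}

public sealed class HandleCacheSketch : IDisposable
{
    private readonly IBindlessBackend _backend;
    private readonly Dictionary<uint, ulong> _handlesByGlName = new();

    public HandleCacheSketch(IBindlessBackend backend) => _backend = backend;

    // Value a diagnostic log line could report to spot unbounded growth.
    public int ResidentCount => _handlesByGlName.Count;

    // Idempotent: the first call creates the handle and makes it resident;
    // later calls for the same GL name return the cached handle.
    public ulong MakeResidentHandle(uint glName)
    {
        if (_handlesByGlName.TryGetValue(glName, out ulong handle))
            return handle;
        handle = _backend.GetTextureHandle(glName);
        _backend.MakeResident(handle);
        _handlesByGlName[glName] = handle;
        return handle;
    }

    // Every handle goes non-resident before any texture is deleted.
    public void Dispose()
    {
        foreach (ulong handle in _handlesByGlName.Values)
            _backend.MakeNonResident(handle);
        foreach (uint glName in _handlesByGlName.Keys)
            _backend.DeleteTexture(glName);
        _handlesByGlName.Clear();
    }
}

// Test double that records call order instead of touching a GPU.
public sealed class RecordingBackend : IBindlessBackend
{
    public readonly List<string> Calls = new();
    public ulong GetTextureHandle(uint glName) { Calls.Add($"get {glName}"); return 0x1000UL + glName; }
    public void MakeResident(ulong handle) => Calls.Add($"resident {handle}");
    public void MakeNonResident(ulong handle) => Calls.Add($"nonresident {handle}");
    public void DeleteTexture(uint glName) => Calls.Add($"delete {glName}");
}
```

Keeping the driver calls behind a small interface like this is one way to unit-test the cache-hit and disposal-order properties on a machine without bindless support.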
4.2 WbDrawDispatcher
Three new GPU buffers (replacing _instanceVbo):
private uint _instanceSsbo; // binding=0, std430, mat4[]
private uint _batchSsbo; // binding=1, std430, BatchData[]
private uint _indirectBuffer; // GL_DRAW_INDIRECT_BUFFER, DEIC[]
InstanceGroup becomes:
private sealed class InstanceGroup
{
public uint Ibo;
public uint FirstIndex;
public int BaseVertex;
public int IndexCount;
public ulong BindlessTextureHandle; // 64-bit (was uint TextureHandle in N.4)
public uint TextureLayer; // always 0 in N.5 (per-instance composites are 1-layer arrays)
public TranslucencyKind Translucency;
public int FirstInstance;
public int InstanceCount;
public float SortDistance;
public readonly List<Matrix4x4> Matrices = new();
}
GroupKey adds the layer:
private readonly record struct GroupKey(
uint Ibo, uint FirstIndex, int BaseVertex, int IndexCount,
ulong BindlessTextureHandle, uint TextureLayer, TranslucencyKind Translucency);
Per-frame draw flow:
- Walk entities → build `_groups` dict (unchanged from N.4).
- Lay matrices contiguously, split opaque/transparent, sort opaque (unchanged).
- Build per-group `BatchData` and DEIC (`DrawElementsIndirectCommand`) arrays. One `BatchData` per group `(handle, layer, flags=0)`. One DEIC per group `(count = IndexCount, instanceCount = InstanceCount, firstIndex = FirstIndex, baseVertex = BaseVertex, baseInstance = FirstInstance)`. Indirect commands are laid out contiguously: opaque section first (sorted front-to-back), transparent section second. `_opaqueDrawCount` and `_transparentDrawCount` track section sizes; `_transparentByteOffset = _opaqueDrawCount * sizeof(DEIC)`.
- Three `glBufferData` uploads to `_instanceSsbo`, `_batchSsbo`, `_indirectBuffer` (single buffer, both sections).
- Bind global VAO once (preserved from N.4; modern rendering shares one VAO).
- Bind SSBOs once via `glBindBufferBase(SHADER_STORAGE_BUFFER, 0, _instanceSsbo)` and `... 1, _batchSsbo`.
- Opaque pass. Set `uRenderPass = 0`. `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)0, drawcount=_opaqueDrawCount, stride=sizeof(DEIC))`.
- Transparent pass. Set `uRenderPass = 1`. `glEnable(BLEND)` + `glBlendFunc(SrcAlpha, OneMinusSrcAlpha)` + `glDepthMask(false)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)_transparentByteOffset, drawcount=_transparentDrawCount, stride=sizeof(DEIC))`.
- Restore state. `glDepthMask(true)` + `glDisable(BLEND)` + `glBindVertexArray(0)`.
Diagnostic timing (under ACDREAM_WB_DIAG=1):
- CPU: `Stopwatch` started at the top of `Draw()`, stopped at the bottom. Median + 95th-percentile flushed in the 5-second `[WB-DIAG]` rollup.
- GPU: `glGenQueries` two query objects (one for opaque, one for transparent). `glBeginQuery(TIME_ELAPSED) / glEndQuery` around each `glMultiDrawElementsIndirect`. Result polled with `GL_QUERY_RESULT_NO_WAIT` on the next frame's start; if not ready, drop the sample and try again. `GL_QUERY_RESULT_NO_WAIT` is core only from GL 4.4 (`GL_ARB_query_buffer_object`); on a plain 4.3 context, check `GL_QUERY_RESULT_AVAILABLE` before reading `GL_QUERY_RESULT`.
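The median and 95th-percentile rollup can be kept very small. Below is a minimal sketch, assuming a nearest-rank percentile definition and a hypothetical `TimingRollupSketch` class; the spec does not say which percentile method the real `[WB-DIAG]` rollup uses.

```csharp
using System;
using System.Collections.Generic;

public sealed class TimingRollupSketch
{
    private readonly List<double> _samplesUs = new();

    // Record one per-frame sample, in microseconds.
    public void Add(double microseconds) => _samplesUs.Add(microseconds);

    // Returns (median, p95) for the samples gathered since the last flush,
    // then clears the window. An empty window reports zeros.
    public (double MedianUs, double P95Us) Flush()
    {
        if (_samplesUs.Count == 0) return (0.0, 0.0);
        _samplesUs.Sort();
        var result = (NearestRank(50), NearestRank(95));
        _samplesUs.Clear();
        return result;
    }

    // Nearest-rank percentile over the sorted samples:
    // rank = ceil(n * percent / 100), 1-based, done in integer arithmetic.
    private double NearestRank(int percent)
    {
        int rank = (_samplesUs.Count * percent + 99) / 100;
        return _samplesUs[Math.Max(rank, 1) - 1];
    }
}
```

A caller would feed it once per frame, for example with `sw.Elapsed.TotalMilliseconds * 1000.0` from a `Stopwatch` wrapped around `Draw()`, and call `Flush()` every five seconds. The same class works for GPU samples; frames whose query result was not ready simply contribute no sample.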
4.3 New shader files
src/AcDream.App/Shaders/mesh_modern.vert:
#version 430 core
#extension GL_ARB_bindless_texture : require
#extension GL_ARB_shader_draw_parameters : require
layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;
struct InstanceData {
mat4 transform;
// Reserved for Phase B.4 follow-up (selection-blink retail-faithful highlight):
// vec4 highlightColor; // RGBA — when non-zero alpha, fragment shader mixes into output.
// Add field here, increase stride to 80 bytes, and read at fragment via flat varying.
};
struct BatchData {
uvec2 textureHandle; // bindless handle for sampler2DArray
uint textureLayer; // layer index (always 0 for per-instance composites)
uint flags; // reserved for future use
};
layout(std430, binding = 0) readonly buffer InstanceBuffer {
InstanceData Instances[];
};
layout(std430, binding = 1) readonly buffer BatchBuffer {
BatchData Batches[];
};
layout(std140, binding = 1) uniform LightingUbo {
vec4 uAmbient;
vec4 uSunDir;
vec4 uSunColor;
// matches existing acdream lighting UBO; do not change layout
};
uniform mat4 uViewProjection;
uniform int uRenderPass; // 0=opaque, 1=transparent (consumed in fragment shader)
out vec3 vNormal;
out vec2 vTexCoord;
out flat uvec2 vTextureHandle;
out flat uint vTextureLayer;
void main() {
int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;
mat4 model = Instances[instanceIndex].transform;
vec4 worldPos = model * vec4(aPosition, 1.0);
gl_Position = uViewProjection * worldPos;
vNormal = normalize(mat3(model) * aNormal);
vTexCoord = aTexCoord;
BatchData b = Batches[gl_DrawIDARB];
vTextureHandle = b.textureHandle;
vTextureLayer = b.textureLayer;
}
src/AcDream.App/Shaders/mesh_modern.frag:
#version 430 core
#extension GL_ARB_bindless_texture : require
in vec3 vNormal;
in vec2 vTexCoord;
in flat uvec2 vTextureHandle;
in flat uint vTextureLayer;
layout(std140, binding = 1) uniform LightingUbo {
vec4 uAmbient;
vec4 uSunDir;
vec4 uSunColor;
};
uniform int uRenderPass;
out vec4 FragColor;
void main() {
sampler2DArray tex = sampler2DArray(vTextureHandle);
vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer)));
if (uRenderPass == 0) {
// Opaque pass: discard soft pixels (alpha cutout), write to depth
if (color.a < 0.95) discard;
} else {
// Transparent pass: discard hard pixels (already drawn opaque), no depth write
if (color.a >= 0.95) discard;
if (color.a < 0.05) discard; // skip totally-empty fragments — perf for large transparent overdraw
}
// Diffuse lighting (preserved from acdream's existing lighting model)
vec3 N = normalize(vNormal);
vec3 L = normalize(uSunDir.xyz);
float diff = max(dot(N, L), 0.0);
vec3 lit = uAmbient.rgb + uSunColor.rgb * diff;
color.rgb *= clamp(lit, 0.0, 1.0);
FragColor = color;
}
Differences from WB's StaticObjectModern.*:
- Drops `uActiveCells[]` cell-filtering (acdream culls cells on CPU).
- Drops `uDrawIDOffset` (acdream issues full passes, no pagination).
- Drops `uHighlightColor` (deferred to Phase B.4 follow-up; reserved as per-instance `highlightColor` field, not a global uniform).
- Adapts the lighting model to acdream's existing UBO at binding=1 instead of WB's `SceneData` UBO.
- Uses 1-layer `sampler2DArray` for ALL textures (WB uses multi-layer atlases; the same shader works for both shapes).
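On the CPU side, the buffer contents have to match the shader's `std430` structs byte for byte. The sketch below shows one possible mirror of `BatchData`, assuming hypothetical names (`BatchDataCpu`, `SsboLayoutSketch`); it splits the 64-bit handle into the two 32-bit halves that the `uvec2` constructor of `sampler2DArray` expects (low word in `.x`, high word in `.y`).

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// CPU mirror of the shader's BatchData: uvec2 + uint + uint = 16 bytes under std430.
[StructLayout(LayoutKind.Sequential)]
public struct BatchDataCpu
{
    public uint HandleLo;      // textureHandle.x (low 32 bits of the bindless handle)
    public uint HandleHi;      // textureHandle.y (high 32 bits)
    public uint TextureLayer;  // always 0 for 1-layer arrays
    public uint Flags;         // reserved

    public static BatchDataCpu From(ulong bindlessHandle, uint layer) => new BatchDataCpu
    {
        HandleLo = (uint)(bindlessHandle & 0xFFFFFFFFUL),
        HandleHi = (uint)(bindlessHandle >> 32),
        TextureLayer = layer,
        Flags = 0,
    };
}

public static class SsboLayoutSketch
{
    // Array strides the shader assumes: 16 bytes per batch, 64 bytes per mat4 instance.
    public static readonly int BatchStride = Marshal.SizeOf<BatchDataCpu>();
    public static readonly int InstanceStride = Marshal.SizeOf<Matrix4x4>();
}
```

Checking the two strides in a unit test catches accidental field additions early; if the reserved `highlightColor` field is added later, the instance stride would grow from 64 to 80 bytes and the CPU struct would need to change with it.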
5. Per-frame data flow walk-through
A concrete trace. Visible work for frame N:
| Group | GfxObj | Surface | Translucency | Instances |
|---|---|---|---|---|
| 0 | oak tree | bark | Opaque | 12 |
| 1 | oak tree | leaves | AlphaBlend | 12 |
| 2 | drudge | skin (palette override) | Opaque | 1 |
| 3 | drudge | eyes | Opaque | 1 |
Instance SSBO (binding=0), 26 entries (each batch contributes its own copy of the entity matrix):
[0..11] = oak instance matrices (group 0 — bark)
[12..23] = oak instance matrices (group 1 — leaves)
[24] = drudge instance matrix (group 2 — skin)
[25] = drudge instance matrix (group 3 — eyes)
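Batch SSBO (binding=1), 4 entries indexed by gl_DrawIDARB. Entries follow indirect-command order (opaque section, then transparent), not the group numbers in the table, because gl_DrawIDARB is the index of a draw within the current glMultiDrawElementsIndirect call:
Batches[0] = (oak_bark_handle, layer=0, flags=0)
Batches[1] = (drudge_skin_handle_with_palette, layer=0, flags=0)
Batches[2] = (drudge_eyes_handle, layer=0, flags=0)
Batches[3] = (oak_leaves_handle, layer=0, flags=0)
gl_DrawIDARB restarts at 0 for the transparent call, so that pass has to reach its own section of Batches[] (for example a base-index uniform equal to _opaqueDrawCount, or the batch SSBO bound at the transparent section's offset); otherwise the leaves draw would read Batches[0].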
Indirect buffer (single buffer, two sections):
_indirectBuffer[0..2] = opaque section (3 entries, sorted front-to-back)
[0] = (count=oakBarkIdx, instanceCount=12, firstIndex=oakBarkFI, baseVertex=oakBV, baseInstance=0)
[1] = (count=drudgeSkinIdx, instanceCount=1, firstIndex=drudgeSkinFI, baseVertex=drudgeBV, baseInstance=24)
[2] = (count=drudgeEyesIdx, instanceCount=1, firstIndex=drudgeEyesFI, baseVertex=drudgeBV, baseInstance=25)
_indirectBuffer[3] = transparent section (1 entry)
[3] = (count=oakLeavesIdx, instanceCount=12, firstIndex=oakLeavesFI, baseVertex=oakBV, baseInstance=12)
_opaqueDrawCount = 3; _transparentDrawCount = 1; _transparentByteOffset = 3 * sizeof(DEIC) = 60.
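The walk-through above can be reproduced on the CPU. This is a minimal sketch, assuming hypothetical types (`SketchGroup`, `IndirectLayout`, `IndirectLayoutSketch`) that stand in for the dispatcher's real group and builder code; the struct mirrors the five 32-bit fields of GL's `DrawElementsIndirectCommand`.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Five 32-bit fields, 20 bytes, matching GL's DrawElementsIndirectCommand.
[StructLayout(LayoutKind.Sequential)]
public struct DrawElementsIndirectCommand
{
    public uint Count;
    public uint InstanceCount;
    public uint FirstIndex;
    public int BaseVertex;
    public uint BaseInstance;
}

// Hypothetical, simplified stand-in for the dispatcher's InstanceGroup.
public sealed record SketchGroup(
    string Name, bool Transparent, uint IndexCount, uint FirstIndex,
    int BaseVertex, uint InstanceCount, float SortDistance);

public sealed record IndirectLayout(
    DrawElementsIndirectCommand[] Commands, string[] DrawOrder,
    int OpaqueDrawCount, int TransparentDrawCount, int TransparentByteOffset);

public static class IndirectLayoutSketch
{
    public static IndirectLayout Build(IReadOnlyList<SketchGroup> groups)
    {
        var opaque = new List<(SketchGroup G, uint First)>();
        var transparent = new List<(SketchGroup G, uint First)>();

        // Assign cumulative first-instance offsets in group (matrix-layout) order.
        uint cursor = 0;
        foreach (var g in groups)
        {
            if (g.InstanceCount == 0) continue; // empty groups are skipped
            (g.Transparent ? transparent : opaque).Add((g, cursor));
            cursor += g.InstanceCount;
        }

        // Opaque section is sorted front-to-back; transparent keeps classification order.
        opaque.Sort((a, b) => a.G.SortDistance.CompareTo(b.G.SortDistance));
        var ordered = new List<(SketchGroup G, uint First)>(opaque);
        ordered.AddRange(transparent);

        var commands = new DrawElementsIndirectCommand[ordered.Count];
        var drawOrder = new string[ordered.Count];
        for (int i = 0; i < ordered.Count; i++)
        {
            var (grp, first) = ordered[i];
            commands[i] = new DrawElementsIndirectCommand
            {
                Count = grp.IndexCount,
                InstanceCount = grp.InstanceCount,
                FirstIndex = grp.FirstIndex,
                BaseVertex = grp.BaseVertex,
                BaseInstance = first,
            };
            drawOrder[i] = grp.Name;
        }

        int transparentByteOffset = opaque.Count * Marshal.SizeOf<DrawElementsIndirectCommand>();
        return new IndirectLayout(commands, drawOrder, opaque.Count, transparent.Count, transparentByteOffset);
    }
}
```

Feeding it the four groups from the table (bark, leaves, skin, eyes, with sort distances that put bark nearest) yields base instances 0, 24, 25 for the opaque section and 12 for the leaves, with a transparent byte offset of 60. The index counts and sort distances used for such a run are made-up inputs, not measured values.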
Shader access pattern (per vertex):
int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; // unique per (group, instance) pair
mat4 model = Instances[instanceIndex].transform;
BatchData b = Batches[gl_DrawIDARB]; // shared across all verts in this draw
sampler2DArray tex = sampler2DArray(b.textureHandle);
vec4 color = texture(tex, vec3(aTexCoord, float(b.textureLayer)));
Per-frame CPU GL calls (entity rendering, total):
- 3× `glBufferData` (instance SSBO, batch SSBO, indirect buffer).
- 1× `glBindVertexArray` (globalVAO).
- 2× `glBindBufferBase` (SSBOs at bindings 0 + 1).
- 1× `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`.
- 2× `glMultiDrawElementsIndirect` (one opaque, one transparent).
- ~5 state changes (blend, depth mask, render pass uniform).
Total: ~15-20 GL calls per frame for entity rendering, regardless of group count. The N.4 baseline is ~100-300 per frame (§1).
6. Translucent rendering detail
Per Decision 2: WB's two-pass alpha-test pattern.
Group classification. ClassifyBatches puts groups into one of two arrays:
- Opaque indirect: `TranslucencyKind.Opaque` and `TranslucencyKind.ClipMap`.
- Transparent indirect: `TranslucencyKind.AlphaBlend`, `Additive`, `InvAlpha` all merged. Per Decision 2, additive renders as alpha-blend; falsifiable at visual verification.
Opaque groups stay sorted front-to-back by SortDistance (preserved from N.4 — depth-test reject of overdrawn fragments is a meaningful win on dense scenes).
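The two-bucket classification is small enough to state as a single switch. The sketch below uses a local copy of the enum (`TranslucencyKindSketch`) containing only the members this spec names, which is an assumption about the real `TranslucencyKind`; `RenderPassSketch` and `PassClassifierSketch` are illustrative names.

```csharp
using System;

// Local stand-in listing only the kinds named in this spec (assumption).
public enum TranslucencyKindSketch { Opaque, ClipMap, AlphaBlend, Additive, InvAlpha }

// Values match the uRenderPass uniform: 0 = opaque, 1 = transparent.
public enum RenderPassSketch { Opaque = 0, Transparent = 1 }

public static class PassClassifierSketch
{
    public static RenderPassSketch Classify(TranslucencyKindSketch kind) => kind switch
    {
        TranslucencyKindSketch.Opaque => RenderPassSketch.Opaque,
        TranslucencyKindSketch.ClipMap => RenderPassSketch.Opaque,
        // All blended kinds share one pass; Additive renders as alpha-blend per Decision 2.
        TranslucencyKindSketch.AlphaBlend => RenderPassSketch.Transparent,
        TranslucencyKindSketch.Additive => RenderPassSketch.Transparent,
        TranslucencyKindSketch.InvAlpha => RenderPassSketch.Transparent,
        _ => throw new ArgumentOutOfRangeException(nameof(kind)),
    };
}
```

Listing every member explicitly, with a throwing default, means a newly added kind fails loudly instead of silently landing in one of the passes. If the additive sub-pass fallback is ever adopted, `Additive` would move to a third bucket here.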
Pass GL state:
// Opaque pass
_gl.Disable(EnableCap.Blend);
_gl.DepthMask(true);
_gl.Enable(EnableCap.CullFace); _gl.CullFace(TriangleFace.Back); _gl.FrontFace(FrontFaceDirection.Ccw);
_shader.SetInt("uRenderPass", 0);
_gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
indirect: (void*)0, drawcount: _opaqueDrawCount, stride: (uint)sizeof(DEIC));
// Transparent pass
_gl.Enable(EnableCap.Blend);
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
_gl.DepthMask(false);
_shader.SetInt("uRenderPass", 1);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
indirect: (void*)_transparentByteOffset, drawcount: _transparentDrawCount, stride: (uint)sizeof(DEIC));
// Cleanup
_gl.DepthMask(true); _gl.Disable(EnableCap.Blend); _gl.BindVertexArray(0);
Visual verification gate (additive fallback plan). During Week 2-3 visual verification, look at:
- Holtburg courtyard, dungeon entrance — confirm scenery + characters identical.
- Foundry interior — magic-themed content with potentially additive-flagged surfaces.
- Any glowing weapon decals, magical aura effects, or self-luminous textures observed.
If a visible regression appears (faded glow, missing additive bloom): amend spec to add a third indirect call within the transparent pass with glBlendFunc(SrcAlpha, One). Group classification splits Additive into its own bucket. ~30-min change.
7. Error handling and fallback
7.1 GPU capability detection
WB's OpenGLGraphicsDevice already detects:
- `HasOpenGL43` (required for SSBOs and multi-draw indirect).
- `HasBindless` (required for bindless texture handles).

As originally drafted, `WbDrawDispatcher` was only constructed when `WbFoundationFlag.Enabled` was true, which gated on `_useModernRendering = HasOpenGL43 && HasBindless`. Under the N.5 ship amendment (Decision 5) the flag is gone: capability detection runs unconditionally at startup, and a missing `GL_ARB_bindless_texture` throws `NotSupportedException` with a clear message.
Additional check: `GL_ARB_shader_draw_parameters` (for `gl_BaseInstanceARB`, `gl_DrawIDARB`). Core in GL 4.6, available as an extension on 4.3+. It is part of N.5's capability check; if it is missing, startup throws `NotSupportedException` as well, since there is no `InstancedMeshRenderer` left to fall back to.
7.2 Shader compile failure
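If `mesh_modern.vert/.frag` fails to compile (driver bug, GLSL version mismatch, extension issue): catch the compile exception in the `WbDrawDispatcher` constructor and log the GLSL info log + GPU vendor/renderer string ONCE. The original draft then flipped `WbFoundationFlag.Enabled = false` for the session and fell back to `InstancedMeshRenderer`; under the N.5 ship amendment (Decision 5) both are deleted and the `mesh_modern` shader load is mandatory, so no fallback path remains.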
7.3 Non-resident handle (the bindless foot-gun)
Sampling a non-resident handle causes undefined behavior (driver-dependent: black texture, GPU fault, device-lost).
Mitigation in code: TextureCache.MakeResidentHandle is the only API that produces a handle, and it makes the handle resident in the same call. There is no API surface that produces a non-resident handle. Defense-in-depth: dispatcher asserts BindlessTextureHandle != 0 before queuing a draw (zero handles get filtered out, same as zero surfaceId does today).
7.4 Indirect command corruption
`count`, `firstIndex`, `baseVertex` come from WB's `ObjectRenderBatch` (never user input; WB-internal correctness). `instanceCount` is `grp.Matrices.Count` (we control). `baseInstance` is `grp.FirstInstance` (we control, computed cumulatively). Bug-class is "WB-internal corruption + our cumulative-offset bug", the same surface area that N.4's `BaseInstance` already trusts. Add a debug-build assertion: cumulative `baseInstance` values must be strictly increasing in matrix-layout order, i.e. checked before the opaque sort and the opaque/transparent split. In indirect-command order they need not be: the §5 example issues 0, 24, 25, 12.
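One way to phrase the cumulative-offset assertion is as a contiguity check. The sketch below assumes the check runs over groups in the order their matrices were laid out (the sorted indirect-command order does not have this property, as the 0, 24, 25, 12 sequence in §5 shows); `BaseInstanceCheckSketch` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public static class BaseInstanceCheckSketch
{
    // Groups are given in matrix-layout order. Each group's first instance must
    // equal the running sum of the instance counts before it, which also makes
    // the offsets strictly increasing as long as no group is empty.
    public static bool IsContiguous(IReadOnlyList<(uint FirstInstance, uint InstanceCount)> groups)
    {
        uint expected = 0;
        foreach (var (first, count) in groups)
        {
            if (count == 0) return false;        // empty groups should have been skipped
            if (first != expected) return false; // gap or overlap in the instance SSBO
            expected += count;
        }
        return true;
    }
}
```

In a debug build this would sit behind something like `Debug.Assert(BaseInstanceCheckSketch.IsContiguous(layout))` just before the buffer uploads, so release builds pay nothing for it.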
7.5 Disposal order
Bindless handles must be made non-resident before their underlying textures are deleted (driver UB otherwise). `TextureCache.Dispose` does this:
- Iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`.
- Call `_glExtensions.MakeAllNonResidentARB` if available (some drivers prefer batch). `GL_ARB_bindless_texture` itself defines no batch entry point, so this step applies only if the binding layer provides such a helper.
- Then `glDeleteTextures` proceeds as today.
Dispatcher's own buffer cleanup (_instanceSsbo, _batchSsbo, _indirectBuffer) via glDeleteBuffers.
7.6 Persistent first-failure diagnostic
If shader compile fails OR an extension check fails OR `glMultiDrawElementsIndirect` raises `GL_INVALID_OPERATION` (as reported by `glGetError`) on the first frame: log ONCE with GPU vendor/renderer string + GLSL info log. Don't spam. User pastes the line into a bug report; we know exactly where to look.
8. Testing and acceptance
8.1 Unit / conformance tests
- `TextureCacheBindlessTests`: for each `Bindless`-suffixed `GetOrUpload*`: returns non-zero `ulong`, returns same handle for same key (cache hit), distinct keys yield distinct handles, returned handle is resident per GL state query.
- `WbDrawDispatcherIndirectBuilderTests`: pure CPU test: given a fixture of `(entity, mesh, batch)` tuples, verify the indirect buffer layout: `count` / `firstIndex` / `baseVertex` / `baseInstance` per group, opaque section sorted front-to-back, transparent section in classification order (no sort; back-to-front sort can be added in a follow-up if measured useful).
- `WbDrawDispatcherTranslucencyTests`: verify groups land in correct indirect buffer (opaque vs transparent) per `TranslucencyKind`. `Additive` / `InvAlpha` go to transparent. `ClipMap` goes to opaque. Empty groups skipped.
- Existing N.4 tests stay green. All 60 tests captured by the `FullyQualifiedName~Wb|MatrixComposition` filter remain at 60/0.
8.2 Visual verification
Same gate as N.4 used. Live ACE + retail dat, in-world testing.
- Holtburg courtyard — characters + scenery + buildings render identically to N.4. No missing entities, no z-fighting, no exploded parts.
- Foundry interior — dense static-object scene, stress-tests indirect call count and translucency classification.
- Indoor → outdoor cell transition — confirms cell visibility filtering still works (we cull on CPU; dispatcher should never see invisible-cell entities).
- Drudge / character close-up — confirms Issue #47 close-detail mesh preservation.
- Magic content (additive fallback check) — Foundry runes, glowing weapons if observable, boss models with luminous decals. Trigger spec amendment if regression spotted.
User-confirms each. These are visual identity checks against the running N.4 behavior (use git stash of N.5 changes + relaunch as the comparison baseline).
8.3 Perf measurement (the win gate)
[WB-DIAG] augmented:
[WB-DIAG] entSeen=N entDrawn=M ... drawsIssued=K groups=G (existing)
[WB-DIAG] cpu_us=Xmedian/Y95p gpu_us=Zmedian/W95p (new)
Capture before/after numbers in fixed scenes/cameras:
| Scene | Camera position | Metric |
|---|---|---|
| Holtburg courtyard | 30m elevated, looking SW | cpu, gpu, drawsIssued |
| Foundry interior | character spawn, default heading | cpu, gpu, drawsIssued |
| Open landscape | terrain wander, no entities | cpu, gpu, drawsIssued (sanity) |
Acceptance gates (paste into SHIP commit message):
- Visual identity to N.4 — confirmed via §8.2.
- CPU dispatcher time ≤ 70% of N.4 in Holtburg courtyard (target: ≥30% reduction).
- GPU rendering time within ±10% of N.4 (sanity: no regression).
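- `drawsIssued ≤ 5 per pass` (down from "few hundred per pass").
- All tests green: 60+ Wb tests + new bindless/indirect tests.
- `ACDREAM_USE_WB_FOUNDATION=0` still works (`InstancedMeshRenderer` fallback runs and renders correctly). Withdrawn by the N.5 ship amendment: the final review found that flag-off rendered terrain and sky only, and the flag and the fallback renderer were then deleted (Decision 5).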
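The numeric gates reduce to three comparisons. The following is a minimal sketch, assuming a hypothetical `PerfSample` record that holds the median values from the `[WB-DIAG]` rollup for one scene; the thresholds are the ones listed above (CPU at most 70% of N.4, GPU within ±10%, at most 5 draws per pass).

```csharp
using System;

// Hypothetical container for one scene's medians from [WB-DIAG].
public sealed record PerfSample(double CpuUs, double GpuUs, int DrawsIssuedPerPass);

public static class AcceptanceGateSketch
{
    // CPU dispatcher time must drop by at least 30% relative to N.4.
    public static bool CpuGate(PerfSample n4, PerfSample n5) =>
        n5.CpuUs <= 0.70 * n4.CpuUs;

    // GPU time must stay within +/-10% of N.4 (sanity check, not a win target).
    public static bool GpuGate(PerfSample n4, PerfSample n5) =>
        Math.Abs(n5.GpuUs - n4.GpuUs) <= 0.10 * n4.GpuUs;

    // At most five draws issued per pass.
    public static bool DrawGate(PerfSample n5) => n5.DrawsIssuedPerPass <= 5;

    public static bool Passes(PerfSample n4, PerfSample n5) =>
        CpuGate(n4, n5) && GpuGate(n4, n5) && DrawGate(n5);
}
```

For instance, with made-up numbers, an N.4 sample of 1000 µs CPU / 400 µs GPU against an N.5 sample of 650 µs CPU / 430 µs GPU / 2 draws passes all three gates, while 750 µs CPU would fail the first. Visual identity and the test-suite gate remain manual or CI checks and are not modelled here.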
8.4 Long-session sanity check
Hour-long session with ACDREAM_WB_DIAG=1. Watch resident-handle count grow. Expected: bounded plateau under 5K once content set is fully traversed. If unbounded growth, residency policy revisit required in N.6.
9. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Driver bug in bindless residency | Low (mature in 2025+ drivers) | Crash / black textures | One-time logging on first failure (the flag-off legacy fallback was removed by the N.5 ship amendment) |
| Driver bug in `glMultiDrawElementsIndirect` | Low | `GL_INVALID_OPERATION` | Capability check + first-failure logging (no fallback after the ship amendment) |
| Resident handle count exceeds driver limit in long session | Low (acdream content is bounded) | Cumulative GPU memory pressure → eventual eviction surprises | `[WB-DIAG]` resident-count log; revisit eviction in N.6 if it grows unbounded |
| Shader compile fails on weird GPU | Medium-low | First-launch failure | Compile-error catch + one-time diagnostic (the `InstancedMeshRenderer` fallback was removed by the ship amendment) |
| Additive fidelity regression on rare GfxObj surfaces | Medium | Subtle visual difference | Visual verification at magic-themed content; spec amendment for additive sub-pass if found |
| `gl_BaseInstanceARB` fields not advancing per-instance attribs we still use | Low (we drop attribs entirely) | Wrong matrices | All instance data via SSBO; no vertex attrib at locations 3-6 to misalign |
| SSBO indexing GPU cost worse than uniform-array | Low (well-optimized in modern drivers) | Possible GPU time regression | GL timer queries detect; if observed, fall back to uniform array of bounded size |
| Persistent-mapped buffer foot-guns (chosen NOT to use in N.5) | n/a | n/a | Decision 7 defers to N.6 |
| Per-instance highlight (selection blink) feature creep | Low | Scope grows | Decision 8 defers; field reserved in design doc |
10. Out of scope (explicitly)
The following are NOT N.5 work. They become possible follow-ons.
- WB's `TextureAtlasManager` adoption for atlas tier. N.5 keeps acdream's `TextureCache` as the texture owner for everything. Atlas adoption is N.6+ if memory pressure shows up.
- Persistent-mapped buffer ring with sync fences. Decision 7. N.6 candidate if profiling shows residual `glBufferData` cost.
- GPU-side culling (compute pre-pass). Future phase.
- Texture array repacking for multi-layer per-instance composites. Future, if many palette-overrides actually share dimensions and could be packed.
- Selection-blink highlight color. Decision 8. Phase B.4 follow-up. Field reserved in `InstanceData` design (extend stride to 80 bytes when implementing).
- ~~Deletion of legacy `InstancedMeshRenderer`. N.6.~~ Done in N.5 ship amendment: `InstancedMeshRenderer`, `StaticMeshRenderer`, and `WbFoundationFlag` were deleted in the retirement commit.
- Terrain wiring through WB. Future.
11. Open questions
None outstanding. All 8 brainstorm questions resolved + 1 clarification on highlight semantics. Ready for plan.
End of design.