docs(N.5): design spec — bindless + multi-draw indirect on N.4 dispatcher
Brainstormed 2026-05-08 over 8 design questions. Captures:

- Texture model: sampler2DArray for ALL textures (1-layer wrap for per-instance composites). Matches WB's modern shader, future-proofs for atlas adoption in N.6+.
- Translucency: WB's two-pass alpha-test (no native Additive on GfxObj surfaces; falsifiable at visual verification).
- Data delivery: all-SSBO. Instances[] at binding=0, Batches[] at binding=1. Indexed by gl_BaseInstanceARB+gl_InstanceID and gl_DrawIDARB respectively.
- Bindless residency: resident on upload, never release. Bounded content; instrument under ACDREAM_WB_DIAG=1.
- Escape hatch: two-way flag preserved. N.5 replaces N.4's draw method in place; legacy InstancedMeshRenderer remains the safety net.
- Perf measurement: CPU stopwatch + GL_TIME_ELAPSED queries, logged via [WB-DIAG]. Acceptance gates pasted into SHIP commit.
- Persistent-mapped buffers: deferred to N.6.
- Per-instance highlight (selection blink): deferred; field reserved in InstanceData for Phase B.4 follow-up.

Spec at docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md covers architecture, components, per-frame data flow walk-through, translucent rendering, error handling + fallback, testing + acceptance, risks, and explicit out-of-scope list.

Plan + task breakdown comes next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 1834b16cd1 (parent c1e31148bb); 1 changed file with 554 additions and 0 deletions.

# Phase N.5 — Modern Rendering Path — Design Spec

**Status:** Draft (brainstormed 2026-05-08, not yet implemented).
**Author:** acdream lead engineer + Claude.
**Builds on:** Phase N.4 (`WbDrawDispatcher`, shipped 2026-05-08).
**Predecessor docs:**
- `docs/research/2026-05-08-phase-n5-handoff.md` (cold-start briefing).
- `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md` (N.4 plan; Adjustments 7-10 are required reading).
- `docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md` (N.4 spec).

---

## 1. Problem statement

N.4 collapsed entity rendering from O(entities × batches) per-draw GL calls to O(unique GfxObj × surface × translucency) grouped instanced draws. The remaining hot path still does, per group:

```
glActiveTexture(0)
glBindTexture(2D, texHandle)
glBindBuffer(EBO, batchIbo)
glDrawElementsInstancedBaseVertexBaseInstance(...)
```

Across a typical Holtburg-courtyard scene that's still ~100-300 GL calls per frame for entities. Modern GPUs and our drivers (GL 4.3 + bindless, gated by WB's `_useModernRendering`) support patterns that eliminate ALL of those per-group calls:

- **Bindless textures** (`GL_ARB_bindless_texture`) — texture handles are 64-bit tokens that don't require `glBindTexture` to use; the shader samples from a handle read out of buffer data.
- **Multi-draw indirect** (`glMultiDrawElementsIndirect`) — one GL call dispatches N draws from a `DrawElementsIndirectCommand` buffer; the driver issues all of them with no CPU-side per-draw work.

N.5 lifts `WbDrawDispatcher` onto these primitives. Target: ≥30% reduction in CPU dispatcher time, draw call count down to ~5/frame, no visual regression vs N.4.

---

## 2. Decisions log

This section records the brainstorm outcomes that the rest of the doc relies on.

| # | Decision | Choice | Reason |
|---|---|---|---|
| 1 | Texture sampler model | **`sampler2DArray`** for ALL textures (1-layer wrapping for per-instance composites) | Matches WB's modern shader exactly; future-proofs for atlas adoption in N.6+; avoids two shader files. ~50 lines of TextureCache change. |
| 2 | Translucent rendering | **WB's two-pass alpha-test** (opaque pass discards `α<0.95`, transparent pass discards `α≥0.95`) | Single blend mode per pass enables one indirect call per pass. Loses native `Additive` blend on GfxObj surfaces; sky + particles have own renderers and aren't affected. Falsifiable at visual verification — if we see a regression, add an additive sub-pass (~30-min fix). |
| 3 | Per-instance + per-draw data delivery | **All-SSBO**: `Instances[]` at binding=0 (mat4 per instance), `Batches[]` at binding=1 (texture handle + layer + flags per group) | Matches WB's modern shader. SSBOs avoid the 16-slot vertex-attribute limit (a `mat4` alone consumes four slots), scale to large instance counts, give clean per-draw indexing via `gl_DrawIDARB`. |
| 4 | Bindless handle residency | **Resident on upload, never release** | acdream's content set is bounded (~1-5K unique textures per session). Handles persist for process lifetime; no eviction code in N.5. Diagnostic logging of handle count under `ACDREAM_WB_DIAG=1` to spot growth. |
| 5 | Escape hatch | **Two-way flag (no change)**. `ACDREAM_USE_WB_FOUNDATION=0/1` controls `WbFoundationFlag`; flag-on is the N.5 modern path; flag-off falls back to legacy `InstancedMeshRenderer`. N.4's draw method is replaced in place. | N.4's grouped-instanced draw is not preserved as an A/B fallback; legacy `InstancedMeshRenderer` is the existing safety net for "modern rendering broken on this GPU." |
| 6 | Perf measurement | **CPU stopwatch + GL timer queries** logged via `[WB-DIAG]` | Captures both CPU dispatcher time and GPU rendering time. Acceptance gate compares before/after numbers in fixed Holtburg/Foundry scenes. |
| 7 | Persistent-mapped buffers | **Defer to N.6** | Bindless+indirect win is 70-80% of achievable savings. Persistent-mapped + ring + sync is the last 5-10% with non-trivial sync-fence complexity; not worth the risk in N.5's 2-3 week budget. Add post-N.5 if profiling shows residual `glBufferData` cost. |
| 8 | Per-instance highlight (selection blink) | **Defer to a Phase B.4 follow-up** | Retail pulses click targets as visual confirmation; the right mechanism is per-instance highlight color (NOT WB's global `uHighlightColor` which would tint everything in our single-indirect-call design). Field is reserved in design (extend `InstanceData` to include `vec4 highlightColor`); N.5 ships without the field, future phase plumbs it without shader rewrite. |

---

## 3. Architecture overview

### What changes

`WbDrawDispatcher.Draw` swaps its inner loop. Phases 1-3 (entity walk, group bucketing, matrix layout) stay intact. Phases 5-6 (per-group GL calls) are replaced by a single `glMultiDrawElementsIndirect` per pass, fed by SSBO-resident per-instance and per-draw data.

### What's preserved from N.4

- Group bucketing pipeline (entity AABB cull, palette hash memo, group key dictionary).
- `AcSurfaceMetadataTable` for translucency classification.
- `EntitySpawnAdapter` / `LandblockSpawnAdapter` (mesh lifecycle bridge).
- `WbMeshAdapter` (the seam over WB's `ObjectMeshManager`).
- Front-to-back sort of opaque groups (depth-test reject of overdrawn fragments).
- Per-entity 5m AABB frustum cull.

### What's new

- `TextureCache` uploads as 1-layer `Texture2DArray` instead of `Texture2D`. Generates 64-bit bindless handles at upload, makes them resident.
- New shader pair `mesh_modern.vert/.frag` modeled on WB's `StaticObjectModern` but adapted (see §6).
- Three new GPU buffers in the dispatcher:
  - `_instanceSsbo` — `std430` layout, `mat4[]`, all visible matrices.
  - `_batchSsbo` — `std430` layout, `BatchData[]`, one entry per group.
  - `_indirectBuffer` — `DrawElementsIndirectCommand[]`, one per group.
- Two diagnostic measurements in `[WB-DIAG]`: CPU stopwatch span around `Draw()`; GPU `GL_TIME_ELAPSED` query around the indirect dispatch.

### What gets deleted

- `WbDrawDispatcher.DrawGroup` (replaced by indirect).
- `WbDrawDispatcher.EnsureInstanceAttribs` (no more vertex attribs at locations 3-6).
- Per-blend-mode `glBlendFunc` switch in the translucent loop.
- `mesh_instanced.vert/.frag` (replaced by `mesh_modern.*`).

### What stays under the escape hatch

`InstancedMeshRenderer` is untouched. `ACDREAM_USE_WB_FOUNDATION=0` still routes there. N.6 retires it.

---

## 4. Component changes

### 4.1 `TextureCache`

Texture upload path becomes Texture2DArray with depth=1:

```csharp
// `unsafe` is required because the pixel data is pinned with `fixed`.
private unsafe uint UploadRgba8AsLayer1Array(DecodedTexture decoded)
{
    uint tex = _gl.GenTexture();
    _gl.BindTexture(TextureTarget.Texture2DArray, tex);

    // Allocate and fill a single-layer array (depth = layer count = 1).
    fixed (byte* p = decoded.Rgba8)
        _gl.TexImage3D(
            TextureTarget.Texture2DArray, 0, InternalFormat.Rgba8,
            (uint)decoded.Width, (uint)decoded.Height, depth: 1,
            border: 0, PixelFormat.Rgba, PixelType.UnsignedByte, p);

    // Sampling state must be final before a bindless handle is created for this texture.
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat);
    _gl.BindTexture(TextureTarget.Texture2DArray, 0);
    return tex;
}
```

Bindless handle generation, eager + resident-on-upload, parallel cache:

```csharp
private readonly Dictionary<uint, ulong> _bindlessHandlesByGlName = new();

private ulong MakeResidentHandle(uint glTextureName)
{
    if (_bindlessHandlesByGlName.TryGetValue(glTextureName, out var h))
        return h;
    h = _bindless.GetTextureHandleARB(glTextureName);
    _bindless.MakeTextureHandleResidentARB(h);
    _bindlessHandlesByGlName[glTextureName] = h;
    return h;
}
```

Three new methods returning `ulong` bindless handles, paralleling the existing `uint` GL-name methods:

```csharp
public ulong GetOrUploadBindless(uint surfaceId);
public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId);
public ulong GetOrUploadWithPaletteOverrideBindless(uint surfaceId, uint? overrideOrigTextureId, PaletteOverride paletteOverride, ulong precomputedPaletteHash);
```

Each delegates to its existing `uint` sibling to populate the underlying GL texture, then calls `MakeResidentHandle` and returns the 64-bit handle.

The `uint`-returning methods stay (used by `SkyRenderer`, `TerrainAtlas`, anything outside the WB modern path).

`Dispose` releases bindless handles BEFORE deleting their textures: iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`, then `glDeleteTextures` proceeds as today.

### 4.2 `WbDrawDispatcher`

Three new GPU buffers (replacing `_instanceVbo`):

```csharp
private uint _instanceSsbo;    // binding=0, std430, mat4[]
private uint _batchSsbo;       // binding=1, std430, BatchData[]
private uint _indirectBuffer;  // GL_DRAW_INDIRECT_BUFFER, DEIC[]
```

`InstanceGroup` becomes:

```csharp
private sealed class InstanceGroup
{
    public uint Ibo;
    public uint FirstIndex;
    public int BaseVertex;
    public int IndexCount;
    public ulong BindlessTextureHandle;  // 64-bit (was uint TextureHandle in N.4)
    public uint TextureLayer;            // always 0 in N.5 (per-instance composites are 1-layer arrays)
    public TranslucencyKind Translucency;
    public int FirstInstance;
    public int InstanceCount;
    public float SortDistance;
    public readonly List<Matrix4x4> Matrices = new();
}
```

`GroupKey` adds the layer:

```csharp
private readonly record struct GroupKey(
    uint Ibo, uint FirstIndex, int BaseVertex, int IndexCount,
    ulong BindlessTextureHandle, uint TextureLayer, TranslucencyKind Translucency);
```
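
The CPU-side records written into `_batchSsbo` and `_indirectBuffer` have to match the GPU layouts byte for byte. The following is a minimal sketch of what those mirrors could look like; the names `BatchDataCpu`, `DrawElementsIndirectCommand` (as a C# type) and `GpuLayout` are assumptions made for illustration and do not come from the N.4 code. It relies on two facts: the OpenGL indirect command is five 32-bit fields, and a `uvec2` handle in GLSL carries the low 32 bits in `x` and the high 32 bits in `y`.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical CPU mirror of the GLSL BatchData struct
// (std430: uvec2 + uint + uint = 16 bytes, 8-byte aligned).
[StructLayout(LayoutKind.Sequential)]
public struct BatchDataCpu
{
    public ulong TextureHandle; // read in GLSL as uvec2: x = low 32 bits, y = high 32 bits
    public uint TextureLayer;
    public uint Flags;
}

// Field order follows the OpenGL DrawElementsIndirectCommand layout (5 x 32-bit = 20 bytes).
[StructLayout(LayoutKind.Sequential)]
public struct DrawElementsIndirectCommand
{
    public uint Count;
    public uint InstanceCount;
    public uint FirstIndex;
    public int BaseVertex;
    public uint BaseInstance;
}

public static class GpuLayout
{
    // Splits a 64-bit bindless handle into the two halves the shader's uvec2 exposes.
    public static (uint X, uint Y) SplitHandle(ulong handle) =>
        ((uint)(handle & 0xFFFFFFFFu), (uint)(handle >> 32));
}
```

A size check such as `Marshal.SizeOf<DrawElementsIndirectCommand>() == 20` in a unit test is a cheap guard against accidental field reordering or padding.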

Per-frame draw flow:

1. **Walk entities → build `_groups` dict** (unchanged from N.4).
2. **Lay matrices contiguously, split opaque/transparent, sort opaque** (unchanged).
3. **Build per-group BatchData and DEIC arrays.** One `BatchData` per group `(handle, layer, flags=0)`. One DEIC per group `(count = IndexCount, instanceCount = InstanceCount, firstIndex = FirstIndex, baseVertex = BaseVertex, baseInstance = FirstInstance)`. Indirect commands are laid out contiguously: opaque section first (sorted front-to-back), transparent section second. `_opaqueDrawCount` and `_transparentDrawCount` track section sizes; `_transparentByteOffset = _opaqueDrawCount * sizeof(DEIC)`.
4. **Three `glBufferData` uploads** to `_instanceSsbo`, `_batchSsbo`, `_indirectBuffer` (single buffer, both sections).
5. **Bind global VAO once** (preserved from N.4 — modern rendering shares one VAO).
6. **Bind SSBOs once** via `glBindBufferBase(SHADER_STORAGE_BUFFER, 0, _instanceSsbo)` and `... 1, _batchSsbo`.
7. **Opaque pass.** Set `uRenderPass = 0`. `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)0, drawcount=_opaqueDrawCount, stride=sizeof(DEIC))`.
8. **Transparent pass.** Set `uRenderPass = 1`. `glEnable(BLEND)` + `glBlendFunc(SrcAlpha, OneMinusSrcAlpha)` + `glDepthMask(false)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)_transparentByteOffset, drawcount=_transparentDrawCount, stride=sizeof(DEIC))`.
9. **Restore state.** `glDepthMask(true)` + `glDisable(BLEND)` + `glBindVertexArray(0)`.
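
Step 3 is pure CPU work and can be sketched without a GL context. The block below is one possible shape, not the dispatcher's actual code: `GroupInfo`, `Deic`, `Batch` and `IndirectBuilder` are hypothetical stand-ins, and writing `Batches[i]` at the same index as command `i` is an assumption about how the two arrays pair up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical value types standing in for the GPU records.
public record struct Deic(uint Count, uint InstanceCount, uint FirstIndex, int BaseVertex, uint BaseInstance);
public record struct Batch(ulong Handle, uint Layer, uint Flags);

// Hypothetical, trimmed-down stand-in for InstanceGroup after matrix layout.
public sealed record GroupInfo(
    uint FirstIndex, int BaseVertex, int IndexCount, ulong Handle,
    bool Transparent, int FirstInstance, int InstanceCount, float SortDistance);

public static class IndirectBuilder
{
    public const int DeicSize = 20; // five 32-bit fields

    public static (Deic[] Commands, Batch[] Batches, int OpaqueCount, int TransparentCount, int TransparentByteOffset)
        Build(IReadOnlyList<GroupInfo> groups)
    {
        // Opaque section first, sorted front-to-back. OrderBy is a stable sort,
        // so groups at equal distance keep their input order.
        var opaque = groups.Where(g => !g.Transparent && g.InstanceCount > 0)
                           .OrderBy(g => g.SortDistance).ToList();
        // Transparent section second, in classification order (no sort).
        var transparent = groups.Where(g => g.Transparent && g.InstanceCount > 0).ToList();
        var ordered = opaque.Concat(transparent).ToList();

        var commands = new Deic[ordered.Count];
        var batches = new Batch[ordered.Count];
        for (int i = 0; i < ordered.Count; i++)
        {
            var g = ordered[i];
            commands[i] = new Deic((uint)g.IndexCount, (uint)g.InstanceCount,
                                   g.FirstIndex, g.BaseVertex, (uint)g.FirstInstance);
            batches[i] = new Batch(g.Handle, 0, 0); // assumed: same index as its command
        }
        return (commands, batches, opaque.Count, transparent.Count, opaque.Count * DeicSize);
    }
}
```

Because matrices are laid out before the opaque sort, `BaseInstance` values in the command array are generally not monotonic; only the per-group ranges matter.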

Diagnostic timing (under `ACDREAM_WB_DIAG=1`):

- CPU: `Stopwatch` started at the top of `Draw()`, stopped at the bottom. Median + 95th-percentile flushed in the 5-second `[WB-DIAG]` rollup.
- GPU: two query objects created with `glGenQueries` (one for opaque, one for transparent). `glBeginQuery(TIME_ELAPSED) / glEndQuery` around each `glMultiDrawElementsIndirect`. Result polled at the start of the next frame: check `GL_QUERY_RESULT_AVAILABLE` and read `GL_QUERY_RESULT` only when it reports true (`GL_QUERY_RESULT_NO_WAIT` is an OpenGL 4.4 feature, above the 4.3 baseline); if not ready, drop the sample and try again.
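
The median and 95th-percentile rollup can be sketched as a small sample window. This is an illustration only: `TimingWindow` is a hypothetical helper, and the nearest-rank percentile definition is an assumption, since the spec does not say which percentile method `[WB-DIAG]` uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects per-frame microsecond samples and reports median / p95.
public sealed class TimingWindow
{
    private readonly List<double> _samples = new();

    public int Count => _samples.Count;

    public void Add(double microseconds) => _samples.Add(microseconds);

    // Nearest-rank percentile over the current window; p in (0, 100].
    public double Percentile(double p)
    {
        if (_samples.Count == 0) return 0;
        var sorted = _samples.ToArray();
        Array.Sort(sorted);
        int rank = (int)Math.Ceiling(p * sorted.Length / 100.0); // 1-based rank
        return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
    }

    // Returns the (median, p95) pair and clears the window for the next rollup.
    public (double Median, double P95) Flush()
    {
        var result = (Percentile(50), Percentile(95));
        _samples.Clear();
        return result;
    }
}
```

One window per metric (CPU, GPU opaque, GPU transparent) would be flushed on each 5-second rollup; dropped GPU samples simply never reach `Add`.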

### 4.3 New shader files

`src/AcDream.App/Shaders/mesh_modern.vert`:

```glsl
#version 430 core
#extension GL_ARB_bindless_texture : require
#extension GL_ARB_shader_draw_parameters : require

layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;

struct InstanceData {
    mat4 transform;
    // Reserved for Phase B.4 follow-up (selection-blink retail-faithful highlight):
    //   vec4 highlightColor;  // RGBA — when non-zero alpha, fragment shader mixes into output.
    // Add field here, increase stride to 80 bytes, and read at fragment via flat varying.
};

struct BatchData {
    uvec2 textureHandle;  // bindless handle for sampler2DArray
    uint  textureLayer;   // layer index (always 0 for per-instance composites)
    uint  flags;          // reserved for future use
};

layout(std430, binding = 0) readonly buffer InstanceBuffer {
    InstanceData Instances[];
};

layout(std430, binding = 1) readonly buffer BatchBuffer {
    BatchData Batches[];
};

layout(std140, binding = 1) uniform LightingUbo {
    vec4 uAmbient;
    vec4 uSunDir;
    vec4 uSunColor;
    // matches existing acdream lighting UBO; do not change layout
};

uniform mat4 uViewProjection;
uniform int uRenderPass;  // 0=opaque, 1=transparent (consumed in fragment shader)

out vec3 vNormal;
out vec2 vTexCoord;
out flat uvec2 vTextureHandle;
out flat uint vTextureLayer;

void main() {
    int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;
    mat4 model = Instances[instanceIndex].transform;

    vec4 worldPos = model * vec4(aPosition, 1.0);
    gl_Position = uViewProjection * worldPos;

    vNormal = normalize(mat3(model) * aNormal);
    vTexCoord = aTexCoord;

    BatchData b = Batches[gl_DrawIDARB];
    vTextureHandle = b.textureHandle;
    vTextureLayer = b.textureLayer;
}
```

`src/AcDream.App/Shaders/mesh_modern.frag`:

```glsl
#version 430 core
#extension GL_ARB_bindless_texture : require

in vec3 vNormal;
in vec2 vTexCoord;
in flat uvec2 vTextureHandle;
in flat uint vTextureLayer;

layout(std140, binding = 1) uniform LightingUbo {
    vec4 uAmbient;
    vec4 uSunDir;
    vec4 uSunColor;
};

uniform int uRenderPass;

out vec4 FragColor;

void main() {
    sampler2DArray tex = sampler2DArray(vTextureHandle);
    vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer)));

    if (uRenderPass == 0) {
        // Opaque pass: discard soft pixels (alpha cutout), write to depth
        if (color.a < 0.95) discard;
    } else {
        // Transparent pass: discard hard pixels (already drawn opaque), no depth write
        if (color.a >= 0.95) discard;
        if (color.a < 0.05) discard;  // skip totally-empty fragments — perf for large transparent overdraw
    }

    // Diffuse lighting (preserved from acdream's existing lighting model)
    vec3 N = normalize(vNormal);
    vec3 L = normalize(uSunDir.xyz);
    float diff = max(dot(N, L), 0.0);
    vec3 lit = uAmbient.rgb + uSunColor.rgb * diff;
    color.rgb *= clamp(lit, 0.0, 1.0);

    FragColor = color;
}
```
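
The discard rules in the fragment shader can be mirrored on the CPU to reason about which alpha values survive which pass. The sketch below is an illustration only; `TwoPassAlpha` is a hypothetical helper, and the thresholds are copied from the shader above.

```csharp
public static class TwoPassAlpha
{
    public const float OpaqueCut = 0.95f; // boundary between the two passes
    public const float EmptyCut = 0.05f;  // near-empty fragments are dropped in pass 1

    // Mirrors the shader's discard logic: pass 0 = opaque, pass 1 = transparent.
    public static bool IsDiscarded(int renderPass, float alpha) =>
        renderPass == 0
            ? alpha < OpaqueCut
            : alpha >= OpaqueCut || alpha < EmptyCut;

    // Number of passes that keep a fragment with this alpha: 0 or 1, never 2.
    public static int PassesKeeping(float alpha) =>
        (IsDiscarded(0, alpha) ? 0 : 1) + (IsDiscarded(1, alpha) ? 0 : 1);
}
```

No alpha value is kept by both passes, so nothing is drawn twice. The "already drawn opaque" comment in the shader only holds for geometry that is submitted to both passes, though. If a group is submitted to the transparent pass alone, its texels at α ≥ 0.95 are kept by neither pass, which may be worth a dedicated look during visual verification.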

Differences from WB's `StaticObjectModern.*`:

- Drops `uActiveCells[]` cell-filtering (acdream culls cells on CPU).
- Drops `uDrawIDOffset` (acdream issues full passes, no pagination).
- Drops `uHighlightColor` (deferred to Phase B.4 follow-up; reserved as per-instance `highlightColor` field, not a global uniform).
- Adapts the lighting model to acdream's existing UBO at binding=1 instead of WB's `SceneData` UBO.
- Uses 1-layer `sampler2DArray` for ALL textures (WB uses multi-layer atlases — same shader works for both shapes).

---

## 5. Per-frame data flow walk-through

A concrete trace. Visible work for frame N:

| Group | GfxObj | Surface | Translucency | Instances |
|---|---|---|---|---|
| 0 | oak tree | bark | Opaque | 12 |
| 1 | oak tree | leaves | AlphaBlend | 12 |
| 2 | drudge | skin (palette override) | Opaque | 1 |
| 3 | drudge | eyes | Opaque | 1 |

**Instance SSBO** (binding=0), 26 entries (each batch contributes its own copy of the entity matrix):
```
[0..11]   = oak instance matrices (group 0 — bark)
[12..23]  = oak instance matrices (group 1 — leaves)
[24]      = drudge instance matrix (group 2 — skin)
[25]      = drudge instance matrix (group 3 — eyes)
```
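
**Batch SSBO** (binding=1), 4 entries indexed by `gl_DrawIDARB`. Entries are stored in indirect-command order (opaque section, then transparent section), not in group order, because `gl_DrawIDARB` is the index of a draw within its `glMultiDrawElementsIndirect` call. That index restarts at 0 for the transparent call, so the transparent pass must add `_opaqueDrawCount` to it (for example through a draw-ID offset uniform) to reach `Batches[3]`:
```
Batches[0] = (oak_bark_handle, layer=0, flags=0)                  (opaque draw 0)
Batches[1] = (drudge_skin_handle_with_palette, layer=0, flags=0)  (opaque draw 1)
Batches[2] = (drudge_eyes_handle, layer=0, flags=0)               (opaque draw 2)
Batches[3] = (oak_leaves_handle, layer=0, flags=0)                (transparent draw 0)
```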

**Indirect buffer** (single buffer, two sections):
```
_indirectBuffer[0..2] = opaque section (3 entries, sorted front-to-back)
  [0] = (count=oakBarkIdx, instanceCount=12, firstIndex=oakBarkFI, baseVertex=oakBV, baseInstance=0)
  [1] = (count=drudgeSkinIdx, instanceCount=1, firstIndex=drudgeSkinFI, baseVertex=drudgeBV, baseInstance=24)
  [2] = (count=drudgeEyesIdx, instanceCount=1, firstIndex=drudgeEyesFI, baseVertex=drudgeBV, baseInstance=25)

_indirectBuffer[3] = transparent section (1 entry)
  [3] = (count=oakLeavesIdx, instanceCount=12, firstIndex=oakLeavesFI, baseVertex=oakBV, baseInstance=12)

_opaqueDrawCount = 3; _transparentDrawCount = 1; _transparentByteOffset = 3 * sizeof(DEIC) = 60.
```

**Shader access pattern** (per vertex):
```glsl
int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;  // unique per (group, instance) pair
mat4 model = Instances[instanceIndex].transform;
BatchData b = Batches[gl_DrawIDARB];                     // shared across all verts in this draw
sampler2DArray tex = sampler2DArray(b.textureHandle);
vec4 color = texture(tex, vec3(aTexCoord, float(b.textureLayer)));
```

**Per-frame CPU GL calls** (entity rendering, total):
- 3× `glBufferData` (instance SSBO, batch SSBO, indirect buffer).
- 1× `glBindVertexArray(globalVAO)`.
- 2× `glBindBufferBase` (SSBOs at bindings 0 + 1).
- 1× `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`.
- 2× `glMultiDrawElementsIndirect` (one opaque, one transparent).
- ~5 state changes (blend, depth mask, render pass uniform).

Total: ~15-20 GL calls per frame for entity rendering, regardless of group count. N.4 baseline is "few hundred."

---

## 6. Translucent rendering detail

Per Decision 2: WB's two-pass alpha-test pattern.

**Group classification.** `ClassifyBatches` puts groups into one of two arrays:

- **Opaque indirect:** `TranslucencyKind.Opaque` and `TranslucencyKind.ClipMap`.
- **Transparent indirect:** `TranslucencyKind.AlphaBlend`, `Additive`, `InvAlpha` all merged. Per Decision 2, additive renders as alpha-blend; falsifiable at visual verification.

Opaque groups stay sorted front-to-back by `SortDistance` (preserved from N.4 — depth-test reject of overdrawn fragments is a meaningful win on dense scenes).
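
The classification rule above reduces to a small mapping. The sketch below is illustrative: the `TranslucencyKind` enum is redeclared locally as a stand-in containing only the members named in this section (the real enum may have a different order or more members), and `RenderPass` and `PassClassifier` are hypothetical names.

```csharp
// Local stand-in for acdream's TranslucencyKind; only the members named in this spec.
public enum TranslucencyKind { Opaque, ClipMap, AlphaBlend, Additive, InvAlpha }

// Values match the uRenderPass uniform: 0 = opaque, 1 = transparent.
public enum RenderPass { Opaque = 0, Transparent = 1 }

public static class PassClassifier
{
    public static RenderPass Classify(TranslucencyKind kind) => kind switch
    {
        TranslucencyKind.Opaque or TranslucencyKind.ClipMap => RenderPass.Opaque,
        // AlphaBlend, Additive and InvAlpha are merged per Decision 2.
        _ => RenderPass.Transparent,
    };
}
```

If the additive fallback is ever needed, the split would be a third `RenderPass` value and one more arm in this switch.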

**Pass GL state:**

```csharp
// Opaque pass
_gl.Disable(EnableCap.Blend);
_gl.DepthMask(true);
_gl.Enable(EnableCap.CullFace); _gl.CullFace(TriangleFace.Back); _gl.FrontFace(FrontFaceDirection.Ccw);
_shader.SetInt("uRenderPass", 0);
_gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
    indirect: (void*)0, drawcount: _opaqueDrawCount, stride: (uint)sizeof(DEIC));

// Transparent pass
_gl.Enable(EnableCap.Blend);
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
_gl.DepthMask(false);
_shader.SetInt("uRenderPass", 1);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
    indirect: (void*)_transparentByteOffset, drawcount: _transparentDrawCount, stride: (uint)sizeof(DEIC));

// Cleanup
_gl.DepthMask(true); _gl.Disable(EnableCap.Blend); _gl.BindVertexArray(0);
```

**Visual verification gate (additive fallback plan).** During Week 2-3 visual verification, look at:
- Holtburg courtyard, dungeon entrance — confirm scenery + characters identical.
- Foundry interior — magic-themed content with potentially additive-flagged surfaces.
- Any glowing weapon decals, magical aura effects, or self-luminous textures observed.

If a visible regression appears (faded glow, missing additive bloom): amend spec to add a third indirect call within the transparent pass with `glBlendFunc(SrcAlpha, One)`. Group classification splits Additive into its own bucket. ~30-min change.

---

## 7. Error handling and fallback

### 7.1 GPU capability detection

WB's `OpenGLGraphicsDevice` already detects:
- `HasOpenGL43` (required for SSBOs and multi-draw indirect; `gl_BaseInstanceARB` comes from the extension covered by the additional check below).
- `HasBindless` (required for bindless texture handles).

`WbDrawDispatcher` is only constructed when `WbFoundationFlag.Enabled` is true, which gates on `_useModernRendering = HasOpenGL43 && HasBindless`. We inherit WB's gating.

**Additional check:** `GL_ARB_shader_draw_parameters` (for `gl_BaseInstanceARB`, `gl_DrawIDARB`). Standard on GL 4.6, available as extension on 4.3+. Add to N.5's capability check; if missing, `WbDrawDispatcher` constructor logs a one-time warning and the foundation flag flips off (falls back to `InstancedMeshRenderer`).
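
The combined gate can be sketched as a pure function over the reported GL version and extension strings. `ModernPathGate` is a hypothetical helper, not WB's `OpenGLGraphicsDevice` API; how the version and the extension set are obtained from the driver is left out and assumed to happen elsewhere.

```csharp
using System.Collections.Generic;

public static class ModernPathGate
{
    // Extensions the mesh_modern shaders declare with `#extension ... : require`.
    private static readonly string[] RequiredExtensions =
    {
        "GL_ARB_bindless_texture",
        "GL_ARB_shader_draw_parameters",
    };

    // Returns the missing requirements; an empty list means the modern path may be enabled.
    public static List<string> Missing(int glMajor, int glMinor, ISet<string> extensions)
    {
        var missing = new List<string>();
        if (glMajor < 4 || (glMajor == 4 && glMinor < 3))
            missing.Add("OpenGL 4.3");
        foreach (var ext in RequiredExtensions)
            if (!extensions.Contains(ext))
                missing.Add(ext);
        return missing;
    }
}
```

Returning the list rather than a bool lets the one-time warning name exactly what the GPU lacks.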

### 7.2 Shader compile failure

If `mesh_modern.vert/.frag` fails to compile (driver bug, GLSL version mismatch, extension issue): catch the compile exception in `WbDrawDispatcher` constructor, log the GLSL info log + GPU vendor/renderer string ONCE, flip `WbFoundationFlag.Enabled = false` for the session, fall back to `InstancedMeshRenderer`. Do not crash.

### 7.3 Non-resident handle (the bindless foot-gun)

Sampling a non-resident handle causes undefined behavior (driver-dependent: black texture, GPU fault, device-lost).

Mitigation in code: `TextureCache.MakeResidentHandle` is the only API that produces a handle, and it makes the handle resident in the same call. There is no API surface that produces a non-resident handle. Defense-in-depth: dispatcher asserts `BindlessTextureHandle != 0` before queuing a draw (zero handles get filtered out, same as zero `surfaceId` does today).

### 7.4 Indirect command corruption

`count`, `firstIndex`, `baseVertex` come from WB's `ObjectRenderBatch` (never user input; WB-internal correctness). `instanceCount` is `grp.Matrices.Count` (we control). `baseInstance` is `grp.FirstInstance` (we control, computed cumulatively). Bug-class is "WB-internal corruption + our cumulative-offset bug" — same surface area as N.4's `BaseInstance` already trusts. Add a debug-build assertion: in matrix-layout order, cumulative `baseInstance` values must be strictly increasing. The check has to run in layout order rather than indirect-command order, because the opaque sort reorders commands (the §5 trace issues `baseInstance` 0, 24, 25, 12).
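
A debug-build assertion of this kind could look like the sketch below. It assumes the check runs over groups in the order their matrices were laid out, and it tests contiguity (each `baseInstance` equals the running instance total), which is slightly stronger than "strictly increasing". `IndirectValidation` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public static class IndirectValidation
{
    // ranges: (baseInstance, instanceCount) per group, in matrix-layout order.
    // totalInstances: number of matrices uploaded to the instance SSBO.
    public static bool RangesAreContiguous(
        IReadOnlyList<(int BaseInstance, int InstanceCount)> ranges, int totalInstances)
    {
        int expected = 0;
        foreach (var (baseInstance, count) in ranges)
        {
            // Empty groups should have been skipped; gaps or overlaps indicate an offset bug.
            if (count <= 0 || baseInstance != expected) return false;
            expected += count;
        }
        return expected == totalInstances;
    }
}
```

In the dispatcher this would sit behind `Debug.Assert`, so release builds pay nothing for it.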

### 7.5 Disposal order

Bindless handles must be made non-resident before their underlying textures are deleted (driver UB otherwise). The handles are owned by `TextureCache`, so `TextureCache.Dispose` does this:
1. Iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`.
2. Call `_glExtensions.MakeAllNonResidentARB` if available (some drivers prefer batch).
3. Then `glDeleteTextures` proceeds as today.

Dispatcher's own buffer cleanup (`_instanceSsbo`, `_batchSsbo`, `_indirectBuffer`) via `glDeleteBuffers`.

### 7.6 Persistent first-failure diagnostic

If shader compile fails OR an extension check fails OR `glMultiDrawElementsIndirect` returns `GL_INVALID_OPERATION` on first frame: log ONCE with GPU vendor/renderer string + GLSL info log. Don't spam. User pastes the line into a bug report; we know exactly where to look.

---

## 8. Testing and acceptance

### 8.1 Unit / conformance tests

- **`TextureCacheBindlessTests`** — for each `Bindless`-suffixed `GetOrUpload*`: returns non-zero `ulong`, returns same handle for same key (cache hit), distinct keys yield distinct handles, returned handle is resident per GL state query.
- **`WbDrawDispatcherIndirectBuilderTests`** — pure CPU test: given a fixture of `(entity, mesh, batch)` tuples, verify the indirect buffer layout: `count` / `firstIndex` / `baseVertex` / `baseInstance` per group, opaque section sorted front-to-back, transparent section in classification order (no sort — back-to-front sort can be added in a follow-up if measured useful).
- **`WbDrawDispatcherTranslucencyTests`** — verify groups land in correct indirect buffer (opaque vs transparent) per `TranslucencyKind`. `Additive`/`InvAlpha` go to transparent. `ClipMap` goes to opaque. Empty groups skipped.
- **Existing N.4 tests stay green.** All 60 tests captured by `FullyQualifiedName~Wb|MatrixComposition` filter remain at 60/0.

### 8.2 Visual verification

Same gate as N.4 used. Live ACE + retail dat, in-world testing.

- **Holtburg courtyard** — characters + scenery + buildings render identically to N.4. No missing entities, no z-fighting, no exploded parts.
- **Foundry interior** — dense static-object scene, stress-tests indirect call count and translucency classification.
- **Indoor → outdoor cell transition** — confirms cell visibility filtering still works (we cull on CPU; dispatcher should never see invisible-cell entities).
- **Drudge / character close-up** — confirms Issue #47 close-detail mesh preservation.
- **Magic content (additive fallback check)** — Foundry runes, glowing weapons if observable, boss models with luminous decals. Trigger spec amendment if regression spotted.

User-confirms each. These are visual identity checks against the running N.4 behavior (use `git stash` of N.5 changes + relaunch as the comparison baseline).

### 8.3 Perf measurement (the win gate)

`[WB-DIAG]` augmented:

```
[WB-DIAG] entSeen=N entDrawn=M ... drawsIssued=K groups=G   (existing)
[WB-DIAG] cpu_us=Xmedian/Y95p gpu_us=Zmedian/W95p           (new)
```

Capture before/after numbers in fixed scenes/cameras:

| Scene | Camera position | Metric |
|---|---|---|
| Holtburg courtyard | 30m elevated, looking SW | `cpu`, `gpu`, `drawsIssued` |
| Foundry interior | character spawn, default heading | `cpu`, `gpu`, `drawsIssued` |
| Open landscape | terrain wander, no entities | `cpu`, `gpu`, `drawsIssued` (sanity) |

**Acceptance gates** (paste into SHIP commit message):

- Visual identity to N.4 — confirmed via §8.2.
- CPU dispatcher time ≤ 70% of N.4 in Holtburg courtyard (target: ≥30% reduction).
- GPU rendering time within ±10% of N.4 (sanity: no regression).
- `drawsIssued ≤ 5 per pass` (down from "few hundred per pass").
- All tests green — 60+ Wb tests + new bindless/indirect tests.
- `ACDREAM_USE_WB_FOUNDATION=0` still works — `InstancedMeshRenderer` fallback runs and renders correctly.
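
The two numeric gates are simple ratios over the `[WB-DIAG]` medians. The sketch below only illustrates the arithmetic: `PerfGates` is a hypothetical helper, and comparing medians (rather than 95th percentiles) is an assumption the gate list does not spell out.

```csharp
using System;

public static class PerfGates
{
    // CPU gate: N.5 time must be at most 70% of the N.4 time (a reduction of 30% or more).
    public static bool CpuGatePasses(double n4CpuUs, double n5CpuUs) =>
        n5CpuUs <= 0.70 * n4CpuUs;

    // GPU gate: N.5 time within +/-10% of the N.4 time.
    public static bool GpuGatePasses(double n4GpuUs, double n5GpuUs) =>
        Math.Abs(n5GpuUs - n4GpuUs) <= 0.10 * n4GpuUs;
}
```

For example, with a made-up N.4 CPU median of 1000 µs, an N.5 median of 650 µs passes the CPU gate and 800 µs does not.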

### 8.4 Long-session sanity check

Hour-long session with `ACDREAM_WB_DIAG=1`. Watch resident-handle count grow. Expected: bounded plateau under 5K once content set is fully traversed. If unbounded growth, residency policy revisit required in N.6.

---

## 9. Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Driver bug in bindless residency | Low (mature in 2025+ drivers) | Crash / black textures | One-time logging on first failure; legacy fallback under flag-off |
| Driver bug in `glMultiDrawElementsIndirect` | Low | GL_INVALID_OPERATION | Capability check + first-failure logging + fallback |
| Resident handle count exceeds driver limit in long session | Low (acdream content is bounded) | Cumulative GPU memory pressure → eventual eviction surprises | `[WB-DIAG]` resident-count log; revisit eviction in N.6 if it grows unbounded |
| Shader compile fails on weird GPU | Medium-low | First-launch failure | Compile-error catch + fallback to `InstancedMeshRenderer` |
| Additive fidelity regression on rare GfxObj surfaces | Medium | Subtle visual difference | Visual verification at magic-themed content; spec amendment for additive sub-pass if found |
| `baseInstance` not offsetting per-instance vertex attribs we still use | Low (we drop attribs entirely) | Wrong matrices | All instance data via SSBO; no vertex attrib at locations 3-6 to misalign |
| SSBO indexing GPU cost worse than uniform-array | Low (well-optimized in modern drivers) | Possible GPU time regression | GL timer queries detect; if observed, fall back to uniform array of bounded size |
| Persistent-mapped buffer foot-guns (chosen NOT to use in N.5) | n/a | n/a | Decision 7 defers to N.6 |
| Per-instance highlight (selection blink) feature creep | Low | Scope grows | Decision 8 defers; field reserved in design doc |

---

## 10. Out of scope (explicitly)

The following are NOT N.5 work. They become possible follow-ons.

- **WB's `TextureAtlasManager` adoption for atlas tier.** N.5 keeps acdream's `TextureCache` as the texture owner for everything. Atlas adoption is N.6+ if memory pressure shows up.
- **Persistent-mapped buffer ring with sync fences.** Decision 7. N.6 candidate if profiling shows residual `glBufferData` cost.
- **GPU-side culling (compute pre-pass).** Future phase.
- **Texture array repacking for multi-layer per-instance composites.** Future, if many palette-overrides actually share dimensions and could be packed.
- **Selection-blink highlight color.** Decision 8. Phase B.4 follow-up. Field reserved in `InstanceData` design (extend stride to 80 bytes when implementing).
- **Deletion of legacy `InstancedMeshRenderer`.** N.6.
- **Terrain wiring through WB.** Future.

---

## 11. Open questions

None outstanding. All 8 brainstorm questions resolved + 1 clarification on highlight semantics. Ready for plan.

---

*End of design.*