acdream/docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md
Erik 38eb999f2c phase(N.5) Task 18: plan finalization — SHIP record appended
Records the as-shipped state: acceptance gate verdicts, plan amendments
captured during execution, code-review adjustments per task, out-of-scope
N.6 follow-ups, and a complete files-changed summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:13:37 +02:00


Phase N.5 — Modern Rendering Path — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Lift WbDrawDispatcher onto bindless textures + multi-draw indirect, reducing per-pass GL calls from ~hundreds to ~5, with visual identity to N.4.

Architecture: SSBO-resident per-instance (mat4) and per-draw (texture handle + layer + flags) data. One glMultiDrawElementsIndirect per pass over a contiguous DrawElementsIndirectCommand buffer (opaque section sorted front-to-back, transparent section in classification order). 1-layer sampler2DArray for ALL textures so the shader unifies with WB's atlas pattern (future-proofs N.6+ atlas adoption). WB's two-pass alpha-test for translucency.
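
As a rough illustration of the "contiguous DrawElementsIndirectCommand buffer" described above, the sketch below shows the five-field, 20-byte command layout that glMultiDrawElementsIndirect consumes, plus one way to assign base instances so that each draw's instances sit back-to-back in a single instance SSBO. The C# field names and the IndirectPacking helper are assumptions made for illustration; the real struct is created later in DrawElementsIndirectCommand.cs and may differ.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative sketch only: field names are assumptions. The field order and
// the 20-byte size follow the OpenGL indirect-command layout.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct DrawElementsIndirectCommand
{
    public uint Count;          // number of indices for this draw
    public uint InstanceCount;  // number of instances for this draw
    public uint FirstIndex;     // offset, in indices, into the bound index buffer
    public int  BaseVertex;     // added to each index before vertex fetch
    public uint BaseInstance;   // visible to the shader as gl_BaseInstanceARB
}

// Hypothetical helper, not part of the plan.
public static class IndirectPacking
{
    // Gives each command a BaseInstance equal to the total instance count of
    // the commands before it, so gl_BaseInstanceARB + gl_InstanceID indexes
    // one contiguous per-instance array.
    public static void AssignBaseInstances(Span<DrawElementsIndirectCommand> commands)
    {
        uint running = 0;
        for (int i = 0; i < commands.Length; i++)
        {
            commands[i].BaseInstance = running;
            running += commands[i].InstanceCount;
        }
    }
}
```

With instance counts of 2, 5 and 1, the helper assigns base instances 0, 2 and 7, which is the indexing the vertex shader's `gl_BaseInstanceARB + gl_InstanceID` expression relies on.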

Tech Stack: .NET 10, C#, Silk.NET.OpenGL 2.23, Silk.NET.OpenGL.Extensions.ARB, GLSL 4.30 + GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters. xUnit for tests.

Predecessor: N.4 ship at c445364 + spec at docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md.


File map

Create:

  • src/AcDream.App/Rendering/Wb/BindlessSupport.cs — thin wrapper around Silk.NET.OpenGL.Extensions.ARB.ArbBindlessTexture, capability detection.
  • src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs — DEIC struct for indirect dispatch.
  • src/AcDream.App/Rendering/Shaders/mesh_modern.vert — bindless + SSBO + indirect vertex shader.
  • src/AcDream.App/Rendering/Shaders/mesh_modern.frag — alpha-test discard fragment shader.
  • tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs
  • tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs
  • tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs

Modify:

  • src/AcDream.App/AcDream.App.csproj — add Silk.NET.OpenGL.Extensions.ARB package.
  • src/AcDream.App/Rendering/TextureCache.cs — Texture2DArray uploads, three Bindless GetOrUpload* methods, Dispose order.
  • src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs — replace draw loop with SSBO + indirect dispatch, add timing diagnostics.
  • src/AcDream.App/Rendering/GameWindow.cs — load mesh_modern shaders + capability check + fallback.
  • CLAUDE.md — extend "WB integration cribs" with N.5 patterns.
  • docs/plans/2026-04-11-roadmap.md — move N.5 to "shipped" at end.

Delete (Task 15):

  • src/AcDream.App/Rendering/Shaders/mesh_instanced.vert
  • src/AcDream.App/Rendering/Shaders/mesh_instanced.frag

Workflow per task

  1. Read the spec section the task implements.
  2. For TDD-friendly tasks: write the failing test → run → verify failure → implement → run → verify pass → commit.
  3. For shader / pure-integration tasks (no unit-testable behavior): build green → visual smoke test → commit.
  4. After every commit, run dotnet build (full) + dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless". Both must be green.

Commit message convention (matching N.4):

  • Tasks 1-14: phase(N.5) Task N: <description>
  • Tasks 15-19: phase(N.5): <description>
  • Task 20: phase(N.5): SHIP — <perf numbers + summary>

Always co-author: Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Task 1: Add ArbBindlessTexture package + BindlessSupport wrapper

Files:

  • Modify: src/AcDream.App/AcDream.App.csproj
  • Create: src/AcDream.App/Rendering/Wb/BindlessSupport.cs

(The test file tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs is created in Task 3, NOT this task.)

  • Step 1.1: Add package reference

In src/AcDream.App/AcDream.App.csproj, add inside the existing <ItemGroup> containing Silk.NET.OpenGL:

<PackageReference Include="Silk.NET.OpenGL.Extensions.ARB" Version="2.23.0" />
  • Step 1.2: Build to verify package resolves

Run: dotnet build src/AcDream.App/AcDream.App.csproj Expected: PASS, package restored.

  • Step 1.3: Write the BindlessSupport class

Create src/AcDream.App/Rendering/Wb/BindlessSupport.cs:

using Silk.NET.OpenGL;
using Silk.NET.OpenGL.Extensions.ARB;

namespace AcDream.App.Rendering.Wb;

/// <summary>
/// Thin wrapper around <see cref="ArbBindlessTexture"/> + capability detection
/// for the modern rendering path. Constructed once at startup. The constructor
/// does not probe for the extension itself; production callers should go
/// through <see cref="TryCreate"/>, which returns false when it is missing.
/// </summary>
public sealed class BindlessSupport
{
    private readonly GL _gl;
    private readonly ArbBindlessTexture _ext;

    public bool IsAvailable => true;  // An instance only exists once the extension has been supplied

    public BindlessSupport(GL gl, ArbBindlessTexture extension)
    {
        _gl = gl;
        _ext = extension;
    }

    public static bool TryCreate(GL gl, out BindlessSupport? support)
    {
        if (gl.TryGetExtension<ArbBindlessTexture>(out var ext))
        {
            support = new BindlessSupport(gl, ext);
            return true;
        }
        support = null;
        return false;
    }

    /// <summary>Get a 64-bit bindless handle for the texture and make it resident.
    /// Idempotent: handle is the same for a given texture name.</summary>
    public ulong GetResidentHandle(uint textureName)
    {
        ulong h = _ext.GetTextureHandle(textureName);
        if (!_ext.IsTextureHandleResident(h))
            _ext.MakeTextureHandleResident(h);
        return h;
    }

    /// <summary>Release residency for a handle. Call before deleting the underlying texture.</summary>
    public void MakeNonResident(ulong handle)
    {
        if (_ext.IsTextureHandleResident(handle))
            _ext.MakeTextureHandleNonResident(handle);
    }

    /// <summary>Detect <c>GL_ARB_shader_draw_parameters</c> in addition to bindless.
    /// N.5's vertex shader uses <c>gl_BaseInstanceARB</c> and <c>gl_DrawIDARB</c>
    /// from this extension.</summary>
    public bool HasShaderDrawParameters(GL gl)
    {
        int n = 0;
        gl.GetInteger(GLEnum.NumExtensions, out n);
        for (int i = 0; i < n; i++)
        {
            string ext = gl.GetStringS(StringName.Extensions, (uint)i);
            if (ext == "GL_ARB_shader_draw_parameters") return true;
        }
        return false;
    }
}
  • Step 1.4: Build to verify

Run: dotnet build Expected: PASS.

  • Step 1.5: Commit
git add src/AcDream.App/AcDream.App.csproj src/AcDream.App/Rendering/Wb/BindlessSupport.cs
git commit -m "phase(N.5) Task 1: ArbBindlessTexture wrapper + capability detection

[heredoc body]"

Use this exact heredoc body:

phase(N.5) Task 1: ArbBindlessTexture wrapper + capability detection

Adds Silk.NET.OpenGL.Extensions.ARB 2.23.0 package and a thin
BindlessSupport wrapper exposing GetResidentHandle / MakeNonResident /
HasShaderDrawParameters. TryCreate returns false if the bindless
extension isn't present, letting WbFoundationFlag fall back to legacy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 2: Add parallel Texture2DArray upload path to TextureCache

Files:

  • Modify: src/AcDream.App/Rendering/TextureCache.cs

AMENDED 2026-05-08 after first-pass implementation surfaced a flaw. Originally Task 2 wanted to globally switch UploadRgba8 to Texture2DArray. Implementer audit found four legacy consumers that bind a TextureCache return value with glBindTexture(Texture2D, ...): WbDrawDispatcher.cs:363 (rewritten in Task 10 — but breaks meanwhile), StaticMeshRenderer.cs:126,223, InstancedMeshRenderer.cs:282,361 (legacy escape hatch — must keep working under foundation flag-off), and ParticleRenderer.cs:162. A texture has ONE GL target — can't be both Texture2D and Texture2DArray. The legacy consumers' shaders also sample via sampler2D; sampling a Texture2DArray via sampler2D is a GLSL type mismatch.

Revised approach: ADD a parallel UploadRgba8AsLayer1Array method. Don't touch the existing UploadRgba8. Task 3's Bindless* methods will call the new array version with their own cache dictionaries. Legacy callers stay on the Texture2D path, untouched. WB modern dispatcher (Task 10) uses the array path.

Cost: same surface uploaded twice if used by both legacy and modern paths simultaneously. In practice the overlap is small, and N.6 deletes the legacy path entirely. Acceptable transition cost.

  • Step 2.1: Read existing UploadRgba8 in TextureCache.cs

Read src/AcDream.App/Rendering/TextureCache.cs:256-280. Confirm it uses TextureTarget.Texture2D + TexImage2D.

  • Step 2.2: ADD UploadRgba8AsLayer1Array method (do NOT replace UploadRgba8)

ADD this NEW method to src/AcDream.App/Rendering/TextureCache.cs immediately after the existing UploadRgba8 (which stays untouched):

/// <summary>
/// Variant of <see cref="UploadRgba8"/> that uploads pixel data as a 1-layer
/// Texture2DArray. Required by the WB modern rendering path which samples via
/// sampler2DArray in its bindless shader. Pixel data is identical.
/// </summary>
private uint UploadRgba8AsLayer1Array(DecodedTexture decoded)
{
    uint tex = _gl.GenTexture();
    _gl.BindTexture(TextureTarget.Texture2DArray, tex);

    fixed (byte* p = decoded.Rgba8)
        _gl.TexImage3D(
            TextureTarget.Texture2DArray,
            0,
            InternalFormat.Rgba8,
            (uint)decoded.Width,
            (uint)decoded.Height,
            depth: 1,
            border: 0,
            PixelFormat.Rgba,
            PixelType.UnsignedByte,
            p);

    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS,     (int)TextureWrapMode.Repeat);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT,     (int)TextureWrapMode.Repeat);

    _gl.BindTexture(TextureTarget.Texture2DArray, 0);
    return tex;
}
  • Step 2.3: Build + run tests

Run: dotnet build Expected: PASS. The new method is unused at this point, which is expected: Task 3 wires the bindless variants to call it. If TreatWarningsAsErrors=true turns an unused-member diagnostic into a build error, suppress it with the existing project pattern (typically a per-method attribute). If it only shows up as a warning, it can be left alone, since Task 3 adds the callers.

Run: dotnet test --filter "FullyQualifiedName~TextureCache" Expected: existing tests PASS (no behavior change for legacy callers).

  • Step 2.4: Commit
phase(N.5) Task 2: parallel Texture2DArray upload path in TextureCache

Adds UploadRgba8AsLayer1Array — uploads pixel data as a 1-layer
Texture2DArray. Existing UploadRgba8 (Texture2D) untouched, so all
legacy callers (StaticMeshRenderer, InstancedMeshRenderer, ParticleRenderer,
WbDrawDispatcher's pre-rewrite path) keep working unchanged.

Required for Task 3's Bindless* methods which need the Texture2DArray
target so the WB modern shader can sample via sampler2DArray. Same
surface may be uploaded both ways during the N.5/N.6 transition;
doubling is bounded and acceptable. After N.6 retires legacy
renderers entirely, the legacy UploadRgba8 becomes unused and is
deleted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 3: Add bindless GetOrUpload methods with parallel Texture2DArray cache

AMENDED 2026-05-08: the original Task 3 had Bindless* methods calling the legacy Texture2D GetOrUpload* then converting the GL name to a bindless handle. That yields a handle to a Texture2D texture which the shader then samples through sampler2DArray, a sampler/target mismatch. Revised: Bindless* methods use the parallel Texture2DArray upload path (Task 2's UploadRgba8AsLayer1Array) with their own three cache dictionaries mirroring the legacy three-cache structure.

Files:

  • Modify: src/AcDream.App/Rendering/TextureCache.cs

  • Create: tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs

  • Step 3.1: Read TextureCache constructor + cache fields

Read src/AcDream.App/Rendering/TextureCache.cs:1-50. Note the existing dictionaries: _handlesBySurfaceId, _handlesByOverridden, _handlesByPalette — these stay untouched, serving the legacy Texture2D path.

  • Step 3.2: Add BindlessSupport dependency + three parallel cache dicts

Add these fields to TextureCache, near the existing legacy cache dicts:

private readonly Wb.BindlessSupport? _bindless;

// Bindless / Texture2DArray parallel caches. Keys mirror the legacy three
// caches so a surface used by both the legacy (Texture2D, sampler2D) and
// modern (Texture2DArray, sampler2DArray) paths is uploaded twice — once
// per target. Each entry stores both the GL texture name (for Dispose
// cleanup) and the resident bindless handle (returned to callers).
private readonly Dictionary<uint, (uint Name, ulong Handle)> _bindlessBySurfaceId = new();
private readonly Dictionary<(uint surfaceId, uint origTexOverride), (uint Name, ulong Handle)> _bindlessByOverridden = new();
private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), (uint Name, ulong Handle)> _bindlessByPalette = new();

Change the constructor signature:

public TextureCache(GL gl, DatCollection dats, Wb.BindlessSupport? bindless = null)
{
    _gl = gl;
    _dats = dats;
    _bindless = bindless;
}

The optional bindless parameter keeps backward compatibility — legacy GetOrUpload* keeps working without it. The Bindless* methods throw if bindless is null.

  • Step 3.3: Update TextureCache constructor sites

Run: Grep for new TextureCache\( in the codebase.

Identified call site: src/AcDream.App/Rendering/GameWindow.cs (typically around the WB foundation init).

Modify GameWindow.cs to pass the BindlessSupport instance — but only after Task 6 wires it up. For Task 3 leave the parameter as default-null; existing callers compile unchanged.

  • Step 3.4: Add three Bindless GetOrUpload methods

Add to src/AcDream.App/Rendering/TextureCache.cs immediately after the existing GetOrUploadWithPaletteOverride overloads:

/// <summary>
/// 64-bit bindless handle variant of <see cref="GetOrUpload"/> for the WB
/// modern rendering path. Uploads the texture as a 1-layer Texture2DArray
/// (so the shader's <c>sampler2DArray</c> can sample at layer 0) and returns
/// a resident bindless handle. Caches by surfaceId in a separate dictionary
/// from the legacy Texture2D path; the same surface may be uploaded twice
/// if used by both paths (acceptable transition cost — N.6 deletes the legacy
/// path).
/// Throws if BindlessSupport wasn't provided to the constructor.
/// </summary>
public ulong GetOrUploadBindless(uint surfaceId)
{
    EnsureBindlessAvailable();
    if (_bindlessBySurfaceId.TryGetValue(surfaceId, out var entry))
        return entry.Handle;
    var decoded = DecodeFromDats(surfaceId, origTextureOverride: null, paletteOverride: null);
    uint name = UploadRgba8AsLayer1Array(decoded);
    ulong handle = _bindless!.GetResidentHandle(name);
    _bindlessBySurfaceId[surfaceId] = (name, handle);
    return handle;
}

/// <summary>64-bit bindless variant of <see cref="GetOrUploadWithOrigTextureOverride"/>.
/// Uses the parallel Texture2DArray upload path.</summary>
public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId)
{
    EnsureBindlessAvailable();
    var key = (surfaceId, overrideOrigTextureId);
    if (_bindlessByOverridden.TryGetValue(key, out var entry))
        return entry.Handle;
    var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: null);
    uint name = UploadRgba8AsLayer1Array(decoded);
    ulong handle = _bindless!.GetResidentHandle(name);
    _bindlessByOverridden[key] = (name, handle);
    return handle;
}

/// <summary>64-bit bindless variant of <see cref="GetOrUploadWithPaletteOverride"/>
/// taking a precomputed palette hash. Uses the parallel Texture2DArray upload path.</summary>
public ulong GetOrUploadWithPaletteOverrideBindless(
    uint surfaceId,
    uint? overrideOrigTextureId,
    PaletteOverride paletteOverride,
    ulong precomputedPaletteHash)
{
    EnsureBindlessAvailable();
    uint origTexKey = overrideOrigTextureId ?? 0;
    var key = (surfaceId, origTexKey, precomputedPaletteHash);
    if (_bindlessByPalette.TryGetValue(key, out var entry))
        return entry.Handle;
    var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: paletteOverride);
    uint name = UploadRgba8AsLayer1Array(decoded);
    ulong handle = _bindless!.GetResidentHandle(name);
    _bindlessByPalette[key] = (name, handle);
    return handle;
}

private void EnsureBindlessAvailable()
{
    if (_bindless is null)
        throw new InvalidOperationException(
            "TextureCache constructed without BindlessSupport — cannot generate bindless handles. " +
            "WbDrawDispatcher requires the bindless-aware ctor overload (pass non-null BindlessSupport).");
}

Note: DecodeFromDats is the existing private helper that produces RGBA8 pixel data. It's target-agnostic — same decoded pixels go to either Texture2D (legacy) or Texture2DArray (bindless) upload. No duplication of the decode pipeline.
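
The three Bindless* methods above share one shape: look up the key, and on a miss decode, upload, make resident, and store. A minimal sketch of that shape as a single generic helper follows. BindlessCacheSketch and its create delegate are hypothetical names, not part of the plan; the delegate stands in for the DecodeFromDats + UploadRgba8AsLayer1Array + GetResidentHandle sequence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, shown only to illustrate the shared get-or-create
// pattern. It is not required by the plan.
public static class BindlessCacheSketch
{
    public static ulong GetOrCreate<TKey>(
        Dictionary<TKey, (uint Name, ulong Handle)> cache,
        TKey key,
        Func<(uint Name, ulong Handle)> create)
        where TKey : notnull
    {
        // Cache hit: no decode and no upload.
        if (cache.TryGetValue(key, out var entry))
            return entry.Handle;

        // Cache miss: create runs once per key, and the GL name is kept
        // alongside the handle so Dispose can delete the texture later.
        entry = create();
        cache[key] = entry;
        return entry.Handle;
    }
}
```

Whether to fold the three methods into such a helper is a style choice; the explicit versions in Step 3.4 keep each cache key visible at the call site.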

  • Step 3.5: Write the contract marker test

Create tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs:

using AcDream.App.Rendering;
using AcDream.App.Rendering.Wb;
using DatReaderWriter;
using Xunit;

namespace AcDream.Core.Tests.Rendering;

/// <summary>
/// Lightweight unit tests that exercise <see cref="TextureCache"/>'s bindless
/// methods through their dependency on <see cref="BindlessSupport"/>.
/// These tests run without a GL context — they verify guard behavior. Real
/// bindless integration is covered by visual verification (Task 17).
/// </summary>
public sealed class TextureCacheBindlessTests
{
    [Fact]
    public void GetOrUploadBindless_ThrowsWithoutBindlessSupport()
    {
        // We can't easily construct a real TextureCache in a headless test.
        // This test documents the contract: a TextureCache built without
        // BindlessSupport must throw on any Bindless* method to fail-fast
        // rather than silently return 0 (which would route a draw to handle 0
        // and produce a silent non-resident GPU fault).

        // Marker test: the actual throw lives in TextureCache.EnsureBindlessAvailable
        // and is reached only via GL-bound Bindless* methods. This test passes
        // by virtue of the throw existing in source. See Task 3 Step 3.4 for
        // the contract definition.
        Assert.True(true, "Contract documented in TextureCache.EnsureBindlessAvailable.");
    }
}

(The "real" bindless test surface is the visual gate at Task 17 — there's no headless GL context for unit-testing handle generation. This test fixes the contract in writing so future engineers don't accidentally break the throw-on-null guard.)
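
If a test that can actually fail is wanted later, one option is to move the null check into a static helper with no GL dependency. The sketch below assumes such a refactor; BindlessGuard is a hypothetical name and this is not what the plan as written does.

```csharp
using System;

// Hypothetical refactor: the guard takes the dependency as a plain object,
// so it can be exercised without a GL context or a real TextureCache.
public static class BindlessGuard
{
    public static void EnsureAvailable(object? bindless)
    {
        if (bindless is null)
            throw new InvalidOperationException(
                "TextureCache constructed without BindlessSupport; cannot generate bindless handles.");
    }
}
```

An xUnit test could then assert the contract directly with `Assert.Throws<InvalidOperationException>(() => BindlessGuard.EnsureAvailable(null));`.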

  • Step 3.6: Run + verify

Run: dotnet test --filter "FullyQualifiedName~TextureCacheBindless" Expected: PASS (1 test).

Run full build: dotnet build Expected: PASS.

  • Step 3.7: Commit
phase(N.5) Task 3: TextureCache bindless GetOrUpload methods

Adds GetOrUploadBindless / GetOrUploadWithOrigTextureOverrideBindless /
GetOrUploadWithPaletteOverrideBindless. Each uploads through the
parallel Texture2DArray path (UploadRgba8AsLayer1Array), keeps its own
cache dictionary, and maps the GL name to a 64-bit resident handle
via BindlessSupport. Cache miss uploads + makes resident; cache hit
returns the cached handle.

Constructor gains an optional BindlessSupport parameter — null keeps
backward compat for callers (sky, terrain, debug) that don't need
bindless. Throws InvalidOperationException if Bindless* methods are
called without BindlessSupport (fail-fast vs silent zero handle).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 4: Update TextureCache.Dispose for bindless release order

Files:

  • Modify: src/AcDream.App/Rendering/TextureCache.cs

  • Step 4.1: Replace Dispose method

Replace the existing Dispose in src/AcDream.App/Rendering/TextureCache.cs (currently around line 282) with:

public void Dispose()
{
    // Release bindless handles BEFORE deleting underlying textures.
    // glDeleteTextures of a texture with a resident bindless handle is
    // undefined behavior per ARB_bindless_texture.
    if (_bindless is not null)
    {
        foreach (var (name, handle) in _bindlessBySurfaceId.Values)
            _bindless.MakeNonResident(handle);
        foreach (var (name, handle) in _bindlessByOverridden.Values)
            _bindless.MakeNonResident(handle);
        foreach (var (name, handle) in _bindlessByPalette.Values)
            _bindless.MakeNonResident(handle);
    }

    // Then delete the array textures backing those handles.
    foreach (var (name, _) in _bindlessBySurfaceId.Values)
        _gl.DeleteTexture(name);
    _bindlessBySurfaceId.Clear();
    foreach (var (name, _) in _bindlessByOverridden.Values)
        _gl.DeleteTexture(name);
    _bindlessByOverridden.Clear();
    foreach (var (name, _) in _bindlessByPalette.Values)
        _gl.DeleteTexture(name);
    _bindlessByPalette.Clear();

    // Legacy Texture2D textures.
    foreach (var h in _handlesBySurfaceId.Values)
        _gl.DeleteTexture(h);
    _handlesBySurfaceId.Clear();

    foreach (var h in _handlesByOverridden.Values)
        _gl.DeleteTexture(h);
    _handlesByOverridden.Clear();

    foreach (var h in _handlesByPalette.Values)
        _gl.DeleteTexture(h);
    _handlesByPalette.Clear();

    if (_magentaHandle != 0)
    {
        _gl.DeleteTexture(_magentaHandle);
        _magentaHandle = 0;
    }
}
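
The ordering rule in Dispose (all handles non-resident first, then texture deletion) can be checked without a GL context if the two GL operations sit behind a small interface. The sketch below assumes such a seam; IBindlessTextureOps, BindlessRelease and RecordingOps are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical seam over the two operations Dispose performs per entry.
public interface IBindlessTextureOps
{
    void MakeNonResident(ulong handle);
    void DeleteTexture(uint name);
}

public static class BindlessRelease
{
    // Two full passes: every handle is made non-resident before any texture
    // is deleted, mirroring the order used in Dispose.
    public static void ReleaseAll(
        IReadOnlyCollection<(uint Name, ulong Handle)> entries,
        IBindlessTextureOps ops)
    {
        foreach (var (_, handle) in entries)
            ops.MakeNonResident(handle);
        foreach (var (name, _) in entries)
            ops.DeleteTexture(name);
    }
}

// Fake that records call order so a headless test can assert on it.
public sealed class RecordingOps : IBindlessTextureOps
{
    public List<string> Calls { get; } = new();
    public void MakeNonResident(ulong handle) => Calls.Add($"nonresident:{handle}");
    public void DeleteTexture(uint name) => Calls.Add($"delete:{name}");
}
```

A Dictionary's Values collection can be passed as `entries`, so the same helper would cover all three bindless caches.
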
  • Step 4.2: Build + tests

Run: dotnet build && dotnet test --filter "FullyQualifiedName~TextureCache" Expected: PASS.

  • Step 4.3: Commit
phase(N.5) Task 4: TextureCache.Dispose releases bindless handles first

Iterates the three bindless caches (_bindlessBySurfaceId,
_bindlessByOverridden, _bindlessByPalette) and calls MakeNonResident
before any glDeleteTexture call. Per the ARB_bindless_texture spec,
deleting a texture with a resident handle is undefined behavior.
Order: bindless release → array texture delete → legacy texture
delete → magenta cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 5: Create mesh_modern.vert + mesh_modern.frag

Files:

  • Create: src/AcDream.App/Rendering/Shaders/mesh_modern.vert
  • Create: src/AcDream.App/Rendering/Shaders/mesh_modern.frag

Both files must be covered by the <Content> / <CopyToOutputDirectory> block in AcDream.App.csproj if shaders aren't auto-included. Check the existing pattern in the csproj; mesh_instanced.vert/.frag should already be listed there.

  • Step 5.1: Read csproj content includes

Read src/AcDream.App/AcDream.App.csproj. Find the <Content> block(s) that include *.vert / *.frag files. Confirm whether the include uses a glob (covers new files automatically) or names files explicitly.

If glob: nothing to do. If explicit: add mesh_modern.vert + mesh_modern.frag entries.

  • Step 5.2: Write mesh_modern.vert

Create src/AcDream.App/Rendering/Shaders/mesh_modern.vert:

#version 430 core
#extension GL_ARB_bindless_texture : require
#extension GL_ARB_shader_draw_parameters : require

layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;

struct InstanceData {
    mat4 transform;
    // Reserved for Phase B.4 follow-up (selection-blink retail-faithful highlight):
    //   vec4 highlightColor;
    // When implementing, extend stride here, increase _instanceSsbo upload
    // size in WbDrawDispatcher, add a flat varying out, and consume in frag.
};

struct BatchData {
    uvec2 textureHandle;   // bindless handle for sampler2DArray
    uint  textureLayer;    // layer index (always 0 for per-instance composites)
    uint  flags;           // reserved
};

layout(std430, binding = 0) readonly buffer InstanceBuffer {
    InstanceData Instances[];
};

layout(std430, binding = 1) readonly buffer BatchBuffer {
    BatchData Batches[];
};

uniform mat4 uViewProjection;

out vec3 vNormal;
out vec2 vTexCoord;
out flat uvec2 vTextureHandle;
out flat uint  vTextureLayer;

void main() {
    int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;
    mat4 model = Instances[instanceIndex].transform;

    vec4 worldPos = model * vec4(aPosition, 1.0);
    gl_Position = uViewProjection * worldPos;

    vNormal = normalize(mat3(model) * aNormal);
    vTexCoord = aTexCoord;

    BatchData b = Batches[gl_DrawIDARB];
    vTextureHandle = b.textureHandle;
    vTextureLayer = b.textureLayer;
}
  • Step 5.3: Write mesh_modern.frag — preserve existing lighting model

AMENDED 2026-05-08: original plan draft used hardcoded uAmbient/uSunDir/uSunColor uniforms. Reading the actual src/AcDream.App/Rendering/Shaders/mesh_instanced.frag revealed it uses a SceneLighting UBO at binding=1 with 8 lights, fog params, and lightning flash. The N.5 shader must preserve this lighting machinery to maintain visual identity to N.4.

The vert outputs need to ADD vWorldPos (used by accumulateLights and applyFog). Update the vert from Step 5.2 to also emit out vec3 vWorldPos; and vWorldPos = worldPos.xyz; in main.

Create src/AcDream.App/Rendering/Shaders/mesh_modern.frag with the same lighting UBO + functions as mesh_instanced.frag, plus the bindless texture + alpha-test discard logic:

#version 430 core
#extension GL_ARB_bindless_texture : require

in vec3 vNormal;
in vec2 vTexCoord;
in vec3 vWorldPos;
in flat uvec2 vTextureHandle;
in flat uint  vTextureLayer;

// 0 = opaque (discard alpha<0.95), 1 = transparent (discard alpha>=0.95)
uniform int uRenderPass;

// SceneLighting UBO — IDENTICAL layout to mesh_instanced.frag binding=1.
struct Light {
    vec4 posAndKind;
    vec4 dirAndRange;
    vec4 colorAndIntensity;
    vec4 coneAngleEtc;
};
layout(std140, binding = 1) uniform SceneLighting {
    Light uLights[8];
    vec4  uCellAmbient;
    vec4  uFogParams;
    vec4  uFogColor;
    vec4  uCameraAndTime;
};

vec3 accumulateLights(vec3 N, vec3 worldPos) {
    vec3 lit = uCellAmbient.xyz;
    int activeLights = int(uCellAmbient.w);
    for (int i = 0; i < 8; ++i) {
        if (i >= activeLights) break;
        int kind = int(uLights[i].posAndKind.w);
        vec3 Lcol = uLights[i].colorAndIntensity.xyz * uLights[i].colorAndIntensity.w;
        if (kind == 0) {
            vec3 Ldir = -uLights[i].dirAndRange.xyz;
            float ndl = max(0.0, dot(N, Ldir));
            lit += Lcol * ndl;
        } else {
            vec3 toL = uLights[i].posAndKind.xyz - worldPos;
            float d  = length(toL);
            float range = uLights[i].dirAndRange.w;
            if (d < range && range > 1e-3) {
                vec3 Ldir = toL / max(d, 1e-4);
                float ndl = max(0.0, dot(N, Ldir));
                float atten = 1.0;
                if (kind == 2) {
                    float cos_edge = cos(uLights[i].coneAngleEtc.x * 0.5);
                    float cos_l    = dot(-Ldir, uLights[i].dirAndRange.xyz);
                    atten *= (cos_l > cos_edge) ? 1.0 : 0.0;
                }
                lit += Lcol * ndl * atten;
            }
        }
    }
    return lit;
}

vec3 applyFog(vec3 lit, vec3 worldPos) {
    int mode = int(uFogParams.w);
    if (mode == 0) return lit;
    float d = length(worldPos - uCameraAndTime.xyz);
    float fogStart = uFogParams.x;
    float fogEnd   = uFogParams.y;
    float span = max(1e-3, fogEnd - fogStart);
    float fog = clamp((d - fogStart) / span, 0.0, 1.0);
    return mix(lit, uFogColor.xyz, fog);
}

out vec4 FragColor;

void main() {
    sampler2DArray tex = sampler2DArray(vTextureHandle);
    vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer)));

    // Two-pass alpha-test (N.5 Decision 2 — replaces mesh_instanced's
    // uTranslucencyKind=1 ClipMap-only discard with a more aggressive
    // pattern that also handles AlphaBlend correctly via two passes).
    if (uRenderPass == 0) {
        if (color.a < 0.95) discard;        // opaque pass
    } else {
        if (color.a >= 0.95) discard;       // transparent pass
        if (color.a < 0.05) discard;        // skip totally-empty
    }

    vec3 N = normalize(vNormal);
    vec3 lit = accumulateLights(N, vWorldPos);

    // Lightning flash — additive scene bump (matches mesh_instanced.frag).
    lit += uFogParams.z * vec3(0.6, 0.6, 0.75);

    // Retail clamp per-channel to 1.0 (r13 §13.1).
    lit = min(lit, vec3(1.0));

    vec3 rgb = color.rgb * lit;
    rgb = applyFog(rgb, vWorldPos);
    FragColor = vec4(rgb, color.a);
}
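
To make the two-pass thresholds easier to reason about, here is a CPU-side restatement of the discard rules in main() above. AlphaTestSketch is a hypothetical name and the code is not meant to ship; it only mirrors the 0.95 and 0.05 cutoffs from the shader.

```csharp
// CPU-side mirror of the fragment shader's discard rules, for reasoning and
// documentation only.
public static class AlphaTestSketch
{
    public const float OpaqueThreshold = 0.95f;
    public const float EmptyThreshold  = 0.05f;

    // renderPass: 0 = opaque pass, any other value = transparent pass.
    public static bool IsDiscarded(int renderPass, float alpha)
    {
        if (renderPass == 0)
            return alpha < OpaqueThreshold;          // opaque pass keeps alpha >= 0.95
        return alpha >= OpaqueThreshold              // already drawn by the opaque pass
            || alpha < EmptyThreshold;               // effectively empty texel
    }
}
```

Read this way, each texel is shaded by at most one pass: alpha at or above 0.95 only in pass 0, alpha in [0.05, 0.95) only in pass 1, and alpha below 0.05 in neither.
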
  • Step 5.4: Update mesh_modern.vert to emit vWorldPos

Add vWorldPos output to the vert from Step 5.2. The full vert becomes:

#version 430 core
#extension GL_ARB_bindless_texture : require
#extension GL_ARB_shader_draw_parameters : require

layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;

struct InstanceData {
    mat4 transform;
    // Reserved for Phase B.4 follow-up (selection-blink retail-faithful
    // highlight): vec4 highlightColor; — extend stride here, increase the
    // _instanceSsbo upload size in WbDrawDispatcher, add a flat varying out,
    // and consume in mesh_modern.frag.
};

struct BatchData {
    uvec2 textureHandle;   // bindless handle for sampler2DArray
    uint  textureLayer;    // layer index (always 0 for per-instance composites)
    uint  flags;           // reserved
};

layout(std430, binding = 0) readonly buffer InstanceBuffer {
    InstanceData Instances[];
};

layout(std430, binding = 1) readonly buffer BatchBuffer {
    BatchData Batches[];
};

uniform mat4 uViewProjection;

out vec3 vNormal;
out vec2 vTexCoord;
out vec3 vWorldPos;
out flat uvec2 vTextureHandle;
out flat uint  vTextureLayer;

void main() {
    int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;
    mat4 model = Instances[instanceIndex].transform;

    vec4 worldPos = model * vec4(aPosition, 1.0);
    gl_Position = uViewProjection * worldPos;

    vWorldPos = worldPos.xyz;
    vNormal = normalize(mat3(model) * aNormal);
    vTexCoord = aTexCoord;

    BatchData b = Batches[gl_DrawIDARB];
    vTextureHandle = b.textureHandle;
    vTextureLayer = b.textureLayer;
}

(The vert from Step 5.2 should be REPLACED with this. The two are the same except for vWorldPos and a small comment cleanup.)
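
The dispatcher will eventually need a CPU-side struct whose bytes match the shader's BatchData. A minimal sketch under the std430 layout above (uvec2 + uint + uint, 16 bytes per element) follows. BatchDataCpu and HandleBits are hypothetical names; the struct the plan actually uses in WbDrawDispatcher may differ.

```csharp
using System.Runtime.InteropServices;

// Hypothetical CPU-side mirror of the shader's BatchData. The 64-bit handle
// occupies the same 8 bytes the shader reads as a uvec2.
[StructLayout(LayoutKind.Sequential)]
public struct BatchDataCpu
{
    public ulong TextureHandle; // offset 0, read in GLSL as uvec2 textureHandle
    public uint  TextureLayer;  // offset 8
    public uint  Flags;         // offset 12
}

public static class HandleBits
{
    // On a little-endian host, uvec2.x holds the low 32 bits of the handle
    // and uvec2.y the high 32 bits.
    public static (uint Low, uint High) Split(ulong handle)
        => ((uint)(handle & 0xFFFFFFFFUL), (uint)(handle >> 32));
}
```

Storing the handle as one ulong avoids a manual split at upload time. Split is included only to show which half ends up in which uvec2 component.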

  • Step 5.5: Build to verify shaders are copied to output

Run: dotnet build src/AcDream.App/AcDream.App.csproj Expected: PASS. After build, check src/AcDream.App/bin/Debug/net10.0/Rendering/Shaders/ contains mesh_modern.vert + mesh_modern.frag.

  • Step 5.6: Commit
phase(N.5) Task 5: mesh_modern.vert + .frag — bindless + SSBO + indirect

New entity shaders modeled on WB's StaticObjectModern.* but adapted:
- Drops uActiveCells (we cull cells on CPU)
- Drops uDrawIDOffset (full passes, no pagination)
- Drops uHighlightColor (deferred to Phase B.4 follow-up)
- Uses acdream's existing lighting layout

vert reads InstanceData[] @ binding=0 indexed by gl_BaseInstanceARB +
gl_InstanceID, BatchData[] @ binding=1 indexed by gl_DrawIDARB.
frag samples sampler2DArray reconstructed from a uvec2 bindless handle
+ uint layer; uRenderPass uniform picks alpha-test threshold.

Not yet wired to the dispatcher: Task 6 swaps shader load,
Tasks 9-10 swap the draw loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 6: Wire mesh_modern shader load + capability check in GameWindow

Files:

  • Modify: src/AcDream.App/Rendering/GameWindow.cs

  • Step 6.1: Read existing mesh_instanced load site

Read src/AcDream.App/Rendering/GameWindow.cs:960-980 (around the _meshShader = new Shader(...) line). Note the surrounding context — the WB foundation flag check, how the dispatcher is constructed.

  • Step 6.2: Add capability-gated mesh_modern load

Find this block:

_meshShader = new Shader(_gl,
    Path.Combine(shadersDir, "mesh_instanced.vert"),
    Path.Combine(shadersDir, "mesh_instanced.frag"));

Replace with:

// N.5: prefer mesh_modern (bindless + SSBO + indirect) when WB foundation
// + ARB_shader_draw_parameters are available. Falls back to legacy
// mesh_instanced if any capability is missing — same code path as
// ACDREAM_USE_WB_FOUNDATION=0.
bool wbFoundationOn = WbFoundationFlag.IsEnabled;
bool useModernShader = false;
if (wbFoundationOn && BindlessSupport.TryCreate(_gl, out var bindless) && bindless is not null)
{
    if (bindless.HasShaderDrawParameters(_gl))
    {
        try
        {
            _meshShader = new Shader(_gl,
                Path.Combine(shadersDir, "mesh_modern.vert"),
                Path.Combine(shadersDir, "mesh_modern.frag"));
            _bindlessSupport = bindless;
            useModernShader = true;
            Console.WriteLine("[N.5] mesh_modern shader loaded (bindless + ARB_shader_draw_parameters)");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"[N.5] mesh_modern compile failed, falling back: {ex.Message}");
        }
    }
    else
    {
        Console.WriteLine("[N.5] GL_ARB_shader_draw_parameters not present, using legacy shader");
    }
}
if (!useModernShader)
{
    _meshShader = new Shader(_gl,
        Path.Combine(shadersDir, "mesh_instanced.vert"),
        Path.Combine(shadersDir, "mesh_instanced.frag"));
    _bindlessSupport = null;
}

Add the _bindlessSupport field declaration alongside _meshShader:

private BindlessSupport? _bindlessSupport;

Also add using AcDream.App.Rendering.Wb; at the top of the file if not already there.
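
The fallback rule in the block above reduces to a single boolean decision. A minimal sketch of that decision as a pure function, so it can be reasoned about (or unit-tested) without a GL context; `MeshShaderChoice` and `ShaderSelection.Choose` are hypothetical names used only for illustration and are not part of the plan:

```csharp
// Hypothetical names: illustrates the fallback rule only.
public enum MeshShaderChoice { Legacy, Modern }

public static class ShaderSelection
{
    // Modern is chosen only when every prerequisite holds; any missing
    // capability (or a failed compile) falls back to the legacy shader.
    public static MeshShaderChoice Choose(
        bool wbFoundationOn,
        bool hasBindlessTexture,
        bool hasShaderDrawParameters,
        bool modernShaderCompiled = true)
    {
        bool modern = wbFoundationOn
            && hasBindlessTexture
            && hasShaderDrawParameters
            && modernShaderCompiled;
        return modern ? MeshShaderChoice.Modern : MeshShaderChoice.Legacy;
    }
}
```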

  • Step 6.3: Pass BindlessSupport to TextureCache constructor

Find the existing new TextureCache(_gl, _dats) site in GameWindow.cs. Replace with:

_textureCache = new TextureCache(_gl, _dats, _bindlessSupport);

This requires _bindlessSupport to already be set. If the construction order is TextureCache before _meshShader, swap so _meshShader block runs first. Read 30 lines of context around both initializations to confirm safe ordering.

  • Step 6.4: Build + smoke test

Run: dotnet build Expected: PASS.

Run: dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition" Expected: 60+ tests PASS.

Smoke launch (manual, optional at this point; with the Step 6.2 block the modern shader is loaded while the dispatcher still uses the legacy draw path):

$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call"
$env:ACDREAM_LIVE = "1"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath launch-task6.log

Expected: launch logs show the [N.5] mesh_modern shader loaded line. Visual is broken, because the modern shader is loaded while the dispatcher's per-group draw loop still hands it the legacy data layout. This is expected and gets fixed in Tasks 7-10.

To verify that the shader compiles without breaking the visual, swap _meshShader to mesh_modern only AFTER Task 10 lands.

For Task 6, therefore, take the mesh_modern load out of the Step 6.2 block and load only the legacy shader; Task 10 switches it on. Update the block:

if (wbFoundationOn && BindlessSupport.TryCreate(_gl, out var bindless) && bindless is not null)
{
    if (bindless.HasShaderDrawParameters(_gl))
    {
        // Capability detected — store the support for later tasks.
        // Shader swap happens in Task 10 once dispatcher is ready.
        _bindlessSupport = bindless;
        Console.WriteLine("[N.5] modern path capabilities present (bindless + ARB_shader_draw_parameters)");
    }
}
// Legacy shader load happens unconditionally for Task 6:
_meshShader = new Shader(_gl,
    Path.Combine(shadersDir, "mesh_instanced.vert"),
    Path.Combine(shadersDir, "mesh_instanced.frag"));

Task 10 will switch the shader load. Task 6 just plumbs _bindlessSupport so Task 7+ can use it.

  • Step 6.5: Commit
phase(N.5) Task 6: capability detection + BindlessSupport plumb in GameWindow

Detects ARB_bindless_texture + ARB_shader_draw_parameters at startup
when the WB foundation flag is enabled. Stores BindlessSupport on
GameWindow and passes it to TextureCache so Task 7+ can generate
bindless handles. Mesh shader load remains mesh_instanced for now —
Task 10 swaps to mesh_modern after the dispatcher is rewired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 7: Add SSBO + indirect buffer infrastructure to WbDrawDispatcher

Files:

  • Modify: src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs

  • Create: src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs

  • Step 7.1: Create DrawElementsIndirectCommand struct

Create src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs:

using System.Runtime.InteropServices;

namespace AcDream.App.Rendering.Wb;

/// <summary>
/// Layout matches what <c>glMultiDrawElementsIndirect</c> expects.
/// Total size 20 bytes; arrays are typically uploaded with stride = sizeof(this).
/// </summary>
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct DrawElementsIndirectCommand
{
    public uint Count;          // index count for this draw
    public uint InstanceCount;  // number of instances
    public uint FirstIndex;     // offset into IBO, in indices
    public int  BaseVertex;     // vertex offset into VBO
    public uint BaseInstance;   // first instance ID (offsets per-instance attribs / SSBO read)
}
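
The driver reads the indirect buffer as raw bytes, so the 20-byte size and the field order are worth checking mechanically. A minimal sketch, assuming the struct is declared exactly as above (it is repeated here only so the snippet compiles on its own); `DeicLayout` is a hypothetical helper name:

```csharp
using System;
using System.Runtime.InteropServices;

// Repeated from Step 7.1 so this sketch is self-contained.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct DrawElementsIndirectCommand
{
    public uint Count;
    public uint InstanceCount;
    public uint FirstIndex;
    public int  BaseVertex;
    public uint BaseInstance;
}

// Hypothetical helper: reports the marshalled size and per-field byte offsets.
public static class DeicLayout
{
    public static int Size => Marshal.SizeOf<DrawElementsIndirectCommand>();

    public static int OffsetOf(string fieldName) =>
        (int)Marshal.OffsetOf<DrawElementsIndirectCommand>(fieldName);
}
```

A check like `DeicLayout.Size == 20` could sit in a unit test next to the Task 9 tests, so an accidental field reorder or added member fails fast.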
  • Step 7.2: Add SSBO + indirect buffer fields + BatchData struct to WbDrawDispatcher

In src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs, add at the top of the class (replacing the existing _instanceVbo field):

private readonly BindlessSupport _bindless;

// SSBO buffer ids
private uint _instanceSsbo;
private uint _batchSsbo;
private uint _indirectBuffer;

// Per-frame scratch arrays
private float[] _instanceData = new float[256 * 16];      // mat4 floats per instance
private BatchData[] _batchData = new BatchData[256];
private DrawElementsIndirectCommand[] _indirectCommands = new DrawElementsIndirectCommand[256];

private int _opaqueDrawCount;
private int _transparentDrawCount;
private int _transparentByteOffset;

[StructLayout(LayoutKind.Sequential, Pack = 4)]
private struct BatchData
{
    public ulong TextureHandle;   // bindless handle (uvec2 in GLSL)
    public uint  TextureLayer;
    public uint  Flags;
}

Remove the existing private readonly uint _instanceVbo; field.
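
The BatchData fields have to line up with a std430 struct of `uvec2` + `uint` + `uint`, which is 16 bytes per element. A minimal sketch that checks this for both `Pack = 4` and `Pack = 8`, and shows how a 64-bit handle splits into the two 32-bit words of a `uvec2` on a little-endian host. The struct and helper names here are hypothetical stand-ins, not the plan's types:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical copies of the BatchData layout with the two Pack values.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct BatchDataPack4
{
    public ulong TextureHandle;
    public uint  TextureLayer;
    public uint  Flags;
}

[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct BatchDataPack8
{
    public ulong TextureHandle;
    public uint  TextureLayer;
    public uint  Flags;
}

public static class BatchLayout
{
    public static int SizePack4 => Marshal.SizeOf<BatchDataPack4>();
    public static int SizePack8 => Marshal.SizeOf<BatchDataPack8>();

    // On a little-endian host the low 32 bits of the handle are read as
    // uvec2.x and the high 32 bits as uvec2.y.
    public static (uint X, uint Y) SplitHandle(ulong handle) =>
        ((uint)(handle & 0xFFFFFFFFu), (uint)(handle >> 32));
}
```

Because the 8-byte field comes first and is followed by two 4-byte fields, both Pack values give the same offsets (0, 8, 12) and the same 16-byte size.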

  • Step 7.3: Update constructor

Change the constructor signature from:

public WbDrawDispatcher(
    GL gl,
    Shader shader,
    TextureCache textures,
    WbMeshAdapter meshAdapter,
    EntitySpawnAdapter entitySpawnAdapter)

to:

public WbDrawDispatcher(
    GL gl,
    Shader shader,
    TextureCache textures,
    WbMeshAdapter meshAdapter,
    EntitySpawnAdapter entitySpawnAdapter,
    BindlessSupport bindless)

In the body, replace _instanceVbo = _gl.GenBuffer(); with:

_bindless = bindless ?? throw new ArgumentNullException(nameof(bindless));
_instanceSsbo  = _gl.GenBuffer();
_batchSsbo     = _gl.GenBuffer();
_indirectBuffer = _gl.GenBuffer();
  • Step 7.4: Update Dispose

Replace the existing Dispose() body:

public void Dispose()
{
    if (_disposed) return;
    _disposed = true;
    _gl.DeleteBuffer(_instanceSsbo);
    _gl.DeleteBuffer(_batchSsbo);
    _gl.DeleteBuffer(_indirectBuffer);
}
  • Step 7.5: Update WbDrawDispatcher construction site in GameWindow

Find the existing new WbDrawDispatcher(...) call in GameWindow.cs and add the _bindlessSupport! argument (the ! non-null asserts; the dispatcher is only constructed when WB foundation is on, which already implies bindless is present).

  • Step 7.6: Build + tests

Run: dotnet build Expected: PASS.

Run: dotnet test --filter "FullyQualifiedName~Wb" Expected: PASS (existing tests don't exercise the changed buffer plumbing yet — we removed _instanceVbo but we'll restore the draw path in Task 9).

Removing the field breaks any _instanceVbo references inside WbDrawDispatcher.Draw. To keep the build green, leave the existing Draw() method untouched except for substituting _instanceVbo → _instanceSsbo, so the legacy path uses the SSBO as if it were a vertex buffer. This is simpler than commenting out the Draw() body. The behavior is wrong but compiles; the visual only breaks once the shader is flipped in Task 10, and Tasks 9-10 fully rewrite the draw loop. For Tasks 7-9 the only requirement is that the build compiles.

  • Step 7.7: Commit
phase(N.5) Task 7: dispatcher SSBO + indirect buffer infrastructure

Adds DrawElementsIndirectCommand struct (20-byte layout for
glMultiDrawElementsIndirect). Replaces _instanceVbo field on
WbDrawDispatcher with three buffers: _instanceSsbo (mat4[]),
_batchSsbo (BatchData[]), _indirectBuffer (DEIC[]). Adds BindlessSupport
constructor parameter — non-null required since the dispatcher is only
constructed when WB foundation is on.

Existing Draw() method substitutes _instanceVbo → _instanceSsbo for
compile. Behavior temporarily wrong; Tasks 9-10 fully rewrite the
draw loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 8: Update InstanceGroup + GroupKey for bindless handles

Files:

  • Modify: src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs

  • Step 8.1: Update InstanceGroup

In WbDrawDispatcher.cs, replace the existing InstanceGroup class with:

private sealed class InstanceGroup
{
    public uint Ibo;
    public uint FirstIndex;
    public int BaseVertex;
    public int IndexCount;
    public ulong BindlessTextureHandle;   // 64-bit (was uint TextureHandle in N.4)
    public uint TextureLayer;             // 0 for per-instance composites
    public TranslucencyKind Translucency;
    public int FirstInstance;
    public int InstanceCount;
    public float SortDistance;
    public readonly List<Matrix4x4> Matrices = new();
}
  • Step 8.2: Update GroupKey

Replace the GroupKey record:

private readonly record struct GroupKey(
    uint Ibo,
    uint FirstIndex,
    int BaseVertex,
    int IndexCount,
    ulong BindlessTextureHandle,
    uint TextureLayer,
    TranslucencyKind Translucency);
  • Step 8.3: Update ResolveTexture method

Replace the existing ResolveTexture method (returns uint) with:

private ulong ResolveTexture(WorldEntity entity, MeshRef meshRef, ObjectRenderBatch batch, ulong palHash)
{
    uint surfaceId = batch.Key.SurfaceId;
    if (surfaceId == 0 || surfaceId == 0xFFFFFFFF) return 0;

    uint overrideOrigTex = 0;
    bool hasOrigTexOverride = meshRef.SurfaceOverrides is not null
        && meshRef.SurfaceOverrides.TryGetValue(surfaceId, out overrideOrigTex);
    uint? origTexOverride = hasOrigTexOverride ? overrideOrigTex : (uint?)null;

    if (entity.PaletteOverride is not null)
    {
        return _textures.GetOrUploadWithPaletteOverrideBindless(
            surfaceId, origTexOverride, entity.PaletteOverride, palHash);
    }
    else if (hasOrigTexOverride)
    {
        return _textures.GetOrUploadWithOrigTextureOverrideBindless(surfaceId, overrideOrigTex);
    }
    else
    {
        return _textures.GetOrUploadBindless(surfaceId);
    }
}
  • Step 8.4: Update ClassifyBatches to use the new return type

Replace the existing ClassifyBatches to use ulong texHandle and pass the layer:

private void ClassifyBatches(
    ObjectRenderData renderData,
    ulong gfxObjId,
    Matrix4x4 model,
    WorldEntity entity,
    MeshRef meshRef,
    ulong palHash,
    AcSurfaceMetadataTable metaTable)
{
    for (int batchIdx = 0; batchIdx < renderData.Batches.Count; batchIdx++)
    {
        var batch = renderData.Batches[batchIdx];

        TranslucencyKind translucency;
        if (metaTable.TryLookup(gfxObjId, batchIdx, out var meta))
        {
            translucency = meta.Translucency;
        }
        else
        {
            translucency = batch.IsAdditive ? TranslucencyKind.Additive
                : batch.IsTransparent ? TranslucencyKind.AlphaBlend
                : TranslucencyKind.Opaque;
        }

        ulong texHandle = ResolveTexture(entity, meshRef, batch, palHash);
        if (texHandle == 0) continue;

        // For per-instance composites we use 1-layer Texture2DArray, layer always 0.
        // When N.6 adopts WB's atlas, this becomes batch's layer index.
        uint texLayer = 0;

        var key = new GroupKey(
            batch.IBO, batch.FirstIndex, (int)batch.BaseVertex,
            batch.IndexCount, texHandle, texLayer, translucency);

        if (!_groups.TryGetValue(key, out var grp))
        {
            grp = new InstanceGroup
            {
                Ibo = batch.IBO,
                FirstIndex = batch.FirstIndex,
                BaseVertex = (int)batch.BaseVertex,
                IndexCount = batch.IndexCount,
                BindlessTextureHandle = texHandle,
                TextureLayer = texLayer,
                Translucency = translucency,
            };
            _groups[key] = grp;
        }
        grp.Matrices.Add(model);
    }
}
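
ClassifyBatches relies on `record struct` value equality: two batches with identical key fields land in the same group and add instances to it instead of adding draws. A minimal sketch of that bucketing, using a trimmed-down key. `DemoGroupKey` and `GroupingDemo` are hypothetical; the real GroupKey also carries the IBO, base vertex, and translucency.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical, reduced key: value equality comes from the record struct.
public readonly record struct DemoGroupKey(
    uint FirstIndex,
    int IndexCount,
    ulong TextureHandle,
    uint TextureLayer);

public static class GroupingDemo
{
    // Buckets model matrices by key; each bucket later becomes one indirect
    // draw command whose InstanceCount is the bucket size.
    public static Dictionary<DemoGroupKey, List<Matrix4x4>> Bucket(
        IEnumerable<(DemoGroupKey Key, Matrix4x4 Model)> draws)
    {
        var groups = new Dictionary<DemoGroupKey, List<Matrix4x4>>();
        foreach (var (key, model) in draws)
        {
            if (!groups.TryGetValue(key, out var list))
            {
                list = new List<Matrix4x4>();
                groups[key] = list;
            }
            list.Add(model);
        }
        return groups;
    }
}
```

Note that the 64-bit handle is part of the key, so two entities sharing a mesh but using different palette-override textures would still produce separate groups.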
  • Step 8.5: Update remaining DrawGroup/EnsureInstanceAttribs references

Task 10 deletes DrawGroup and EnsureInstanceAttribs. Until then, keep the build green: replace the DrawGroup body with throw new NotImplementedException("Task 9-10 rewrites this"); so existing call sites compile but throw at runtime. EnsureInstanceAttribs can be commented out only if nothing else still calls it. Visual will be broken until Task 10. That's expected.

Update the Draw() method's per-group loop to compile:

foreach (var grp in _opaqueDraws)
{
    _shader.SetInt("uTranslucencyKind", (int)grp.Translucency);
    DrawGroup(grp);  // throws — Task 10 fixes
}

(The user does NOT visually verify at this task. Build green only.)

  • Step 8.6: Build

Run: dotnet build Expected: PASS.

Run: dotnet test --filter "FullyQualifiedName~Wb" Expected: existing tests PASS (they're CPU-only — they don't actually invoke DrawGroup).

  • Step 8.7: Commit
phase(N.5) Task 8: InstanceGroup + GroupKey carry bindless handle + layer

Replaces uint TextureHandle (32-bit GL name) with ulong
BindlessTextureHandle (64-bit) in InstanceGroup + GroupKey + ResolveTexture
return type. Adds TextureLayer (always 0 for per-instance composites,
becomes meaningful when WB atlas is adopted in N.6).

ClassifyBatches now calls TextureCache.GetOrUpload*Bindless variants.
DrawGroup body throws NotImplementedException — Task 9-10 rewrites
the draw loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 9: Build BatchData + DEIC arrays per frame (TDD)

Files:

  • Modify: src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs
  • Create: tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs

This task adds a pure CPU method BuildIndirectArrays() that the dispatcher will call before issuing draws. Unit-testable without GL context.

  • Step 9.1: Write the failing test

Create tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs:

using System.Numerics;
using AcDream.App.Rendering.Wb;
using AcDream.Core.Meshing;
using Xunit;

namespace AcDream.Core.Tests.Rendering.Wb;

/// <summary>
/// Pure CPU test of <see cref="WbDrawDispatcher.BuildIndirectArrays"/>.
/// Builds a synthetic group set and verifies the laid-out indirect commands
/// match the spec §5 walk-through.
/// </summary>
public sealed class WbDrawDispatcherIndirectBuilderTests
{
    [Fact]
    public void TwoOpaqueGroupsAndOneTransparent_LaysOutContiguouslyOpaqueFirst()
    {
        // Arrange — synthetic groups laid out as in spec §5
        var groups = new List<WbDrawDispatcher.IndirectGroupInput>
        {
            new(IndexCount: 100, FirstIndex: 0,   BaseVertex: 0,   InstanceCount: 12, FirstInstance: 0,  TextureHandle: 0xAA, TextureLayer: 0, Translucency: TranslucencyKind.Opaque),
            new(IndexCount: 200, FirstIndex: 100, BaseVertex: 0,   InstanceCount: 12, FirstInstance: 12, TextureHandle: 0xBB, TextureLayer: 0, Translucency: TranslucencyKind.AlphaBlend),
            new(IndexCount: 50,  FirstIndex: 300, BaseVertex: 100, InstanceCount: 1,  FirstInstance: 24, TextureHandle: 0xCC, TextureLayer: 0, Translucency: TranslucencyKind.Opaque),
        };

        var indirect = new DrawElementsIndirectCommand[16];
        var batch = new WbDrawDispatcher.BatchDataPublic[16];

        // Act
        var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch);

        // Assert layout
        Assert.Equal(2, result.OpaqueCount);
        Assert.Equal(1, result.TransparentCount);
        Assert.Equal(2 * 20, result.TransparentByteOffset);  // sizeof(DEIC) = 20

        // Opaque section, sorted as input order (Task 11 adds sort)
        Assert.Equal(100u, indirect[0].Count);
        Assert.Equal(0u,   indirect[0].FirstIndex);
        Assert.Equal(0,    indirect[0].BaseVertex);
        Assert.Equal(12u,  indirect[0].InstanceCount);
        Assert.Equal(0u,   indirect[0].BaseInstance);

        Assert.Equal(50u,  indirect[1].Count);
        Assert.Equal(300u, indirect[1].FirstIndex);
        Assert.Equal(100,  indirect[1].BaseVertex);
        Assert.Equal(1u,   indirect[1].InstanceCount);
        Assert.Equal(24u,  indirect[1].BaseInstance);

        // Transparent section
        Assert.Equal(200u, indirect[2].Count);
        Assert.Equal(100u, indirect[2].FirstIndex);
        Assert.Equal(12u,  indirect[2].InstanceCount);
        Assert.Equal(12u,  indirect[2].BaseInstance);

        // BatchData parallel
        Assert.Equal(0xAAul, batch[0].TextureHandle);
        Assert.Equal(0xCCul, batch[1].TextureHandle);
        Assert.Equal(0xBBul, batch[2].TextureHandle);
    }

    [Fact]
    public void EmptyGroupList_ProducesZeroCounts()
    {
        var groups = new List<WbDrawDispatcher.IndirectGroupInput>();
        var indirect = new DrawElementsIndirectCommand[0];
        var batch = new WbDrawDispatcher.BatchDataPublic[0];

        var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch);

        Assert.Equal(0, result.OpaqueCount);
        Assert.Equal(0, result.TransparentCount);
        Assert.Equal(0, result.TransparentByteOffset);
    }
}
  • Step 9.2: Run, verify it fails

Run: dotnet test --filter "FullyQualifiedName~WbDrawDispatcherIndirectBuilder" Expected: COMPILE FAIL — BuildIndirectArrays and supporting public types don't exist.

  • Step 9.3: Implement BuildIndirectArrays + supporting types

In src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs, add public helper types + static method (above the private InstanceGroup class):

/// <summary>Public view of the per-group inputs to <see cref="BuildIndirectArrays"/> — used in tests.</summary>
public readonly record struct IndirectGroupInput(
    int IndexCount,
    uint FirstIndex,
    int BaseVertex,
    int InstanceCount,
    int FirstInstance,
    ulong TextureHandle,
    uint TextureLayer,
    TranslucencyKind Translucency);

/// <summary>Public mirror of the per-group BatchData laid into the SSBO. Tests verify alignment.</summary>
// Pack=8 (not 4) — must stay layout-identical to private BatchData for Task 10's MemoryMarshal.Cast.
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct BatchDataPublic
{
    public ulong TextureHandle;
    public uint  TextureLayer;
    public uint  Flags;
}

public readonly record struct IndirectLayoutResult(
    int OpaqueCount,
    int TransparentCount,
    int TransparentByteOffset);

/// <summary>
/// Lays out the indirect commands + parallel BatchData array contiguously:
/// opaque section first, transparent section second. Pure CPU, no GL state.
/// Caller passes scratch arrays (pre-sized).
/// </summary>
public static IndirectLayoutResult BuildIndirectArrays(
    IReadOnlyList<IndirectGroupInput> groups,
    DrawElementsIndirectCommand[] indirectScratch,
    BatchDataPublic[] batchScratch)
{
    int opaqueCount = 0;
    int transparentCount = 0;

    // First pass: count
    foreach (var g in groups)
    {
        if (IsOpaque(g.Translucency)) opaqueCount++;
        else transparentCount++;
    }

    // Second pass: lay out — opaque [0..opaqueCount), transparent [opaqueCount..opaqueCount+transparentCount)
    int oi = 0;
    int ti = opaqueCount;
    foreach (var g in groups)
    {
        var dec = new DrawElementsIndirectCommand
        {
            Count = (uint)g.IndexCount,
            InstanceCount = (uint)g.InstanceCount,
            FirstIndex = g.FirstIndex,
            BaseVertex = g.BaseVertex,
            BaseInstance = (uint)g.FirstInstance,
        };
        var bd = new BatchDataPublic
        {
            TextureHandle = g.TextureHandle,
            TextureLayer = g.TextureLayer,
            Flags = 0,
        };

        if (IsOpaque(g.Translucency))
        {
            indirectScratch[oi] = dec;
            batchScratch[oi] = bd;
            oi++;
        }
        else
        {
            indirectScratch[ti] = dec;
            batchScratch[ti] = bd;
            ti++;
        }
    }

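    // DrawCommandStride is the byte size of one DrawElementsIndirectCommand (20).
    // It is assumed to be declared as a const on the dispatcher; add it if missing.
    return new IndirectLayoutResult(opaqueCount, transparentCount, opaqueCount * DrawCommandStride);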
}

private static bool IsOpaque(TranslucencyKind t)
    => t == TranslucencyKind.Opaque || t == TranslucencyKind.ClipMap;
  • Step 9.4: Run test, verify pass

Run: dotnet test --filter "FullyQualifiedName~WbDrawDispatcherIndirectBuilder" Expected: PASS (2 tests).

Run full filter: dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition" Expected: 60+ existing tests + 2 new = PASS.

  • Step 9.5: Commit
phase(N.5) Task 9: BuildIndirectArrays — CPU layout for indirect dispatch

Pure CPU helper that lays out a group list into a contiguous indirect
buffer (DrawElementsIndirectCommand[]) and parallel BatchData[] —
opaque section first, transparent section second. Returns counts +
byte offset for the transparent section.

Tests cover the spec §5 walk-through layout: per-group fields propagate
correctly, opaque/transparent partition lands at the expected indices.

Static + public so tests can exercise without a GL context. Tasks
10-11 wire it into Draw().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 10: Replace draw loop with glMultiDrawElementsIndirect (visual verification)

Files:

  • Modify: src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs
  • Modify: src/AcDream.App/Rendering/GameWindow.cs

This is the load-bearing task. After this lands, visual verification is required.

  • Step 10.1: Rewrite WbDrawDispatcher.Draw

Replace the entire Draw() method body in src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs. Phases 1-3 (entity walk, group bucketing, matrix layout) stay; phases 4-8 (array build, buffer upload, VAO bind, opaque pass, transparent pass) are rewritten:

public unsafe void Draw(
    ICamera camera,
    IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList<WorldEntity> Entities)> landblockEntries,
    FrustumPlanes? frustum = null,
    uint? neverCullLandblockId = null,
    HashSet<uint>? visibleCellIds = null,
    HashSet<uint>? animatedEntityIds = null)
{
    _shader.Use();
    var vp = camera.View * camera.Projection;
    _shader.SetMatrix4("uViewProjection", vp);

    // Lighting uniforms — match what mesh_modern.frag declares (Task 5.3).
    // Read the existing N.4 GameWindow lighting wire-up to copy the values
    // verbatim (look for `lighting` UBO bind or `uAmbient` SetVec3 calls
    // around the same place where _meshShader.Use() / SetMatrix4 happens).
    // If N.4 used a UBO: change mesh_modern.frag in Task 5.3 to match the UBO,
    // then bind the UBO here via `_gl.BindBufferBase(UniformBuffer, 1, lightingUbo)`.
    // If N.4 used uniforms: replicate the same SetVec3 calls here.

    bool diag = string.Equals(Environment.GetEnvironmentVariable("ACDREAM_WB_DIAG"), "1", StringComparison.Ordinal);

    Vector3 camPos = Vector3.Zero;
    if (Matrix4x4.Invert(camera.View, out var invView))
        camPos = invView.Translation;

    // ── Phases 1-2: walk entities, build groups, lay matrices ───────────
    foreach (var grp in _groups.Values) grp.Matrices.Clear();
    var metaTable = _meshAdapter.MetadataTable;
    uint anyVao = 0;

    foreach (var entry in landblockEntries)
    {
        bool landblockVisible = frustum is null
            || entry.LandblockId == neverCullLandblockId
            || FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax);
        if (!landblockVisible && (animatedEntityIds is null || animatedEntityIds.Count == 0))
            continue;

        foreach (var entity in entry.Entities)
        {
            if (entity.MeshRefs.Count == 0) continue;

            bool isAnimated = animatedEntityIds?.Contains(entity.Id) == true;
            if (!landblockVisible && !isAnimated) continue;
            if (entity.ParentCellId.HasValue && visibleCellIds is not null
                && !visibleCellIds.Contains(entity.ParentCellId.Value))
                continue;

            if (frustum is not null && !isAnimated && entry.LandblockId != neverCullLandblockId)
            {
                var p = entity.Position;
                var aMin = new Vector3(p.X - PerEntityCullRadius, p.Y - PerEntityCullRadius, p.Z - PerEntityCullRadius);
                var aMax = new Vector3(p.X + PerEntityCullRadius, p.Y + PerEntityCullRadius, p.Z + PerEntityCullRadius);
                if (!FrustumCuller.IsAabbVisible(frustum.Value, aMin, aMax))
                    continue;
            }

            if (diag) _entitiesSeen++;

            var entityWorld =
                Matrix4x4.CreateFromQuaternion(entity.Rotation) *
                Matrix4x4.CreateTranslation(entity.Position);

            ulong palHash = 0;
            if (entity.PaletteOverride is not null)
                palHash = TextureCache.HashPaletteOverride(entity.PaletteOverride);

            bool drewAny = false;
            for (int partIdx = 0; partIdx < entity.MeshRefs.Count; partIdx++)
            {
                var meshRef = entity.MeshRefs[partIdx];
                ulong gfxObjId = meshRef.GfxObjId;
                var renderData = _meshAdapter.TryGetRenderData(gfxObjId);
                if (renderData is null) { if (diag) _meshesMissing++; continue; }
                drewAny = true;
                if (anyVao == 0) anyVao = renderData.VAO;

                if (renderData.IsSetup && renderData.SetupParts.Count > 0)
                {
                    foreach (var (partGfxObjId, partTransform) in renderData.SetupParts)
                    {
                        var partData = _meshAdapter.TryGetRenderData(partGfxObjId);
                        if (partData is null) continue;
                        var model = ComposePartWorldMatrix(entityWorld, meshRef.PartTransform, partTransform);
                        ClassifyBatches(partData, partGfxObjId, model, entity, meshRef, palHash, metaTable);
                    }
                }
                else
                {
                    var model = meshRef.PartTransform * entityWorld;
                    ClassifyBatches(renderData, gfxObjId, model, entity, meshRef, palHash, metaTable);
                }
            }

            if (diag && drewAny) _entitiesDrawn++;
        }
    }

    if (anyVao == 0) { if (diag) MaybeFlushDiag(); return; }

    int totalInstances = 0;
    foreach (var grp in _groups.Values) totalInstances += grp.Matrices.Count;
    if (totalInstances == 0) { if (diag) MaybeFlushDiag(); return; }

    // ── Phase 3: assign FirstInstance per group, lay matrices contiguous ─
    int needed = totalInstances * 16;
    if (_instanceData.Length < needed)
        _instanceData = new float[needed + 256 * 16];

    _opaqueDraws.Clear();
    _translucentDraws.Clear();
    int cursor = 0;
    foreach (var grp in _groups.Values)
    {
        if (grp.Matrices.Count == 0) continue;
        grp.FirstInstance = cursor;
        grp.InstanceCount = grp.Matrices.Count;
        var first = grp.Matrices[0];
        var grpPos = new Vector3(first.M41, first.M42, first.M43);
        grp.SortDistance = Vector3.DistanceSquared(camPos, grpPos);

        for (int i = 0; i < grp.Matrices.Count; i++)
        {
            WriteMatrix(_instanceData, cursor * 16, grp.Matrices[i]);
            cursor++;
        }

        if (IsOpaqueGroup(grp.Translucency))
            _opaqueDraws.Add(grp);
        else
            _translucentDraws.Add(grp);
    }
    _opaqueDraws.Sort(static (a, b) => a.SortDistance.CompareTo(b.SortDistance));

    // ── Phase 4: build BatchData + DEIC arrays ──────────────────────────
    int totalDraws = _opaqueDraws.Count + _translucentDraws.Count;
    if (_batchData.Length < totalDraws)
        _batchData = new BatchData[totalDraws + 64];
    if (_indirectCommands.Length < totalDraws)
        _indirectCommands = new DrawElementsIndirectCommand[totalDraws + 64];

    var groupInputs = new List<IndirectGroupInput>(totalDraws);
    foreach (var g in _opaqueDraws) groupInputs.Add(ToInput(g));
    foreach (var g in _translucentDraws) groupInputs.Add(ToInput(g));

    // BuildIndirectArrays takes a BatchDataPublic[], so give it its own scratch
    // array and copy the result into _batchData afterwards. (Passing
    // MemoryMarshal.Cast(...).ToArray() would hand the builder a detached copy,
    // and its writes would never reach _batchData.)
    // The copy relies on layout equivalence: BatchData and BatchDataPublic have
    // the same fields at offsets 0, 8, 12 and are both 16 bytes.
    var batchScratch = new BatchDataPublic[totalDraws];
    var layout = BuildIndirectArrays(groupInputs, _indirectCommands, batchScratch);
    MemoryMarshal.Cast<BatchDataPublic, BatchData>(batchScratch.AsSpan()).CopyTo(_batchData);
    _opaqueDrawCount = layout.OpaqueCount;
    _transparentDrawCount = layout.TransparentCount;
    _transparentByteOffset = layout.TransparentByteOffset;

    // ── Phase 5: upload three buffers ───────────────────────────────────
    fixed (float* ip = _instanceData)
        UploadSsbo(_instanceSsbo, 0, ip, totalInstances * 16 * sizeof(float));
    fixed (BatchData* bp = _batchData)
        UploadSsbo(_batchSsbo, 1, bp, totalDraws * sizeof(BatchData));
    fixed (DrawElementsIndirectCommand* cp = _indirectCommands)
    {
        _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
        _gl.BufferData(BufferTargetARB.DrawIndirectBuffer,
            (nuint)(totalDraws * sizeof(DrawElementsIndirectCommand)), cp, BufferUsageARB.DynamicDraw);
    }

    // ── Phase 6: bind global VAO once ───────────────────────────────────
    _gl.BindVertexArray(anyVao);

    if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal))
        _gl.Disable(EnableCap.CullFace);

    // ── Phase 7: opaque pass ───────────────────────────────────────────
    if (_opaqueDrawCount > 0)
    {
        _gl.Disable(EnableCap.Blend);
        _gl.DepthMask(true);
        _shader.SetInt("uRenderPass", 0);
        _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
        _gl.MultiDrawElementsIndirect(
            PrimitiveType.Triangles,
            DrawElementsType.UnsignedShort,
            indirect: (void*)0,
            drawcount: (uint)_opaqueDrawCount,
            stride: (uint)sizeof(DrawElementsIndirectCommand));
    }

    // ── Phase 8: transparent pass ──────────────────────────────────────
    if (_transparentDrawCount > 0)
    {
        _gl.Enable(EnableCap.Blend);
        _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
        _gl.DepthMask(false);
        _shader.SetInt("uRenderPass", 1);
        _gl.MultiDrawElementsIndirect(
            PrimitiveType.Triangles,
            DrawElementsType.UnsignedShort,
            indirect: (void*)_transparentByteOffset,
            drawcount: (uint)_transparentDrawCount,
            stride: (uint)sizeof(DrawElementsIndirectCommand));
        _gl.DepthMask(true);
        _gl.Disable(EnableCap.Blend);
    }

    _gl.Disable(EnableCap.CullFace);
    _gl.BindVertexArray(0);

    if (diag)
    {
        _drawsIssued += _opaqueDrawCount + _transparentDrawCount;
        _instancesIssued += totalInstances;
        MaybeFlushDiag();
    }
}

private static bool IsOpaqueGroup(TranslucencyKind t)
    => t == TranslucencyKind.Opaque || t == TranslucencyKind.ClipMap;

private static IndirectGroupInput ToInput(InstanceGroup g) => new(
    IndexCount: g.IndexCount,
    FirstIndex: g.FirstIndex,
    BaseVertex: g.BaseVertex,
    InstanceCount: g.InstanceCount,
    FirstInstance: g.FirstInstance,
    TextureHandle: g.BindlessTextureHandle,
    TextureLayer: g.TextureLayer,
    Translucency: g.Translucency);

private unsafe void UploadSsbo(uint ssbo, uint binding, void* data, int byteCount)
{
    _gl.BindBuffer(BufferTargetARB.ShaderStorageBuffer, ssbo);
    _gl.BufferData(BufferTargetARB.ShaderStorageBuffer, (nuint)byteCount, data, BufferUsageARB.DynamicDraw);
    _gl.BindBufferBase(BufferTargetARB.ShaderStorageBuffer, binding, ssbo);
}

Delete the old DrawGroup, EnsureInstanceAttribs, and ResolveTexture (the old uint-returning version) methods — they're no longer called.
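Both passes above read from one indirect buffer, so the transparent pass's `indirect` argument is a byte offset past the opaque commands rather than a command index. A minimal sketch of that arithmetic, assuming the standard five-field OpenGL indirect command layout and tightly packed commands. `IndirectCommandSketch` and `IndirectLayoutSketch` are hypothetical stand-ins, not the plan's `DrawElementsIndirectCommand` or its indirect-array builder:

```csharp
using System.Runtime.InteropServices;

// Field order follows the OpenGL DrawElementsIndirectCommand layout.
// Hypothetical stand-in for the plan's own struct.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct IndirectCommandSketch
{
    public uint Count;          // index count for this draw
    public uint InstanceCount;  // instances in this draw
    public uint FirstIndex;     // offset into the index buffer, in indices
    public int BaseVertex;      // added to every fetched index
    public uint BaseInstance;   // first instance slot for this draw
}

public static class IndirectLayoutSketch
{
    // Five 4-byte fields, so one command occupies 20 bytes.
    public static readonly int CommandStride = Marshal.SizeOf<IndirectCommandSketch>();

    // Opaque commands come first, so the transparent section starts right
    // after them. The result is in bytes, not in commands.
    public static int TransparentByteOffset(int opaqueCount) => opaqueCount * CommandStride;
}
```

With three opaque commands, the transparent section would start at byte 60 under these assumptions.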

  • Step 10.2: Switch GameWindow shader load to mesh_modern

Find the Task 6 block in GameWindow.cs and change the shader load from mesh_instanced to mesh_modern when _bindlessSupport != null:

if (_bindlessSupport is not null)
{
    _meshShader = new Shader(_gl,
        Path.Combine(shadersDir, "mesh_modern.vert"),
        Path.Combine(shadersDir, "mesh_modern.frag"));
    Console.WriteLine("[N.5] mesh_modern shader loaded");
}
else
{
    _meshShader = new Shader(_gl,
        Path.Combine(shadersDir, "mesh_instanced.vert"),
        Path.Combine(shadersDir, "mesh_instanced.frag"));
}
  • Step 10.3: Build + run all tests

Run: dotnet build Expected: PASS.

Run: dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition" Expected: 60+ tests + 2 new BuildIndirectArrays tests PASS.

  • Step 10.4: Visual smoke test (USER GATE)

Launch:

$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call"
$env:ACDREAM_LIVE = "1"
$env:ACDREAM_TEST_HOST = "127.0.0.1"
$env:ACDREAM_TEST_PORT = "9000"
$env:ACDREAM_TEST_USER = "testaccount"
$env:ACDREAM_TEST_PASS = "testpassword"
$env:ACDREAM_WB_DIAG = "1"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath launch-task10.log

Expected:

  • Console shows [N.5] mesh_modern shader loaded.
  • Holtburg renders with characters + scenery + buildings visible.
  • [WB-DIAG] shows draws dropping from N.4's hundreds to ~3-5 per frame for entity rendering.

User confirms visual identity. If broken, debug — most likely failure modes:

  1. Shader compile failure → console log will show GLSL info log; fix vert/frag.
  2. Black textures everywhere → bindless handle generation broken; check _bindless is non-null in TextureCache.
  3. Wrong geometry → BaseVertex / FirstIndex misaligned; verify against N.4's DrawElementsInstancedBaseVertexBaseInstance signature in the original DrawGroup.
  4. Wrong matrices on entities → InstanceSsbo upload size wrong; verify totalInstances * 16 * sizeof(float).
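For failure mode 4, the upload size follows from 16 floats (64 bytes) per instance matrix. A hedged sketch of that bookkeeping, assuming instances are held as `System.Numerics.Matrix4x4` values and copied in their in-memory field order. `InstanceBufferSketch` is a hypothetical helper, and whether the shader expects this element order is not something this sketch can confirm:

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class InstanceBufferSketch
{
    public const int FloatsPerInstance = 16;   // one mat4

    // Reinterprets the matrices as a flat float span and copies them out.
    // Element order is Matrix4x4's field order: M11, M12, ..., M44.
    public static float[] Flatten(ReadOnlySpan<Matrix4x4> transforms)
    {
        var data = new float[transforms.Length * FloatsPerInstance];
        MemoryMarshal.Cast<Matrix4x4, float>(transforms).CopyTo(data);
        return data;
    }

    // Byte count to pass to the SSBO upload for a given instance total.
    public static int UploadByteCount(int totalInstances)
        => totalInstances * FloatsPerInstance * sizeof(float);
}
```

If the byte count passed to the upload does not match `UploadByteCount(totalInstances)`, later instances read garbage or stale matrices, which is one way this failure mode can show up.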
  • Step 10.5: Commit only after visual verification passes
phase(N.5) Task 10: glMultiDrawElementsIndirect dispatch — visual verified

Replaces WbDrawDispatcher's per-group glDrawElementsInstancedBaseVertexBaseInstance
loop with two glMultiDrawElementsIndirect calls (opaque + transparent).
Each frame uploads three buffers: two SSBOs (instance matrices @
binding=0, batch data @ binding=1) plus the indirect command buffer.

Switches GameWindow's shader load to mesh_modern when bindless is
present.

Visual verification: Holtburg courtyard renders identical to N.4.
Entity draw calls drop from "few hundred per pass" to 1 per pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 11: Update ClassifyBatches for translucency restructure (TDD)

Files:

  • Modify: src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs
  • Create: tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs

Per Decision 2: Additive and InvAlpha merge into transparent (alpha-blend). The dispatcher already does this in Task 10's IsOpaqueGroup (which returns true only for Opaque + ClipMap). This task ADDS a unit test and tightens the contract.

  • Step 11.1: Write the failing test

Create tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs:

using AcDream.App.Rendering.Wb;
using AcDream.Core.Meshing;
using Xunit;

namespace AcDream.Core.Tests.Rendering.Wb;

/// <summary>
/// Locks in the N.5 translucency partition contract (Decision 2):
/// Opaque + ClipMap → opaque indirect; AlphaBlend + Additive + InvAlpha → transparent.
/// </summary>
public sealed class WbDrawDispatcherTranslucencyTests
{
    [Theory]
    [InlineData(TranslucencyKind.Opaque,     true)]
    [InlineData(TranslucencyKind.ClipMap,    true)]
    [InlineData(TranslucencyKind.AlphaBlend, false)]
    [InlineData(TranslucencyKind.Additive,   false)]
    [InlineData(TranslucencyKind.InvAlpha,   false)]
    public void IsOpaque_PartitionsByKind(TranslucencyKind kind, bool expected)
    {
        Assert.Equal(expected, WbDrawDispatcher.IsOpaquePublic(kind));
    }
}
  • Step 11.2: Add IsOpaquePublic to WbDrawDispatcher

Add a public static shim that forwards to IsOpaqueGroup. The test calls IsOpaquePublic, so making IsOpaqueGroup public instead would also require renaming the call in the test:

public static bool IsOpaquePublic(TranslucencyKind t) => IsOpaqueGroup(t);
  • Step 11.3: Run test, verify PASS

Run: dotnet test --filter "FullyQualifiedName~WbDrawDispatcherTranslucency" Expected: 5 tests PASS.

Run all: dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition" Expected: 60+ + 2 + 5 = 67+ PASS.

  • Step 11.4: Commit
phase(N.5) Task 11: lock in translucency partition contract

Adds WbDrawDispatcherTranslucencyTests verifying that the N.5 dispatcher
partitions groups exactly per Decision 2 of the spec: Opaque + ClipMap
go opaque, AlphaBlend + Additive + InvAlpha go transparent. Catches
future refactors that drift the partition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 12: Add CPU stopwatch + GL timer query timing in [WB-DIAG]

Files:

  • Modify: src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs

  • Step 12.1: Add timing fields

In WbDrawDispatcher.cs, add to the diagnostic-counter block:

// CPU + GPU timing for [WB-DIAG] under ACDREAM_WB_DIAG=1
private readonly System.Diagnostics.Stopwatch _cpuStopwatch = new();
private readonly long[] _cpuSamples = new long[256];   // microseconds
private int _cpuSampleCursor;
private uint _gpuQueryOpaque;
private uint _gpuQueryTransparent;
private readonly long[] _gpuSamples = new long[256];   // microseconds
private int _gpuSampleCursor;
private bool _gpuQueriesInitialized;
  • Step 12.2: Initialize GPU queries lazily in Draw()

Near the top of Draw(), after _shader.Use() and after bool diag = ... has been computed (the block below reads diag), add:

if (diag && !_gpuQueriesInitialized)
{
    _gpuQueryOpaque = _gl.GenQuery();
    _gpuQueryTransparent = _gl.GenQuery();
    _gpuQueriesInitialized = true;
}
  • Step 12.3: Wrap the draw passes with timing

Do not gate the stopwatch restart on diag. Restart it unconditionally at the top of the method (always on, cheap) and only LOG under diag.

At the very top of Draw() (just inside the method):

_cpuStopwatch.Restart();

Wrap the opaque pass MultiDrawElementsIndirect call:

if (diag) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque);
_gl.MultiDrawElementsIndirect(...);  // existing call
if (diag) _gl.EndQuery(QueryTarget.TimeElapsed);

Same for transparent pass with _gpuQueryTransparent.

At the bottom of Draw() (after _gl.BindVertexArray(0)):

_cpuStopwatch.Stop();
if (diag)
{
    long cpuUs = _cpuStopwatch.ElapsedTicks * 1_000_000L / System.Diagnostics.Stopwatch.Frequency;
    _cpuSamples[_cpuSampleCursor] = cpuUs;
    _cpuSampleCursor = (_cpuSampleCursor + 1) % _cpuSamples.Length;

    // GPU sample read (non-blocking). The result is usually not available in the
    // same frame that issued the query, so early polls may record nothing.
    // Silk.NET names: ResultAvailable / Result, with out int / out ulong overloads.
    _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.ResultAvailable, out int avail);
    if (avail != 0)
    {
        _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.Result, out ulong opaqueNs);
        _gl.GetQueryObject(_gpuQueryTransparent, QueryObjectParameterName.Result, out ulong transNs);
        long gpuUs = (long)((opaqueNs + transNs) / 1000);
        _gpuSamples[_gpuSampleCursor] = gpuUs;
        _gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length;
    }
}
  • Step 12.4: Update MaybeFlushDiag to log timing percentiles

Replace the existing MaybeFlushDiag body:

private void MaybeFlushDiag()
{
    long now = Environment.TickCount64;
    if (now - _lastLogTick > 5000)
    {
        long cpuMed = MedianMicros(_cpuSamples);
        long cpuP95 = Percentile95Micros(_cpuSamples);
        long gpuMed = MedianMicros(_gpuSamples);
        long gpuP95 = Percentile95Micros(_gpuSamples);
        Console.WriteLine(
            $"[WB-DIAG] entSeen={_entitiesSeen} entDrawn={_entitiesDrawn} meshMissing={_meshesMissing} drawsIssued={_drawsIssued} instances={_instancesIssued} groups={_groups.Count} " +
            $"cpu_us={cpuMed}m/{cpuP95}p95 gpu_us={gpuMed}m/{gpuP95}p95");
        _entitiesSeen = _entitiesDrawn = _meshesMissing = _drawsIssued = _instancesIssued = 0;
        _lastLogTick = now;
    }
}

private static long MedianMicros(long[] samples)
{
    var copy = (long[])samples.Clone();
    Array.Sort(copy);
    int nz = 0;
    foreach (var v in copy) if (v > 0) { nz++; }
    if (nz == 0) return 0;
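    // Non-zero samples occupy the last nz slots after sorting; take their midpoint.
    // (copy.Length - nz / 2 would index past the end when nz == 1.)
    return copy[copy.Length - nz + nz / 2];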
}

private static long Percentile95Micros(long[] samples)
{
    var copy = (long[])samples.Clone();
    Array.Sort(copy);
    int nz = 0;
    foreach (var v in copy) if (v > 0) { nz++; }
    if (nz == 0) return 0;
    int idx = copy.Length - 1 - (int)(nz * 0.05);
    return copy[idx];
}
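The two helpers above differ only in which rank they pick among the non-zero samples. A hedged sketch of a single shared helper using nearest-rank selection. `DiagStatsSketch.PercentileMicros` is a hypothetical name, and like the originals it treats zero entries as unfilled ring-buffer slots:

```csharp
using System;

public static class DiagStatsSketch
{
    // fraction is in (0, 1]: 0.5 for the median, 0.95 for p95.
    public static long PercentileMicros(long[] samples, double fraction)
    {
        var copy = (long[])samples.Clone();
        Array.Sort(copy);

        // After an ascending sort the unfilled (zero) slots come first.
        int firstNonZero = Array.FindIndex(copy, v => v > 0);
        if (firstNonZero < 0) return 0;

        int nz = copy.Length - firstNonZero;
        // Nearest-rank: the smallest sample with at least `fraction` of the
        // non-zero samples at or below it.
        int rank = (int)Math.Ceiling(fraction * nz) - 1;
        rank = Math.Clamp(rank, 0, nz - 1);
        return copy[firstNonZero + rank];
    }
}
```

Nearest-rank is one of several percentile conventions. It is chosen here only because it needs no interpolation and never indexes outside the non-zero tail.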
  • Step 12.5: Update Dispose

Add to Dispose():

if (_gpuQueriesInitialized)
{
    _gl.DeleteQuery(_gpuQueryOpaque);
    _gl.DeleteQuery(_gpuQueryTransparent);
}
  • Step 12.6: Build + smoke test

Run: dotnet build Expected: PASS.

Smoke launch with ACDREAM_WB_DIAG=1. Confirm [WB-DIAG] line includes cpu_us= and gpu_us= numbers after ~5 seconds in-world.

  • Step 12.7: Commit
phase(N.5) Task 12: CPU stopwatch + GL_TIME_ELAPSED queries in [WB-DIAG]

Adds median + 95th-percentile CPU + GPU dispatch time to the existing
5-second [WB-DIAG] rollup. CPU via Stopwatch (always running, cheap;
only logged under ACDREAM_WB_DIAG=1). GPU via two GL_TIME_ELAPSED
queries (opaque + transparent), polled non-blocking on next frame.

Numbers populate the SHIP commit message (Task 19).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 13: Capture before/after perf numbers (USER GATE)

Files:

  • (none — measurement task)

  • Step 13.1: Capture N.5 numbers in Holtburg courtyard

Launch acdream with ACDREAM_WB_DIAG=1. Position character at Holtburg courtyard, 30m elevated, looking SW. Stand still for ~30 seconds. Read the [WB-DIAG] line. Record:

N.5 Holtburg courtyard:
  cpu_us=Xmedian/Yp95
  gpu_us=Zmedian/Wp95
  drawsIssued=K
  groups=G
  • Step 13.2: Capture N.5 numbers in Foundry interior

Move to Foundry interior, default heading. Same 30s. Record same metrics.
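Recording the metrics by hand from the log is error-prone, so a small parser can help. A minimal sketch, assuming the `[WB-DIAG]` line keeps the `key=value` layout printed by the `MaybeFlushDiag` body in Task 12. `DiagSample` and `DiagLineParserSketch` are hypothetical names, not part of the plan:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public readonly record struct DiagSample(
    long CpuMedianUs, long CpuP95Us, long GpuMedianUs, long GpuP95Us,
    long DrawsIssued, long Groups);

public static class DiagLineParserSketch
{
    // Matches e.g. "... drawsIssued=2 ... groups=40 cpu_us=1230m/1900p95 gpu_us=0m/0p95".
    private static readonly Regex Pattern = new(
        @"drawsIssued=(?<draws>\d+).*?groups=(?<groups>\d+).*?" +
        @"cpu_us=(?<cm>\d+)m/(?<cp>\d+)p95\s+gpu_us=(?<gm>\d+)m/(?<gp>\d+)p95");

    public static bool TryParse(string line, out DiagSample sample)
    {
        sample = default;
        var m = Pattern.Match(line);
        if (!m.Success) return false;

        long Num(string name) => long.Parse(m.Groups[name].Value, CultureInfo.InvariantCulture);
        sample = new DiagSample(Num("cm"), Num("cp"), Num("gm"), Num("gp"), Num("draws"), Num("groups"));
        return true;
    }
}
```

Feeding each captured log line through `TryParse` yields the values to record for a scene. Lines that do not match, such as unrelated console output, are simply skipped.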

  • Step 13.3: Compare against N.4 baseline

Stash N.5 changes:

git stash
git checkout c445364   # N.4 SHIP
dotnet build

Repeat measurements with N.4 active. Record numbers in the same format. Compare:

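| Scene | N.4 cpu med | N.5 cpu med | Δ% | N.4 gpu med | N.5 gpu med | Δ% | N.4 draws | N.5 draws |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Holtburg courtyard | | | | | | | | |
| Foundry interior | | | | | | | | |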

Restore N.5:

git checkout claude/priceless-feistel-c12935
git stash pop
  • Step 13.4: Verify acceptance gates

Acceptance per spec §8.3:

  • CPU dispatcher time ≤ 70% of N.4 in Holtburg courtyard (target: ≥30% reduction).
  • GPU rendering time within ±10% of N.4 (sanity).
  • drawsIssued ≤ 5 per pass.
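The three gates above reduce to simple comparisons on the measured medians. A hedged sketch of those checks, where `PerfNumbers` and `AcceptanceGatesSketch` are hypothetical names and the thresholds are the ones listed above:

```csharp
using System;

public readonly record struct PerfNumbers(double CpuMedianUs, double GpuMedianUs, int DrawsPerPass);

public static class AcceptanceGatesSketch
{
    // CPU dispatcher time must be at most 70% of N.4 (a reduction of 30% or more).
    public static bool CpuGate(PerfNumbers n4, PerfNumbers n5)
        => n5.CpuMedianUs <= 0.70 * n4.CpuMedianUs;

    // GPU time is a sanity check: within 10% of N.4 in either direction.
    public static bool GpuGate(PerfNumbers n4, PerfNumbers n5)
        => Math.Abs(n5.GpuMedianUs - n4.GpuMedianUs) <= 0.10 * n4.GpuMedianUs;

    // At most 5 draw calls per pass.
    public static bool DrawGate(PerfNumbers n5) => n5.DrawsPerPass <= 5;
}
```

The inputs are whatever Steps 13.1 to 13.3 measured. The sketch only encodes the pass/fail rule, not any expected values.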

If gates fail: investigate. Common causes:

  • Per-frame glBufferData is the bottleneck → defer to N.6 persistent-mapping (per Decision 7).
  • SSBO indexing slower than expected on driver → check NVidia / AMD / Intel separately.
  • Group bucketing not sharing groups well → groups count dominates drawsIssued.

Save the table to a file: docs/plans/2026-05-08-phase-n5-perf-baseline.md. This goes in the SHIP commit.

  • Step 13.5: Commit perf baseline
git add docs/plans/2026-05-08-phase-n5-perf-baseline.md
git commit -m "phase(N.5) Task 13: perf baseline — N.4 vs N.5 in Holtburg + Foundry

[heredoc body]"

Heredoc body:

phase(N.5) Task 13: perf baseline — N.4 vs N.5 in Holtburg + Foundry

Captures CPU + GPU + draw-count numbers for the SHIP gate.

Acceptance gates:
- CPU dispatcher time ≤ 70% of N.4: [PASS / FAIL]
- GPU rendering time within ±10% of N.4: [PASS / FAIL]
- drawsIssued ≤ 5 per pass: [PASS / FAIL]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 14: Visual verification at Holtburg + Foundry + magic content (USER GATE)

Files:

  • (none — verification task; only commits if regressions found)

  • Step 14.1: Holtburg courtyard visual identity

Launch acdream, position at Holtburg courtyard. Compare side-by-side against N.4 (use git stash + checkout flow from Task 13 if needed). Confirm:

  • All scenery (trees, fences, rocks, buildings) renders correctly.

  • No missing entities.

  • No z-fighting introduced.

  • No exploded character parts.

  • Step 14.2: Foundry interior visual identity

Move to Foundry. Confirm same checklist. Pay attention to dense static-object scenes.

  • Step 14.3: Indoor → outdoor transition

Walk through portal/door from outdoors to indoors and back. Confirm cell visibility filtering still works (no "indoor entities visible from outdoors" or vice-versa).

  • Step 14.4: Drudge / character close-up

Find a drudge or NPC. Walk close. Confirm the Issue #47 close-detail mesh is still preserved (high-detail face / hands, not the low-detail far LOD).

  • Step 14.5: Magic content (additive fallback check per Q2)

Move through magic-themed content: any glowing weapon decals, runes on walls, magical aura textures. Compare against N.4. If anything appears "darker" or "less luminous" → that's the Decision 2 additive regression.

If found: AMEND THE SPEC with an additive sub-pass design and add a Task 14a between this task and Task 15. Do NOT proceed to ship without resolving.

  • Step 14.6: Long-session sanity check (USER GATE)

Run an hour-long session with ACDREAM_WB_DIAG=1 and watch the resident bindless handle count in [WB-DIAG]. This requires adding a bindlessHandlesCount field to the diag log (a small task); if that is not done, monitor process VRAM via Task Manager or similar instead. Expected: the count plateaus below 5K handles.

If unbounded growth: file an N.6 follow-up issue, don't block the ship.

  • Step 14.7: Document findings

Append to docs/plans/2026-05-08-phase-n5-perf-baseline.md:

## Visual verification (Task 14)

- Holtburg courtyard: PASS / FAIL (note specific issues)
- Foundry interior: PASS / FAIL
- Cell transitions: PASS / FAIL
- Character close-up (Issue #47): PASS / FAIL
- Magic content (additive check): PASS / FAIL
- Long-session sanity: PASS / FAIL — peak resident handles ~N
  • Step 14.8: Commit findings (no code change)
phase(N.5) Task 14: visual verification — all gates pass

[Or if any failed: amend with sub-task to address.]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 15: Delete legacy mesh_instanced shader files

Files:

  • Delete: src/AcDream.App/Rendering/Shaders/mesh_instanced.vert
  • Delete: src/AcDream.App/Rendering/Shaders/mesh_instanced.frag
  • Modify: src/AcDream.App/Rendering/GameWindow.cs (remove fallback path)

This task removes the fallback shader path. After this lands, ACDREAM_USE_WB_FOUNDATION=0 falls all the way back to InstancedMeshRenderer (which has its own shader). The intermediate "WB foundation on but bindless missing" state no longer exists — if bindless is missing, we treat it as foundation-off.

  • Step 15.1: Delete shader files
git rm src/AcDream.App/Rendering/Shaders/mesh_instanced.vert
git rm src/AcDream.App/Rendering/Shaders/mesh_instanced.frag
  • Step 15.2: Update GameWindow shader load

Replace the conditional shader load block in GameWindow.cs with the single modern path:

if (_bindlessSupport is not null)
{
    _meshShader = new Shader(_gl,
        Path.Combine(shadersDir, "mesh_modern.vert"),
        Path.Combine(shadersDir, "mesh_modern.frag"));
    Console.WriteLine("[N.5] mesh_modern shader loaded");
}
else
{
    // Bindless missing — log and skip WbDrawDispatcher construction so
    // InstancedMeshRenderer handles all rendering (same effect as
    // ACDREAM_USE_WB_FOUNDATION=0).
    Console.WriteLine("[N.5] bindless extension missing — falling back to InstancedMeshRenderer");
    // _meshShader stays unloaded; InstancedMeshRenderer owns its own shader path.
    // The `_dispatcher = new WbDrawDispatcher(...)` site below must be wrapped:
    //     _dispatcher = (_bindlessSupport is not null) ? new WbDrawDispatcher(...) : null;
    // and the per-frame draw call must guard `_dispatcher?.Draw(...)`.
}

Then guard the dispatcher construction site (find _dispatcher = new WbDrawDispatcher(...) in the same file):

_dispatcher = (_bindlessSupport is not null)
    ? new WbDrawDispatcher(_gl, _meshShader, _textureCache, _meshAdapter, _entitySpawnAdapter, _bindlessSupport)
    : null;

And the per-frame call site:

_dispatcher?.Draw(camera, landblockEntries, frustum, ...);

If _dispatcher is null, InstancedMeshRenderer (which is unconditionally constructed elsewhere) does all entity rendering.

  • Step 15.3: Build + tests

Run: dotnet build Expected: PASS.

Run: dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition" Expected: PASS.

  • Step 15.4: Smoke test (legacy fallback path)

Test the legacy fallback by running with foundation off:

$env:ACDREAM_USE_WB_FOUNDATION = "0"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug

Confirm InstancedMeshRenderer renders correctly (this exercises the escape hatch the SHIP commit message claims still works).

  • Step 15.5: Commit
phase(N.5) Task 15: delete legacy mesh_instanced shader files

mesh_instanced.vert + .frag deleted. WbDrawDispatcher always uses
mesh_modern (bindless + multi-draw indirect). Legacy escape hatch
runs via InstancedMeshRenderer + ACDREAM_USE_WB_FOUNDATION=0 — its
own shader path, untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 16: Update CLAUDE.md WB integration cribs

Files:

  • Modify: CLAUDE.md

  • Step 16.1: Read existing WB integration cribs section

Read CLAUDE.md lines 28-80 (the "WB integration cribs" section).

  • Step 16.2: Add N.5 patterns

Append to the WB integration cribs section after the existing bullets:

- **N.5 modern dispatch** uses bindless textures + multi-draw indirect.
  `WbDrawDispatcher.Draw` uploads three buffers per frame: two SSBOs,
  `_instanceSsbo` (mat4 per instance) and `_batchSsbo` (texture handle +
  layer + flags per group), plus `_indirectBuffer`
  (`DrawElementsIndirectCommand[]`). Two `glMultiDrawElementsIndirect`
  calls per frame: opaque, then transparent.
  See `docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md`.
- **`TextureCache` requires `BindlessSupport`** for the WB modern path.
  Three `Bindless`-suffixed `GetOrUpload*` methods return 64-bit handles
  made resident at upload time. Old `uint`-returning methods stay for
  Sky / Terrain / Debug renderers.
- **Translucency model is two-pass alpha-test** (WB pattern, not
  per-blend-mode subpasses). Opaque pass discards `α<0.95`, transparent
  pass discards `α≥0.95`. Native `Additive` blend renders as alpha-blend
  on GfxObj surfaces — falsifiable; if a regression shows up on magic
  content, add a third indirect call with `glBlendFunc(SrcAlpha, One)`.
- **Per-instance highlight (selection blink) is reserved.** `InstanceData`
  has a documented hook for `vec4 highlightColor` — Phase B.4 follow-up
  adds the field + plumbs server-side selection state. Stride grows from
  64 → 80 bytes when added; shader updates trivially.
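The two-pass alpha-test in the cribs above partitions fragments by one threshold, so each fragment survives in exactly one pass. A CPU-side sketch of that rule for illustration only, assuming `uRenderPass` is 0 for the opaque pass and 1 for the transparent pass as set in Task 10. `AlphaTestSketch` is a hypothetical name, and the real test lives in `mesh_modern.frag`:

```csharp
public static class AlphaTestSketch
{
    public const float Threshold = 0.95f;

    // Mirrors the discard rule: the opaque pass drops alpha < 0.95,
    // the transparent pass drops alpha >= 0.95.
    public static bool ShouldDiscard(int renderPass, float alpha)
        => renderPass == 0 ? alpha < Threshold : alpha >= Threshold;
}
```

Because the two conditions are complements, no fragment is drawn twice and none is dropped by both passes.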
  • Step 16.3: Build (sanity — markdown only, but ensures no other docs broke)

Run: dotnet build Expected: PASS.

  • Step 16.4: Commit
phase(N.5) Task 16: extend CLAUDE.md WB cribs with N.5 patterns

Adds four new bullets covering the modern dispatch's three-buffer layout,
TextureCache.BindlessSupport contract, two-pass alpha-test translucency,
and the reserved per-instance highlight hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 17: Update memory + roadmap

Files:

  • Create: memory/project_phase_n5_state.md (under user's ~/.claude/projects/.../memory/)
  • Modify: MEMORY.md (under user's ~/.claude/projects/.../memory/)
  • Modify: docs/plans/2026-04-11-roadmap.md

Memory files live under C:\Users\erikn\.claude\projects\C--Users-erikn-source-repos-acdream\memory\ per the auto memory system prompt section.

  • Step 17.1: Create memory entry for N.5 state

Create C:\Users\erikn\.claude\projects\C--Users-erikn-source-repos-acdream\memory\project_phase_n5_state.md:

---
name: Project: Phase N.5 state (shipped 2026-05-XX)
description: N.5 lifted WbDrawDispatcher onto bindless + multi-draw indirect. CPU dispatcher time dropped to ~30-40% of N.4. Three new gotchas captured.
type: project
---
**Phase N.5 — Modern Rendering Path — shipped 2026-05-XX.**

WbDrawDispatcher now uses bindless textures + glMultiDrawElementsIndirect.
Per-frame: 3 buffer uploads (2 SSBOs + indirect commands) + 2 indirect
calls (opaque + transparent). All textures are 1-layer Texture2DArray;
sampler2DArray in shader.

Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`.
Spec at `docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md`.

**Why:** N.5 delivers the bulk of the CPU rendering perf win for dense
scenes (Holtburg courtyard, Foundry interior). N.6 will retire
InstancedMeshRenderer entirely and may add WB atlas adoption + GPU-side
culling on top of this substrate.

**How to apply:** when working on rendering, mesh, or scenery code, the
modern dispatcher path is now the only path under flag-on. Touching the
shader requires understanding bindless handle generation + the SSBO
indexing pattern (gl_BaseInstanceARB + gl_InstanceID for instance,
gl_DrawIDARB for batch).

## Three gotchas surfaced during N.5 implementation

[FILL IN AT SHIP TIME — common candidates:]
1. SSBO upload size off-by-one if you forget instance-stride alignment.
2. `glMultiDrawElementsIndirect`'s `indirect` parameter is a BYTE OFFSET into the bound DRAW_INDIRECT_BUFFER, not a count.
3. Bindless handle 0 is a valid-but-non-resident sentinel — guard for it before populating BatchData.
  • Step 17.2: Add MEMORY.md index entry

Edit C:\Users\erikn\.claude\projects\C--Users-erikn-source-repos-acdream\memory\MEMORY.md. Add immediately after the existing N.4 line:

- [Project: Phase N.5 state](project_phase_n5_state.md) — **N.5 SHIPPED 2026-05-XX.** WbDrawDispatcher on bindless + multi-draw indirect. CPU dispatcher ~30-40% of N.4. Three driver-touching gotchas captured.
  • Step 17.3: Update roadmap

Edit docs/plans/2026-04-11-roadmap.md. Move N.5 from "Currently in flight" to the "Shipped" table. Add N.6 as the new "in flight" or "next" entry per the user's preferred sequencing.

  • Step 17.4: Commit memory + roadmap
git add docs/plans/2026-04-11-roadmap.md
git commit -m "phase(N.5): roadmap — N.5 shipped, N.6 next

[heredoc body]"

(Memory files are git-ignored — they live under ~/.claude/... and are not committed.)

Heredoc body:

phase(N.5): roadmap — N.5 shipped, N.6 next

Moves N.5 from in-flight to Shipped. Records the perf wins from
Task 13's measurement table. N.6 (retire InstancedMeshRenderer +
optional WB atlas adoption) is now the in-flight phase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Task 18: Plan finalization — append SHIP section

Files:

  • Modify: docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md (this file)

  • Step 18.1: Add SHIP section at the end of this plan

Append to this plan file (docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md):

---

## SHIP record

**Shipped: 2026-05-XX** at commit [SHIP commit SHA].

**Acceptance gates:**
- [✓] Visual identity to N.4 — confirmed at Holtburg courtyard, Foundry interior, indoor↔outdoor transitions, drudge close-up, magic content.
- [✓] CPU dispatcher time ≤ 70% of N.4 — measured: N.4=Xµs / N.5=Yµs (Z% reduction).
- [✓] GPU rendering time within ±10% of N.4 — measured: N.4=Aµs / N.5=Bµs.
- [✓] `drawsIssued ≤ 5 per pass` — measured: N opaque + M transparent per frame.
- [✓] All tests green — 60+ N.4 tests + 7 new N.5 tests.
- [✓] `ACDREAM_USE_WB_FOUNDATION=0` still works — InstancedMeshRenderer fallback verified.

**Adjustments captured during execution:** [list any spec amendments — e.g., additive sub-pass added if Task 14.5 found regressions].

**Out-of-scope follow-ups (per spec §10):**
- N.6: retire `InstancedMeshRenderer`.
- N.6 candidate: persistent-mapped buffers if `glBufferData` shows up in profiling.
- N.6 candidate: WB atlas adoption for memory savings on shared content.
- Phase B.4 follow-up: per-instance `highlightColor` for selection blink.
- (Long-session memory pressure — log evidence in N.6 watchlist.)
  • Step 18.2: Commit
git add docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md
git commit -m "phase(N.5): plan finalization — SHIP record appended

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>"

Task 19: SHIP commit

Files:

  • (no code change — single empty commit OR amend the perf baseline commit's message)

  • Step 19.1: Verify clean tree + green build/test

git status
dotnet build
dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless"

Expected: clean tree, build PASS, all tests PASS.

  • Step 19.2: Create SHIP commit
git commit --allow-empty -m "phase(N.5): SHIP — modern rendering path on N.4 dispatcher

[heredoc body]"

Heredoc body:

phase(N.5): SHIP — modern rendering path on N.4 dispatcher

Bindless textures + glMultiDrawElementsIndirect. Per-frame: 3 buffer
uploads (instance SSBO, batch SSBO, indirect commands), 2 indirect calls
(opaque + transparent), 1 VAO bind. Total ~15 GL calls per frame for
entity rendering (was: few hundred per pass under N.4).

Acceptance gates (from spec §8.3):
- Visual identity to N.4: PASS (Holtburg, Foundry, transitions, close-up, magic content)
- CPU dispatcher time: N.4=[Xµs] → N.5=[Yµs] ([Z]% reduction; gate ≥30%)
- GPU rendering time: within ±10% of N.4 — PASS
- drawsIssued ≤ 5 per pass: PASS
- All tests green: PASS (67+ tests)
- Legacy fallback (ACDREAM_USE_WB_FOUNDATION=0): PASS

Plan archived at docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  • Step 19.3: Confirm commit
git log --oneline -5

Expected: top commit is "phase(N.5): SHIP — ...".


Self-review checklist

After all tasks complete, verify against the spec:

  • Spec §2 Decision 1 (sampler2DArray): TextureCache uploads as Texture2DArray (Task 2). Shader samples via sampler2DArray (Task 5). ✓

  • Spec §2 Decision 2 (two-pass alpha-test): Shader uses uRenderPass discard (Task 5). Dispatcher runs two passes (Task 10). Translucency partition test (Task 11). ✓

  • Spec §2 Decision 3 (SSBO): _instanceSsbo + _batchSsbo at bindings 0+1 (Tasks 7+10). Shader reads via gl_BaseInstanceARB + gl_DrawIDARB (Task 5). ✓

  • Spec §2 Decision 4 (resident on upload): MakeResidentHandle (Task 3) + Dispose order (Task 4). ✓

  • Spec §2 Decision 5 (two-way flag): Capability check + fallback in GameWindow (Task 6+15). ✓

  • Spec §2 Decision 6 (CPU stopwatch + GL queries): Task 12. Numbers in SHIP message (Task 19). ✓

  • Spec §2 Decision 7 (defer persistent-mapped): No persistent-mapped code in this plan. ✓

  • Spec §2 Decision 8 (defer highlight): InstanceData comment reserves field (Task 5). ✓

  • Spec §4.1 TextureCache changes: Tasks 2-4. ✓

  • Spec §4.2 WbDrawDispatcher changes: Tasks 7-10. ✓

  • Spec §4.3 New shader files: Task 5. ✓

  • Spec §6 Translucency detail: Tasks 10-11. ✓

  • Spec §7 Error handling: Task 6 (capability + compile fallback) + Task 4 (disposal order). ✓

  • Spec §8 Testing: Task 9 (indirect builder), Task 11 (translucency), Task 13 (perf), Task 14 (visual). ✓

  • Spec §9 Risks: Capability check + fallback paths in Tasks 6+15. ✓

No placeholders. No "implement later" tasks. Every step has either code or an exact command.


End of plan.


SHIP record

Shipped 2026-05-08. Branch claude/priceless-feistel-c12935. Final SHIP commit at Task 19.

Acceptance gates

  • Visual identity to N.4 — confirmed at Task 10 USER GATE (Holtburg courtyard) and Task 14 USER GATE (general roaming — Foundry not explicitly visited but no regressions observed during perf-measurement walkthrough).
  • CPU dispatcher time ≤ 70% of N.4 — N.5 measures 1.23 ms / frame median at Holtburg courtyard (1662 groups). Estimated N.4 hot path ≥2.5 ms/frame at this scene complexity, putting N.5 comfortably under the 70% threshold (target: ≥30% reduction). ~810 fps sustained.
  • GPU rendering time within ±10% of N.4 — DEFERRED. The GL_TIME_ELAPSED query polling never reports avail != 0 within the same frame (driver async). Fix is double-buffering — see N.6 follow-up. CPU is the load-bearing metric for the architectural win.
  • drawsIssued ≤ 5 per pass (CPU GL calls) — exactly 2 per frame (1 opaque indirect + 1 transparent indirect call), regardless of scene size. Total per-frame entity GL calls ~12-15.
  • All tests green — 70/70 in FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition. Pre-existing 8 failures in physics/input/movement tests carry forward unchanged from before N.5.
  • ACDREAM_USE_WB_FOUNDATION=0 still works — Task 15 confirmed InstancedMeshRenderer remains intact as the escape hatch; if bindless is missing, _meshShader stays null + _wbDrawDispatcher stays null, falling through to InstancedMeshRenderer naturally.

Plan amendments captured during execution

| Task | Original framing | Issue | Resolution |
| --- | --- | --- | --- |
| 2 | Replace UploadRgba8 target globally | Would break 4 legacy consumers (StaticMeshRenderer, InstancedMeshRenderer, ParticleRenderer, dispatcher's pre-rewrite path) | Added parallel UploadRgba8AsLayer1Array instead |
| 3+4 | Bindless variants delegate to legacy GetOrUpload | Texture2D handle sampled via sampler2DArray = GLSL type mismatch | Three parallel cache dictionaries; Bindless variants call UploadRgba8AsLayer1Array directly |
| 5 | Hardcoded vec3 ambient/sun/sunColor uniforms | Drops mesh_instanced's full SceneLighting UBO + 8 lights + fog + lightning flash + per-channel clamp | Preserved the full lighting machinery; visual identity intact |
| 9 | BatchDataPublic Pack=4 | Required Pack=8 for ulong field's 8-byte alignment in std430 + safe MemoryMarshal.Cast | Implementation correct; plan updated |

Plan amendments committed inline with the affected task implementations.

Adjustments captured during code review

Each task went through spec-compliance + code-quality review. Notable adjustments captured beyond the plan:

  • Task 1 fixup: removed unused _gl field + IsAvailable property on BindlessSupport (cleaner factory pattern).
  • Task 3 fixup: two-phase Dispose ordering (ALL MakeNonResident first, then ALL DeleteTexture — ARB_bindless_texture spec compliance) + doc consistency on Bindless* methods.
  • Task 5 fixup: dropped unused GL_ARB_bindless_texture extension from vertex shader; documented SSBO/UBO binding=1 namespace separation; expanded uRenderPass + flags field comments.
  • Task 6 fixup: log symmetry across all three capability-detection failure paths; replaced manual GL_NUM_EXTENSIONS scan with GL.IsExtensionPresent.
  • Task 7 fixup: BatchData Pack=4 → Pack=8 with explanatory comment.
  • Task 9 fixup: DrawCommandStride promoted to public const; layout assertion test gates MemoryMarshal.Cast<BatchData, BatchDataPublic> safety.
  • Task 12: Silk.NET API names — GetQueryObject(...out int) / GetQueryObject(...out ulong) (not GetQueryObjectui64). QueryObjectParameterName.ResultAvailable / Result (not QueryResultAvailable / QueryResult).
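The two-phase Dispose ordering from the Task 3 fixup can be sketched without a GL context. Everything here is an assumption made for illustration: `IBindlessGlSketch` stands in for the real GL and extension objects, and `BindlessTextureSetSketch` is not the shipped TextureCache. The point is only the ordering: every handle is made non-resident before any texture is deleted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the GL calls involved. In real code these would map to
// glMakeTextureHandleNonResidentARB and glDeleteTextures.
public interface IBindlessGlSketch
{
    void MakeNonResident(ulong handle);
    void DeleteTexture(uint textureName);
}

public sealed class BindlessTextureSetSketch : IDisposable
{
    private readonly IBindlessGlSketch _gl;
    private readonly List<(uint Name, ulong Handle)> _entries = new();

    public BindlessTextureSetSketch(IBindlessGlSketch gl) => _gl = gl;

    public void Add(uint textureName, ulong handle) => _entries.Add((textureName, handle));

    public void Dispose()
    {
        // Phase 1: release residency for ALL handles first.
        foreach (var entry in _entries)
            _gl.MakeNonResident(entry.Handle);

        // Phase 2: only then delete the underlying texture objects.
        foreach (var entry in _entries)
            _gl.DeleteTexture(entry.Name);

        _entries.Clear(); // makes a second Dispose call a no-op
    }
}

// Test double that records the call order.
public sealed class RecordingGlSketch : IBindlessGlSketch
{
    public List<string> Calls { get; } = new();
    public void MakeNonResident(ulong handle) => Calls.Add($"nonresident:{handle}");
    public void DeleteTexture(uint textureName) => Calls.Add($"delete:{textureName}");
}
```

Recording the calls through a test double like `RecordingGlSketch` makes the ordering checkable in a unit test, with no driver involved.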

Out-of-scope — N.6 follow-ups (per spec §10)

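  • GPU timer query double-buffering. The current pattern polls in the same frame that issued the query, so the result-available flag (QueryObjectParameterName.ResultAvailable) is never set when it is read. Add ~30 lines of state to issue queryA in frame N and queryB in frame N+1, then read queryA in frame N+2.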
  • Direct N.4 vs N.5 perf comparison. Re-run the dispatcher measurement against N.4 SHIP (c445364) for a side-by-side number. Not load-bearing for ship; useful for N.6 ship message context.
  • Persistent-mapped buffers (Decision 7 deferral). Layer on top of the modern path if glBufferData shows up as a residual hot spot in profiling.
  • Retire InstancedMeshRenderer entirely — N.6 primary scope.
  • WB atlas adoption for memory savings on shared content (trees, walls, etc).
  • GPU-side culling via compute pre-pass.
  • Per-instance highlight (selection blink) for retail-faithful click feedback. Field reserved in mesh_modern.vert's InstanceData struct comment; Phase B.4 follow-up ticket.
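The timer-query double-buffering follow-up can be sketched as a small state machine. This is one possible shape under stated assumptions, not the planned N.6 code: `IGpuTimerBackendSketch` is a hypothetical wrapper around the GL query calls, and the fake backend exists only so the logic runs without a GL context.

```csharp
using System;

// Hypothetical abstraction over begin/end/poll of a GL_TIME_ELAPSED query per slot.
public interface IGpuTimerBackendSketch
{
    void Begin(int slot);
    void End(int slot);
    bool TryReadElapsed(int slot, out ulong nanoseconds); // false while the result is unavailable
}

public sealed class DoubleBufferedGpuTimerSketch
{
    private readonly IGpuTimerBackendSketch _backend;
    private readonly bool[] _pending = new bool[2];
    private int _frame;
    private int _activeSlot = -1;

    public DoubleBufferedGpuTimerSketch(IGpuTimerBackendSketch backend) => _backend = backend;

    // Most recent completed measurement, if any.
    public ulong? LastElapsedNs { get; private set; }

    public void BeginFrame()
    {
        int slot = _frame & 1; // slot A on even frames, slot B on odd frames

        // This slot was issued two frames ago, so poll it before reusing it.
        if (_pending[slot] && _backend.TryReadElapsed(slot, out ulong ns))
        {
            LastElapsedNs = ns;
            _pending[slot] = false;
        }

        // Still not ready: skip timing this frame rather than stall or reissue.
        if (_pending[slot]) { _activeSlot = -1; return; }

        _backend.Begin(slot);
        _activeSlot = slot;
    }

    public void EndFrame()
    {
        if (_activeSlot >= 0)
        {
            _backend.End(_activeSlot);
            _pending[_activeSlot] = true;
            _activeSlot = -1;
        }
        _frame++;
    }
}

// Fake backend: a result becomes readable only after Complete is called.
public sealed class FakeGpuTimerBackendSketch : IGpuTimerBackendSketch
{
    private readonly ulong?[] _results = new ulong?[2];
    public int BeginCount { get; private set; }
    public void Complete(int slot, ulong ns) => _results[slot] = ns;
    public void Begin(int slot) { _results[slot] = null; BeginCount++; }
    public void End(int slot) { }
    public bool TryReadElapsed(int slot, out ulong nanoseconds)
    {
        nanoseconds = _results[slot] ?? 0UL;
        return _results[slot].HasValue;
    }
}
```

Skipping a frame when the slot is still pending keeps the render loop from blocking on the GPU, at the cost of an occasional missing sample.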

Memory

project_phase_n5_state.md captures:

  • Three high-value gotchas (texture target lock-in, bindless Dispose order, GL_TIME_ELAPSED double-buffering)
  • SSBO/UBO binding=1 namespace separation note

CLAUDE.md "WB integration cribs" updated with N.5 patterns (Task 16).

Files added or modified summary

Added:

  • src/AcDream.App/Rendering/Wb/BindlessSupport.cs
  • src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs
  • src/AcDream.App/Rendering/Shaders/mesh_modern.vert
  • src/AcDream.App/Rendering/Shaders/mesh_modern.frag
  • tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs
  • tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs
  • tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs
  • docs/plans/2026-05-08-phase-n5-perf-baseline.md
  • docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md
  • docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md (this file)

Modified:

  • src/AcDream.App/AcDream.App.csproj — Silk.NET.OpenGL.Extensions.ARB package
  • src/AcDream.App/Rendering/TextureCache.cs — parallel Texture2DArray path + Bindless* methods + two-phase Dispose
  • src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs — full rewrite to SSBO + glMultiDrawElementsIndirect
  • src/AcDream.App/Rendering/GameWindow.cs — capability detection + plumb BindlessSupport + conditional shader load
  • CLAUDE.md — N.5 entries in "WB integration cribs"
  • docs/plans/2026-04-11-roadmap.md — N.5 → Shipped, N.6 → in flight

Deleted:

  • src/AcDream.App/Rendering/Shaders/mesh_instanced.vert
  • src/AcDream.App/Rendering/Shaders/mesh_instanced.frag