Detailed briefing for the next agent picking up Phase N.5 (Modern Rendering Path: bindless textures + glMultiDrawElementsIndirect on N.4's foundation). Covers:
- Where N.4 left things (commits, what works, gotchas inherited)
- The two-feature pairing (why bindless + indirect together)
- Files to read first (WB shaders, our dispatcher, CLAUDE.md cribs)
- 8 brainstorm questions to resolve before spec
- Spec + plan structure (matching N.4's pattern)
- Acceptance criteria
- Things to explicitly NOT do

Sized for a fresh session to pick up cold without spelunking through months of session history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase N.5 — Modern Rendering Path — Cold-Start Handoff
Created: 2026-05-08, immediately after N.4 ship. Audience: the next agent picking up rendering perf work. Purpose: give you everything you need to start N.5 cold, without spelunking through five months of session history.
TL;DR
N.4 just shipped: WB's ObjectMeshManager is now acdream's production
mesh pipeline, and WbDrawDispatcher is the production draw path. It
works (Holtburg renders correctly, FPS substantially improved over the
naïve dual-pipeline state we hit during week 4 verification) but it's
still doing per-group state changes (glBindTexture, glBindBuffer
for the IBO, glDrawElementsInstancedBaseVertexBaseInstance per group)
and a fresh glBufferData upload per frame.
N.5's job: lift the dispatcher onto WB's modern rendering primitives that we're already paying GPU-feature-detection cost for. Two big wins, paired:
- Bindless textures (GL_ARB_bindless_texture): WB already populates ObjectRenderBatch.BindlessTextureHandle. Switch our shader to read texture handles from a per-instance attribute (uvec2 → sampler2D via the bindless extension). Eliminates 100% of glBindTexture calls.
- Multi-draw indirect (glMultiDrawElementsIndirect): build a buffer of DrawElementsIndirectCommand structs (one per group), upload once, fire ONE glMultiDrawElementsIndirect call per pass. The driver pulls everything from the indirect buffer.
Together they target a 2-5× CPU win on draw-heavy scenes (Holtburg
courtyard, Foundry, dense dungeons). They're packaged together because
both are "modern path" extensions we already gate on, both require
the same shader rewrite, and they pair naturally: multi-draw indirect
yields no CPU win without bindless, because per-group glBindTexture
calls would still force one draw call per group.
Estimated scope: 2-3 weeks. Plan + spec to be written by the brainstorm + spec steps below.
Where N.4 left things
Branch state
If this handoff is being read on main after merging the N.4 worktree:
N.4 commits land at the head of main. The relevant final commits:
- c445364: N.4 SHIP (flag default-on, plan final, roadmap, memory)
- 573526d: perf pass 1-4 (drop dead lookup, sort, cull, hash memo)
- 7b41efc: FirstIndex/BaseVertex + Issue #47 + grouped instanced
- 943652d: load triggers + batch.Key.SurfaceId source
- 01cff41: Tasks 22+23 (WbDrawDispatcher + side-table)
If the worktree branch (claude/tender-mcclintock-a16839) hasn't been
merged yet, that's where the work is. Verify with git log --oneline.
What works in N.4
- ACDREAM_USE_WB_FOUNDATION=1 is default-on. WB's ObjectMeshManager loads, decodes, and uploads every entity mesh. Our existing TextureCache decodes textures (palette-aware, per-instance overrides via GetOrUploadWithPaletteOverride).
- WbDrawDispatcher.Draw:
  - Walks visible entities (per-landblock AABB cull + per-entity AABB cull + portal visibility)
  - Buckets every (entity × meshRef × batch) tuple by GroupKey(Ibo, FirstIndex, BaseVertex, IndexCount, TextureHandle, Translucency)
  - Single glBufferData upload of all matrices for the frame
  - Per group: glActiveTexture(0) + glBindTexture(2D, handle) + glBindBuffer(EBO, ibo) + glDrawElementsInstancedBaseVertexBaseInstance(..., FirstInstance)
  - Two passes: opaque (front-to-back sorted) + translucent
- 940/948 tests pass (8 pre-existing failures unrelated to rendering).
- Visual verification at Holtburg passed: scenery + characters render correctly with full close-detail geometry (Issue #47 preserved).
What N.5 inherits
These are levers N.5 will pull on:
- WB's modern rendering is already active. OpenGLGraphicsDevice detected GL 4.3 + bindless on first run; WB's _useModernRendering is true; every mesh lives in WB's single GlobalMeshBuffer (one VAO, one VBO, one IBO).
- Bindless handles are already populated. ObjectRenderBatch.BindlessTextureHandle is non-zero for batches WB owns the texture for. (See "Per-instance entity texture handles" below for entities with palette overrides: those use acdream's TextureCache, which doesn't expose bindless handles yet.)
- The instance VBO is acdream-owned (WbDrawDispatcher._instanceVbo) with locations 3-6 patched onto WB's global VAO. Stride 64 bytes (one mat4). N.5 expands this to (mat4 + uvec2 handle) = 72 bytes, padded to an 80-byte stride.
Three load-bearing WB API gotchas N.4 surfaced
These bit us hard during Task 26 visual verification. Documented in
CLAUDE.md "WB integration cribs" + plan adjustments 7-9 +
memory/project_phase_n4_state.md. Re-stating here because they
reshape the design space:
1. ObjectMeshManager.IncrementRefCount(id) is NOT lifecycle-aware. It only bumps a usage counter. Mesh loading is fired separately via PrepareMeshDataAsync(id, isSetup). The result auto-enqueues to _stagedMeshData (line 510 of ObjectMeshManager.cs); our existing WbMeshAdapter.Tick() drains it. WbMeshAdapter.IncrementRefCount already calls PrepareMeshDataAsync. N.5 doesn't need to change this; just don't break it.
2. ObjectRenderBatch.SurfaceId is unset. WB constructs batches with Key = batch.Key (a TextureAtlasManager.TextureKey struct that has a SurfaceId field) but never populates the top-level SurfaceId property. Read batch.Key.SurfaceId. N.5 keeps this pattern.
3. WB's modern rendering packs every mesh into ONE global VAO/VBO/IBO. Each batch's IBO field points to the global IBO; the batch's actual slice is identified by FirstIndex (offset into IBO, in indices) and BaseVertex (offset into VBO, in vertices). N.4's draw uses glDrawElementsInstancedBaseVertexBaseInstance with those offsets. N.5's DrawElementsIndirectCommand per-group record will carry firstIndex + baseVertex for the same reason.
What N.5 is — technical detail
The two-feature pairing
Bindless textures (GL_ARB_bindless_texture):
- Each texture handle is a 64-bit integer (uvec2 in GLSL).
- Shader declares layout(bindless_sampler) uniform sampler2D ... or receives the handle as a uvec2 vertex attribute (per-instance in our case).
- No glBindTexture needed at draw time: the handle IS the binding.
- Handle generation: glGetTextureHandleARB(textureId) followed by glMakeTextureHandleResidentARB(handle). The handle must be resident before any shader samples through it; using a non-resident handle is undefined behavior and can fault the GPU.
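The 64-bit handle to uvec2 mapping above can be sketched on the CPU side as follows. This is a minimal illustration, not code from WB or acdream: the UVec2 type and both helper names are hypothetical, and it assumes the x component carries the low 32 bits, which is what you get when a 64-bit value is written directly into the instance buffer on a little-endian host.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit words, laid out the way a uvec2 attribute reads them.
   Assumption: x = low 32 bits, y = high 32 bits. */
typedef struct {
    uint32_t x;
    uint32_t y;
} UVec2;

/* Split a 64-bit bindless handle into the two words stored per instance. */
static UVec2 handle_to_uvec2(uint64_t handle)
{
    UVec2 v;
    v.x = (uint32_t)(handle & 0xFFFFFFFFu);
    v.y = (uint32_t)(handle >> 32);
    return v;
}

/* Inverse mapping, useful for debugging what was written to the buffer. */
static uint64_t uvec2_to_handle(UVec2 v)
{
    return ((uint64_t)v.y << 32) | (uint64_t)v.x;
}
```

A round trip through both helpers should return the original handle, which makes this a cheap sanity check when debugging instance-buffer contents.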
Multi-draw indirect (glMultiDrawElementsIndirect):
- Indirect command struct layout (must match DrawElementsIndirectCommand):

      struct {
          uint count;         // index count for this draw
          uint instanceCount; // number of instances
          uint firstIndex;    // offset into IBO, in indices
          int  baseVertex;    // vertex offset into VBO
          uint baseInstance;  // first instance ID (offsets per-instance attribs)
      };

- Build a buffer of N of these structs (one per group), upload once, fire one GL call: glMultiDrawElementsIndirect(mode, indexType, ptr, drawcount, stride).
- The driver issues all N draws in one shot. Effectively zero CPU overhead per draw beyond uploading the indirect buffer.
Why pair them. Multi-draw indirect doesn't let you change GL state
(texture bindings, uniforms) between the draws inside one call. So if
textures are bound via glBindTexture per group, each group would still
need its own CPU-side bind and its own draw call, defeating the purpose.
Bindless removes that constraint by encoding the texture handle as
per-instance data the shader reads directly. With both, the modern
render loop becomes:
1. Upload instance buffer (mat4 + uvec2 handle, per-instance) — once per frame
2. Upload indirect command buffer (one DEIC per group) — once per frame
3. glBindVertexArray(globalVAO) — once
4. glMultiDrawElementsIndirect(...) — ONCE per pass
That's it. No per-group state changes.
Instance attribute layout
Currently (N.4): location 3-6 = mat4 model matrix (16 floats = 64 bytes).
N.5 (proposed): location 3-6 = mat4 + location 7 = uvec2 bindless
handle = 16 floats + 2 uints = 72 bytes (16-aligned to 80 bytes per
WB's InstanceData precedent).
Or use std140-aligned struct:
struct InstanceData {
mat4 transform; // locations 3-6
uvec2 textureHandle; // location 7
uvec2 _pad; // padding to 80
};
Brainstorm should decide if we copy WB's InstanceData struct (Pack=16,
80 bytes including CellId/Flags fields we don't use) or define our own
minimal version. The 80-byte stride matches WB's so global VAO state
configured by WB stays compatible if the legacy WB draw path ever runs.
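To make the proposed 80-byte layout concrete, here is a minimal C mirror of the struct with compile-time checks on size and offsets, plus a helper giving the byte offset for each attribute location. It assumes the minimal acdream-specific variant (mat4 + handle + padding), not WB's InstanceData with its CellId/Flags fields; attrib_offset is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CPU mirror of the proposed per-instance record. */
typedef struct {
    float    transform[16];    /* mat4, attribute locations 3-6 */
    uint32_t textureHandle[2]; /* uvec2 bindless handle, location 7 */
    uint32_t _pad[2];          /* pads 72 bytes up to the 80-byte stride */
} InstanceData;

_Static_assert(sizeof(InstanceData) == 80, "stride must be 80 bytes");
_Static_assert(offsetof(InstanceData, textureHandle) == 64,
               "handle must follow the mat4 directly");

/* Byte offset of an attribute location within one instance record.
   Locations 3-6 are the four vec4 columns of the mat4; 7 is the handle. */
static size_t attrib_offset(unsigned location)
{
    assert(location >= 3 && location <= 7);
    return (size_t)(location - 3) * 4 * sizeof(float);
}
```

These offsets are the values the attribute-pointer setup would use for each location, with sizeof(InstanceData) as the stride and a divisor of 1.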
Per-instance entity texture handles
Here's the wrinkle. N.4 uses WbDrawDispatcher.ResolveTexture to map
each (entity, batch) to a GL texture name:
- Tree (no overrides): _textures.GetOrUpload(surfaceId) → 2D texture name
- NPC with palette override: _textures.GetOrUploadWithPaletteOverride(...) → composite-cached 2D texture name
- Anything with surface override: _textures.GetOrUploadWithOrigTextureOverride(...) → composite-cached 2D texture name
Those are all GLuint 32-bit GL texture names, not bindless handles.
N.5 needs TextureCache to publish bindless handles for everything
it owns, not just WB-owned textures.
Implementation sketch:
- TextureCache adds a parallel cache keyed identically but storing 64-bit bindless handles. On first request, generate via glGetTextureHandleARB(textureId) + make resident.
- New API: GetBindlessHandle(uint surfaceId, ...) returns the handle.
- Or: change every GetOrUpload* method to return both the GL name and the bindless handle (or just the handle; let GL name fall out if anyone needs it later).
WB's ObjectRenderBatch.BindlessTextureHandle covers the atlas-tier
case. For per-instance entities, we use TextureCache's handle.
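The parallel-cache idea can be sketched as a small lookup that creates a handle the first time a texture name is requested and returns the cached value afterwards. This is an illustration only: HandleCache, cache_get_handle, and the fixed capacity are hypothetical, the real cache would be keyed like TextureCache's existing dictionaries, and the callback stands in for glGetTextureHandleARB followed by glMakeTextureHandleResidentARB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_CAPACITY 64

/* Stand-in for "get handle + make resident"; supplied by the caller. */
typedef uint64_t (*MakeResidentHandleFn)(uint32_t textureName);

typedef struct {
    uint32_t names[CACHE_CAPACITY];
    uint64_t handles[CACHE_CAPACITY];
    size_t   count;
    MakeResidentHandleFn make;
} HandleCache;

/* Returns the cached handle, creating it on first request (lazy strategy). */
static uint64_t cache_get_handle(HandleCache *cache, uint32_t textureName)
{
    for (size_t i = 0; i < cache->count; i++) {
        if (cache->names[i] == textureName)
            return cache->handles[i];
    }
    assert(cache->count < CACHE_CAPACITY);
    uint64_t handle = cache->make(textureName);
    cache->names[cache->count] = textureName;
    cache->handles[cache->count] = handle;
    cache->count++;
    return handle;
}

/* Fake handle factory so the sketch runs without a GL context. */
static int fake_calls = 0;
static uint64_t fake_make_handle(uint32_t textureName)
{
    fake_calls++;
    return ((uint64_t)textureName << 32) | 1u;
}
```

Requesting the same name twice should invoke the factory once. Whether creation happens lazily like this or eagerly at upload time is one of the brainstorm questions, so treat the lazy behavior here as one option rather than a decision.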
The new shader
Reuse WB's StaticObjectModern.vert / StaticObjectModern.frag as a
template. Read those files cold. They already do bindless + the
instance-data layout. Adapt to acdream's mesh_instanced.vert/frag
conventions:
- Keep the uViewProjection uniform, lighting UBO at binding=1, fog uniforms.
- Add #version 430 core + #extension GL_ARB_bindless_texture : require.
- Replace uniform sampler2D uDiffuse with a uvec2 per-instance vertex attribute (location 7) → reconstruct sampler in vertex shader OR pass through to fragment via flat varying.
- Drop uTranslucencyKind uniform, OR keep it (still set per-pass; multi-draw indirect doesn't break uniforms, the only constraint is state that would have to vary per draw).
Translucency
Multi-draw indirect can't change blend state mid-draw. Solution:
still use two passes (opaque + translucent), but within translucent
keep the per-blendfunc sub-passes (additive, alpha-blend, inv-alpha).
Three sub-passes within translucent. Each sub-pass = one
glMultiDrawElementsIndirect over its filtered groups.
Or: if perf allows, fold all four blend modes into the shader via per-instance blendmode int, sort all translucent groups by blendmode in the indirect buffer, switch blend state at sub-pass boundaries. Brainstorm decides the cleanest pattern.
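One way to picture the sub-pass option: group the translucent commands by blend mode, then issue one indirect call per contiguous range. The sketch below computes that ordering and the (first, count) range for each of the three blend modes named above. All names are hypothetical, and it deliberately ignores depth ordering across blend modes, which the brainstorm would still need to settle.

```c
#include <assert.h>
#include <stddef.h>

enum { BLEND_ADDITIVE, BLEND_ALPHA, BLEND_INV_ALPHA, BLEND_MODE_COUNT };

/* Contiguous run of commands sharing one blend mode. */
typedef struct {
    size_t first; /* index of the first command in the sorted buffer */
    size_t count; /* number of commands in this sub-pass */
} SubPass;

/* Stable counting sort by blend mode.
   order[k] = index of the original group placed at sorted slot k. */
static void plan_subpasses(const int *blendModes, size_t n,
                           size_t *order, SubPass ranges[BLEND_MODE_COUNT])
{
    size_t counts[BLEND_MODE_COUNT] = {0};
    size_t cursor[BLEND_MODE_COUNT];
    size_t first = 0;

    for (size_t i = 0; i < n; i++) {
        assert(blendModes[i] >= 0 && blendModes[i] < BLEND_MODE_COUNT);
        counts[blendModes[i]]++;
    }
    for (int m = 0; m < BLEND_MODE_COUNT; m++) {
        ranges[m].first = first;
        ranges[m].count = counts[m];
        cursor[m] = first;
        first += counts[m];
    }
    for (size_t i = 0; i < n; i++)
        order[cursor[blendModes[i]]++] = i;
}

/* Byte offset into the indirect buffer for a sub-pass, given the command size. */
static size_t subpass_byte_offset(SubPass pass, size_t commandSize)
{
    return pass.first * commandSize;
}
```

Each non-empty range would map to one blend-state change followed by one glMultiDrawElementsIndirect call, with the byte offset as the indirect argument and the range's count as the draw count.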
Files to read before brainstorming
In rough order:
1. N.4 plan + spec: docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md (status: Final). Adjustments 7-10 capture the gotchas. Spec at docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md.
2. N.4 dispatcher source: src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs. This is what you're modifying. Read end-to-end.
3. WB's modern rendering shaders: references/WorldBuilder/Chorizite.OpenGLSDLBackend/Shaders/StaticObjectModern.vert and StaticObjectModern.frag. The template you're adapting from.
4. WB's ObjectMeshManager.UploadGfxObjMeshData: lines ~1654-1780 of references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs. Shows how WB sets up the modern path's VBO/IBO/VAO. Especially note how it patches in instance attribute slots (locations 3-6) on the global VAO and configures location 7+ for bindless handles.
5. WB's ObjectRenderBatch: same file, lines ~166-184. Note the BindlessTextureHandle field, already populated when _useModernRendering is on.
6. Our TextureCache: src/AcDream.App/Rendering/TextureCache.cs. Three composite caches: by surface id, by surface+origTex, by surface+origTex+palette. N.5 adds parallel bindless-handle caches.
7. CLAUDE.md "WB integration cribs" section. Lines ~28-80. The three gotchas + the integration architecture in plain language.
8. Memory: project_phase_n4_state.md. Same content from a different angle. Reading both helps lock in the gotchas.
Brainstorm questions
These are the questions to resolve in the brainstorm step. Don't prejudge them — bring them to the user with options + recommendation:
1. Instance attribute layout. Match WB's InstanceData struct (80 bytes including CellId/Flags fields we don't use) for global VAO compatibility, or define a minimal acdream-specific version (mat4 + handle = ~72 bytes padded to 80)?
2. Bindless handle generation strategy.
   - At texture upload time? (Eager: every texture that lands in TextureCache gets a handle. Memory cost ~per-texture state.)
   - On first draw lookup? (Lazy: cache fills as scene exercises content. Possible first-use stall.)
   - At spawn time via the spawn adapter? (Tied to lifecycle. Cleanest but requires touching the spawn path.)
3. Translucent pass structure. Three sub-indirect-draws (one per blend mode) or a single sorted indirect buffer with per-instance blend mode + state-flip at sub-pass boundaries? Or: just iterate per-group like N.4 for translucent only (translucent groups are a small fraction of total)?
4. Persistent-mapped indirect + instance buffers. Use GL_ARB_buffer_storage + MAP_PERSISTENT_BIT | MAP_COHERENT_BIT? Triple-buffered ring + sync object? Or stick with glBufferData (still one upload per frame, just larger)? Persistent mapping is ~2-5% per-frame win in our context but adds buffer-management complexity.
5. Shader unification. Keep mesh_instanced for legacy + add mesh_indirect for modern, or replace mesh_instanced entirely? Replacement requires the legacy InstancedMeshRenderer (escape hatch under ACDREAM_USE_WB_FOUNDATION=0) to also use the new shader, which... probably doesn't matter if we delete legacy in N.6 anyway. Brainstorm.
6. Conformance test strategy. N.4 used visual verification at Holtburg as the gate. N.5's gate is "no visual regression vs N.4 AND measurable CPU win." How do we measure CPU? [WB-DIAG] counters give draw count + group count; we need frame-time counters too. Add to the dispatcher? Use a profiler?
7. Per-instance entity bindless. TextureCache.GetOrUpload* returns a GL name. The dispatcher (or TextureCache itself) needs to convert that to a bindless handle. Design questions:
   - Where does the conversion happen?
   - When is the texture made resident? (Residency is global state; too many resident textures hits driver limits.)
   - What about palette/surface overrides: same caching key as the name, just a parallel handle dictionary?
8. Escape hatch. N.4 keeps ACDREAM_USE_WB_FOUNDATION=0 as a fallback. N.5 needs to decide: does the new shader REPLACE the N.4 dispatcher's draw path (so flag-on means N.5 modern path, flag-off means legacy InstancedMeshRenderer)? Or do we add a separate flag (ACDREAM_USE_MODERN_DRAW) so users can toggle N.4 vs N.5 vs legacy independently? Three-way flag is more complex but useful for A/B during rollout.
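For question 4, the bookkeeping behind a triple-buffered ring is small enough to sketch. The C below only computes which slice of a persistently mapped buffer the current frame should write to; the FrameRing type and function names are hypothetical, the 256-byte alignment in the usage note is an arbitrary illustrative value, and the fence wait that must happen before a slice is reused (the "sync object" part) is intentionally left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t   regionBytes; /* size of one frame's slice, after alignment */
    unsigned regionCount; /* 3 for triple buffering */
    uint64_t frameIndex;  /* frames started so far */
} FrameRing;

/* Round value up to the next multiple of alignment (alignment > 0). */
static size_t align_up(size_t value, size_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

static FrameRing ring_create(size_t bytesPerFrame, size_t alignment,
                             unsigned regionCount)
{
    FrameRing ring;
    ring.regionBytes = align_up(bytesPerFrame, alignment);
    ring.regionCount = regionCount;
    ring.frameIndex = 0;
    return ring;
}

/* Returns the byte offset this frame should write to, then advances.
   A real implementation would first wait on the fence guarding that slice. */
static size_t ring_next_offset(FrameRing *ring)
{
    size_t slot = (size_t)(ring->frameIndex % ring->regionCount);
    ring->frameIndex++;
    return slot * ring->regionBytes;
}
```

With ring_create(1000, 256, 3), successive frames would write at offsets 0, 1024, 2048, and then wrap back to 0. The total buffer allocation is regionBytes times regionCount.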
Spec structure
After the brainstorm, the spec doc covers:
- Architecture diagram: how WbDrawDispatcher changes shape. Where the indirect buffer lives. Where bindless handles flow from.
- Instance data layout: exact struct, byte offsets, GL attribute pointer setup.
- TextureCache changes: new methods, new cache, residency policy.
- Shader files: name(s), version, extensions, in/out variables.
- Conformance tests: what to write, what coverage to claim.
- Acceptance criteria: visual identity to N.4 + measured CPU delta.
- Risks: driver bugs in bindless / indirect, residency limits, shader compile issues on weird GPUs, the legacy escape hatch breaking.
Spec lives at: docs/superpowers/specs/2026-05-XX-phase-n5-modern-rendering-design.md.
Plan structure
After the spec, the plan doc lays out the week-by-week task list.
Match N.4's plan structure (living document, task checkboxes, commit
SHAs appended, adjustments documented inline). Plan lives at:
docs/superpowers/plans/2026-05-XX-phase-n5-modern-rendering.md.
Suggested initial breakdown (brainstorm + spec will refine):
- Week 1, Plumbing: bindless handle generation in TextureCache, shader rewrite (compile + bind), instance-attrib layout updated to mat4+handle. Dispatcher still uses per-group draws but reads textures bindless. Validate: visual identical to N.4.
- Week 2, Indirect: build DrawElementsIndirectCommand buffer per frame, switch to glMultiDrawElementsIndirect. Three-pass translucent (or whatever brainstorm decides). Validate: visual identical, draw-call count drops to 2-4 per frame.
- Week 3, Polish + ship: persistent-mapped buffers if brainstorm voted yes, profiler/counters, visual verification, flag flip, plan finalization.
Acceptance criteria for the whole phase
- Visual output identical to N.4 (no character regressions, no scenery missing, no z-fighting introduced)
- [WB-DIAG] shows drawsIssued ≤ ~5 per frame (down from N.4's few hundred)
- Frame time measurably lower in dense scenes (specify what scenes to test in the spec, probably Holtburg courtyard + Foundry interior)
- All tests still green (940/948 + any new conformance tests)
- ACDREAM_USE_WB_FOUNDATION=0 escape hatch still works
- Plan doc finalized, roadmap updated, memory captured if N.5 surfaces durable lessons (it almost certainly will: bindless + indirect both have well-known driver gotchas)
What you'll be doing in the first 30 minutes
1. Read this handoff in full.
2. Read CLAUDE.md "WB integration cribs" section.
3. Read WbDrawDispatcher.cs end-to-end.
4. Skim WB's StaticObjectModern.vert/frag + ObjectMeshManager.UploadGfxObjMeshData to ground the reference.
5. Verify build is green: dotnet build.
6. Verify N.4 ship is intact: dotnet test --filter "FullyQualifiedName~Wb|MatrixComposition" should produce 60 passing tests, 0 failures.
7. Invoke the superpowers:brainstorming skill with the user. Walk through the 8 brainstorm questions above. Capture decisions in a spec.
8. Write the spec at the path above.
9. Write the plan at the path above.
10. Begin Week 1 implementation per the plan.
Don't skip the brainstorm. Multi-draw indirect + bindless have several real driver-compatibility / API-shape decisions that need user input, not "the agent makes a call and goes." This phase is structurally the same shape as N.4 — brainstorm → spec → plan → tasks-with-checkboxes → commits-update-checkboxes → final SHIP commit.
Things to NOT do
- Don't delete the legacy InstancedMeshRenderer. It's the N.4 escape hatch. N.6 retires it after N.5 is proven default-on.
- Don't fork WB. N.4 deliberately avoided fork patches by using the side-table pattern (AcSurfaceMetadataTable). Stay on that path. If you need data WB doesn't expose, add a side-table or decode it yourself from dats.
- Don't try to make per-instance entities use WB's TextureAtlasManager. That's N.6+ territory. acdream's TextureCache owns palette/surface overrides because WB's atlas is keyed by (surfaceId, paletteId, stippling, isSolid) and our overrides don't fit cleanly. Bindless handles let us escape that mismatch: handles for both atlas-tier AND per-instance-tier textures, no atlas adoption needed.
- Don't skip visual verification. N.4 surfaced three bugs at visual verification that no test caught. Don't trust "build green + tests pass"; exercise the rendering path with the local ACE server.
- Don't extend the phase scope. N.5 is bindless + indirect on the existing rendering pipeline. Texture array atlas, GPU-side culling, terrain wiring: all of those are subsequent phases. If the brainstorm tries to expand, push back.
Reference: the N.4 dispatcher flow you're modifying
Draw(camera, landblockEntries, frustum, ...) {
// Phase 1: walk entities, build groups
foreach (entity, meshRef, batch) {
cull, classify into _groups[GroupKey]
}
// Phase 2: lay matrices contiguously
// Phase 3: glBufferData(_instanceVbo, allMatrices)
// Phase 4: bind global VAO once
// Phase 5: opaque pass (sorted)
foreach (group in _opaqueDraws) {
glBindTexture(group.handle)
glBindBuffer(EBO, group.ibo)
glDrawElementsInstancedBaseVertexBaseInstance(...)
}
// Phase 6: translucent pass
}
After N.5, Phases 5 and 6 collapse to:
glBindBuffer(DRAW_INDIRECT_BUFFER, _opaqueIndirect)
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT, 0, opaqueGroups.Count, sizeof(DEIC))
glBindBuffer(DRAW_INDIRECT_BUFFER, _translucentIndirect)
// 3 sub-calls for translucent or 1 if shader-folded
glMultiDrawElementsIndirect(...)
That's the destination. Get there cleanly.
Good luck. Holler at the user if any of the brainstorm questions feel genuinely ambiguous after reading the references — they care about this phase landing right and will engage on design questions.