docs(N.5): cold-start handoff for next session

Detailed briefing for the next agent picking up Phase N.5 (Modern Rendering Path: bindless textures + glMultiDrawElementsIndirect on N.4's foundation). Covers:

- Where N.4 left things (commits, what works, gotchas inherited)
- The two-feature pairing (why bindless + indirect together)
- Files to read first (WB shaders, our dispatcher, CLAUDE.md cribs)
- 8 brainstorm questions to resolve before spec
- Spec + plan structure (matching N.4's pattern)
- Acceptance criteria
- Things to explicitly NOT do

Sized for a fresh session to pick up cold without spelunking through months of session history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Parent: c44536451d
Commit: dd5ca3d2b2
1 changed file with 495 additions and 0 deletions
docs/research/2026-05-08-phase-n5-handoff.md (normal file)
@@ -0,0 +1,495 @@

# Phase N.5 (Modern Rendering Path): Cold-Start Handoff

**Created:** 2026-05-08, immediately after N.4 ship.
**Audience:** the next agent picking up rendering perf work.
**Purpose:** give you everything you need to start N.5 cold, without
spelunking through five months of session history.

---
## TL;DR

N.4 just shipped: WB's `ObjectMeshManager` is now acdream's production
mesh pipeline, and `WbDrawDispatcher` is the production draw path. It
works (Holtburg renders correctly, FPS substantially improved over the
naïve dual-pipeline state we hit during week 4 verification) but it's
still doing per-group state changes (`glBindTexture`, `glBindBuffer`
for the IBO, `glDrawElementsInstancedBaseVertexBaseInstance` per group)
and a fresh `glBufferData` upload per frame.

**N.5's job: lift the dispatcher onto WB's modern rendering primitives
that we're already paying GPU-feature-detection cost for.** Two big
wins, paired:

1. **Bindless textures** (`GL_ARB_bindless_texture`): WB already
   populates `ObjectRenderBatch.BindlessTextureHandle`. Switch our
   shader to read texture handles from a per-instance attribute
   (`uvec2` → `sampler2D` via the bindless extension). Eliminates
   100% of `glBindTexture` calls.
2. **Multi-draw indirect** (`glMultiDrawElementsIndirect`): build a
   buffer of `DrawElementsIndirectCommand` structs (one per group),
   upload once, fire ONE `glMultiDrawElementsIndirect` call per pass.
   The driver pulls everything from the indirect buffer.

Together they target a 2-5× CPU win on draw-heavy scenes (Holtburg
courtyard, Foundry, dense dungeons). They're packaged together because
both are "modern path" extensions we already gate on, both require
the same shader rewrite, and they pair naturally: multi-draw indirect
on its own yields no CPU win, because per-group `glBindTexture` calls
would still force one draw call per group.

**Estimated scope: 2-3 weeks.** Plan + spec to be written by the
brainstorm + spec steps below.

---
## Where N.4 left things

### Branch state

If this handoff is being read on `main` after merging the N.4 worktree:
N.4 commits land at the head of main. The relevant final commits:

- `c445364`: N.4 SHIP (flag default-on, plan final, roadmap, memory)
- `573526d`: perf pass 1-4 (drop dead lookup, sort, cull, hash memo)
- `7b41efc`: FirstIndex/BaseVertex + Issue #47 + grouped instanced
- `943652d`: load triggers + `batch.Key.SurfaceId` source
- `01cff41`: Tasks 22+23 (`WbDrawDispatcher` + side-table)

If the worktree branch (`claude/tender-mcclintock-a16839`) hasn't been
merged yet, that's where the work is. Verify with `git log --oneline`.
### What works in N.4

- `ACDREAM_USE_WB_FOUNDATION=1` is default-on. WB's `ObjectMeshManager`
  loads, decodes, and uploads every entity mesh. Our existing
  `TextureCache` decodes textures (palette-aware, per-instance overrides
  via `GetOrUploadWithPaletteOverride`).
- `WbDrawDispatcher.Draw`:
  - Walks visible entities (per-landblock AABB cull + per-entity AABB
    cull + portal visibility)
  - Buckets every (entity × meshRef × batch) tuple by
    `GroupKey(Ibo, FirstIndex, BaseVertex, IndexCount, TextureHandle, Translucency)`
  - Single `glBufferData` upload of all matrices for the frame
  - Per group: `glActiveTexture(0) + glBindTexture(2D, handle) + glBindBuffer(EBO, ibo) + glDrawElementsInstancedBaseVertexBaseInstance(..., FirstInstance)`
  - Two passes: opaque (front-to-back sorted) + translucent
- 940/948 tests pass (8 pre-existing failures unrelated to rendering).
- Visual verification at Holtburg passed: scenery + characters render
  correctly with full close-detail geometry (Issue #47 preserved).
### What N.5 inherits

These are levers N.5 will pull on:

- **WB's modern rendering is already active.** `OpenGLGraphicsDevice`
  detected GL 4.3 + bindless on first run; WB's `_useModernRendering`
  is true; every mesh lives in WB's single `GlobalMeshBuffer` (one VAO,
  one VBO, one IBO).
- **Bindless handles are already populated.** `ObjectRenderBatch.BindlessTextureHandle`
  is non-zero for batches WB owns the texture for. (See "Per-instance
  entity texture handles" below for entities with palette overrides:
  those use acdream's `TextureCache`, which doesn't expose bindless
  handles yet.)
- **The instance VBO is acdream-owned** (`WbDrawDispatcher._instanceVbo`)
  with locations 3-6 patched onto WB's global VAO. Stride 64 bytes
  (one mat4). N.5 expands this to (mat4 + uvec2 handle) = 72 bytes,
  padded to an 80-byte stride.
### Three load-bearing WB API gotchas N.4 surfaced

These bit us hard during Task 26 visual verification. Documented in
CLAUDE.md "WB integration cribs" + plan adjustments 7-9 +
`memory/project_phase_n4_state.md`. Re-stating here because they
reshape the design space:

1. **`ObjectMeshManager.IncrementRefCount(id)` is NOT lifecycle-aware.**
   It only bumps a usage counter. Mesh loading is fired separately
   via `PrepareMeshDataAsync(id, isSetup)`. The result auto-enqueues
   to `_stagedMeshData` (line 510 of `ObjectMeshManager.cs`); our
   existing `WbMeshAdapter.Tick()` drains it. `WbMeshAdapter.IncrementRefCount`
   already calls `PrepareMeshDataAsync`. **N.5 doesn't need to change
   this; just don't break it.**

2. **`ObjectRenderBatch.SurfaceId` is unset.** WB constructs batches
   with `Key = batch.Key` (a `TextureAtlasManager.TextureKey` struct
   that has a `SurfaceId` field) but never populates the top-level
   `SurfaceId` property. Read `batch.Key.SurfaceId`. **N.5 keeps this
   pattern.**

3. **WB's modern rendering packs every mesh into ONE global
   VAO/VBO/IBO.** Each batch's `IBO` field points to the global IBO;
   the batch's actual slice is identified by `FirstIndex` (offset into
   IBO, in *indices*) and `BaseVertex` (offset into VBO, in *vertices*).
   N.4's draw uses `glDrawElementsInstancedBaseVertexBaseInstance`
   with those offsets. **N.5's `DrawElementsIndirectCommand` per-group
   record will carry `firstIndex` + `baseVertex` for the same reason.**
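
The unit distinction in gotcha 3 is easy to trip over when moving from direct draws to indirect commands: the direct call takes the index offset as a byte offset (through its `indices` pointer parameter), while the indirect record takes it as a count of indices. The sketch below illustrates only that conversion. `BatchSlice` and both helper functions are hypothetical names, not WB or acdream API, and the 16-bit index size used in the example values is an assumption to check against WB's global IBO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the slice fields WB exposes per batch. */
typedef struct {
    uint32_t first_index;  /* offset into the global IBO, in indices */
    int32_t  base_vertex;  /* offset into the global VBO, in vertices */
    uint32_t index_count;  /* number of indices in this slice */
} BatchSlice;

/* Direct draws (glDrawElements*BaseVertex*) take the index offset in BYTES. */
static size_t byte_offset_for_direct_draw(const BatchSlice *s, size_t index_size) {
    assert(s != NULL && index_size > 0);
    return (size_t)s->first_index * index_size;
}

/* Indirect commands take the same offset in INDICES, so it passes through. */
static uint32_t first_index_for_indirect(const BatchSlice *s) {
    assert(s != NULL);
    return s->first_index;
}
```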

---

## What N.5 is: technical detail

### The two-feature pairing

**Bindless textures** (`GL_ARB_bindless_texture`):
- Each texture handle is a 64-bit integer (`uvec2` in GLSL).
- Shader declares `layout(bindless_sampler) uniform sampler2D ...` or
  receives the handle as a per-instance `uvec2` vertex attribute.
- No `glBindTexture` needed at draw time: the handle IS the binding.
- Handle generation: `glGetTextureHandleARB(textureId)` followed by
  `glMakeTextureHandleResidentARB(handle)` (a handle must be made
  resident before any shader samples through it; using a non-resident
  handle can produce GPU faults).

**Multi-draw indirect** (`glMultiDrawElementsIndirect`):
- Indirect command struct layout (must match `DrawElementsIndirectCommand`):
  ```c
  #include <stdint.h>

  typedef struct {
      uint32_t count;          // index count for this draw
      uint32_t instanceCount;  // number of instances
      uint32_t firstIndex;     // offset into IBO, in indices
      int32_t  baseVertex;     // vertex offset into VBO, added to each index
      uint32_t baseInstance;   // first instance ID (offsets per-instance attribs)
  } DrawElementsIndirectCommand;
  ```
- Build a buffer of N of these structs (one per group), upload once,
  fire one GL call: `glMultiDrawElementsIndirect(mode, indexType, ptr, drawcount, stride)`.
- The driver issues all N draws in one shot. Effectively zero CPU
  overhead per draw beyond uploading the indirect buffer.

**Why pair them.** Multi-draw indirect doesn't let you change uniform
state between draws. So if textures are bound via `glBindTexture` per
group, each group still needs its own CPU-side bind followed by its own
draw call, which defeats the purpose. Bindless removes that constraint
by encoding the texture handle as per-instance data the shader reads
directly. With both, the modern render loop becomes:

```
1. Upload instance buffer (mat4 + uvec2 handle, per-instance)   (once per frame)
2. Upload indirect command buffer (one DEIC per group)          (once per frame)
3. glBindVertexArray(globalVAO)                                 (once)
4. glMultiDrawElementsIndirect(...)                             (ONCE per pass)
```

That's it. No per-group state changes.
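
Step 2 of that loop can be sketched on the CPU side as follows. This is a minimal sketch, not the dispatcher's actual code: `DrawGroup` and `build_indirect_commands` are hypothetical names, and it assumes each group's instances were already laid out contiguously in the instance buffer in group order, so that `baseInstance` is a running sum of the preceding instance counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout consumed by glMultiDrawElementsIndirect (five 32-bit words). */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Hypothetical per-group record produced by the bucketing phase. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  base_vertex;
    uint32_t instance_count;
} DrawGroup;

/* Writes one command per group into `out` and returns the total instance
 * count, i.e. how many records the instance buffer must hold. */
static uint32_t build_indirect_commands(const DrawGroup *groups, size_t n,
                                        DrawElementsIndirectCommand *out) {
    uint32_t next_instance = 0;
    assert(n == 0 || (groups != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i) {
        out[i].count         = groups[i].index_count;
        out[i].instanceCount = groups[i].instance_count;
        out[i].firstIndex    = groups[i].first_index;
        out[i].baseVertex    = groups[i].base_vertex;
        out[i].baseInstance  = next_instance;  /* start of this group's instances */
        next_instance += groups[i].instance_count;
    }
    return next_instance;
}
```

With this tightly packed layout, the `stride` argument of `glMultiDrawElementsIndirect` can be either `sizeof(DrawElementsIndirectCommand)` or 0, since 0 means "tightly packed".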
### Instance attribute layout

Currently (N.4): location 3-6 = mat4 model matrix (16 floats = 64 bytes).

N.5 (proposed): location 3-6 = mat4 + location 7 = uvec2 bindless
handle = 16 floats + 2 uints = 72 bytes (16-aligned to 80 bytes per
WB's `InstanceData` precedent).

Equivalently, as an explicitly padded (std140-style) struct:
```c
struct InstanceData {
    mat4 transform;        // locations 3-6
    uvec2 textureHandle;   // location 7
    uvec2 _pad;            // padding to 80
};
```

Brainstorm should decide if we copy WB's `InstanceData` struct (Pack=16,
80 bytes including CellId/Flags fields we don't use) or define our own
minimal version. The 80-byte stride matches WB's so global VAO state
configured by WB stays compatible if the legacy WB draw path ever runs.
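
Whichever struct the brainstorm picks, the byte layout can be pinned down with compile-time checks. Below is a minimal sketch of the minimal (acdream-specific) variant. `AcInstanceData` and `attrib_offset_for_location` are hypothetical names; the location numbering follows the proposal above, and WB's real `InstanceData` field order is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU-side mirror of the proposed per-instance record. */
typedef struct {
    float    transform[16];     /* mat4, read as four vec4 attributes (locations 3-6) */
    uint32_t texture_handle[2]; /* 64-bit bindless handle as uvec2 (location 7) */
    uint32_t pad[2];            /* pads 72 bytes up to the 80-byte stride */
} AcInstanceData;

_Static_assert(sizeof(AcInstanceData) == 80, "instance stride must be 80 bytes");
_Static_assert(offsetof(AcInstanceData, texture_handle) == 64,
               "handle must directly follow the mat4");

/* Byte offset to use when configuring the attribute pointer for a location. */
static size_t attrib_offset_for_location(unsigned location) {
    assert(location >= 3 && location <= 7);
    if (location == 7) {
        return offsetof(AcInstanceData, texture_handle);
    }
    /* Each mat4 column is one vec4 = 16 bytes. */
    return (size_t)(location - 3) * 4 * sizeof(float);
}
```

One design note: an integer attribute such as the `uvec2` handle is normally configured with the integer variant of the pointer call (`glVertexAttribIPointer`), so the handle bits are not converted to floats on the way into the shader.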
### Per-instance entity texture handles

Here's the wrinkle. N.4 uses `WbDrawDispatcher.ResolveTexture` to map
each (entity, batch) to a GL texture name:

- Tree (no overrides): `_textures.GetOrUpload(surfaceId)` → 2D texture name
- NPC with palette override: `_textures.GetOrUploadWithPaletteOverride(...)` → composite-cached 2D texture name
- Anything with surface override: `_textures.GetOrUploadWithOrigTextureOverride(...)` → composite-cached 2D texture name

Those are all `GLuint` 32-bit GL texture *names*, not bindless handles.
**N.5 needs `TextureCache` to publish bindless handles for everything
it owns, not just WB-owned textures.**

Implementation sketch:
- `TextureCache` adds a parallel cache keyed identically but storing
  64-bit bindless handles. On first request, generate via
  `glGetTextureHandleARB(textureId)` + make resident.
- New API: `GetBindlessHandle(uint surfaceId, ...)` returns the handle.
- Or: change every `GetOrUpload*` method to return both the GL name
  and the bindless handle (or just the handle; let GL name fall out
  if anyone needs it later).

WB's `ObjectRenderBatch.BindlessTextureHandle` covers the atlas-tier
case. For per-instance entities, we use `TextureCache`'s handle.
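
The parallel-cache idea can be sketched independently of GL. This is a minimal sketch under stated assumptions: `HandleCache`, `handle_cache_get`, and the `MakeHandleFn` callback are hypothetical, with the callback standing in for the `glGetTextureHandleARB` + `glMakeTextureHandleResidentARB` pair, and the fixed-size linear table is for illustration only (the real cache would reuse `TextureCache`'s existing keys). It also shows splitting the 64-bit handle into the two 32-bit words a `uvec2` attribute carries, low word first, which is how a little-endian `uint64` already sits in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HANDLE_CACHE_CAPACITY 64  /* illustrative fixed capacity */

/* Stand-in for glGetTextureHandleARB + glMakeTextureHandleResidentARB. */
typedef uint64_t (*MakeHandleFn)(uint32_t gl_texture_name);

typedef struct {
    uint32_t names[HANDLE_CACHE_CAPACITY];
    uint64_t handles[HANDLE_CACHE_CAPACITY];
    size_t   count;
} HandleCache;

/* Returns the cached handle for a GL texture name, creating it on first use.
 * Returns 0 if the illustrative table is full. */
static uint64_t handle_cache_get(HandleCache *cache, uint32_t name,
                                 MakeHandleFn make_handle) {
    assert(cache != NULL && make_handle != NULL);
    for (size_t i = 0; i < cache->count; ++i) {
        if (cache->names[i] == name) {
            return cache->handles[i];
        }
    }
    if (cache->count == HANDLE_CACHE_CAPACITY) {
        return 0;
    }
    cache->names[cache->count] = name;
    cache->handles[cache->count] = make_handle(name);
    return cache->handles[cache->count++];
}

/* Splits a 64-bit handle into the two words of a uvec2 (low word first). */
static void handle_to_uvec2(uint64_t handle, uint32_t out[2]) {
    out[0] = (uint32_t)(handle & 0xFFFFFFFFu);
    out[1] = (uint32_t)(handle >> 32);
}

/* Fake generator so the sketch runs without a GL context. */
static unsigned fake_calls = 0;
static uint64_t fake_make_handle(uint32_t name) {
    ++fake_calls;
    return 0x100000000ull + name;
}
```

Requesting the same name twice should call the generator once, which is the property that keeps handle creation and residency changes off the per-frame path.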
### The new shader

Reuse WB's `StaticObjectModern.vert` / `StaticObjectModern.frag` as a
template. Read those files cold. They already do bindless + the
instance-data layout. Adapt to acdream's `mesh_instanced.vert/frag`
conventions:

- Keep the `uViewProjection` uniform, lighting UBO at binding=1, fog
  uniforms.
- Add `#version 430 core` + `#extension GL_ARB_bindless_texture : require`.
- Replace `uniform sampler2D uDiffuse` with a per-instance `uvec2`
  vertex attribute (location 7), then either reconstruct the sampler
  in the vertex shader or pass the handle through to the fragment
  shader via a flat varying.
- Drop `uTranslucencyKind` uniform, OR keep it (still set per-pass:
  multi-draw indirect doesn't break uniforms; only state that varies
  per-draw is the constraint).
### Translucency

Multi-draw indirect can't change blend state between the draws inside
one call. Solution: **still use two passes** (opaque + translucent),
but within translucent keep the per-blendfunc sub-passes (additive,
alpha-blend, inv-alpha). Three sub-passes within translucent. Each
sub-pass = one `glMultiDrawElementsIndirect` over its filtered groups.
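
One way to realize the sub-pass split without three separate buffers is to keep the translucent commands sorted by blend mode in a single indirect buffer and record one (byte offset, draw count) range per mode; each non-empty range then maps to one `glMultiDrawElementsIndirect` call, with the blend function switched between calls. A minimal sketch under stated assumptions: the `BlendMode` enum, `SubPassRange`, and `compute_subpass_ranges` are hypothetical names, the input is assumed to be already sorted by mode, and the GL blend setup itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for the three translucent sub-passes. */
typedef enum {
    BLEND_ADDITIVE = 0,
    BLEND_ALPHA = 1,
    BLEND_INV_ALPHA = 2,
    BLEND_MODE_COUNT = 3
} BlendMode;

#define INDIRECT_COMMAND_SIZE 20u  /* five 32-bit words per command */

typedef struct {
    size_t   byte_offset;  /* offset into the bound indirect buffer */
    uint32_t draw_count;   /* number of commands in this sub-pass */
} SubPassRange;

/* modes[i] is the blend mode of command i; commands must be sorted by mode. */
static void compute_subpass_ranges(const BlendMode *modes, size_t n,
                                   SubPassRange out[BLEND_MODE_COUNT]) {
    for (int m = 0; m < BLEND_MODE_COUNT; ++m) {
        out[m].byte_offset = 0;
        out[m].draw_count = 0;
    }
    for (size_t i = 0; i < n; ++i) {
        assert(i == 0 || modes[i - 1] <= modes[i]);  /* sorted precondition */
        if (out[modes[i]].draw_count == 0) {
            out[modes[i]].byte_offset = i * INDIRECT_COMMAND_SIZE;
        }
        out[modes[i]].draw_count++;
    }
}
```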

Or: if perf allows, fold all four blend modes into the shader via
per-instance blendmode int, sort all translucent groups by blendmode
in the indirect buffer, switch blend state at sub-pass boundaries.
Brainstorm decides the cleanest pattern.

---
## Files to read before brainstorming

In rough order:

1. **N.4 plan + spec**: `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md`
   (status: Final). Adjustments 7-10 capture the gotchas. Spec at
   `docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md`.

2. **N.4 dispatcher source**: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`.
   This is what you're modifying. Read end-to-end.

3. **WB's modern rendering shaders**: `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Shaders/StaticObjectModern.vert`
   + `StaticObjectModern.frag`. The template you're adapting from.

4. **WB's `ObjectMeshManager.UploadGfxObjMeshData`**: lines ~1654-1780
   of `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs`.
   Shows how WB sets up the modern path's VBO/IBO/VAO. Especially note
   how it patches in instance attribute slots (locations 3-6) on the
   global VAO and configures location 7+ for bindless handles.

5. **WB's `ObjectRenderBatch`**: same file, lines ~166-184. Note the
   `BindlessTextureHandle` field, already populated when `_useModernRendering`
   is on.

6. **Our `TextureCache`**: `src/AcDream.App/Rendering/TextureCache.cs`.
   Three composite caches: by surface id, by surface+origTex, by
   surface+origTex+palette. N.5 adds parallel bindless-handle caches.

7. **CLAUDE.md "WB integration cribs"** section. Lines ~28-80. The
   three gotchas + the integration architecture in plain language.

8. **Memory: `project_phase_n4_state.md`**: same content from a
   different angle. Reading both helps lock in the gotchas.

---
## Brainstorm questions

These are the questions to resolve in the brainstorm step. Don't
prejudge them; bring them to the user with options + recommendation:

1. **Instance attribute layout.** Match WB's `InstanceData` struct
   (80 bytes including CellId/Flags fields we don't use) for global
   VAO compatibility, or define a minimal acdream-specific version
   (mat4 + handle = 72 bytes, padded to 80)?

2. **Bindless handle generation strategy.**
   - At texture upload time? (Eager: every texture that lands in
     `TextureCache` gets a handle. Memory cost: roughly per-texture
     handle state.)
   - On first draw lookup? (Lazy: cache fills as scene exercises
     content. Possible first-use stall.)
   - At spawn time via the spawn adapter? (Tied to lifecycle. Cleanest
     but requires touching the spawn path.)

3. **Translucent pass structure.** Three sub-indirect-draws (one per
   blend mode) or a single sorted indirect buffer with per-instance
   blend mode + state-flip at sub-pass boundaries? Or: just iterate
   per-group like N.4 for translucent only (translucent groups are a
   small fraction of total)?

4. **Persistent-mapped indirect + instance buffers.** Use
   `GL_ARB_buffer_storage` + `MAP_PERSISTENT_BIT | MAP_COHERENT_BIT`?
   Triple-buffered ring + sync object? Or stick with `glBufferData`
   (still one upload per frame, just larger)? Persistent mapping is
   a ~2-5% per-frame win in our context but adds buffer-management
   complexity.

5. **Shader unification.** Keep `mesh_instanced` for legacy + add
   `mesh_indirect` for modern, or replace `mesh_instanced` entirely?
   Replacement requires the legacy `InstancedMeshRenderer` (escape
   hatch under `ACDREAM_USE_WB_FOUNDATION=0`) to also use the new
   shader, which... probably doesn't matter if we delete legacy in
   N.6 anyway. Brainstorm.

6. **Conformance test strategy.** N.4 used visual verification at
   Holtburg as the gate. N.5's gate is "no visual regression vs N.4
   AND measurable CPU win." How do we measure CPU? `[WB-DIAG]`
   counters give draw count + group count; we need frame-time
   counters too. Add to the dispatcher? Use a profiler?

7. **Per-instance entity bindless.** `TextureCache.GetOrUpload*`
   returns a GL name. The dispatcher (or `TextureCache` itself) needs
   to convert that to a bindless handle. Design questions:
   - Where does the conversion happen?
   - When is the texture made resident? (Residency is global state;
     too many resident textures hits driver limits.)
   - What about palette/surface overrides: same caching key as the
     name, just a parallel handle dictionary?

8. **Escape hatch.** N.4 keeps `ACDREAM_USE_WB_FOUNDATION=0` as a
   fallback. N.5 needs to decide: does the new shader REPLACE the
   N.4 dispatcher's draw path (so flag-on means N.5 modern path,
   flag-off means legacy `InstancedMeshRenderer`)? Or do we add a
   separate flag (`ACDREAM_USE_MODERN_DRAW`) so users can toggle
   N.4 vs N.5 vs legacy independently? Three-way flag is more
   complex but useful for A/B during rollout.
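
For question 4, the bookkeeping behind a triple-buffered persistent-mapped ring is small, which may help when weighing the complexity cost. Below is a minimal sketch of only the offset arithmetic, with hypothetical names (`RingBuffer`, `ring_slice_offset`, `ring_total_size`). The persistent mapping itself and the per-slice fence wait (for example `glFenceSync` / `glClientWaitSync`) are assumed and not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLICES 3u  /* triple buffering */

typedef struct {
    size_t slice_size;  /* bytes reserved per frame, already aligned */
} RingBuffer;

/* Byte offset of the slice the CPU may write during `frame_index`.
 * The caller is assumed to have waited on that slice's fence first. */
static size_t ring_slice_offset(const RingBuffer *ring, uint64_t frame_index) {
    assert(ring != NULL && ring->slice_size > 0);
    return (size_t)(frame_index % RING_SLICES) * ring->slice_size;
}

/* Total allocation size for the persistent buffer. */
static size_t ring_total_size(const RingBuffer *ring) {
    assert(ring != NULL);
    return (size_t)RING_SLICES * ring->slice_size;
}
```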

---

## Spec structure

After the brainstorm, the spec doc covers:

1. **Architecture diagram**: how `WbDrawDispatcher` changes shape.
   Where the indirect buffer lives. Where bindless handles flow from.
2. **Instance data layout**: exact struct, byte offsets, GL attribute
   pointer setup.
3. **TextureCache changes**: new methods, new cache, residency
   policy.
4. **Shader files**: name(s), version, extensions, in/out variables.
5. **Conformance tests**: what to write, what coverage to claim.
6. **Acceptance criteria**: visual identity to N.4 + measured CPU
   delta.
7. **Risks**: driver bugs in bindless / indirect, residency limits,
   shader compile issues on weird GPUs, the legacy escape hatch
   breaking.

Spec lives at: `docs/superpowers/specs/2026-05-XX-phase-n5-modern-rendering-design.md`.
## Plan structure

After the spec, the plan doc lays out the week-by-week task list.
Match N.4's plan structure (living document, task checkboxes, commit
SHAs appended, adjustments documented inline). Plan lives at:
`docs/superpowers/plans/2026-05-XX-phase-n5-modern-rendering.md`.

Suggested initial breakdown (brainstorm + spec will refine):

- **Week 1** (Plumbing): bindless handle generation in `TextureCache`,
  shader rewrite (compile + bind), instance-attrib layout updated to
  mat4+handle. Dispatcher still uses per-group draws but reads
  textures bindless. Validate: visual identical to N.4.
- **Week 2** (Indirect): build `DrawElementsIndirectCommand` buffer
  per frame, switch to `glMultiDrawElementsIndirect`. Three-pass
  translucent (or whatever brainstorm decides). Validate: visual
  identical, draw-call count drops to 2-4 per frame.
- **Week 3** (Polish + ship): persistent-mapped buffers if brainstorm
  voted yes, profiler/counters, visual verification, flag flip, plan
  finalization.

---
## Acceptance criteria for the whole phase

- Visual output identical to N.4 (no character regressions, no
  scenery missing, no z-fighting introduced)
- `[WB-DIAG]` shows `drawsIssued` ≤ ~5 per frame (down from N.4's
  few hundred)
- Frame time measurably lower in dense scenes (specify what scenes
  to test in the spec; probably Holtburg courtyard + Foundry
  interior)
- No new test failures (940/948 baseline with the same 8 pre-existing
  failures, plus any new conformance tests passing)
- `ACDREAM_USE_WB_FOUNDATION=0` escape hatch still works
- Plan doc finalized, roadmap updated, memory captured if N.5
  surfaces durable lessons (it almost certainly will: bindless
  + indirect both have well-known driver gotchas)

---
## What you'll be doing in the first 30 minutes

1. Read this handoff in full.
2. Read CLAUDE.md "WB integration cribs" section.
3. Read `WbDrawDispatcher.cs` end-to-end.
4. Skim WB's `StaticObjectModern.vert/frag` + `ObjectMeshManager.UploadGfxObjMeshData`
   to ground the reference.
5. Verify build is green: `dotnet build`.
6. Verify N.4 ship is intact: `dotnet test --filter "FullyQualifiedName~Wb|MatrixComposition"`
   should produce 60 passing tests, 0 failures.
7. Invoke the `superpowers:brainstorming` skill with the user. Walk
   through the 8 brainstorm questions above. Capture decisions in a
   spec.
8. Write the spec at the path above.
9. Write the plan at the path above.
10. Begin Week 1 implementation per the plan.

Don't skip the brainstorm. Multi-draw indirect + bindless have several
real driver-compatibility / API-shape decisions that need user input,
not "the agent makes a call and goes." This phase has the same shape
as N.4: brainstorm → spec → plan → tasks-with-checkboxes →
commits-update-checkboxes → final SHIP commit.

---
## Things to NOT do

- **Don't delete the legacy `InstancedMeshRenderer`.** It's the N.4
  escape hatch. N.6 retires it after N.5 is proven default-on.
- **Don't fork WB.** N.4 deliberately avoided fork patches by using
  the side-table pattern (`AcSurfaceMetadataTable`). Stay on that
  path. If you need data WB doesn't expose, add a side-table or
  decode it yourself from dats.
- **Don't try to make per-instance entities use WB's `TextureAtlasManager`.**
  That's N.6+ territory. acdream's `TextureCache` owns palette/surface
  overrides because WB's atlas is keyed by `(surfaceId, paletteId,
  stippling, isSolid)` and our overrides don't fit cleanly. Bindless
  handles let us escape that mismatch: handles for both atlas-tier
  AND per-instance-tier textures, no atlas adoption needed.
- **Don't skip visual verification.** N.4 surfaced three bugs at
  visual verification that no test caught. Don't trust "build green +
  tests pass"; exercise the rendering path with the local ACE server.
- **Don't extend the phase scope.** N.5 is bindless + indirect on
  the existing rendering pipeline. Texture array atlas, GPU-side
  culling, terrain wiring: all of those are subsequent phases. If
  the brainstorm tries to expand, push back.

---
## Reference: the N.4 dispatcher flow you're modifying

```
Draw(camera, landblockEntries, frustum, ...) {
    // Phase 1: walk entities, build groups
    foreach (entity, meshRef, batch) {
        cull, classify into _groups[GroupKey]
    }

    // Phase 2: lay matrices contiguously
    // Phase 3: glBufferData(_instanceVbo, allMatrices)
    // Phase 4: bind global VAO once
    // Phase 5: opaque pass (sorted)
    foreach (group in _opaqueDraws) {
        glBindTexture(group.handle)
        glBindBuffer(EBO, group.ibo)
        glDrawElementsInstancedBaseVertexBaseInstance(...)
    }
    // Phase 6: translucent pass
}
```

After N.5, Phases 5 and 6 collapse to:

```
glBindBuffer(DRAW_INDIRECT_BUFFER, _opaqueIndirect)
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT, 0, opaqueGroups.Count, sizeof(DEIC))
glBindBuffer(DRAW_INDIRECT_BUFFER, _translucentIndirect)
// 3 sub-calls for translucent or 1 if shader-folded
glMultiDrawElementsIndirect(...)
```

That's the destination. Get there cleanly.

Good luck. Holler at the user if any of the brainstorm questions feel
genuinely ambiguous after reading the references: they care about
this phase landing right and will engage on design questions.