From dd5ca3d2b2f5636e332fe3878712d0ae8f77b8e5 Mon Sep 17 00:00:00 2001
From: Erik <erik.nihlen@gmail.com>
Date: Fri, 8 May 2026 18:05:36 +0200
Subject: [PATCH] docs(N.5): cold-start handoff for next session

Detailed briefing for the next agent picking up Phase N.5 (Modern
Rendering Path: bindless textures + glMultiDrawElementsIndirect on
N.4's foundation). Covers:

- Where N.4 left things (commits, what works, gotchas inherited)
- The two-feature pairing (why bindless + indirect together)
- Files to read first (WB shaders, our dispatcher, CLAUDE.md cribs)
- 8 brainstorm questions to resolve before spec
- Spec + plan structure (matching N.4's pattern)
- Acceptance criteria
- Things to explicitly NOT do

Sized for a fresh session to pick up cold without spelunking through
months of session history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 docs/research/2026-05-08-phase-n5-handoff.md | 495 +++++++++++++++++++
 1 file changed, 495 insertions(+)
 create mode 100644 docs/research/2026-05-08-phase-n5-handoff.md

diff --git a/docs/research/2026-05-08-phase-n5-handoff.md b/docs/research/2026-05-08-phase-n5-handoff.md
new file mode 100644
index 0000000..1c4d7be
--- /dev/null
+++ b/docs/research/2026-05-08-phase-n5-handoff.md
@@ -0,0 +1,495 @@
+# Phase N.5 — Modern Rendering Path — Cold-Start Handoff
+
+**Created:** 2026-05-08, immediately after N.4 ship.
+**Audience:** the next agent picking up rendering perf work.
+**Purpose:** give you everything you need to start N.5 cold, without
+spelunking through five months of session history.
+
+---
+
+## TL;DR
+
+N.4 just shipped: WB's `ObjectMeshManager` is now acdream's production
+mesh pipeline, and `WbDrawDispatcher` is the production draw path. It
+works (Holtburg renders correctly, FPS substantially improved over the
+naïve dual-pipeline state we hit during week 4 verification) but it
+still issues per-group GL calls (`glBindTexture`, `glBindBuffer` for
+the IBO, then one `glDrawElementsInstancedBaseVertexBaseInstance`) and
+does a fresh `glBufferData` upload per frame.
+
+**N.5's job: lift the dispatcher onto WB's modern rendering primitives
+that we're already paying GPU-feature-detection cost for.** Two big
+wins, paired:
+
+1. **Bindless textures** (`GL_ARB_bindless_texture`) — WB already
+   populates `ObjectRenderBatch.BindlessTextureHandle`. Switch our
+   shader to read texture handles from a per-instance attribute
+   (`uvec2` → `sampler2D` via the bindless extension). Eliminates
+   100% of `glBindTexture` calls.
+2. **Multi-draw indirect** (`glMultiDrawElementsIndirect`) — build a
+   buffer of `DrawElementsIndirectCommand` structs (one per group),
+   upload once, fire ONE `glMultiDrawElementsIndirect` call per pass.
+   The driver pulls everything from the indirect buffer.
+
+Together they target a 2-5× CPU win on draw-heavy scenes (Holtburg
+courtyard, Foundry, dense dungeons). They're packaged together because
+both are "modern path" extensions we already gate on, both require
+the same shader rewrite, and they pair naturally: multi-draw indirect
+yields no CPU win without bindless, because per-group `glBindTexture`
+calls would still force one draw call per group.
+
+**Estimated scope: 2-3 weeks.** Plan + spec to be written by the
+brainstorm + spec steps below.
+
+---
+
+## Where N.4 left things
+
+### Branch state
+
+If you are reading this handoff on `main` after the N.4 worktree was
+merged, the N.4 commits sit at the head of `main`. The relevant final commits:
+
+- `c445364` — N.4 SHIP (flag default-on, plan final, roadmap, memory)
+- `573526d` — perf pass 1-4 (drop dead lookup, sort, cull, hash memo)
+- `7b41efc` — FirstIndex/BaseVertex + Issue #47 + grouped instanced
+- `943652d` — load triggers + `batch.Key.SurfaceId` source
+- `01cff41` — Tasks 22+23 (`WbDrawDispatcher` + side-table)
+
+If the worktree branch (`claude/tender-mcclintock-a16839`) hasn't been
+merged yet, that's where the work is. Verify with `git log --oneline`.
+
+### What works in N.4
+
+- `ACDREAM_USE_WB_FOUNDATION=1` is default-on. WB's `ObjectMeshManager`
+  loads, decodes, and uploads every entity mesh. Our existing
+  `TextureCache` decodes textures (palette-aware, per-instance overrides
+  via `GetOrUploadWithPaletteOverride`).
+- `WbDrawDispatcher.Draw`:
+  - Walks visible entities (per-landblock AABB cull + per-entity AABB
+    cull + portal visibility)
+  - Buckets every (entity × meshRef × batch) tuple by
+    `GroupKey(Ibo, FirstIndex, BaseVertex, IndexCount, TextureHandle, Translucency)`
+  - Single `glBufferData` upload of all matrices for the frame
+  - Per group: `glActiveTexture(0) + glBindTexture(2D, handle) + glBindBuffer(EBO, ibo) + glDrawElementsInstancedBaseVertexBaseInstance(..., FirstInstance)`
+  - Two passes: opaque (front-to-back sorted) + translucent
+- 940/948 tests pass (8 pre-existing failures unrelated to rendering).
+- Visual verification at Holtburg passed: scenery + characters render
+  correctly with full close-detail geometry (Issue #47 preserved).
+
+### What N.5 inherits
+
+These are levers N.5 will pull on:
+
+- **WB's modern rendering is already active.** `OpenGLGraphicsDevice`
+  detected GL 4.3 + bindless on first run; WB's `_useModernRendering`
+  is true; every mesh lives in WB's single `GlobalMeshBuffer` (one VAO,
+  one VBO, one IBO).
+- **Bindless handles are already populated.** `ObjectRenderBatch.BindlessTextureHandle`
+  is non-zero for batches WB owns the texture for. (See gotcha #2
+  below for entities with palette overrides — those use acdream's
+  `TextureCache` which doesn't expose bindless handles yet.)
+- **The instance VBO is acdream-owned** (`WbDrawDispatcher._instanceVbo`)
+  with locations 3-6 patched onto WB's global VAO. Stride 64 bytes
+  (one mat4). N.5 expands this to mat4 + uvec2 handle = 72 bytes, padded to 80.
+
+### Three load-bearing WB API gotchas N.4 surfaced
+
+These bit us hard during Task 26 visual verification. Documented in
+CLAUDE.md "WB integration cribs" + plan adjustments 7-9 +
+`memory/project_phase_n4_state.md`. Re-stating here because they
+reshape the design space:
+
+1. **`ObjectMeshManager.IncrementRefCount(id)` is NOT lifecycle-aware.**
+   It only bumps a usage counter. Mesh loading is fired separately
+   via `PrepareMeshDataAsync(id, isSetup)`. The result auto-enqueues
+   to `_stagedMeshData` (line 510 of `ObjectMeshManager.cs`); our
+   existing `WbMeshAdapter.Tick()` drains it. `WbMeshAdapter.IncrementRefCount`
+   already calls `PrepareMeshDataAsync`. **N.5 doesn't need to change
+   this — just don't break it.**
+
+2. **`ObjectRenderBatch.SurfaceId` is unset.** WB constructs batches
+   with `Key = batch.Key` (a `TextureAtlasManager.TextureKey` struct
+   that has a `SurfaceId` field) but never populates the top-level
+   `SurfaceId` property. Read `batch.Key.SurfaceId`. **N.5 keeps this
+   pattern.**
+
+3. **WB's modern rendering packs every mesh into ONE global
+   VAO/VBO/IBO.** Each batch's `IBO` field points to the global IBO;
+   the batch's actual slice is identified by `FirstIndex` (offset into
+   IBO, in *indices*) and `BaseVertex` (offset into VBO, in *vertices*).
+   N.4's draw uses `glDrawElementsInstancedBaseVertexBaseInstance`
+   with those offsets. **N.5's `DrawElementsIndirectCommand` per-group
+   record will carry `firstIndex` + `baseVertex` for the same reason.**
+
+---
+
+## What N.5 is — technical detail
+
+### The two-feature pairing
+
+**Bindless textures** (`GL_ARB_bindless_texture`):
+- Each texture handle is a 64-bit integer (`uvec2` in GLSL).
+- Shader declares `layout(bindless_sampler) uniform sampler2D ...` or
+  receives the handle as a per-instance `uvec2` vertex attribute.
+- No `glBindTexture` needed at draw time: the handle IS the binding.
+- Handle generation: `glGetTextureHandleARB(textureId)` followed by
+  `glMakeTextureHandleResidentARB(handle)`. The handle must be resident
+  before a shader samples it; sampling a non-resident handle is undefined and can fault the GPU.
+
+**Multi-draw indirect** (`glMultiDrawElementsIndirect`):
+- Indirect command struct layout (must match `DrawElementsIndirectCommand`):
+  ```c
+  struct {
+      uint count;          // index count for this draw
+      uint instanceCount;  // number of instances
+      uint firstIndex;     // offset into IBO, in indices
+      int  baseVertex;     // vertex offset into VBO
+      uint baseInstance;   // first instance ID (offsets per-instance attribs)
+  };
+  ```
+- Build a buffer of N of these structs (one per group), upload once,
+  fire one GL call: `glMultiDrawElementsIndirect(mode, indexType, ptr, drawcount, stride)`.
+- The driver issues all N draws in one shot. Effectively zero CPU
+  overhead per draw beyond uploading the indirect buffer.
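+
+As a rough illustration of the command-buffer build described above, here
+is a minimal C sketch. `DrawGroup` and `build_indirect_commands` are
+hypothetical names, not the dispatcher's actual types, and the sketch
+assumes instances are laid out contiguously group after group (as N.4's
+single matrix upload does), so `baseInstance` is a running sum.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Mirrors the DrawElementsIndirectCommand layout: five packed 32-bit fields. */
+typedef struct {
+    uint32_t count;          /* index count for this draw */
+    uint32_t instanceCount;  /* number of instances */
+    uint32_t firstIndex;     /* offset into the IBO, in indices */
+    int32_t  baseVertex;     /* offset into the VBO, in vertices */
+    uint32_t baseInstance;   /* first instance (offsets per-instance attribs) */
+} DrawElementsIndirectCommand;
+
+static_assert(sizeof(DrawElementsIndirectCommand) == 20,
+              "indirect command must be 20 tightly packed bytes");
+
+/* Hypothetical per-group record; the real dispatcher's group type may differ. */
+typedef struct {
+    uint32_t indexCount;
+    uint32_t firstIndex;
+    int32_t  baseVertex;
+    uint32_t instanceCount;
+} DrawGroup;
+
+/* Writes one command per group and returns the total instance count.
+ * Assumption: instance data is uploaded in the same group order. */
+static uint32_t build_indirect_commands(const DrawGroup *groups, size_t n,
+                                        DrawElementsIndirectCommand *out)
+{
+    uint32_t nextInstance = 0;
+    for (size_t i = 0; i < n; ++i) {
+        out[i].count         = groups[i].indexCount;
+        out[i].instanceCount = groups[i].instanceCount;
+        out[i].firstIndex    = groups[i].firstIndex;
+        out[i].baseVertex    = groups[i].baseVertex;
+        out[i].baseInstance  = nextInstance;
+        nextInstance += groups[i].instanceCount;
+    }
+    return nextInstance;
+}
+```
+
+The resulting array is what would be uploaded to the indirect buffer; with
+a tightly packed array, the `stride` argument can be
+`sizeof(DrawElementsIndirectCommand)`.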
+
+**Why pair them.** Multi-draw indirect doesn't let you change GL state
+(texture bindings, uniforms) between its sub-draws. So if textures are
+bound via `glBindTexture` per group, each group still needs its own
+draw call, which defeats the purpose. Bindless removes that constraint by
+encoding the texture handle as per-instance data the shader reads
+directly. With both, the modern render loop becomes:
+
+```
+1. Upload instance buffer (mat4 + uvec2 handle, per-instance) — once per frame
+2. Upload indirect command buffer (one DEIC per group) — once per frame
+3. glBindVertexArray(globalVAO) — once
+4. glMultiDrawElementsIndirect(...) — ONCE per pass
+```
+
+That's it. No per-group state changes.
+
+### Instance attribute layout
+
+Currently (N.4): location 3-6 = mat4 model matrix (16 floats = 64 bytes).
+
+N.5 (proposed): location 3-6 = mat4 + location 7 = uvec2 bindless
+handle = 16 floats + 2 uints = 72 bytes (16-aligned to 80 bytes per
+WB's `InstanceData` precedent).
+
+Or spell the padding out in an explicitly 16-byte-aligned struct:
+```c
+struct InstanceData {
+    mat4 transform;        // locations 3-6
+    uvec2 textureHandle;   // location 7
+    uvec2 _pad;            // padding to 80
+};
+```
+
+Brainstorm should decide if we copy WB's `InstanceData` struct (Pack=16,
+80 bytes including CellId/Flags fields we don't use) or define our own
+minimal version. The 80-byte stride matches WB's so global VAO state
+configured by WB stays compatible if the legacy WB draw path ever runs.
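+
+To make the byte arithmetic concrete, here is a minimal CPU-side C sketch
+of the minimal (acdream-specific) variant. It is not WB's actual
+`InstanceData`; the field names and the helpers `set_texture_handle` /
+`get_texture_handle` are hypothetical. It also assumes the 64-bit handle
+is split with the low 32 bits in the first `uvec2` component and the high
+32 bits in the second; verify that against the extension spec and WB's
+shaders before relying on it.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical minimal per-instance record: 64 + 8 + 8 = 80 bytes. */
+typedef struct {
+    float    transform[16];     /* mat4, attribute locations 3-6 */
+    uint32_t textureHandle[2];  /* uvec2, location 7: {low, high} (assumed order) */
+    uint32_t pad[2];            /* pads 72 bytes up to the 80-byte stride */
+} InstanceData;
+
+static_assert(sizeof(InstanceData) == 80, "stride must be 80 bytes");
+static_assert(offsetof(InstanceData, textureHandle) == 64,
+              "handle must directly follow the mat4");
+
+/* Splits a 64-bit bindless handle into the two 32-bit attribute components. */
+static void set_texture_handle(InstanceData *inst, uint64_t handle)
+{
+    inst->textureHandle[0] = (uint32_t)(handle & 0xFFFFFFFFu);
+    inst->textureHandle[1] = (uint32_t)(handle >> 32);
+}
+
+/* Reassembles the handle, e.g. for debugging or round-trip checks. */
+static uint64_t get_texture_handle(const InstanceData *inst)
+{
+    return ((uint64_t)inst->textureHandle[1] << 32) | inst->textureHandle[0];
+}
+```
+
+Because the handle is an integer attribute, the attribute pointer for
+location 7 would need the integer variant (`glVertexAttribIPointer`) with
+offset 64 and stride 80, so the bits reach the shader unconverted.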
+
+### Per-instance entity texture handles
+
+Here's the wrinkle. N.4 uses `WbDrawDispatcher.ResolveTexture` to map
+each (entity, batch) to a GL texture name:
+
+- Tree (no overrides): `_textures.GetOrUpload(surfaceId)` → 2D texture name
+- NPC with palette override: `_textures.GetOrUploadWithPaletteOverride(...)` → composite-cached 2D texture name
+- Anything with surface override: `_textures.GetOrUploadWithOrigTextureOverride(...)` → composite-cached 2D texture name
+
+Those are all `GLuint` 32-bit GL texture *names*, not bindless handles.
+**N.5 needs `TextureCache` to publish bindless handles for everything
+it owns, not just WB-owned textures.**
+
+Implementation sketch:
+- `TextureCache` adds a parallel cache keyed identically but storing
+  64-bit bindless handles. On first request, generate via
+  `glGetTextureHandleARB(textureId)` + make resident.
+- New API: `GetBindlessHandle(uint surfaceId, ...)` returns the handle.
+- Or: change every `GetOrUpload*` method to return both the GL name
+  and the bindless handle (or just the handle; let GL name fall out
+  if anyone needs it later).
+
+WB's `ObjectRenderBatch.BindlessTextureHandle` covers the atlas-tier
+case. For per-instance entities, we use `TextureCache`'s handle.
+
+### The new shader
+
+Reuse WB's `StaticObjectModern.vert` / `StaticObjectModern.frag` as a
+template. Read those files cold. They already do bindless + the
+instance-data layout. Adapt to acdream's `mesh_instanced.vert/frag`
+conventions:
+
+- Keep the `uViewProjection` uniform, lighting UBO at binding=1, fog
+  uniforms.
+- Add `#version 430 core` + `#extension GL_ARB_bindless_texture : require`.
+- Replace `uniform sampler2D uDiffuse` with a per-instance `uvec2`
+  vertex attribute (location 7), then either reconstruct the sampler in
+  the vertex shader OR pass the `uvec2` to the fragment shader via a flat varying.
+- Drop `uTranslucencyKind` uniform, OR keep it (still set per-pass —
+  multi-draw indirect doesn't break uniforms; only state that varies
+  per-draw is the constraint).
+
+### Translucency
+
+Multi-draw indirect can't change blend state within a single call. Solution:
+**still use two passes** (opaque + translucent), but within translucent
+keep the per-blendfunc sub-passes (additive, alpha-blend, inv-alpha).
+Three sub-passes within translucent. Each sub-pass = one
+`glMultiDrawElementsIndirect` over its filtered groups.
+
+Or: if perf allows, fold the blend modes into the shader via a
+per-instance blend-mode int, sort all translucent groups by blend mode
+in the indirect buffer, and switch blend state at sub-pass boundaries.
+Brainstorm decides the cleanest pattern.
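+
+Either variant needs the translucent groups partitioned into contiguous
+per-blend-mode ranges. A minimal C sketch of that step follows, assuming
+the three blend modes named above; `BlendMode`, `TranslucentGroup`,
+`SubPass`, and `partition_by_blend` are hypothetical names. It uses a
+stable counting sort, so whatever order the groups arrive in is preserved
+within each blend mode.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+typedef enum {
+    BLEND_ADDITIVE = 0,
+    BLEND_ALPHA = 1,
+    BLEND_INV_ALPHA = 2,
+    BLEND_MODE_COUNT = 3
+} BlendMode;
+
+/* Hypothetical translucent group: its blend mode plus an opaque id. */
+typedef struct {
+    BlendMode blend;
+    unsigned  groupId;
+} TranslucentGroup;
+
+/* One sub-pass = one contiguous range of commands in the indirect buffer. */
+typedef struct {
+    size_t first;  /* index of the first command in the range */
+    size_t count;  /* number of commands (the drawcount for that call) */
+} SubPass;
+
+/* Stable counting sort by blend mode; fills `out` and the per-mode ranges. */
+static void partition_by_blend(const TranslucentGroup *in, size_t n,
+                               TranslucentGroup *out,
+                               SubPass ranges[BLEND_MODE_COUNT])
+{
+    size_t counts[BLEND_MODE_COUNT] = {0};
+    size_t cursor[BLEND_MODE_COUNT];
+    size_t offset = 0;
+
+    for (size_t i = 0; i < n; ++i)
+        counts[in[i].blend]++;
+
+    for (int m = 0; m < BLEND_MODE_COUNT; ++m) {
+        ranges[m].first = offset;
+        ranges[m].count = counts[m];
+        cursor[m] = offset;
+        offset += counts[m];
+    }
+
+    for (size_t i = 0; i < n; ++i)
+        out[cursor[in[i].blend]++] = in[i];
+}
+```
+
+Each non-empty range then maps to one blend-state switch plus one
+`glMultiDrawElementsIndirect` call whose byte offset is `first` times the
+command size and whose `drawcount` is `count`.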
+
+---
+
+## Files to read before brainstorming
+
+In rough order:
+
+1. **N.4 plan + spec** — `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md`
+   (status: Final). Adjustments 7-10 capture the gotchas. Spec at
+   `docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md`.
+
+2. **N.4 dispatcher source** — `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`.
+   This is what you're modifying. Read end-to-end.
+
+3. **WB's modern rendering shaders** — `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Shaders/StaticObjectModern.vert`
+   + `StaticObjectModern.frag`. The template you're adapting from.
+
+4. **WB's `ObjectMeshManager.UploadGfxObjMeshData`** — lines ~1654-1780
+   of `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs`.
+   Shows how WB sets up the modern path's VBO/IBO/VAO. Especially note
+   how it patches in instance attribute slots (locations 3-6) on the
+   global VAO and configures location 7+ for bindless handles.
+
+5. **WB's `ObjectRenderBatch`** — same file, lines ~166-184. Note the
+   `BindlessTextureHandle` field — already populated when `_useModernRendering`
+   is on.
+
+6. **Our `TextureCache`** — `src/AcDream.App/Rendering/TextureCache.cs`.
+   Three composite caches: by surface id, by surface+origTex, by
+   surface+origTex+palette. N.5 adds parallel bindless-handle caches.
+
+7. **CLAUDE.md "WB integration cribs"** section. Lines ~28-80. The
+   three gotchas + the integration architecture in plain language.
+
+8. **Memory: `project_phase_n4_state.md`** — same content from a
+   different angle. Reading both helps lock in the gotchas.
+
+---
+
+## Brainstorm questions
+
+These are the questions to resolve in the brainstorm step. Don't
+prejudge them — bring them to the user with options + recommendation:
+
+1. **Instance attribute layout.** Match WB's `InstanceData` struct
+   (80 bytes including CellId/Flags fields we don't use) for global
+   VAO compatibility, or define a minimal acdream-specific version
+   (mat4 + handle = ~72 bytes padded to 80)?
+
+2. **Bindless handle generation strategy.**
+   - At texture upload time? (Eager — every texture that lands in
+     `TextureCache` gets a handle. Memory cost ~per-texture state.)
+   - On first draw lookup? (Lazy — cache fills as scene exercises
+     content. Possible first-use stall.)
+   - At spawn time via the spawn adapter? (Tied to lifecycle. Cleanest
+     but requires touching the spawn path.)
+
+3. **Translucent pass structure.** Three sub-indirect-draws (one per
+   blend mode) or a single sorted indirect buffer with per-instance
+   blend mode + state-flip at sub-pass boundaries? Or: just iterate
+   per-group like N.4 for translucent only (translucent groups are a
+   small fraction of total)?
+
+4. **Persistent-mapped indirect + instance buffers.** Use
+   `GL_ARB_buffer_storage` + `GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT`?
+   Triple-buffered ring + sync object? Or stick with `glBufferData`
+   (still one upload per frame, just larger)? Persistent mapping is an
+   estimated ~2-5% per-frame win in our context but adds buffer-management
+   complexity.
+
+5. **Shader unification.** Keep `mesh_instanced` for legacy + add
+   `mesh_indirect` for modern, or replace `mesh_instanced` entirely?
+   Replacement requires the legacy `InstancedMeshRenderer` (escape
+   hatch under `ACDREAM_USE_WB_FOUNDATION=0`) to also use the new
+   shader, which... probably doesn't matter if we delete legacy in
+   N.6 anyway. Brainstorm.
+
+6. **Conformance test strategy.** N.4 used visual verification at
+   Holtburg as the gate. N.5's gate is "no visual regression vs N.4
+   AND measurable CPU win." How do we measure CPU? `[WB-DIAG]`
+   counters give draw count + group count; we need frame-time
+   counters too. Add to the dispatcher? Use a profiler?
+
+7. **Per-instance entity bindless.** `TextureCache.GetOrUpload*`
+   returns a GL name. The dispatcher (or `TextureCache` itself) needs
+   to convert that to a bindless handle. Design questions:
+   - Where does the conversion happen?
+   - When is the texture made resident? (Residency is global state;
+     too many resident textures hits driver limits.)
+   - What about palette/surface overrides — same caching key as the
+     name, just a parallel handle dictionary?
+
+8. **Escape hatch.** N.4 keeps `ACDREAM_USE_WB_FOUNDATION=0` as a
+   fallback. N.5 needs to decide: does the new shader REPLACE the
+   N.4 dispatcher's draw path (so flag-on means N.5 modern path,
+   flag-off means legacy `InstancedMeshRenderer`)? Or do we add a
+   separate flag (`ACDREAM_USE_MODERN_DRAW`) so users can toggle
+   N.4 vs N.5 vs legacy independently? Three-way flag is more
+   complex but useful for A/B during rollout.
+
+---
+
+## Spec structure
+
+After the brainstorm, the spec doc covers:
+
+1. **Architecture diagram** — how `WbDrawDispatcher` changes shape.
+   Where the indirect buffer lives. Where bindless handles flow from.
+2. **Instance data layout** — exact struct, byte offsets, GL attribute
+   pointer setup.
+3. **TextureCache changes** — new methods, new cache, residency
+   policy.
+4. **Shader files** — name(s), version, extensions, in/out variables.
+5. **Conformance tests** — what to write, what coverage to claim.
+6. **Acceptance criteria** — visual identity to N.4 + measured CPU
+   delta.
+7. **Risks** — driver bugs in bindless / indirect, residency limits,
+   shader compile issues on weird GPUs, the legacy escape hatch
+   breaking.
+
+Spec lives at: `docs/superpowers/specs/2026-05-XX-phase-n5-modern-rendering-design.md`.
+
+## Plan structure
+
+After the spec, the plan doc lays out the week-by-week task list.
+Match N.4's plan structure (living document, task checkboxes, commit
+SHAs appended, adjustments documented inline). Plan lives at:
+`docs/superpowers/plans/2026-05-XX-phase-n5-modern-rendering.md`.
+
+Suggested initial breakdown (brainstorm + spec will refine):
+
+- **Week 1** — Plumbing: bindless handle generation in `TextureCache`,
+  shader rewrite (compile + bind), instance-attrib layout updated to
+  mat4+handle. Dispatcher still uses per-group draws but reads
+  textures bindless. Validate: visual identical to N.4.
+- **Week 2** — Indirect: build `DrawElementsIndirectCommand` buffer
+  per frame, switch to `glMultiDrawElementsIndirect`. Three-pass
+  translucent (or whatever brainstorm decides). Validate: visual
+  identical, draw-call count drops to 2-4 per frame.
+- **Week 3** — Polish + ship: persistent-mapped buffers if brainstorm
+  voted yes, profiler/counters, visual verification, flag flip, plan
+  finalization.
+
+---
+
+## Acceptance criteria for the whole phase
+
+- Visual output identical to N.4 (no character regressions, no
+  missing scenery, no new z-fighting)
+- `[WB-DIAG]` shows `drawsIssued` ≤ ~5 per frame (down from N.4's
+  few hundred)
+- Frame time measurably lower in dense scenes (specify what scenes
+  to test in the spec — probably Holtburg courtyard + Foundry
+  interior)
+- No new test failures (940/948 baseline with the same 8 pre-existing failures, + any new conformance tests)
+- `ACDREAM_USE_WB_FOUNDATION=0` escape hatch still works
+- Plan doc finalized, roadmap updated, memory captured if N.5
+  surfaces durable lessons (it almost certainly will — bindless
+  + indirect both have well-known driver gotchas)
+
+---
+
+## What you'll be doing in the first 30 minutes
+
+1. Read this handoff in full.
+2. Read CLAUDE.md "WB integration cribs" section.
+3. Read `WbDrawDispatcher.cs` end-to-end.
+4. Skim WB's `StaticObjectModern.vert/frag` + `ObjectMeshManager.UploadGfxObjMeshData`
+   to ground the reference.
+5. Verify build is green: `dotnet build`.
+6. Verify N.4 ship is intact: `dotnet test --filter "FullyQualifiedName~Wb|MatrixComposition"`
+   should produce 60 passing tests, 0 failures.
+7. Invoke the `superpowers:brainstorming` skill with the user. Walk
+   through the 8 brainstorm questions above. Capture decisions in a
+   spec.
+8. Write the spec at the path above.
+9. Write the plan at the path above.
+10. Begin Week 1 implementation per the plan.
+
+Don't skip the brainstorm. Multi-draw indirect + bindless have several
+real driver-compatibility / API-shape decisions that need user input,
+not "the agent makes a call and goes." This phase has the same
+structure as N.4: brainstorm → spec → plan → tasks-with-checkboxes →
+commits-update-checkboxes → final SHIP commit.
+
+---
+
+## Things to NOT do
+
+- **Don't delete the legacy `InstancedMeshRenderer`.** It's the N.4
+  escape hatch. N.6 retires it after N.5 is proven default-on.
+- **Don't fork WB.** N.4 deliberately avoided fork patches by using
+  the side-table pattern (`AcSurfaceMetadataTable`). Stay on that
+  path. If you need data WB doesn't expose, add a side-table or
+  decode it yourself from dats.
+- **Don't try to make per-instance entities use WB's `TextureAtlasManager`.**
+  That's N.6+ territory. acdream's `TextureCache` owns palette/surface
+  overrides because WB's atlas is keyed by `(surfaceId, paletteId,
+  stippling, isSolid)` and our overrides don't fit cleanly. Bindless
+  handles let us escape that mismatch — handles for both atlas-tier
+  AND per-instance-tier textures, no atlas adoption needed.
+- **Don't skip visual verification.** N.4 surfaced three bugs at
+  visual verification that no test caught. Don't trust "build green +
+  tests pass" — exercise the rendering path with the local ACE server.
+- **Don't extend the phase scope.** N.5 is bindless + indirect on
+  the existing rendering pipeline. Texture array atlas, GPU-side
+  culling, terrain wiring — all of those are subsequent phases. If
+  the brainstorm tries to expand, push back.
+
+---
+
+## Reference: the N.4 dispatcher flow you're modifying
+
+```
+Draw(camera, landblockEntries, frustum, ...) {
+  // Phase 1: walk entities, build groups
+  foreach (entity, meshRef, batch) {
+    cull, classify into _groups[GroupKey]
+  }
+
+  // Phase 2: lay matrices contiguously
+  // Phase 3: glBufferData(_instanceVbo, allMatrices)
+  // Phase 4: bind global VAO once
+  // Phase 5: opaque pass (sorted)
+  foreach (group in _opaqueDraws) {
+    glBindTexture(group.handle)
+    glBindBuffer(EBO, group.ibo)
+    glDrawElementsInstancedBaseVertexBaseInstance(...)
+  }
+  // Phase 6: translucent pass
+}
+```
+
+After N.5, Phases 5 and 6 collapse to:
+
+```
+glBindBuffer(DRAW_INDIRECT_BUFFER, _opaqueIndirect)
+glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT, 0, opaqueGroups.Count, sizeof(DEIC))
+glBindBuffer(DRAW_INDIRECT_BUFFER, _translucentIndirect)
+// 3 sub-calls for translucent or 1 if shader-folded
+glMultiDrawElementsIndirect(...)
+```
+
+That's the destination. Get there cleanly.
+
+Good luck. Holler at the user if any of the brainstorm questions feel
+genuinely ambiguous after reading the references — they care about
+this phase landing right and will engage on design questions.