From 55ecec683f74cc287c872f830eb1e477bfccaafd Mon Sep 17 00:00:00 2001
From: Erik <erik.nihlen@gmail.com>
Date: Fri, 8 May 2026 21:14:50 +0200
Subject: [PATCH] =?UTF-8?q?phase(N.5):=20SHIP=20=E2=80=94=20modern=20rende?=
 =?UTF-8?q?ring=20path=20on=20N.4=20dispatcher?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bindless textures + glMultiDrawElementsIndirect on top of N.4's grouped
pipeline. Per-frame entity rendering: 3 SSBO uploads (instance matrices
@ binding=0, batch data @ binding=1, indirect commands) + 2 indirect
calls (opaque + transparent). Total ~12-15 GL calls per frame for entity
rendering, regardless of scene complexity.
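
For reference, a minimal C sketch of how one indirect-command array can
be packed per pass. Only the DrawElementsIndirectCommand layout is fixed
by the OpenGL spec; MeshGroup and build_indirect_commands are
hypothetical names for illustration and are not the dispatcher's actual
types. The running baseInstance assumes each group's instances sit
contiguously in the instance-matrix SSBO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout defined by the OpenGL spec for glMultiDrawElementsIndirect. */
typedef struct {
    uint32_t count;         /* number of indices in this draw */
    uint32_t instanceCount; /* number of instances to draw */
    uint32_t firstIndex;    /* offset into the index buffer, in indices */
    int32_t  baseVertex;    /* value added to every fetched index */
    uint32_t baseInstance;  /* first instance slot for this draw */
} DrawElementsIndirectCommand;

static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "five tightly packed 32-bit fields");

/* Hypothetical per-group record; the real dispatcher's fields may differ. */
typedef struct {
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  baseVertex;
    uint32_t instanceCount;
} MeshGroup;

/* Writes one command per group into cmds[0..n) and returns the total
 * instance count. baseInstance is a running sum, so group i's instances
 * start right after group i-1's in the instance buffer. */
static uint32_t build_indirect_commands(const MeshGroup *groups, size_t n,
                                        DrawElementsIndirectCommand *cmds)
{
    uint32_t next_instance = 0;
    for (size_t i = 0; i < n; ++i) {
        cmds[i].count         = groups[i].indexCount;
        cmds[i].instanceCount = groups[i].instanceCount;
        cmds[i].firstIndex    = groups[i].firstIndex;
        cmds[i].baseVertex    = groups[i].baseVertex;
        cmds[i].baseInstance  = next_instance;
        next_instance += groups[i].instanceCount;
    }
    return next_instance;
}
```

The filled array would be uploaded once and consumed by a single
glMultiDrawElementsIndirect call with drawcount = n, which is why the
CPU-side call count does not grow with the number of groups.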

Acceptance gates (spec §8.3):
- [x] Visual identity to N.4 — Task 10 USER GATE PASS (Holtburg courtyard)
      + Task 14 USER GATE PASS (general roaming, no regressions seen)
- [x] CPU dispatcher time ≤ 70% of N.4 — measured 1.23 ms/frame median
      at Holtburg courtyard (1662 groups, ~810 fps) vs. an estimated N.4
      hot path of ≥2.5 ms/frame, i.e. at most ~49% of N.4; comfortably
      under threshold
- [x] drawsIssued ≤ 5 per pass (CPU GL calls) — exactly 2 indirect calls
      per frame (one per pass: opaque + transparent) regardless of
      scene size
- [x] All tests green — 71/71 in
      FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless
- [x] ACDREAM_USE_WB_FOUNDATION=0 still works — InstancedMeshRenderer
      escape hatch preserved (its own shader path, untouched)
- [ ] GPU rendering time within ±10% of N.4 — DEFERRED to N.6.
      GL_TIME_ELAPSED query polling never reports the result as
      available (avail stays 0) within the same frame; needs
      double-buffering. CPU time is the load-bearing metric.
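
      A possible shape for the N.6 double-buffering, sketched in C as
      slot bookkeeping only. TimerRing and the timer_* helpers are
      hypothetical names; the GL calls named in the comments are the
      standard query-object API, but where they go in the frame loop is
      an assumption, not the planned implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIMER_SLOTS 2 /* assumption: one frame of latency is enough */
static_assert(TIMER_SLOTS >= 2, "need a slot to write and a slot to read");

typedef struct {
    uint64_t frame; /* number of frames completed so far */
} TimerRing;

/* Slot whose query object wraps this frame's draws, e.g.
 * glBeginQuery(GL_TIME_ELAPSED, ids[slot]) ... glEndQuery(GL_TIME_ELAPSED). */
static unsigned timer_write_slot(const TimerRing *r)
{
    return (unsigned)(r->frame % TIMER_SLOTS);
}

/* Oldest slot still in flight, issued TIMER_SLOTS-1 frames ago. Returns
 * false during warm-up, before any older query exists. The caller should
 * still check GL_QUERY_RESULT_AVAILABLE before fetching GL_QUERY_RESULT
 * with glGetQueryObjectui64v, and skip the sample if it is not ready. */
static bool timer_read_slot(const TimerRing *r, unsigned *slot)
{
    if (r->frame < TIMER_SLOTS - 1)
        return false;
    *slot = (unsigned)((r->frame + 1) % TIMER_SLOTS);
    return true;
}

/* Call once per frame after both the write and the read are done. */
static void timer_end_frame(TimerRing *r)
{
    r->frame++;
}
```

      With two slots, frame k reads the query issued in frame k-1, so
      the poll no longer targets a query begun in the same frame.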

Plan amendments captured during execution:
- Task 2: parallel Texture2DArray upload path (replacing the original
  "switch globally" framing that would've broken 4 legacy consumers)
- Task 3+4: parallel bindless cache dictionaries (avoiding the GLSL
  type mismatch from sampling a Texture2D handle via sampler2DArray)
- Task 5: preserved mesh_instanced.frag's full SceneLighting UBO + 8
  lights + fog + lightning flash + per-channel clamp
- Task 9: BatchDataPublic Pack=8 (required for safe MemoryMarshal.Cast)

Plan archived at:
  docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md
Spec at:
  docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md
Perf baseline at:
  docs/plans/2026-05-08-phase-n5-perf-baseline.md
Memory at:
  ~/.claude/.../memory/project_phase_n5_state.md

Files changed: 6 added, 6 modified, 2 deleted. 19 tasks shipped across
~40 commits including amendments + fixups + reviews.

N.6 follow-ups: retire InstancedMeshRenderer entirely; GPU timer query
double-buffering; persistent-mapped buffers if profiling shows residual
glBufferData uploads to be a hot spot; possible WB atlas adoption for memory
savings on shared content; possible GPU-side culling via compute pre-pass;
per-instance highlight (selection blink) for retail-faithful click feedback
(field reserved in mesh_modern.vert's InstanceData struct).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>