From 55ecec683f74cc287c872f830eb1e477bfccaafd Mon Sep 17 00:00:00 2001
From: Erik
Date: Fri, 8 May 2026 21:14:50 +0200
Subject: [PATCH] =?UTF-8?q?phase(N.5):=20SHIP=20=E2=80=94=20modern=20rende?=
 =?UTF-8?q?ring=20path=20on=20N.4=20dispatcher?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bindless textures + glMultiDrawElementsIndirect on top of N.4's grouped
pipeline.

Per-frame entity rendering: 3 SSBO uploads (instance matrices @
binding=0, batch data @ binding=1, indirect commands) + 2 indirect
calls (opaque + transparent). Total ~12-15 GL calls per frame for
entity rendering, regardless of scene complexity.

Acceptance gates (spec §8.3):
- [x] Visual identity to N.4: Task 10 USER GATE PASS (Holtburg
      courtyard) + Task 14 USER GATE PASS (general roaming, no
      regressions seen)
- [x] CPU dispatcher time ≤ 70% of N.4: measured 1.23 ms/frame median
      at Holtburg courtyard (1662 groups, ~810 fps); estimated N.4 hot
      path ≥2.5 ms/frame, so at most ~49% of N.4; comfortably under
      threshold
- [x] drawsIssued ≤ 5 per pass (CPU GL calls): exactly 2 indirect
      calls per frame regardless of scene size
- [x] All tests green: 71/71 in
      FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless
- [x] ACDREAM_USE_WB_FOUNDATION=0 still works: InstancedMeshRenderer
      escape hatch preserved (its own shader path, untouched)
- [ ] GPU rendering time within ±10% of N.4: DEFERRED to N.6.
      GL_TIME_ELAPSED query polling never reports the result as
      available (avail == 1) within the same frame; needs
      double-buffering. CPU is the load-bearing metric.
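
Illustration of the "2 indirect calls (opaque + transparent)" layout
described above: a minimal sketch of how both passes could share one
indirect command buffer. This is standalone Python, not the shipped
code. Group, build_indirect_buffer, and the use of baseInstance as a
running instance offset are assumptions made for the example; only the
20-byte DrawElementsIndirectCommand layout comes from the OpenGL
specification.

```python
import struct
from dataclasses import dataclass

# One DrawElementsIndirectCommand as laid out by OpenGL:
# count, instanceCount, firstIndex (uint32), baseVertex (int32),
# baseInstance (uint32). "<" disables padding, giving 20 bytes.
CMD = struct.Struct("<IIIiI")


@dataclass
class Group:  # hypothetical stand-in for one grouped draw
    index_count: int
    first_index: int
    base_vertex: int
    instance_count: int
    transparent: bool


def build_indirect_buffer(groups):
    """Pack every group into one buffer, opaque first, then transparent.

    Returns (buffer bytes, opaque draw count, transparent draw count).
    Assumption: baseInstance carries the running instance offset so a
    shader could find each draw's rows in a shared instance buffer.
    """
    # Stable sort on a bool key keeps submission order within each pass.
    ordered = sorted(groups, key=lambda g: g.transparent)
    out = bytearray()
    base_instance = 0
    for g in ordered:
        out += CMD.pack(g.index_count, g.instance_count, g.first_index,
                        g.base_vertex, base_instance)
        base_instance += g.instance_count
    n_transparent = sum(g.transparent for g in groups)
    return bytes(out), len(groups) - n_transparent, n_transparent


groups = [
    Group(36, 0, 0, 4, False),
    Group(12, 36, 24, 2, True),
    Group(60, 48, 32, 10, False),
]
buf, n_opaque, n_transparent = build_indirect_buffer(groups)
# Byte offset at which the transparent pass starts reading commands.
transparent_offset = n_opaque * CMD.size
```

With this layout the opaque pass would issue one
glMultiDrawElementsIndirect with drawcount = n_opaque at byte offset 0,
and the transparent pass one with drawcount = n_transparent at
transparent_offset, both with stride 0 (tightly packed), so the call
count stays at 2 however many groups are packed.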

Plan amendments captured during execution:
- Task 2: parallel Texture2DArray upload path (replacing the original
  "switch globally" framing that would've broken 4 legacy consumers)
- Task 3+4: parallel bindless cache dictionaries (avoiding the GLSL
  type mismatch from sampling a Texture2D handle via sampler2DArray)
- Task 5: preserved mesh_instanced.frag's full SceneLighting UBO +
  8 lights + fog + lightning flash + per-channel clamp
- Task 9: BatchDataPublic Pack=8 (required for safe
  MemoryMarshal.Cast)

Plan archived at: docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md
Spec at: docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md
Perf baseline at: docs/plans/2026-05-08-phase-n5-perf-baseline.md
Memory at: ~/.claude/.../memory/project_phase_n5_state.md

Files changed: 6 added, 6 modified, 2 deleted.
19 tasks shipped across ~40 commits including amendments + fixups +
reviews.

N.6 follow-ups:
- retire InstancedMeshRenderer entirely
- GPU timer query double-buffering
- persistent-mapped buffers if profiling shows the residual
  glBufferData hot spot
- possible WB atlas adoption for memory savings on shared content
- possible GPU-side culling via compute pre-pass
- per-instance highlight (selection blink) for retail-faithful click
  feedback (field reserved in mesh_modern.vert's InstanceData struct)

Co-Authored-By: Claude Opus 4.7 (1M context)