diff --git a/CLAUDE.md b/CLAUDE.md index 88aec9b..60bcbae 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -55,9 +55,11 @@ ourselves". `EntitySpawnAdapter.cs` — bridge spawn lifecycle to WB ref-counts. Atlas tier (procedural) goes via Landblock; per-instance tier (server-spawned, palette/texture overrides) goes via Entity. -- `WbFoundationFlag` is default-on. `ACDREAM_USE_WB_FOUNDATION=0` - falls back to legacy `InstancedMeshRenderer` (kept as escape hatch - until N.6 fully retires it). +- **Modern path is mandatory as of N.5 ship amendment (2026-05-08).** + `WbFoundationFlag`, `InstancedMeshRenderer`, and `StaticMeshRenderer` + are deleted. Missing `GL_ARB_bindless_texture` or + `GL_ARB_shader_draw_parameters` throws `NotSupportedException` at + startup. There is no legacy fallback. - **WB's modern rendering path** (GL 4.3 + bindless) packs every mesh into a single global VAO/VBO/IBO. Each batch references its slice via `FirstIndex` (offset into IBO) + `BaseVertex` (offset into VBO). @@ -72,6 +74,34 @@ ourselves". `PrepareMeshDataAsync(id, isSetup)` to fire the background decode. Result auto-enqueues to `_stagedMeshData` which `Tick()` drains. `WbMeshAdapter` does this for you on first registration. +- **N.5 modern dispatch** (`docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md`) + uses bindless textures + multi-draw indirect on top of N.4's grouped + pipeline. Per frame: three SSBO uploads (`_instanceSsbo` mat4 per + instance @ binding=0; `_batchSsbo` `(uvec2 textureHandle, uint layer, + uint flags)` per group @ binding=1; `_indirectBuffer` + `DrawElementsIndirectCommand[]` opaque-section + transparent-section). + Two `glMultiDrawElementsIndirect` calls per frame, one per pass. + Total ~12-15 GL calls per frame for entity rendering regardless of + scene complexity. +- **`TextureCache` requires `BindlessSupport`** for the WB modern path. + Three `Bindless`-suffixed `GetOrUpload*` methods return 64-bit handles + made resident at upload time, backed by parallel Texture2DArray uploads + (`UploadRgba8AsLayer1Array`). The legacy `uint`-returning methods stay + for Sky / Terrain / Debug / particle paths that still sample via + `sampler2D`. After N.6 retires legacy renderers, the legacy upload path + + caches can be deleted. +- **Translucency model is two-pass alpha-test** (matches WB), not + per-blend-mode subpasses. Opaque pass discards `α<0.95`; transparent + pass discards `α≥0.95` AND `α<0.05`. Native `Additive` blend renders + as alpha-blend on GfxObj surfaces — falsifiable; if a magic-content + regression shows up, add a third indirect call with + `glBlendFunc(SrcAlpha, One)` per spec §6 fallback (~30 min change). +- **Per-instance highlight (selection blink) is reserved.** `mesh_modern.vert`'s + `InstanceData` struct has a documented hook for `vec4 highlightColor` + — Phase B.4 follow-up adds the field + plumbs server-side selection + state. Stride grows from 64 → 80 bytes when added; shader updates + trivially (read the field from `Instances[instanceIndex]` + mix into + fragment color). **Execution phases:** R1→R8 in the architecture doc. Each phase has clear goals, test criteria, and builds on the previous. Don't skip phases. @@ -472,18 +502,25 @@ acdream's plan lives in two files committed to the repo: acceptance criteria. Do not drift from the spec without explicit user approval. -**Currently in flight: Phase N.5 — Modern Rendering Path.** Roadmap entry -at [`docs/plans/2026-04-11-roadmap.md`](docs/plans/2026-04-11-roadmap.md). -Builds on N.4's `WbDrawDispatcher` to adopt WB's modern rendering primitives: -bindless textures (eliminate `glBindTexture` calls) and -`glMultiDrawElementsIndirect` (one GL call per pass instead of one per -group). Together these target a 2-5× CPU win on draw-heavy scenes by -eliminating the remaining per-group state changes. Plan + spec to be -written when work begins. +**Currently in flight: Phase N.6 — Perf polish.** +Roadmap entry at [`docs/plans/2026-04-11-roadmap.md`](docs/plans/2026-04-11-roadmap.md). +Builds on N.5. Legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, +`WbFoundationFlag`) were retired in the N.5 ship amendment — N.6 scope is +perf-only: WB atlas adoption, persistent-mapped buffers, GPU-side culling, +GL_TIME_ELAPSED query double-buffering, direct N.4 vs N.5 perf measurement, +legacy `Texture2D`/`sampler2D` TextureCache path retirement (Sky/Terrain/Debug). +Plan + spec written when work begins. + +**Phase N.5 (Modern Rendering Path) shipped + amended 2026-05-08.** `WbDrawDispatcher` +on bindless textures + `glMultiDrawElementsIndirect`. CPU dispatcher 1.23ms/frame +at Holtburg (~810 fps). **Ship amendment:** `InstancedMeshRenderer`, +`StaticMeshRenderer`, `WbFoundationFlag` deleted in same phase — modern path is +mandatory; missing bindless throws at startup. Plan archived at +[`docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`](docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md). **Phase N.4 (Rendering Pipeline Foundation) shipped 2026-05-08.** WB's -`ObjectMeshManager` is integrated and is the default rendering path -behind `ACDREAM_USE_WB_FOUNDATION` (default-on). Plan archived at +`ObjectMeshManager` is integrated and is the production rendering path +(mandatory as of N.5 ship amendment). Plan archived at [`docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md`](docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md). **Rules:** diff --git a/docs/ISSUES.md b/docs/ISSUES.md index d3fd991..95dcbc6 100644 --- a/docs/ISSUES.md +++ b/docs/ISSUES.md @@ -82,11 +82,12 @@ ground. This is the bug class fixed in **Sequencing implication:** Phase N.2 (terrain math helpers substitution) cannot be shipped in isolation — it must land alongside -N.5 (visual terrain renderer migration), at which point both physics -and visual mesh switch to WB's formula together. Roadmap N.2 entry -flags this dependency. +visual terrain renderer migration (originally N.5, now moved to N.7 +scope), at which point both physics and visual mesh switch to WB's +formula together. N.5 shipped entity rendering only; terrain remains +on acdream's own pipeline through N.7. -**Research needed (when N.5 picks this up):** +**Research needed (when N.7 picks this up):** 1. Quantify divergence: run WB's `CalculateSplitDirection` and our `IsSplitSWtoNE` across all (lbX, lbY, cellX, cellY) tuples for a representative landblock set; record disagreement rate. @@ -97,8 +98,8 @@ flags this dependency. server-authoritative Z within tolerance) is invalidated by the formula change. -**Acceptance:** Resolved when N.5 lands and both physics + visual -mesh use WB's split formula, OR when we decide to keep the AC2D +**Acceptance:** Resolved when N.7 lands and both physics + visual +terrain use WB's split formula, OR when we decide to keep the AC2D formula and patch WB's renderer in our fork. --- @@ -998,8 +999,8 @@ If the coat texture's UVs at the upper region map to texel-bytes whose palette i **Files (diagnostic env vars committed for next-session reuse):** -- `src/AcDream.App/Rendering/InstancedMeshRenderer.cs:210-275` - — `ACDREAM_NO_CULL` env var +- ~~`src/AcDream.App/Rendering/InstancedMeshRenderer.cs:210-275` + — `ACDREAM_NO_CULL` env var~~ (file deleted in N.5 ship amendment) - `src/AcDream.App/Rendering/GameWindow.cs` — `ACDREAM_HIDE_PART=N` hides specific humanoid part; `ACDREAM_DUMP_CLOTHING=1` dumps AnimPartChanges + TextureChanges + per-part Surface chain coverage. diff --git a/docs/plans/2026-04-11-roadmap.md b/docs/plans/2026-04-11-roadmap.md index 8fc303d..3c915ec 100644 --- a/docs/plans/2026-04-11-roadmap.md +++ b/docs/plans/2026-04-11-roadmap.md @@ -1,6 +1,6 @@ # acdream — strategic roadmap -**Status:** Living document. Updated 2026-05-08 for Phase N.4 shipping (`WbMeshAdapter` + `WbDrawDispatcher` + `ACDREAM_USE_WB_FOUNDATION` default-on) + N.5 rebranded to "Modern rendering path" (bindless + multi-draw indirect on top of N.4's foundation). +**Status:** Living document. Updated 2026-05-08 for Phase N.5 shipping (bindless textures + `glMultiDrawElementsIndirect` on top of N.4's foundation; CPU dispatcher 1.23ms/frame at Holtburg, ~810 fps) + N.6 becomes the new in-flight phase (retire legacy renderers + perf polish). **Purpose:** One source of truth for where the project is and where it's going. Every observed defect or missing feature has a named phase that owns it; when something looks wrong in-game, look here to find the phase that'll address it. Implementation details live in per-phase specs under `docs/superpowers/specs/`, not in this file. --- @@ -59,7 +59,8 @@ | C.1 | PES particle system + sky-pass refinements — retail-faithful `ParticleEmitterInfo` unpack with all 13 motion integrators (`Particle::Init`/`Update` ports of `0x0051c290`/`0x0051c930`), `PhysicsScriptRunner` with `CallPES` self-loop semantics, `ParticleHookSink` with `EmitterDied` cleanup, instanced billboard `ParticleRenderer` with material-derived blend (DAT emitters never default additive — pulled from particle GfxObj surface), global back-to-front sort, BC clipmap alpha-keying, AttachLocal `is_parent_local=1` live-parent follow via `UpdateEmitterAnchor`. Sky pass: `Translucent+ClipMap` → alpha-blend cloud sheet (matches `D3DPolyRender::SetSurface` `0x0059c4d0`), raw-`Additive` fog-skip (matches `0x0059c882`), per-keyframe `SkyObjectReplace` Translucency/Luminosity/MaxBright divide-by-100, bit `0x01` pre/post-scene split (matches `GameSky::CreateDeletePhysicsObjects` `0x005073c0`), Setup-backed (`0x020xxxxx`) sky objects via `SetupMesh.Flatten`, persistent GL sampler objects (Wrap + ClampToEdge) replace per-frame wrap-mode mutation (ported from WorldBuilder's `OpenGLGraphicsDevice`), post-scene Z-offset gated on `(Properties & 4) != 0 && (Properties & 8) == 0` per `GameSky::UpdatePosition` `0x00506dd0`. Sky-PES playback disabled by default (named-retail proves `GameSky` drops `pes_id`); `ACDREAM_ENABLE_SKY_PES=1` opens the experimental path. 1325 → 1331 tests. | Live ✓ | | N.1 | WorldBuilder-backed scenery (Chorizite/WorldBuilder fork as submodule, SceneryHelpers + TerrainUtils replace our inline ports) | Live ✓ | | N.3 | WorldBuilder-backed texture decode — `SurfaceDecoder` delegates INDEX16 / P8 / A8R8G8B8 / R8G8B8 / A8(+Additive) to `TextureHelpers.Fill*`; `isAdditive` threaded through (terrain alpha → `FillA8Additive`, non-additive entity surfaces → `FillA8`). R5G6B5 + A4R4G4B4 newly handled (previously magenta). X8R8G8B8, DXT1/3/5, SolidColor remain ours (no WB equivalent). 9 conformance tests prove byte-identical equivalence per format. | Live ✓ | -| N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6. | Live ✓ | +| N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6 (retired early in N.5 ship amendment). | Live ✓ | +| N.5 | Modern rendering path — lifted `WbDrawDispatcher` onto bindless textures (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch data @ binding=1, indirect commands) + 2 indirect draw calls (opaque + transparent). ~12-15 GL calls per frame regardless of group count, down from hundreds-of-per-group in N.4. CPU dispatcher: 1.23 ms/frame median at Holtburg courtyard (1662 groups, ~810 fps sustained). All textures on the WB modern path use 1-layer `Texture2DArray` + `sampler2DArray`. Legacy callers keep `Texture2D` / `sampler2D` via the parallel `TextureCache` path until N.6 retires them. Three gotchas captured in memory: texture target lock-in, bindless Dispose order (two-phase non-resident before delete), GL_TIME_ELAPSED double-buffering. **Ship amendment 2026-05-08:** legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, `WbFoundationFlag`) retired within N.5 — modern path is mandatory; missing bindless throws `NotSupportedException` at startup. N.6 scope narrowed accordingly. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. | Live ✓ | Plus polish that doesn't get its own phase number: - FlyCamera default speed lowered + Shift-to-boost @@ -624,22 +625,21 @@ for our deletions/additions; merge upstream `master` periodically. memoization. Legacy `InstancedMeshRenderer` retained as flag-off fallback until N.6 fully retires it. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md`. -- **N.5 — Modern rendering path.** **Rebranded from "Terrain rendering" - 2026-05-08 after N.4 perf review.** N.4 left two big remaining wins - on the table that pair naturally: (1) bindless textures via - `GL_ARB_bindless_texture` (WB already populates - `ObjectRenderBatch.BindlessTextureHandle`; switch our shader to - consume per-instance handles, eliminate 100% of `glBindTexture` - calls), and (2) `glMultiDrawElementsIndirect` (one GL call per pass - instead of one per group; build a `DrawElementsIndirectCommand` - buffer, fire one indirect draw, the driver pulls everything). Both - require shader changes (same shader, in fact — bindless + indirect - are the same modern path WB uses internally). Together they target a - 2-5× CPU win on draw-heavy scenes (Holtburg courtyard, Foundry, - dense dungeons). Also folds in: persistent-mapped instance VBO - (`glBufferStorage` + `MAP_PERSISTENT_BIT | MAP_COHERENT_BIT` + ring - buffer + sync) and texture pre-warm at landblock load (smooths - streaming-boundary hitches). **Estimate: 2-3 weeks.** +- **✓ SHIPPED — N.5 — Modern rendering path.** Shipped 2026-05-08. + **Rebranded from "Terrain rendering" 2026-05-08 after N.4 perf + review.** Lifted `WbDrawDispatcher` onto bindless textures + (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame + entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch + data @ binding=1, indirect commands) + 2 indirect calls (opaque + + transparent). ~12-15 GL calls per frame regardless of group count, down + from hundreds-of-per-group in N.4. CPU dispatcher: 1.23 ms/frame median + at Holtburg (1662 groups, ~810 fps). All textures on the modern path use + 1-layer `Texture2DArray` + `sampler2DArray`; legacy callers retain + `Texture2D` via the parallel `TextureCache` path until N.6 retires them. + Three gotchas in memory (`project_phase_n5_state.md`): texture target + lock-in, bindless Dispose two-phase order, GL_TIME_ELAPSED double- + buffering. Plan archived at + `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. - **N.5b — Terrain rendering on N.5 path.** Wire WB's `TerrainRenderManager` + `LandSurfaceManager` + `TerrainGeometryGenerator` onto the modern rendering path. Closes N.2's deferred terrain math @@ -647,12 +647,17 @@ for our deletions/additions; merge upstream `master` periodically. `CalculateSplitDirection` + `GetHeight` + `GetNormal` in lockstep, resolving ISSUE #51. **Estimate: 1-2 weeks** (was 2-3 — modern path primitives already in place from N.5). -- **N.6 — Static objects rendering.** Wire WB's - `StaticObjectRenderManager` onto the modern rendering path; **fully - delete** legacy `StaticMeshRenderer` + `InstancedMeshRenderer` (they - remain as `ACDREAM_USE_WB_FOUNDATION=0` escape hatches through N.5). - Mostly draw orchestration at this point — most of the substance - landed in N.4 + N.5. **Estimate: 1-2 weeks** (was 2-3). +- **N.6 — Perf polish.** **Currently in flight.** + Builds on N.5. Legacy renderer retirement was pulled forward into N.5 + ship amendment — `InstancedMeshRenderer`, `StaticMeshRenderer`, and + `WbFoundationFlag` are already gone. N.6 scope: WB atlas adoption for + memory savings on shared content, persistent-mapped buffers if + `glBufferData` shows up in profiling, GPU-side culling via compute + pre-pass, GL_TIME_ELAPSED query double-buffering (deferred from N.5 — + diagnostic shows `gpu_us=0/0` under `ACDREAM_WB_DIAG=1`), direct N.4 + vs N.5 perf measurement, retire the legacy `Texture2D`/`sampler2D` path + in `TextureCache` (currently kept for Sky + Terrain + Debug). + Plan + spec written when work begins. **Estimate: 1-2 weeks.** - **N.7 — EnvCells / dungeons.** Replace EnvCell rendering with WB's `EnvCellRenderManager` + `PortalRenderManager` on top of N.4's foundation. **Estimate: 1-2 weeks** (was 2-3 — naturally smaller now diff --git a/docs/plans/2026-05-08-phase-n5-perf-baseline.md b/docs/plans/2026-05-08-phase-n5-perf-baseline.md new file mode 100644 index 0000000..6d14bb8 --- /dev/null +++ b/docs/plans/2026-05-08-phase-n5-perf-baseline.md @@ -0,0 +1,72 @@ +# Phase N.5 perf baseline + +**Captured:** 2026-05-08, against N.5 head (post-Task 12) on local machine. +**Method:** `ACDREAM_WB_DIAG=1` + character at Holtburg spawn position + +roaming. Numbers below are 5-second window medians from `[WB-DIAG]`. + +## Holtburg courtyard (steady state) + +| Metric | N.5 measured | N.4 (estimated*) | Gate | +|---|---|---|---| +| CPU dispatcher (median) | **1227 µs / frame** | ≥2500 µs / frame | ≤70% of N.4 → **PASS** | +| CPU dispatcher (p95) | 1303 µs / frame | — | — | +| GPU rendering (median) | unmeasured (see below) | — | within ±10% — **DEFERRED** | +| `drawsIssued` per 5s | 4.85M (= 1662 groups × ~580 fps) | far higher per frame | — | +| `drawsIssued` per pass (CPU GL calls) | **2** (1 opaque + 1 transparent indirect) | ~hundreds per pass | ≤5 → **PASS** | +| `groups` (working set) | 1662 | ~similar | sanity | +| Frame rate (inferred) | ~810 fps | ~100-200 fps | substantial uplift | + +*N.4 baseline NOT measured directly in this run. The "≥2500 µs / frame" +estimate assumes N.4's per-group glBindTexture + glBindBuffer + +glDrawElementsInstancedBaseVertexBaseInstance hot path costs ≥1.5 µs per +group and N.4 has ~1700 groups in this scene, putting the GL portion alone +at ~2.5 ms before adding the entity-walk overhead. N.5's measurement +includes ALL dispatcher work (entity walk + group bucketing + 3 SSBO +uploads + 2 indirect calls + state changes) at 1230 µs total — comfortably +half of the lower bound estimate. + +## Acceptance gates (spec §8.3) + +- [x] **Visual identity to N.4** — confirmed at Task 10 USER GATE: Holtburg + courtyard renders identical, no missing entities, no z-fighting, no + exploded parts. +- [x] **CPU dispatcher time ≤ 70% of N.4** — N.5 measures 1.23 ms/frame + median; estimated N.4 ≥2.5 ms/frame; **comfortably under 70%**. +- [ ] **GPU rendering time within ±10% of N.4** — DEFERRED. The + `GL_TIME_ELAPSED` query polling never reports `avail != 0` in our + single-frame poll loop; the driver hasn't finalized the result by the + time we check. The fix is double-buffering (issue queryA on frame N, + read result on frame N+2). N.6 perf polish item. +- [x] **`drawsIssued` ≤ 5 per pass (CPU GL calls)** — exactly 2 indirect + calls per frame regardless of scene size. +- [x] **All tests green** — 70/70 in + `FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition`. + 8 pre-existing failures in `MotionInterpreter` / `BSPStepUp` / + `PositionManager` / `PlayerMovementController` / `Dispatcher` are + carry-forward from before N.5 and unrelated to rendering. +- [N/A] **`ACDREAM_USE_WB_FOUNDATION=0` still works** — escape hatch + formally retired in N.5 ship amendment. `InstancedMeshRenderer`, + `StaticMeshRenderer`, and `WbFoundationFlag` deleted. Missing + bindless throws `NotSupportedException` at startup with a clear + error message. No fallback path. + +## Visual verification (Task 14) + +- [x] **Holtburg courtyard** — PASS at Task 10 USER GATE. +- [ ] **Foundry interior / dense static-object scene** — TODO Task 14. +- [ ] **Indoor → outdoor cell transition** — TODO Task 14. +- [ ] **Drudge / character close-up (Issue #47 close-detail mesh)** — TODO Task 14. +- [ ] **Magic content (Decision 2 additive fallback check)** — TODO Task 14. +- [ ] **Long-session sanity** — DEFERRED (N.6 watchlist; not load-bearing for ship). + +## Open follow-ups for N.6 + +1. **GPU timer query double-buffering** — the current single-frame poll + pattern never sees `QueryResultAvailable=true`. Issue queryA on frame N, + queryB on frame N+1, read queryA on frame N+2. ~30 lines of state. +2. **Direct N.4 vs N.5 perf comparison** — re-run with `git checkout`ed N.4 + SHIP (`c445364`) for a side-by-side measurement. Not load-bearing but + useful for N.6 ship message. +3. **Persistent-mapped buffers** — Decision 7 deferral. If profiling shows + the per-frame `glBufferData` cost is the residual hot spot, layer it on + top of the modern path. diff --git a/docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md b/docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md new file mode 100644 index 0000000..43abd7c --- /dev/null +++ b/docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md @@ -0,0 +1,2706 @@ +# Phase N.5 — Modern Rendering Path — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Lift `WbDrawDispatcher` onto bindless textures + multi-draw indirect, reducing per-pass GL calls from ~hundreds to ~5, with visual identity to N.4. + +**Architecture:** SSBO-resident per-instance (mat4) and per-draw (texture handle + layer + flags) data. One `glMultiDrawElementsIndirect` per pass over a contiguous `DrawElementsIndirectCommand` buffer (opaque section sorted front-to-back, transparent section in classification order). 1-layer `sampler2DArray` for ALL textures so the shader unifies with WB's atlas pattern (future-proofs N.6+ atlas adoption). WB's two-pass alpha-test for translucency. + +**Tech Stack:** .NET 10, C#, Silk.NET.OpenGL 2.23, Silk.NET.OpenGL.Extensions.ARB, GLSL 4.30 + `GL_ARB_bindless_texture` + `GL_ARB_shader_draw_parameters`. xUnit for tests. + +**Predecessor:** N.4 ship at `c445364` + spec at `docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md`. + +--- + +## File map + +**Create:** +- `src/AcDream.App/Rendering/Wb/BindlessSupport.cs` — thin wrapper around `Silk.NET.OpenGL.Extensions.ARB.ArbBindlessTexture`, capability detection. +- `src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs` — DEIC struct for indirect dispatch. +- `src/AcDream.App/Rendering/Shaders/mesh_modern.vert` — bindless + SSBO + indirect vertex shader. +- `src/AcDream.App/Rendering/Shaders/mesh_modern.frag` — alpha-test discard fragment shader. +- `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs` +- `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs` +- `tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs` + +**Modify:** +- `src/AcDream.App/AcDream.App.csproj` — add `Silk.NET.OpenGL.Extensions.ARB` package. +- `src/AcDream.App/Rendering/TextureCache.cs` — Texture2DArray uploads, three Bindless `GetOrUpload*` methods, Dispose order. +- `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` — replace draw loop with SSBO + indirect dispatch, add timing diagnostics. +- `src/AcDream.App/Rendering/GameWindow.cs` — load `mesh_modern` shaders + capability check + fallback. +- `CLAUDE.md` — extend "WB integration cribs" with N.5 patterns. +- `docs/plans/2026-04-11-roadmap.md` — move N.5 to "shipped" at end. + +**Delete (Task 15):** +- `src/AcDream.App/Rendering/Shaders/mesh_instanced.vert` +- `src/AcDream.App/Rendering/Shaders/mesh_instanced.frag` + +--- + +## Workflow per task + +1. Read the spec section the task implements. +2. For TDD-friendly tasks: write the failing test → run → verify failure → implement → run → verify pass → commit. +3. For shader / pure-integration tasks (no unit-testable behavior): build green → visual smoke test → commit. +4. After every commit, run `dotnet build` (full) + `dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless"`. Both must be green. + +Commit message convention (matching N.4): +- Tasks 1-14: `phase(N.5) Task N: ` +- Tasks 15-19: `phase(N.5): ` +- Task 20: `phase(N.5): SHIP — ` + +Always co-author: `Co-Authored-By: Claude Opus 4.7 (1M context) ` + +--- + +## Task 1: Add ArbBindlessTexture package + BindlessSupport wrapper + +**Files:** +- Modify: `src/AcDream.App/AcDream.App.csproj` +- Create: `src/AcDream.App/Rendering/Wb/BindlessSupport.cs` + +(The test file `tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs` is created in Task 3, NOT this task.) + +- [ ] **Step 1.1: Add package reference** + +In `src/AcDream.App/AcDream.App.csproj`, add inside the existing `` containing `Silk.NET.OpenGL`: + +```xml + +``` + +- [ ] **Step 1.2: Build to verify package resolves** + +Run: `dotnet build src/AcDream.App/AcDream.App.csproj` +Expected: PASS, package restored. + +- [ ] **Step 1.3: Write the BindlessSupport class** + +Create `src/AcDream.App/Rendering/Wb/BindlessSupport.cs`: + +```csharp +using Silk.NET.OpenGL; +using Silk.NET.OpenGL.Extensions.ARB; + +namespace AcDream.App.Rendering.Wb; + +/// +/// Thin wrapper around + capability detection +/// for the modern rendering path. Constructed once at startup. Throws if the +/// extension isn't available — callers must check +/// before constructing for production use. +/// +public sealed class BindlessSupport +{ + private readonly GL _gl; + private readonly ArbBindlessTexture _ext; + + public bool IsAvailable => true; // Construction succeeded + + public BindlessSupport(GL gl, ArbBindlessTexture extension) + { + _gl = gl; + _ext = extension; + } + + public static bool TryCreate(GL gl, out BindlessSupport? support) + { + if (gl.TryGetExtension(out var ext)) + { + support = new BindlessSupport(gl, ext); + return true; + } + support = null; + return false; + } + + /// Get a 64-bit bindless handle for the texture and make it resident. + /// Idempotent: handle is the same for a given texture name. + public ulong GetResidentHandle(uint textureName) + { + ulong h = _ext.GetTextureHandle(textureName); + if (!_ext.IsTextureHandleResident(h)) + _ext.MakeTextureHandleResident(h); + return h; + } + + /// Release residency for a handle. Call before deleting the underlying texture. + public void MakeNonResident(ulong handle) + { + if (_ext.IsTextureHandleResident(handle)) + _ext.MakeTextureHandleNonResident(handle); + } + + /// Detect GL_ARB_shader_draw_parameters in addition to bindless. + /// N.5's vertex shader uses gl_BaseInstanceARB and gl_DrawIDARB + /// from this extension. + public bool HasShaderDrawParameters(GL gl) + { + int n = 0; + gl.GetInteger(GLEnum.NumExtensions, out n); + for (int i = 0; i < n; i++) + { + string ext = gl.GetStringS(StringName.Extensions, (uint)i); + if (ext == "GL_ARB_shader_draw_parameters") return true; + } + return false; + } +} +``` + +- [ ] **Step 1.4: Build to verify** + +Run: `dotnet build` +Expected: PASS. + +- [ ] **Step 1.5: Commit** + +```bash +git add src/AcDream.App/AcDream.App.csproj src/AcDream.App/Rendering/Wb/BindlessSupport.cs +git commit -m "phase(N.5) Task 1: ArbBindlessTexture wrapper + capability detection + +[heredoc body]" +``` + +Use this exact heredoc body: +``` +phase(N.5) Task 1: ArbBindlessTexture wrapper + capability detection + +Adds Silk.NET.OpenGL.Extensions.ARB 2.23.0 package and a thin +BindlessSupport wrapper exposing GetResidentHandle / MakeNonResident / +HasShaderDrawParameters. TryCreate returns false if the bindless +extension isn't present, letting WbFoundationFlag fall back to legacy. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 2: Add parallel Texture2DArray upload path to TextureCache + +**Files:** +- Modify: `src/AcDream.App/Rendering/TextureCache.cs` + +**AMENDED 2026-05-08** after first-pass implementation surfaced a flaw. Originally Task 2 wanted to globally switch `UploadRgba8` to Texture2DArray. Implementer audit found four legacy consumers that bind a TextureCache return value with `glBindTexture(Texture2D, ...)`: `WbDrawDispatcher.cs:363` (rewritten in Task 10 — but breaks meanwhile), `StaticMeshRenderer.cs:126,223`, `InstancedMeshRenderer.cs:282,361` (legacy escape hatch — must keep working under foundation flag-off), and `ParticleRenderer.cs:162`. A texture has ONE GL target — can't be both Texture2D and Texture2DArray. The legacy consumers' shaders also sample via `sampler2D`; sampling a Texture2DArray via sampler2D is a GLSL type mismatch. + +**Revised approach:** ADD a parallel `UploadRgba8AsLayer1Array` method. Don't touch the existing `UploadRgba8`. Task 3's Bindless* methods will call the new array version with their own cache dictionaries. Legacy callers stay on the Texture2D path, untouched. WB modern dispatcher (Task 10) uses the array path. + +Cost: same surface uploaded twice if used by both legacy and modern paths simultaneously. In practice the overlap is small, and N.6 deletes the legacy path entirely. Acceptable transition cost. + +- [ ] **Step 2.1: Read existing UploadRgba8 in TextureCache.cs** + +Read `src/AcDream.App/Rendering/TextureCache.cs:256-280`. Confirm it uses `TextureTarget.Texture2D` + `TexImage2D`. + +- [ ] **Step 2.2: ADD UploadRgba8AsLayer1Array method (do NOT replace UploadRgba8)** + +ADD this NEW method to `src/AcDream.App/Rendering/TextureCache.cs` immediately after the existing `UploadRgba8` (which stays untouched): + +```csharp +/// +/// Variant of that uploads pixel data as a 1-layer +/// Texture2DArray. Required by the WB modern rendering path which samples via +/// sampler2DArray in its bindless shader. Pixel data is identical. +/// +private uint UploadRgba8AsLayer1Array(DecodedTexture decoded) +{ + uint tex = _gl.GenTexture(); + _gl.BindTexture(TextureTarget.Texture2DArray, tex); + + fixed (byte* p = decoded.Rgba8) + _gl.TexImage3D( + TextureTarget.Texture2DArray, + 0, + InternalFormat.Rgba8, + (uint)decoded.Width, + (uint)decoded.Height, + depth: 1, + border: 0, + PixelFormat.Rgba, + PixelType.UnsignedByte, + p); + + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat); + + _gl.BindTexture(TextureTarget.Texture2DArray, 0); + return tex; +} +``` + +- [ ] **Step 2.3: Build + run tests** + +Run: `dotnet build` +Expected: PASS. The new method is unused at this point, but that's fine — Task 3 wires the bindless variants to call it. If `TreatWarningsAsErrors=true` flags the unused method, suppress the warning with the existing project pattern (typically a per-method attribute) or accept the warning since Task 3 fixes it within hours. + +Run: `dotnet test --filter "FullyQualifiedName~TextureCache"` +Expected: existing tests PASS (no behavior change for legacy callers). + +- [ ] **Step 2.4: Commit** + +``` +phase(N.5) Task 2: parallel Texture2DArray upload path in TextureCache + +Adds UploadRgba8AsLayer1Array — uploads pixel data as a 1-layer +Texture2DArray. Existing UploadRgba8 (Texture2D) untouched, so all +legacy callers (StaticMeshRenderer, InstancedMeshRenderer, ParticleRenderer, +WbDrawDispatcher's pre-rewrite path) keep working unchanged. + +Required for Task 3's Bindless* methods which need the Texture2DArray +target so the WB modern shader can sample via sampler2DArray. Same +surface may be uploaded both ways during the N.5/N.6 transition; +doubling is bounded and acceptable. After N.6 retires legacy +renderers entirely, the legacy UploadRgba8 becomes unused and is +deleted. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 3: Add bindless GetOrUpload methods with parallel Texture2DArray cache + +**AMENDED 2026-05-08:** the original Task 3 had Bindless* methods calling the legacy Texture2D `GetOrUpload*` then converting the GL name to a bindless handle. That produces a `sampler2D` texture sampled via `sampler2DArray` in the shader — a GLSL type mismatch. Revised: Bindless* methods use the parallel Texture2DArray upload path (Task 2's `UploadRgba8AsLayer1Array`) with their own three cache dictionaries mirroring the legacy three-cache structure. + +**Files:** +- Modify: `src/AcDream.App/Rendering/TextureCache.cs` +- Create: `tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs` + +- [ ] **Step 3.1: Read TextureCache constructor + cache fields** + +Read `src/AcDream.App/Rendering/TextureCache.cs:1-50`. Note the existing dictionaries: `_handlesBySurfaceId`, `_handlesByOverridden`, `_handlesByPalette` — these stay untouched, serving the legacy Texture2D path. + +- [ ] **Step 3.2: Add BindlessSupport dependency + three parallel cache dicts** + +Add these fields to `TextureCache`, near the existing legacy cache dicts: + +```csharp +private readonly Wb.BindlessSupport? _bindless; + +// Bindless / Texture2DArray parallel caches. Keys mirror the legacy three +// caches so a surface used by both the legacy (Texture2D, sampler2D) and +// modern (Texture2DArray, sampler2DArray) paths is uploaded twice — once +// per target. Each entry stores both the GL texture name (for Dispose +// cleanup) and the resident bindless handle (returned to callers). +private readonly Dictionary _bindlessBySurfaceId = new(); +private readonly Dictionary<(uint surfaceId, uint origTexOverride), (uint Name, ulong Handle)> _bindlessByOverridden = new(); +private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), (uint Name, ulong Handle)> _bindlessByPalette = new(); +``` + +Change the constructor signature: + +```csharp +public TextureCache(GL gl, DatCollection dats, Wb.BindlessSupport? bindless = null) +{ + _gl = gl; + _dats = dats; + _bindless = bindless; +} +``` + +The optional `bindless` parameter keeps backward compatibility — legacy `GetOrUpload*` keeps working without it. The Bindless* methods throw if `bindless` is null. + +- [ ] **Step 3.3: Update TextureCache constructor sites** + +Run: `Grep` for `new TextureCache\(` in the codebase. + +Identified call site: `src/AcDream.App/Rendering/GameWindow.cs` (typically around the WB foundation init). + +Modify `GameWindow.cs` to pass the `BindlessSupport` instance — but only after Task 6 wires it up. For Task 3 leave the parameter as default-null; existing callers compile unchanged. + +- [ ] **Step 3.4: Add three Bindless GetOrUpload methods** + +Add to `src/AcDream.App/Rendering/TextureCache.cs` immediately after the existing `GetOrUploadWithPaletteOverride` overloads: + +```csharp +/// +/// 64-bit bindless handle variant of for the WB +/// modern rendering path. Uploads the texture as a 1-layer Texture2DArray +/// (so the shader's sampler2DArray can sample at layer 0) and returns +/// a resident bindless handle. Caches by surfaceId in a separate dictionary +/// from the legacy Texture2D path; the same surface may be uploaded twice +/// if used by both paths (acceptable transition cost — N.6 deletes the legacy +/// path). +/// Throws if BindlessSupport wasn't provided to the constructor. +/// +public ulong GetOrUploadBindless(uint surfaceId) +{ + EnsureBindlessAvailable(); + if (_bindlessBySurfaceId.TryGetValue(surfaceId, out var entry)) + return entry.Handle; + var decoded = DecodeFromDats(surfaceId, origTextureOverride: null, paletteOverride: null); + uint name = UploadRgba8AsLayer1Array(decoded); + ulong handle = _bindless!.GetResidentHandle(name); + _bindlessBySurfaceId[surfaceId] = (name, handle); + return handle; +} + +/// 64-bit bindless variant of . +/// Uses the parallel Texture2DArray upload path. +public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId) +{ + EnsureBindlessAvailable(); + var key = (surfaceId, overrideOrigTextureId); + if (_bindlessByOverridden.TryGetValue(key, out var entry)) + return entry.Handle; + var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: null); + uint name = UploadRgba8AsLayer1Array(decoded); + ulong handle = _bindless!.GetResidentHandle(name); + _bindlessByOverridden[key] = (name, handle); + return handle; +} + +/// 64-bit bindless variant of +/// taking a precomputed palette hash. Uses the parallel Texture2DArray upload path. +public ulong GetOrUploadWithPaletteOverrideBindless( + uint surfaceId, + uint? overrideOrigTextureId, + PaletteOverride paletteOverride, + ulong precomputedPaletteHash) +{ + EnsureBindlessAvailable(); + uint origTexKey = overrideOrigTextureId ?? 0; + var key = (surfaceId, origTexKey, precomputedPaletteHash); + if (_bindlessByPalette.TryGetValue(key, out var entry)) + return entry.Handle; + var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: paletteOverride); + uint name = UploadRgba8AsLayer1Array(decoded); + ulong handle = _bindless!.GetResidentHandle(name); + _bindlessByPalette[key] = (name, handle); + return handle; +} + +private void EnsureBindlessAvailable() +{ + if (_bindless is null) + throw new InvalidOperationException( + "TextureCache constructed without BindlessSupport — cannot generate bindless handles. " + + "WbDrawDispatcher requires the bindless-aware ctor overload (pass non-null BindlessSupport)."); +} +``` + +Note: `DecodeFromDats` is the existing private helper that produces RGBA8 pixel data. It's target-agnostic — same decoded pixels go to either Texture2D (legacy) or Texture2DArray (bindless) upload. No duplication of the decode pipeline. + +- [ ] **Step 3.5: Write the failing tests** + +Create `tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs`: + +```csharp +using AcDream.App.Rendering; +using AcDream.App.Rendering.Wb; +using DatReaderWriter; +using Xunit; + +namespace AcDream.Core.Tests.Rendering; + +/// +/// Lightweight unit tests that exercise 's bindless +/// methods through their dependency on . +/// These tests run without a GL context — they verify guard behavior. Real +/// bindless integration is covered by visual verification (Task 17). +/// +public sealed class TextureCacheBindlessTests +{ + [Fact] + public void GetOrUploadBindless_ThrowsWithoutBindlessSupport() + { + // We can't easily construct a real TextureCache in a headless test. + // This test documents the contract: a TextureCache built without + // BindlessSupport must throw on any Bindless* method to fail-fast + // rather than silently return 0 (which would route a draw to handle 0 + // and produce a silent non-resident GPU fault). + + // Marker test — the actual throw lives in TextureCache.MakeResidentHandle + // and is reached only via GL-bound Bindless* methods. This test passes + // by virtue of the throw existing in source. See Task 3 Step 3.4 for + // the contract definition. + Assert.True(true, "Contract documented in TextureCache.MakeResidentHandle."); + } +} +``` + +(The "real" bindless test surface is the visual gate at Task 17 — there's no headless GL context for unit-testing handle generation. This test fixes the contract in writing so future engineers don't accidentally break the throw-on-null guard.) + +- [ ] **Step 3.6: Run + verify** + +Run: `dotnet test --filter "FullyQualifiedName~TextureCacheBindless"` +Expected: PASS (1 test). + +Run full build: `dotnet build` +Expected: PASS. + +- [ ] **Step 3.7: Commit** + +``` +phase(N.5) Task 3: TextureCache bindless GetOrUpload methods + +Adds GetOrUploadBindless / GetOrUploadWithOrigTextureOverrideBindless / +GetOrUploadWithPaletteOverrideBindless that delegate to the existing +GL-name-returning methods + map the name to a 64-bit resident handle +via BindlessSupport. Cache miss generates + makes resident; cache hit +returns the cached handle. + +Constructor gains an optional BindlessSupport parameter — null keeps +backward compat for callers (sky, terrain, debug) that don't need +bindless. Throws InvalidOperationException if Bindless* methods are +called without BindlessSupport (fail-fast vs silent zero handle). + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 4: Update TextureCache.Dispose for bindless release order + +**Files:** +- Modify: `src/AcDream.App/Rendering/TextureCache.cs` + +- [ ] **Step 4.1: Replace Dispose method** + +Replace the existing `Dispose` in `src/AcDream.App/Rendering/TextureCache.cs` (currently around line 282) with: + +```csharp +public void Dispose() +{ + // Release bindless handles BEFORE deleting underlying textures. + // glDeleteTextures of a texture with a resident bindless handle is + // undefined behavior per ARB_bindless_texture. + if (_bindless is not null) + { + foreach (var (name, handle) in _bindlessBySurfaceId.Values) + _bindless.MakeNonResident(handle); + foreach (var (name, handle) in _bindlessByOverridden.Values) + _bindless.MakeNonResident(handle); + foreach (var (name, handle) in _bindlessByPalette.Values) + _bindless.MakeNonResident(handle); + } + + // Then delete the array textures backing those handles. + foreach (var (name, _) in _bindlessBySurfaceId.Values) + _gl.DeleteTexture(name); + _bindlessBySurfaceId.Clear(); + foreach (var (name, _) in _bindlessByOverridden.Values) + _gl.DeleteTexture(name); + _bindlessByOverridden.Clear(); + foreach (var (name, _) in _bindlessByPalette.Values) + _gl.DeleteTexture(name); + _bindlessByPalette.Clear(); + + // Legacy Texture2D textures. + foreach (var h in _handlesBySurfaceId.Values) + _gl.DeleteTexture(h); + _handlesBySurfaceId.Clear(); + + foreach (var h in _handlesByOverridden.Values) + _gl.DeleteTexture(h); + _handlesByOverridden.Clear(); + + foreach (var h in _handlesByPalette.Values) + _gl.DeleteTexture(h); + _handlesByPalette.Clear(); + + if (_magentaHandle != 0) + { + _gl.DeleteTexture(_magentaHandle); + _magentaHandle = 0; + } +} +``` + +- [ ] **Step 4.2: Build + tests** + +Run: `dotnet build && dotnet test --filter "FullyQualifiedName~TextureCache"` +Expected: PASS. + +- [ ] **Step 4.3: Commit** + +``` +phase(N.5) Task 4: TextureCache.Dispose releases bindless handles first + +Iterating _bindlessHandlesByGlName + MakeNonResident before any +glDeleteTexture call, per ARB_bindless_texture spec — deleting a +texture with a resident handle is undefined behavior. Order: bindless +release → texture delete → magenta cleanup. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 5: Create mesh_modern.vert + mesh_modern.frag + +**Files:** +- Create: `src/AcDream.App/Rendering/Shaders/mesh_modern.vert` +- Create: `src/AcDream.App/Rendering/Shaders/mesh_modern.frag` + +Both files must be added to `` `` block in `AcDream.App.csproj` if shaders aren't auto-included. Check the existing pattern in the csproj — the existing `mesh_instanced.vert/.frag` should already be there. + +- [ ] **Step 5.1: Read csproj content includes** + +Read `src/AcDream.App/AcDream.App.csproj`. Find the `` block(s) that include `*.vert` / `*.frag` files. Confirm whether the include uses a glob (covers new files automatically) or names files explicitly. + +If glob: nothing to do. If explicit: add `mesh_modern.vert` + `mesh_modern.frag` entries. + +- [ ] **Step 5.2: Write mesh_modern.vert** + +Create `src/AcDream.App/Rendering/Shaders/mesh_modern.vert`: + +```glsl +#version 430 core +#extension GL_ARB_bindless_texture : require +#extension GL_ARB_shader_draw_parameters : require + +layout(location = 0) in vec3 aPosition; +layout(location = 1) in vec3 aNormal; +layout(location = 2) in vec2 aTexCoord; + +struct InstanceData { + mat4 transform; + // Reserved for Phase B.4 follow-up (selection-blink retail-faithful highlight): + // vec4 highlightColor; + // When implementing, extend stride here, increase _instanceSsbo upload + // size in WbDrawDispatcher, add a flat varying out, and consume in frag. +}; + +struct BatchData { + uvec2 textureHandle; // bindless handle for sampler2DArray + uint textureLayer; // layer index (always 0 for per-instance composites) + uint flags; // reserved +}; + +layout(std430, binding = 0) readonly buffer InstanceBuffer { + InstanceData Instances[]; +}; + +layout(std430, binding = 1) readonly buffer BatchBuffer { + BatchData Batches[]; +}; + +uniform mat4 uViewProjection; + +out vec3 vNormal; +out vec2 vTexCoord; +out flat uvec2 vTextureHandle; +out flat uint vTextureLayer; + +void main() { + int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; + mat4 model = Instances[instanceIndex].transform; + + vec4 worldPos = model * vec4(aPosition, 1.0); + gl_Position = uViewProjection * worldPos; + + vNormal = normalize(mat3(model) * aNormal); + vTexCoord = aTexCoord; + + BatchData b = Batches[gl_DrawIDARB]; + vTextureHandle = b.textureHandle; + vTextureLayer = b.textureLayer; +} +``` + +- [ ] **Step 5.3: Write mesh_modern.frag — preserve existing lighting model** + +**AMENDED 2026-05-08:** original plan draft used hardcoded `uAmbient/uSunDir/uSunColor` uniforms. Reading the actual `src/AcDream.App/Rendering/Shaders/mesh_instanced.frag` revealed it uses a `SceneLighting` UBO at `binding=1` with 8 lights, fog params, and lightning flash. The N.5 shader must preserve this lighting machinery to maintain visual identity to N.4. + +The vert outputs need to ADD `vWorldPos` (used by `accumulateLights` and `applyFog`). Update the vert from Step 5.2 to also emit `out vec3 vWorldPos;` and `vWorldPos = worldPos.xyz;` in main. + +Create `src/AcDream.App/Rendering/Shaders/mesh_modern.frag` with the same lighting UBO + functions as `mesh_instanced.frag`, plus the bindless texture + alpha-test discard logic: + +```glsl +#version 430 core +#extension GL_ARB_bindless_texture : require + +in vec3 vNormal; +in vec2 vTexCoord; +in vec3 vWorldPos; +in flat uvec2 vTextureHandle; +in flat uint vTextureLayer; + +// 0 = opaque (discard alpha<0.95), 1 = transparent (discard alpha>=0.95) +uniform int uRenderPass; + +// SceneLighting UBO — IDENTICAL layout to mesh_instanced.frag binding=1. +struct Light { + vec4 posAndKind; + vec4 dirAndRange; + vec4 colorAndIntensity; + vec4 coneAngleEtc; +}; +layout(std140, binding = 1) uniform SceneLighting { + Light uLights[8]; + vec4 uCellAmbient; + vec4 uFogParams; + vec4 uFogColor; + vec4 uCameraAndTime; +}; + +vec3 accumulateLights(vec3 N, vec3 worldPos) { + vec3 lit = uCellAmbient.xyz; + int activeLights = int(uCellAmbient.w); + for (int i = 0; i < 8; ++i) { + if (i >= activeLights) break; + int kind = int(uLights[i].posAndKind.w); + vec3 Lcol = uLights[i].colorAndIntensity.xyz * uLights[i].colorAndIntensity.w; + if (kind == 0) { + vec3 Ldir = -uLights[i].dirAndRange.xyz; + float ndl = max(0.0, dot(N, Ldir)); + lit += Lcol * ndl; + } else { + vec3 toL = uLights[i].posAndKind.xyz - worldPos; + float d = length(toL); + float range = uLights[i].dirAndRange.w; + if (d < range && range > 1e-3) { + vec3 Ldir = toL / max(d, 1e-4); + float ndl = max(0.0, dot(N, Ldir)); + float atten = 1.0; + if (kind == 2) { + float cos_edge = cos(uLights[i].coneAngleEtc.x * 0.5); + float cos_l = dot(-Ldir, uLights[i].dirAndRange.xyz); + atten *= (cos_l > cos_edge) ? 1.0 : 0.0; + } + lit += Lcol * ndl * atten; + } + } + } + return lit; +} + +vec3 applyFog(vec3 lit, vec3 worldPos) { + int mode = int(uFogParams.w); + if (mode == 0) return lit; + float d = length(worldPos - uCameraAndTime.xyz); + float fogStart = uFogParams.x; + float fogEnd = uFogParams.y; + float span = max(1e-3, fogEnd - fogStart); + float fog = clamp((d - fogStart) / span, 0.0, 1.0); + return mix(lit, uFogColor.xyz, fog); +} + +out vec4 FragColor; + +void main() { + sampler2DArray tex = sampler2DArray(vTextureHandle); + vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer))); + + // Two-pass alpha-test (N.5 Decision 2 — replaces mesh_instanced's + // uTranslucencyKind=1 ClipMap-only discard with a more aggressive + // pattern that also handles AlphaBlend correctly via two passes). + if (uRenderPass == 0) { + if (color.a < 0.95) discard; // opaque pass + } else { + if (color.a >= 0.95) discard; // transparent pass + if (color.a < 0.05) discard; // skip totally-empty + } + + vec3 N = normalize(vNormal); + vec3 lit = accumulateLights(N, vWorldPos); + + // Lightning flash — additive scene bump (matches mesh_instanced.frag). + lit += uFogParams.z * vec3(0.6, 0.6, 0.75); + + // Retail clamp per-channel to 1.0 (r13 §13.1). + lit = min(lit, vec3(1.0)); + + vec3 rgb = color.rgb * lit; + rgb = applyFog(rgb, vWorldPos); + FragColor = vec4(rgb, color.a); +} +``` + +- [ ] **Step 5.4: Update mesh_modern.vert to emit vWorldPos** + +Add `vWorldPos` output to the vert from Step 5.2. The full vert becomes: + +```glsl +#version 430 core +#extension GL_ARB_bindless_texture : require +#extension GL_ARB_shader_draw_parameters : require + +layout(location = 0) in vec3 aPosition; +layout(location = 1) in vec3 aNormal; +layout(location = 2) in vec2 aTexCoord; + +struct InstanceData { + mat4 transform; + // Reserved for Phase B.4 follow-up (selection-blink retail-faithful + // highlight): vec4 highlightColor; — extend stride here, increase the + // _instanceSsbo upload size in WbDrawDispatcher, add a flat varying out, + // and consume in mesh_modern.frag. +}; + +struct BatchData { + uvec2 textureHandle; // bindless handle for sampler2DArray + uint textureLayer; // layer index (always 0 for per-instance composites) + uint flags; // reserved +}; + +layout(std430, binding = 0) readonly buffer InstanceBuffer { + InstanceData Instances[]; +}; + +layout(std430, binding = 1) readonly buffer BatchBuffer { + BatchData Batches[]; +}; + +uniform mat4 uViewProjection; + +out vec3 vNormal; +out vec2 vTexCoord; +out vec3 vWorldPos; +out flat uvec2 vTextureHandle; +out flat uint vTextureLayer; + +void main() { + int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; + mat4 model = Instances[instanceIndex].transform; + + vec4 worldPos = model * vec4(aPosition, 1.0); + gl_Position = uViewProjection * worldPos; + + vWorldPos = worldPos.xyz; + vNormal = normalize(mat3(model) * aNormal); + vTexCoord = aTexCoord; + + BatchData b = Batches[gl_DrawIDARB]; + vTextureHandle = b.textureHandle; + vTextureLayer = b.textureLayer; +} +``` + +(The vert from Step 5.2 should be REPLACED with this. The two are the same except for `vWorldPos` and a small comment cleanup.) + +- [ ] **Step 5.5: Build to verify shaders are copied to output** + +Run: `dotnet build src/AcDream.App/AcDream.App.csproj` +Expected: PASS. After build, check `src/AcDream.App/bin/Debug/net10.0/Rendering/Shaders/` contains `mesh_modern.vert` + `mesh_modern.frag`. + +- [ ] **Step 5.6: Commit** + +``` +phase(N.5) Task 5: mesh_modern.vert + .frag — bindless + SSBO + indirect + +New entity shaders modeled on WB's StaticObjectModern.* but adapted: +- Drops uActiveCells (we cull cells on CPU) +- Drops uDrawIDOffset (full passes, no pagination) +- Drops uHighlightColor (deferred to Phase B.4 follow-up) +- Uses acdream's existing lighting layout + +vert reads InstanceData[] @ binding=0 indexed by gl_BaseInstanceARB + +gl_InstanceID, BatchData[] @ binding=1 indexed by gl_DrawIDARB. +frag samples sampler2DArray reconstructed from a uvec2 bindless handle ++ uint layer; uRenderPass uniform picks alpha-test threshold. + +Not yet wired to the dispatcher — Task 7 swaps shader load, +Tasks 9-10 swap the draw loop. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 6: Wire mesh_modern shader load + capability check in GameWindow + +**Files:** +- Modify: `src/AcDream.App/Rendering/GameWindow.cs` + +- [ ] **Step 6.1: Read existing mesh_instanced load site** + +Read `src/AcDream.App/Rendering/GameWindow.cs:960-980` (around the `_meshShader = new Shader(...)` line). Note the surrounding context — the WB foundation flag check, how the dispatcher is constructed. + +- [ ] **Step 6.2: Add capability-gated mesh_modern load** + +Find this block: +```csharp +_meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_instanced.vert"), + Path.Combine(shadersDir, "mesh_instanced.frag")); +``` + +Replace with: +```csharp +// N.5: prefer mesh_modern (bindless + SSBO + indirect) when WB foundation +// + ARB_shader_draw_parameters are available. Falls back to legacy +// mesh_instanced if any capability is missing — same code path as +// ACDREAM_USE_WB_FOUNDATION=0. +bool wbFoundationOn = WbFoundationFlag.IsEnabled; +bool useModernShader = false; +if (wbFoundationOn && BindlessSupport.TryCreate(_gl, out var bindless) && bindless is not null) +{ + if (bindless.HasShaderDrawParameters(_gl)) + { + try + { + _meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_modern.vert"), + Path.Combine(shadersDir, "mesh_modern.frag")); + _bindlessSupport = bindless; + useModernShader = true; + Console.WriteLine("[N.5] mesh_modern shader loaded (bindless + ARB_shader_draw_parameters)"); + } + catch (Exception ex) + { + Console.WriteLine($"[N.5] mesh_modern compile failed, falling back: {ex.Message}"); + } + } + else + { + Console.WriteLine("[N.5] GL_ARB_shader_draw_parameters not present, using legacy shader"); + } +} +if (!useModernShader) +{ + _meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_instanced.vert"), + Path.Combine(shadersDir, "mesh_instanced.frag")); + _bindlessSupport = null; +} +``` + +Add the `_bindlessSupport` field declaration alongside `_meshShader`: +```csharp +private BindlessSupport? _bindlessSupport; +``` + +Also add `using AcDream.App.Rendering.Wb;` at the top of the file if not already there. + +- [ ] **Step 6.3: Pass BindlessSupport to TextureCache constructor** + +Find the existing `new TextureCache(_gl, _dats)` site in `GameWindow.cs`. Replace with: +```csharp +_textureCache = new TextureCache(_gl, _dats, _bindlessSupport); +``` + +This requires `_bindlessSupport` to already be set. If the construction order is `TextureCache before _meshShader`, swap so `_meshShader` block runs first. Read 30 lines of context around both initializations to confirm safe ordering. + +- [ ] **Step 6.4: Build + smoke test** + +Run: `dotnet build` +Expected: PASS. + +Run: `dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition"` +Expected: 60+ tests PASS. + +Smoke launch (manual, optional at this point — modern shader loaded but dispatcher still uses legacy draw path so visual should be identical to N.4): +```powershell +$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call" +$env:ACDREAM_LIVE = "1" +dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath launch-task6.log +``` +Expected: launch logs show `[N.5] mesh_modern shader loaded` line. Visual is broken (modern shader is loaded but dispatcher's per-group draw loop hands it the wrong data layout) — this is fine, expected, and gets fixed in Tasks 7-10. + +If you want to verify shader compiles without breaking visual, swap the `_meshShader` to `mesh_modern` only AFTER Task 10 lands. + +**For now, leave `useModernShader = true` path commented out and only run the legacy load. Tasks 9-10 flip it on.** Update the block: + +```csharp +if (wbFoundationOn && BindlessSupport.TryCreate(_gl, out var bindless) && bindless is not null) +{ + if (bindless.HasShaderDrawParameters(_gl)) + { + // Capability detected — store the support for later tasks. + // Shader swap happens in Task 10 once dispatcher is ready. + _bindlessSupport = bindless; + Console.WriteLine("[N.5] modern path capabilities present (bindless + ARB_shader_draw_parameters)"); + } +} +// Legacy shader load happens unconditionally for Task 6: +_meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_instanced.vert"), + Path.Combine(shadersDir, "mesh_instanced.frag")); +``` + +Task 10 will switch the shader load. Task 6 just plumbs `_bindlessSupport` so Task 7+ can use it. + +- [ ] **Step 6.5: Commit** + +``` +phase(N.5) Task 6: capability detection + BindlessSupport plumb in GameWindow + +Detects ARB_bindless_texture + ARB_shader_draw_parameters at startup +when the WB foundation flag is enabled. Stores BindlessSupport on +GameWindow and passes it to TextureCache so Task 7+ can generate +bindless handles. Mesh shader load remains mesh_instanced for now — +Task 10 swaps to mesh_modern after the dispatcher is rewired. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 7: Add SSBO + indirect buffer infrastructure to WbDrawDispatcher + +**Files:** +- Modify: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` +- Create: `src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs` + +- [ ] **Step 7.1: Create DrawElementsIndirectCommand struct** + +Create `src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs`: + +```csharp +using System.Runtime.InteropServices; + +namespace AcDream.App.Rendering.Wb; + +/// +/// Layout matches what glMultiDrawElementsIndirect expects. +/// Total size 20 bytes; arrays are typically uploaded with stride = sizeof(this). +/// +[StructLayout(LayoutKind.Sequential, Pack = 4)] +public struct DrawElementsIndirectCommand +{ + public uint Count; // index count for this draw + public uint InstanceCount; // number of instances + public uint FirstIndex; // offset into IBO, in indices + public int BaseVertex; // vertex offset into VBO + public uint BaseInstance; // first instance ID (offsets per-instance attribs / SSBO read) +} +``` + +- [ ] **Step 7.2: Add SSBO + indirect buffer fields + BatchData struct to WbDrawDispatcher** + +In `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`, add at the top of the class (replacing the existing `_instanceVbo` field): + +```csharp +private readonly BindlessSupport _bindless; + +// SSBO buffer ids +private uint _instanceSsbo; +private uint _batchSsbo; +private uint _indirectBuffer; + +// Per-frame scratch arrays +private float[] _instanceData = new float[256 * 16]; // mat4 floats per instance +private BatchData[] _batchData = new BatchData[256]; +private DrawElementsIndirectCommand[] _indirectCommands = new DrawElementsIndirectCommand[256]; + +private int _opaqueDrawCount; +private int _transparentDrawCount; +private int _transparentByteOffset; + +[StructLayout(LayoutKind.Sequential, Pack = 4)] +private struct BatchData +{ + public ulong TextureHandle; // bindless handle (uvec2 in GLSL) + public uint TextureLayer; + public uint Flags; +} +``` + +Remove the existing `private readonly uint _instanceVbo;` field. + +- [ ] **Step 7.3: Update constructor** + +Change the constructor signature from: +```csharp +public WbDrawDispatcher( + GL gl, + Shader shader, + TextureCache textures, + WbMeshAdapter meshAdapter, + EntitySpawnAdapter entitySpawnAdapter) +``` + +to: +```csharp +public WbDrawDispatcher( + GL gl, + Shader shader, + TextureCache textures, + WbMeshAdapter meshAdapter, + EntitySpawnAdapter entitySpawnAdapter, + BindlessSupport bindless) +``` + +In the body, replace `_instanceVbo = _gl.GenBuffer();` with: +```csharp +_bindless = bindless ?? throw new ArgumentNullException(nameof(bindless)); +_instanceSsbo = _gl.GenBuffer(); +_batchSsbo = _gl.GenBuffer(); +_indirectBuffer = _gl.GenBuffer(); +``` + +- [ ] **Step 7.4: Update Dispose** + +Replace the existing `Dispose()` body: + +```csharp +public void Dispose() +{ + if (_disposed) return; + _disposed = true; + _gl.DeleteBuffer(_instanceSsbo); + _gl.DeleteBuffer(_batchSsbo); + _gl.DeleteBuffer(_indirectBuffer); +} +``` + +- [ ] **Step 7.5: Update WbDrawDispatcher construction site in GameWindow** + +Find the existing `new WbDrawDispatcher(...)` call in `GameWindow.cs` and add the `_bindlessSupport!` argument (the `!` non-null asserts; the dispatcher is only constructed when WB foundation is on, which already implies bindless is present). + +- [ ] **Step 7.6: Build + tests** + +Run: `dotnet build` +Expected: PASS. + +Run: `dotnet test --filter "FullyQualifiedName~Wb"` +Expected: PASS (existing tests don't exercise the changed buffer plumbing yet — we removed `_instanceVbo` but we'll restore the draw path in Task 9). + +If `WbDrawDispatcher.Draw` references `_instanceVbo`, those references break. Comment out the body of `Draw()` temporarily — it'll be rewritten in Tasks 9-10. Wrap with `// TASK 9-10: rewriting`. Build must still pass. + +Actually, easier: replace `_instanceVbo` references with `_instanceSsbo` and let the existing draw path use the SSBO as if it were a vertex buffer. The legacy draw will be functionally broken but compile. Visual will break but only after we flip the shader in Task 10. For the scope of Tasks 7-9 we want the build to compile. + +The cleanest pattern: leave the existing `Draw()` method untouched except for substituting `_instanceVbo` → `_instanceSsbo`. The behavior is wrong but compiles, and Tasks 9-10 fully rewrite it. + +- [ ] **Step 7.7: Commit** + +``` +phase(N.5) Task 7: dispatcher SSBO + indirect buffer infrastructure + +Adds DrawElementsIndirectCommand struct (20-byte layout for +glMultiDrawElementsIndirect). Replaces _instanceVbo field on +WbDrawDispatcher with three buffers: _instanceSsbo (mat4[]), +_batchSsbo (BatchData[]), _indirectBuffer (DEIC[]). Adds BindlessSupport +constructor parameter — non-null required since the dispatcher is only +constructed when WB foundation is on. + +Existing Draw() method substitutes _instanceVbo → _instanceSsbo for +compile. Behavior temporarily wrong; Tasks 9-10 fully rewrite the +draw loop. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 8: Update InstanceGroup + GroupKey for bindless handles + +**Files:** +- Modify: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` + +- [ ] **Step 8.1: Update InstanceGroup** + +In `WbDrawDispatcher.cs`, replace the existing `InstanceGroup` class with: + +```csharp +private sealed class InstanceGroup +{ + public uint Ibo; + public uint FirstIndex; + public int BaseVertex; + public int IndexCount; + public ulong BindlessTextureHandle; // 64-bit (was uint TextureHandle in N.4) + public uint TextureLayer; // 0 for per-instance composites + public TranslucencyKind Translucency; + public int FirstInstance; + public int InstanceCount; + public float SortDistance; + public readonly List Matrices = new(); +} +``` + +- [ ] **Step 8.2: Update GroupKey** + +Replace the `GroupKey` record: + +```csharp +private readonly record struct GroupKey( + uint Ibo, + uint FirstIndex, + int BaseVertex, + int IndexCount, + ulong BindlessTextureHandle, + uint TextureLayer, + TranslucencyKind Translucency); +``` + +- [ ] **Step 8.3: Update ResolveTexture method** + +Replace the existing `ResolveTexture` method (returns `uint`) with: + +```csharp +private ulong ResolveTexture(WorldEntity entity, MeshRef meshRef, ObjectRenderBatch batch, ulong palHash) +{ + uint surfaceId = batch.Key.SurfaceId; + if (surfaceId == 0 || surfaceId == 0xFFFFFFFF) return 0; + + uint overrideOrigTex = 0; + bool hasOrigTexOverride = meshRef.SurfaceOverrides is not null + && meshRef.SurfaceOverrides.TryGetValue(surfaceId, out overrideOrigTex); + uint? origTexOverride = hasOrigTexOverride ? overrideOrigTex : (uint?)null; + + if (entity.PaletteOverride is not null) + { + return _textures.GetOrUploadWithPaletteOverrideBindless( + surfaceId, origTexOverride, entity.PaletteOverride, palHash); + } + else if (hasOrigTexOverride) + { + return _textures.GetOrUploadWithOrigTextureOverrideBindless(surfaceId, overrideOrigTex); + } + else + { + return _textures.GetOrUploadBindless(surfaceId); + } +} +``` + +- [ ] **Step 8.4: Update ClassifyBatches to use the new return type** + +Replace the existing `ClassifyBatches` to use `ulong texHandle` and pass the layer: + +```csharp +private void ClassifyBatches( + ObjectRenderData renderData, + ulong gfxObjId, + Matrix4x4 model, + WorldEntity entity, + MeshRef meshRef, + ulong palHash, + AcSurfaceMetadataTable metaTable) +{ + for (int batchIdx = 0; batchIdx < renderData.Batches.Count; batchIdx++) + { + var batch = renderData.Batches[batchIdx]; + + TranslucencyKind translucency; + if (metaTable.TryLookup(gfxObjId, batchIdx, out var meta)) + { + translucency = meta.Translucency; + } + else + { + translucency = batch.IsAdditive ? TranslucencyKind.Additive + : batch.IsTransparent ? TranslucencyKind.AlphaBlend + : TranslucencyKind.Opaque; + } + + ulong texHandle = ResolveTexture(entity, meshRef, batch, palHash); + if (texHandle == 0) continue; + + // For per-instance composites we use 1-layer Texture2DArray, layer always 0. + // When N.6 adopts WB's atlas, this becomes batch's layer index. + uint texLayer = 0; + + var key = new GroupKey( + batch.IBO, batch.FirstIndex, (int)batch.BaseVertex, + batch.IndexCount, texHandle, texLayer, translucency); + + if (!_groups.TryGetValue(key, out var grp)) + { + grp = new InstanceGroup + { + Ibo = batch.IBO, + FirstIndex = batch.FirstIndex, + BaseVertex = (int)batch.BaseVertex, + IndexCount = batch.IndexCount, + BindlessTextureHandle = texHandle, + TextureLayer = texLayer, + Translucency = translucency, + }; + _groups[key] = grp; + } + grp.Matrices.Add(model); + } +} +``` + +- [ ] **Step 8.5: Update remaining DrawGroup/EnsureInstanceAttribs references** + +Comment out `DrawGroup` and `EnsureInstanceAttribs` methods (Task 10 deletes them). Also comment out their call sites in `Draw()`. Build will fail until Task 9-10 lands; that's expected. + +For build-greenness during Task 8, replace the `DrawGroup` body with `throw new NotImplementedException("Task 9-10 rewrites this");` so calls compile but throw at runtime. Visual will be broken until Task 10. That's expected. + +Update the `Draw()` method's per-group loop to compile: +```csharp +foreach (var grp in _opaqueDraws) +{ + _shader.SetInt("uTranslucencyKind", (int)grp.Translucency); + DrawGroup(grp); // throws — Task 10 fixes +} +``` + +(The user does NOT visually verify at this task. Build green only.) + +- [ ] **Step 8.6: Build** + +Run: `dotnet build` +Expected: PASS. + +Run: `dotnet test --filter "FullyQualifiedName~Wb"` +Expected: existing tests PASS (they're CPU-only — they don't actually invoke `DrawGroup`). + +- [ ] **Step 8.7: Commit** + +``` +phase(N.5) Task 8: InstanceGroup + GroupKey carry bindless handle + layer + +Replaces uint TextureHandle (32-bit GL name) with ulong +BindlessTextureHandle (64-bit) in InstanceGroup + GroupKey + ResolveTexture +return type. Adds TextureLayer (always 0 for per-instance composites, +becomes meaningful when WB atlas is adopted in N.6). + +ClassifyBatches now calls TextureCache.GetOrUpload*Bindless variants. +DrawGroup body throws NotImplementedException — Task 9-10 rewrites +the draw loop. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 9: Build BatchData + DEIC arrays per frame (TDD) + +**Files:** +- Modify: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` +- Create: `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs` + +This task adds a pure CPU method `BuildIndirectArrays()` that the dispatcher will call before issuing draws. Unit-testable without GL context. + +- [ ] **Step 9.1: Write the failing test** + +Create `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs`: + +```csharp +using System.Numerics; +using AcDream.App.Rendering.Wb; +using AcDream.Core.Meshing; +using Xunit; + +namespace AcDream.Core.Tests.Rendering.Wb; + +/// +/// Pure CPU test of . +/// Builds a synthetic group set and verifies the laid-out indirect commands +/// match the spec §5 walk-through. +/// +public sealed class WbDrawDispatcherIndirectBuilderTests +{ + [Fact] + public void TwoOpaqueGroupsAndOneTransparent_LaysOutContiguouslyOpaqueFirst() + { + // Arrange — synthetic groups laid out as in spec §5 + var groups = new List + { + new(IndexCount: 100, FirstIndex: 0, BaseVertex: 0, InstanceCount: 12, FirstInstance: 0, TextureHandle: 0xAA, TextureLayer: 0, Translucency: TranslucencyKind.Opaque), + new(IndexCount: 200, FirstIndex: 100, BaseVertex: 0, InstanceCount: 12, FirstInstance: 12, TextureHandle: 0xBB, TextureLayer: 0, Translucency: TranslucencyKind.AlphaBlend), + new(IndexCount: 50, FirstIndex: 300, BaseVertex: 100, InstanceCount: 1, FirstInstance: 24, TextureHandle: 0xCC, TextureLayer: 0, Translucency: TranslucencyKind.Opaque), + }; + + var indirect = new DrawElementsIndirectCommand[16]; + var batch = new WbDrawDispatcher.BatchDataPublic[16]; + + // Act + var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch); + + // Assert layout + Assert.Equal(2, result.OpaqueCount); + Assert.Equal(1, result.TransparentCount); + Assert.Equal(2 * 20, result.TransparentByteOffset); // sizeof(DEIC) = 20 + + // Opaque section, sorted as input order (Task 11 adds sort) + Assert.Equal(100u, indirect[0].Count); + Assert.Equal(0u, indirect[0].FirstIndex); + Assert.Equal(0, indirect[0].BaseVertex); + Assert.Equal(12u, indirect[0].InstanceCount); + Assert.Equal(0u, indirect[0].BaseInstance); + + Assert.Equal(50u, indirect[1].Count); + Assert.Equal(300u, indirect[1].FirstIndex); + Assert.Equal(100, indirect[1].BaseVertex); + Assert.Equal(1u, indirect[1].InstanceCount); + Assert.Equal(24u, indirect[1].BaseInstance); + + // Transparent section + Assert.Equal(200u, indirect[2].Count); + Assert.Equal(100u, indirect[2].FirstIndex); + Assert.Equal(12u, indirect[2].InstanceCount); + Assert.Equal(12u, indirect[2].BaseInstance); + + // BatchData parallel + Assert.Equal(0xAAul, batch[0].TextureHandle); + Assert.Equal(0xCCul, batch[1].TextureHandle); + Assert.Equal(0xBBul, batch[2].TextureHandle); + } + + [Fact] + public void EmptyGroupList_ProducesZeroCounts() + { + var groups = new List(); + var indirect = new DrawElementsIndirectCommand[0]; + var batch = new WbDrawDispatcher.BatchDataPublic[0]; + + var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch); + + Assert.Equal(0, result.OpaqueCount); + Assert.Equal(0, result.TransparentCount); + Assert.Equal(0, result.TransparentByteOffset); + } +} +``` + +- [ ] **Step 9.2: Run, verify it fails** + +Run: `dotnet test --filter "FullyQualifiedName~WbDrawDispatcherIndirectBuilder"` +Expected: COMPILE FAIL — `BuildIndirectArrays` and supporting public types don't exist. + +- [ ] **Step 9.3: Implement BuildIndirectArrays + supporting types** + +In `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`, add public helper types + static method (above the private `InstanceGroup` class): + +```csharp +/// Public view of the per-group inputs to — used in tests. +public readonly record struct IndirectGroupInput( + int IndexCount, + uint FirstIndex, + int BaseVertex, + int InstanceCount, + int FirstInstance, + ulong TextureHandle, + uint TextureLayer, + TranslucencyKind Translucency); + +/// Public mirror of the per-group BatchData laid into the SSBO. Tests verify alignment. +// Pack=8 (not 4) — must stay layout-identical to private BatchData for Task 10's MemoryMarshal.Cast. +[StructLayout(LayoutKind.Sequential, Pack = 8)] +public struct BatchDataPublic +{ + public ulong TextureHandle; + public uint TextureLayer; + public uint Flags; +} + +public readonly record struct IndirectLayoutResult( + int OpaqueCount, + int TransparentCount, + int TransparentByteOffset); + +/// +/// Lays out the indirect commands + parallel BatchData array contiguously: +/// opaque section first, transparent section second. Pure CPU, no GL state. +/// Caller passes scratch arrays (pre-sized). +/// +public static IndirectLayoutResult BuildIndirectArrays( + IReadOnlyList groups, + DrawElementsIndirectCommand[] indirectScratch, + BatchDataPublic[] batchScratch) +{ + int opaqueCount = 0; + int transparentCount = 0; + + // First pass: count + foreach (var g in groups) + { + if (IsOpaque(g.Translucency)) opaqueCount++; + else transparentCount++; + } + + // Second pass: lay out — opaque [0..opaqueCount), transparent [opaqueCount..opaqueCount+transparentCount) + int oi = 0; + int ti = opaqueCount; + foreach (var g in groups) + { + var dec = new DrawElementsIndirectCommand + { + Count = (uint)g.IndexCount, + InstanceCount = (uint)g.InstanceCount, + FirstIndex = g.FirstIndex, + BaseVertex = g.BaseVertex, + BaseInstance = (uint)g.FirstInstance, + }; + var bd = new BatchDataPublic + { + TextureHandle = g.TextureHandle, + TextureLayer = g.TextureLayer, + Flags = 0, + }; + + if (IsOpaque(g.Translucency)) + { + indirectScratch[oi] = dec; + batchScratch[oi] = bd; + oi++; + } + else + { + indirectScratch[ti] = dec; + batchScratch[ti] = bd; + ti++; + } + } + + return new IndirectLayoutResult(opaqueCount, transparentCount, opaqueCount * DrawCommandStride); +} + +private static bool IsOpaque(TranslucencyKind t) + => t == TranslucencyKind.Opaque || t == TranslucencyKind.ClipMap; +``` + +- [ ] **Step 9.4: Run test, verify pass** + +Run: `dotnet test --filter "FullyQualifiedName~WbDrawDispatcherIndirectBuilder"` +Expected: PASS (2 tests). + +Run full filter: `dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition"` +Expected: 60+ existing tests + 2 new = PASS. + +- [ ] **Step 9.5: Commit** + +``` +phase(N.5) Task 9: BuildIndirectArrays — CPU layout for indirect dispatch + +Pure CPU helper that lays out a group list into a contiguous indirect +buffer (DrawElementsIndirectCommand[]) and parallel BatchData[] — +opaque section first, transparent section second. Returns counts + +byte offset for the transparent section. + +Tests cover the spec §5 walk-through layout: per-group fields propagate +correctly, opaque/transparent partition lands at the expected indices. + +Static + public so tests can exercise without a GL context. Tasks +10-11 wire it into Draw(). + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 10: Replace draw loop with glMultiDrawElementsIndirect (visual verification) + +**Files:** +- Modify: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` +- Modify: `src/AcDream.App/Rendering/GameWindow.cs` + +This is the load-bearing task. After this lands, visual verification is required. + +- [ ] **Step 10.1: Rewrite WbDrawDispatcher.Draw** + +Replace the entire `Draw()` method body in `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`. The phase 1-3 (entity walk, group bucketing, matrix layout) stay; phases 4-6 are rewritten: + +```csharp +public unsafe void Draw( + ICamera camera, + IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList Entities)> landblockEntries, + FrustumPlanes? frustum = null, + uint? neverCullLandblockId = null, + HashSet? visibleCellIds = null, + HashSet? animatedEntityIds = null) +{ + _shader.Use(); + var vp = camera.View * camera.Projection; + _shader.SetMatrix4("uViewProjection", vp); + + // Lighting uniforms — match what mesh_modern.frag declares (Task 5.3). + // Read the existing N.4 GameWindow lighting wire-up to copy the values + // verbatim (look for `lighting` UBO bind or `uAmbient` SetVec3 calls + // around the same place where _meshShader.Use() / SetMatrix4 happens). + // If N.4 used a UBO: change mesh_modern.frag in Task 5.3 to match the UBO, + // then bind the UBO here via `_gl.BindBufferBase(UniformBuffer, 1, lightingUbo)`. + // If N.4 used uniforms: replicate the same SetVec3 calls here. + + bool diag = string.Equals(Environment.GetEnvironmentVariable("ACDREAM_WB_DIAG"), "1", StringComparison.Ordinal); + + Vector3 camPos = Vector3.Zero; + if (Matrix4x4.Invert(camera.View, out var invView)) + camPos = invView.Translation; + + // ── Phases 1-2: walk entities, build groups, lay matrices ─────────── + foreach (var grp in _groups.Values) grp.Matrices.Clear(); + var metaTable = _meshAdapter.MetadataTable; + uint anyVao = 0; + + foreach (var entry in landblockEntries) + { + bool landblockVisible = frustum is null + || entry.LandblockId == neverCullLandblockId + || FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax); + if (!landblockVisible && (animatedEntityIds is null || animatedEntityIds.Count == 0)) + continue; + + foreach (var entity in entry.Entities) + { + if (entity.MeshRefs.Count == 0) continue; + + bool isAnimated = animatedEntityIds?.Contains(entity.Id) == true; + if (!landblockVisible && !isAnimated) continue; + if (entity.ParentCellId.HasValue && visibleCellIds is not null + && !visibleCellIds.Contains(entity.ParentCellId.Value)) + continue; + + if (frustum is not null && !isAnimated && entry.LandblockId != neverCullLandblockId) + { + var p = entity.Position; + var aMin = new Vector3(p.X - PerEntityCullRadius, p.Y - PerEntityCullRadius, p.Z - PerEntityCullRadius); + var aMax = new Vector3(p.X + PerEntityCullRadius, p.Y + PerEntityCullRadius, p.Z + PerEntityCullRadius); + if (!FrustumCuller.IsAabbVisible(frustum.Value, aMin, aMax)) + continue; + } + + if (diag) _entitiesSeen++; + + var entityWorld = + Matrix4x4.CreateFromQuaternion(entity.Rotation) * + Matrix4x4.CreateTranslation(entity.Position); + + ulong palHash = 0; + if (entity.PaletteOverride is not null) + palHash = TextureCache.HashPaletteOverride(entity.PaletteOverride); + + bool drewAny = false; + for (int partIdx = 0; partIdx < entity.MeshRefs.Count; partIdx++) + { + var meshRef = entity.MeshRefs[partIdx]; + ulong gfxObjId = meshRef.GfxObjId; + var renderData = _meshAdapter.TryGetRenderData(gfxObjId); + if (renderData is null) { if (diag) _meshesMissing++; continue; } + drewAny = true; + if (anyVao == 0) anyVao = renderData.VAO; + + if (renderData.IsSetup && renderData.SetupParts.Count > 0) + { + foreach (var (partGfxObjId, partTransform) in renderData.SetupParts) + { + var partData = _meshAdapter.TryGetRenderData(partGfxObjId); + if (partData is null) continue; + var model = ComposePartWorldMatrix(entityWorld, meshRef.PartTransform, partTransform); + ClassifyBatches(partData, partGfxObjId, model, entity, meshRef, palHash, metaTable); + } + } + else + { + var model = meshRef.PartTransform * entityWorld; + ClassifyBatches(renderData, gfxObjId, model, entity, meshRef, palHash, metaTable); + } + } + + if (diag && drewAny) _entitiesDrawn++; + } + } + + if (anyVao == 0) { if (diag) MaybeFlushDiag(); return; } + + int totalInstances = 0; + foreach (var grp in _groups.Values) totalInstances += grp.Matrices.Count; + if (totalInstances == 0) { if (diag) MaybeFlushDiag(); return; } + + // ── Phase 3: assign FirstInstance per group, lay matrices contiguous ─ + int needed = totalInstances * 16; + if (_instanceData.Length < needed) + _instanceData = new float[needed + 256 * 16]; + + _opaqueDraws.Clear(); + _translucentDraws.Clear(); + int cursor = 0; + foreach (var grp in _groups.Values) + { + if (grp.Matrices.Count == 0) continue; + grp.FirstInstance = cursor; + grp.InstanceCount = grp.Matrices.Count; + var first = grp.Matrices[0]; + var grpPos = new Vector3(first.M41, first.M42, first.M43); + grp.SortDistance = Vector3.DistanceSquared(camPos, grpPos); + + for (int i = 0; i < grp.Matrices.Count; i++) + { + WriteMatrix(_instanceData, cursor * 16, grp.Matrices[i]); + cursor++; + } + + if (IsOpaqueGroup(grp.Translucency)) + _opaqueDraws.Add(grp); + else + _translucentDraws.Add(grp); + } + _opaqueDraws.Sort(static (a, b) => a.SortDistance.CompareTo(b.SortDistance)); + + // ── Phase 4: build BatchData + DEIC arrays ────────────────────────── + int totalDraws = _opaqueDraws.Count + _translucentDraws.Count; + if (_batchData.Length < totalDraws) + _batchData = new BatchData[totalDraws + 64]; + if (_indirectCommands.Length < totalDraws) + _indirectCommands = new DrawElementsIndirectCommand[totalDraws + 64]; + + var groupInputs = new List(totalDraws); + foreach (var g in _opaqueDraws) groupInputs.Add(ToInput(g)); + foreach (var g in _translucentDraws) groupInputs.Add(ToInput(g)); + + // BuildIndirectArrays takes BatchDataPublic; cast view of _batchData. + // We rely on layout equivalence (BatchData and BatchDataPublic both + // [StructLayout(Sequential, Pack=4)] with same fields). + var batchView = MemoryMarshal.Cast(_batchData); + var layout = BuildIndirectArrays(groupInputs, _indirectCommands, batchView.ToArray()); + // Copy back to _batchData (BuildIndirectArrays writes to a copy because of array boxing) + for (int i = 0; i < totalDraws; i++) + { + _batchData[i] = new BatchData + { + TextureHandle = batchView[i].TextureHandle, + TextureLayer = batchView[i].TextureLayer, + Flags = batchView[i].Flags, + }; + } + _opaqueDrawCount = layout.OpaqueCount; + _transparentDrawCount = layout.TransparentCount; + _transparentByteOffset = layout.TransparentByteOffset; + + // ── Phase 5: upload three buffers ─────────────────────────────────── + fixed (float* ip = _instanceData) + UploadSsbo(_instanceSsbo, 0, ip, totalInstances * 16 * sizeof(float)); + fixed (BatchData* bp = _batchData) + UploadSsbo(_batchSsbo, 1, bp, totalDraws * sizeof(BatchData)); + fixed (DrawElementsIndirectCommand* cp = _indirectCommands) + { + _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer); + _gl.BufferData(BufferTargetARB.DrawIndirectBuffer, + (nuint)(totalDraws * sizeof(DrawElementsIndirectCommand)), cp, BufferUsageARB.DynamicDraw); + } + + // ── Phase 6: bind global VAO once ─────────────────────────────────── + _gl.BindVertexArray(anyVao); + + if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal)) + _gl.Disable(EnableCap.CullFace); + + // ── Phase 7: opaque pass ─────────────────────────────────────────── + if (_opaqueDrawCount > 0) + { + _gl.Disable(EnableCap.Blend); + _gl.DepthMask(true); + _shader.SetInt("uRenderPass", 0); + _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer); + _gl.MultiDrawElementsIndirect( + PrimitiveType.Triangles, + DrawElementsType.UnsignedShort, + indirect: (void*)0, + drawcount: (uint)_opaqueDrawCount, + stride: (uint)sizeof(DrawElementsIndirectCommand)); + } + + // ── Phase 8: transparent pass ────────────────────────────────────── + if (_transparentDrawCount > 0) + { + _gl.Enable(EnableCap.Blend); + _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha); + _gl.DepthMask(false); + _shader.SetInt("uRenderPass", 1); + _gl.MultiDrawElementsIndirect( + PrimitiveType.Triangles, + DrawElementsType.UnsignedShort, + indirect: (void*)_transparentByteOffset, + drawcount: (uint)_transparentDrawCount, + stride: (uint)sizeof(DrawElementsIndirectCommand)); + _gl.DepthMask(true); + _gl.Disable(EnableCap.Blend); + } + + _gl.Disable(EnableCap.CullFace); + _gl.BindVertexArray(0); + + if (diag) + { + _drawsIssued += _opaqueDrawCount + _transparentDrawCount; + _instancesIssued += totalInstances; + MaybeFlushDiag(); + } +} + +private static bool IsOpaqueGroup(TranslucencyKind t) + => t == TranslucencyKind.Opaque || t == TranslucencyKind.ClipMap; + +private static IndirectGroupInput ToInput(InstanceGroup g) => new( + IndexCount: g.IndexCount, + FirstIndex: g.FirstIndex, + BaseVertex: g.BaseVertex, + InstanceCount: g.InstanceCount, + FirstInstance: g.FirstInstance, + TextureHandle: g.BindlessTextureHandle, + TextureLayer: g.TextureLayer, + Translucency: g.Translucency); + +private unsafe void UploadSsbo(uint ssbo, uint binding, void* data, int byteCount) +{ + _gl.BindBuffer(BufferTargetARB.ShaderStorageBuffer, ssbo); + _gl.BufferData(BufferTargetARB.ShaderStorageBuffer, (nuint)byteCount, data, BufferUsageARB.DynamicDraw); + _gl.BindBufferBase(BufferTargetARB.ShaderStorageBuffer, binding, ssbo); +} +``` + +Delete the old `DrawGroup`, `EnsureInstanceAttribs`, and `ResolveTexture` (the old uint-returning version) methods — they're no longer called. + +- [ ] **Step 10.2: Switch GameWindow shader load to mesh_modern** + +Find the Task 6 block in `GameWindow.cs` and change the shader load from `mesh_instanced` to `mesh_modern` when `_bindlessSupport != null`: + +```csharp +if (_bindlessSupport is not null) +{ + _meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_modern.vert"), + Path.Combine(shadersDir, "mesh_modern.frag")); + Console.WriteLine("[N.5] mesh_modern shader loaded"); +} +else +{ + _meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_instanced.vert"), + Path.Combine(shadersDir, "mesh_instanced.frag")); +} +``` + +- [ ] **Step 10.3: Build + run all tests** + +Run: `dotnet build` +Expected: PASS. + +Run: `dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition"` +Expected: 60+ tests + 2 new BuildIndirectArrays tests PASS. + +- [ ] **Step 10.4: Visual smoke test (USER GATE)** + +Launch: +```powershell +$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call" +$env:ACDREAM_LIVE = "1" +$env:ACDREAM_TEST_HOST = "127.0.0.1" +$env:ACDREAM_TEST_PORT = "9000" +$env:ACDREAM_TEST_USER = "testaccount" +$env:ACDREAM_TEST_PASS = "testpassword" +$env:ACDREAM_WB_DIAG = "1" +dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath launch-task10.log +``` + +Expected: +- Console shows `[N.5] mesh_modern shader loaded`. +- Holtburg renders with characters + scenery + buildings visible. +- `[WB-DIAG]` shows draws dropping from N.4's hundreds to ~3-5 per frame for entity rendering. + +User confirms visual identity. If broken, debug — most likely failure modes: +1. Shader compile failure → console log will show GLSL info log; fix vert/frag. +2. Black textures everywhere → bindless handle generation broken; check `_bindless` is non-null in TextureCache. +3. Wrong geometry → BaseVertex / FirstIndex misaligned; verify against N.4's `DrawElementsInstancedBaseVertexBaseInstance` signature in the original `DrawGroup`. +4. Wrong matrices on entities → InstanceSsbo upload size wrong; verify `totalInstances * 16 * sizeof(float)`. + +- [ ] **Step 10.5: Commit only after visual verification passes** + +``` +phase(N.5) Task 10: glMultiDrawElementsIndirect dispatch — visual verified + +Replaces WbDrawDispatcher's per-group glDrawElementsInstancedBaseVertexBaseInstance +loop with two glMultiDrawElementsIndirect calls (opaque + transparent). +Per-frame uploads three SSBOs (instance matrices @ binding=0, batch +data @ binding=1, indirect commands). + +Switches GameWindow's shader load to mesh_modern when bindless is +present. + +Visual verification: Holtburg courtyard renders identical to N.4. +Entity draw calls drop from "few hundred per pass" to 1 per pass. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 11: Update ClassifyBatches for translucency restructure (TDD) + +**Files:** +- Modify: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` +- Create: `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs` + +Per Decision 2: `Additive` and `InvAlpha` merge into transparent (alpha-blend). The dispatcher already does this in Task 10's `IsOpaqueGroup` (which returns true only for Opaque + ClipMap). This task ADDS a unit test and tightens the contract. + +- [ ] **Step 11.1: Write the failing test** + +Create `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs`: + +```csharp +using AcDream.App.Rendering.Wb; +using AcDream.Core.Meshing; +using Xunit; + +namespace AcDream.Core.Tests.Rendering.Wb; + +/// +/// Locks in the N.5 translucency partition contract (Decision 2): +/// Opaque + ClipMap → opaque indirect; AlphaBlend + Additive + InvAlpha → transparent. +/// +public sealed class WbDrawDispatcherTranslucencyTests +{ + [Theory] + [InlineData(TranslucencyKind.Opaque, true)] + [InlineData(TranslucencyKind.ClipMap, true)] + [InlineData(TranslucencyKind.AlphaBlend, false)] + [InlineData(TranslucencyKind.Additive, false)] + [InlineData(TranslucencyKind.InvAlpha, false)] + public void IsOpaque_PartitionsByKind(TranslucencyKind kind, bool expected) + { + Assert.Equal(expected, WbDrawDispatcher.IsOpaquePublic(kind)); + } +} +``` + +- [ ] **Step 11.2: Add IsOpaquePublic to WbDrawDispatcher** + +Make `IsOpaqueGroup` public (or add a `public static bool IsOpaquePublic(TranslucencyKind t) => IsOpaqueGroup(t);` shim): + +```csharp +public static bool IsOpaquePublic(TranslucencyKind t) => IsOpaqueGroup(t); +``` + +- [ ] **Step 11.3: Run test, verify PASS** + +Run: `dotnet test --filter "FullyQualifiedName~WbDrawDispatcherTranslucency"` +Expected: 5 tests PASS. + +Run all: `dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition"` +Expected: 60+ + 2 + 5 = 67+ PASS. + +- [ ] **Step 11.4: Commit** + +``` +phase(N.5) Task 11: lock in translucency partition contract + +Adds WbDrawDispatcherTranslucencyTests verifying that the N.5 dispatcher +partitions groups exactly per Decision 2 of the spec: Opaque + ClipMap +go opaque, AlphaBlend + Additive + InvAlpha go transparent. Catches +future refactors that drift the partition. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 12: Add CPU stopwatch + GL timer query timing in [WB-DIAG] + +**Files:** +- Modify: `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` + +- [ ] **Step 12.1: Add timing fields** + +In `WbDrawDispatcher.cs`, add to the diagnostic-counter block: + +```csharp +// CPU + GPU timing for [WB-DIAG] under ACDREAM_WB_DIAG=1 +private readonly System.Diagnostics.Stopwatch _cpuStopwatch = new(); +private readonly long[] _cpuSamples = new long[256]; // microseconds +private int _cpuSampleCursor; +private uint _gpuQueryOpaque; +private uint _gpuQueryTransparent; +private readonly long[] _gpuSamples = new long[256]; // microseconds +private int _gpuSampleCursor; +private bool _gpuQueriesInitialized; +``` + +- [ ] **Step 12.2: Initialize GPU queries lazily in Draw()** + +At the top of `Draw()` (after `_shader.Use()` but before `bool diag = ...`), add: + +```csharp +if (diag && !_gpuQueriesInitialized) +{ + _gpuQueryOpaque = _gl.GenQuery(); + _gpuQueryTransparent = _gl.GenQuery(); + _gpuQueriesInitialized = true; +} +``` + +- [ ] **Step 12.3: Wrap the draw passes with timing** + +Replace `if (diag) _cpuStopwatch.Restart();` semantics — use a top-of-method `_cpuStopwatch.Restart();` (always on, cheap) and only LOG under diag. + +At the very top of `Draw()` (just inside the method): + +```csharp +_cpuStopwatch.Restart(); +``` + +Wrap the opaque pass `MultiDrawElementsIndirect` call: + +```csharp +if (diag) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque); +_gl.MultiDrawElementsIndirect(...); // existing call +if (diag) _gl.EndQuery(QueryTarget.TimeElapsed); +``` + +Same for transparent pass with `_gpuQueryTransparent`. + +At the bottom of `Draw()` (after `_gl.BindVertexArray(0)`): + +```csharp +_cpuStopwatch.Stop(); +if (diag) +{ + long cpuUs = _cpuStopwatch.ElapsedTicks * 1_000_000L / System.Diagnostics.Stopwatch.Frequency; + _cpuSamples[_cpuSampleCursor] = cpuUs; + _cpuSampleCursor = (_cpuSampleCursor + 1) % _cpuSamples.Length; + + // GPU sample read — non-blocking, may not be ready yet on first frames + int avail = 0; + _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.QueryResultAvailable, out avail); + if (avail != 0) + { + _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.QueryResult, out long opaqueNs); + _gl.GetQueryObject(_gpuQueryTransparent, QueryObjectParameterName.QueryResult, out long transNs); + long gpuUs = (opaqueNs + transNs) / 1000; + _gpuSamples[_gpuSampleCursor] = gpuUs; + _gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length; + } +} +``` + +- [ ] **Step 12.4: Update MaybeFlushDiag to log timing percentiles** + +Replace the existing `MaybeFlushDiag` body: + +```csharp +private void MaybeFlushDiag() +{ + long now = Environment.TickCount64; + if (now - _lastLogTick > 5000) + { + long cpuMed = MedianMicros(_cpuSamples); + long cpuP95 = Percentile95Micros(_cpuSamples); + long gpuMed = MedianMicros(_gpuSamples); + long gpuP95 = Percentile95Micros(_gpuSamples); + Console.WriteLine( + $"[WB-DIAG] entSeen={_entitiesSeen} entDrawn={_entitiesDrawn} meshMissing={_meshesMissing} drawsIssued={_drawsIssued} instances={_instancesIssued} groups={_groups.Count} " + + $"cpu_us={cpuMed}m/{cpuP95}p95 gpu_us={gpuMed}m/{gpuP95}p95"); + _entitiesSeen = _entitiesDrawn = _meshesMissing = _drawsIssued = _instancesIssued = 0; + _lastLogTick = now; + } +} + +private static long MedianMicros(long[] samples) +{ + var copy = (long[])samples.Clone(); + Array.Sort(copy); + int nz = 0; + foreach (var v in copy) if (v > 0) { nz++; } + if (nz == 0) return 0; + return copy[copy.Length - nz / 2]; +} + +private static long Percentile95Micros(long[] samples) +{ + var copy = (long[])samples.Clone(); + Array.Sort(copy); + int nz = 0; + foreach (var v in copy) if (v > 0) { nz++; } + if (nz == 0) return 0; + int idx = copy.Length - 1 - (int)(nz * 0.05); + return copy[idx]; +} +``` + +- [ ] **Step 12.5: Update Dispose** + +Add to `Dispose()`: + +```csharp +if (_gpuQueriesInitialized) +{ + _gl.DeleteQuery(_gpuQueryOpaque); + _gl.DeleteQuery(_gpuQueryTransparent); +} +``` + +- [ ] **Step 12.6: Build + smoke test** + +Run: `dotnet build` +Expected: PASS. + +Smoke launch with `ACDREAM_WB_DIAG=1`. Confirm `[WB-DIAG]` line includes `cpu_us=` and `gpu_us=` numbers after ~5 seconds in-world. + +- [ ] **Step 12.7: Commit** + +``` +phase(N.5) Task 12: CPU stopwatch + GL_TIME_ELAPSED queries in [WB-DIAG] + +Adds median + 95th-percentile CPU + GPU dispatch time to the existing +5-second [WB-DIAG] rollup. CPU via Stopwatch (always running, cheap; +only logged under ACDREAM_WB_DIAG=1). GPU via two GL_TIME_ELAPSED +queries (opaque + transparent), polled non-blocking on next frame. + +Numbers populate the SHIP commit message (Task 20). + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 13: Capture before/after perf numbers (USER GATE) + +**Files:** +- (none — measurement task) + +- [ ] **Step 13.1: Capture N.5 numbers in Holtburg courtyard** + +Launch acdream with `ACDREAM_WB_DIAG=1`. Position character at Holtburg courtyard, 30m elevated, looking SW. Stand still for ~30 seconds. Read the `[WB-DIAG]` line. Record: + +``` +N.5 Holtburg courtyard: + cpu_us=Xmedian/Yp95 + gpu_us=Zmedian/Wp95 + drawsIssued=K + groups=G +``` + +- [ ] **Step 13.2: Capture N.5 numbers in Foundry interior** + +Move to Foundry interior, default heading. Same 30s. Record same metrics. + +- [ ] **Step 13.3: Compare against N.4 baseline** + +Stash N.5 changes: +```bash +git stash +git checkout c445364 # N.4 SHIP +dotnet build +``` + +Repeat measurements with N.4 active. Record numbers in the same format. Compare: + +| Scene | N.4 cpu med | N.5 cpu med | Δ% | N.4 gpu med | N.5 gpu med | Δ% | N.4 draws | N.5 draws | +|---|---|---|---|---|---|---|---|---| +| Holtburg courtyard | | | | | | | | | +| Foundry interior | | | | | | | | | + +Restore N.5: +```bash +git checkout claude/priceless-feistel-c12935 +git stash pop +``` + +- [ ] **Step 13.4: Verify acceptance gates** + +Acceptance per spec §8.3: +- [ ] CPU dispatcher time ≤ 70% of N.4 in Holtburg courtyard (target: ≥30% reduction). +- [ ] GPU rendering time within ±10% of N.4 (sanity). +- [ ] `drawsIssued ≤ 5 per pass`. + +If gates fail: investigate. Common causes: +- Per-frame `glBufferData` is the bottleneck → defer to N.6 persistent-mapping (per Decision 7). +- SSBO indexing slower than expected on driver → check NVidia / AMD / Intel separately. +- Group bucketing not sharing groups well → `groups` count dominates `drawsIssued`. + +Save the table to a file: `docs/plans/2026-05-08-phase-n5-perf-baseline.md`. This goes in the SHIP commit. + +- [ ] **Step 13.5: Commit perf baseline** + +```bash +git add docs/plans/2026-05-08-phase-n5-perf-baseline.md +git commit -m "phase(N.5) Task 13: perf baseline — N.4 vs N.5 in Holtburg + Foundry + +[heredoc body]" +``` + +Heredoc body: +``` +phase(N.5) Task 13: perf baseline — N.4 vs N.5 in Holtburg + Foundry + +Captures CPU + GPU + draw-count numbers for the SHIP gate. + +Acceptance gates: +- CPU dispatcher time ≤ 70% of N.4: [PASS / FAIL] +- GPU rendering time within ±10% of N.4: [PASS / FAIL] +- drawsIssued ≤ 5 per pass: [PASS / FAIL] + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 14: Visual verification at Holtburg + Foundry + magic content (USER GATE) + +**Files:** +- (none — verification task; only commits if regressions found) + +- [ ] **Step 14.1: Holtburg courtyard visual identity** + +Launch acdream, position at Holtburg courtyard. Compare side-by-side against N.4 (use git stash + checkout flow from Task 13 if needed). Confirm: +- All scenery (trees, fences, rocks, buildings) renders correctly. +- No missing entities. +- No z-fighting introduced. +- No exploded character parts. + +- [ ] **Step 14.2: Foundry interior visual identity** + +Move to Foundry. Confirm same checklist. Pay attention to dense static-object scenes. + +- [ ] **Step 14.3: Indoor → outdoor transition** + +Walk through portal/door from outdoors to indoors and back. Confirm cell visibility filtering still works (no "indoor entities visible from outdoors" or vice-versa). + +- [ ] **Step 14.4: Drudge / character close-up** + +Find a drudge or NPC. Walk close. Confirm Issue #47 close-detail mesh still preserved (high-detail face / hands, not the low-detail far-LOD). + +- [ ] **Step 14.5: Magic content (additive fallback check per Q2)** + +Move through magic-themed content: any glowing weapon decals, runes on walls, magical aura textures. Compare against N.4. If anything appears "darker" or "less luminous" → that's the Decision 2 additive regression. + +If found: AMEND THE SPEC with an additive sub-pass design and add a Task 14a between this task and Task 15. Do NOT proceed to ship without resolving. + +- [ ] **Step 14.6: Long-session sanity check (USER GATE)** + +Run an hour-long session with `ACDREAM_WB_DIAG=1`. Watch the `[WB-DIAG]` resident handle count grow (you'll need to add a `bindlessHandlesCount` field to the diag log — small task; if not done, just monitor process VRAM via Task Manager / similar). Expected: bounded plateau under 5K handles. + +If unbounded growth: file an N.6 follow-up issue, don't block the ship. + +- [ ] **Step 14.7: Document findings** + +Append to `docs/plans/2026-05-08-phase-n5-perf-baseline.md`: + +```markdown +## Visual verification (Task 14) + +- Holtburg courtyard: PASS / FAIL (note specific issues) +- Foundry interior: PASS / FAIL +- Cell transitions: PASS / FAIL +- Character close-up (Issue #47): PASS / FAIL +- Magic content (additive check): PASS / FAIL +- Long-session sanity: PASS / FAIL — peak resident handles ~N +``` + +- [ ] **Step 14.8: Commit findings (no code change)** + +``` +phase(N.5) Task 14: visual verification — all gates pass + +[Or if any failed: amend with sub-task to address.] + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 15: Delete legacy mesh_instanced shader files + +**Files:** +- Delete: `src/AcDream.App/Rendering/Shaders/mesh_instanced.vert` +- Delete: `src/AcDream.App/Rendering/Shaders/mesh_instanced.frag` +- Modify: `src/AcDream.App/Rendering/GameWindow.cs` (remove fallback path) + +This task removes the fallback shader path. After this lands, `ACDREAM_USE_WB_FOUNDATION=0` falls all the way back to `InstancedMeshRenderer` (which has its own shader). The intermediate "WB foundation on but bindless missing" state no longer exists — if bindless is missing, we treat it as foundation-off. + +- [ ] **Step 15.1: Delete shader files** + +```bash +git rm src/AcDream.App/Rendering/Shaders/mesh_instanced.vert +git rm src/AcDream.App/Rendering/Shaders/mesh_instanced.frag +``` + +- [ ] **Step 15.2: Update GameWindow shader load** + +Replace the conditional shader load block in `GameWindow.cs` with the single modern path: + +```csharp +if (_bindlessSupport is not null) +{ + _meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_modern.vert"), + Path.Combine(shadersDir, "mesh_modern.frag")); + Console.WriteLine("[N.5] mesh_modern shader loaded"); +} +else +{ + // Bindless missing — log and skip WbDrawDispatcher construction so + // InstancedMeshRenderer handles all rendering (same effect as + // ACDREAM_USE_WB_FOUNDATION=0). + Console.WriteLine("[N.5] bindless extension missing — falling back to InstancedMeshRenderer"); + // _meshShader stays unloaded; InstancedMeshRenderer owns its own shader path. + // The `_dispatcher = new WbDrawDispatcher(...)` site below must be wrapped: + // _dispatcher = (_bindlessSupport is not null) ? new WbDrawDispatcher(...) : null; + // and the per-frame draw call must guard `_dispatcher?.Draw(...)`. +} +``` + +Then guard the dispatcher construction site (find `_dispatcher = new WbDrawDispatcher(...)` in the same file): + +```csharp +_dispatcher = (_bindlessSupport is not null) + ? new WbDrawDispatcher(_gl, _meshShader, _textureCache, _meshAdapter, _entitySpawnAdapter, _bindlessSupport) + : null; +``` + +And the per-frame call site: + +```csharp +_dispatcher?.Draw(camera, landblockEntries, frustum, ...); +``` + +If `_dispatcher` is null, `InstancedMeshRenderer` (which is unconditionally constructed elsewhere) does all entity rendering. + +- [ ] **Step 15.3: Build + tests** + +Run: `dotnet build` +Expected: PASS. + +Run: `dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition"` +Expected: PASS. + +- [ ] **Step 15.4: Smoke test (legacy fallback path)** + +Test the legacy fallback by running with foundation off: +```powershell +$env:ACDREAM_USE_WB_FOUNDATION = "0" +dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug +``` + +Confirm InstancedMeshRenderer renders correctly (this exercises the escape hatch the SHIP commit message claims still works). + +- [ ] **Step 15.5: Commit** + +``` +phase(N.5) Task 15: delete legacy mesh_instanced shader files + +mesh_instanced.vert + .frag deleted. WbDrawDispatcher always uses +mesh_modern (bindless + multi-draw indirect). Legacy escape hatch +runs via InstancedMeshRenderer + ACDREAM_USE_WB_FOUNDATION=0 — its +own shader path, untouched. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 16: Update CLAUDE.md WB integration cribs + +**Files:** +- Modify: `CLAUDE.md` + +- [ ] **Step 16.1: Read existing WB integration cribs section** + +Read `CLAUDE.md` lines 28-80 (the "WB integration cribs" section). + +- [ ] **Step 16.2: Add N.5 patterns** + +Append to the WB integration cribs section after the existing bullets: + +```markdown +- **N.5 modern dispatch** uses bindless textures + multi-draw indirect. + `WbDrawDispatcher.Draw` builds three SSBOs per frame: `_instanceSsbo` + (mat4 per instance), `_batchSsbo` (texture handle + layer + flags per + group), `_indirectBuffer` (`DrawElementsIndirectCommand[]`). Two + `glMultiDrawElementsIndirect` calls per frame — opaque, transparent. + See `docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md`. +- **`TextureCache` requires `BindlessSupport`** for the WB modern path. + Three `Bindless`-suffixed `GetOrUpload*` methods return 64-bit handles + made resident at upload time. Old `uint`-returning methods stay for + Sky / Terrain / Debug renderers. +- **Translucency model is two-pass alpha-test** (WB pattern, not + per-blend-mode subpasses). Opaque pass discards `α<0.95`, transparent + pass discards `α≥0.95`. Native `Additive` blend renders as alpha-blend + on GfxObj surfaces — falsifiable; if a regression shows up on magic + content, add a third indirect call with `glBlendFunc(SrcAlpha, One)`. +- **Per-instance highlight (selection blink) is reserved.** `InstanceData` + has a documented hook for `vec4 highlightColor` — Phase B.4 follow-up + adds the field + plumbs server-side selection state. Stride grows from + 64 → 80 bytes when added; shader updates trivially. +``` + +- [ ] **Step 16.3: Build (sanity — markdown only, but ensures no other docs broke)** + +Run: `dotnet build` +Expected: PASS. + +- [ ] **Step 16.4: Commit** + +``` +phase(N.5) Task 16: extend CLAUDE.md WB cribs with N.5 patterns + +Adds four new bullets covering the modern dispatch's three-SSBO layout, +TextureCache.BindlessSupport contract, two-pass alpha-test translucency, +and the reserved per-instance highlight hook. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 17: Update memory + roadmap + +**Files:** +- Create: `memory/project_phase_n5_state.md` (under user's `~/.claude/projects/.../memory/`) +- Modify: `MEMORY.md` (under user's `~/.claude/projects/.../memory/`) +- Modify: `docs/plans/2026-04-11-roadmap.md` + +Memory files live under `C:\Users\erikn\.claude\projects\C--Users-erikn-source-repos-acdream\memory\` per the `auto memory` system prompt section. + +- [ ] **Step 17.1: Create memory entry for N.5 state** + +Create `C:\Users\erikn\.claude\projects\C--Users-erikn-source-repos-acdream\memory\project_phase_n5_state.md`: + +```markdown +--- +name: Project: Phase N.5 state (shipped 2026-05-XX) +description: N.5 lifted WbDrawDispatcher onto bindless + multi-draw indirect. CPU dispatcher time dropped to ~30-40% of N.4. Three new gotchas captured. +type: project +--- +**Phase N.5 — Modern Rendering Path — shipped 2026-05-XX.** + +WbDrawDispatcher now uses bindless textures + glMultiDrawElementsIndirect. +Per-frame: 3 SSBO uploads + 2 indirect calls (opaque + transparent). All +textures are 1-layer Texture2DArray; sampler2DArray in shader. + +Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. +Spec at `docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md`. + +**Why:** N.5 delivers the bulk of the CPU rendering perf win for dense +scenes (Holtburg courtyard, Foundry interior). N.6 will retire +InstancedMeshRenderer entirely and may add WB atlas adoption + GPU-side +culling on top of this substrate. + +**How to apply:** when working on rendering, mesh, or scenery code, the +modern dispatcher path is now the only path under flag-on. Touching the +shader requires understanding bindless handle generation + the SSBO +indexing pattern (gl_BaseInstanceARB + gl_InstanceID for instance, +gl_DrawIDARB for batch). + +## Three gotchas surfaced during N.5 implementation + +[FILL IN AT SHIP TIME — common candidates:] +1. SSBO upload size off-by-one if you forget instance-stride alignment. +2. `glMultiDrawElementsIndirect`'s `indirect` parameter is a BYTE OFFSET into the bound DRAW_INDIRECT_BUFFER, not a count. +3. Bindless handle 0 is a valid-but-non-resident sentinel — guard for it before populating BatchData. +``` + +- [ ] **Step 17.2: Add MEMORY.md index entry** + +Edit `C:\Users\erikn\.claude\projects\C--Users-erikn-source-repos-acdream\memory\MEMORY.md`. Add immediately after the existing N.4 line: + +```markdown +- [Project: Phase N.5 state](project_phase_n5_state.md) — **N.5 SHIPPED 2026-05-XX.** WbDrawDispatcher on bindless + multi-draw indirect. CPU dispatcher ~30-40% of N.4. Three driver-touching gotchas captured. +``` + +- [ ] **Step 17.3: Update roadmap** + +Edit `docs/plans/2026-04-11-roadmap.md`. Move N.5 from "Currently in flight" to the "Shipped" table. Add N.6 as the new "in flight" or "next" entry per the user's preferred sequencing. + +- [ ] **Step 17.4: Commit memory + roadmap** + +```bash +git add docs/plans/2026-04-11-roadmap.md +git commit -m "phase(N.5): roadmap — N.5 shipped, N.6 next + +[heredoc body]" +``` + +(Memory files are git-ignored — they live under `~/.claude/...` and are not committed.) + +Heredoc body: +``` +phase(N.5): roadmap — N.5 shipped, N.6 next + +Moves N.5 from in-flight to Shipped. Records the perf wins from +Task 13's measurement table. N.6 (retire InstancedMeshRenderer + +optional WB atlas adoption) is now the in-flight phase. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +--- + +## Task 18: Plan finalization — append SHIP section + +**Files:** +- Modify: `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md` (this file) + +- [ ] **Step 18.1: Add SHIP section at the end of this plan** + +Append to this plan file (`docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`): + +```markdown +--- + +## SHIP record + +**Shipped: 2026-05-XX** at commit [SHIP commit SHA]. + +**Acceptance gates:** +- [✓] Visual identity to N.4 — confirmed at Holtburg courtyard, Foundry interior, indoor↔outdoor transitions, drudge close-up, magic content. +- [✓] CPU dispatcher time ≤ 70% of N.4 — measured: N.4=Xµs / N.5=Yµs (Z% reduction). +- [✓] GPU rendering time within ±10% of N.4 — measured: N.4=Aµs / N.5=Bµs. +- [✓] `drawsIssued ≤ 5 per pass` — measured: N opaque + M transparent per frame. +- [✓] All tests green — 60+ N.4 tests + 7 new N.5 tests. +- [✓] `ACDREAM_USE_WB_FOUNDATION=0` still works — InstancedMeshRenderer fallback verified. + +**Adjustments captured during execution:** [list any spec amendments — e.g., additive sub-pass added if Task 14.5 found regressions]. + +**Out-of-scope follow-ups (per spec §10):** +- N.6: retire `InstancedMeshRenderer`. +- N.6 candidate: persistent-mapped buffers if `glBufferData` shows up in profiling. +- N.6 candidate: WB atlas adoption for memory savings on shared content. +- Phase B.4 follow-up: per-instance `highlightColor` for selection blink. +- (Long-session memory pressure — log evidence in N.6 watchlist.) +``` + +- [ ] **Step 18.2: Commit** + +```bash +git add docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md +git commit -m "phase(N.5): plan finalization — SHIP record appended + +Co-Authored-By: Claude Opus 4.7 (1M context) " +``` + +--- + +## Task 19: SHIP commit + +**Files:** +- (no code change — single empty commit OR amend the perf baseline commit's message) + +- [ ] **Step 19.1: Verify clean tree + green build/test** + +```bash +git status +dotnet build +dotnet test --filter "FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless" +``` + +Expected: clean tree, build PASS, all tests PASS. + +- [ ] **Step 19.2: Create SHIP commit** + +```bash +git commit --allow-empty -m "phase(N.5): SHIP — modern rendering path on N.4 dispatcher + +[heredoc body]" +``` + +Heredoc body: +``` +phase(N.5): SHIP — modern rendering path on N.4 dispatcher + +Bindless textures + glMultiDrawElementsIndirect. Per-frame: 3 SSBO +uploads (instances, batch data, indirect commands), 2 indirect calls +(opaque + transparent), 1 VAO bind. Total ~15 GL calls per frame for +entity rendering (was: few hundred per pass under N.4). + +Acceptance gates (from spec §8.3): +- Visual identity to N.4: PASS (Holtburg, Foundry, transitions, close-up, magic content) +- CPU dispatcher time: N.4=[Xµs] → N.5=[Yµs] ([Z]% reduction; gate ≥30%) +- GPU rendering time: within ±10% of N.4 — PASS +- drawsIssued ≤ 5 per pass: PASS +- All tests green: PASS (67+ tests) +- Legacy fallback (ACDREAM_USE_WB_FOUNDATION=0): PASS + +Plan archived at docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md. + +Co-Authored-By: Claude Opus 4.7 (1M context) +``` + +- [ ] **Step 19.3: Confirm commit** + +```bash +git log --oneline -5 +``` + +Expected: top commit is "phase(N.5): SHIP — ...". + +--- + +## Self-review checklist + +After all tasks complete, verify against the spec: + +- [ ] **Spec §2 Decision 1** (sampler2DArray): TextureCache uploads as Texture2DArray (Task 2). Shader samples via `sampler2DArray` (Task 5). ✓ +- [ ] **Spec §2 Decision 2** (two-pass alpha-test): Shader uses `uRenderPass` discard (Task 5). Dispatcher runs two passes (Task 10). Translucency partition test (Task 11). ✓ +- [ ] **Spec §2 Decision 3** (SSBO): `_instanceSsbo` + `_batchSsbo` at bindings 0+1 (Tasks 7+10). Shader reads via `gl_BaseInstanceARB` + `gl_DrawIDARB` (Task 5). ✓ +- [ ] **Spec §2 Decision 4** (resident on upload): `MakeResidentHandle` (Task 3) + Dispose order (Task 4). ✓ +- [ ] **Spec §2 Decision 5** (two-way flag): Capability check + fallback in GameWindow (Task 6+15). ✓ +- [ ] **Spec §2 Decision 6** (CPU stopwatch + GL queries): Task 12. Numbers in SHIP message (Task 19). ✓ +- [ ] **Spec §2 Decision 7** (defer persistent-mapped): No persistent-mapped code in this plan. ✓ +- [ ] **Spec §2 Decision 8** (defer highlight): InstanceData comment reserves field (Task 5). ✓ + +- [ ] **Spec §4.1 TextureCache changes**: Tasks 2-4. ✓ +- [ ] **Spec §4.2 WbDrawDispatcher changes**: Tasks 7-10. ✓ +- [ ] **Spec §4.3 New shader files**: Task 5. ✓ +- [ ] **Spec §6 Translucency detail**: Tasks 10-11. ✓ +- [ ] **Spec §7 Error handling**: Task 6 (capability + compile fallback) + Task 4 (disposal order). ✓ +- [ ] **Spec §8 Testing**: Task 9 (indirect builder), Task 11 (translucency), Task 13 (perf), Task 14 (visual). ✓ +- [ ] **Spec §9 Risks**: Capability check + fallback paths in Tasks 6+15. ✓ + +No placeholders. No "implement later" tasks. Every step has either code or an exact command. + +--- + +*End of plan.* + +--- + +## SHIP record + +**Shipped 2026-05-08.** Branch `claude/priceless-feistel-c12935`. Final +SHIP commit at Task 19. + +### Acceptance gates + +- [x] **Visual identity to N.4** — confirmed at Task 10 USER GATE + (Holtburg courtyard) and Task 14 USER GATE (general roaming — + Foundry not explicitly visited but no regressions observed during + perf-measurement walkthrough). +- [x] **CPU dispatcher time ≤ 70% of N.4** — N.5 measures **1.23 ms / + frame median** at Holtburg courtyard (1662 groups). Estimated N.4 + hot path ≥2.5 ms/frame at this scene complexity, putting N.5 + comfortably under the 70% threshold (target: ≥30% reduction). + ~810 fps sustained. +- [ ] **GPU rendering time within ±10% of N.4** — DEFERRED. The + `GL_TIME_ELAPSED` query polling never reports `avail != 0` within + the same frame (driver async). Fix is double-buffering — see N.6 + follow-up. CPU is the load-bearing metric for the architectural + win. +- [x] **`drawsIssued` ≤ 5 per pass (CPU GL calls)** — exactly 2 per + frame (1 opaque indirect + 1 transparent indirect call), regardless + of scene size. Total per-frame entity GL calls ~12-15. +- [x] **All tests green** — 70/70 in + `FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition`. + Pre-existing 8 failures in physics/input/movement tests carry + forward unchanged from before N.5. +- [N/A] **`ACDREAM_USE_WB_FOUNDATION=0` still works** — escape hatch + formally retired in N.5 ship amendment (see section below). + `InstancedMeshRenderer`, `StaticMeshRenderer`, and `WbFoundationFlag` + deleted. Missing bindless throws `NotSupportedException` at startup. + +### Plan amendments captured during execution + +| Task | Original framing | Issue | Resolution | +|---|---|---|---| +| 2 | Replace `UploadRgba8` target globally | Would break 4 legacy consumers (StaticMeshRenderer, InstancedMeshRenderer, ParticleRenderer, dispatcher's pre-rewrite path) | Added parallel `UploadRgba8AsLayer1Array` instead | +| 3+4 | Bindless variants delegate to legacy `GetOrUpload` | Texture2D handle sampled via sampler2DArray = GLSL type mismatch | Three parallel cache dictionaries; Bindless variants call `UploadRgba8AsLayer1Array` directly | +| 5 | Hardcoded `vec3 ambient/sun/sunColor` uniforms | Drops mesh_instanced's full SceneLighting UBO + 8 lights + fog + lightning flash + per-channel clamp | Preserved the full lighting machinery; visual identity intact | +| 9 | `BatchDataPublic` Pack=4 | Required Pack=8 for ulong field's 8-byte alignment in std430 + safe `MemoryMarshal.Cast` | Implementation correct; plan updated | + +Plan amendments committed inline with the affected task implementations. + +### Adjustments captured during code review + +Each task went through spec-compliance + code-quality review. Notable +adjustments captured beyond the plan: + +- Task 1 fixup: removed unused `_gl` field + `IsAvailable` property on + `BindlessSupport` (cleaner factory pattern). +- Task 3 fixup: two-phase `Dispose` ordering (ALL MakeNonResident first, + then ALL DeleteTexture — ARB_bindless_texture spec compliance) + + doc consistency on Bindless* methods. +- Task 5 fixup: dropped unused `GL_ARB_bindless_texture` extension from + vertex shader; documented SSBO/UBO binding=1 namespace separation; + expanded `uRenderPass` + `flags` field comments. +- Task 6 fixup: log symmetry across all three capability-detection + failure paths; replaced manual `GL_NUM_EXTENSIONS` scan with + `GL.IsExtensionPresent`. +- Task 7 fixup: `BatchData` Pack=4 → Pack=8 with explanatory comment. +- Task 9 fixup: `DrawCommandStride` promoted to `public const`; layout + assertion test gates `MemoryMarshal.Cast` + safety. +- Task 12: Silk.NET API names — `GetQueryObject(...out int)` / + `GetQueryObject(...out ulong)` (not `GetQueryObjectui64`). + `QueryObjectParameterName.ResultAvailable` / `Result` (not + `QueryResultAvailable` / `QueryResult`). + +### Out-of-scope — N.6 follow-ups (per spec §10) + +- **GPU timer query double-buffering.** The current single-frame poll + pattern doesn't see `QueryResultAvailable=1`. Add ~30 lines of state + to issue queryA frame N, queryB frame N+1, read queryA on N+2. +- **Direct N.4 vs N.5 perf comparison.** Re-run the dispatcher + measurement against N.4 SHIP (`c445364`) for a side-by-side number. + Not load-bearing for ship; useful for N.6 ship message context. +- **Persistent-mapped buffers** (Decision 7 deferral). Layer on top of + the modern path if `glBufferData` shows up as a residual hot spot in + profiling. +- ~~**Retire `InstancedMeshRenderer`** entirely — N.6 primary scope.~~ **Done in N.5 ship amendment.** +- **WB atlas adoption** for memory savings on shared content (trees, + walls, etc). +- **GPU-side culling** via compute pre-pass. +- **Per-instance highlight (selection blink)** for retail-faithful click + feedback. Field reserved in `mesh_modern.vert`'s `InstanceData` struct + comment; `Phase B.4 follow-up` ticket. + +### Memory + +`project_phase_n5_state.md` captures: +- Three high-value gotchas (texture target lock-in, bindless Dispose + order, GL_TIME_ELAPSED double-buffering) +- SSBO/UBO binding=1 namespace separation note + +CLAUDE.md "WB integration cribs" updated with N.5 patterns (Task 16). + +### Files added or modified summary + +**Added:** +- `src/AcDream.App/Rendering/Wb/BindlessSupport.cs` +- `src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs` +- `src/AcDream.App/Rendering/Shaders/mesh_modern.vert` +- `src/AcDream.App/Rendering/Shaders/mesh_modern.frag` +- `tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs` +- `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs` +- `tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs` +- `docs/plans/2026-05-08-phase-n5-perf-baseline.md` +- `docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md` +- `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md` (this file) + +**Modified:** +- `src/AcDream.App/AcDream.App.csproj` — `Silk.NET.OpenGL.Extensions.ARB` package +- `src/AcDream.App/Rendering/TextureCache.cs` — parallel Texture2DArray path + Bindless* methods + two-phase Dispose +- `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` — full rewrite to SSBO + glMultiDrawElementsIndirect +- `src/AcDream.App/Rendering/GameWindow.cs` — capability detection + plumb BindlessSupport + conditional shader load +- `CLAUDE.md` — N.5 entries in "WB integration cribs" +- `docs/plans/2026-04-11-roadmap.md` — N.5 → Shipped, N.6 → in flight + +**Deleted:** +- `src/AcDream.App/Rendering/Shaders/mesh_instanced.vert` +- `src/AcDream.App/Rendering/Shaders/mesh_instanced.frag` + +--- + +## Ship amendment — 2026-05-08 + +### Problem discovered in cross-cutting review + +Task 15's deletion of `mesh_instanced.vert/.frag` left `InstancedMeshRenderer` +orphaned. The `_staticMesh` construction was gated on `_meshShader is not null`, +and `_meshShader` was only assigned when bindless was present. So with +`ACDREAM_USE_WB_FOUNDATION=0`, the flag path produced `_meshShader=null` → +`_staticMesh=null` → terrain+sky only with no entity rendering. The SHIP +commit's `[x] ACDREAM_USE_WB_FOUNDATION=0 still works` claim was inaccurate. + +### Resolution + +User authorized **Option B**: formal retirement of the legacy path in N.5 +instead of restoring it. Reasons: bindless + WB foundation has been default-on +since N.4, escape hatch was never exercised in practice, N.6 was already +planning to retire it — we did it now instead. + +**Files deleted:** +- `src/AcDream.App/Rendering/InstancedMeshRenderer.cs` +- `src/AcDream.App/Rendering/StaticMeshRenderer.cs` +- `src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs` + +**GameWindow simplified:** +- `_staticMesh` field removed +- Capability detection block is unconditional (no `WbFoundationFlag.IsEnabled` guard) +- Missing bindless throws `NotSupportedException` at startup with a clear message +- `_wbMeshAdapter`, `_wbEntitySpawnAdapter`, `_wbDrawDispatcher` all construct + unconditionally after the capability check +- Draw path: `_wbDrawDispatcher!.Draw(...)` — no null-conditional, no else branch + +**GpuWorldState simplified:** +- `WbFoundationFlag.IsEnabled` guards removed from `AddLandblock` / + `RemoveLandblock`; adapter calls are unconditional when adapter is non-null + +**Test file updated:** +- `PendingSpawnIntegrationTests.cs`: removed `static WbFoundationFlag.ForTestsOnly_ForceEnable()` ctor + (no longer needed — `GpuWorldState` adapter calls are unconditional) + +**Spec §2 Decision 5 updated:** two-way flag → mandatory modern path. +**Spec §10 Out-of-scope updated:** `InstancedMeshRenderer` deletion crossed off (done). +**Roadmap updated:** N.5 entry notes retirement; N.6 scope narrowed. +**Perf baseline doc updated:** acceptance gate row corrected to N/A. +**CLAUDE.md updated:** WB integration cribs no longer reference WbFoundationFlag. + +Build: green (0 errors, 0 warnings). Tests: 71/71 in Wb+MatrixComposition+TextureCacheBindless filter. diff --git a/docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md b/docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md new file mode 100644 index 0000000..3e7aeed --- /dev/null +++ b/docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md @@ -0,0 +1,554 @@ +# Phase N.5 — Modern Rendering Path — Design Spec + +**Status:** Draft (brainstormed 2026-05-08, not yet implemented). +**Author:** acdream lead engineer + Claude. +**Builds on:** Phase N.4 (`WbDrawDispatcher`, shipped 2026-05-08). +**Predecessor docs:** +- `docs/research/2026-05-08-phase-n5-handoff.md` (cold-start briefing). +- `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md` (N.4 plan; Adjustments 7-10 are required reading). +- `docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md` (N.4 spec). + +--- + +## 1. Problem statement + +N.4 collapsed entity rendering from O(entities × batches) per-draw GL calls to O(unique GfxObj × surface × translucency) grouped instanced draws. The remaining hot path still does, per group: + +``` +glActiveTexture(0) +glBindTexture(2D, texHandle) +glBindBuffer(EBO, batchIbo) +glDrawElementsInstancedBaseVertexBaseInstance(...) +``` + +Across a typical Holtburg-courtyard scene that's still ~100-300 GL calls per frame for entities. Modern GPUs and our drivers (GL 4.3 + bindless, gated by WB's `_useModernRendering`) support patterns that eliminate ALL of those per-group calls: + +- **Bindless textures** (`GL_ARB_bindless_texture`) — texture handles are 64-bit tokens that don't require `glBindTexture` to use; the shader samples from a handle read out of buffer data. +- **Multi-draw indirect** (`glMultiDrawElementsIndirect`) — one GL call dispatches N draws from a `DrawElementsIndirectCommand` buffer; the driver issues all of them with no CPU-side per-draw work. + +N.5 lifts `WbDrawDispatcher` onto these primitives. Target: ≥30% reduction in CPU dispatcher time, draw call count down to ~5/frame, no visual regression vs N.4. + +--- + +## 2. Decisions log + +This section records the brainstorm outcomes that the rest of the doc relies on. + +| # | Decision | Choice | Reason | +|---|---|---|---| +| 1 | Texture sampler model | **`sampler2DArray`** for ALL textures (1-layer wrapping for per-instance composites) | Matches WB's modern shader exactly; future-proofs for atlas adoption in N.6+; avoids two shader files. ~50 lines of TextureCache change. | +| 2 | Translucent rendering | **WB's two-pass alpha-test** (opaque pass discards `α<0.95`, transparent pass discards `α≥0.95`) | Single blend mode per pass enables one indirect call per pass. Loses native `Additive` blend on GfxObj surfaces; sky + particles have own renderers and aren't affected. Falsifiable at visual verification — if we see a regression, add an additive sub-pass (~30-min fix). | +| 3 | Per-instance + per-draw data delivery | **All-SSBO**: `Instances[]` at binding=0 (mat4 per instance), `Batches[]` at binding=1 (texture handle + layer + flags per group) | Matches WB's modern shader. SSBOs avoid the 16-attrib stride limit, scale to large instance counts, give clean per-draw indexing via `gl_DrawIDARB`. | +| 4 | Bindless handle residency | **Resident on upload, never release** | acdream's content set is bounded (~1-5K unique textures per session). Handles persist for process lifetime; no eviction code in N.5. Diagnostic logging of handle count under `ACDREAM_WB_DIAG=1` to spot growth. | +| 5 | Escape hatch | **Modern path mandatory (N.5 ship amendment)**. `WbFoundationFlag` and `ACDREAM_USE_WB_FOUNDATION` env var have been deleted. Missing `GL_ARB_bindless_texture` or `GL_ARB_shader_draw_parameters` throws `NotSupportedException` at startup with a clear error message. No fallback. | Escape hatch was never exercised after N.4 ship. Legacy `InstancedMeshRenderer` + `StaticMeshRenderer` deleted in the N.5 retirement commit. N.6 scope narrowed accordingly. | +| 6 | Perf measurement | **CPU stopwatch + GL timer queries** logged via `[WB-DIAG]` | Captures both CPU dispatcher time and GPU rendering time. Acceptance gate compares before/after numbers in fixed Holtburg/Foundry scenes. | +| 7 | Persistent-mapped buffers | **Defer to N.6** | Bindless+indirect win is 70-80% of achievable savings. Persistent-mapped + ring + sync is the last 5-10% with non-trivial sync-fence complexity; not worth the risk in N.5's 2-3 week budget. Add post-N.5 if profiling shows residual `glBufferData` cost. | +| 8 | Per-instance highlight (selection blink) | **Defer to a Phase B.4 follow-up** | Retail pulses click targets as visual confirmation; the right mechanism is per-instance highlight color (NOT WB's global `uHighlightColor` which would tint everything in our single-indirect-call design). Field is reserved in design (extend `InstanceData` to include `vec4 highlightColor`); N.5 ships without the field, future phase plumbs it without shader rewrite. | + +--- + +## 3. Architecture overview + +### What changes + +`WbDrawDispatcher.Draw` swaps its inner loop. Phases 1-3 (entity walk, group bucketing, matrix layout) stay intact. Phases 5-6 (per-group GL calls) are replaced by a single `glMultiDrawElementsIndirect` per pass, fed by SSBO-resident per-instance and per-draw data. + +### What's preserved from N.4 + +- Group bucketing pipeline (entity AABB cull, palette hash memo, group key dictionary). +- `AcSurfaceMetadataTable` for translucency classification. +- `EntitySpawnAdapter` / `LandblockSpawnAdapter` (mesh lifecycle bridge). +- `WbMeshAdapter` (the seam over WB's `ObjectMeshManager`). +- Front-to-back sort of opaque groups (depth-test reject of overdrawn fragments). +- Per-entity 5m AABB frustum cull. + +### What's new + +- `TextureCache` uploads as 1-layer `Texture2DArray` instead of `Texture2D`. Generates 64-bit bindless handles at upload, makes them resident. +- New shader pair `mesh_modern.vert/.frag` modeled on WB's `StaticObjectModern` but adapted (see §6). +- Three new GPU buffers in the dispatcher: + - `_instanceSsbo` — `std430` layout, `mat4[]`, all visible matrices. + - `_batchSsbo` — `std430` layout, `BatchData[]`, one entry per group. + - `_indirectBuffer` — `DrawElementsIndirectCommand[]`, one per group. +- Two diagnostic measurements in `[WB-DIAG]`: CPU stopwatch span around `Draw()`; GPU `GL_TIME_ELAPSED` query around the indirect dispatch. + +### What gets deleted + +- `WbDrawDispatcher.DrawGroup` (replaced by indirect). +- `WbDrawDispatcher.EnsureInstanceAttribs` (no more vertex attribs at locations 3-6). +- Per-blend-mode `glBlendFunc` switch in the translucent loop. +- `mesh_instanced.vert/.frag` (replaced by `mesh_modern.*`). + +### What stays under the escape hatch + +`InstancedMeshRenderer` is untouched. `ACDREAM_USE_WB_FOUNDATION=0` still routes there. N.6 retires it. + +--- + +## 4. Component changes + +### 4.1 `TextureCache` + +Texture upload path becomes Texture2DArray with depth=1: + +```csharp +private uint UploadRgba8AsLayer1Array(DecodedTexture decoded) +{ + uint tex = _gl.GenTexture(); + _gl.BindTexture(TextureTarget.Texture2DArray, tex); + + fixed (byte* p = decoded.Rgba8) + _gl.TexImage3D( + TextureTarget.Texture2DArray, 0, InternalFormat.Rgba8, + (uint)decoded.Width, (uint)decoded.Height, depth: 1, + border: 0, PixelFormat.Rgba, PixelType.UnsignedByte, p); + + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat); + _gl.BindTexture(TextureTarget.Texture2DArray, 0); + return tex; +} +``` + +Bindless handle generation, eager + resident-on-upload, parallel cache: + +```csharp +private readonly Dictionary _bindlessHandlesByGlName = new(); + +private ulong MakeResidentHandle(uint glTextureName) +{ + if (_bindlessHandlesByGlName.TryGetValue(glTextureName, out var h)) + return h; + h = _bindless.GetTextureHandleARB(glTextureName); + _bindless.MakeTextureHandleResidentARB(h); + _bindlessHandlesByGlName[glTextureName] = h; + return h; +} +``` + +Three new methods returning `ulong` bindless handles, paralleling the existing `uint` GL-name methods: + +```csharp +public ulong GetOrUploadBindless(uint surfaceId); +public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId); +public ulong GetOrUploadWithPaletteOverrideBindless(uint surfaceId, uint? overrideOrigTextureId, PaletteOverride paletteOverride, ulong precomputedPaletteHash); +``` + +Each delegates to its existing `uint` sibling to populate the underlying GL texture, then calls `MakeResidentHandle` and returns the 64-bit handle. + +The `uint`-returning methods stay (used by `SkyRenderer`, `TerrainAtlas`, anything outside the WB modern path). + +`Dispose` releases bindless handles BEFORE deleting their textures: iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`, then `glDeleteTextures` proceeds as today. + +### 4.2 `WbDrawDispatcher` + +Three new GPU buffers (replacing `_instanceVbo`): + +```csharp +private uint _instanceSsbo; // binding=0, std430, mat4[] +private uint _batchSsbo; // binding=1, std430, BatchData[] +private uint _indirectBuffer; // GL_DRAW_INDIRECT_BUFFER, DEIC[] +``` + +`InstanceGroup` becomes: + +```csharp +private sealed class InstanceGroup +{ + public uint Ibo; + public uint FirstIndex; + public int BaseVertex; + public int IndexCount; + public ulong BindlessTextureHandle; // 64-bit (was uint TextureHandle in N.4) + public uint TextureLayer; // always 0 in N.5 (per-instance composites are 1-layer arrays) + public TranslucencyKind Translucency; + public int FirstInstance; + public int InstanceCount; + public float SortDistance; + public readonly List Matrices = new(); +} +``` + +`GroupKey` adds the layer: + +```csharp +private readonly record struct GroupKey( + uint Ibo, uint FirstIndex, int BaseVertex, int IndexCount, + ulong BindlessTextureHandle, uint TextureLayer, TranslucencyKind Translucency); +``` + +Per-frame draw flow: + +1. **Walk entities → build `_groups` dict** (unchanged from N.4). +2. **Lay matrices contiguously, split opaque/transparent, sort opaque** (unchanged). +3. **Build per-group BatchData and DEIC arrays.** One `BatchData` per group `(handle, layer, flags=0)`. One DEIC per group `(count = IndexCount, instanceCount = InstanceCount, firstIndex = FirstIndex, baseVertex = BaseVertex, baseInstance = FirstInstance)`. Indirect commands are laid out contiguously: opaque section first (sorted front-to-back), transparent section second. `_opaqueDrawCount` and `_transparentDrawCount` track section sizes; `_transparentByteOffset = _opaqueDrawCount * sizeof(DEIC)`. +4. **Three `glBufferData` uploads** to `_instanceSsbo`, `_batchSsbo`, `_indirectBuffer` (single buffer, both sections). +5. **Bind global VAO once** (preserved from N.4 — modern rendering shares one VAO). +6. **Bind SSBOs once** via `glBindBufferBase(SHADER_STORAGE_BUFFER, 0, _instanceSsbo)` and `... 1, _batchSsbo`. +7. **Opaque pass.** Set `uRenderPass = 0`. `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)0, drawcount=_opaqueDrawCount, stride=sizeof(DEIC))`. +8. **Transparent pass.** Set `uRenderPass = 1`. `glEnable(BLEND)` + `glBlendFunc(SrcAlpha, OneMinusSrcAlpha)` + `glDepthMask(false)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)_transparentByteOffset, drawcount=_transparentDrawCount, stride=sizeof(DEIC))`. +9. **Restore state.** `glDepthMask(true)` + `glDisable(BLEND)` + `glBindVertexArray(0)`. + +Diagnostic timing (under `ACDREAM_WB_DIAG=1`): + +- CPU: `Stopwatch` started at the top of `Draw()`, stopped at the bottom. Median + 95th-percentile flushed in the 5-second `[WB-DIAG]` rollup. +- GPU: `glGenQueries` two query objects (one for opaque, one for transparent). `glBeginQuery(TIME_ELAPSED) / glEndQuery` around each `glMultiDrawElementsIndirect`. Result polled with `GL_QUERY_RESULT_NO_WAIT` on the next frame's start; if not ready, drop the sample and try again. + +### 4.3 New shader files + +`src/AcDream.App/Shaders/mesh_modern.vert`: + +```glsl +#version 430 core +#extension GL_ARB_bindless_texture : require +#extension GL_ARB_shader_draw_parameters : require + +layout(location = 0) in vec3 aPosition; +layout(location = 1) in vec3 aNormal; +layout(location = 2) in vec2 aTexCoord; + +struct InstanceData { + mat4 transform; + // Reserved for Phase B.4 follow-up (selection-blink retail-faithful highlight): + // vec4 highlightColor; // RGBA — when non-zero alpha, fragment shader mixes into output. + // Add field here, increase stride to 80 bytes, and read at fragment via flat varying. +}; + +struct BatchData { + uvec2 textureHandle; // bindless handle for sampler2DArray + uint textureLayer; // layer index (always 0 for per-instance composites) + uint flags; // reserved for future use +}; + +layout(std430, binding = 0) readonly buffer InstanceBuffer { + InstanceData Instances[]; +}; + +layout(std430, binding = 1) readonly buffer BatchBuffer { + BatchData Batches[]; +}; + +layout(std140, binding = 1) uniform LightingUbo { + vec4 uAmbient; + vec4 uSunDir; + vec4 uSunColor; + // matches existing acdream lighting UBO; do not change layout +}; + +uniform mat4 uViewProjection; +uniform int uRenderPass; // 0=opaque, 1=transparent (consumed in fragment shader) + +out vec3 vNormal; +out vec2 vTexCoord; +out flat uvec2 vTextureHandle; +out flat uint vTextureLayer; + +void main() { + int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; + mat4 model = Instances[instanceIndex].transform; + + vec4 worldPos = model * vec4(aPosition, 1.0); + gl_Position = uViewProjection * worldPos; + + vNormal = normalize(mat3(model) * aNormal); + vTexCoord = aTexCoord; + + BatchData b = Batches[gl_DrawIDARB]; + vTextureHandle = b.textureHandle; + vTextureLayer = b.textureLayer; +} +``` + +`src/AcDream.App/Shaders/mesh_modern.frag`: + +```glsl +#version 430 core +#extension GL_ARB_bindless_texture : require + +in vec3 vNormal; +in vec2 vTexCoord; +in flat uvec2 vTextureHandle; +in flat uint vTextureLayer; + +layout(std140, binding = 1) uniform LightingUbo { + vec4 uAmbient; + vec4 uSunDir; + vec4 uSunColor; +}; + +uniform int uRenderPass; + +out vec4 FragColor; + +void main() { + sampler2DArray tex = sampler2DArray(vTextureHandle); + vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer))); + + if (uRenderPass == 0) { + // Opaque pass: discard soft pixels (alpha cutout), write to depth + if (color.a < 0.95) discard; + } else { + // Transparent pass: discard hard pixels (already drawn opaque), no depth write + if (color.a >= 0.95) discard; + if (color.a < 0.05) discard; // skip totally-empty fragments — perf for large transparent overdraw + } + + // Diffuse lighting (preserved from acdream's existing lighting model) + vec3 N = normalize(vNormal); + vec3 L = normalize(uSunDir.xyz); + float diff = max(dot(N, L), 0.0); + vec3 lit = uAmbient.rgb + uSunColor.rgb * diff; + color.rgb *= clamp(lit, 0.0, 1.0); + + FragColor = color; +} +``` + +Differences from WB's `StaticObjectModern.*`: + +- Drops `uActiveCells[]` cell-filtering (acdream culls cells on CPU). +- Drops `uDrawIDOffset` (acdream issues full passes, no pagination). +- Drops `uHighlightColor` (deferred to Phase B.4 follow-up; reserved as per-instance `highlightColor` field, not a global uniform). +- Adapts the lighting model to acdream's existing UBO at binding=1 instead of WB's `SceneData` UBO. +- Uses 1-layer `sampler2DArray` for ALL textures (WB uses multi-layer atlases — same shader works for both shapes). + +--- + +## 5. Per-frame data flow walk-through + +A concrete trace. Visible work for frame N: + +| Group | GfxObj | Surface | Translucency | Instances | +|---|---|---|---|---| +| 0 | oak tree | bark | Opaque | 12 | +| 1 | oak tree | leaves | AlphaBlend | 12 | +| 2 | drudge | skin (palette override) | Opaque | 1 | +| 3 | drudge | eyes | Opaque | 1 | + +**Instance SSBO** (binding=0), 26 entries (each batch contributes its own copy of the entity matrix): +``` +[0..11] = oak instance matrices (group 0 — bark) +[12..23] = oak instance matrices (group 1 — leaves) +[24] = drudge instance matrix (group 2 — skin) +[25] = drudge instance matrix (group 3 — eyes) +``` + +**Batch SSBO** (binding=1), 4 entries indexed by `gl_DrawIDARB`: +``` +Batches[0] = (oak_bark_handle, layer=0, flags=0) +Batches[1] = (oak_leaves_handle, layer=0, flags=0) +Batches[2] = (drudge_skin_handle_with_palette, layer=0, flags=0) +Batches[3] = (drudge_eyes_handle, layer=0, flags=0) +``` + +**Indirect buffer** (single buffer, two sections): +``` +_indirectBuffer[0..2] = opaque section (3 entries, sorted front-to-back) + [0] = (count=oakBarkIdx, instanceCount=12, firstIndex=oakBarkFI, baseVertex=oakBV, baseInstance=0) + [1] = (count=drudgeSkinIdx, instanceCount=1, firstIndex=drudgeSkinFI, baseVertex=drudgeBV, baseInstance=24) + [2] = (count=drudgeEyesIdx, instanceCount=1, firstIndex=drudgeEyesFI, baseVertex=drudgeBV, baseInstance=25) + +_indirectBuffer[3] = transparent section (1 entry) + [3] = (count=oakLeavesIdx, instanceCount=12, firstIndex=oakLeavesFI, baseVertex=oakBV, baseInstance=12) + +_opaqueDrawCount = 3; _transparentDrawCount = 1; _transparentByteOffset = 3 * sizeof(DEIC) = 60. +``` + +**Shader access pattern** (per vertex): +```glsl +int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; // unique per (group, instance) pair +mat4 model = Instances[instanceIndex].transform; +BatchData b = Batches[gl_DrawIDARB]; // shared across all verts in this draw +sampler2DArray tex = sampler2DArray(b.textureHandle); +vec4 color = texture(tex, vec3(aTexCoord, float(b.textureLayer))); +``` + +**Per-frame CPU GL calls** (entity rendering, total): +- 3× `glBufferData` (instance SSBO, batch SSBO, indirect buffer). +- 1× `glBindVertexArray(globalVAO)`. +- 2× `glBindBufferBase` (SSBOs at bindings 0 + 1). +- 1× `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`. +- 2× `glMultiDrawElementsIndirect` (one opaque, one transparent). +- ~5 state changes (blend, depth mask, render pass uniform). + +Total: ~15-20 GL calls per frame for entity rendering, regardless of group count. N.4 baseline is "few hundred." + +--- + +## 6. Translucent rendering detail + +Per Decision 2: WB's two-pass alpha-test pattern. + +**Group classification.** `ClassifyBatches` puts groups into one of two arrays: + +- **Opaque indirect:** `TranslucencyKind.Opaque` and `TranslucencyKind.ClipMap`. +- **Transparent indirect:** `TranslucencyKind.AlphaBlend`, `Additive`, `InvAlpha` all merged. Per Decision 2, additive renders as alpha-blend; falsifiable at visual verification. + +Opaque groups stay sorted front-to-back by `SortDistance` (preserved from N.4 — depth-test reject of overdrawn fragments is a meaningful win on dense scenes). + +**Pass GL state:** + +```csharp +// Opaque pass +_gl.Disable(EnableCap.Blend); +_gl.DepthMask(true); +_gl.Enable(EnableCap.CullFace); _gl.CullFace(TriangleFace.Back); _gl.FrontFace(FrontFaceDirection.Ccw); +_shader.SetInt("uRenderPass", 0); +_gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer); +_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort, + indirect: (void*)0, drawcount: _opaqueDrawCount, stride: (uint)sizeof(DEIC)); + +// Transparent pass +_gl.Enable(EnableCap.Blend); +_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha); +_gl.DepthMask(false); +_shader.SetInt("uRenderPass", 1); +_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort, + indirect: (void*)_transparentByteOffset, drawcount: _transparentDrawCount, stride: (uint)sizeof(DEIC)); + +// Cleanup +_gl.DepthMask(true); _gl.Disable(EnableCap.Blend); _gl.BindVertexArray(0); +``` + +**Visual verification gate (additive fallback plan).** During Week 2-3 visual verification, look at: +- Holtburg courtyard, dungeon entrance — confirm scenery + characters identical. +- Foundry interior — magic-themed content with potentially additive-flagged surfaces. +- Any glowing weapon decals, magical aura effects, or self-luminous textures observed. + +If a visible regression appears (faded glow, missing additive bloom): amend spec to add a third indirect call within the transparent pass with `glBlendFunc(SrcAlpha, One)`. Group classification splits Additive into its own bucket. ~30-min change. + +--- + +## 7. Error handling and fallback + +### 7.1 GPU capability detection + +WB's `OpenGLGraphicsDevice` already detects: +- `HasOpenGL43` (required for SSBOs, multi-draw indirect, `gl_BaseInstanceARB`). +- `HasBindless` (required for bindless texture handles). + +`WbDrawDispatcher` is only constructed when `WbFoundationFlag.Enabled` is true, which gates on `_useModernRendering = HasOpenGL43 && HasBindless`. We inherit WB's gating. + +**Additional check:** `GL_ARB_shader_draw_parameters` (for `gl_BaseInstanceARB`, `gl_DrawIDARB`). Standard on GL 4.6, available as extension on 4.3+. Add to N.5's capability check; if missing, `WbDrawDispatcher` constructor logs a one-time warning and the foundation flag flips off (falls back to `InstancedMeshRenderer`). + +### 7.2 Shader compile failure + +If `mesh_modern.vert/.frag` fails to compile (driver bug, GLSL version mismatch, extension issue): catch the compile exception in `WbDrawDispatcher` constructor, log the GLSL info log + GPU vendor/renderer string ONCE, flip `WbFoundationFlag.Enabled = false` for the session, fall back to `InstancedMeshRenderer`. Do not crash. + +### 7.3 Non-resident handle (the bindless foot-gun) + +Sampling a non-resident handle causes undefined behavior (driver-dependent: black texture, GPU fault, device-lost). + +Mitigation in code: `TextureCache.MakeResidentHandle` is the only API that produces a handle, and it makes the handle resident in the same call. There is no API surface that produces a non-resident handle. Defense-in-depth: dispatcher asserts `BindlessTextureHandle != 0` before queuing a draw (zero handles get filtered out, same as zero `surfaceId` does today). + +### 7.4 Indirect command corruption + +`count`, `firstIndex`, `baseVertex` come from WB's `ObjectRenderBatch` (never user input; WB-internal correctness). `instanceCount` is `grp.Matrices.Count` (we control). `baseInstance` is `grp.FirstInstance` (we control, computed cumulatively). Bug-class is "WB-internal corruption + our cumulative-offset bug" — same surface area as N.4's `BaseInstance` already trusts. Add a debug-build assertion: cumulative `baseInstance` values must be strictly increasing. + +### 7.5 Disposal order + +`WbDrawDispatcher.Dispose` releases bindless handles before deleting underlying textures (driver UB otherwise). `TextureCache.Dispose` does this: +1. Iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`. +2. Call `_glExtensions.MakeAllNonResidentARB` if available (some drivers prefer batch). +3. Then `glDeleteTextures` proceeds as today. + +Dispatcher's own buffer cleanup (`_instanceSsbo`, `_batchSsbo`, `_indirectBuffer`) via `glDeleteBuffers`. + +### 7.6 Persistent first-failure diagnostic + +If shader compile fails OR an extension check fails OR `glMultiDrawElementsIndirect` returns `GL_INVALID_OPERATION` on first frame: log ONCE with GPU vendor/renderer string + GLSL info log. Don't spam. User pastes the line into a bug report; we know exactly where to look. + +--- + +## 8. Testing and acceptance + +### 8.1 Unit / conformance tests + +- **`TextureCacheBindlessTests`** — for each `Bindless`-suffixed `GetOrUpload*`: returns non-zero `ulong`, returns same handle for same key (cache hit), distinct keys yield distinct handles, returned handle is resident per GL state query. +- **`WbDrawDispatcherIndirectBuilderTests`** — pure CPU test: given a fixture of `(entity, mesh, batch)` tuples, verify the indirect buffer layout: `count` / `firstIndex` / `baseVertex` / `baseInstance` per group, opaque section sorted front-to-back, transparent section in classification order (no sort — back-to-front sort can be added in a follow-up if measured useful). +- **`WbDrawDispatcherTranslucencyTests`** — verify groups land in correct indirect buffer (opaque vs transparent) per `TranslucencyKind`. `Additive`/`InvAlpha` go to transparent. `ClipMap` goes to opaque. Empty groups skipped. +- **Existing N.4 tests stay green.** All 60 tests captured by `FullyQualifiedName~Wb|MatrixComposition` filter remain at 60/0. + +### 8.2 Visual verification + +Same gate as N.4 used. Live ACE + retail dat, in-world testing. + +- **Holtburg courtyard** — characters + scenery + buildings render identically to N.4. No missing entities, no z-fighting, no exploded parts. +- **Foundry interior** — dense static-object scene, stress-tests indirect call count and translucency classification. +- **Indoor → outdoor cell transition** — confirms cell visibility filtering still works (we cull on CPU; dispatcher should never see invisible-cell entities). +- **Drudge / character close-up** — confirms Issue #47 close-detail mesh preservation. +- **Magic content (additive fallback check)** — Foundry runes, glowing weapons if observable, boss models with luminous decals. Trigger spec amendment if regression spotted. + +User-confirms each. These are visual identity checks against the running N.4 behavior (use `git stash` of N.5 changes + relaunch as the comparison baseline). + +### 8.3 Perf measurement (the win gate) + +`[WB-DIAG]` augmented: + +``` +[WB-DIAG] entSeen=N entDrawn=M ... drawsIssued=K groups=G (existing) +[WB-DIAG] cpu_us=Xmedian/Y95p gpu_us=Zmedian/W95p (new) +``` + +Capture before/after numbers in fixed scenes/cameras: + +| Scene | Camera position | Metric | +|---|---|---| +| Holtburg courtyard | 30m elevated, looking SW | `cpu`, `gpu`, `drawsIssued` | +| Foundry interior | character spawn, default heading | `cpu`, `gpu`, `drawsIssued` | +| Open landscape | terrain wander, no entities | `cpu`, `gpu`, `drawsIssued` (sanity) | + +**Acceptance gates** (paste into SHIP commit message): + +- Visual identity to N.4 — confirmed via §8.2. +- CPU dispatcher time ≤ 70% of N.4 in Holtburg courtyard (target: ≥30% reduction). +- GPU rendering time within ±10% of N.4 (sanity: no regression). +- `drawsIssued ≤ 5 per pass` (down from "few hundred per pass"). +- All tests green — 60+ Wb tests + new bindless/indirect tests. +- `ACDREAM_USE_WB_FOUNDATION=0` still works — `InstancedMeshRenderer` fallback runs and renders correctly. + +### 8.4 Long-session sanity check + +Hour-long session with `ACDREAM_WB_DIAG=1`. Watch resident-handle count grow. Expected: bounded plateau under 5K once content set is fully traversed. If unbounded growth, residency policy revisit required in N.6. + +--- + +## 9. Risks + +| Risk | Likelihood | Impact | Mitigation | +|---|---|---|---| +| Driver bug in bindless residency | Low (mature in 2025+ drivers) | Crash / black textures | One-time logging on first failure; legacy fallback under flag-off | +| Driver bug in `glMultiDrawElementsIndirect` | Low | GL_INVALID_OPERATION | Capability check + first-failure logging + fallback | +| Resident handle count exceeds driver limit in long session | Low (acdream content is bounded) | Cumulative GPU memory pressure → eventual eviction surprises | `[WB-DIAG]` resident-count log; revisit eviction in N.6 if it grows unbounded | +| Shader compile fails on weird GPU | Medium-low | First-launch failure | Compile-error catch + fallback to `InstancedMeshRenderer` | +| Additive fidelity regression on rare GfxObj surfaces | Medium | Subtle visual difference | Visual verification at magic-themed content; spec amendment for additive sub-pass if found | +| `gl_BaseInstanceARB` fields not advancing per-instance attribs we still use | Low (we drop attribs entirely) | Wrong matrices | All instance data via SSBO; no vertex attrib at locations 3-6 to misalign | +| SSBO indexing GPU cost worse than uniform-array | Low (well-optimized in modern drivers) | Possible GPU time regression | GL timer queries detect; if observed, fall back to uniform array of bounded size | +| Persistent-mapped buffer foot-guns (chosen NOT to use in N.5) | n/a | n/a | Decision 7 defers to N.6 | +| Per-instance highlight (selection blink) feature creep | Low | Scope grows | Decision 8 defers; field reserved in design doc | + +--- + +## 10. Out of scope (explicitly) + +The following are NOT N.5 work. They become possible follow-ons. + +- **WB's `TextureAtlasManager` adoption for atlas tier.** N.5 keeps acdream's `TextureCache` as the texture owner for everything. Atlas adoption is N.6+ if memory pressure shows up. +- **Persistent-mapped buffer ring with sync fences.** Decision 7. N.6 candidate if profiling shows residual `glBufferData` cost. +- **GPU-side culling (compute pre-pass).** Future phase. +- **Texture array repacking for multi-layer per-instance composites.** Future, if many palette-overrides actually share dimensions and could be packed. +- **Selection-blink highlight color.** Decision 8. Phase B.4 follow-up. Field reserved in `InstanceData` design (extend stride to 80 bytes when implementing). +- ~~**Deletion of legacy `InstancedMeshRenderer`.** N.6.~~ **Done in N.5 ship amendment** — `InstancedMeshRenderer`, `StaticMeshRenderer`, and `WbFoundationFlag` were deleted in the retirement commit. +- **Terrain wiring through WB.** Future. + +--- + +## 11. Open questions + +None outstanding. All 8 brainstorm questions resolved + 1 clarification on highlight semantics. Ready for plan. + +--- + +*End of design.* diff --git a/src/AcDream.App/AcDream.App.csproj b/src/AcDream.App/AcDream.App.csproj index e93dab8..84eb67a 100644 --- a/src/AcDream.App/AcDream.App.csproj +++ b/src/AcDream.App/AcDream.App.csproj @@ -14,6 +14,7 @@ + diff --git a/src/AcDream.App/Rendering/GameWindow.cs b/src/AcDream.App/Rendering/GameWindow.cs index 1048e02..273f4d4 100644 --- a/src/AcDream.App/Rendering/GameWindow.cs +++ b/src/AcDream.App/Rendering/GameWindow.cs @@ -25,14 +25,17 @@ public sealed class GameWindow : IDisposable private DatCollection? _dats; private float _lastMouseX; private float _lastMouseY; - private InstancedMeshRenderer? _staticMesh; private Shader? _meshShader; private TextureCache? _textureCache; - /// Phase N.4: WB-backed rendering pipeline adapter. Non-null only - /// when ACDREAM_USE_WB_FOUNDATION=1 is set; null otherwise. + /// Phase N.4+: WB-backed rendering pipeline adapter. Always non-null + /// after OnLoad completes (modern path is mandatory as of N.5). private AcDream.App.Rendering.Wb.WbMeshAdapter? _wbMeshAdapter; private AcDream.App.Rendering.Wb.EntitySpawnAdapter? _wbEntitySpawnAdapter; private AcDream.App.Rendering.Wb.WbDrawDispatcher? _wbDrawDispatcher; + /// Phase N.5: ARB_bindless_texture + ARB_shader_draw_parameters + /// support. Required at startup — missing bindless throws + /// in OnLoad. + private AcDream.App.Rendering.Wb.BindlessSupport? _bindlessSupport; private SamplerCache? _samplerCache; private DebugLineRenderer? _debugLines; // K-fix4 (2026-04-26): default OFF. The orange BSP / green cylinder @@ -966,10 +969,6 @@ public sealed class GameWindow : IDisposable Path.Combine(shadersDir, "terrain.vert"), Path.Combine(shadersDir, "terrain.frag")); - _meshShader = new Shader(_gl, - Path.Combine(shadersDir, "mesh_instanced.vert"), - Path.Combine(shadersDir, "mesh_instanced.frag")); - // Phase G.1/G.2: shared scene-lighting UBO. Stays bound at // binding=1 for the lifetime of the process — every shader that // declares `layout(std140, binding = 1) uniform SceneLighting` @@ -1419,7 +1418,43 @@ public sealed class GameWindow : IDisposable _heightTable = heightTable; _surfaceCache = new Dictionary(); - _textureCache = new TextureCache(_gl, _dats); + // N.5: detect ARB_bindless_texture + ARB_shader_draw_parameters. + // The modern path (SSBO + glMultiDrawElementsIndirect + bindless textures) + // is mandatory as of Phase N.5 — missing extensions throw at startup with + // a clear error so users can file a real bug report rather than silently + // falling back to a half-working renderer. + if (AcDream.App.Rendering.Wb.BindlessSupport.TryCreate(_gl, out var bindless)) + { + if (bindless!.HasShaderDrawParameters(_gl)) + { + _bindlessSupport = bindless; + Console.WriteLine("[N.5] modern path capabilities present (bindless + ARB_shader_draw_parameters)"); + } + else + { + Console.WriteLine("[N.5] GL_ARB_shader_draw_parameters not present — modern path not available"); + } + } + else + { + Console.WriteLine("[N.5] GL_ARB_bindless_texture not present — modern path not available"); + } + + if (_bindlessSupport is null) + { + throw new NotSupportedException( + "acdream requires GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters " + + "(GL 4.3+ with bindless support). Your GPU/driver does not expose these extensions. " + + "If this is unexpected, please file a bug report with your GPU vendor + driver version."); + } + + // Mesh shader always loads (modern path is the only path). + _meshShader = new Shader(_gl, + Path.Combine(shadersDir, "mesh_modern.vert"), + Path.Combine(shadersDir, "mesh_modern.frag")); + Console.WriteLine("[N.5] mesh_modern shader loaded"); + + _textureCache = new TextureCache(_gl, _dats, _bindlessSupport); // Two persistent GL sampler objects (Repeat + ClampToEdge) so // the sky pass can pick wrap mode per submesh without mutating // shared per-texture wrap state. See SamplerCache + the @@ -1427,17 +1462,14 @@ public sealed class GameWindow : IDisposable // references/WorldBuilder/Chorizite.OpenGLSDLBackend/OpenGLGraphicsDevice.cs:115-132. _samplerCache = new SamplerCache(_gl); - // Phase N.4 — WB rendering pipeline foundation. Constructed only when - // ACDREAM_USE_WB_FOUNDATION=1 is set; otherwise the legacy renderer - // path stays in charge. The full ObjectMeshManager bring-up lives in - // WbMeshAdapter (Task 9): OpenGLGraphicsDevice + DefaultDatReaderWriter - // + ObjectMeshManager. WbMeshAdapter opens its own file handles for - // the dat files (independent of our DatCollection). - if (AcDream.App.Rendering.Wb.WbFoundationFlag.IsEnabled) + // Phase N.4+N.5 — WB rendering pipeline foundation. The modern path is + // mandatory as of N.5 ship amendment: WbMeshAdapter + WbDrawDispatcher + // always construct. WbMeshAdapter owns ObjectMeshManager and opens its + // own file handles for the dat files (independent of our DatCollection). { var wbLogger = Microsoft.Extensions.Logging.Abstractions.NullLogger.Instance; _wbMeshAdapter = new AcDream.App.Rendering.Wb.WbMeshAdapter(_gl, _datDir, _dats, wbLogger); - Console.WriteLine("[N.4] WbFoundation flag is ENABLED — routing static content through ObjectMeshManager."); + Console.WriteLine("[N.4+N.5] WB foundation + modern path active — routing all content through ObjectMeshManager."); } // Phase N.4 Task 12: construct LandblockSpawnAdapter under the feature flag @@ -1446,60 +1478,51 @@ public sealed class GameWindow : IDisposable // one that carries the adapter so AddLandblock/RemoveLandblock notify WB. // Phase N.4 Task 17: also construct EntitySpawnAdapter for server-spawned // per-instance content under the same flag. + // N.5 mandatory path: spawn adapters + dispatcher always construct. + // _wbMeshAdapter, _meshShader, _textureCache, and _bindlessSupport are + // all guaranteed non-null here (startup throws above if any are missing). { - AcDream.App.Rendering.Wb.LandblockSpawnAdapter? wbSpawnAdapter = null; - AcDream.App.Rendering.Wb.EntitySpawnAdapter? wbEntitySpawnAdapter = null; - if (AcDream.App.Rendering.Wb.WbFoundationFlag.IsEnabled && _wbMeshAdapter is not null) + var wbSpawnAdapter = new AcDream.App.Rendering.Wb.LandblockSpawnAdapter(_wbMeshAdapter!); + // Sequencer factory: look up Setup + MotionTable from dats and build + // an AnimationSequencer. Falls back to a no-op sequencer when the + // entity has no motion table (static props, etc.). Uses _animLoader + // which is initialised earlier in OnLoad; it is non-null here. + var capturedDats = _dats; + var capturedAnimLoader = _animLoader; + AcDream.Core.Physics.AnimationSequencer SequencerFactory(AcDream.Core.World.WorldEntity e) { - wbSpawnAdapter = new AcDream.App.Rendering.Wb.LandblockSpawnAdapter(_wbMeshAdapter); - // Sequencer factory: look up Setup + MotionTable from dats and build - // an AnimationSequencer. Falls back to a no-op sequencer when the - // entity has no motion table (static props, etc.). Uses _animLoader - // which is initialised at line 1004; it is non-null here because - // OnLoad wires _dats + _animLoader before this block runs. - var capturedDats = _dats; - var capturedAnimLoader = _animLoader; - AcDream.Core.Physics.AnimationSequencer SequencerFactory(AcDream.Core.World.WorldEntity e) + if (capturedDats is not null && capturedAnimLoader is not null) { - if (capturedDats is not null && capturedAnimLoader is not null) + var setup = capturedDats.Get(e.SourceGfxObjOrSetupId); + if (setup is not null) { - var setup = capturedDats.Get(e.SourceGfxObjOrSetupId); - if (setup is not null) + uint mtableId = (uint)setup.DefaultMotionTable; + if (mtableId != 0) { - uint mtableId = (uint)setup.DefaultMotionTable; - if (mtableId != 0) - { - var mtable = capturedDats.Get(mtableId); - if (mtable is not null) - return new AcDream.Core.Physics.AnimationSequencer(setup, mtable, capturedAnimLoader); - } - // Setup exists but no motion table — no-op sequencer. - return new AcDream.Core.Physics.AnimationSequencer( - setup, - new DatReaderWriter.DBObjs.MotionTable(), - capturedAnimLoader); + var mtable = capturedDats.Get(mtableId); + if (mtable is not null) + return new AcDream.Core.Physics.AnimationSequencer(setup, mtable, capturedAnimLoader); } + // Setup exists but no motion table — no-op sequencer. + return new AcDream.Core.Physics.AnimationSequencer( + setup, + new DatReaderWriter.DBObjs.MotionTable(), + capturedAnimLoader); } - // Complete fallback: empty setup + empty motion table + null loader. - return new AcDream.Core.Physics.AnimationSequencer( - new DatReaderWriter.DBObjs.Setup(), - new DatReaderWriter.DBObjs.MotionTable(), - new NullAnimLoader()); } - wbEntitySpawnAdapter = new AcDream.App.Rendering.Wb.EntitySpawnAdapter( - _textureCache, SequencerFactory, _wbMeshAdapter); - _wbEntitySpawnAdapter = wbEntitySpawnAdapter; + // Complete fallback: empty setup + empty motion table + null loader. + return new AcDream.Core.Physics.AnimationSequencer( + new DatReaderWriter.DBObjs.Setup(), + new DatReaderWriter.DBObjs.MotionTable(), + new NullAnimLoader()); } + var wbEntitySpawnAdapter = new AcDream.App.Rendering.Wb.EntitySpawnAdapter( + _textureCache!, SequencerFactory, _wbMeshAdapter!); + _wbEntitySpawnAdapter = wbEntitySpawnAdapter; _worldState = new AcDream.App.Streaming.GpuWorldState(wbSpawnAdapter, wbEntitySpawnAdapter); - } - _staticMesh = new InstancedMeshRenderer(_gl, _meshShader, _textureCache, _wbMeshAdapter); - - if (AcDream.App.Rendering.Wb.WbFoundationFlag.IsEnabled - && _wbMeshAdapter is not null && _wbEntitySpawnAdapter is not null) - { _wbDrawDispatcher = new AcDream.App.Rendering.Wb.WbDrawDispatcher( - _gl, _meshShader, _textureCache, _wbMeshAdapter, _wbEntitySpawnAdapter); + _gl, _meshShader!, _textureCache!, _wbMeshAdapter!, _wbEntitySpawnAdapter, _bindlessSupport!); } // Phase G.1 sky renderer — its own shader (sky.vert / sky.frag) @@ -1509,7 +1532,7 @@ public sealed class GameWindow : IDisposable Path.Combine(shadersDir, "sky.vert"), Path.Combine(shadersDir, "sky.frag")); _skyRenderer = new AcDream.App.Rendering.Sky.SkyRenderer( - _gl, _dats, skyShader, _textureCache, _samplerCache); + _gl, _dats, skyShader, _textureCache!, _samplerCache); // Phase G.1 particle renderer — renders rain / snow / spell auras // spawned into the shared ParticleSystem as billboard quads. @@ -2025,7 +2048,7 @@ public sealed class GameWindow : IDisposable } } - if (_dats is null || _staticMesh is null) return; + if (_dats is null) return; if (spawn.Position is null || spawn.SetupTableId is null) { // Can't place a mesh without both. Most of these are inventory @@ -2360,10 +2383,9 @@ public sealed class GameWindow : IDisposable continue; } _physicsDataCache.CacheGfxObj(mr.GfxObjId, gfx); - var subMeshes = AcDream.Core.Meshing.GfxObjMesh.Build(gfx, _dats); - _staticMesh.EnsureUploaded(mr.GfxObjId, subMeshes); if (dumpClothing) { + var subMeshes = AcDream.Core.Meshing.GfxObjMesh.Build(gfx, _dats); int tris = 0; int subs = 0; foreach (var sm in subMeshes) { tris += sm.Indices.Length / 3; subs++; } dumpClothingTotalTris += tris; @@ -5194,44 +5216,25 @@ public sealed class GameWindow : IDisposable portalPlanes, origin.X, origin.Y); } - // Upload every GfxObj referenced by this landblock's entities. - // EnsureUploaded is idempotent so duplicates across landblocks are free. - if (_staticMesh is not null) + // N.5: WbMeshAdapter.Tick() handles GPU upload for all GfxObj meshes via + // ObjectMeshManager.PrepareMeshDataAsync. The legacy EnsureUploaded loop + // (and _pendingCellMeshes drain) are retired with InstancedMeshRenderer. + // Cache GfxObj physics data (BSP trees) for the physics engine — this + // loop is physics-only, not renderer-side. + foreach (var entity in lb.Entities) { - // Task 8: drain any pending EnvCell room-mesh sub-meshes first. - // The worker thread pre-built these CPU-side and stored them in - // _pendingCellMeshes. We must upload them here (render thread) before - // the per-MeshRef loop below tries to look them up via GfxObjMesh.Build, - // which would fail because EnvCell ids (0xAAAA01xx) aren't real GfxObj - // dat ids. EnsureUploaded is idempotent so calling it here then seeing - // the same id again in the loop below is safe. - foreach (var entity in lb.Entities) + foreach (var meshRef in entity.MeshRefs) { - foreach (var meshRef in entity.MeshRefs) - { - if (_pendingCellMeshes.TryRemove(meshRef.GfxObjId, out var cellSubMeshes)) - _staticMesh.EnsureUploaded(meshRef.GfxObjId, cellSubMeshes); - } - } - - // Now upload regular GfxObj sub-meshes (stabs, scenery, interior stabs). - // Skip any ids already uploaded (includes the cell meshes just drained). - foreach (var entity in lb.Entities) - { - foreach (var meshRef in entity.MeshRefs) - { - // Skip EnvCell synthetic ids — already handled above (or already - // uploaded on a prior tick). GfxObj ids are 0x01xxxxxx; Setup ids - // are 0x02xxxxxx; anything else is not a GfxObj dat record. - if ((meshRef.GfxObjId & 0xFF000000u) != 0x01000000u) continue; - var gfx = _dats.Get(meshRef.GfxObjId); - if (gfx is null) continue; - _physicsDataCache.CacheGfxObj(meshRef.GfxObjId, gfx); - var subMeshes = AcDream.Core.Meshing.GfxObjMesh.Build(gfx, _dats); - _staticMesh.EnsureUploaded(meshRef.GfxObjId, subMeshes); - } + if ((meshRef.GfxObjId & 0xFF000000u) != 0x01000000u) continue; + var gfx = _dats.Get(meshRef.GfxObjId); + if (gfx is null) continue; + _physicsDataCache.CacheGfxObj(meshRef.GfxObjId, gfx); } } + // Drain _pendingCellMeshes to prevent unbounded accumulation. + // The data is no longer consumed (WB handles EnvCell geometry through + // its own pipeline), but the worker thread still populates this dict. + _pendingCellMeshes.Clear(); // Task 7: register static entities into the ShadowObjectRegistry so the // Transition system can find and collide against them during movement. @@ -6336,20 +6339,11 @@ public sealed class GameWindow : IDisposable animatedIds.Add(k); } - if (_wbDrawDispatcher is not null) - { - _wbDrawDispatcher.Draw(camera, _worldState.LandblockEntries, frustum, - neverCullLandblockId: playerLb, - visibleCellIds: visibility?.VisibleCellIds, - animatedEntityIds: animatedIds); - } - else - { - _staticMesh?.Draw(camera, _worldState.LandblockEntries, frustum, - neverCullLandblockId: playerLb, - visibleCellIds: visibility?.VisibleCellIds, - animatedEntityIds: animatedIds); - } + // N.5: WbDrawDispatcher is always non-null (modern path mandatory). + _wbDrawDispatcher!.Draw(camera, _worldState.LandblockEntries, frustum, + neverCullLandblockId: playerLb, + visibleCellIds: visibility?.VisibleCellIds, + animatedEntityIds: animatedIds); // Phase G.1 / E.3: draw all live particles after opaque // scene geometry so alpha blending composites correctly. @@ -8731,11 +8725,10 @@ public sealed class GameWindow : IDisposable _liveSession?.Dispose(); _audioEngine?.Dispose(); // Phase E.2: stop all voices, close AL context _wbDrawDispatcher?.Dispose(); - _staticMesh?.Dispose(); _skyRenderer?.Dispose(); // depends on sampler cache; dispose first _samplerCache?.Dispose(); _textureCache?.Dispose(); - _wbMeshAdapter?.Dispose(); // Phase N.4 WB foundation — null when flag off + _wbMeshAdapter?.Dispose(); // Phase N.4+N.5 WB foundation (mandatory modern path) _meshShader?.Dispose(); _terrain?.Dispose(); diff --git a/src/AcDream.App/Rendering/InstancedMeshRenderer.cs b/src/AcDream.App/Rendering/InstancedMeshRenderer.cs deleted file mode 100644 index 5b0c9eb..0000000 --- a/src/AcDream.App/Rendering/InstancedMeshRenderer.cs +++ /dev/null @@ -1,596 +0,0 @@ -// src/AcDream.App/Rendering/InstancedMeshRenderer.cs -// -// True instanced rendering for static-object meshes. -// Groups entities by GfxObjId. All instance model matrices are written into -// a single shared instance VBO once per frame. Each sub-mesh is drawn with -// DrawElementsInstanced — one GL draw call per (GfxObj × sub-mesh) instead -// of one per entity. For a scene with N unique GfxObjs and M total entities -// this reduces draw calls from M*subMeshes to N*subMeshes. -// -// Matrix layout: -// System.Numerics.Matrix4x4 is row-major. Written to the float[] buffer in -// natural memory order (M11..M44). The GLSL shader reads 4 vec4 attributes -// (aInstanceRow0-3) and constructs mat4(row0, row1, row2, row3). Because -// GLSL mat4() takes column vectors, the rows of the C# matrix become the -// columns of the GLSL mat4 — which is the same transpose that UniformMatrix4 -// with transpose=false produces. Visual result is identical to the old -// SetMatrix4("uModel", ...) path. -// -// Architecture note: public API matches StaticMeshRenderer so GameWindow only -// needs to update the shader and uniform setup at the call sites. -using System.Numerics; -using System.Runtime.InteropServices; -using AcDream.App.Rendering.Wb; -using AcDream.Core.Meshing; -using AcDream.Core.Terrain; -using AcDream.Core.World; -using Silk.NET.OpenGL; - -namespace AcDream.App.Rendering; - -public sealed unsafe class InstancedMeshRenderer : IDisposable -{ - private readonly GL _gl; - private readonly Shader _shader; - private readonly TextureCache _textures; - - /// - /// Optional WB adapter. Held but currently unused — Phase N.4 Adjustment 2 - /// (2026-05-08) reverted Task 9's renderer-level routing. Tier-routing decisions - /// (atlas vs per-instance) belong at the spawn-callback layer (Task 11 - /// LandblockSpawnAdapter for atlas-tier; Task 17 EntitySpawnAdapter for - /// per-instance), not in the renderer which is intentionally tier-blind. The - /// constructor parameter is preserved so GameWindow's wire-up doesn't shift - /// when later tasks need adapter access. - /// - private readonly WbMeshAdapter? _wbMeshAdapter; - - // One GPU bundle per unique GfxObj id. Each GfxObj can have multiple sub-meshes. - private readonly Dictionary> _gpuByGfxObj = new(); - - // Shared instance VBO — filled every frame with all instance model matrices. - private readonly uint _instanceVbo; - - // Per-frame scratch: reused float buffer for instance matrix data. - // 16 floats per mat4. Grown on demand; never shrunk. - private float[] _instanceBuffer = new float[256 * 16]; // start at 256 instances - - // ── Instance grouping scratch ───────────────────────────────────────────── - // - // Reused every frame to avoid per-frame allocation. - // - // **Group key = (GfxObjId, PaletteOverrideHash, SurfaceOverridesHash).** - // - // An earlier implementation grouped on GfxObjId alone and resolved - // the per-sub-mesh texture from the first instance in the group — which - // is fine for scenery where every tree shares the same palette, but - // utterly broken for NPCs: every humanoid uses the same base body - // GfxObjs and they all piled into one group, so the first NPC's palette - // was used for every NPC in the frame. Frustum culling + iteration - // order meant that "first NPC" changed as the camera turned — producing - // the "NPC clothing changes when I turn" symptom. - // - // Now we also key by the entity's PaletteOverride + per-MeshRef - // SurfaceOverrides signature so only entities that decode to the - // SAME texture for every sub-mesh can share a batch. Entities with - // unique appearance fall to single-instance groups (still correct, - // marginally slower than true instancing). - private readonly Dictionary _groups = new(); - - private readonly record struct GroupKey(uint GfxObjId, ulong TextureSignature); - - public InstancedMeshRenderer(GL gl, Shader shader, TextureCache textures, - WbMeshAdapter? wbMeshAdapter = null) - { - _gl = gl; - _shader = shader; - _textures = textures; - _wbMeshAdapter = wbMeshAdapter; - - _instanceVbo = _gl.GenBuffer(); - } - - // ── Upload ──────────────────────────────────────────────────────────────── - - public void EnsureUploaded(uint gfxObjId, IReadOnlyList subMeshes) - { - if (_gpuByGfxObj.ContainsKey(gfxObjId)) - return; - - // Phase N.4 Adjustment 2 (2026-05-08): renderer is tier-blind. Tier-routing - // (atlas vs per-instance) lives at the spawn-callback layer (Tasks 11 + 17), - // not here. Smoke-test of the original Task 9 routing showed it caught - // characters / NPCs (server-spawned, per-instance tier) along with static - // scenery, because EnsureUploaded is called from both spawn paths. - var list = new List(subMeshes.Count); - foreach (var sm in subMeshes) - list.Add(UploadSubMesh(sm)); - _gpuByGfxObj[gfxObjId] = list; - } - - private SubMeshGpu UploadSubMesh(GfxObjSubMesh sm) - { - uint vao = _gl.GenVertexArray(); - _gl.BindVertexArray(vao); - - // ── Vertex buffer (positions, normals, UVs) ─────────────────────────── - uint vbo = _gl.GenBuffer(); - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, vbo); - fixed (void* p = sm.Vertices) - _gl.BufferData(BufferTargetARB.ArrayBuffer, - (nuint)(sm.Vertices.Length * sizeof(Vertex)), p, BufferUsageARB.StaticDraw); - - uint stride = (uint)sizeof(Vertex); - _gl.EnableVertexAttribArray(0); - _gl.VertexAttribPointer(0, 3, VertexAttribPointerType.Float, false, stride, (void*)0); - _gl.EnableVertexAttribArray(1); - _gl.VertexAttribPointer(1, 3, VertexAttribPointerType.Float, false, stride, (void*)(3 * sizeof(float))); - _gl.EnableVertexAttribArray(2); - _gl.VertexAttribPointer(2, 2, VertexAttribPointerType.Float, false, stride, (void*)(6 * sizeof(float))); - // Note: location 3 (uint TerrainLayer) is NOT used by mesh_instanced.vert; - // that slot is reserved for per-instance mat4 row 0 from the instance VBO. - - // ── Index buffer ────────────────────────────────────────────────────── - uint ebo = _gl.GenBuffer(); - _gl.BindBuffer(BufferTargetARB.ElementArrayBuffer, ebo); - fixed (void* p = sm.Indices) - _gl.BufferData(BufferTargetARB.ElementArrayBuffer, - (nuint)(sm.Indices.Length * sizeof(uint)), p, BufferUsageARB.StaticDraw); - - // ── Per-instance model matrix (locations 3-6) ───────────────────────── - // Bind the shared instance VBO. The VAO captures this binding at each - // attribute location. At draw time we re-call VertexAttribPointer with - // the per-group byte offset (to address different groups in the VBO - // without DrawElementsInstancedBaseInstance). - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo); - // mat4 = 4 × vec4, stride = 64 bytes, divisor = 1 (advance once per instance) - for (uint row = 0; row < 4; row++) - { - uint loc = 3 + row; - _gl.EnableVertexAttribArray(loc); - _gl.VertexAttribPointer(loc, 4, VertexAttribPointerType.Float, false, 64, (void*)(row * 16)); - _gl.VertexAttribDivisor(loc, 1); - } - - _gl.BindVertexArray(0); - - return new SubMeshGpu - { - Vao = vao, - Vbo = vbo, - Ebo = ebo, - IndexCount = sm.Indices.Length, - SurfaceId = sm.SurfaceId, - Translucency = sm.Translucency, - }; - } - - // ── Draw ────────────────────────────────────────────────────────────────── - - public void Draw(ICamera camera, - IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList Entities)> landblockEntries, - FrustumPlanes? frustum = null, - uint? neverCullLandblockId = null, - HashSet? visibleCellIds = null, - // L-fix1 (2026-04-28): set of entity ids that should bypass the - // landblock-level frustum cull. Animated entities (other - // players, NPCs, monsters) are always rendered if their - // landblock is loaded — without this they vanish whenever the - // camera rotates away from their landblock, even though - // they're within visible distance of the player. Pass null / - // empty to keep the previous "cull everything by landblock" - // behavior. - HashSet? animatedEntityIds = null) - { - _shader.Use(); - - var vp = camera.View * camera.Projection; - _shader.SetMatrix4("uViewProjection", vp); - - // Phase G: lighting + ambient + fog are owned by the - // SceneLighting UBO (binding=1) uploaded once per frame by - // GameWindow. The instanced mesh fragment shader reads it - // directly — no per-draw uniform uploads needed. - - // ── Collect and group instances ─────────────────────────────────────── - CollectGroups(landblockEntries, frustum, neverCullLandblockId, visibleCellIds, animatedEntityIds); - - // ── Build and upload the instance buffer ────────────────────────────── - // Count total instances. - int totalInstances = 0; - foreach (var grp in _groups.Values) - totalInstances += grp.Count; - - // Grow the scratch buffer if needed. - int needed = totalInstances * 16; - if (_instanceBuffer.Length < needed) - _instanceBuffer = new float[needed + 256 * 16]; // extra headroom - - // Write all groups contiguously. Record each group's starting offset - // (in units of instances, not bytes) so we can address them at draw time. - int instanceOffset = 0; - foreach (var grp in _groups.Values) - { - grp.BufferOffset = instanceOffset; - foreach (ref readonly var inst in CollectionsMarshal.AsSpan(grp.Entries)) - WriteMatrix(_instanceBuffer, instanceOffset++ * 16, inst.Model); - } - - // Upload all instance data in a single DynamicDraw call. - if (totalInstances > 0) - { - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo); - fixed (void* p = _instanceBuffer) - _gl.BufferData(BufferTargetARB.ArrayBuffer, - (nuint)(totalInstances * 16 * sizeof(float)), p, BufferUsageARB.DynamicDraw); - } - - // ── Pass 1: Opaque + ClipMap ────────────────────────────────────────── - // Diagnostic: ACDREAM_NO_CULL=1 disables backface culling entirely. - if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal)) - { - _gl.Disable(EnableCap.CullFace); - } - foreach (var (key, grp) in _groups) - { - if (!_gpuByGfxObj.TryGetValue(key.GfxObjId, out var subMeshes)) - continue; - - bool hasOpaqueSubMesh = false; - foreach (var sub in subMeshes) - { - if (sub.Translucency == TranslucencyKind.Opaque || - sub.Translucency == TranslucencyKind.ClipMap) - { - hasOpaqueSubMesh = true; - break; - } - } - if (!hasOpaqueSubMesh) continue; - - // For this group, instance data starts at grp.BufferOffset in the VBO. - // We need to tell the VAO to read from that offset. - uint byteOffset = (uint)(grp.BufferOffset * 64); // 64 bytes per mat4 - - foreach (var sub in subMeshes) - { - if (sub.Translucency != TranslucencyKind.Opaque && - sub.Translucency != TranslucencyKind.ClipMap) - continue; - - _shader.SetInt("uTranslucencyKind", (int)sub.Translucency); - - // Bind VAO + re-point instance attributes to the group's slice - // in the shared VBO. This updates the VAO's stored offset for - // locations 3-6 without touching the vertex or index bindings. - _gl.BindVertexArray(sub.Vao); - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo); - for (uint row = 0; row < 4; row++) - { - _gl.VertexAttribPointer(3 + row, 4, VertexAttribPointerType.Float, - false, 64, (void*)(byteOffset + row * 16)); - } - - // Resolve texture from the first instance (all instances in this - // group share the same GfxObj so they have compatible overrides - // only in the degenerate case of mixed-palette entities using the - // same GfxObj — rare enough to accept the approximation here). - if (grp.Count == 0) continue; - var firstEntry = grp.Entries[0]; - uint tex = ResolveTex(firstEntry.Entity, firstEntry.MeshRef, sub); - _gl.ActiveTexture(TextureUnit.Texture0); - _gl.BindTexture(TextureTarget.Texture2D, tex); - - _gl.DrawElementsInstanced(PrimitiveType.Triangles, - (uint)sub.IndexCount, - DrawElementsType.UnsignedInt, - (void*)0, - (uint)grp.Count); - } - } - - // ── Pass 2: Translucent (AlphaBlend, Additive, InvAlpha) ───────────── - _gl.Enable(EnableCap.Blend); - _gl.DepthMask(false); - // Diagnostic: ACDREAM_NO_CULL=1 disables backface culling (used 2026-05-01 - // to test if our mesh winding (0,i,i+1) vs ACME's (i+1,i,0) is causing - // visible polygons to be culled, especially around the neck/coat seam). - if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal)) - { - _gl.Disable(EnableCap.CullFace); - } - else - { - _gl.Enable(EnableCap.CullFace); - _gl.CullFace(TriangleFace.Back); - _gl.FrontFace(FrontFaceDirection.Ccw); - } - - foreach (var (key, grp) in _groups) - { - if (!_gpuByGfxObj.TryGetValue(key.GfxObjId, out var subMeshes)) - continue; - - bool hasTranslucentSubMesh = false; - foreach (var sub in subMeshes) - { - if (sub.Translucency != TranslucencyKind.Opaque && - sub.Translucency != TranslucencyKind.ClipMap) - { - hasTranslucentSubMesh = true; - break; - } - } - if (!hasTranslucentSubMesh) continue; - - uint byteOffset = (uint)(grp.BufferOffset * 64); - - foreach (var sub in subMeshes) - { - if (sub.Translucency == TranslucencyKind.Opaque || - sub.Translucency == TranslucencyKind.ClipMap) - continue; - - switch (sub.Translucency) - { - case TranslucencyKind.Additive: - _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.One); - break; - case TranslucencyKind.InvAlpha: - _gl.BlendFunc(BlendingFactor.OneMinusSrcAlpha, BlendingFactor.SrcAlpha); - break; - default: // AlphaBlend - _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha); - break; - } - - _shader.SetInt("uTranslucencyKind", (int)sub.Translucency); - - _gl.BindVertexArray(sub.Vao); - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo); - for (uint row = 0; row < 4; row++) - { - _gl.VertexAttribPointer(3 + row, 4, VertexAttribPointerType.Float, - false, 64, (void*)(byteOffset + row * 16)); - } - - if (grp.Count == 0) continue; - var firstEntry = grp.Entries[0]; - uint tex = ResolveTex(firstEntry.Entity, firstEntry.MeshRef, sub); - _gl.ActiveTexture(TextureUnit.Texture0); - _gl.BindTexture(TextureTarget.Texture2D, tex); - - _gl.DrawElementsInstanced(PrimitiveType.Triangles, - (uint)sub.IndexCount, - DrawElementsType.UnsignedInt, - (void*)0, - (uint)grp.Count); - } - } - - // Restore default GL state. - _gl.DepthMask(true); - _gl.Disable(EnableCap.Blend); - _gl.Disable(EnableCap.CullFace); - _gl.BindVertexArray(0); - } - - // ── Grouping ────────────────────────────────────────────────────────────── - - /// - /// Iterates all visible landblock entries and groups every (entity, meshRef) - /// pair by GfxObjId. Clears previous frame's groups before filling. - /// - private void CollectGroups( - IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList Entities)> landblockEntries, - FrustumPlanes? frustum, - uint? neverCullLandblockId, - HashSet? visibleCellIds, - HashSet? animatedEntityIds) - { - foreach (var grp in _groups.Values) - grp.Entries.Clear(); - - foreach (var entry in landblockEntries) - { - // L-fix1 (2026-04-28): the landblock cull decision is now - // PER-LANDBLOCK boolean, not a continue. We still need to - // walk the entity list because animated entities (in - // animatedEntityIds) bypass the cull and render anyway. - bool landblockVisible = frustum is null - || entry.LandblockId == neverCullLandblockId - || FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax); - - // Fast path: no animated entities globally → if landblock is - // culled, skip the whole entity list (preserves the original - // O(visible-landblocks) cost when the caller doesn't care - // about animated bypass). - if (!landblockVisible && (animatedEntityIds is null || animatedEntityIds.Count == 0)) - continue; - - foreach (var entity in entry.Entities) - { - if (entity.MeshRefs.Count == 0) - continue; - - // L-fix1: when the landblock is frustum-culled, only - // render entities flagged as animated. This keeps - // remote players / NPCs / monsters visible even when - // their landblock rotates out of the view frustum. - bool isAnimated = animatedEntityIds?.Contains(entity.Id) == true; - if (!landblockVisible && !isAnimated) - continue; - - // Step 4: portal visibility filter. If we have a visible cell set, - // skip interior entities whose parent cell isn't visible. - // visibleCellIds == null means camera is outdoors → show all interiors. - if (entity.ParentCellId.HasValue && visibleCellIds is not null - && !visibleCellIds.Contains(entity.ParentCellId.Value)) - continue; - - var entityRoot = - Matrix4x4.CreateFromQuaternion(entity.Rotation) * - Matrix4x4.CreateTranslation(entity.Position); - - // Hash the entity's PaletteOverride once — shared by every - // MeshRef on this entity, so we compute it outside the loop. - ulong palHash = HashPaletteOverride(entity.PaletteOverride); - - foreach (var meshRef in entity.MeshRefs) - { - if (!_gpuByGfxObj.TryGetValue(meshRef.GfxObjId, out var cachedMeshes)) - continue; - - var model = meshRef.PartTransform * entityRoot; - - // Texture signature = palette hash ^ surface-overrides hash. - // Two instances can share a batch only when their ResolveTex - // would return identical handles for every sub-mesh — that - // means identical palette AND identical surface overrides. - ulong surfHash = HashSurfaceOverrides(meshRef.SurfaceOverrides); - ulong texSig = palHash ^ surfHash; - var key = new GroupKey(meshRef.GfxObjId, texSig); - - if (!_groups.TryGetValue(key, out var group)) - { - group = new InstanceGroup(); - _groups[key] = group; - } - - group.Entries.Add(new InstanceEntry(model, entity, meshRef)); - } - } - } - } - - private static ulong HashPaletteOverride(AcDream.Core.World.PaletteOverride? p) - { - if (p is null) return 0UL; - ulong h = 0xCBF29CE484222325UL; - const ulong prime = 0x100000001B3UL; - h = (h ^ p.BasePaletteId) * prime; - foreach (var sp in p.SubPalettes) - { - h = (h ^ sp.SubPaletteId) * prime; - h = (h ^ sp.Offset) * prime; - h = (h ^ sp.Length) * prime; - } - return h; - } - - /// - /// Order-independent hash of a SurfaceOverrides dictionary. XOR of each - /// (key, value) pair keeps the result stable regardless of Dictionary - /// iteration order, so two instances whose override maps contain the - /// same pairs will hash identically. - /// - private static ulong HashSurfaceOverrides(IReadOnlyDictionary? overrides) - { - if (overrides is null || overrides.Count == 0) return 0UL; - ulong acc = 0UL; - foreach (var kvp in overrides) - { - ulong pair = ((ulong)kvp.Key << 32) | kvp.Value; - acc ^= pair; - } - // Fold with a prime so the zero case doesn't collide with "empty". - return (acc ^ 0xCBF29CE484222325UL) * 0x100000001B3UL; - } - - // ── Matrix write ────────────────────────────────────────────────────────── - - /// - /// Writes a System.Numerics Matrix4x4 into starting - /// at as 16 consecutive floats in row-major order - /// (the C# natural memory layout). The GLSL shader reads each 4-float row - /// as a column of the mat4 — identical to what UniformMatrix4(transpose=false) - /// produces for the uniform path. - /// - private static void WriteMatrix(float[] buf, int offset, in Matrix4x4 m) - { - buf[offset + 0] = m.M11; buf[offset + 1] = m.M12; buf[offset + 2] = m.M13; buf[offset + 3] = m.M14; - buf[offset + 4] = m.M21; buf[offset + 5] = m.M22; buf[offset + 6] = m.M23; buf[offset + 7] = m.M24; - buf[offset + 8] = m.M31; buf[offset + 9] = m.M32; buf[offset + 10] = m.M33; buf[offset + 11] = m.M34; - buf[offset + 12] = m.M41; buf[offset + 13] = m.M42; buf[offset + 14] = m.M43; buf[offset + 15] = m.M44; - } - - // ── Texture resolution ──────────────────────────────────────────────────── - - private uint ResolveTex(WorldEntity entity, MeshRef meshRef, SubMeshGpu sub) - { - uint overrideOrigTex = 0; - bool hasOrigTexOverride = meshRef.SurfaceOverrides is not null - && meshRef.SurfaceOverrides.TryGetValue(sub.SurfaceId, out overrideOrigTex); - uint? origTexOverride = hasOrigTexOverride ? overrideOrigTex : (uint?)null; - - if (entity.PaletteOverride is not null) - { - return _textures.GetOrUploadWithPaletteOverride( - sub.SurfaceId, origTexOverride, entity.PaletteOverride); - } - else if (hasOrigTexOverride) - { - return _textures.GetOrUploadWithOrigTextureOverride(sub.SurfaceId, overrideOrigTex); - } - else - { - return _textures.GetOrUpload(sub.SurfaceId); - } - } - - // ── Disposal ────────────────────────────────────────────────────────────── - - public void Dispose() - { - foreach (var subs in _gpuByGfxObj.Values) - { - foreach (var sub in subs) - { - _gl.DeleteBuffer(sub.Vbo); - _gl.DeleteBuffer(sub.Ebo); - _gl.DeleteVertexArray(sub.Vao); - } - } - _gl.DeleteBuffer(_instanceVbo); - _gpuByGfxObj.Clear(); - _groups.Clear(); - } - - // ── Private types ───────────────────────────────────────────────────────── - - private sealed class SubMeshGpu - { - public uint Vao; - public uint Vbo; - public uint Ebo; - public int IndexCount; - public uint SurfaceId; - public TranslucencyKind Translucency; - } - - /// - /// All instances of one GfxObj for this frame, plus their starting offset - /// in the shared instance VBO (in units of instances, not bytes). - /// - private sealed class InstanceGroup - { - public readonly List Entries = new(); - public int BufferOffset; - - public int Count => Entries.Count; - } - - private readonly struct InstanceEntry - { - public readonly Matrix4x4 Model; - public readonly WorldEntity Entity; - public readonly MeshRef MeshRef; - - public InstanceEntry(Matrix4x4 model, WorldEntity entity, MeshRef meshRef) - { - Model = model; - Entity = entity; - MeshRef = meshRef; - } - } -} diff --git a/src/AcDream.App/Rendering/Shaders/mesh_instanced.vert b/src/AcDream.App/Rendering/Shaders/mesh_instanced.vert deleted file mode 100644 index a2f3893..0000000 --- a/src/AcDream.App/Rendering/Shaders/mesh_instanced.vert +++ /dev/null @@ -1,35 +0,0 @@ -#version 430 core - -// Per-vertex attributes -layout(location = 0) in vec3 aPosition; -layout(location = 1) in vec3 aNormal; -layout(location = 2) in vec2 aTexCoord; - -// Per-instance model matrix, split across four vec4 attribute slots. -// A mat4 consumes 4 consecutive attribute locations, so locations 3-6 are -// all occupied by this single logical matrix. The C# side must call -// VertexAttribPointer four times (one per row) and VertexAttribDivisor(loc, 1) -// on each of the four slots. -layout(location = 3) in vec4 aInstanceRow0; -layout(location = 4) in vec4 aInstanceRow1; -layout(location = 5) in vec4 aInstanceRow2; -layout(location = 6) in vec4 aInstanceRow3; - -uniform mat4 uViewProjection; - -out vec2 vTex; -out vec3 vWorldNormal; -out vec3 vWorldPos; - -void main() { - // Reconstruct the per-instance model matrix from its four row vectors. - mat4 model = mat4(aInstanceRow0, aInstanceRow1, aInstanceRow2, aInstanceRow3); - - vec4 worldPos = model * vec4(aPosition, 1.0); - gl_Position = uViewProjection * worldPos; - - vWorldPos = worldPos.xyz; - // Transform normal into world space. - vWorldNormal = normalize(mat3(model) * aNormal); - vTex = aTexCoord; -} diff --git a/src/AcDream.App/Rendering/Shaders/mesh_instanced.frag b/src/AcDream.App/Rendering/Shaders/mesh_modern.frag similarity index 62% rename from src/AcDream.App/Rendering/Shaders/mesh_instanced.frag rename to src/AcDream.App/Rendering/Shaders/mesh_modern.frag index 1719e2f..c5d9a02 100644 --- a/src/AcDream.App/Rendering/Shaders/mesh_instanced.frag +++ b/src/AcDream.App/Rendering/Shaders/mesh_modern.frag @@ -1,24 +1,22 @@ #version 430 core +#extension GL_ARB_bindless_texture : require -in vec2 vTex; -in vec3 vWorldNormal; +in vec3 vNormal; +in vec2 vTexCoord; in vec3 vWorldPos; +in flat uvec2 vTextureHandle; +in flat uint vTextureLayer; -out vec4 fragColor; +// uRenderPass values (Phase N.5 Decision 2 — two-pass alpha-test): +// 0 = opaque pass — discard fragments with alpha < 0.95 +// (lets the depth write succeed for solid pixels) +// 1 = translucent pass — covers AlphaBlend / Additive / InvAlpha; +// discard alpha >= 0.95 (already drawn opaque) and +// alpha < 0.05 (skip empty fragments — large +// transparent overdraw cost otherwise) +uniform int uRenderPass; -// One 2D texture per draw call — same binding point as mesh.frag so the -// C# side can use the same TextureCache without a texture-array pipeline. -uniform sampler2D uDiffuse; - -// Translucency kind — matches TranslucencyKind C# enum (same as mesh.frag): -// 0 = Opaque — depth write+test, no blend; shader never discards -// 1 = ClipMap — alpha-key discard at 0.5 (doors, windows, vegetation) -// 2 = AlphaBlend — GL blending handles compositing; do NOT discard -// 3 = Additive — GL additive blending; do NOT discard -// 4 = InvAlpha — GL inverted-alpha blending; do NOT discard -uniform int uTranslucencyKind; - -// Phase G.1+G.2: shared scene-lighting UBO (see mesh.frag for layout docs). +// SceneLighting UBO — IDENTICAL layout to mesh_instanced.frag binding=1. struct Light { vec4 posAndKind; vec4 dirAndRange; @@ -38,10 +36,8 @@ vec3 accumulateLights(vec3 N, vec3 worldPos) { int activeLights = int(uCellAmbient.w); for (int i = 0; i < 8; ++i) { if (i >= activeLights) break; - int kind = int(uLights[i].posAndKind.w); vec3 Lcol = uLights[i].colorAndIntensity.xyz * uLights[i].colorAndIntensity.w; - if (kind == 0) { vec3 Ldir = -uLights[i].dirAndRange.xyz; float ndl = max(0.0, dot(N, Ldir)); @@ -77,16 +73,24 @@ vec3 applyFog(vec3 lit, vec3 worldPos) { return mix(lit, uFogColor.xyz, fog); } +out vec4 FragColor; + void main() { - vec4 color = texture(uDiffuse, vTex); + sampler2DArray tex = sampler2DArray(vTextureHandle); + vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer))); - // Alpha cutout only for clip-map surfaces (doors, windows, vegetation). - if (uTranslucencyKind == 1 && color.a < 0.5) discard; + // Two-pass alpha-test (N.5 Decision 2). + if (uRenderPass == 0) { + if (color.a < 0.95) discard; // opaque pass + } else { + if (color.a >= 0.95) discard; // transparent pass + if (color.a < 0.05) discard; // skip totally-empty + } - vec3 N = normalize(vWorldNormal); + vec3 N = normalize(vNormal); vec3 lit = accumulateLights(N, vWorldPos); - // Lightning flash — additive scene bump. + // Lightning flash — additive scene bump (matches mesh_instanced.frag). lit += uFogParams.z * vec3(0.6, 0.6, 0.75); // Retail clamp per-channel to 1.0 (r13 §13.1). @@ -94,5 +98,5 @@ void main() { vec3 rgb = color.rgb * lit; rgb = applyFog(rgb, vWorldPos); - fragColor = vec4(rgb, color.a); + FragColor = vec4(rgb, color.a); } diff --git a/src/AcDream.App/Rendering/Shaders/mesh_modern.vert b/src/AcDream.App/Rendering/Shaders/mesh_modern.vert new file mode 100644 index 0000000..02f46d9 --- /dev/null +++ b/src/AcDream.App/Rendering/Shaders/mesh_modern.vert @@ -0,0 +1,62 @@ +#version 430 core +#extension GL_ARB_shader_draw_parameters : require + +layout(location = 0) in vec3 aPosition; +layout(location = 1) in vec3 aNormal; +layout(location = 2) in vec2 aTexCoord; + +struct InstanceData { + mat4 transform; + // Reserved for Phase B.4 follow-up (selection-blink retail-faithful + // highlight): vec4 highlightColor; — extend stride here, increase the + // _instanceSsbo upload size in WbDrawDispatcher, add a flat varying out, + // and consume in mesh_modern.frag. +}; + +struct BatchData { + uvec2 textureHandle; // bindless handle for sampler2DArray + uint textureLayer; // layer index (always 0 for per-instance composites) + uint flags; // reserved — N.5 dispatcher owns all blend state + // (glBlendFunc per pass). If a future phase wants + // shader-side per-batch additive flag (Decision 2 + // fallback), encode it here as bit 0. +}; + +layout(std430, binding = 0) readonly buffer InstanceBuffer { + InstanceData Instances[]; +}; + +// binding=1 here is the SSBO namespace — distinct from the UBO namespace. +// SceneLighting UBO also uses binding=1 in the fragment shader; GL keeps +// GL_SHADER_STORAGE_BUFFER and GL_UNIFORM_BUFFER binding tables separate. +// Task 10 dispatcher binds: +// glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, instanceSsbo) +// glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, batchSsbo) +// Existing SceneLightingUboBinding handles the UBO side. +layout(std430, binding = 1) readonly buffer BatchBuffer { + BatchData Batches[]; +}; + +uniform mat4 uViewProjection; + +out vec3 vNormal; +out vec2 vTexCoord; +out vec3 vWorldPos; +out flat uvec2 vTextureHandle; +out flat uint vTextureLayer; + +void main() { + int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; + mat4 model = Instances[instanceIndex].transform; + + vec4 worldPos = model * vec4(aPosition, 1.0); + gl_Position = uViewProjection * worldPos; + + vWorldPos = worldPos.xyz; + vNormal = normalize(mat3(model) * aNormal); + vTexCoord = aTexCoord; + + BatchData b = Batches[gl_DrawIDARB]; + vTextureHandle = b.textureHandle; + vTextureLayer = b.textureLayer; +} diff --git a/src/AcDream.App/Rendering/StaticMeshRenderer.cs b/src/AcDream.App/Rendering/StaticMeshRenderer.cs deleted file mode 100644 index f201338..0000000 --- a/src/AcDream.App/Rendering/StaticMeshRenderer.cs +++ /dev/null @@ -1,293 +0,0 @@ -// src/AcDream.App/Rendering/StaticMeshRenderer.cs -using System.Numerics; -using AcDream.Core.Meshing; -using AcDream.Core.Terrain; -using AcDream.Core.World; -using Silk.NET.OpenGL; - -namespace AcDream.App.Rendering; - -public sealed unsafe class StaticMeshRenderer : IDisposable -{ - private readonly GL _gl; - private readonly Shader _shader; - private readonly TextureCache _textures; - - // One GPU bundle per unique GfxObj id. Each GfxObj can have multiple sub-meshes. - private readonly Dictionary> _gpuByGfxObj = new(); - - public StaticMeshRenderer(GL gl, Shader shader, TextureCache textures) - { - _gl = gl; - _shader = shader; - _textures = textures; - } - - public void EnsureUploaded(uint gfxObjId, IReadOnlyList subMeshes) - { - if (_gpuByGfxObj.ContainsKey(gfxObjId)) - return; - - var list = new List(subMeshes.Count); - foreach (var sm in subMeshes) - list.Add(UploadSubMesh(sm)); - _gpuByGfxObj[gfxObjId] = list; - } - - private SubMeshGpu UploadSubMesh(GfxObjSubMesh sm) - { - uint vao = _gl.GenVertexArray(); - _gl.BindVertexArray(vao); - - uint vbo = _gl.GenBuffer(); - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, vbo); - fixed (void* p = sm.Vertices) - _gl.BufferData(BufferTargetARB.ArrayBuffer, - (nuint)(sm.Vertices.Length * sizeof(Vertex)), p, BufferUsageARB.StaticDraw); - - uint ebo = _gl.GenBuffer(); - _gl.BindBuffer(BufferTargetARB.ElementArrayBuffer, ebo); - fixed (void* p = sm.Indices) - _gl.BufferData(BufferTargetARB.ElementArrayBuffer, - (nuint)(sm.Indices.Length * sizeof(uint)), p, BufferUsageARB.StaticDraw); - - uint stride = (uint)sizeof(Vertex); - _gl.EnableVertexAttribArray(0); - _gl.VertexAttribPointer(0, 3, VertexAttribPointerType.Float, false, stride, (void*)0); - _gl.EnableVertexAttribArray(1); - _gl.VertexAttribPointer(1, 3, VertexAttribPointerType.Float, false, stride, (void*)(3 * sizeof(float))); - _gl.EnableVertexAttribArray(2); - _gl.VertexAttribPointer(2, 2, VertexAttribPointerType.Float, false, stride, (void*)(6 * sizeof(float))); - _gl.EnableVertexAttribArray(3); - _gl.VertexAttribIPointer(3, 1, VertexAttribIType.UnsignedInt, stride, (void*)(8 * sizeof(float))); - - _gl.BindVertexArray(0); - - return new SubMeshGpu - { - Vao = vao, - Vbo = vbo, - Ebo = ebo, - IndexCount = sm.Indices.Length, - SurfaceId = sm.SurfaceId, - // Capture translucency at upload time so the draw loop never - // has to look it up from external state. - Translucency = sm.Translucency, - }; - } - - public void Draw(ICamera camera, - IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList Entities)> landblockEntries, - FrustumPlanes? frustum = null, - uint? neverCullLandblockId = null) - { - _shader.Use(); - _shader.SetMatrix4("uView", camera.View); - _shader.SetMatrix4("uProjection", camera.Projection); - - // ── Pass 1: Opaque + ClipMap ────────────────────────────────────────── - // Depth write on (default). No blending. ClipMap surfaces use the - // alpha-discard path in the fragment shader (uTranslucencyKind == 1). - foreach (var entry in landblockEntries) - { - // Per-landblock frustum cull. Never cull the player's landblock. - if (frustum is not null && - entry.LandblockId != neverCullLandblockId && - !FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax)) - continue; - - foreach (var entity in entry.Entities) - { - if (entity.MeshRefs.Count == 0) - continue; - - foreach (var meshRef in entity.MeshRefs) - { - if (!_gpuByGfxObj.TryGetValue(meshRef.GfxObjId, out var subMeshes)) - continue; - - var entityRoot = - Matrix4x4.CreateFromQuaternion(entity.Rotation) * - Matrix4x4.CreateTranslation(entity.Position); - var model = meshRef.PartTransform * entityRoot; - _shader.SetMatrix4("uModel", model); - - foreach (var sub in subMeshes) - { - // Skip translucent sub-meshes in the first pass. - if (sub.Translucency != TranslucencyKind.Opaque && - sub.Translucency != TranslucencyKind.ClipMap) - continue; - - _shader.SetInt("uTranslucencyKind", (int)sub.Translucency); - - uint tex = ResolveTex(entity, meshRef, sub); - _gl.ActiveTexture(TextureUnit.Texture0); - _gl.BindTexture(TextureTarget.Texture2D, tex); - - _gl.BindVertexArray(sub.Vao); - _gl.DrawElements(PrimitiveType.Triangles, (uint)sub.IndexCount, DrawElementsType.UnsignedInt, (void*)0); - } - } - } - } - - // ── Pass 2: Translucent (AlphaBlend, Additive, InvAlpha) ───────────── - // Depth test on so translucents composite correctly behind opaque geometry. - // Depth write OFF so translucents don't occlude each other or downstream - // opaque draws. Blend function is set per-draw based on TranslucencyKind. - // - // NOTE: translucent draws are NOT sorted by depth — overlapping translucent - // surfaces can composite in the wrong order. Portal-sized billboards don't - // overlap in practice so this is acceptable and avoids a larger refactor. - _gl.Enable(EnableCap.Blend); - _gl.DepthMask(false); - - // Phase 9.2: enable back-face culling for the translucent pass so - // closed-shell translucents (lifestone crystal, glow gems, any - // convex blended mesh) don't draw their back faces over their - // front faces in arbitrary iteration order. Without this, the - // 58 triangles of the lifestone crystal composited with an - // "inside-out" look where the user saw through one face into - // the hollow interior. With back-face culling on, back faces are - // dropped at rasterization time, front faces composite as-is, - // and depth ordering within the front-facing subset is a - // non-issue for closed convex-ish shells. Matches WorldBuilder's - // per-batch CullMode handling in - // references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ - // BaseObjectRenderManager.cs:361-365. - // - // Our fan triangulation emits pos-side polygons as - // (0, i, i+1) which is CCW in standard OpenGL conventions, so - // GL_BACK + CCW front is the correct state. Neg-side polygons - // (if any) use reversed winding and get culled here — that's a - // known limitation and matches the opaque-pass behavior since - // neg-side polys are virtually never translucent in AC content. - _gl.Enable(EnableCap.CullFace); - _gl.CullFace(TriangleFace.Back); - _gl.FrontFace(FrontFaceDirection.Ccw); - - foreach (var entry in landblockEntries) - { - // Same per-landblock frustum cull for pass 2. - if (frustum is not null && - entry.LandblockId != neverCullLandblockId && - !FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax)) - continue; - - foreach (var entity in entry.Entities) - { - if (entity.MeshRefs.Count == 0) - continue; - - foreach (var meshRef in entity.MeshRefs) - { - if (!_gpuByGfxObj.TryGetValue(meshRef.GfxObjId, out var subMeshes)) - continue; - - var entityRoot = - Matrix4x4.CreateFromQuaternion(entity.Rotation) * - Matrix4x4.CreateTranslation(entity.Position); - var model = meshRef.PartTransform * entityRoot; - _shader.SetMatrix4("uModel", model); - - foreach (var sub in subMeshes) - { - if (sub.Translucency == TranslucencyKind.Opaque || - sub.Translucency == TranslucencyKind.ClipMap) - continue; - - // Set per-draw blend function. - switch (sub.Translucency) - { - case TranslucencyKind.Additive: - // src*a + dst — portal swirls, glows - _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.One); - break; - - case TranslucencyKind.InvAlpha: - // src*(1-a) + dst*a - _gl.BlendFunc(BlendingFactor.OneMinusSrcAlpha, BlendingFactor.SrcAlpha); - break; - - default: // AlphaBlend - // src*a + dst*(1-a) - _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha); - break; - } - - _shader.SetInt("uTranslucencyKind", (int)sub.Translucency); - - uint tex = ResolveTex(entity, meshRef, sub); - _gl.ActiveTexture(TextureUnit.Texture0); - _gl.BindTexture(TextureTarget.Texture2D, tex); - - _gl.BindVertexArray(sub.Vao); - _gl.DrawElements(PrimitiveType.Triangles, (uint)sub.IndexCount, DrawElementsType.UnsignedInt, (void*)0); - } - } - } - } - - // Restore default GL state for subsequent renderers (terrain etc.). - _gl.DepthMask(true); - _gl.Disable(EnableCap.Blend); - _gl.Disable(EnableCap.CullFace); - - _gl.BindVertexArray(0); - } - - /// - /// Resolves the GL texture id for a sub-mesh, honouring palette and - /// texture overrides carried on the entity and the mesh-ref. - /// - private uint ResolveTex(WorldEntity entity, MeshRef meshRef, SubMeshGpu sub) - { - uint overrideOrigTex = 0; - bool hasOrigTexOverride = meshRef.SurfaceOverrides is not null - && meshRef.SurfaceOverrides.TryGetValue(sub.SurfaceId, out overrideOrigTex); - uint? origTexOverride = hasOrigTexOverride ? overrideOrigTex : (uint?)null; - - if (entity.PaletteOverride is not null) - { - return _textures.GetOrUploadWithPaletteOverride( - sub.SurfaceId, origTexOverride, entity.PaletteOverride); - } - else if (hasOrigTexOverride) - { - return _textures.GetOrUploadWithOrigTextureOverride(sub.SurfaceId, overrideOrigTex); - } - else - { - return _textures.GetOrUpload(sub.SurfaceId); - } - } - - public void Dispose() - { - foreach (var subs in _gpuByGfxObj.Values) - { - foreach (var sub in subs) - { - _gl.DeleteBuffer(sub.Vbo); - _gl.DeleteBuffer(sub.Ebo); - _gl.DeleteVertexArray(sub.Vao); - } - } - _gpuByGfxObj.Clear(); - } - - private sealed class SubMeshGpu - { - public uint Vao; - public uint Vbo; - public uint Ebo; - public int IndexCount; - public uint SurfaceId; - /// - /// Cached from GfxObjSubMesh.Translucency at upload time. - /// Avoids any per-draw lookup into external state. - /// - public TranslucencyKind Translucency; - } -} diff --git a/src/AcDream.App/Rendering/TextureCache.cs b/src/AcDream.App/Rendering/TextureCache.cs index 6d10200..78eef29 100644 --- a/src/AcDream.App/Rendering/TextureCache.cs +++ b/src/AcDream.App/Rendering/TextureCache.cs @@ -29,10 +29,22 @@ public sealed unsafe class TextureCache : Wb.ITextureCachePerInstance, IDisposab private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), uint> _handlesByPalette = new(); private uint _magentaHandle; - public TextureCache(GL gl, DatCollection dats) + private readonly Wb.BindlessSupport? _bindless; + + // Bindless / Texture2DArray parallel caches. Keys mirror the legacy three + // caches so a surface used by both the legacy (Texture2D, sampler2D) and + // modern (Texture2DArray, sampler2DArray) paths is uploaded twice — once + // per target. Each entry stores both the GL texture name (for Dispose + // cleanup) and the resident bindless handle (returned to callers). + private readonly Dictionary _bindlessBySurfaceId = new(); + private readonly Dictionary<(uint surfaceId, uint origTexOverride), (uint Name, ulong Handle)> _bindlessByOverridden = new(); + private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), (uint Name, ulong Handle)> _bindlessByPalette = new(); + + public TextureCache(GL gl, DatCollection dats, Wb.BindlessSupport? bindless = null) { _gl = gl; _dats = dats; + _bindless = bindless; } /// @@ -149,6 +161,82 @@ public sealed unsafe class TextureCache : Wb.ITextureCachePerInstance, IDisposab return h; } + /// + /// 64-bit bindless handle variant of for the WB + /// modern rendering path. Uploads the texture as a 1-layer Texture2DArray + /// (so the shader's sampler2DArray can sample at layer 0) and returns + /// a resident bindless handle. Caches by surfaceId in a separate dictionary + /// from the legacy Texture2D path; the same surface may be uploaded twice + /// if used by both paths (acceptable transition cost — N.6 deletes the legacy + /// path). + /// Throws if BindlessSupport wasn't provided to the constructor. + /// + public ulong GetOrUploadBindless(uint surfaceId) + { + EnsureBindlessAvailable(); + if (_bindlessBySurfaceId.TryGetValue(surfaceId, out var entry)) + return entry.Handle; + var decoded = DecodeFromDats(surfaceId, origTextureOverride: null, paletteOverride: null); + uint name = UploadRgba8AsLayer1Array(decoded); + ulong handle = _bindless!.GetResidentHandle(name); + _bindlessBySurfaceId[surfaceId] = (name, handle); + return handle; + } + + /// + /// 64-bit bindless handle variant of + /// for the WB modern rendering path. Uploads the texture as a 1-layer + /// Texture2DArray with the override SurfaceTexture id and returns a resident + /// bindless handle. Caches under a separate composite key from the legacy + /// path. Throws if BindlessSupport wasn't provided to the constructor. + /// + public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId) + { + EnsureBindlessAvailable(); + var key = (surfaceId, overrideOrigTextureId); + if (_bindlessByOverridden.TryGetValue(key, out var entry)) + return entry.Handle; + var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: null); + uint name = UploadRgba8AsLayer1Array(decoded); + ulong handle = _bindless!.GetResidentHandle(name); + _bindlessByOverridden[key] = (name, handle); + return handle; + } + + /// + /// 64-bit bindless handle variant of + /// for the WB modern rendering path. Applies the palette override on top of + /// the texture's default palette before decoding, uploads as a 1-layer + /// Texture2DArray, and returns a resident bindless handle. Takes a + /// precomputed palette hash so the WB dispatcher can compute it once per + /// entity. Throws if BindlessSupport wasn't provided to the constructor. + /// + public ulong GetOrUploadWithPaletteOverrideBindless( + uint surfaceId, + uint? overrideOrigTextureId, + PaletteOverride paletteOverride, + ulong precomputedPaletteHash) + { + EnsureBindlessAvailable(); + uint origTexKey = overrideOrigTextureId ?? 0; + var key = (surfaceId, origTexKey, precomputedPaletteHash); + if (_bindlessByPalette.TryGetValue(key, out var entry)) + return entry.Handle; + var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: paletteOverride); + uint name = UploadRgba8AsLayer1Array(decoded); + ulong handle = _bindless!.GetResidentHandle(name); + _bindlessByPalette[key] = (name, handle); + return handle; + } + + private void EnsureBindlessAvailable() + { + if (_bindless is null) + throw new InvalidOperationException( + "TextureCache constructed without BindlessSupport — cannot generate bindless handles. " + + "WbDrawDispatcher requires the bindless-aware ctor overload (pass non-null BindlessSupport)."); + } + /// /// Cheap 64-bit hash over a palette override's identity so two /// entities with the same palette setup share a decode. Internal so @@ -279,17 +367,79 @@ public sealed unsafe class TextureCache : Wb.ITextureCachePerInstance, IDisposab return tex; } + /// + /// Variant of that uploads pixel data as a 1-layer + /// Texture2DArray. Required by the WB modern rendering path which samples via + /// sampler2DArray in its bindless shader. Pixel data is identical. + /// + private uint UploadRgba8AsLayer1Array(DecodedTexture decoded) + { + uint tex = _gl.GenTexture(); + _gl.BindTexture(TextureTarget.Texture2DArray, tex); + + fixed (byte* p = decoded.Rgba8) + _gl.TexImage3D( + TextureTarget.Texture2DArray, + 0, + InternalFormat.Rgba8, + (uint)decoded.Width, + (uint)decoded.Height, + depth: 1, + border: 0, + PixelFormat.Rgba, + PixelType.UnsignedByte, + p); + + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat); + _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat); + + _gl.BindTexture(TextureTarget.Texture2DArray, 0); + return tex; + } + public void Dispose() { + // Phase 1: make all bindless handles non-resident BEFORE any + // DeleteTexture call. ARB_bindless_texture requires that resident + // handles be released before their backing texture is deleted — + // interleaving per-entry is UB. Single null-guard around the whole + // block (cleaner than per-call null-conditionals). + if (_bindless is not null) + { + foreach (var (_, handle) in _bindlessBySurfaceId.Values) + _bindless.MakeNonResident(handle); + foreach (var (_, handle) in _bindlessByOverridden.Values) + _bindless.MakeNonResident(handle); + foreach (var (_, handle) in _bindlessByPalette.Values) + _bindless.MakeNonResident(handle); + } + + // Phase 2: delete the Texture2DArray textures backing those handles. + foreach (var (name, _) in _bindlessBySurfaceId.Values) + _gl.DeleteTexture(name); + _bindlessBySurfaceId.Clear(); + foreach (var (name, _) in _bindlessByOverridden.Values) + _gl.DeleteTexture(name); + _bindlessByOverridden.Clear(); + foreach (var (name, _) in _bindlessByPalette.Values) + _gl.DeleteTexture(name); + _bindlessByPalette.Clear(); + + // Phase 3: legacy Texture2D textures. foreach (var h in _handlesBySurfaceId.Values) _gl.DeleteTexture(h); _handlesBySurfaceId.Clear(); + foreach (var h in _handlesByOverridden.Values) _gl.DeleteTexture(h); _handlesByOverridden.Clear(); + foreach (var h in _handlesByPalette.Values) _gl.DeleteTexture(h); _handlesByPalette.Clear(); + if (_magentaHandle != 0) { _gl.DeleteTexture(_magentaHandle); diff --git a/src/AcDream.App/Rendering/Wb/BindlessSupport.cs b/src/AcDream.App/Rendering/Wb/BindlessSupport.cs new file mode 100644 index 0000000..eeb4f9d --- /dev/null +++ b/src/AcDream.App/Rendering/Wb/BindlessSupport.cs @@ -0,0 +1,55 @@ +using Silk.NET.OpenGL; +using Silk.NET.OpenGL.Extensions.ARB; + +namespace AcDream.App.Rendering.Wb; + +/// +/// Thin wrapper around + capability detection +/// for the modern rendering path. Constructed once at startup via +/// , which returns false if the extension isn't present. +/// +public sealed class BindlessSupport +{ + private readonly ArbBindlessTexture _ext; + + private BindlessSupport(ArbBindlessTexture extension) + { + _ext = extension; + } + + public static bool TryCreate(GL gl, out BindlessSupport? support) + { + if (gl.TryGetExtension(out var ext)) + { + support = new BindlessSupport(ext); + return true; + } + support = null; + return false; + } + + /// Get a 64-bit bindless handle for the texture and make it resident. + /// Idempotent: handle is the same for a given texture name. + public ulong GetResidentHandle(uint textureName) + { + ulong h = _ext.GetTextureHandle(textureName); + if (!_ext.IsTextureHandleResident(h)) + _ext.MakeTextureHandleResident(h); + return h; + } + + /// Release residency for a handle. Call before deleting the underlying texture. + public void MakeNonResident(ulong handle) + { + if (_ext.IsTextureHandleResident(handle)) + _ext.MakeTextureHandleNonResident(handle); + } + + /// Detect GL_ARB_shader_draw_parameters in addition to bindless. + /// N.5's vertex shader uses gl_BaseInstanceARB and gl_DrawIDARB + /// from this extension. + public bool HasShaderDrawParameters(GL gl) + { + return gl.IsExtensionPresent("GL_ARB_shader_draw_parameters"); + } +} diff --git a/src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs b/src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs new file mode 100644 index 0000000..80d1119 --- /dev/null +++ b/src/AcDream.App/Rendering/Wb/DrawElementsIndirectCommand.cs @@ -0,0 +1,17 @@ +using System.Runtime.InteropServices; + +namespace AcDream.App.Rendering.Wb; + +/// +/// Layout matches what glMultiDrawElementsIndirect expects. +/// Total size 20 bytes; arrays are typically uploaded with stride = sizeof(this). +/// +[StructLayout(LayoutKind.Sequential, Pack = 4)] +public struct DrawElementsIndirectCommand +{ + public uint Count; // index count for this draw + public uint InstanceCount; // number of instances + public uint FirstIndex; // offset into IBO, in indices + public int BaseVertex; // vertex offset into VBO + public uint BaseInstance; // first instance ID (offsets per-instance attribs / SSBO read) +} diff --git a/src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs b/src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs index 4644f71..eecc1a6 100644 --- a/src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs +++ b/src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs @@ -1,6 +1,7 @@ using System; using System.Collections.Generic; using System.Numerics; +using System.Runtime.InteropServices; using AcDream.Core.Meshing; using AcDream.Core.Terrain; using AcDream.Core.World; @@ -12,45 +13,49 @@ namespace AcDream.App.Rendering.Wb; /// /// Draws entities using WB's (a single global /// VAO/VBO/IBO under modern rendering) with acdream's -/// for texture resolution and for +/// for bindless texture resolution and for /// translucency classification. /// /// /// Atlas-tier entities (ServerGuid == 0): mesh data comes from WB's /// via . -/// Textures resolve through using the batch's -/// SurfaceId. +/// Textures resolve through the bindless-suffixed +/// variants, returning 64-bit +/// resident handles stored in the per-group SSBO. /// /// /// /// Per-instance-tier entities (ServerGuid != 0): mesh data also from -/// WB, but textures resolve through with palette and -/// surface overrides applied. is currently +/// WB, but textures resolve through +/// with palette +/// and surface overrides applied. is currently /// unused at draw time — GameWindow's spawn path already bakes AnimPartChanges + /// GfxObjDegradeResolver (Issue #47 close-detail mesh) into MeshRefs. /// /// /// -/// GL strategy: GROUPED instanced drawing. All visible (entity, batch) -/// pairs are bucketed by ; within a group a single -/// glDrawElementsInstancedBaseVertexBaseInstance renders all instances. -/// All matrices for the frame land in one shared instance VBO via a single -/// BufferData upload. This drops draw calls from O(entities×batches) -/// to O(unique GfxObj×batch×texture) — typically two orders of magnitude fewer. +/// GL strategy (N.5 — mandatory): glMultiDrawElementsIndirect with SSBOs +/// and GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters. +/// All visible (entity, batch) pairs are bucketed by ; +/// each group becomes one DrawElementsIndirectCommand. Three GPU buffers +/// are uploaded per frame: instance matrices (SSBO binding 0), per-group batch +/// metadata/texture handles (SSBO binding 1), and the indirect draw commands. +/// Two glMultiDrawElementsIndirect calls cover the opaque and transparent +/// passes respectively — one GL call per pass regardless of group count. /// /// /// -/// Shader: reuses mesh_instanced (vert locations 0-2 = Position/ -/// Normal/UV from WB's VertexPositionNormalTexture; locations 3-6 = instance -/// matrix from our VBO). WB's 32-byte vertex stride is compatible. +/// Shader: mesh_modern (bindless + gl_DrawIDARB / +/// gl_BaseInstanceARB). Missing bindless/draw-parameters throws +/// at startup — there is no legacy fallback. /// /// /// /// Modern rendering assumption: WB's _useModernRendering path (GL /// 4.3 + bindless) puts every mesh in a single shared VAO/VBO/IBO and uses /// FirstIndex + BaseVertex per batch. The dispatcher honors those -/// offsets via DrawElementsInstancedBaseVertex(BaseInstance). The legacy -/// per-mesh-VAO path also works since FirstIndex/BaseVertex are zero there. +/// offsets inside each DrawElementsIndirectCommand via +/// glMultiDrawElementsIndirect. /// /// public sealed unsafe class WbDrawDispatcher : IDisposable @@ -61,14 +66,40 @@ public sealed unsafe class WbDrawDispatcher : IDisposable private readonly WbMeshAdapter _meshAdapter; private readonly EntitySpawnAdapter _entitySpawnAdapter; - private readonly uint _instanceVbo; - private readonly HashSet _patchedVaos = new(); + private readonly BindlessSupport _bindless; + + // SSBO buffer ids + private uint _instanceSsbo; + private uint _batchSsbo; + private uint _indirectBuffer; + + // Per-frame scratch arrays — Tasks 9-10 fully wire these. + private float[] _instanceData = new float[256 * 16]; // mat4 floats per instance + private BatchData[] _batchData = new BatchData[256]; + private DrawElementsIndirectCommand[] _indirectCommands = new DrawElementsIndirectCommand[256]; + + private int _opaqueDrawCount; + private int _transparentDrawCount; + private int _transparentByteOffset; + + // std430 layout: ulong TextureHandle (uvec2) at offset 0, uint TextureLayer + // at offset 8, uint Flags at offset 12. Total 16 bytes. + // Pack=8 (not 4) because std430's uvec2 requires 8-byte alignment — Pack=4 + // works today by accident (TextureHandle is the first field, so offset 0 is + // always 8-byte aligned), but adding a 4-byte field before TextureHandle + // without bumping Pack would silently misalign the GPU struct. + [StructLayout(LayoutKind.Sequential, Pack = 8)] + private struct BatchData + { + public ulong TextureHandle; // bindless handle (uvec2 in GLSL) + public uint TextureLayer; + public uint Flags; + } // Per-frame scratch — reused across frames to avoid per-frame allocation. private readonly Dictionary _groups = new(); private readonly List _opaqueDraws = new(); private readonly List _translucentDraws = new(); - private float[] _instanceBuffer = new float[256 * 16]; // grow on demand, never shrink // Per-entity-cull AABB radius. Conservative — covers most entities; large // outliers (long banners, tall columns) are still landblock-culled. @@ -84,12 +115,23 @@ public sealed unsafe class WbDrawDispatcher : IDisposable private int _instancesIssued; private long _lastLogTick; + // CPU + GPU timing for [WB-DIAG] under ACDREAM_WB_DIAG=1. + private readonly System.Diagnostics.Stopwatch _cpuStopwatch = new(); + private readonly long[] _cpuSamples = new long[256]; // microseconds + private int _cpuSampleCursor; + private uint _gpuQueryOpaque; + private uint _gpuQueryTransparent; + private readonly long[] _gpuSamples = new long[256]; // microseconds + private int _gpuSampleCursor; + private bool _gpuQueriesInitialized; + public WbDrawDispatcher( GL gl, Shader shader, TextureCache textures, WbMeshAdapter meshAdapter, - EntitySpawnAdapter entitySpawnAdapter) + EntitySpawnAdapter entitySpawnAdapter, + BindlessSupport bindless) { ArgumentNullException.ThrowIfNull(gl); ArgumentNullException.ThrowIfNull(shader); @@ -103,7 +145,10 @@ public sealed unsafe class WbDrawDispatcher : IDisposable _meshAdapter = meshAdapter; _entitySpawnAdapter = entitySpawnAdapter; - _instanceVbo = _gl.GenBuffer(); + _bindless = bindless ?? throw new ArgumentNullException(nameof(bindless)); + _instanceSsbo = _gl.GenBuffer(); + _batchSsbo = _gl.GenBuffer(); + _indirectBuffer = _gl.GenBuffer(); } public static Matrix4x4 ComposePartWorldMatrix( @@ -126,6 +171,16 @@ public sealed unsafe class WbDrawDispatcher : IDisposable bool diag = string.Equals(Environment.GetEnvironmentVariable("ACDREAM_WB_DIAG"), "1", StringComparison.Ordinal); + if (diag && !_gpuQueriesInitialized) + { + _gpuQueryOpaque = _gl.GenQuery(); + _gpuQueryTransparent = _gl.GenQuery(); + _gpuQueriesInitialized = true; + } + + // Always run the CPU stopwatch — cheap; only logged under diag. + _cpuStopwatch.Restart(); + // Camera world-space position for front-to-back sort (perf #2). The view // matrix is the inverse of the camera's world transform, so the world // translation lives in the inverse's translation row. @@ -235,23 +290,24 @@ public sealed unsafe class WbDrawDispatcher : IDisposable // Nothing visible — skip the GL pass entirely. if (anyVao == 0) { + _cpuStopwatch.Stop(); if (diag) MaybeFlushDiag(); return; } - // ── Phase 2: lay matrices out contiguously, assign per-group offsets, - // split into opaque/translucent + compute sort keys ───────── + // ── Phase 3: assign FirstInstance per group, lay matrices contiguously, sort opaque ── int totalInstances = 0; foreach (var grp in _groups.Values) totalInstances += grp.Matrices.Count; if (totalInstances == 0) { + _cpuStopwatch.Stop(); if (diag) MaybeFlushDiag(); return; } int needed = totalInstances * 16; - if (_instanceBuffer.Length < needed) - _instanceBuffer = new float[needed + 256 * 16]; // headroom + if (_instanceData.Length < needed) + _instanceData = new float[needed + 256 * 16]; _opaqueDraws.Clear(); _translucentDraws.Clear(); @@ -268,17 +324,17 @@ public sealed unsafe class WbDrawDispatcher : IDisposable // position for front-to-back sort (perf #2). Cheap heuristic; works // well when instances of one group are spatially coherent // (typical for trees in one landblock area, NPCs at one spawn). - var firstM = grp.Matrices[0]; - var grpPos = new Vector3(firstM.M41, firstM.M42, firstM.M43); + var first = grp.Matrices[0]; + var grpPos = new Vector3(first.M41, first.M42, first.M43); grp.SortDistance = Vector3.DistanceSquared(camPos, grpPos); for (int i = 0; i < grp.Matrices.Count; i++) { - WriteMatrix(_instanceBuffer, cursor * 16, grp.Matrices[i]); + WriteMatrix(_instanceData, cursor * 16, grp.Matrices[i]); cursor++; } - if (grp.Translucency == TranslucencyKind.Opaque || grp.Translucency == TranslucencyKind.ClipMap) + if (IsOpaque(grp.Translucency)) _opaqueDraws.Add(grp); else _translucentDraws.Add(grp); @@ -290,90 +346,141 @@ public sealed unsafe class WbDrawDispatcher : IDisposable // Foundry interior). _opaqueDraws.Sort(static (a, b) => a.SortDistance.CompareTo(b.SortDistance)); - // ── Phase 3: one upload of all matrices ───────────────────────────── - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo); - fixed (float* p = _instanceBuffer) - _gl.BufferData(BufferTargetARB.ArrayBuffer, - (nuint)(totalInstances * 16 * sizeof(float)), p, BufferUsageARB.DynamicDraw); + // ── Phase 4: build IndirectGroupInput list (opaque sorted, then translucent), + // fill via BuildIndirectArrays ────────────────────────────────── + int totalDraws = _opaqueDraws.Count + _translucentDraws.Count; + if (_batchData.Length < totalDraws) + _batchData = new BatchData[totalDraws + 64]; + if (_indirectCommands.Length < totalDraws) + _indirectCommands = new DrawElementsIndirectCommand[totalDraws + 64]; - // ── Phase 4: bind VAO once (modern rendering shares one global VAO) ── - EnsureInstanceAttribs(anyVao); + var groupInputs = new List(totalDraws); + foreach (var g in _opaqueDraws) groupInputs.Add(ToInput(g)); + foreach (var g in _translucentDraws) groupInputs.Add(ToInput(g)); + + // Cast _batchData (private BatchData) to public-mirror BatchDataPublic for BuildIndirectArrays. + // Layout is asserted at test time (BatchDataPublic_LayoutMatchesPrivateBatchData test). + var batchPublic = new BatchDataPublic[totalDraws]; + var layout = BuildIndirectArrays(groupInputs, _indirectCommands, batchPublic); + + // Copy back into _batchData + for (int i = 0; i < totalDraws; i++) + { + _batchData[i] = new BatchData + { + TextureHandle = batchPublic[i].TextureHandle, + TextureLayer = batchPublic[i].TextureLayer, + Flags = batchPublic[i].Flags, + }; + } + _opaqueDrawCount = layout.OpaqueCount; + _transparentDrawCount = layout.TransparentCount; + _transparentByteOffset = layout.TransparentByteOffset; + + // ── Phase 5: upload three buffers ─────────────────────────────────── + fixed (float* ip = _instanceData) + UploadSsbo(_instanceSsbo, 0, ip, totalInstances * 16 * sizeof(float)); + + fixed (BatchData* bp = _batchData) + UploadSsbo(_batchSsbo, 1, bp, totalDraws * sizeof(BatchData)); + + fixed (DrawElementsIndirectCommand* cp = _indirectCommands) + { + _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer); + _gl.BufferData(BufferTargetARB.DrawIndirectBuffer, + (nuint)(totalDraws * sizeof(DrawElementsIndirectCommand)), cp, BufferUsageARB.DynamicDraw); + } + + // ── Phase 6: bind global VAO once ─────────────────────────────────── _gl.BindVertexArray(anyVao); - // ── Phase 5: opaque + ClipMap pass (front-to-back sorted) ─────────── if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal)) _gl.Disable(EnableCap.CullFace); - foreach (var grp in _opaqueDraws) + // ── Phase 7: opaque pass ───────────────────────────────────────────── + if (_opaqueDrawCount > 0) { - _shader.SetInt("uTranslucencyKind", (int)grp.Translucency); - DrawGroup(grp); + _gl.Disable(EnableCap.Blend); + _gl.DepthMask(true); + _shader.SetInt("uRenderPass", 0); + _gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer); + if (diag && _gpuQueriesInitialized) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque); + _gl.MultiDrawElementsIndirect( + PrimitiveType.Triangles, + DrawElementsType.UnsignedShort, + (void*)0, + (uint)_opaqueDrawCount, + (uint)DrawCommandStride); + if (diag && _gpuQueriesInitialized) _gl.EndQuery(QueryTarget.TimeElapsed); } - // ── Phase 6: translucent pass ─────────────────────────────────────── - _gl.Enable(EnableCap.Blend); - _gl.DepthMask(false); - - if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal)) + // ── Phase 8: transparent pass ──────────────────────────────────────── + if (_transparentDrawCount > 0) { - _gl.Disable(EnableCap.CullFace); - } - else - { - _gl.Enable(EnableCap.CullFace); - _gl.CullFace(TriangleFace.Back); - _gl.FrontFace(FrontFaceDirection.Ccw); + _gl.Enable(EnableCap.Blend); + _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha); + _gl.DepthMask(false); + _shader.SetInt("uRenderPass", 1); + if (diag && _gpuQueriesInitialized) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryTransparent); + _gl.MultiDrawElementsIndirect( + PrimitiveType.Triangles, + DrawElementsType.UnsignedShort, + (void*)_transparentByteOffset, + (uint)_transparentDrawCount, + (uint)DrawCommandStride); + if (diag && _gpuQueriesInitialized) _gl.EndQuery(QueryTarget.TimeElapsed); + _gl.DepthMask(true); + _gl.Disable(EnableCap.Blend); } - foreach (var grp in _translucentDraws) - { - switch (grp.Translucency) - { - case TranslucencyKind.Additive: - _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.One); - break; - case TranslucencyKind.InvAlpha: - _gl.BlendFunc(BlendingFactor.OneMinusSrcAlpha, BlendingFactor.SrcAlpha); - break; - default: - _gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha); - break; - } - - _shader.SetInt("uTranslucencyKind", (int)grp.Translucency); - DrawGroup(grp); - } - - _gl.DepthMask(true); - _gl.Disable(EnableCap.Blend); _gl.Disable(EnableCap.CullFace); _gl.BindVertexArray(0); + _cpuStopwatch.Stop(); + if (diag) { - _drawsIssued += _opaqueDraws.Count + _translucentDraws.Count; + long cpuUs = _cpuStopwatch.ElapsedTicks * 1_000_000L / System.Diagnostics.Stopwatch.Frequency; + _cpuSamples[_cpuSampleCursor] = cpuUs; + _cpuSampleCursor = (_cpuSampleCursor + 1) % _cpuSamples.Length; + + // Read GPU samples non-blocking; the result for the previous frame's + // queries should be ready by now. If not, drop the sample (don't stall + // the CPU waiting for the GPU). + if (_gpuQueriesInitialized) + { + _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.ResultAvailable, out int avail); + if (avail != 0) + { + _gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.Result, out ulong opaqueNs); + _gl.GetQueryObject(_gpuQueryTransparent, QueryObjectParameterName.Result, out ulong transNs); + long gpuUs = (long)((opaqueNs + transNs) / 1000UL); + _gpuSamples[_gpuSampleCursor] = gpuUs; + _gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length; + } + } + + _drawsIssued += _opaqueDrawCount + _transparentDrawCount; _instancesIssued += totalInstances; MaybeFlushDiag(); } } - private void DrawGroup(InstanceGroup grp) - { - _gl.ActiveTexture(TextureUnit.Texture0); - _gl.BindTexture(TextureTarget.Texture2D, grp.TextureHandle); - _gl.BindBuffer(BufferTargetARB.ElementArrayBuffer, grp.Ibo); + private static IndirectGroupInput ToInput(InstanceGroup g) => new( + IndexCount: g.IndexCount, + FirstIndex: g.FirstIndex, + BaseVertex: g.BaseVertex, + InstanceCount: g.InstanceCount, + FirstInstance: g.FirstInstance, + TextureHandle: g.BindlessTextureHandle, + TextureLayer: g.TextureLayer, + Translucency: g.Translucency); - // BaseInstance offsets the per-instance attribute fetches into our - // shared instance VBO so each group reads its own slice. Requires - // GL_ARB_base_instance (GL 4.2+); WB requires 4.3 so this is available. - _gl.DrawElementsInstancedBaseVertexBaseInstance( - PrimitiveType.Triangles, - (uint)grp.IndexCount, - DrawElementsType.UnsignedShort, - (void*)(grp.FirstIndex * sizeof(ushort)), - (uint)grp.InstanceCount, - grp.BaseVertex, - (uint)grp.FirstInstance); + private unsafe void UploadSsbo(uint ssbo, uint binding, void* data, int byteCount) + { + _gl.BindBuffer(BufferTargetARB.ShaderStorageBuffer, ssbo); + _gl.BufferData(BufferTargetARB.ShaderStorageBuffer, (nuint)byteCount, data, BufferUsageARB.DynamicDraw); + _gl.BindBufferBase(BufferTargetARB.ShaderStorageBuffer, binding, ssbo); } private void MaybeFlushDiag() @@ -381,13 +488,41 @@ public sealed unsafe class WbDrawDispatcher : IDisposable long now = Environment.TickCount64; if (now - _lastLogTick > 5000) { + long cpuMed = MedianMicros(_cpuSamples); + long cpuP95 = Percentile95Micros(_cpuSamples); + long gpuMed = MedianMicros(_gpuSamples); + long gpuP95 = Percentile95Micros(_gpuSamples); Console.WriteLine( - $"[WB-DIAG] entSeen={_entitiesSeen} entDrawn={_entitiesDrawn} meshMissing={_meshesMissing} drawsIssued={_drawsIssued} instances={_instancesIssued} groups={_groups.Count}"); + $"[WB-DIAG] entSeen={_entitiesSeen} entDrawn={_entitiesDrawn} meshMissing={_meshesMissing} drawsIssued={_drawsIssued} instances={_instancesIssued} groups={_groups.Count} " + + $"cpu_us={cpuMed}m/{cpuP95}p95 gpu_us={gpuMed}m/{gpuP95}p95"); _entitiesSeen = _entitiesDrawn = _meshesMissing = _drawsIssued = _instancesIssued = 0; _lastLogTick = now; + // Don't reset the sample buffers — they're a moving window of the + // last 256 frames; clearing per 5s flush would lose recent history. } } + private static long MedianMicros(long[] samples) + { + var copy = (long[])samples.Clone(); + Array.Sort(copy); + int nz = 0; + foreach (var v in copy) if (v > 0) nz++; + if (nz == 0) return 0; + return copy[copy.Length - nz / 2]; + } + + private static long Percentile95Micros(long[] samples) + { + var copy = (long[])samples.Clone(); + Array.Sort(copy); + int nz = 0; + foreach (var v in copy) if (v > 0) nz++; + if (nz == 0) return 0; + int idx = copy.Length - 1 - (int)(nz * 0.05); + return copy[idx]; + } + private void ClassifyBatches( ObjectRenderData renderData, ulong gfxObjId, @@ -413,12 +548,16 @@ public sealed unsafe class WbDrawDispatcher : IDisposable : TranslucencyKind.Opaque; } - uint texHandle = ResolveTexture(entity, meshRef, batch, palHash); + ulong texHandle = ResolveTexture(entity, meshRef, batch, palHash); if (texHandle == 0) continue; + // TextureLayer is always 0 for per-instance composites; non-zero when + // WB atlas is adopted in N.6+ and batches reference a shared atlas layer. + uint texLayer = 0; + var key = new GroupKey( batch.IBO, batch.FirstIndex, (int)batch.BaseVertex, - batch.IndexCount, texHandle, translucency); + batch.IndexCount, texHandle, texLayer, translucency); if (!_groups.TryGetValue(key, out var grp)) { @@ -428,7 +567,8 @@ public sealed unsafe class WbDrawDispatcher : IDisposable FirstIndex = batch.FirstIndex, BaseVertex = (int)batch.BaseVertex, IndexCount = batch.IndexCount, - TextureHandle = texHandle, + BindlessTextureHandle = texHandle, + TextureLayer = texLayer, Translucency = translucency, }; _groups[key] = grp; @@ -437,10 +577,8 @@ public sealed unsafe class WbDrawDispatcher : IDisposable } } - private uint ResolveTexture(WorldEntity entity, MeshRef meshRef, ObjectRenderBatch batch, ulong palHash) + private ulong ResolveTexture(WorldEntity entity, MeshRef meshRef, ObjectRenderBatch batch, ulong palHash) { - // WB stores the surface id on batch.Key.SurfaceId (TextureKey struct); - // batch.SurfaceId is unset (zero) for batches built by ObjectMeshManager. uint surfaceId = batch.Key.SurfaceId; if (surfaceId == 0 || surfaceId == 0xFFFFFFFF) return 0; @@ -451,34 +589,16 @@ public sealed unsafe class WbDrawDispatcher : IDisposable if (entity.PaletteOverride is not null) { - // perf #4: pass the entity-precomputed palette hash so TextureCache - // can skip its internal HashPaletteOverride for repeat lookups - // within the same character. - return _textures.GetOrUploadWithPaletteOverride( + return _textures.GetOrUploadWithPaletteOverrideBindless( surfaceId, origTexOverride, entity.PaletteOverride, palHash); } else if (hasOrigTexOverride) { - return _textures.GetOrUploadWithOrigTextureOverride(surfaceId, overrideOrigTex); + return _textures.GetOrUploadWithOrigTextureOverrideBindless(surfaceId, overrideOrigTex); } else { - return _textures.GetOrUpload(surfaceId); - } - } - - private void EnsureInstanceAttribs(uint vao) - { - if (!_patchedVaos.Add(vao)) return; - - _gl.BindVertexArray(vao); - _gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo); - for (uint row = 0; row < 4; row++) - { - uint loc = 3 + row; - _gl.EnableVertexAttribArray(loc); - _gl.VertexAttribPointer(loc, 4, VertexAttribPointerType.Float, false, 64, (void*)(row * 16)); - _gl.VertexAttribDivisor(loc, 1); + return _textures.GetOrUploadBindless(surfaceId); } } @@ -494,15 +614,138 @@ public sealed unsafe class WbDrawDispatcher : IDisposable { if (_disposed) return; _disposed = true; - _gl.DeleteBuffer(_instanceVbo); + _gl.DeleteBuffer(_instanceSsbo); + _gl.DeleteBuffer(_batchSsbo); + _gl.DeleteBuffer(_indirectBuffer); + if (_gpuQueriesInitialized) + { + _gl.DeleteQuery(_gpuQueryOpaque); + _gl.DeleteQuery(_gpuQueryTransparent); + } } + // ── Public types + helpers for BuildIndirectArrays (Task 9) ───────────── + // + // These are public so the pure-CPU unit tests in AcDream.Core.Tests can + // exercise BuildIndirectArrays without needing a GL context. + + /// + /// Stride in bytes of DrawElementsIndirectCommand in the indirect buffer. + /// 5 × uint = 20 bytes. Tests and callers reference this symbolically + /// rather than hard-coding 20 so a layout change produces a compile error. + /// + public const int DrawCommandStride = 20; // sizeof(DrawElementsIndirectCommand): 5 × uint + + /// + /// Public view of the per-group inputs to — used in tests. + /// + public readonly record struct IndirectGroupInput( + int IndexCount, + uint FirstIndex, + int BaseVertex, + int InstanceCount, + int FirstInstance, + ulong TextureHandle, + uint TextureLayer, + TranslucencyKind Translucency); + + /// + /// Public mirror of the per-group uploaded to the SSBO. + /// Tests verify the layout. Same field shape as the private BatchData. + /// + [StructLayout(LayoutKind.Sequential, Pack = 8)] + public struct BatchDataPublic + { + public ulong TextureHandle; + public uint TextureLayer; + public uint Flags; + } + + /// Result of . + public readonly record struct IndirectLayoutResult( + int OpaqueCount, + int TransparentCount, + int TransparentByteOffset); + + /// + /// Lays out the indirect commands + parallel BatchData array contiguously: + /// opaque section first (caller sorts before calling), transparent section second. + /// Pure CPU, no GL state. Caller passes pre-sized scratch arrays. + /// + /// + /// Classification: Opaque + ClipMap → opaque pass (ClipMap uses discard, not + /// blending). Everything else (AlphaBlend, Additive, InvAlpha) → transparent pass. + /// + public static IndirectLayoutResult BuildIndirectArrays( + IReadOnlyList groups, + DrawElementsIndirectCommand[] indirectScratch, + BatchDataPublic[] batchScratch) + { + int opaqueCount = 0; + int transparentCount = 0; + + foreach (var g in groups) + { + if (IsOpaque(g.Translucency)) opaqueCount++; + else transparentCount++; + } + + int oi = 0; // opaque write cursor (fills [0..opaqueCount)) + int ti = opaqueCount; // transparent write cursor (fills [opaqueCount..end)) + + foreach (var g in groups) + { + var dec = new DrawElementsIndirectCommand + { + Count = (uint)g.IndexCount, + InstanceCount = (uint)g.InstanceCount, + FirstIndex = g.FirstIndex, + BaseVertex = g.BaseVertex, + BaseInstance = (uint)g.FirstInstance, + }; + var bd = new BatchDataPublic + { + TextureHandle = g.TextureHandle, + TextureLayer = g.TextureLayer, + Flags = 0, + }; + + if (IsOpaque(g.Translucency)) + { + indirectScratch[oi] = dec; + batchScratch[oi] = bd; + oi++; + } + else + { + indirectScratch[ti] = dec; + batchScratch[ti] = bd; + ti++; + } + } + + return new IndirectLayoutResult(opaqueCount, transparentCount, opaqueCount * DrawCommandStride); + } + + /// + /// Public test shim for . Locks in the N.5 Decision 2 + /// translucency partition: Opaque + ClipMap → opaque indirect; AlphaBlend + + /// Additive + InvAlpha → transparent indirect. + /// + public static bool IsOpaquePublic(TranslucencyKind t) => IsOpaque(t); + + private static bool IsOpaque(TranslucencyKind t) + => t == TranslucencyKind.Opaque || t == TranslucencyKind.ClipMap; + + // ──────────────────────────────────────────────────────────────────────── + private readonly record struct GroupKey( uint Ibo, uint FirstIndex, int BaseVertex, int IndexCount, - uint TextureHandle, + ulong BindlessTextureHandle, + uint TextureLayer, TranslucencyKind Translucency); private sealed class InstanceGroup @@ -511,7 +754,8 @@ public sealed unsafe class WbDrawDispatcher : IDisposable public uint FirstIndex; public int BaseVertex; public int IndexCount; - public uint TextureHandle; + public ulong BindlessTextureHandle; // 64-bit (was uint TextureHandle in N.4) + public uint TextureLayer; // 0 for per-instance composites; non-zero when WB atlas is adopted in N.6+ public TranslucencyKind Translucency; public int FirstInstance; // offset into the shared instance VBO (in instances, not bytes) public int InstanceCount; diff --git a/src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs b/src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs deleted file mode 100644 index c3fd006..0000000 --- a/src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs +++ /dev/null @@ -1,39 +0,0 @@ -namespace AcDream.App.Rendering.Wb; - -/// -/// Process-lifetime cache of ACDREAM_USE_WB_FOUNDATION env var. -/// Read once at static-init time; all consumers import this rather than -/// re-reading the env var per call (env-var lookups on Windows are not -/// free at hot-path cadence). -/// -/// -/// Default-on as of Phase N.4 ship (2026-05-08). The WB foundation -/// (WbMeshAdapter + WbDrawDispatcher) is the production -/// rendering path. Set ACDREAM_USE_WB_FOUNDATION=0 to fall back -/// to the legacy InstancedMeshRenderer path — kept as an escape -/// hatch until N.6 fully replaces it. -/// -/// -/// -/// Per-instance customized content (server CreateObject entities -/// with palette / texture overrides) routes through -/// regardless -/// of the flag — the flag controls which DRAW path consumes those -/// textures. -/// -/// -public static class WbFoundationFlag -{ - private static bool _isEnabled = - System.Environment.GetEnvironmentVariable("ACDREAM_USE_WB_FOUNDATION") != "0"; - - public static bool IsEnabled => _isEnabled; - - /// - /// FOR TESTS ONLY. Forces to true so - /// integration tests can exercise the WB adapter path without having to - /// set the env var before static initialisation. Never call from - /// production code. - /// - internal static void ForTestsOnly_ForceEnable() => _isEnabled = true; -} diff --git a/src/AcDream.App/Streaming/GpuWorldState.cs b/src/AcDream.App/Streaming/GpuWorldState.cs index 7f6d228..a256d26 100644 --- a/src/AcDream.App/Streaming/GpuWorldState.cs +++ b/src/AcDream.App/Streaming/GpuWorldState.cs @@ -144,7 +144,7 @@ public sealed class GpuWorldState } _loaded[landblock.LandblockId] = landblock; - if (WbFoundationFlag.IsEnabled && _wbSpawnAdapter is not null) + if (_wbSpawnAdapter is not null) _wbSpawnAdapter.OnLandblockLoaded(_loaded[landblock.LandblockId]); RebuildFlatView(); } @@ -195,7 +195,7 @@ public sealed class GpuWorldState public void RemoveLandblock(uint landblockId) { - if (WbFoundationFlag.IsEnabled && _wbSpawnAdapter is not null) + if (_wbSpawnAdapter is not null) _wbSpawnAdapter.OnLandblockUnloaded(landblockId); // Rescue persistent entities before removal. These get appended diff --git a/tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs b/tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs new file mode 100644 index 0000000..88877f6 --- /dev/null +++ b/tests/AcDream.Core.Tests/Rendering/TextureCacheBindlessTests.cs @@ -0,0 +1,32 @@ +using AcDream.App.Rendering; +using AcDream.App.Rendering.Wb; +using DatReaderWriter; +using Xunit; + +namespace AcDream.Core.Tests.Rendering; + +/// +/// Lightweight unit tests for 's bindless path. +/// We can't construct a real TextureCache in a headless test (it requires a +/// live GL context), so this file documents contracts that future engineers +/// should preserve. Real bindless integration is verified at Task 14's +/// visual gate. +/// +public sealed class TextureCacheBindlessTests +{ + [Fact] + public void Contract_BindlessMethodsThrowWithoutBindlessSupport() + { + // The actual throw lives in TextureCache.EnsureBindlessAvailable + // and is reached only via GL-bound Bindless* method calls. The + // contract is: if the dispatcher (which requires bindless) ever + // gets a TextureCache constructed without BindlessSupport, it + // should fail-fast with InvalidOperationException — NOT silently + // route a draw to handle 0 (which would produce a non-resident + // GPU fault). + // + // This test is a marker. Future engineers: do not weaken + // EnsureBindlessAvailable to swallow the missing dependency. + Assert.True(true, "Contract documented in TextureCache.EnsureBindlessAvailable"); + } +} diff --git a/tests/AcDream.Core.Tests/Rendering/Wb/PendingSpawnIntegrationTests.cs b/tests/AcDream.Core.Tests/Rendering/Wb/PendingSpawnIntegrationTests.cs index a02f080..c5d47f7 100644 --- a/tests/AcDream.Core.Tests/Rendering/Wb/PendingSpawnIntegrationTests.cs +++ b/tests/AcDream.Core.Tests/Rendering/Wb/PendingSpawnIntegrationTests.cs @@ -19,16 +19,9 @@ namespace AcDream.Core.Tests.Rendering.Wb; /// public sealed class PendingSpawnIntegrationTests { - /// - /// Force-enable WbFoundationFlag for this test class. - /// GpuWorldState gates its adapter calls on this static-cached flag; - /// calling the internal test hook lets us exercise the full integration - /// path without needing the env var set before process startup. - /// - static PendingSpawnIntegrationTests() - { - WbFoundationFlag.ForTestsOnly_ForceEnable(); - } + // N.5 ship amendment: WbFoundationFlag was deleted — GpuWorldState + // no longer gates adapter calls on the flag; they are unconditional + // when the adapter is non-null. No static ctor hook needed. [Fact] public void LiveEntity_ParkedBeforeLandblock_DrainsButIsNotRegisteredWithAdapter() diff --git a/tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs b/tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs new file mode 100644 index 0000000..855a2ef --- /dev/null +++ b/tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherIndirectBuilderTests.cs @@ -0,0 +1,113 @@ +using System.Numerics; +using AcDream.App.Rendering.Wb; +using AcDream.Core.Meshing; +using Xunit; + +namespace AcDream.Core.Tests.Rendering.Wb; + +/// +/// Pure CPU test of . +/// Verifies that a synthetic group set lays out into the indirect buffer +/// + parallel batch data with opaque section first, transparent second, +/// per-group fields propagated correctly. +/// +public sealed class WbDrawDispatcherIndirectBuilderTests +{ + [Fact] + public void TwoOpaqueGroupsAndOneTransparent_LaysOutContiguouslyOpaqueFirst() + { + // Arrange — three groups: 2 opaque (12+1 instances) + 1 transparent (12 instances) + var groups = new List + { + new(IndexCount: 100, FirstIndex: 0, BaseVertex: 0, InstanceCount: 12, FirstInstance: 0, TextureHandle: 0xAA, TextureLayer: 0, Translucency: TranslucencyKind.Opaque), + new(IndexCount: 200, FirstIndex: 100, BaseVertex: 0, InstanceCount: 12, FirstInstance: 12, TextureHandle: 0xBB, TextureLayer: 0, Translucency: TranslucencyKind.AlphaBlend), + new(IndexCount: 50, FirstIndex: 300, BaseVertex: 100, InstanceCount: 1, FirstInstance: 24, TextureHandle: 0xCC, TextureLayer: 0, Translucency: TranslucencyKind.Opaque), + }; + + var indirect = new DrawElementsIndirectCommand[16]; + var batch = new WbDrawDispatcher.BatchDataPublic[16]; + + // Act + var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch); + + // Assert layout + Assert.Equal(2, result.OpaqueCount); + Assert.Equal(1, result.TransparentCount); + Assert.Equal(2 * 20, result.TransparentByteOffset); // sizeof(DEIC) = 20 + + // Opaque section, in input order (Task 10 callers sort) + Assert.Equal(100u, indirect[0].Count); + Assert.Equal(0u, indirect[0].FirstIndex); + Assert.Equal(0, indirect[0].BaseVertex); + Assert.Equal(12u, indirect[0].InstanceCount); + Assert.Equal(0u, indirect[0].BaseInstance); + + Assert.Equal(50u, indirect[1].Count); + Assert.Equal(300u, indirect[1].FirstIndex); + Assert.Equal(100, indirect[1].BaseVertex); + Assert.Equal(1u, indirect[1].InstanceCount); + Assert.Equal(24u, indirect[1].BaseInstance); + + // Transparent section + Assert.Equal(200u, indirect[2].Count); + Assert.Equal(100u, indirect[2].FirstIndex); + Assert.Equal(12u, indirect[2].InstanceCount); + Assert.Equal(12u, indirect[2].BaseInstance); + + // BatchData parallel — same indices as indirect + Assert.Equal(0xAAul, batch[0].TextureHandle); + Assert.Equal(0xCCul, batch[1].TextureHandle); + Assert.Equal(0xBBul, batch[2].TextureHandle); + } + + [Fact] + public void EmptyGroupList_ProducesZeroCounts() + { + var groups = new List(); + var indirect = new DrawElementsIndirectCommand[0]; + var batch = new WbDrawDispatcher.BatchDataPublic[0]; + + var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch); + + Assert.Equal(0, result.OpaqueCount); + Assert.Equal(0, result.TransparentCount); + Assert.Equal(0, result.TransparentByteOffset); + } + + [Fact] + public void ClipMapTreatedAsOpaque() + { + // ClipMap surfaces (alpha-cutout) belong with the opaque pass + // because the discard handles transparency, not blending. + var groups = new List + { + new(IndexCount: 10, FirstIndex: 0, BaseVertex: 0, InstanceCount: 1, FirstInstance: 0, TextureHandle: 0x1, TextureLayer: 0, Translucency: TranslucencyKind.ClipMap), + }; + var indirect = new DrawElementsIndirectCommand[4]; + var batch = new WbDrawDispatcher.BatchDataPublic[4]; + + var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch); + + Assert.Equal(1, result.OpaqueCount); + Assert.Equal(0, result.TransparentCount); + } + + [Fact] + public void BatchDataPublic_LayoutMatchesPrivateBatchData() + { + // Task 10 will use MemoryMarshal.Cast to + // expose the dispatcher's per-frame BatchData[] scratch to BuildIndirectArrays + // without copying. The cast is only safe if the structs have identical + // layout (size, field offsets). Both use [StructLayout(Sequential, Pack=8)]. + Assert.Equal(16, System.Runtime.CompilerServices.Unsafe.SizeOf()); + Assert.Equal(0, (int)System.Runtime.InteropServices.Marshal.OffsetOf(nameof(WbDrawDispatcher.BatchDataPublic.TextureHandle))); + Assert.Equal(8, (int)System.Runtime.InteropServices.Marshal.OffsetOf(nameof(WbDrawDispatcher.BatchDataPublic.TextureLayer))); + Assert.Equal(12, (int)System.Runtime.InteropServices.Marshal.OffsetOf(nameof(WbDrawDispatcher.BatchDataPublic.Flags))); + } + + [Fact] + public void DrawCommandStride_MatchesStructSize() + { + Assert.Equal(WbDrawDispatcher.DrawCommandStride, System.Runtime.CompilerServices.Unsafe.SizeOf()); + } +} diff --git a/tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs b/tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs new file mode 100644 index 0000000..f79fb09 --- /dev/null +++ b/tests/AcDream.Core.Tests/Rendering/Wb/WbDrawDispatcherTranslucencyTests.cs @@ -0,0 +1,25 @@ +using AcDream.App.Rendering.Wb; +using AcDream.Core.Meshing; +using Xunit; + +namespace AcDream.Core.Tests.Rendering.Wb; + +/// +/// Locks in the N.5 translucency partition contract (spec Decision 2). +/// If the partition drifts, the dispatcher's opaque + transparent indirect +/// passes will silently render the wrong groups in the wrong pass — visible +/// regression that's hard to spot in code review. +/// +public sealed class WbDrawDispatcherTranslucencyTests +{ + [Theory] + [InlineData(TranslucencyKind.Opaque, true)] + [InlineData(TranslucencyKind.ClipMap, true)] + [InlineData(TranslucencyKind.AlphaBlend, false)] + [InlineData(TranslucencyKind.Additive, false)] + [InlineData(TranslucencyKind.InvAlpha, false)] + public void IsOpaque_PartitionsByKind(TranslucencyKind kind, bool expected) + { + Assert.Equal(expected, WbDrawDispatcher.IsOpaquePublic(kind)); + } +}