Merge branch 'claude/priceless-feistel-c12935' — Phase N.5 SHIP

N.5: Modern Rendering Path. WbDrawDispatcher now uses bindless
textures + glMultiDrawElementsIndirect on top of N.4's grouped
pipeline. Three SSBO uploads + 2 indirect calls per frame, ~12-15
total GL calls for entity rendering regardless of scene complexity.

Measured 1.23 ms / frame median at Holtburg courtyard (1662 groups,
~810 fps). User-gated visual verification PASS at Holtburg.

Includes ship-amendment: legacy renderer path formally retired
(InstancedMeshRenderer + StaticMeshRenderer + WbFoundationFlag
deleted). Bindless is now mandatory; missing extensions throw
NotSupportedException at startup with a clear error message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Erik 2026-05-08 22:13:20 +02:00
commit 27eaf4e0be
23 changed files with 4379 additions and 1278 deletions


@ -55,9 +55,11 @@ ourselves".
`EntitySpawnAdapter.cs` — bridge spawn lifecycle to WB ref-counts.
Atlas tier (procedural) goes via Landblock; per-instance tier
(server-spawned, palette/texture overrides) goes via Entity.
- `WbFoundationFlag` is default-on. `ACDREAM_USE_WB_FOUNDATION=0`
falls back to legacy `InstancedMeshRenderer` (kept as escape hatch
until N.6 fully retires it).
- **Modern path is mandatory as of N.5 ship amendment (2026-05-08).**
`WbFoundationFlag`, `InstancedMeshRenderer`, and `StaticMeshRenderer`
are deleted. Missing `GL_ARB_bindless_texture` or
`GL_ARB_shader_draw_parameters` throws `NotSupportedException` at
startup. There is no legacy fallback.
- **WB's modern rendering path** (GL 4.3 + bindless) packs every mesh
into a single global VAO/VBO/IBO. Each batch references its slice
via `FirstIndex` (offset into IBO) + `BaseVertex` (offset into VBO).
@ -72,6 +74,34 @@ ourselves".
`PrepareMeshDataAsync(id, isSetup)` to fire the background decode.
Result auto-enqueues to `_stagedMeshData` which `Tick()` drains.
`WbMeshAdapter` does this for you on first registration.
- **N.5 modern dispatch** (`docs/superpowers/specs/2026-05-08-phase-n5-modern-rendering-design.md`)
uses bindless textures + multi-draw indirect on top of N.4's grouped
pipeline. Per frame: three SSBO uploads (`_instanceSsbo` mat4 per
instance @ binding=0; `_batchSsbo` `(uvec2 textureHandle, uint layer,
uint flags)` per group @ binding=1; `_indirectBuffer`
`DrawElementsIndirectCommand[]` opaque-section + transparent-section).
Two `glMultiDrawElementsIndirect` calls per frame, one per pass.
Total ~12-15 GL calls per frame for entity rendering regardless of
scene complexity.
- **`TextureCache` requires `BindlessSupport`** for the WB modern path.
Three `Bindless`-suffixed `GetOrUpload*` methods return 64-bit handles
made resident at upload time, backed by parallel Texture2DArray uploads
(`UploadRgba8AsLayer1Array`). The legacy `uint`-returning methods stay
for Sky / Terrain / Debug / particle paths that still sample via
`sampler2D`. After N.6 retires legacy renderers, the legacy upload path
+ caches can be deleted.
- **Translucency model is two-pass alpha-test** (matches WB), not
per-blend-mode subpasses. Opaque pass discards `α<0.95`; transparent
pass discards `α≥0.95` AND `α<0.05`. Native `Additive` blend renders
as alpha-blend on GfxObj surfaces — falsifiable; if a magic-content
regression shows up, add a third indirect call with
`glBlendFunc(SrcAlpha, One)` per spec §6 fallback (~30 min change).
- **Per-instance highlight (selection blink) is reserved.** `mesh_modern.vert`'s
`InstanceData` struct has a documented hook for `vec4 highlightColor`
— Phase B.4 follow-up adds the field + plumbs server-side selection
state. Stride grows from 64 → 80 bytes when added; shader updates
trivially (read the field from `Instances[instanceIndex]` + mix into
fragment color).
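The two-pass alpha-test rule in the translucency bullet above can be mirrored on the CPU as a quick sanity check. This is a minimal sketch, assuming the thresholds quoted there; `IsDiscarded` is a hypothetical helper name, and the real test lives in the fragment shader, not in C#.

```csharp
using System;

// Hypothetical CPU-side mirror of the discard rule; the real test runs in the
// fragment shader. renderPass 0 = opaque, 1 = transparent.
static bool IsDiscarded(int renderPass, float alpha) => renderPass switch
{
    0 => alpha < 0.95f,                    // opaque pass keeps only near-solid texels
    1 => alpha >= 0.95f || alpha < 0.05f,  // transparent pass keeps the middle band
    _ => throw new ArgumentOutOfRangeException(nameof(renderPass)),
};

foreach (float a in new[] { 0.01f, 0.5f, 0.97f })
    Console.WriteLine($"alpha={a}: opaque draws={!IsDiscarded(0, a)}, transparent draws={!IsDiscarded(1, a)}");
```

Under these thresholds every texel is drawn by at most one pass, and texels with `α<0.05` are drawn by neither.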
**Execution phases:** R1→R8 in the architecture doc. Each phase has clear
goals, test criteria, and builds on the previous. Don't skip phases.
@ -472,18 +502,25 @@ acdream's plan lives in two files committed to the repo:
acceptance criteria. Do not drift from the spec without explicit user
approval.
**Currently in flight: Phase N.5 — Modern Rendering Path.** Roadmap entry
at [`docs/plans/2026-04-11-roadmap.md`](docs/plans/2026-04-11-roadmap.md).
Builds on N.4's `WbDrawDispatcher` to adopt WB's modern rendering primitives:
bindless textures (eliminate `glBindTexture` calls) and
`glMultiDrawElementsIndirect` (one GL call per pass instead of one per
group). Together these target a 2-5× CPU win on draw-heavy scenes by
eliminating the remaining per-group state changes. Plan + spec to be
written when work begins.
**Currently in flight: Phase N.6 — Perf polish.**
Roadmap entry at [`docs/plans/2026-04-11-roadmap.md`](docs/plans/2026-04-11-roadmap.md).
Builds on N.5. Legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`,
`WbFoundationFlag`) were retired in the N.5 ship amendment — N.6 scope is
perf-only: WB atlas adoption, persistent-mapped buffers, GPU-side culling,
GL_TIME_ELAPSED query double-buffering, direct N.4 vs N.5 perf measurement,
legacy `Texture2D`/`sampler2D` TextureCache path retirement (Sky/Terrain/Debug).
Plan + spec written when work begins.
**Phase N.5 (Modern Rendering Path) shipped + amended 2026-05-08.** `WbDrawDispatcher`
on bindless textures + `glMultiDrawElementsIndirect`. CPU dispatcher 1.23ms/frame
at Holtburg (~810 fps). **Ship amendment:** `InstancedMeshRenderer`,
`StaticMeshRenderer`, `WbFoundationFlag` deleted in same phase — modern path is
mandatory; missing bindless throws at startup. Plan archived at
[`docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`](docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md).
**Phase N.4 (Rendering Pipeline Foundation) shipped 2026-05-08.** WB's
`ObjectMeshManager` is integrated and is the default rendering path
behind `ACDREAM_USE_WB_FOUNDATION` (default-on). Plan archived at
`ObjectMeshManager` is integrated and is the production rendering path
(mandatory as of N.5 ship amendment). Plan archived at
[`docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md`](docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md).
**Rules:**


@ -82,11 +82,12 @@ ground. This is the bug class fixed in
**Sequencing implication:** Phase N.2 (terrain math helpers
substitution) cannot be shipped in isolation — it must land alongside
N.5 (visual terrain renderer migration), at which point both physics
and visual mesh switch to WB's formula together. Roadmap N.2 entry
flags this dependency.
visual terrain renderer migration (originally N.5, now moved to N.7
scope), at which point both physics and visual mesh switch to WB's
formula together. N.5 shipped entity rendering only; terrain remains
on acdream's own pipeline through N.7.
**Research needed (when N.5 picks this up):**
**Research needed (when N.7 picks this up):**
1. Quantify divergence: run WB's `CalculateSplitDirection` and our
`IsSplitSWtoNE` across all (lbX, lbY, cellX, cellY) tuples for a
representative landblock set; record disagreement rate.
@ -97,8 +98,8 @@ flags this dependency.
server-authoritative Z within tolerance) is invalidated by the
formula change.
**Acceptance:** Resolved when N.5 lands and both physics + visual
mesh use WB's split formula, OR when we decide to keep the AC2D
**Acceptance:** Resolved when N.7 lands and both physics + visual
terrain use WB's split formula, OR when we decide to keep the AC2D
formula and patch WB's renderer in our fork.
---
@ -998,8 +999,8 @@ If the coat texture's UVs at the upper region map to texel-bytes whose palette i
**Files (diagnostic env vars committed for next-session reuse):**
- `src/AcDream.App/Rendering/InstancedMeshRenderer.cs:210-275`
`ACDREAM_NO_CULL` env var
- ~~`src/AcDream.App/Rendering/InstancedMeshRenderer.cs:210-275`
`ACDREAM_NO_CULL` env var~~ (file deleted in N.5 ship amendment)
- `src/AcDream.App/Rendering/GameWindow.cs`: `ACDREAM_HIDE_PART=N`
hides specific humanoid part; `ACDREAM_DUMP_CLOTHING=1` dumps
AnimPartChanges + TextureChanges + per-part Surface chain coverage.


@ -1,6 +1,6 @@
# acdream — strategic roadmap
**Status:** Living document. Updated 2026-05-08 for Phase N.4 shipping (`WbMeshAdapter` + `WbDrawDispatcher` + `ACDREAM_USE_WB_FOUNDATION` default-on) + N.5 rebranded to "Modern rendering path" (bindless + multi-draw indirect on top of N.4's foundation).
**Status:** Living document. Updated 2026-05-08 for Phase N.5 shipping (bindless textures + `glMultiDrawElementsIndirect` on top of N.4's foundation; CPU dispatcher 1.23 ms/frame at Holtburg, ~810 fps) + N.6 becomes the new in-flight phase (perf polish; the legacy renderers were already retired in the N.5 ship amendment).
**Purpose:** One source of truth for where the project is and where it's going. Every observed defect or missing feature has a named phase that owns it; when something looks wrong in-game, look here to find the phase that'll address it. Implementation details live in per-phase specs under `docs/superpowers/specs/`, not in this file.
---
@ -59,7 +59,8 @@
| C.1 | PES particle system + sky-pass refinements — retail-faithful `ParticleEmitterInfo` unpack with all 13 motion integrators (`Particle::Init`/`Update` ports of `0x0051c290`/`0x0051c930`), `PhysicsScriptRunner` with `CallPES` self-loop semantics, `ParticleHookSink` with `EmitterDied` cleanup, instanced billboard `ParticleRenderer` with material-derived blend (DAT emitters never default additive — pulled from particle GfxObj surface), global back-to-front sort, BC clipmap alpha-keying, AttachLocal `is_parent_local=1` live-parent follow via `UpdateEmitterAnchor`. Sky pass: `Translucent+ClipMap` → alpha-blend cloud sheet (matches `D3DPolyRender::SetSurface` `0x0059c4d0`), raw-`Additive` fog-skip (matches `0x0059c882`), per-keyframe `SkyObjectReplace` Translucency/Luminosity/MaxBright divide-by-100, bit `0x01` pre/post-scene split (matches `GameSky::CreateDeletePhysicsObjects` `0x005073c0`), Setup-backed (`0x020xxxxx`) sky objects via `SetupMesh.Flatten`, persistent GL sampler objects (Wrap + ClampToEdge) replace per-frame wrap-mode mutation (ported from WorldBuilder's `OpenGLGraphicsDevice`), post-scene Z-offset gated on `(Properties & 4) != 0 && (Properties & 8) == 0` per `GameSky::UpdatePosition` `0x00506dd0`. Sky-PES playback disabled by default (named-retail proves `GameSky` drops `pes_id`); `ACDREAM_ENABLE_SKY_PES=1` opens the experimental path. 1325 → 1331 tests. | Live ✓ |
| N.1 | WorldBuilder-backed scenery (Chorizite/WorldBuilder fork as submodule, SceneryHelpers + TerrainUtils replace our inline ports) | Live ✓ |
| N.3 | WorldBuilder-backed texture decode — `SurfaceDecoder` delegates INDEX16 / P8 / A8R8G8B8 / R8G8B8 / A8(+Additive) to `TextureHelpers.Fill*`; `isAdditive` threaded through (terrain alpha → `FillA8Additive`, non-additive entity surfaces → `FillA8`). R5G6B5 + A4R4G4B4 newly handled (previously magenta). X8R8G8B8, DXT1/3/5, SolidColor remain ours (no WB equivalent). 9 conformance tests prove byte-identical equivalence per format. | Live ✓ |
| N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6. | Live ✓ |
| N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6 (retired early in N.5 ship amendment). | Live ✓ |
| N.5 | Modern rendering path — lifted `WbDrawDispatcher` onto bindless textures (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch data @ binding=1, indirect commands) + 2 indirect draw calls (opaque + transparent). ~12-15 GL calls per frame regardless of group count, down from hundreds-of-per-group in N.4. CPU dispatcher: 1.23 ms/frame median at Holtburg courtyard (1662 groups, ~810 fps sustained). All textures on the WB modern path use 1-layer `Texture2DArray` + `sampler2DArray`. Legacy callers keep `Texture2D` / `sampler2D` via the parallel `TextureCache` path until N.6 retires them. Three gotchas captured in memory: texture target lock-in, bindless Dispose order (two-phase non-resident before delete), GL_TIME_ELAPSED double-buffering. **Ship amendment 2026-05-08:** legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, `WbFoundationFlag`) retired within N.5 — modern path is mandatory; missing bindless throws `NotSupportedException` at startup. N.6 scope narrowed accordingly. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. | Live ✓ |
Plus polish that doesn't get its own phase number:
- FlyCamera default speed lowered + Shift-to-boost
@ -624,22 +625,21 @@ for our deletions/additions; merge upstream `master` periodically.
memoization. Legacy `InstancedMeshRenderer` retained as flag-off
fallback until N.6 fully retires it. Plan archived at
`docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md`.
- **N.5 — Modern rendering path.** **Rebranded from "Terrain rendering"
2026-05-08 after N.4 perf review.** N.4 left two big remaining wins
on the table that pair naturally: (1) bindless textures via
`GL_ARB_bindless_texture` (WB already populates
`ObjectRenderBatch.BindlessTextureHandle`; switch our shader to
consume per-instance handles, eliminate 100% of `glBindTexture`
calls), and (2) `glMultiDrawElementsIndirect` (one GL call per pass
instead of one per group; build a `DrawElementsIndirectCommand`
buffer, fire one indirect draw, the driver pulls everything). Both
require shader changes (same shader, in fact — bindless + indirect
are the same modern path WB uses internally). Together they target a
2-5× CPU win on draw-heavy scenes (Holtburg courtyard, Foundry,
dense dungeons). Also folds in: persistent-mapped instance VBO
(`glBufferStorage` + `MAP_PERSISTENT_BIT | MAP_COHERENT_BIT` + ring
buffer + sync) and texture pre-warm at landblock load (smooths
streaming-boundary hitches). **Estimate: 2-3 weeks.**
- **✓ SHIPPED — N.5 — Modern rendering path.** Shipped 2026-05-08.
**Rebranded from "Terrain rendering" 2026-05-08 after N.4 perf
review.** Lifted `WbDrawDispatcher` onto bindless textures
(`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame
entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch
data @ binding=1, indirect commands) + 2 indirect calls (opaque +
transparent). ~12-15 GL calls per frame regardless of group count, down
from hundreds of per-group calls in N.4. CPU dispatcher: 1.23 ms/frame median
at Holtburg (1662 groups, ~810 fps). All textures on the modern path use
1-layer `Texture2DArray` + `sampler2DArray`; legacy callers retain
`Texture2D` via the parallel `TextureCache` path until N.6 retires them.
Three gotchas in memory (`project_phase_n5_state.md`): texture target
lock-in, bindless Dispose two-phase order, GL_TIME_ELAPSED
double-buffering. Plan archived at
`docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`.
- **N.5b — Terrain rendering on N.5 path.** Wire WB's
`TerrainRenderManager` + `LandSurfaceManager` + `TerrainGeometryGenerator`
onto the modern rendering path. Closes N.2's deferred terrain math
@ -647,12 +647,17 @@ for our deletions/additions; merge upstream `master` periodically.
`CalculateSplitDirection` + `GetHeight` + `GetNormal` in lockstep,
resolving ISSUE #51. **Estimate: 1-2 weeks** (was 2-3 — modern path
primitives already in place from N.5).
- **N.6 — Static objects rendering.** Wire WB's
`StaticObjectRenderManager` onto the modern rendering path; **fully
delete** legacy `StaticMeshRenderer` + `InstancedMeshRenderer` (they
remain as `ACDREAM_USE_WB_FOUNDATION=0` escape hatches through N.5).
Mostly draw orchestration at this point — most of the substance
landed in N.4 + N.5. **Estimate: 1-2 weeks** (was 2-3).
- **N.6 — Perf polish.** **Currently in flight.**
Builds on N.5. Legacy renderer retirement was pulled forward into N.5
ship amendment — `InstancedMeshRenderer`, `StaticMeshRenderer`, and
`WbFoundationFlag` are already gone. N.6 scope: WB atlas adoption for
memory savings on shared content, persistent-mapped buffers if
`glBufferData` shows up in profiling, GPU-side culling via compute
pre-pass, GL_TIME_ELAPSED query double-buffering (deferred from N.5 —
diagnostic shows `gpu_us=0/0` under `ACDREAM_WB_DIAG=1`), direct N.4
vs N.5 perf measurement, retire the legacy `Texture2D`/`sampler2D` path
in `TextureCache` (currently kept for Sky + Terrain + Debug).
Plan + spec written when work begins. **Estimate: 1-2 weeks.**
- **N.7 — EnvCells / dungeons.** Replace EnvCell rendering with WB's
`EnvCellRenderManager` + `PortalRenderManager` on top of N.4's
foundation. **Estimate: 1-2 weeks** (was 2-3 — naturally smaller now


@ -0,0 +1,72 @@
# Phase N.5 perf baseline
**Captured:** 2026-05-08, against N.5 head (post-Task 12) on local machine.
**Method:** `ACDREAM_WB_DIAG=1` + character at Holtburg spawn position +
roaming. Numbers below are 5-second window medians from `[WB-DIAG]`.
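The source does not say how the `[WB-DIAG]` rollup computes its median and p95. A minimal sketch, assuming a nearest-rank percentile over one window; `Percentile` is a hypothetical helper and the sample values are synthetic, not measurements.

```csharp
using System;
using System.Linq;

// Nearest-rank percentile over one rollup window (assumed definition).
static double Percentile(double[] samples, double p)
{
    if (samples.Length == 0) throw new ArgumentException("empty window", nameof(samples));
    double[] sorted = samples.OrderBy(x => x).ToArray();
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
}

double[] windowMicros = { 50, 10, 40, 20, 30 };   // synthetic per-frame timings
double median = Percentile(windowMicros, 50);
double p95 = Percentile(windowMicros, 95);
Console.WriteLine($"median={median} p95={p95}");
```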
## Holtburg courtyard (steady state)
| Metric | N.5 measured | N.4 (estimated*) | Gate |
|---|---|---|---|
| CPU dispatcher (median) | **1227 µs / frame** | ≥2500 µs / frame | ≤70% of N.4 → **PASS** |
| CPU dispatcher (p95) | 1303 µs / frame | — | — |
| GPU rendering (median) | unmeasured (see below) | — | within ±10% — **DEFERRED** |
| `drawsIssued` per 5s | 4.85M (= 1662 groups × ~580 fps) | far higher per frame | — |
| `drawsIssued` per pass (CPU GL calls) | **1** per pass (2 per frame: 1 opaque + 1 transparent indirect) | ~hundreds per pass | ≤5 → **PASS** |
| `groups` (working set) | 1662 | ~similar | sanity |
| Frame rate (inferred) | ~810 fps | ~100-200 fps | substantial uplift |
*N.4 baseline NOT measured directly in this run. The "≥2500 µs / frame"
estimate assumes N.4's per-group glBindTexture + glBindBuffer +
glDrawElementsInstancedBaseVertexBaseInstance hot path costs ≥1.5 µs per
group and N.4 has ~1700 groups in this scene, putting the GL portion alone
at ~2.5 ms before adding the entity-walk overhead. N.5's measurement
includes ALL dispatcher work (entity walk + group bucketing + 3 SSBO
uploads + 2 indirect calls + state changes) at 1230 µs total, roughly
half of the lower-bound estimate.
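The arithmetic behind the footnote can be checked in a few lines. This is a sketch using only the footnote's own figures (≥1.5 µs per group, ~1700 groups) and the measured median; the N.4 number remains an estimate, not a measurement.

```csharp
using System;

const double perGroupMicros = 1.5;    // assumed lower bound per group (from the footnote)
const int n4Groups = 1700;            // approximate N.4 group count (from the footnote)
const double n5MedianMicros = 1227;   // measured N.5 median (from the table)

double n4LowerBoundMicros = perGroupMicros * n4Groups;   // GL portion only
double ratio = n5MedianMicros / n4LowerBoundMicros;
bool passesGate = ratio <= 0.70;                         // gate: N.5 at most 70% of N.4

Console.WriteLine($"N.4 lower bound = {n4LowerBoundMicros} us, ratio = {ratio:F2}, gate pass = {passesGate}");
```

Because the N.4 figure is a lower bound, a directly measured N.4 time could only make the ratio smaller.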
## Acceptance gates (spec §8.3)
- [x] **Visual identity to N.4** — confirmed at Task 10 USER GATE: Holtburg
courtyard renders identical, no missing entities, no z-fighting, no
exploded parts.
- [x] **CPU dispatcher time ≤ 70% of N.4** — N.5 measures 1.23 ms/frame
median; estimated N.4 ≥2.5 ms/frame; **comfortably under 70%**.
- [ ] **GPU rendering time within ±10% of N.4** — DEFERRED. The
`GL_TIME_ELAPSED` query polling never reports `avail != 0` in our
single-frame poll loop; the driver hasn't finalized the result by the
time we check. The fix is double-buffering (issue queryA on frame N,
read result on frame N+2). N.6 perf polish item.
- [x] **`drawsIssued` ≤ 5 per pass (CPU GL calls)** — exactly 2 indirect
calls per frame regardless of scene size.
- [x] **All tests green** — 70/70 in
`FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition`.
8 pre-existing failures in `MotionInterpreter` / `BSPStepUp` /
`PositionManager` / `PlayerMovementController` / `Dispatcher` are
carry-forward from before N.5 and unrelated to rendering.
- [N/A] **`ACDREAM_USE_WB_FOUNDATION=0` still works** — escape hatch
formally retired in N.5 ship amendment. `InstancedMeshRenderer`,
`StaticMeshRenderer`, and `WbFoundationFlag` deleted. Missing
bindless throws `NotSupportedException` at startup with a clear
error message. No fallback path.
## Visual verification (Task 14)
- [x] **Holtburg courtyard** — PASS at Task 10 USER GATE.
- [ ] **Foundry interior / dense static-object scene** — TODO Task 14.
- [ ] **Indoor → outdoor cell transition** — TODO Task 14.
- [ ] **Drudge / character close-up (Issue #47 close-detail mesh)** — TODO Task 14.
- [ ] **Magic content (Decision 2 additive fallback check)** — TODO Task 14.
- [ ] **Long-session sanity** — DEFERRED (N.6 watchlist; not load-bearing for ship).
## Open follow-ups for N.6
1. **GPU timer query double-buffering** — the current single-frame poll
pattern never sees `QueryResultAvailable=true`. Issue queryA on frame N,
queryB on frame N+1, read queryA on frame N+2. ~30 lines of state.
2. **Direct N.4 vs N.5 perf comparison** — re-run with `git checkout`ed N.4
SHIP (`c445364`) for a side-by-side measurement. Not load-bearing but
useful for N.6 ship message.
3. **Persistent-mapped buffers** — Decision 7 deferral. If profiling shows
the per-frame `glBufferData` cost is the residual hot spot, layer it on
top of the modern path.
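Follow-up 1 above (alternate two queries and read each one two frames after it was issued) can be sketched without a GL context. This is an illustration only: `FakeAvailable` and `FakeResultMicros` are made-up stand-ins for the driver, and the timing values are fabricated.

```csharp
using System;
using System.Collections.Generic;

const int Slots = 2;                        // queryA and queryB
var issuedOnFrame = new long[] { -1, -1 };  // -1 = slot never issued
var harvested = new List<(long Frame, long GpuMicros)>();

// Stand-ins for the driver (fabricated behavior): a result becomes available
// two frames after issue. Real code would poll GL_QUERY_RESULT_AVAILABLE.
static bool FakeAvailable(long issued, long now) => now - issued >= 2;
static long FakeResultMicros(long issued) => 100 + issued;

for (long frame = 0; frame < 6; frame++)
{
    int slot = (int)(frame % Slots);
    long issued = issuedOnFrame[slot];

    // Read the query issued into this slot two frames ago, if it is ready.
    if (issued >= 0 && FakeAvailable(issued, frame))
        harvested.Add((issued, FakeResultMicros(issued)));

    // Re-issue into the same slot (glBeginQuery / glEndQuery in real code).
    issuedOnFrame[slot] = frame;
}

Console.WriteLine($"harvested {harvested.Count} results from 6 frames");
```

The first two frames yield nothing because no query is old enough yet; after that each frame harvests one result. A real implementation would still need to decide what to do when a slot's result is not yet available at reuse time.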

File diff suppressed because it is too large.


@ -0,0 +1,554 @@
# Phase N.5 — Modern Rendering Path — Design Spec
**Status:** Draft (brainstormed 2026-05-08, not yet implemented).
**Author:** acdream lead engineer + Claude.
**Builds on:** Phase N.4 (`WbDrawDispatcher`, shipped 2026-05-08).
**Predecessor docs:**
- `docs/research/2026-05-08-phase-n5-handoff.md` (cold-start briefing).
- `docs/superpowers/plans/2026-05-08-phase-n4-rendering-foundation.md` (N.4 plan; Adjustments 7-10 are required reading).
- `docs/superpowers/specs/2026-05-08-phase-n4-rendering-foundation-design.md` (N.4 spec).
---
## 1. Problem statement
N.4 collapsed entity rendering from O(entities × batches) per-draw GL calls to O(unique GfxObj × surface × translucency) grouped instanced draws. The remaining hot path still does, per group:
```
glActiveTexture(0)
glBindTexture(2D, texHandle)
glBindBuffer(EBO, batchIbo)
glDrawElementsInstancedBaseVertexBaseInstance(...)
```
Across a typical Holtburg-courtyard scene that's still ~100-300 GL calls per frame for entities. Modern GPUs and our drivers (GL 4.3 + bindless, gated by WB's `_useModernRendering`) support patterns that eliminate ALL of those per-group calls:
- **Bindless textures** (`GL_ARB_bindless_texture`) — texture handles are 64-bit tokens that don't require `glBindTexture` to use; the shader samples from a handle read out of buffer data.
- **Multi-draw indirect** (`glMultiDrawElementsIndirect`) — one GL call dispatches N draws from a `DrawElementsIndirectCommand` buffer; the driver issues all of them with no CPU-side per-draw work.
N.5 lifts `WbDrawDispatcher` onto these primitives. Target: ≥30% reduction in CPU dispatcher time, draw call count down to ~5/frame, no visual regression vs N.4.
---
## 2. Decisions log
This section records the brainstorm outcomes that the rest of the doc relies on.
| # | Decision | Choice | Reason |
|---|---|---|---|
| 1 | Texture sampler model | **`sampler2DArray`** for ALL textures (1-layer wrapping for per-instance composites) | Matches WB's modern shader exactly; future-proofs for atlas adoption in N.6+; avoids two shader files. ~50 lines of TextureCache change. |
| 2 | Translucent rendering | **WB's two-pass alpha-test** (opaque pass discards `α<0.95`, transparent pass discards `α≥0.95`) | Single blend mode per pass enables one indirect call per pass. Loses native `Additive` blend on GfxObj surfaces; sky + particles have own renderers and aren't affected. Falsifiable at visual verification — if we see a regression, add an additive sub-pass (~30-min fix). |
| 3 | Per-instance + per-draw data delivery | **All-SSBO**: `Instances[]` at binding=0 (mat4 per instance), `Batches[]` at binding=1 (texture handle + layer + flags per group) | Matches WB's modern shader. SSBOs avoid the 16-attrib stride limit, scale to large instance counts, give clean per-draw indexing via `gl_DrawIDARB`. |
| 4 | Bindless handle residency | **Resident on upload, never release** | acdream's content set is bounded (~1-5K unique textures per session). Handles persist for process lifetime; no eviction code in N.5. Diagnostic logging of handle count under `ACDREAM_WB_DIAG=1` to spot growth. |
| 5 | Escape hatch | **Modern path mandatory (N.5 ship amendment)**. `WbFoundationFlag` and `ACDREAM_USE_WB_FOUNDATION` env var have been deleted. Missing `GL_ARB_bindless_texture` or `GL_ARB_shader_draw_parameters` throws `NotSupportedException` at startup with a clear error message. No fallback. | Escape hatch was never exercised after N.4 ship. Legacy `InstancedMeshRenderer` + `StaticMeshRenderer` deleted in the N.5 retirement commit. N.6 scope narrowed accordingly. |
| 6 | Perf measurement | **CPU stopwatch + GL timer queries** logged via `[WB-DIAG]` | Captures both CPU dispatcher time and GPU rendering time. Acceptance gate compares before/after numbers in fixed Holtburg/Foundry scenes. |
| 7 | Persistent-mapped buffers | **Defer to N.6** | Bindless+indirect win is 70-80% of achievable savings. Persistent-mapped + ring + sync is the last 5-10% with non-trivial sync-fence complexity; not worth the risk in N.5's 2-3 week budget. Add post-N.5 if profiling shows residual `glBufferData` cost. |
| 8 | Per-instance highlight (selection blink) | **Defer to a Phase B.4 follow-up** | Retail pulses click targets as visual confirmation; the right mechanism is per-instance highlight color (NOT WB's global `uHighlightColor` which would tint everything in our single-indirect-call design). Field is reserved in design (extend `InstanceData` to include `vec4 highlightColor`); N.5 ships without the field, future phase plumbs it without shader rewrite. |
---
## 3. Architecture overview
### What changes
`WbDrawDispatcher.Draw` swaps its inner loop. Phases 1-3 (entity walk, group bucketing, matrix layout) stay intact. Phases 5-6 (per-group GL calls) are replaced by a single `glMultiDrawElementsIndirect` per pass, fed by SSBO-resident per-instance and per-draw data.
### What's preserved from N.4
- Group bucketing pipeline (entity AABB cull, palette hash memo, group key dictionary).
- `AcSurfaceMetadataTable` for translucency classification.
- `EntitySpawnAdapter` / `LandblockSpawnAdapter` (mesh lifecycle bridge).
- `WbMeshAdapter` (the seam over WB's `ObjectMeshManager`).
- Front-to-back sort of opaque groups (depth-test reject of overdrawn fragments).
- Per-entity 5m AABB frustum cull.
### What's new
- `TextureCache` uploads as 1-layer `Texture2DArray` instead of `Texture2D`. Generates 64-bit bindless handles at upload, makes them resident.
- New shader pair `mesh_modern.vert/.frag` modeled on WB's `StaticObjectModern` but adapted (see §6).
- Three new GPU buffers in the dispatcher:
- `_instanceSsbo`: `std430` layout, `mat4[]`, all visible matrices.
- `_batchSsbo`: `std430` layout, `BatchData[]`, one entry per group.
- `_indirectBuffer`: `DrawElementsIndirectCommand[]`, one per group.
- Two diagnostic measurements in `[WB-DIAG]`: CPU stopwatch span around `Draw()`; GPU `GL_TIME_ELAPSED` query around the indirect dispatch.
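One way to fill a `_batchSsbo` entry is to pack it as four 32-bit words. This is a minimal sketch, assuming the `(uvec2 textureHandle, uint layer, uint flags)` layout described for the batch SSBO and a little-endian split of the 64-bit handle (low word in `.x`, high word in `.y`); `PackBatchData` and the handle value are hypothetical.

```csharp
using System;

// Packs one BatchData entry as four 32-bit words (16 bytes).
// Assumption: textureHandle.x carries the low 32 bits, .y the high 32 bits.
static uint[] PackBatchData(ulong bindlessHandle, uint layer, uint flags) => new[]
{
    (uint)(bindlessHandle & 0xFFFFFFFFu),  // textureHandle.x
    (uint)(bindlessHandle >> 32),          // textureHandle.y
    layer,                                 // always 0 in N.5 (1-layer arrays)
    flags,                                 // 0 in N.5
};

uint[] entry = PackBatchData(0x1122334455667788UL, layer: 0, flags: 0);  // made-up handle
Console.WriteLine(string.Join(" ", Array.ConvertAll(entry, w => w.ToString("X8"))));
```

Four words per entry keeps the stride at 16 bytes, so entries in the array need no padding between them.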
### What gets deleted
- `WbDrawDispatcher.DrawGroup` (replaced by indirect).
- `WbDrawDispatcher.EnsureInstanceAttribs` (no more vertex attribs at locations 3-6).
- Per-blend-mode `glBlendFunc` switch in the translucent loop.
- `mesh_instanced.vert/.frag` (replaced by `mesh_modern.*`).
### What stays under the escape hatch
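Nothing, as of the N.5 ship amendment (Decision 5): `InstancedMeshRenderer` and the `ACDREAM_USE_WB_FOUNDATION` switch were deleted within N.5 itself. The draft originally left `InstancedMeshRenderer` untouched behind `ACDREAM_USE_WB_FOUNDATION=0` for N.6 to retire.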
---
## 4. Component changes
### 4.1 `TextureCache`
Texture upload path becomes Texture2DArray with depth=1:
```csharp
private uint UploadRgba8AsLayer1Array(DecodedTexture decoded)
{
    uint tex = _gl.GenTexture();
    _gl.BindTexture(TextureTarget.Texture2DArray, tex);
    fixed (byte* p = decoded.Rgba8)
        _gl.TexImage3D(
            TextureTarget.Texture2DArray, 0, InternalFormat.Rgba8,
            (uint)decoded.Width, (uint)decoded.Height, depth: 1,
            border: 0, PixelFormat.Rgba, PixelType.UnsignedByte, p);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat);
    _gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat);
    _gl.BindTexture(TextureTarget.Texture2DArray, 0);
    return tex;
}
```
Bindless handle generation, eager + resident-on-upload, parallel cache:
```csharp
private readonly Dictionary<uint, ulong> _bindlessHandlesByGlName = new();

private ulong MakeResidentHandle(uint glTextureName)
{
    if (_bindlessHandlesByGlName.TryGetValue(glTextureName, out var h))
        return h;
    h = _bindless.GetTextureHandleARB(glTextureName);
    _bindless.MakeTextureHandleResidentARB(h);
    _bindlessHandlesByGlName[glTextureName] = h;
    return h;
}
```
Three new methods returning `ulong` bindless handles, paralleling the existing `uint` GL-name methods:
```csharp
public ulong GetOrUploadBindless(uint surfaceId);
public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId);
public ulong GetOrUploadWithPaletteOverrideBindless(uint surfaceId, uint? overrideOrigTextureId, PaletteOverride paletteOverride, ulong precomputedPaletteHash);
```
Each delegates to its existing `uint` sibling to populate the underlying GL texture, then calls `MakeResidentHandle` and returns the 64-bit handle.
The `uint`-returning methods stay (used by `SkyRenderer`, `TerrainAtlas`, anything outside the WB modern path).
`Dispose` releases bindless handles BEFORE deleting their textures: iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`, then `glDeleteTextures` proceeds as today.
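The ordering constraint in `Dispose` can be illustrated with a call log. This is a sketch under stated assumptions: the two delegates are stand-ins for `glMakeTextureHandleNonResidentARB` and `glDeleteTextures`, and the handle values are made up; it is not the actual `TextureCache.Dispose` body.

```csharp
using System;
using System.Collections.Generic;

// GL texture name -> bindless handle (made-up values for illustration).
var handlesByGlName = new Dictionary<uint, ulong> { [7] = 0xA000_0007UL, [9] = 0xA000_0009UL };
var log = new List<string>();

// Stand-ins for the GL entry points; they only record the call order.
Action<ulong> makeNonResident = h => log.Add($"nonresident {h:X}");
Action<uint> deleteTexture = name => log.Add($"delete {name}");

// Phase 1: every handle leaves residency first.
foreach (ulong handle in handlesByGlName.Values) makeNonResident(handle);

// Phase 2: only then are the backing textures deleted.
foreach (uint name in handlesByGlName.Keys) deleteTexture(name);
handlesByGlName.Clear();

Console.WriteLine(string.Join("\n", log));
```

Running both phases as separate loops, rather than interleaving per texture, ensures no texture is deleted while any handle is still resident.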
### 4.2 `WbDrawDispatcher`
Three new GPU buffers (replacing `_instanceVbo`):
```csharp
private uint _instanceSsbo; // binding=0, std430, mat4[]
private uint _batchSsbo; // binding=1, std430, BatchData[]
private uint _indirectBuffer; // GL_DRAW_INDIRECT_BUFFER, DEIC[]
```
`InstanceGroup` becomes:
```csharp
private sealed class InstanceGroup
{
    public uint Ibo;
    public uint FirstIndex;
    public int BaseVertex;
    public int IndexCount;
    public ulong BindlessTextureHandle; // 64-bit (was uint TextureHandle in N.4)
    public uint TextureLayer; // always 0 in N.5 (per-instance composites are 1-layer arrays)
    public TranslucencyKind Translucency;
    public int FirstInstance;
    public int InstanceCount;
    public float SortDistance;
    public readonly List<Matrix4x4> Matrices = new();
}
```
`GroupKey` adds the layer:
```csharp
private readonly record struct GroupKey(
    uint Ibo, uint FirstIndex, int BaseVertex, int IndexCount,
    ulong BindlessTextureHandle, uint TextureLayer, TranslucencyKind Translucency);
```
Per-frame draw flow:
1. **Walk entities → build `_groups` dict** (unchanged from N.4).
2. **Lay matrices contiguously, split opaque/transparent, sort opaque** (unchanged).
3. **Build per-group BatchData and DEIC arrays.** One `BatchData` per group `(handle, layer, flags=0)`. One DEIC per group `(count = IndexCount, instanceCount = InstanceCount, firstIndex = FirstIndex, baseVertex = BaseVertex, baseInstance = FirstInstance)`. Indirect commands are laid out contiguously: opaque section first (sorted front-to-back), transparent section second. `_opaqueDrawCount` and `_transparentDrawCount` track section sizes; `_transparentByteOffset = _opaqueDrawCount * sizeof(DEIC)`.
4. **Three `glBufferData` uploads** to `_instanceSsbo`, `_batchSsbo`, `_indirectBuffer` (single buffer, both sections).
5. **Bind global VAO once** (preserved from N.4 — modern rendering shares one VAO).
6. **Bind SSBOs once** via `glBindBufferBase(SHADER_STORAGE_BUFFER, 0, _instanceSsbo)` and `... 1, _batchSsbo`.
7. **Opaque pass.** Set `uRenderPass = 0`. `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)0, drawcount=_opaqueDrawCount, stride=sizeof(DEIC))`.
8. **Transparent pass.** Set `uRenderPass = 1`. `glEnable(BLEND)` + `glBlendFunc(SrcAlpha, OneMinusSrcAlpha)` + `glDepthMask(false)`. `glMultiDrawElementsIndirect(Triangles, UnsignedShort, indirect=(void*)_transparentByteOffset, drawcount=_transparentDrawCount, stride=sizeof(DEIC))`.
9. **Restore state.** `glDepthMask(true)` + `glDisable(BLEND)` + `glBindVertexArray(0)`.
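Step 3 of the flow above is pure CPU work, so it can be sketched without a GL context. The following is a minimal sketch, not the dispatcher's actual code: `GroupSketch` and `IndirectBuilderSketch` are hypothetical names, and the `IsTransparent` flag stands in for the real `TranslucencyKind` classification. The command struct follows the standard five-field, 20-byte GL indirect command layout.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

// Mirrors GL's DrawElementsIndirectCommand: five 32-bit fields, 20 bytes, tightly packed.
[StructLayout(LayoutKind.Sequential)]
public struct DrawElementsIndirectCommand
{
    public uint Count;         // index count for this draw
    public uint InstanceCount;
    public uint FirstIndex;    // offset into the shared IBO, in indices
    public int BaseVertex;     // offset into the shared VBO, in vertices
    public uint BaseInstance;  // first slot in the instance SSBO
}

// Hypothetical, trimmed-down stand-in for InstanceGroup.
public sealed record GroupSketch(
    int IndexCount, uint FirstIndex, int BaseVertex,
    int FirstInstance, int InstanceCount,
    bool IsTransparent, float SortDistance);

public static class IndirectBuilderSketch
{
    public static (DrawElementsIndirectCommand[] Commands, int OpaqueCount,
                   int TransparentCount, int TransparentByteOffset)
        Build(IReadOnlyList<GroupSketch> groups)
    {
        // Skip empty groups. Opaque is sorted front-to-back (OrderBy is stable);
        // transparent keeps its incoming order.
        var opaque = groups.Where(g => g.InstanceCount > 0 && !g.IsTransparent)
                           .OrderBy(g => g.SortDistance)
                           .ToList();
        var transparent = groups.Where(g => g.InstanceCount > 0 && g.IsTransparent)
                                .ToList();

        // One contiguous array: opaque section first, transparent section second.
        var commands = opaque.Concat(transparent)
            .Select(g => new DrawElementsIndirectCommand
            {
                Count = (uint)g.IndexCount,
                InstanceCount = (uint)g.InstanceCount,
                FirstIndex = g.FirstIndex,
                BaseVertex = g.BaseVertex,
                BaseInstance = (uint)g.FirstInstance,
            })
            .ToArray();

        int stride = Marshal.SizeOf<DrawElementsIndirectCommand>();
        return (commands, opaque.Count, transparent.Count, opaque.Count * stride);
    }
}
```

The returned byte offset is what the transparent pass would pass as the `indirect` pointer, and the two counts become the `drawcount` arguments.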
Diagnostic timing (under `ACDREAM_WB_DIAG=1`):
- CPU: `Stopwatch` started at the top of `Draw()`, stopped at the bottom. Median + 95th-percentile flushed in the 5-second `[WB-DIAG]` rollup.
- GPU: two query objects created with `glGenQueries` (one for opaque, one for transparent), with `glBeginQuery(TIME_ELAPSED)` / `glEndQuery` around each `glMultiDrawElementsIndirect`. The result is polled with `GL_QUERY_RESULT_NO_WAIT` at the start of the next frame; if it is not ready, drop the sample and try again.
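The median and 95th-percentile rollup can be sketched as a small accumulator. This is an illustrative sketch only: `TimingRollupSketch` is a hypothetical name, and the nearest-rank percentile definition is an assumption, since the design does not say which percentile method `[WB-DIAG]` uses.

```csharp
using System;
using System.Collections.Generic;

public sealed class TimingRollupSketch
{
    private readonly List<double> _samplesUs = new();

    // Called once per frame with the measured CPU or GPU time in microseconds.
    public void Add(double microseconds) => _samplesUs.Add(microseconds);

    // Called at the end of each rollup window; returns the stats and resets.
    public (double Median, double P95) Flush()
    {
        if (_samplesUs.Count == 0) return (0, 0);

        _samplesUs.Sort();
        var result = (Percentile(_samplesUs, 50), Percentile(_samplesUs, 95));
        _samplesUs.Clear();
        return result;
    }

    // Nearest-rank percentile on an ascending-sorted list, using integer
    // arithmetic so the rank does not depend on floating-point rounding.
    private static double Percentile(List<double> sorted, int percent)
    {
        int rank = (sorted.Count * percent + 99) / 100; // ceil(count * percent / 100)
        return sorted[Math.Clamp(rank - 1, 0, sorted.Count - 1)];
    }
}
```

Dropped GPU samples (query result not ready) simply never reach `Add`, so they shrink the window's sample count without skewing the statistics.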
### 4.3 New shader files
`src/AcDream.App/Shaders/mesh_modern.vert`:
```glsl
#version 430 core
#extension GL_ARB_bindless_texture : require
#extension GL_ARB_shader_draw_parameters : require

layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;

struct InstanceData {
    mat4 transform;
    // Reserved for Phase B.4 follow-up (selection-blink retail-faithful highlight):
    //   vec4 highlightColor; // RGBA: when non-zero alpha, fragment shader mixes into output.
    // Add field here, increase stride to 80 bytes, and read at fragment via flat varying.
};

struct BatchData {
    uvec2 textureHandle; // bindless handle for sampler2DArray
    uint  textureLayer;  // layer index (always 0 for per-instance composites)
    uint  flags;         // reserved for future use
};

layout(std430, binding = 0) readonly buffer InstanceBuffer {
    InstanceData Instances[];
};

layout(std430, binding = 1) readonly buffer BatchBuffer {
    BatchData Batches[];
};

layout(std140, binding = 1) uniform LightingUbo {
    vec4 uAmbient;
    vec4 uSunDir;
    vec4 uSunColor;
    // matches existing acdream lighting UBO; do not change layout
};

uniform mat4 uViewProjection;
uniform int uRenderPass; // 0=opaque, 1=transparent (consumed in fragment shader)

out vec3 vNormal;
out vec2 vTexCoord;
out flat uvec2 vTextureHandle;
out flat uint vTextureLayer;

void main() {
    int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;
    mat4 model = Instances[instanceIndex].transform;

    vec4 worldPos = model * vec4(aPosition, 1.0);
    gl_Position = uViewProjection * worldPos;

    vNormal = normalize(mat3(model) * aNormal);
    vTexCoord = aTexCoord;

    BatchData b = Batches[gl_DrawIDARB];
    vTextureHandle = b.textureHandle;
    vTextureLayer = b.textureLayer;
}
```
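On the CPU side, each `BatchData` entry needs a struct whose bytes match the std430 layout above: a `uvec2` (8 bytes) followed by two `uint`s, 16 bytes per element with no padding. The sketch below is one way to mirror it. `BatchDataSketch` is a hypothetical name, not necessarily the type the dispatcher uses. Per the `ARB_bindless_texture` constructor rules, the `.x` component carries the low 32 bits of the handle and `.y` the high 32 bits.

```csharp
using System.Runtime.InteropServices;

// CPU mirror of the GLSL BatchData struct under std430 (16 bytes per element).
[StructLayout(LayoutKind.Sequential)]
public struct BatchDataSketch
{
    public uint HandleLo;     // textureHandle.x: low 32 bits of the bindless handle
    public uint HandleHi;     // textureHandle.y: high 32 bits of the bindless handle
    public uint TextureLayer; // always 0 in N.5
    public uint Flags;        // reserved

    public static BatchDataSketch From(ulong bindlessHandle, uint layer) => new()
    {
        HandleLo = (uint)(bindlessHandle & 0xFFFFFFFFu),
        HandleHi = (uint)(bindlessHandle >> 32),
        TextureLayer = layer,
        Flags = 0,
    };
}
```

Splitting the handle explicitly keeps the layout independent of host endianness; on a little-endian host a single `ulong` field would produce the same bytes.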
`src/AcDream.App/Shaders/mesh_modern.frag`:
```glsl
#version 430 core
#extension GL_ARB_bindless_texture : require

in vec3 vNormal;
in vec2 vTexCoord;
in flat uvec2 vTextureHandle;
in flat uint vTextureLayer;

layout(std140, binding = 1) uniform LightingUbo {
    vec4 uAmbient;
    vec4 uSunDir;
    vec4 uSunColor;
};

uniform int uRenderPass;

out vec4 FragColor;

void main() {
    sampler2DArray tex = sampler2DArray(vTextureHandle);
    vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer)));

    if (uRenderPass == 0) {
        // Opaque pass: discard soft pixels (alpha cutout), write to depth
        if (color.a < 0.95) discard;
    } else {
        // Transparent pass: discard hard pixels (already drawn opaque), no depth write
        if (color.a >= 0.95) discard;
        if (color.a < 0.05) discard; // skip near-empty fragments; saves work under heavy transparent overdraw
    }

    // Diffuse lighting (preserved from acdream's existing lighting model)
    vec3 N = normalize(vNormal);
    vec3 L = normalize(uSunDir.xyz);
    float diff = max(dot(N, L), 0.0);
    vec3 lit = uAmbient.rgb + uSunColor.rgb * diff;
    color.rgb *= clamp(lit, 0.0, 1.0);

    FragColor = color;
}
```
Differences from WB's `StaticObjectModern.*`:
- Drops `uActiveCells[]` cell-filtering (acdream culls cells on CPU).
- Drops `uDrawIDOffset` (acdream issues full passes, no pagination).
- Drops `uHighlightColor` (deferred to Phase B.4 follow-up; reserved as per-instance `highlightColor` field, not a global uniform).
- Adapts the lighting model to acdream's existing UBO at binding=1 instead of WB's `SceneData` UBO.
- Uses 1-layer `sampler2DArray` for ALL textures (WB uses multi-layer atlases — same shader works for both shapes).
---
## 5. Per-frame data flow walk-through
A concrete trace. Visible work for frame N:
| Group | GfxObj | Surface | Translucency | Instances |
|---|---|---|---|---|
| 0 | oak tree | bark | Opaque | 12 |
| 1 | oak tree | leaves | AlphaBlend | 12 |
| 2 | drudge | skin (palette override) | Opaque | 1 |
| 3 | drudge | eyes | Opaque | 1 |
**Instance SSBO** (binding=0), 26 entries (each batch contributes its own copy of the entity matrix):
```
[0..11] = oak instance matrices (group 0 — bark)
[12..23] = oak instance matrices (group 1 — leaves)
[24] = drudge instance matrix (group 2 — skin)
[25] = drudge instance matrix (group 3 — eyes)
```
**Batch SSBO** (binding=1), 4 entries indexed by `gl_DrawIDARB`:
```
Batches[0] = (oak_bark_handle, layer=0, flags=0)
Batches[1] = (oak_leaves_handle, layer=0, flags=0)
Batches[2] = (drudge_skin_handle_with_palette, layer=0, flags=0)
Batches[3] = (drudge_eyes_handle, layer=0, flags=0)
```
**Indirect buffer** (single buffer, two sections):
```
_indirectBuffer[0..2] = opaque section (3 entries, sorted front-to-back)
  [0] = (count=oakBarkIdx,    instanceCount=12, firstIndex=oakBarkFI,    baseVertex=oakBV,    baseInstance=0)
  [1] = (count=drudgeSkinIdx, instanceCount=1,  firstIndex=drudgeSkinFI, baseVertex=drudgeBV, baseInstance=24)
  [2] = (count=drudgeEyesIdx, instanceCount=1,  firstIndex=drudgeEyesFI, baseVertex=drudgeBV, baseInstance=25)
_indirectBuffer[3]    = transparent section (1 entry)
  [3] = (count=oakLeavesIdx,  instanceCount=12, firstIndex=oakLeavesFI,  baseVertex=oakBV,    baseInstance=12)

_opaqueDrawCount = 3; _transparentDrawCount = 1; _transparentByteOffset = 3 * sizeof(DEIC) = 60.
```
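The `baseInstance` values in this trace are a running sum of the per-group instance counts in matrix-layout order. A minimal sketch of that prefix sum follows; `InstanceLayoutSketch` is a hypothetical helper, not a type from the codebase.

```csharp
using System.Collections.Generic;

public static class InstanceLayoutSketch
{
    // Returns each group's FirstInstance plus the total number of matrices
    // that the instance SSBO must hold.
    public static (int[] FirstInstance, int Total) Layout(IReadOnlyList<int> instanceCounts)
    {
        var first = new int[instanceCounts.Count];
        int running = 0;
        for (int i = 0; i < instanceCounts.Count; i++)
        {
            first[i] = running;            // this group starts where the previous one ended
            running += instanceCounts[i];
        }
        return (first, running);
    }
}
```

Feeding it the counts from the table (12, 12, 1, 1) gives offsets 0, 12, 24, 25 and a total of 26, matching the trace. One thing the sketch does not settle: `gl_DrawIDARB` counts from zero inside each `glMultiDrawElementsIndirect` call, so the Batch SSBO order has to agree with the indirect-command order of each pass, and that mapping is outside this sketch.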
**Shader access pattern** (per vertex):
```glsl
int instanceIndex = gl_BaseInstanceARB + gl_InstanceID; // unique per (group, instance) pair
mat4 model = Instances[instanceIndex].transform;
BatchData b = Batches[gl_DrawIDARB]; // shared across all verts in this draw
sampler2DArray tex = sampler2DArray(b.textureHandle);
vec4 color = texture(tex, vec3(aTexCoord, float(b.textureLayer)));
```
**Per-frame CPU GL calls** (entity rendering, total):
- 3× `glBufferData` (instance SSBO, batch SSBO, indirect buffer).
- 1× `glBindVertexArray(globalVAO)`.
- 2× `glBindBufferBase` (SSBOs at bindings 0 + 1).
- 1× `glBindBuffer(DRAW_INDIRECT_BUFFER, _indirectBuffer)`.
- 2× `glMultiDrawElementsIndirect` (one opaque, one transparent).
- ~5 state changes (blend, depth mask, render pass uniform).
Total: ~14 GL calls per frame for entity rendering (3 + 1 + 2 + 1 + 2 + ~5), regardless of group count. The N.4 baseline is "a few hundred."
---
## 6. Translucent rendering detail
Per Decision 2: WB's two-pass alpha-test pattern.
**Group classification.** `ClassifyBatches` puts groups into one of two arrays:
- **Opaque indirect:** `TranslucencyKind.Opaque` and `TranslucencyKind.ClipMap`.
- **Transparent indirect:** `TranslucencyKind.AlphaBlend`, `Additive`, `InvAlpha` all merged. Per Decision 2, additive renders as alpha-blend; falsifiable at visual verification.
Opaque groups stay sorted front-to-back by `SortDistance` (preserved from N.4 — depth-test reject of overdrawn fragments is a meaningful win on dense scenes).
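The classification rule above reduces to a small mapping. The sketch below is illustrative only: the enum member names come from this section, but the declaration shown here (and its member order) is an assumption, and `RenderPassSketch` / `ClassifySketch` are hypothetical names rather than the real `ClassifyBatches`.

```csharp
using System;

// Stand-in declaration; the real TranslucencyKind lives elsewhere in the codebase.
public enum TranslucencyKind { Opaque, ClipMap, AlphaBlend, Additive, InvAlpha }

// Values match the uRenderPass uniform: 0 = opaque, 1 = transparent.
public enum RenderPassSketch { Opaque = 0, Transparent = 1 }

public static class ClassifySketch
{
    public static RenderPassSketch PassFor(TranslucencyKind kind) => kind switch
    {
        // Alpha-tested surfaces still write depth, so they ride the opaque pass.
        TranslucencyKind.Opaque or TranslucencyKind.ClipMap
            => RenderPassSketch.Opaque,

        // All blended kinds are merged; additive renders as alpha-blend in N.5.
        TranslucencyKind.AlphaBlend or TranslucencyKind.Additive or TranslucencyKind.InvAlpha
            => RenderPassSketch.Transparent,

        _ => throw new ArgumentOutOfRangeException(nameof(kind), kind, "unknown translucency kind"),
    };
}
```

If the additive fallback plan is ever triggered, this mapping is the place where `Additive` would split off into its own bucket.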
**Pass GL state:**
```csharp
// Opaque pass
_gl.Disable(EnableCap.Blend);
_gl.DepthMask(true);
_gl.Enable(EnableCap.CullFace);
_gl.CullFace(TriangleFace.Back);
_gl.FrontFace(FrontFaceDirection.Ccw);
_shader.SetInt("uRenderPass", 0);
_gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
    indirect: (void*)0, drawcount: _opaqueDrawCount, stride: (uint)sizeof(DEIC));

// Transparent pass
_gl.Enable(EnableCap.Blend);
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
_gl.DepthMask(false);
_shader.SetInt("uRenderPass", 1);
_gl.MultiDrawElementsIndirect(PrimitiveType.Triangles, DrawElementsType.UnsignedShort,
    indirect: (void*)_transparentByteOffset, drawcount: _transparentDrawCount, stride: (uint)sizeof(DEIC));

// Cleanup
_gl.DepthMask(true);
_gl.Disable(EnableCap.Blend);
_gl.BindVertexArray(0);
```
**Visual verification gate (additive fallback plan).** During Week 2-3 visual verification, look at:
- Holtburg courtyard, dungeon entrance — confirm scenery + characters identical.
- Foundry interior — magic-themed content with potentially additive-flagged surfaces.
- Any glowing weapon decals, magical aura effects, or self-luminous textures observed.
If a visible regression appears (faded glow, missing additive bloom): amend spec to add a third indirect call within the transparent pass with `glBlendFunc(SrcAlpha, One)`. Group classification splits Additive into its own bucket. ~30-min change.
---
## 7. Error handling and fallback
### 7.1 GPU capability detection
WB's `OpenGLGraphicsDevice` already detects:
- `HasOpenGL43` (required for SSBOs and multi-draw indirect).
- `HasBindless` (required for bindless texture handles).
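`WbDrawDispatcher` was originally constructed only when `WbFoundationFlag.Enabled` was true, which gated on `_useModernRendering = HasOpenGL43 && HasBindless`. As of the N.5 ship amendment the flag is deleted and the dispatcher is always constructed.
**Additional check:** `GL_ARB_shader_draw_parameters` (for `gl_BaseInstanceARB`, `gl_DrawIDARB`). Core in GL 4.6, available as an extension on 4.3+. It is part of N.5's capability check. ~~If missing, `WbDrawDispatcher` constructor logs a one-time warning and the foundation flag flips off (falls back to `InstancedMeshRenderer`).~~ **Amended at N.5 ship:** a missing `GL_ARB_bindless_texture` or `GL_ARB_shader_draw_parameters` throws `NotSupportedException` at startup; there is no legacy fallback.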
### 7.2 Shader compile failure
If `mesh_modern.vert/.frag` fails to compile (driver bug, GLSL version mismatch, extension issue): catch the compile exception in the `WbDrawDispatcher` constructor, log the GLSL info log + GPU vendor/renderer string ONCE, flip `WbFoundationFlag.Enabled = false` for the session, fall back to `InstancedMeshRenderer`. Do not crash. **Amended at N.5 ship:** `WbFoundationFlag` and `InstancedMeshRenderer` were deleted, so this fallback no longer exists.
### 7.3 Non-resident handle (the bindless foot-gun)
Sampling a non-resident handle causes undefined behavior (driver-dependent: black texture, GPU fault, device-lost).
Mitigation in code: `TextureCache.MakeResidentHandle` is the only API that produces a handle, and it makes the handle resident in the same call. There is no API surface that produces a non-resident handle. Defense-in-depth: dispatcher asserts `BindlessTextureHandle != 0` before queuing a draw (zero handles get filtered out, same as zero `surfaceId` does today).
### 7.4 Indirect command corruption
`count`, `firstIndex`, and `baseVertex` come from WB's `ObjectRenderBatch` (never user input; correctness is WB-internal). `instanceCount` is `grp.Matrices.Count` and `baseInstance` is `grp.FirstInstance`, computed cumulatively; both are under our control. The bug class is "WB-internal corruption + our cumulative-offset bug", the same surface area N.4's `BaseInstance` already trusts. Add a debug-build assertion: cumulative `baseInstance` values must be strictly increasing in matrix-layout order (the indirect buffer is re-sorted per pass, so the check does not apply to command order).
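A minimal sketch of such a debug-build check follows. `BaseInstanceCheckSketch` is a hypothetical helper, and it assumes the offsets are passed in matrix-layout order with empty groups already skipped.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

public static class BaseInstanceCheckSketch
{
    // True when every offset is greater than the one before it, which holds
    // for a running sum over groups that each contain at least one instance.
    public static bool IsStrictlyIncreasing(IReadOnlyList<int> firstInstances)
    {
        for (int i = 1; i < firstInstances.Count; i++)
        {
            if (firstInstances[i] <= firstInstances[i - 1])
                return false;
        }
        return true;
    }

    // Compiled out of release builds, like Debug.Assert itself.
    [Conditional("DEBUG")]
    public static void AssertLayout(IReadOnlyList<int> firstInstances) =>
        Debug.Assert(IsStrictlyIncreasing(firstInstances),
            "FirstInstance offsets must be strictly increasing in layout order");
}
```

A repeated offset would mean two groups share an instance slot, which is exactly the cumulative-offset bug this section is worried about.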
### 7.5 Disposal order
Bindless handles must be made non-resident before their underlying textures are deleted (driver UB otherwise). `TextureCache.Dispose` does this:
1. Iterate `_bindlessHandlesByGlName.Values`, call `glMakeTextureHandleNonResidentARB(handle)`.
2. Call `_glExtensions.MakeAllNonResidentARB` if available (some drivers prefer batch).
3. Then `glDeleteTextures` proceeds as today.
Dispatcher's own buffer cleanup (`_instanceSsbo`, `_batchSsbo`, `_indirectBuffer`) via `glDeleteBuffers`.
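The ordering constraint can be shown without a GL context by injecting the two GL operations as delegates. This is a sketch under that assumption: `BindlessTeardownSketch` is a hypothetical class, and the delegates stand in for `glMakeTextureHandleNonResidentARB` and a per-name `glDeleteTextures` call.

```csharp
using System;
using System.Collections.Generic;

public sealed class BindlessTeardownSketch : IDisposable
{
    private readonly Dictionary<uint, ulong> _handlesByGlName = new();
    private readonly Action<ulong> _makeNonResident; // stand-in for glMakeTextureHandleNonResidentARB
    private readonly Action<uint> _deleteTexture;    // stand-in for glDeleteTextures (one name at a time)

    public BindlessTeardownSketch(Action<ulong> makeNonResident, Action<uint> deleteTexture)
    {
        _makeNonResident = makeNonResident;
        _deleteTexture = deleteTexture;
    }

    public void Track(uint glName, ulong handle) => _handlesByGlName[glName] = handle;

    public void Dispose()
    {
        // Phase 1: release residency for every handle while all textures still exist.
        foreach (ulong handle in _handlesByGlName.Values)
            _makeNonResident(handle);

        // Phase 2: only now delete the underlying texture objects.
        foreach (uint glName in _handlesByGlName.Keys)
            _deleteTexture(glName);

        _handlesByGlName.Clear();
    }
}
```

Keeping the two phases as separate loops makes the invariant easy to test: no delete may be observed before the last non-resident call.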
### 7.6 Persistent first-failure diagnostic
If shader compile fails OR an extension check fails OR `glMultiDrawElementsIndirect` raises `GL_INVALID_OPERATION` on the first frame: log ONCE with the GPU vendor/renderer string + GLSL info log. Don't spam. The user pastes the line into a bug report; we know exactly where to look.
---
## 8. Testing and acceptance
### 8.1 Unit / conformance tests
- **`TextureCacheBindlessTests`** — for each `Bindless`-suffixed `GetOrUpload*`: returns non-zero `ulong`, returns same handle for same key (cache hit), distinct keys yield distinct handles, returned handle is resident per GL state query.
- **`WbDrawDispatcherIndirectBuilderTests`** — pure CPU test: given a fixture of `(entity, mesh, batch)` tuples, verify the indirect buffer layout: `count` / `firstIndex` / `baseVertex` / `baseInstance` per group, opaque section sorted front-to-back, transparent section in classification order (no sort — back-to-front sort can be added in a follow-up if measured useful).
- **`WbDrawDispatcherTranslucencyTests`** — verify groups land in correct indirect buffer (opaque vs transparent) per `TranslucencyKind`. `Additive`/`InvAlpha` go to transparent. `ClipMap` goes to opaque. Empty groups skipped.
- **Existing N.4 tests stay green.** All 60 tests captured by `FullyQualifiedName~Wb|MatrixComposition` filter remain at 60/0.
### 8.2 Visual verification
Same gate as N.4 used. Live ACE + retail dat, in-world testing.
- **Holtburg courtyard** — characters + scenery + buildings render identically to N.4. No missing entities, no z-fighting, no exploded parts.
- **Foundry interior** — dense static-object scene, stress-tests indirect call count and translucency classification.
- **Indoor → outdoor cell transition** — confirms cell visibility filtering still works (we cull on CPU; dispatcher should never see invisible-cell entities).
- **Drudge / character close-up** — confirms Issue #47 close-detail mesh preservation.
- **Magic content (additive fallback check)** — Foundry runes, glowing weapons if observable, boss models with luminous decals. Trigger spec amendment if regression spotted.
The user confirms each one. These are visual identity checks against the running N.4 behavior (use `git stash` of N.5 changes + relaunch as the comparison baseline).
### 8.3 Perf measurement (the win gate)
`[WB-DIAG]` augmented:
```
[WB-DIAG] entSeen=N entDrawn=M ... drawsIssued=K groups=G (existing)
[WB-DIAG] cpu_us=Xmedian/Y95p gpu_us=Zmedian/W95p (new)
```
Capture before/after numbers in fixed scenes/cameras:
| Scene | Camera position | Metric |
|---|---|---|
| Holtburg courtyard | 30m elevated, looking SW | `cpu`, `gpu`, `drawsIssued` |
| Foundry interior | character spawn, default heading | `cpu`, `gpu`, `drawsIssued` |
| Open landscape | terrain wander, no entities | `cpu`, `gpu`, `drawsIssued` (sanity) |
**Acceptance gates** (paste into SHIP commit message):
- Visual identity to N.4 — confirmed via §8.2.
- CPU dispatcher time ≤ 70% of N.4 in Holtburg courtyard (target: ≥30% reduction).
- GPU rendering time within ±10% of N.4 (sanity: no regression).
- `drawsIssued ≤ 5 per pass` (down from "few hundred per pass").
- All tests green — 60+ Wb tests + new bindless/indirect tests.
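- ~~`ACDREAM_USE_WB_FOUNDATION=0` still works: `InstancedMeshRenderer` fallback runs and renders correctly.~~ **Dropped in N.5 ship amendment:** `WbFoundationFlag` and `InstancedMeshRenderer` were deleted, so there is no fallback left to verify.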
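The three numeric gates above can be expressed as one predicate. This is a hedged sketch: `FrameStats` and `AcceptanceGateSketch` are hypothetical names, and it reads "within ±10%" literally, so a GPU time more than 10% faster than N.4 would also fail and prompt a second look.

```csharp
using System;

// Median timings in microseconds plus the per-pass draw count from [WB-DIAG].
public readonly record struct FrameStats(double CpuUs, double GpuUs, int DrawsIssuedPerPass);

public static class AcceptanceGateSketch
{
    public static bool Passes(FrameStats n4Baseline, FrameStats n5)
    {
        bool cpuOk = n5.CpuUs <= 0.70 * n4Baseline.CpuUs;                                  // at least 30% CPU reduction
        bool gpuOk = Math.Abs(n5.GpuUs - n4Baseline.GpuUs) <= 0.10 * n4Baseline.GpuUs;     // GPU within ±10%
        bool drawsOk = n5.DrawsIssuedPerPass <= 5;                                         // indirect calls per pass
        return cpuOk && gpuOk && drawsOk;
    }
}
```

The visual-identity and test-suite gates are not numeric and stay manual.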
### 8.4 Long-session sanity check
Hour-long session with `ACDREAM_WB_DIAG=1`. Watch resident-handle count grow. Expected: bounded plateau under 5K once content set is fully traversed. If unbounded growth, residency policy revisit required in N.6.
---
## 9. Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Driver bug in bindless residency | Low (mature in 2025+ drivers) | Crash / black textures | One-time logging on first failure; legacy fallback under flag-off |
| Driver bug in `glMultiDrawElementsIndirect` | Low | GL_INVALID_OPERATION | Capability check + first-failure logging + fallback |
| Resident handle count exceeds driver limit in long session | Low (acdream content is bounded) | Cumulative GPU memory pressure → eventual eviction surprises | `[WB-DIAG]` resident-count log; revisit eviction in N.6 if it grows unbounded |
| Shader compile fails on weird GPU | Medium-low | First-launch failure | Compile-error catch + fallback to `InstancedMeshRenderer` |
| Additive fidelity regression on rare GfxObj surfaces | Medium | Subtle visual difference | Visual verification at magic-themed content; spec amendment for additive sub-pass if found |
| `baseInstance` not offsetting per-instance vertex attribs we still use | Low (we drop attribs entirely) | Wrong matrices | All instance data via SSBO; no vertex attrib at locations 3-6 to misalign |
| SSBO indexing GPU cost worse than uniform-array | Low (well-optimized in modern drivers) | Possible GPU time regression | GL timer queries detect; if observed, fall back to uniform array of bounded size |
| Persistent-mapped buffer foot-guns (chosen NOT to use in N.5) | n/a | n/a | Decision 7 defers to N.6 |
| Per-instance highlight (selection blink) feature creep | Low | Scope grows | Decision 8 defers; field reserved in design doc |
---
## 10. Out of scope (explicitly)
The following are NOT N.5 work. They become possible follow-ons.
- **WB's `TextureAtlasManager` adoption for atlas tier.** N.5 keeps acdream's `TextureCache` as the texture owner for everything. Atlas adoption is N.6+ if memory pressure shows up.
- **Persistent-mapped buffer ring with sync fences.** Decision 7. N.6 candidate if profiling shows residual `glBufferData` cost.
- **GPU-side culling (compute pre-pass).** Future phase.
- **Texture array repacking for multi-layer per-instance composites.** Future, if many palette-overrides actually share dimensions and could be packed.
- **Selection-blink highlight color.** Decision 8. Phase B.4 follow-up. Field reserved in `InstanceData` design (extend stride to 80 bytes when implementing).
- ~~**Deletion of legacy `InstancedMeshRenderer`.** N.6.~~ **Done in N.5 ship amendment:** `InstancedMeshRenderer`, `StaticMeshRenderer`, and `WbFoundationFlag` were deleted in the retirement commit.
- **Terrain wiring through WB.** Future.
---
## 11. Open questions
None outstanding. All 8 brainstorm questions resolved + 1 clarification on highlight semantics. Ready for plan.
---
*End of design.*

View file

@ -14,6 +14,7 @@
</ItemGroup>
<ItemGroup>
<PackageReference Include="Silk.NET.OpenGL" Version="2.23.0" />
<PackageReference Include="Silk.NET.OpenGL.Extensions.ARB" Version="2.23.0" />
<PackageReference Include="Silk.NET.Windowing" Version="2.23.0" />
<PackageReference Include="Silk.NET.Input" Version="2.23.0" />
<PackageReference Include="Silk.NET.OpenAL" Version="2.23.0" />

View file

@ -25,14 +25,17 @@ public sealed class GameWindow : IDisposable
private DatCollection? _dats;
private float _lastMouseX;
private float _lastMouseY;
private InstancedMeshRenderer? _staticMesh;
private Shader? _meshShader;
private TextureCache? _textureCache;
/// <summary>Phase N.4: WB-backed rendering pipeline adapter. Non-null only
/// when <c>ACDREAM_USE_WB_FOUNDATION=1</c> is set; null otherwise.</summary>
/// <summary>Phase N.4+: WB-backed rendering pipeline adapter. Always non-null
/// after <c>OnLoad</c> completes (modern path is mandatory as of N.5).</summary>
private AcDream.App.Rendering.Wb.WbMeshAdapter? _wbMeshAdapter;
private AcDream.App.Rendering.Wb.EntitySpawnAdapter? _wbEntitySpawnAdapter;
private AcDream.App.Rendering.Wb.WbDrawDispatcher? _wbDrawDispatcher;
/// <summary>Phase N.5: ARB_bindless_texture + ARB_shader_draw_parameters
/// support. Required at startup — missing bindless throws
/// <see cref="NotSupportedException"/> in <c>OnLoad</c>.</summary>
private AcDream.App.Rendering.Wb.BindlessSupport? _bindlessSupport;
private SamplerCache? _samplerCache;
private DebugLineRenderer? _debugLines;
// K-fix4 (2026-04-26): default OFF. The orange BSP / green cylinder
@ -966,10 +969,6 @@ public sealed class GameWindow : IDisposable
Path.Combine(shadersDir, "terrain.vert"),
Path.Combine(shadersDir, "terrain.frag"));
_meshShader = new Shader(_gl,
Path.Combine(shadersDir, "mesh_instanced.vert"),
Path.Combine(shadersDir, "mesh_instanced.frag"));
// Phase G.1/G.2: shared scene-lighting UBO. Stays bound at
// binding=1 for the lifetime of the process — every shader that
// declares `layout(std140, binding = 1) uniform SceneLighting`
@ -1419,7 +1418,43 @@ public sealed class GameWindow : IDisposable
_heightTable = heightTable;
_surfaceCache = new Dictionary<uint, AcDream.Core.Terrain.SurfaceInfo>();
_textureCache = new TextureCache(_gl, _dats);
// N.5: detect ARB_bindless_texture + ARB_shader_draw_parameters.
// The modern path (SSBO + glMultiDrawElementsIndirect + bindless textures)
// is mandatory as of Phase N.5 — missing extensions throw at startup with
// a clear error so users can file a real bug report rather than silently
// falling back to a half-working renderer.
if (AcDream.App.Rendering.Wb.BindlessSupport.TryCreate(_gl, out var bindless))
{
if (bindless!.HasShaderDrawParameters(_gl))
{
_bindlessSupport = bindless;
Console.WriteLine("[N.5] modern path capabilities present (bindless + ARB_shader_draw_parameters)");
}
else
{
Console.WriteLine("[N.5] GL_ARB_shader_draw_parameters not present — modern path not available");
}
}
else
{
Console.WriteLine("[N.5] GL_ARB_bindless_texture not present — modern path not available");
}
if (_bindlessSupport is null)
{
throw new NotSupportedException(
"acdream requires GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters " +
"(GL 4.3+ with bindless support). Your GPU/driver does not expose these extensions. " +
"If this is unexpected, please file a bug report with your GPU vendor + driver version.");
}
// Mesh shader always loads (modern path is the only path).
_meshShader = new Shader(_gl,
Path.Combine(shadersDir, "mesh_modern.vert"),
Path.Combine(shadersDir, "mesh_modern.frag"));
Console.WriteLine("[N.5] mesh_modern shader loaded");
_textureCache = new TextureCache(_gl, _dats, _bindlessSupport);
// Two persistent GL sampler objects (Repeat + ClampToEdge) so
// the sky pass can pick wrap mode per submesh without mutating
// shared per-texture wrap state. See SamplerCache + the
@ -1427,17 +1462,14 @@ public sealed class GameWindow : IDisposable
// references/WorldBuilder/Chorizite.OpenGLSDLBackend/OpenGLGraphicsDevice.cs:115-132.
_samplerCache = new SamplerCache(_gl);
// Phase N.4 — WB rendering pipeline foundation. Constructed only when
// ACDREAM_USE_WB_FOUNDATION=1 is set; otherwise the legacy renderer
// path stays in charge. The full ObjectMeshManager bring-up lives in
// WbMeshAdapter (Task 9): OpenGLGraphicsDevice + DefaultDatReaderWriter
// + ObjectMeshManager. WbMeshAdapter opens its own file handles for
// the dat files (independent of our DatCollection).
if (AcDream.App.Rendering.Wb.WbFoundationFlag.IsEnabled)
// Phase N.4+N.5 — WB rendering pipeline foundation. The modern path is
// mandatory as of N.5 ship amendment: WbMeshAdapter + WbDrawDispatcher
// always construct. WbMeshAdapter owns ObjectMeshManager and opens its
// own file handles for the dat files (independent of our DatCollection).
{
var wbLogger = Microsoft.Extensions.Logging.Abstractions.NullLogger<AcDream.App.Rendering.Wb.WbMeshAdapter>.Instance;
_wbMeshAdapter = new AcDream.App.Rendering.Wb.WbMeshAdapter(_gl, _datDir, _dats, wbLogger);
Console.WriteLine("[N.4] WbFoundation flag is ENABLED — routing static content through ObjectMeshManager.");
Console.WriteLine("[N.4+N.5] WB foundation + modern path active — routing all content through ObjectMeshManager.");
}
// Phase N.4 Task 12: construct LandblockSpawnAdapter under the feature flag
@ -1446,60 +1478,51 @@ public sealed class GameWindow : IDisposable
// one that carries the adapter so AddLandblock/RemoveLandblock notify WB.
// Phase N.4 Task 17: also construct EntitySpawnAdapter for server-spawned
// per-instance content under the same flag.
// N.5 mandatory path: spawn adapters + dispatcher always construct.
// _wbMeshAdapter, _meshShader, _textureCache, and _bindlessSupport are
// all guaranteed non-null here (startup throws above if any are missing).
{
AcDream.App.Rendering.Wb.LandblockSpawnAdapter? wbSpawnAdapter = null;
AcDream.App.Rendering.Wb.EntitySpawnAdapter? wbEntitySpawnAdapter = null;
if (AcDream.App.Rendering.Wb.WbFoundationFlag.IsEnabled && _wbMeshAdapter is not null)
var wbSpawnAdapter = new AcDream.App.Rendering.Wb.LandblockSpawnAdapter(_wbMeshAdapter!);
// Sequencer factory: look up Setup + MotionTable from dats and build
// an AnimationSequencer. Falls back to a no-op sequencer when the
// entity has no motion table (static props, etc.). Uses _animLoader
// which is initialised earlier in OnLoad; it is non-null here.
var capturedDats = _dats;
var capturedAnimLoader = _animLoader;
AcDream.Core.Physics.AnimationSequencer SequencerFactory(AcDream.Core.World.WorldEntity e)
{
wbSpawnAdapter = new AcDream.App.Rendering.Wb.LandblockSpawnAdapter(_wbMeshAdapter);
// Sequencer factory: look up Setup + MotionTable from dats and build
// an AnimationSequencer. Falls back to a no-op sequencer when the
// entity has no motion table (static props, etc.). Uses _animLoader
// which is initialised at line 1004; it is non-null here because
// OnLoad wires _dats + _animLoader before this block runs.
var capturedDats = _dats;
var capturedAnimLoader = _animLoader;
AcDream.Core.Physics.AnimationSequencer SequencerFactory(AcDream.Core.World.WorldEntity e)
if (capturedDats is not null && capturedAnimLoader is not null)
{
if (capturedDats is not null && capturedAnimLoader is not null)
var setup = capturedDats.Get<DatReaderWriter.DBObjs.Setup>(e.SourceGfxObjOrSetupId);
if (setup is not null)
{
var setup = capturedDats.Get<DatReaderWriter.DBObjs.Setup>(e.SourceGfxObjOrSetupId);
if (setup is not null)
uint mtableId = (uint)setup.DefaultMotionTable;
if (mtableId != 0)
{
uint mtableId = (uint)setup.DefaultMotionTable;
if (mtableId != 0)
{
var mtable = capturedDats.Get<DatReaderWriter.DBObjs.MotionTable>(mtableId);
if (mtable is not null)
return new AcDream.Core.Physics.AnimationSequencer(setup, mtable, capturedAnimLoader);
}
// Setup exists but no motion table — no-op sequencer.
return new AcDream.Core.Physics.AnimationSequencer(
setup,
new DatReaderWriter.DBObjs.MotionTable(),
capturedAnimLoader);
var mtable = capturedDats.Get<DatReaderWriter.DBObjs.MotionTable>(mtableId);
if (mtable is not null)
return new AcDream.Core.Physics.AnimationSequencer(setup, mtable, capturedAnimLoader);
}
// Setup exists but no motion table — no-op sequencer.
return new AcDream.Core.Physics.AnimationSequencer(
setup,
new DatReaderWriter.DBObjs.MotionTable(),
capturedAnimLoader);
}
// Complete fallback: empty setup + empty motion table + null loader.
return new AcDream.Core.Physics.AnimationSequencer(
new DatReaderWriter.DBObjs.Setup(),
new DatReaderWriter.DBObjs.MotionTable(),
new NullAnimLoader());
}
wbEntitySpawnAdapter = new AcDream.App.Rendering.Wb.EntitySpawnAdapter(
_textureCache, SequencerFactory, _wbMeshAdapter);
_wbEntitySpawnAdapter = wbEntitySpawnAdapter;
// Complete fallback: empty setup + empty motion table + null loader.
return new AcDream.Core.Physics.AnimationSequencer(
new DatReaderWriter.DBObjs.Setup(),
new DatReaderWriter.DBObjs.MotionTable(),
new NullAnimLoader());
}
var wbEntitySpawnAdapter = new AcDream.App.Rendering.Wb.EntitySpawnAdapter(
_textureCache!, SequencerFactory, _wbMeshAdapter!);
_wbEntitySpawnAdapter = wbEntitySpawnAdapter;
_worldState = new AcDream.App.Streaming.GpuWorldState(wbSpawnAdapter, wbEntitySpawnAdapter);
}
_staticMesh = new InstancedMeshRenderer(_gl, _meshShader, _textureCache, _wbMeshAdapter);
if (AcDream.App.Rendering.Wb.WbFoundationFlag.IsEnabled
&& _wbMeshAdapter is not null && _wbEntitySpawnAdapter is not null)
{
_wbDrawDispatcher = new AcDream.App.Rendering.Wb.WbDrawDispatcher(
_gl, _meshShader, _textureCache, _wbMeshAdapter, _wbEntitySpawnAdapter);
_gl, _meshShader!, _textureCache!, _wbMeshAdapter!, _wbEntitySpawnAdapter, _bindlessSupport!);
}
// Phase G.1 sky renderer — its own shader (sky.vert / sky.frag)
@ -1509,7 +1532,7 @@ public sealed class GameWindow : IDisposable
Path.Combine(shadersDir, "sky.vert"),
Path.Combine(shadersDir, "sky.frag"));
_skyRenderer = new AcDream.App.Rendering.Sky.SkyRenderer(
_gl, _dats, skyShader, _textureCache, _samplerCache);
_gl, _dats, skyShader, _textureCache!, _samplerCache);
// Phase G.1 particle renderer — renders rain / snow / spell auras
// spawned into the shared ParticleSystem as billboard quads.
@ -2025,7 +2048,7 @@ public sealed class GameWindow : IDisposable
}
}
if (_dats is null || _staticMesh is null) return;
if (_dats is null) return;
if (spawn.Position is null || spawn.SetupTableId is null)
{
// Can't place a mesh without both. Most of these are inventory
@ -2360,10 +2383,9 @@ public sealed class GameWindow : IDisposable
continue;
}
_physicsDataCache.CacheGfxObj(mr.GfxObjId, gfx);
var subMeshes = AcDream.Core.Meshing.GfxObjMesh.Build(gfx, _dats);
_staticMesh.EnsureUploaded(mr.GfxObjId, subMeshes);
if (dumpClothing)
{
var subMeshes = AcDream.Core.Meshing.GfxObjMesh.Build(gfx, _dats);
int tris = 0; int subs = 0;
foreach (var sm in subMeshes) { tris += sm.Indices.Length / 3; subs++; }
dumpClothingTotalTris += tris;
@ -5194,44 +5216,25 @@ public sealed class GameWindow : IDisposable
portalPlanes, origin.X, origin.Y);
}
// Upload every GfxObj referenced by this landblock's entities.
// EnsureUploaded is idempotent so duplicates across landblocks are free.
if (_staticMesh is not null)
// N.5: WbMeshAdapter.Tick() handles GPU upload for all GfxObj meshes via
// ObjectMeshManager.PrepareMeshDataAsync. The legacy EnsureUploaded loop
// (and _pendingCellMeshes drain) are retired with InstancedMeshRenderer.
// Cache GfxObj physics data (BSP trees) for the physics engine — this
// loop is physics-only, not renderer-side.
foreach (var entity in lb.Entities)
{
// Task 8: drain any pending EnvCell room-mesh sub-meshes first.
// The worker thread pre-built these CPU-side and stored them in
// _pendingCellMeshes. We must upload them here (render thread) before
// the per-MeshRef loop below tries to look them up via GfxObjMesh.Build,
// which would fail because EnvCell ids (0xAAAA01xx) aren't real GfxObj
// dat ids. EnsureUploaded is idempotent so calling it here then seeing
// the same id again in the loop below is safe.
foreach (var entity in lb.Entities)
foreach (var meshRef in entity.MeshRefs)
{
foreach (var meshRef in entity.MeshRefs)
{
if (_pendingCellMeshes.TryRemove(meshRef.GfxObjId, out var cellSubMeshes))
_staticMesh.EnsureUploaded(meshRef.GfxObjId, cellSubMeshes);
}
}
// Now upload regular GfxObj sub-meshes (stabs, scenery, interior stabs).
// Skip any ids already uploaded (includes the cell meshes just drained).
foreach (var entity in lb.Entities)
{
foreach (var meshRef in entity.MeshRefs)
{
// Skip EnvCell synthetic ids — already handled above (or already
// uploaded on a prior tick). GfxObj ids are 0x01xxxxxx; Setup ids
// are 0x02xxxxxx; anything else is not a GfxObj dat record.
if ((meshRef.GfxObjId & 0xFF000000u) != 0x01000000u) continue;
var gfx = _dats.Get<DatReaderWriter.DBObjs.GfxObj>(meshRef.GfxObjId);
if (gfx is null) continue;
_physicsDataCache.CacheGfxObj(meshRef.GfxObjId, gfx);
var subMeshes = AcDream.Core.Meshing.GfxObjMesh.Build(gfx, _dats);
_staticMesh.EnsureUploaded(meshRef.GfxObjId, subMeshes);
}
if ((meshRef.GfxObjId & 0xFF000000u) != 0x01000000u) continue;
var gfx = _dats.Get<DatReaderWriter.DBObjs.GfxObj>(meshRef.GfxObjId);
if (gfx is null) continue;
_physicsDataCache.CacheGfxObj(meshRef.GfxObjId, gfx);
}
}
// Drain _pendingCellMeshes to prevent unbounded accumulation.
// The data is no longer consumed (WB handles EnvCell geometry through
// its own pipeline), but the worker thread still populates this dict.
_pendingCellMeshes.Clear();
// Task 7: register static entities into the ShadowObjectRegistry so the
// Transition system can find and collide against them during movement.
@ -6336,20 +6339,11 @@ public sealed class GameWindow : IDisposable
animatedIds.Add(k);
}
if (_wbDrawDispatcher is not null)
{
_wbDrawDispatcher.Draw(camera, _worldState.LandblockEntries, frustum,
neverCullLandblockId: playerLb,
visibleCellIds: visibility?.VisibleCellIds,
animatedEntityIds: animatedIds);
}
else
{
_staticMesh?.Draw(camera, _worldState.LandblockEntries, frustum,
neverCullLandblockId: playerLb,
visibleCellIds: visibility?.VisibleCellIds,
animatedEntityIds: animatedIds);
}
// N.5: WbDrawDispatcher is always non-null (modern path mandatory).
_wbDrawDispatcher!.Draw(camera, _worldState.LandblockEntries, frustum,
neverCullLandblockId: playerLb,
visibleCellIds: visibility?.VisibleCellIds,
animatedEntityIds: animatedIds);
// Phase G.1 / E.3: draw all live particles after opaque
// scene geometry so alpha blending composites correctly.
@ -8731,11 +8725,10 @@ public sealed class GameWindow : IDisposable
_liveSession?.Dispose();
_audioEngine?.Dispose(); // Phase E.2: stop all voices, close AL context
_wbDrawDispatcher?.Dispose();
_staticMesh?.Dispose();
_skyRenderer?.Dispose(); // depends on sampler cache; dispose first
_samplerCache?.Dispose();
_textureCache?.Dispose();
_wbMeshAdapter?.Dispose(); // Phase N.4 WB foundation — null when flag off
_wbMeshAdapter?.Dispose(); // Phase N.4+N.5 WB foundation (mandatory modern path)
_meshShader?.Dispose();
_terrain?.Dispose();
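
The physics-only loop above filters mesh ids by their high byte. A minimal sketch of that filter, assuming only what the comments in this diff state (GfxObj ids are 0x01xxxxxx, Setup ids are 0x02xxxxxx, anything else is not a GfxObj dat record). `DatIdKind` and `DatIdClassifier` are hypothetical names used for illustration and do not exist in the repository:

```csharp
using System;

// Hypothetical helper, not part of the acdream codebase.
public enum DatIdKind { GfxObj, Setup, Other }

public static class DatIdClassifier
{
    // The top byte of the id selects the record type:
    //   0x01xxxxxx = GfxObj, 0x02xxxxxx = Setup.
    // Synthetic EnvCell ids such as 0xAAAA01xx fall through to Other.
    public static DatIdKind Classify(uint id) => (id & 0xFF000000u) switch
    {
        0x01000000u => DatIdKind.GfxObj,
        0x02000000u => DatIdKind.Setup,
        _ => DatIdKind.Other,
    };
}
```

With a helper like this, the `continue` in the loop would read `if (DatIdClassifier.Classify(meshRef.GfxObjId) != DatIdKind.GfxObj) continue;`, which makes it explicit that synthetic EnvCell ids are skipped before the dat lookup.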


@ -1,596 +0,0 @@
// src/AcDream.App/Rendering/InstancedMeshRenderer.cs
//
// True instanced rendering for static-object meshes.
// Groups entities by GfxObjId. All instance model matrices are written into
// a single shared instance VBO once per frame. Each sub-mesh is drawn with
// DrawElementsInstanced — one GL draw call per (GfxObj × sub-mesh) instead
// of one per entity. For a scene with N unique GfxObjs and M total entities
// this reduces draw calls from M*subMeshes to N*subMeshes.
//
// Matrix layout:
// System.Numerics.Matrix4x4 is row-major. Written to the float[] buffer in
// natural memory order (M11..M44). The GLSL shader reads 4 vec4 attributes
// (aInstanceRow0-3) and constructs mat4(row0, row1, row2, row3). Because
// GLSL mat4() takes column vectors, the rows of the C# matrix become the
// columns of the GLSL mat4 — which is the same transpose that UniformMatrix4
// with transpose=false produces. Visual result is identical to the old
// SetMatrix4("uModel", ...) path.
//
// Architecture note: public API matches StaticMeshRenderer so GameWindow only
// needs to update the shader and uniform setup at the call sites.
using System.Numerics;
using System.Runtime.InteropServices;
using AcDream.App.Rendering.Wb;
using AcDream.Core.Meshing;
using AcDream.Core.Terrain;
using AcDream.Core.World;
using Silk.NET.OpenGL;
namespace AcDream.App.Rendering;
public sealed unsafe class InstancedMeshRenderer : IDisposable
{
private readonly GL _gl;
private readonly Shader _shader;
private readonly TextureCache _textures;
/// <summary>
/// Optional WB adapter. Held but currently unused — Phase N.4 Adjustment 2
/// (2026-05-08) reverted Task 9's renderer-level routing. Tier-routing decisions
/// (atlas vs per-instance) belong at the spawn-callback layer (Task 11
/// LandblockSpawnAdapter for atlas-tier; Task 17 EntitySpawnAdapter for
/// per-instance), not in the renderer which is intentionally tier-blind. The
/// constructor parameter is preserved so GameWindow's wire-up doesn't shift
/// when later tasks need adapter access.
/// </summary>
private readonly WbMeshAdapter? _wbMeshAdapter;
// One GPU bundle per unique GfxObj id. Each GfxObj can have multiple sub-meshes.
private readonly Dictionary<uint, List<SubMeshGpu>> _gpuByGfxObj = new();
// Shared instance VBO — filled every frame with all instance model matrices.
private readonly uint _instanceVbo;
// Per-frame scratch: reused float buffer for instance matrix data.
// 16 floats per mat4. Grown on demand; never shrunk.
private float[] _instanceBuffer = new float[256 * 16]; // start at 256 instances
// ── Instance grouping scratch ─────────────────────────────────────────────
//
// Reused every frame to avoid per-frame allocation.
//
// **Group key = (GfxObjId, TextureSignature)**, where TextureSignature is
// the PaletteOverride hash XORed with the SurfaceOverrides hash.
//
// An earlier implementation grouped on <c>GfxObjId</c> alone and resolved
// the per-sub-mesh texture from the first instance in the group — which
// is fine for scenery where every tree shares the same palette, but
// utterly broken for NPCs: every humanoid uses the same base body
// GfxObjs and they all piled into one group, so the first NPC's palette
// was used for every NPC in the frame. Frustum culling + iteration
// order meant that "first NPC" changed as the camera turned — producing
// the "NPC clothing changes when I turn" symptom.
//
// Now we also key by the entity's PaletteOverride + per-MeshRef
// SurfaceOverrides signature so only entities that decode to the
// SAME texture for every sub-mesh can share a batch. Entities with
// unique appearance fall to single-instance groups (still correct,
// marginally slower than true instancing).
private readonly Dictionary<GroupKey, InstanceGroup> _groups = new();
private readonly record struct GroupKey(uint GfxObjId, ulong TextureSignature);
public InstancedMeshRenderer(GL gl, Shader shader, TextureCache textures,
WbMeshAdapter? wbMeshAdapter = null)
{
_gl = gl;
_shader = shader;
_textures = textures;
_wbMeshAdapter = wbMeshAdapter;
_instanceVbo = _gl.GenBuffer();
}
// ── Upload ────────────────────────────────────────────────────────────────
public void EnsureUploaded(uint gfxObjId, IReadOnlyList<GfxObjSubMesh> subMeshes)
{
if (_gpuByGfxObj.ContainsKey(gfxObjId))
return;
// Phase N.4 Adjustment 2 (2026-05-08): renderer is tier-blind. Tier-routing
// (atlas vs per-instance) lives at the spawn-callback layer (Tasks 11 + 17),
// not here. Smoke-test of the original Task 9 routing showed it caught
// characters / NPCs (server-spawned, per-instance tier) along with static
// scenery, because EnsureUploaded is called from both spawn paths.
var list = new List<SubMeshGpu>(subMeshes.Count);
foreach (var sm in subMeshes)
list.Add(UploadSubMesh(sm));
_gpuByGfxObj[gfxObjId] = list;
}
private SubMeshGpu UploadSubMesh(GfxObjSubMesh sm)
{
uint vao = _gl.GenVertexArray();
_gl.BindVertexArray(vao);
// ── Vertex buffer (positions, normals, UVs) ───────────────────────────
uint vbo = _gl.GenBuffer();
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, vbo);
fixed (void* p = sm.Vertices)
_gl.BufferData(BufferTargetARB.ArrayBuffer,
(nuint)(sm.Vertices.Length * sizeof(Vertex)), p, BufferUsageARB.StaticDraw);
uint stride = (uint)sizeof(Vertex);
_gl.EnableVertexAttribArray(0);
_gl.VertexAttribPointer(0, 3, VertexAttribPointerType.Float, false, stride, (void*)0);
_gl.EnableVertexAttribArray(1);
_gl.VertexAttribPointer(1, 3, VertexAttribPointerType.Float, false, stride, (void*)(3 * sizeof(float)));
_gl.EnableVertexAttribArray(2);
_gl.VertexAttribPointer(2, 2, VertexAttribPointerType.Float, false, stride, (void*)(6 * sizeof(float)));
// Note: location 3 (uint TerrainLayer) is NOT used by mesh_instanced.vert;
// that slot is reserved for per-instance mat4 row 0 from the instance VBO.
// ── Index buffer ──────────────────────────────────────────────────────
uint ebo = _gl.GenBuffer();
_gl.BindBuffer(BufferTargetARB.ElementArrayBuffer, ebo);
fixed (void* p = sm.Indices)
_gl.BufferData(BufferTargetARB.ElementArrayBuffer,
(nuint)(sm.Indices.Length * sizeof(uint)), p, BufferUsageARB.StaticDraw);
// ── Per-instance model matrix (locations 3-6) ─────────────────────────
// Bind the shared instance VBO. The VAO captures this binding at each
// attribute location. At draw time we re-call VertexAttribPointer with
// the per-group byte offset (to address different groups in the VBO
// without DrawElementsInstancedBaseInstance).
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
// mat4 = 4 × vec4, stride = 64 bytes, divisor = 1 (advance once per instance)
for (uint row = 0; row < 4; row++)
{
uint loc = 3 + row;
_gl.EnableVertexAttribArray(loc);
_gl.VertexAttribPointer(loc, 4, VertexAttribPointerType.Float, false, 64, (void*)(row * 16));
_gl.VertexAttribDivisor(loc, 1);
}
_gl.BindVertexArray(0);
return new SubMeshGpu
{
Vao = vao,
Vbo = vbo,
Ebo = ebo,
IndexCount = sm.Indices.Length,
SurfaceId = sm.SurfaceId,
Translucency = sm.Translucency,
};
}
// ── Draw ──────────────────────────────────────────────────────────────────
public void Draw(ICamera camera,
IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList<WorldEntity> Entities)> landblockEntries,
FrustumPlanes? frustum = null,
uint? neverCullLandblockId = null,
HashSet<uint>? visibleCellIds = null,
// L-fix1 (2026-04-28): set of entity ids that should bypass the
// landblock-level frustum cull. Animated entities (other
// players, NPCs, monsters) are always rendered if their
// landblock is loaded — without this they vanish whenever the
// camera rotates away from their landblock, even though
// they're within visible distance of the player. Pass null /
// empty to keep the previous "cull everything by landblock"
// behavior.
HashSet<uint>? animatedEntityIds = null)
{
_shader.Use();
var vp = camera.View * camera.Projection;
_shader.SetMatrix4("uViewProjection", vp);
// Phase G: lighting + ambient + fog are owned by the
// SceneLighting UBO (binding=1) uploaded once per frame by
// GameWindow. The instanced mesh fragment shader reads it
// directly — no per-draw uniform uploads needed.
// ── Collect and group instances ───────────────────────────────────────
CollectGroups(landblockEntries, frustum, neverCullLandblockId, visibleCellIds, animatedEntityIds);
// ── Build and upload the instance buffer ──────────────────────────────
// Count total instances.
int totalInstances = 0;
foreach (var grp in _groups.Values)
totalInstances += grp.Count;
// Grow the scratch buffer if needed.
int needed = totalInstances * 16;
if (_instanceBuffer.Length < needed)
_instanceBuffer = new float[needed + 256 * 16]; // extra headroom
// Write all groups contiguously. Record each group's starting offset
// (in units of instances, not bytes) so we can address them at draw time.
int instanceOffset = 0;
foreach (var grp in _groups.Values)
{
grp.BufferOffset = instanceOffset;
foreach (ref readonly var inst in CollectionsMarshal.AsSpan(grp.Entries))
WriteMatrix(_instanceBuffer, instanceOffset++ * 16, inst.Model);
}
// Upload all instance data in a single DynamicDraw call.
if (totalInstances > 0)
{
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
fixed (void* p = _instanceBuffer)
_gl.BufferData(BufferTargetARB.ArrayBuffer,
(nuint)(totalInstances * 16 * sizeof(float)), p, BufferUsageARB.DynamicDraw);
}
// ── Pass 1: Opaque + ClipMap ──────────────────────────────────────────
// Diagnostic: ACDREAM_NO_CULL=1 disables backface culling entirely.
if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal))
{
_gl.Disable(EnableCap.CullFace);
}
foreach (var (key, grp) in _groups)
{
if (!_gpuByGfxObj.TryGetValue(key.GfxObjId, out var subMeshes))
continue;
bool hasOpaqueSubMesh = false;
foreach (var sub in subMeshes)
{
if (sub.Translucency == TranslucencyKind.Opaque ||
sub.Translucency == TranslucencyKind.ClipMap)
{
hasOpaqueSubMesh = true;
break;
}
}
if (!hasOpaqueSubMesh) continue;
// For this group, instance data starts at grp.BufferOffset in the VBO.
// We need to tell the VAO to read from that offset.
uint byteOffset = (uint)(grp.BufferOffset * 64); // 64 bytes per mat4
foreach (var sub in subMeshes)
{
if (sub.Translucency != TranslucencyKind.Opaque &&
sub.Translucency != TranslucencyKind.ClipMap)
continue;
_shader.SetInt("uTranslucencyKind", (int)sub.Translucency);
// Bind VAO + re-point instance attributes to the group's slice
// in the shared VBO. This updates the VAO's stored offset for
// locations 3-6 without touching the vertex or index bindings.
_gl.BindVertexArray(sub.Vao);
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
for (uint row = 0; row < 4; row++)
{
_gl.VertexAttribPointer(3 + row, 4, VertexAttribPointerType.Float,
false, 64, (void*)(byteOffset + row * 16));
}
// Resolve the texture from the first instance. GroupKey includes the
// texture signature (palette hash ^ surface-overrides hash), so every
// instance in this group resolves to the same texture for this sub-mesh,
// barring a hash collision.
if (grp.Count == 0) continue;
var firstEntry = grp.Entries[0];
uint tex = ResolveTex(firstEntry.Entity, firstEntry.MeshRef, sub);
_gl.ActiveTexture(TextureUnit.Texture0);
_gl.BindTexture(TextureTarget.Texture2D, tex);
_gl.DrawElementsInstanced(PrimitiveType.Triangles,
(uint)sub.IndexCount,
DrawElementsType.UnsignedInt,
(void*)0,
(uint)grp.Count);
}
}
// ── Pass 2: Translucent (AlphaBlend, Additive, InvAlpha) ─────────────
_gl.Enable(EnableCap.Blend);
_gl.DepthMask(false);
// Diagnostic: ACDREAM_NO_CULL=1 disables backface culling (used 2026-05-01
// to test if our mesh winding (0,i,i+1) vs ACME's (i+1,i,0) is causing
// visible polygons to be culled, especially around the neck/coat seam).
if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal))
{
_gl.Disable(EnableCap.CullFace);
}
else
{
_gl.Enable(EnableCap.CullFace);
_gl.CullFace(TriangleFace.Back);
_gl.FrontFace(FrontFaceDirection.Ccw);
}
foreach (var (key, grp) in _groups)
{
if (!_gpuByGfxObj.TryGetValue(key.GfxObjId, out var subMeshes))
continue;
bool hasTranslucentSubMesh = false;
foreach (var sub in subMeshes)
{
if (sub.Translucency != TranslucencyKind.Opaque &&
sub.Translucency != TranslucencyKind.ClipMap)
{
hasTranslucentSubMesh = true;
break;
}
}
if (!hasTranslucentSubMesh) continue;
uint byteOffset = (uint)(grp.BufferOffset * 64);
foreach (var sub in subMeshes)
{
if (sub.Translucency == TranslucencyKind.Opaque ||
sub.Translucency == TranslucencyKind.ClipMap)
continue;
switch (sub.Translucency)
{
case TranslucencyKind.Additive:
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.One);
break;
case TranslucencyKind.InvAlpha:
_gl.BlendFunc(BlendingFactor.OneMinusSrcAlpha, BlendingFactor.SrcAlpha);
break;
default: // AlphaBlend
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
break;
}
_shader.SetInt("uTranslucencyKind", (int)sub.Translucency);
_gl.BindVertexArray(sub.Vao);
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
for (uint row = 0; row < 4; row++)
{
_gl.VertexAttribPointer(3 + row, 4, VertexAttribPointerType.Float,
false, 64, (void*)(byteOffset + row * 16));
}
if (grp.Count == 0) continue;
var firstEntry = grp.Entries[0];
uint tex = ResolveTex(firstEntry.Entity, firstEntry.MeshRef, sub);
_gl.ActiveTexture(TextureUnit.Texture0);
_gl.BindTexture(TextureTarget.Texture2D, tex);
_gl.DrawElementsInstanced(PrimitiveType.Triangles,
(uint)sub.IndexCount,
DrawElementsType.UnsignedInt,
(void*)0,
(uint)grp.Count);
}
}
// Restore default GL state.
_gl.DepthMask(true);
_gl.Disable(EnableCap.Blend);
_gl.Disable(EnableCap.CullFace);
_gl.BindVertexArray(0);
}
// ── Grouping ──────────────────────────────────────────────────────────────
/// <summary>
/// Iterates all visible landblock entries and groups every (entity, meshRef)
/// pair by GroupKey (GfxObjId + texture signature). Clears the previous
/// frame's entries before filling.
/// </summary>
private void CollectGroups(
IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList<WorldEntity> Entities)> landblockEntries,
FrustumPlanes? frustum,
uint? neverCullLandblockId,
HashSet<uint>? visibleCellIds,
HashSet<uint>? animatedEntityIds)
{
foreach (var grp in _groups.Values)
grp.Entries.Clear();
foreach (var entry in landblockEntries)
{
// L-fix1 (2026-04-28): the landblock cull decision is now a
// per-landblock boolean rather than an early continue. We still
// need to walk the entity list because animated entities (in
// animatedEntityIds) bypass the cull and render anyway.
bool landblockVisible = frustum is null
|| entry.LandblockId == neverCullLandblockId
|| FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax);
// Fast path: no animated entities globally → if landblock is
// culled, skip the whole entity list (preserves the original
// O(visible-landblocks) cost when the caller doesn't care
// about animated bypass).
if (!landblockVisible && (animatedEntityIds is null || animatedEntityIds.Count == 0))
continue;
foreach (var entity in entry.Entities)
{
if (entity.MeshRefs.Count == 0)
continue;
// L-fix1: when the landblock is frustum-culled, only
// render entities flagged as animated. This keeps
// remote players / NPCs / monsters visible even when
// their landblock rotates out of the view frustum.
bool isAnimated = animatedEntityIds?.Contains(entity.Id) == true;
if (!landblockVisible && !isAnimated)
continue;
// Step 4: portal visibility filter. If we have a visible cell set,
// skip interior entities whose parent cell isn't visible.
// visibleCellIds == null means camera is outdoors → show all interiors.
if (entity.ParentCellId.HasValue && visibleCellIds is not null
&& !visibleCellIds.Contains(entity.ParentCellId.Value))
continue;
var entityRoot =
Matrix4x4.CreateFromQuaternion(entity.Rotation) *
Matrix4x4.CreateTranslation(entity.Position);
// Hash the entity's PaletteOverride once — shared by every
// MeshRef on this entity, so we compute it outside the loop.
ulong palHash = HashPaletteOverride(entity.PaletteOverride);
foreach (var meshRef in entity.MeshRefs)
{
if (!_gpuByGfxObj.TryGetValue(meshRef.GfxObjId, out var cachedMeshes))
continue;
var model = meshRef.PartTransform * entityRoot;
// Texture signature = palette hash ^ surface-overrides hash.
// Two instances can share a batch only when their ResolveTex
// would return identical handles for every sub-mesh — that
// means identical palette AND identical surface overrides.
ulong surfHash = HashSurfaceOverrides(meshRef.SurfaceOverrides);
ulong texSig = palHash ^ surfHash;
var key = new GroupKey(meshRef.GfxObjId, texSig);
if (!_groups.TryGetValue(key, out var group))
{
group = new InstanceGroup();
_groups[key] = group;
}
group.Entries.Add(new InstanceEntry(model, entity, meshRef));
}
}
}
}
private static ulong HashPaletteOverride(AcDream.Core.World.PaletteOverride? p)
{
if (p is null) return 0UL;
ulong h = 0xCBF29CE484222325UL;
const ulong prime = 0x100000001B3UL;
h = (h ^ p.BasePaletteId) * prime;
foreach (var sp in p.SubPalettes)
{
h = (h ^ sp.SubPaletteId) * prime;
h = (h ^ sp.Offset) * prime;
h = (h ^ sp.Length) * prime;
}
return h;
}
/// <summary>
/// Order-independent hash of a SurfaceOverrides dictionary. XOR of each
/// (key, value) pair keeps the result stable regardless of Dictionary
/// iteration order, so two instances whose override maps contain the
/// same pairs will hash identically.
/// </summary>
private static ulong HashSurfaceOverrides(IReadOnlyDictionary<uint, uint>? overrides)
{
if (overrides is null || overrides.Count == 0) return 0UL;
ulong acc = 0UL;
foreach (var kvp in overrides)
{
ulong pair = ((ulong)kvp.Key << 32) | kvp.Value;
acc ^= pair;
}
// Fold with a prime so the zero case doesn't collide with "empty".
return (acc ^ 0xCBF29CE484222325UL) * 0x100000001B3UL;
}
// ── Matrix write ──────────────────────────────────────────────────────────
/// <summary>
/// Writes a System.Numerics Matrix4x4 into <paramref name="buf"/> starting
/// at <paramref name="offset"/> as 16 consecutive floats in row-major order
/// (the C# natural memory layout). The GLSL shader reads each 4-float row
/// as a column of the mat4 — identical to what UniformMatrix4(transpose=false)
/// produces for the uniform path.
/// </summary>
private static void WriteMatrix(float[] buf, int offset, in Matrix4x4 m)
{
buf[offset + 0] = m.M11; buf[offset + 1] = m.M12; buf[offset + 2] = m.M13; buf[offset + 3] = m.M14;
buf[offset + 4] = m.M21; buf[offset + 5] = m.M22; buf[offset + 6] = m.M23; buf[offset + 7] = m.M24;
buf[offset + 8] = m.M31; buf[offset + 9] = m.M32; buf[offset + 10] = m.M33; buf[offset + 11] = m.M34;
buf[offset + 12] = m.M41; buf[offset + 13] = m.M42; buf[offset + 14] = m.M43; buf[offset + 15] = m.M44;
}
// ── Texture resolution ────────────────────────────────────────────────────
private uint ResolveTex(WorldEntity entity, MeshRef meshRef, SubMeshGpu sub)
{
uint overrideOrigTex = 0;
bool hasOrigTexOverride = meshRef.SurfaceOverrides is not null
&& meshRef.SurfaceOverrides.TryGetValue(sub.SurfaceId, out overrideOrigTex);
uint? origTexOverride = hasOrigTexOverride ? overrideOrigTex : (uint?)null;
if (entity.PaletteOverride is not null)
{
return _textures.GetOrUploadWithPaletteOverride(
sub.SurfaceId, origTexOverride, entity.PaletteOverride);
}
else if (hasOrigTexOverride)
{
return _textures.GetOrUploadWithOrigTextureOverride(sub.SurfaceId, overrideOrigTex);
}
else
{
return _textures.GetOrUpload(sub.SurfaceId);
}
}
// ── Disposal ──────────────────────────────────────────────────────────────
public void Dispose()
{
foreach (var subs in _gpuByGfxObj.Values)
{
foreach (var sub in subs)
{
_gl.DeleteBuffer(sub.Vbo);
_gl.DeleteBuffer(sub.Ebo);
_gl.DeleteVertexArray(sub.Vao);
}
}
_gl.DeleteBuffer(_instanceVbo);
_gpuByGfxObj.Clear();
_groups.Clear();
}
// ── Private types ─────────────────────────────────────────────────────────
private sealed class SubMeshGpu
{
public uint Vao;
public uint Vbo;
public uint Ebo;
public int IndexCount;
public uint SurfaceId;
public TranslucencyKind Translucency;
}
/// <summary>
/// All instances sharing one GroupKey (GfxObjId + texture signature) for
/// this frame, plus their starting offset in the shared instance VBO
/// (in units of instances, not bytes).
/// </summary>
private sealed class InstanceGroup
{
public readonly List<InstanceEntry> Entries = new();
public int BufferOffset;
public int Count => Entries.Count;
}
private readonly struct InstanceEntry
{
public readonly Matrix4x4 Model;
public readonly WorldEntity Entity;
public readonly MeshRef MeshRef;
public InstanceEntry(Matrix4x4 model, WorldEntity entity, MeshRef meshRef)
{
Model = model;
Entity = entity;
MeshRef = meshRef;
}
}
}
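
The retired renderer's `HashSurfaceOverrides` relies on XOR being commutative, so the hash does not depend on dictionary iteration order. The sketch below restates that idea in a self-contained form so the property can be checked in isolation. `TextureSignatureSketch` is a hypothetical name; the constants are the ones that appear in the deleted file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical standalone restatement of the order-independent hash.
public static class TextureSignatureSketch
{
    private const ulong OffsetBasis = 0xCBF29CE484222325UL;
    private const ulong Prime = 0x100000001B3UL;

    public static ulong HashSurfaceOverrides(IReadOnlyDictionary<uint, uint> overrides)
    {
        if (overrides is null || overrides.Count == 0) return 0UL;

        // XOR is commutative and associative, so the accumulated value is
        // the same for any iteration order over the same (key, value) pairs.
        ulong acc = 0UL;
        foreach (var kvp in overrides)
            acc ^= ((ulong)kvp.Key << 32) | kvp.Value;

        // Final fold keeps a non-empty map from hashing to 0 in the common case.
        return unchecked((acc ^ OffsetBasis) * Prime);
    }
}
```

One trade-off of XOR folding is that distinct override sets can collide when their packed pairs XOR to the same value: `{1→10, 2→20}` and `{3→30}` both accumulate to `(3 << 32) | 30`. Whether that matters in practice depends on the id ranges in use.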


@ -1,35 +0,0 @@
#version 430 core
// Per-vertex attributes
layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;
// Per-instance model matrix, split across four vec4 attribute slots.
// A mat4 consumes 4 consecutive attribute locations, so locations 3-6 are
// all occupied by this single logical matrix. The C# side must call
// VertexAttribPointer four times (one per row) and VertexAttribDivisor(loc, 1)
// on each of the four slots.
layout(location = 3) in vec4 aInstanceRow0;
layout(location = 4) in vec4 aInstanceRow1;
layout(location = 5) in vec4 aInstanceRow2;
layout(location = 6) in vec4 aInstanceRow3;
uniform mat4 uViewProjection;
out vec2 vTex;
out vec3 vWorldNormal;
out vec3 vWorldPos;
void main() {
// Reconstruct the per-instance model matrix from its four row vectors.
mat4 model = mat4(aInstanceRow0, aInstanceRow1, aInstanceRow2, aInstanceRow3);
vec4 worldPos = model * vec4(aPosition, 1.0);
gl_Position = uViewProjection * worldPos;
vWorldPos = worldPos.xyz;
// Transform normal into world space.
vWorldNormal = normalize(mat3(model) * aNormal);
vTex = aTexCoord;
}
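
The deleted vertex shader builds `mat4(aInstanceRow0, ..., aInstanceRow3)` from the C# matrix rows. A small sketch of why that works without an explicit transpose: `System.Numerics.Matrix4x4` stores translation in its fourth row (`M41..M43`), so writing the matrix in memory order puts translation at floats 12 to 14, which GLSL reads as the fourth column. `MatrixLayoutSketch` is a hypothetical name for illustration.

```csharp
using System;
using System.Numerics;

// Hypothetical helper mirroring the memory order used by WriteMatrix.
public static class MatrixLayoutSketch
{
    // Emits M11..M44 in natural (row-major) memory order.
    public static float[] ToFloats(in Matrix4x4 m) => new[]
    {
        m.M11, m.M12, m.M13, m.M14,
        m.M21, m.M22, m.M23, m.M24,
        m.M31, m.M32, m.M33, m.M34,
        m.M41, m.M42, m.M43, m.M44,
    };
}
```

Because GLSL's `mat4(v0, v1, v2, v3)` treats each vec4 as a column, the fourth C# row becomes the fourth GLSL column, which is where `model * vec4(aPosition, 1.0)` expects the translation.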


@ -1,24 +1,22 @@
#version 430 core
#extension GL_ARB_bindless_texture : require
in vec2 vTex;
in vec3 vWorldNormal;
in vec3 vNormal;
in vec2 vTexCoord;
in vec3 vWorldPos;
in flat uvec2 vTextureHandle;
in flat uint vTextureLayer;
out vec4 fragColor;
// uRenderPass values (Phase N.5 Decision 2 — two-pass alpha-test):
// 0 = opaque pass — discard fragments with alpha < 0.95
// (lets the depth write succeed for solid pixels)
// 1 = translucent pass — covers AlphaBlend / Additive / InvAlpha;
// discard alpha >= 0.95 (already drawn opaque) and
// alpha < 0.05 (skip empty fragments — large
// transparent overdraw cost otherwise)
uniform int uRenderPass;
// One 2D texture per draw call — same binding point as mesh.frag so the
// C# side can use the same TextureCache without a texture-array pipeline.
uniform sampler2D uDiffuse;
// Translucency kind — matches TranslucencyKind C# enum (same as mesh.frag):
// 0 = Opaque — depth write+test, no blend; shader never discards
// 1 = ClipMap — alpha-key discard at 0.5 (doors, windows, vegetation)
// 2 = AlphaBlend — GL blending handles compositing; do NOT discard
// 3 = Additive — GL additive blending; do NOT discard
// 4 = InvAlpha — GL inverted-alpha blending; do NOT discard
uniform int uTranslucencyKind;
// Phase G.1+G.2: shared scene-lighting UBO (see mesh.frag for layout docs).
// SceneLighting UBO — IDENTICAL layout to mesh_instanced.frag binding=1.
struct Light {
vec4 posAndKind;
vec4 dirAndRange;
@ -38,10 +36,8 @@ vec3 accumulateLights(vec3 N, vec3 worldPos) {
int activeLights = int(uCellAmbient.w);
for (int i = 0; i < 8; ++i) {
if (i >= activeLights) break;
int kind = int(uLights[i].posAndKind.w);
vec3 Lcol = uLights[i].colorAndIntensity.xyz * uLights[i].colorAndIntensity.w;
if (kind == 0) {
vec3 Ldir = -uLights[i].dirAndRange.xyz;
float ndl = max(0.0, dot(N, Ldir));
@ -77,16 +73,24 @@ vec3 applyFog(vec3 lit, vec3 worldPos) {
return mix(lit, uFogColor.xyz, fog);
}
out vec4 FragColor;
void main() {
vec4 color = texture(uDiffuse, vTex);
sampler2DArray tex = sampler2DArray(vTextureHandle);
vec4 color = texture(tex, vec3(vTexCoord, float(vTextureLayer)));
// Alpha cutout only for clip-map surfaces (doors, windows, vegetation).
if (uTranslucencyKind == 1 && color.a < 0.5) discard;
// Two-pass alpha-test (N.5 Decision 2).
if (uRenderPass == 0) {
if (color.a < 0.95) discard; // opaque pass
} else {
if (color.a >= 0.95) discard; // transparent pass
if (color.a < 0.05) discard; // skip totally-empty
}
vec3 N = normalize(vWorldNormal);
vec3 N = normalize(vNormal);
vec3 lit = accumulateLights(N, vWorldPos);
// Lightning flash — additive scene bump.
// Lightning flash — additive scene bump (matches mesh_instanced.frag).
lit += uFogParams.z * vec3(0.6, 0.6, 0.75);
// Retail clamp per-channel to 1.0 (r13 §13.1).
@ -94,5 +98,5 @@ void main() {
vec3 rgb = color.rgb * lit;
rgb = applyFog(rgb, vWorldPos);
fragColor = vec4(rgb, color.a);
FragColor = vec4(rgb, color.a);
}
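
The two-pass alpha test in the fragment shader can be restated on the CPU to check which pass, if any, keeps a fragment of a given alpha. This is a sketch of the discard rules as written in the shader, with the 0.95 and 0.05 thresholds taken from the diff. `RenderPass` and `AlphaTestSketch` are hypothetical names.

```csharp
using System;

// Mirrors the uRenderPass values documented in the shader comment.
public enum RenderPass { Opaque = 0, Translucent = 1 }

public static class AlphaTestSketch
{
    // Returns true when the shader would discard a fragment with this alpha.
    public static bool IsDiscarded(RenderPass pass, float alpha)
    {
        if (pass == RenderPass.Opaque)
            return alpha < 0.95f;                 // keep only near-solid texels

        return alpha >= 0.95f || alpha < 0.05f;   // already drawn, or empty
    }
}
```

Under these rules a fragment survives at most one pass: alpha of 0.95 or more in the opaque pass, alpha in [0.05, 0.95) in the translucent pass, and anything below 0.05 in neither.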


@ -0,0 +1,62 @@
#version 430 core
#extension GL_ARB_shader_draw_parameters : require
layout(location = 0) in vec3 aPosition;
layout(location = 1) in vec3 aNormal;
layout(location = 2) in vec2 aTexCoord;
struct InstanceData {
mat4 transform;
// Reserved for Phase B.4 follow-up (selection-blink retail-faithful
// highlight): vec4 highlightColor; — extend stride here, increase the
// _instanceSsbo upload size in WbDrawDispatcher, add a flat varying out,
// and consume in mesh_modern.frag.
};
struct BatchData {
uvec2 textureHandle; // bindless handle for sampler2DArray
uint textureLayer; // layer index (always 0 for per-instance composites)
uint flags; // reserved — N.5 dispatcher owns all blend state
// (glBlendFunc per pass). If a future phase wants
// shader-side per-batch additive flag (Decision 2
// fallback), encode it here as bit 0.
};
layout(std430, binding = 0) readonly buffer InstanceBuffer {
InstanceData Instances[];
};
// binding=1 here is the SSBO namespace — distinct from the UBO namespace.
// SceneLighting UBO also uses binding=1 in the fragment shader; GL keeps
// GL_SHADER_STORAGE_BUFFER and GL_UNIFORM_BUFFER binding tables separate.
// Task 10 dispatcher binds:
// glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, instanceSsbo)
// glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, batchSsbo)
// Existing SceneLightingUboBinding handles the UBO side.
layout(std430, binding = 1) readonly buffer BatchBuffer {
BatchData Batches[];
};
uniform mat4 uViewProjection;
out vec3 vNormal;
out vec2 vTexCoord;
out vec3 vWorldPos;
out flat uvec2 vTextureHandle;
out flat uint vTextureLayer;
void main() {
int instanceIndex = gl_BaseInstanceARB + gl_InstanceID;
mat4 model = Instances[instanceIndex].transform;
vec4 worldPos = model * vec4(aPosition, 1.0);
gl_Position = uViewProjection * worldPos;
vWorldPos = worldPos.xyz;
vNormal = normalize(mat3(model) * aNormal);
vTexCoord = aTexCoord;
BatchData b = Batches[gl_DrawIDARB];
vTextureHandle = b.textureHandle;
vTextureLayer = b.textureLayer;
}
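
The shader indexes `Instances[]` with `gl_BaseInstanceARB + gl_InstanceID` and `Batches[]` with `gl_DrawIDARB`, so each indirect command needs a `baseInstance` equal to the number of instances written before it. The sketch below shows one way the CPU side could lay that out. It is not the actual `WbDrawDispatcher` code: `IndirectSketch.Build` and its tuple input are hypothetical, and only the five-field command layout comes from the OpenGL indirect-draw definition.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Five 32-bit fields, 20 bytes, tightly packed, as consumed by
// glMultiDrawElementsIndirect.
[StructLayout(LayoutKind.Sequential)]
public struct DrawElementsIndirectCommand
{
    public uint Count;          // index count for this draw
    public uint InstanceCount;  // number of instances
    public uint FirstIndex;     // offset into the shared IBO, in indices
    public int BaseVertex;      // offset into the shared VBO, in vertices
    public uint BaseInstance;   // first slot in Instances[] for this draw
}

public static class IndirectSketch
{
    // Hypothetical builder: command i pairs with Batches[i] (gl_DrawIDARB),
    // and BaseInstance is the running total of earlier instance counts.
    public static DrawElementsIndirectCommand[] Build(
        IReadOnlyList<(uint IndexCount, uint FirstIndex, int BaseVertex, uint InstanceCount)> groups)
    {
        var cmds = new DrawElementsIndirectCommand[groups.Count];
        uint baseInstance = 0;
        for (int i = 0; i < groups.Count; i++)
        {
            var g = groups[i];
            cmds[i] = new DrawElementsIndirectCommand
            {
                Count = g.IndexCount,
                InstanceCount = g.InstanceCount,
                FirstIndex = g.FirstIndex,
                BaseVertex = g.BaseVertex,
                BaseInstance = baseInstance,
            };
            baseInstance += g.InstanceCount;
        }
        return cmds;
    }
}
```

Packing instances contiguously in draw order means a single instance SSBO upload can serve every command in the multi-draw call, which is consistent with the low per-frame GL call count this commit reports.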


@ -1,293 +0,0 @@
// src/AcDream.App/Rendering/StaticMeshRenderer.cs
using System.Numerics;
using AcDream.Core.Meshing;
using AcDream.Core.Terrain;
using AcDream.Core.World;
using Silk.NET.OpenGL;
namespace AcDream.App.Rendering;
public sealed unsafe class StaticMeshRenderer : IDisposable
{
private readonly GL _gl;
private readonly Shader _shader;
private readonly TextureCache _textures;
// One GPU bundle per unique GfxObj id. Each GfxObj can have multiple sub-meshes.
private readonly Dictionary<uint, List<SubMeshGpu>> _gpuByGfxObj = new();
public StaticMeshRenderer(GL gl, Shader shader, TextureCache textures)
{
_gl = gl;
_shader = shader;
_textures = textures;
}
public void EnsureUploaded(uint gfxObjId, IReadOnlyList<GfxObjSubMesh> subMeshes)
{
if (_gpuByGfxObj.ContainsKey(gfxObjId))
return;
var list = new List<SubMeshGpu>(subMeshes.Count);
foreach (var sm in subMeshes)
list.Add(UploadSubMesh(sm));
_gpuByGfxObj[gfxObjId] = list;
}
private SubMeshGpu UploadSubMesh(GfxObjSubMesh sm)
{
uint vao = _gl.GenVertexArray();
_gl.BindVertexArray(vao);
uint vbo = _gl.GenBuffer();
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, vbo);
fixed (void* p = sm.Vertices)
_gl.BufferData(BufferTargetARB.ArrayBuffer,
(nuint)(sm.Vertices.Length * sizeof(Vertex)), p, BufferUsageARB.StaticDraw);
uint ebo = _gl.GenBuffer();
_gl.BindBuffer(BufferTargetARB.ElementArrayBuffer, ebo);
fixed (void* p = sm.Indices)
_gl.BufferData(BufferTargetARB.ElementArrayBuffer,
(nuint)(sm.Indices.Length * sizeof(uint)), p, BufferUsageARB.StaticDraw);
uint stride = (uint)sizeof(Vertex);
_gl.EnableVertexAttribArray(0);
_gl.VertexAttribPointer(0, 3, VertexAttribPointerType.Float, false, stride, (void*)0);
_gl.EnableVertexAttribArray(1);
_gl.VertexAttribPointer(1, 3, VertexAttribPointerType.Float, false, stride, (void*)(3 * sizeof(float)));
_gl.EnableVertexAttribArray(2);
_gl.VertexAttribPointer(2, 2, VertexAttribPointerType.Float, false, stride, (void*)(6 * sizeof(float)));
_gl.EnableVertexAttribArray(3);
_gl.VertexAttribIPointer(3, 1, VertexAttribIType.UnsignedInt, stride, (void*)(8 * sizeof(float)));
_gl.BindVertexArray(0);
return new SubMeshGpu
{
Vao = vao,
Vbo = vbo,
Ebo = ebo,
IndexCount = sm.Indices.Length,
SurfaceId = sm.SurfaceId,
// Capture translucency at upload time so the draw loop never
// has to look it up from external state.
Translucency = sm.Translucency,
};
}
public void Draw(ICamera camera,
IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList<WorldEntity> Entities)> landblockEntries,
FrustumPlanes? frustum = null,
uint? neverCullLandblockId = null)
{
_shader.Use();
_shader.SetMatrix4("uView", camera.View);
_shader.SetMatrix4("uProjection", camera.Projection);
// ── Pass 1: Opaque + ClipMap ──────────────────────────────────────────
// Depth write on (default). No blending. ClipMap surfaces use the
// alpha-discard path in the fragment shader (uTranslucencyKind == 1).
foreach (var entry in landblockEntries)
{
// Per-landblock frustum cull. Never cull the player's landblock.
if (frustum is not null &&
entry.LandblockId != neverCullLandblockId &&
!FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax))
continue;
foreach (var entity in entry.Entities)
{
if (entity.MeshRefs.Count == 0)
continue;
foreach (var meshRef in entity.MeshRefs)
{
if (!_gpuByGfxObj.TryGetValue(meshRef.GfxObjId, out var subMeshes))
continue;
var entityRoot =
Matrix4x4.CreateFromQuaternion(entity.Rotation) *
Matrix4x4.CreateTranslation(entity.Position);
var model = meshRef.PartTransform * entityRoot;
_shader.SetMatrix4("uModel", model);
foreach (var sub in subMeshes)
{
// Skip translucent sub-meshes in the first pass.
if (sub.Translucency != TranslucencyKind.Opaque &&
sub.Translucency != TranslucencyKind.ClipMap)
continue;
_shader.SetInt("uTranslucencyKind", (int)sub.Translucency);
uint tex = ResolveTex(entity, meshRef, sub);
_gl.ActiveTexture(TextureUnit.Texture0);
_gl.BindTexture(TextureTarget.Texture2D, tex);
_gl.BindVertexArray(sub.Vao);
_gl.DrawElements(PrimitiveType.Triangles, (uint)sub.IndexCount, DrawElementsType.UnsignedInt, (void*)0);
}
}
}
}
// ── Pass 2: Translucent (AlphaBlend, Additive, InvAlpha) ─────────────
// Depth test on so translucents composite correctly behind opaque geometry.
// Depth write OFF so translucents don't occlude each other or downstream
// opaque draws. Blend function is set per-draw based on TranslucencyKind.
//
// NOTE: translucent draws are NOT sorted by depth — overlapping translucent
// surfaces can composite in the wrong order. Portal-sized billboards don't
// overlap in practice so this is acceptable and avoids a larger refactor.
_gl.Enable(EnableCap.Blend);
_gl.DepthMask(false);
// Phase 9.2: enable back-face culling for the translucent pass so
// closed-shell translucents (lifestone crystal, glow gems, any
// convex blended mesh) don't draw their back faces over their
// front faces in arbitrary iteration order. Without this, the
// 58 triangles of the lifestone crystal composited with an
// "inside-out" look where the user saw through one face into
// the hollow interior. With back-face culling on, back faces are
// dropped at rasterization time, front faces composite as-is,
// and depth ordering within the front-facing subset is a
// non-issue for closed convex-ish shells. Matches WorldBuilder's
// per-batch CullMode handling in
// references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/
// BaseObjectRenderManager.cs:361-365.
//
// Our fan triangulation emits pos-side polygons as
// (0, i, i+1) which is CCW in standard OpenGL conventions, so
// GL_BACK + CCW front is the correct state. Neg-side polygons
// (if any) use reversed winding and get culled here — that's a
// known limitation and matches the opaque-pass behavior since
// neg-side polys are virtually never translucent in AC content.
_gl.Enable(EnableCap.CullFace);
_gl.CullFace(TriangleFace.Back);
_gl.FrontFace(FrontFaceDirection.Ccw);
foreach (var entry in landblockEntries)
{
// Same per-landblock frustum cull for pass 2.
if (frustum is not null &&
entry.LandblockId != neverCullLandblockId &&
!FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax))
continue;
foreach (var entity in entry.Entities)
{
if (entity.MeshRefs.Count == 0)
continue;
foreach (var meshRef in entity.MeshRefs)
{
if (!_gpuByGfxObj.TryGetValue(meshRef.GfxObjId, out var subMeshes))
continue;
var entityRoot =
Matrix4x4.CreateFromQuaternion(entity.Rotation) *
Matrix4x4.CreateTranslation(entity.Position);
var model = meshRef.PartTransform * entityRoot;
_shader.SetMatrix4("uModel", model);
foreach (var sub in subMeshes)
{
if (sub.Translucency == TranslucencyKind.Opaque ||
sub.Translucency == TranslucencyKind.ClipMap)
continue;
// Set per-draw blend function.
switch (sub.Translucency)
{
case TranslucencyKind.Additive:
// src*a + dst — portal swirls, glows
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.One);
break;
case TranslucencyKind.InvAlpha:
// src*(1-a) + dst*a
_gl.BlendFunc(BlendingFactor.OneMinusSrcAlpha, BlendingFactor.SrcAlpha);
break;
default: // AlphaBlend
// src*a + dst*(1-a)
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
break;
}
_shader.SetInt("uTranslucencyKind", (int)sub.Translucency);
uint tex = ResolveTex(entity, meshRef, sub);
_gl.ActiveTexture(TextureUnit.Texture0);
_gl.BindTexture(TextureTarget.Texture2D, tex);
_gl.BindVertexArray(sub.Vao);
_gl.DrawElements(PrimitiveType.Triangles, (uint)sub.IndexCount, DrawElementsType.UnsignedInt, (void*)0);
}
}
}
}
// Restore default GL state for subsequent renderers (terrain etc.).
_gl.DepthMask(true);
_gl.Disable(EnableCap.Blend);
_gl.Disable(EnableCap.CullFace);
_gl.BindVertexArray(0);
}
/// <summary>
/// Resolves the GL texture id for a sub-mesh, honouring palette and
/// texture overrides carried on the entity and the mesh-ref.
/// </summary>
private uint ResolveTex(WorldEntity entity, MeshRef meshRef, SubMeshGpu sub)
{
uint overrideOrigTex = 0;
bool hasOrigTexOverride = meshRef.SurfaceOverrides is not null
&& meshRef.SurfaceOverrides.TryGetValue(sub.SurfaceId, out overrideOrigTex);
uint? origTexOverride = hasOrigTexOverride ? overrideOrigTex : (uint?)null;
if (entity.PaletteOverride is not null)
{
return _textures.GetOrUploadWithPaletteOverride(
sub.SurfaceId, origTexOverride, entity.PaletteOverride);
}
else if (hasOrigTexOverride)
{
return _textures.GetOrUploadWithOrigTextureOverride(sub.SurfaceId, overrideOrigTex);
}
else
{
return _textures.GetOrUpload(sub.SurfaceId);
}
}
public void Dispose()
{
foreach (var subs in _gpuByGfxObj.Values)
{
foreach (var sub in subs)
{
_gl.DeleteBuffer(sub.Vbo);
_gl.DeleteBuffer(sub.Ebo);
_gl.DeleteVertexArray(sub.Vao);
}
}
_gpuByGfxObj.Clear();
}
private sealed class SubMeshGpu
{
public uint Vao;
public uint Vbo;
public uint Ebo;
public int IndexCount;
public uint SurfaceId;
/// <summary>
/// Cached from GfxObjSubMesh.Translucency at upload time.
/// Avoids any per-draw lookup into external state.
/// </summary>
public TranslucencyKind Translucency;
}
}
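For reference, the three per-draw blend modes used in the translucent pass above reduce to simple per-channel arithmetic. The following is a minimal sketch, not part of the commit: `BlendKind` and `BlendMath` are hypothetical stand-ins for the translucent `TranslucencyKind` cases, and the sketch assumes the default `GL_FUNC_ADD` blend equation on a single color channel.

```csharp
// Hypothetical stand-in for the translucent TranslucencyKind cases above.
public enum BlendKind { AlphaBlend, Additive, InvAlpha }

public static class BlendMath
{
    // One channel of the framebuffer result for glBlendFunc(srcFactor, dstFactor)
    // under the default GL_FUNC_ADD equation:
    //   result = src * srcFactor + dst * dstFactor
    // The result is not clamped here; a normalized framebuffer would clamp to [0, 1].
    public static float Blend(BlendKind kind, float src, float dst, float alpha) => kind switch
    {
        // BlendFunc(SrcAlpha, One)
        BlendKind.Additive => src * alpha + dst,
        // BlendFunc(OneMinusSrcAlpha, SrcAlpha)
        BlendKind.InvAlpha => src * (1f - alpha) + dst * alpha,
        // BlendFunc(SrcAlpha, OneMinusSrcAlpha)
        _ => src * alpha + dst * (1f - alpha),
    };
}
```

Because additive blending only ever adds to the destination, it is order-independent, which is one reason unsorted portal swirls and glows tend to look acceptable; the other two modes depend on draw order.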

@ -29,10 +29,22 @@ public sealed unsafe class TextureCache : Wb.ITextureCachePerInstance, IDisposab
private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), uint> _handlesByPalette = new();
private uint _magentaHandle;
public TextureCache(GL gl, DatCollection dats)
private readonly Wb.BindlessSupport? _bindless;
// Bindless / Texture2DArray parallel caches. Keys mirror the legacy three
// caches so a surface used by both the legacy (Texture2D, sampler2D) and
// modern (Texture2DArray, sampler2DArray) paths is uploaded twice — once
// per target. Each entry stores both the GL texture name (for Dispose
// cleanup) and the resident bindless handle (returned to callers).
private readonly Dictionary<uint, (uint Name, ulong Handle)> _bindlessBySurfaceId = new();
private readonly Dictionary<(uint surfaceId, uint origTexOverride), (uint Name, ulong Handle)> _bindlessByOverridden = new();
private readonly Dictionary<(uint surfaceId, uint origTexOverride, ulong paletteHash), (uint Name, ulong Handle)> _bindlessByPalette = new();
public TextureCache(GL gl, DatCollection dats, Wb.BindlessSupport? bindless = null)
{
_gl = gl;
_dats = dats;
_bindless = bindless;
}
/// <summary>
@ -149,6 +161,82 @@ public sealed unsafe class TextureCache : Wb.ITextureCachePerInstance, IDisposab
return h;
}
/// <summary>
/// 64-bit bindless handle variant of <see cref="GetOrUpload"/> for the WB
/// modern rendering path. Uploads the texture as a 1-layer Texture2DArray
/// (so the shader's <c>sampler2DArray</c> can sample at layer 0) and returns
/// a resident bindless handle. Caches by surfaceId in a separate dictionary
/// from the legacy Texture2D path; the same surface may be uploaded twice
/// if used by both paths (acceptable transition cost — N.6 deletes the legacy
/// path).
/// Throws if BindlessSupport wasn't provided to the constructor.
/// </summary>
public ulong GetOrUploadBindless(uint surfaceId)
{
EnsureBindlessAvailable();
if (_bindlessBySurfaceId.TryGetValue(surfaceId, out var entry))
return entry.Handle;
var decoded = DecodeFromDats(surfaceId, origTextureOverride: null, paletteOverride: null);
uint name = UploadRgba8AsLayer1Array(decoded);
ulong handle = _bindless!.GetResidentHandle(name);
_bindlessBySurfaceId[surfaceId] = (name, handle);
return handle;
}
/// <summary>
/// 64-bit bindless handle variant of <see cref="GetOrUploadWithOrigTextureOverride"/>
/// for the WB modern rendering path. Uploads the texture as a 1-layer
/// Texture2DArray with the override SurfaceTexture id and returns a resident
/// bindless handle. Caches under a separate composite key from the legacy
/// path. Throws if BindlessSupport wasn't provided to the constructor.
/// </summary>
public ulong GetOrUploadWithOrigTextureOverrideBindless(uint surfaceId, uint overrideOrigTextureId)
{
EnsureBindlessAvailable();
var key = (surfaceId, overrideOrigTextureId);
if (_bindlessByOverridden.TryGetValue(key, out var entry))
return entry.Handle;
var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: null);
uint name = UploadRgba8AsLayer1Array(decoded);
ulong handle = _bindless!.GetResidentHandle(name);
_bindlessByOverridden[key] = (name, handle);
return handle;
}
/// <summary>
/// 64-bit bindless handle variant of <see cref="GetOrUploadWithPaletteOverride"/>
/// for the WB modern rendering path. Applies the palette override on top of
/// the texture's default palette before decoding, uploads as a 1-layer
/// Texture2DArray, and returns a resident bindless handle. Takes a
/// precomputed palette hash so the WB dispatcher can compute it once per
/// entity. Throws if BindlessSupport wasn't provided to the constructor.
/// </summary>
public ulong GetOrUploadWithPaletteOverrideBindless(
uint surfaceId,
uint? overrideOrigTextureId,
PaletteOverride paletteOverride,
ulong precomputedPaletteHash)
{
EnsureBindlessAvailable();
uint origTexKey = overrideOrigTextureId ?? 0;
var key = (surfaceId, origTexKey, precomputedPaletteHash);
if (_bindlessByPalette.TryGetValue(key, out var entry))
return entry.Handle;
var decoded = DecodeFromDats(surfaceId, origTextureOverride: overrideOrigTextureId, paletteOverride: paletteOverride);
uint name = UploadRgba8AsLayer1Array(decoded);
ulong handle = _bindless!.GetResidentHandle(name);
_bindlessByPalette[key] = (name, handle);
return handle;
}
private void EnsureBindlessAvailable()
{
if (_bindless is null)
throw new InvalidOperationException(
"TextureCache constructed without BindlessSupport — cannot generate bindless handles. " +
"WbDrawDispatcher requires the bindless-aware ctor overload (pass non-null BindlessSupport).");
}
/// <summary>
/// Cheap 64-bit hash over a palette override's identity so two
/// entities with the same palette setup share a decode. Internal so
@ -279,17 +367,79 @@ public sealed unsafe class TextureCache : Wb.ITextureCachePerInstance, IDisposab
return tex;
}
/// <summary>
/// Variant of <see cref="UploadRgba8"/> that uploads pixel data as a 1-layer
/// Texture2DArray. Required by the WB modern rendering path which samples via
/// sampler2DArray in its bindless shader. Pixel data is identical.
/// </summary>
private uint UploadRgba8AsLayer1Array(DecodedTexture decoded)
{
uint tex = _gl.GenTexture();
_gl.BindTexture(TextureTarget.Texture2DArray, tex);
fixed (byte* p = decoded.Rgba8)
_gl.TexImage3D(
TextureTarget.Texture2DArray,
0,
InternalFormat.Rgba8,
(uint)decoded.Width,
(uint)decoded.Height,
depth: 1,
border: 0,
PixelFormat.Rgba,
PixelType.UnsignedByte,
p);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapS, (int)TextureWrapMode.Repeat);
_gl.TexParameter(TextureTarget.Texture2DArray, TextureParameterName.TextureWrapT, (int)TextureWrapMode.Repeat);
_gl.BindTexture(TextureTarget.Texture2DArray, 0);
return tex;
}
public void Dispose()
{
// Phase 1: make all bindless handles non-resident BEFORE any
// DeleteTexture call. ARB_bindless_texture requires that resident
// handles be released before their backing texture is deleted —
// interleaving per-entry is UB. Single null-guard around the whole
// block (cleaner than per-call null-conditionals).
if (_bindless is not null)
{
foreach (var (_, handle) in _bindlessBySurfaceId.Values)
_bindless.MakeNonResident(handle);
foreach (var (_, handle) in _bindlessByOverridden.Values)
_bindless.MakeNonResident(handle);
foreach (var (_, handle) in _bindlessByPalette.Values)
_bindless.MakeNonResident(handle);
}
// Phase 2: delete the Texture2DArray textures backing those handles.
foreach (var (name, _) in _bindlessBySurfaceId.Values)
_gl.DeleteTexture(name);
_bindlessBySurfaceId.Clear();
foreach (var (name, _) in _bindlessByOverridden.Values)
_gl.DeleteTexture(name);
_bindlessByOverridden.Clear();
foreach (var (name, _) in _bindlessByPalette.Values)
_gl.DeleteTexture(name);
_bindlessByPalette.Clear();
// Phase 3: legacy Texture2D textures.
foreach (var h in _handlesBySurfaceId.Values)
_gl.DeleteTexture(h);
_handlesBySurfaceId.Clear();
foreach (var h in _handlesByOverridden.Values)
_gl.DeleteTexture(h);
_handlesByOverridden.Clear();
foreach (var h in _handlesByPalette.Values)
_gl.DeleteTexture(h);
_handlesByPalette.Clear();
if (_magentaHandle != 0)
{
_gl.DeleteTexture(_magentaHandle);
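The ordering rule in `Dispose` (release residency for every handle before deleting any backing texture) can be illustrated without a GL context. This is a sketch under stated assumptions: `TwoPhaseTeardown` and its two callbacks are hypothetical and stand in for `BindlessSupport.MakeNonResident` and `GL.DeleteTexture`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of the two-phase teardown order; not the real cache.
public sealed class TwoPhaseTeardown
{
    private readonly Dictionary<uint, (uint Name, ulong Handle)> _entries = new();
    private readonly Action<ulong> _makeNonResident; // stand-in for MakeNonResident
    private readonly Action<uint> _deleteTexture;    // stand-in for DeleteTexture

    public TwoPhaseTeardown(Action<ulong> makeNonResident, Action<uint> deleteTexture)
    {
        _makeNonResident = makeNonResident;
        _deleteTexture = deleteTexture;
    }

    public void Add(uint key, uint name, ulong handle) => _entries[key] = (name, handle);

    public void Dispose()
    {
        // Phase 1: make every handle non-resident first.
        foreach (var (_, handle) in _entries.Values)
            _makeNonResident(handle);

        // Phase 2: only then delete the textures backing those handles.
        foreach (var (name, _) in _entries.Values)
            _deleteTexture(name);

        _entries.Clear();
    }
}
```

Passing callbacks that append to a list makes the call order easy to check in a unit test: all residency releases should precede the first delete.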

@ -0,0 +1,55 @@
using Silk.NET.OpenGL;
using Silk.NET.OpenGL.Extensions.ARB;
namespace AcDream.App.Rendering.Wb;
/// <summary>
/// Thin wrapper around <see cref="ArbBindlessTexture"/> + capability detection
/// for the modern rendering path. Constructed once at startup via
/// <see cref="TryCreate"/>, which returns false if the extension isn't present.
/// </summary>
public sealed class BindlessSupport
{
private readonly ArbBindlessTexture _ext;
private BindlessSupport(ArbBindlessTexture extension)
{
_ext = extension;
}
public static bool TryCreate(GL gl, out BindlessSupport? support)
{
if (gl.TryGetExtension<ArbBindlessTexture>(out var ext))
{
support = new BindlessSupport(ext);
return true;
}
support = null;
return false;
}
/// <summary>Get a 64-bit bindless handle for the texture and make it resident.
/// Idempotent: handle is the same for a given texture name.</summary>
public ulong GetResidentHandle(uint textureName)
{
ulong h = _ext.GetTextureHandle(textureName);
if (!_ext.IsTextureHandleResident(h))
_ext.MakeTextureHandleResident(h);
return h;
}
/// <summary>Release residency for a handle. Call before deleting the underlying texture.</summary>
public void MakeNonResident(ulong handle)
{
if (_ext.IsTextureHandleResident(handle))
_ext.MakeTextureHandleNonResident(handle);
}
/// <summary>Detect <c>GL_ARB_shader_draw_parameters</c> in addition to bindless.
/// N.5's vertex shader uses <c>gl_BaseInstanceARB</c> and <c>gl_DrawIDARB</c>
/// from this extension.</summary>
public bool HasShaderDrawParameters(GL gl)
{
return gl.IsExtensionPresent("GL_ARB_shader_draw_parameters");
}
}
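The commit message states that missing extensions throw `NotSupportedException` at startup. A minimal sketch of such a gate follows, assuming the caller has already obtained the two capability flags (for example from `BindlessSupport.TryCreate` and an extension-presence check). `ModernPathGate`, `Require`, and the message text are hypothetical and not taken from the commit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical startup gate; the real check lives elsewhere in the commit.
public static class ModernPathGate
{
    // Throws NotSupportedException naming every missing extension,
    // or returns normally when both are present.
    public static void Require(bool hasBindless, bool hasDrawParameters)
    {
        var missing = new List<string>();
        if (!hasBindless) missing.Add("GL_ARB_bindless_texture");
        if (!hasDrawParameters) missing.Add("GL_ARB_shader_draw_parameters");
        if (missing.Count == 0) return;

        throw new NotSupportedException(
            "Modern rendering path requires: " + string.Join(", ", missing));
    }
}
```

Collecting all missing extensions before throwing means a user with an older driver sees the full list in one error rather than fixing them one at a time.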

@ -0,0 +1,17 @@
using System.Runtime.InteropServices;
namespace AcDream.App.Rendering.Wb;
/// <summary>
/// Layout matches what <c>glMultiDrawElementsIndirect</c> expects.
/// Total size is 20 bytes (five 4-byte fields); arrays are typically uploaded
/// tightly packed, with stride = sizeof(DrawElementsIndirectCommand).
/// </summary>
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct DrawElementsIndirectCommand
{
public uint Count; // index count for this draw
public uint InstanceCount; // number of instances
public uint FirstIndex; // offset into IBO, in indices
public int BaseVertex; // vertex offset into VBO
public uint BaseInstance; // first instance ID (offsets per-instance attribs / SSBO read)
}
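The 20-byte claim and the byte offset of the transparent commands can be checked on the CPU. This is a sketch under assumptions: `IndirectCommandSketch` is a hypothetical copy of the struct above, and the offset helper assumes opaque commands are packed first with transparent commands immediately after, which is how the dispatcher appears to order them.

```csharp
using System.Runtime.InteropServices;

// Hypothetical copy of DrawElementsIndirectCommand, for a standalone check.
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct IndirectCommandSketch
{
    public uint Count;
    public uint InstanceCount;
    public uint FirstIndex;
    public int BaseVertex;
    public uint BaseInstance;
}

public static class IndirectLayout
{
    // Size of one tightly packed command in bytes.
    public static int Stride => Marshal.SizeOf<IndirectCommandSketch>();

    // Byte offset of the first transparent command, assuming the opaque
    // commands occupy the start of the indirect buffer.
    public static int TransparentByteOffset(int opaqueCount) => opaqueCount * Stride;
}
```

With 3 opaque groups the transparent pass would start at byte 60, which is the value passed as the `indirect` pointer argument of the second multi-draw call.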

@ -1,6 +1,7 @@
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Runtime.InteropServices;
using AcDream.Core.Meshing;
using AcDream.Core.Terrain;
using AcDream.Core.World;
@ -12,45 +13,49 @@ namespace AcDream.App.Rendering.Wb;
/// <summary>
/// Draws entities using WB's <see cref="ObjectRenderData"/> (a single global
/// VAO/VBO/IBO under modern rendering) with acdream's <see cref="TextureCache"/>
/// for texture resolution and <see cref="AcSurfaceMetadataTable"/> for
/// for bindless texture resolution and <see cref="AcSurfaceMetadataTable"/> for
/// translucency classification.
///
/// <para>
/// <b>Atlas-tier</b> entities (<c>ServerGuid == 0</c>): mesh data comes from WB's
/// <see cref="ObjectMeshManager"/> via <see cref="WbMeshAdapter.TryGetRenderData"/>.
/// Textures resolve through <see cref="TextureCache.GetOrUpload"/> using the batch's
/// <c>SurfaceId</c>.
/// Textures resolve through the bindless-suffixed
/// <see cref="TextureCache.GetOrUploadBindless"/> variants, returning 64-bit
/// resident handles stored in the per-group SSBO.
/// </para>
///
/// <para>
/// <b>Per-instance-tier</b> entities (<c>ServerGuid != 0</c>): mesh data also from
/// WB, but textures resolve through <see cref="TextureCache"/> with palette and
/// surface overrides applied. <see cref="AnimatedEntityState"/> is currently
/// WB, but textures resolve through
/// <see cref="TextureCache.GetOrUploadWithPaletteOverrideBindless"/> with palette
/// and surface overrides applied. <see cref="AnimatedEntityState"/> is currently
/// unused at draw time — GameWindow's spawn path already bakes AnimPartChanges +
/// GfxObjDegradeResolver (Issue #47 close-detail mesh) into <c>MeshRefs</c>.
/// </para>
///
/// <para>
/// <b>GL strategy:</b> GROUPED instanced drawing. All visible (entity, batch)
/// pairs are bucketed by <see cref="GroupKey"/>; within a group a single
/// <c>glDrawElementsInstancedBaseVertexBaseInstance</c> renders all instances.
/// All matrices for the frame land in one shared instance VBO via a single
/// <c>BufferData</c> upload. This drops draw calls from O(entities×batches)
/// to O(unique GfxObj×batch×texture) — typically two orders of magnitude fewer.
/// <b>GL strategy (N.5 — mandatory):</b> <c>glMultiDrawElementsIndirect</c> with SSBOs
/// and <c>GL_ARB_bindless_texture</c> + <c>GL_ARB_shader_draw_parameters</c>.
/// All visible (entity, batch) pairs are bucketed by <see cref="GroupKey"/>;
/// each group becomes one <c>DrawElementsIndirectCommand</c>. Three GPU buffers
/// are uploaded per frame: instance matrices (SSBO binding 0), per-group batch
/// metadata/texture handles (SSBO binding 1), and the indirect draw commands.
/// Two <c>glMultiDrawElementsIndirect</c> calls cover the opaque and transparent
/// passes respectively — one GL call per pass regardless of group count.
/// </para>
///
/// <para>
/// <b>Shader:</b> reuses <c>mesh_instanced</c> (vert locations 0-2 = Position/
/// Normal/UV from WB's <c>VertexPositionNormalTexture</c>; locations 3-6 = instance
/// matrix from our VBO). WB's 32-byte vertex stride is compatible.
/// <b>Shader:</b> <c>mesh_modern</c> (bindless + <c>gl_DrawIDARB</c> /
/// <c>gl_BaseInstanceARB</c>). Missing bindless/draw-parameters throws
/// <see cref="NotSupportedException"/> at startup — there is no legacy fallback.
/// </para>
///
/// <para>
/// <b>Modern rendering assumption:</b> WB's <c>_useModernRendering</c> path (GL
/// 4.3 + bindless) puts every mesh in a single shared VAO/VBO/IBO and uses
/// <c>FirstIndex</c> + <c>BaseVertex</c> per batch. The dispatcher honors those
/// offsets via <c>DrawElementsInstancedBaseVertex(BaseInstance)</c>. The legacy
/// per-mesh-VAO path also works since FirstIndex/BaseVertex are zero there.
/// offsets inside each <c>DrawElementsIndirectCommand</c> via
/// <c>glMultiDrawElementsIndirect</c>.
/// </para>
/// </summary>
public sealed unsafe class WbDrawDispatcher : IDisposable
@ -61,14 +66,40 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
private readonly WbMeshAdapter _meshAdapter;
private readonly EntitySpawnAdapter _entitySpawnAdapter;
private readonly uint _instanceVbo;
private readonly HashSet<uint> _patchedVaos = new();
private readonly BindlessSupport _bindless;
// SSBO buffer ids
private uint _instanceSsbo;
private uint _batchSsbo;
private uint _indirectBuffer;
// Per-frame scratch arrays — Tasks 9-10 fully wire these.
private float[] _instanceData = new float[256 * 16]; // mat4 floats per instance
private BatchData[] _batchData = new BatchData[256];
private DrawElementsIndirectCommand[] _indirectCommands = new DrawElementsIndirectCommand[256];
private int _opaqueDrawCount;
private int _transparentDrawCount;
private int _transparentByteOffset;
// std430 layout: ulong TextureHandle (uvec2) at offset 0, uint TextureLayer
// at offset 8, uint Flags at offset 12. Total 16 bytes.
// Pack=8 (not 4) because std430's uvec2 requires 8-byte alignment — Pack=4
// works today by accident (TextureHandle is the first field, so offset 0 is
// always 8-byte aligned), but adding a 4-byte field before TextureHandle
// without bumping Pack would silently misalign the GPU struct.
[StructLayout(LayoutKind.Sequential, Pack = 8)]
private struct BatchData
{
public ulong TextureHandle; // bindless handle (uvec2 in GLSL)
public uint TextureLayer;
public uint Flags;
}
// Per-frame scratch — reused across frames to avoid per-frame allocation.
private readonly Dictionary<GroupKey, InstanceGroup> _groups = new();
private readonly List<InstanceGroup> _opaqueDraws = new();
private readonly List<InstanceGroup> _translucentDraws = new();
private float[] _instanceBuffer = new float[256 * 16]; // grow on demand, never shrink
// Per-entity-cull AABB radius. Conservative — covers most entities; large
// outliers (long banners, tall columns) are still landblock-culled.
@ -84,12 +115,23 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
private int _instancesIssued;
private long _lastLogTick;
// CPU + GPU timing for [WB-DIAG] under ACDREAM_WB_DIAG=1.
private readonly System.Diagnostics.Stopwatch _cpuStopwatch = new();
private readonly long[] _cpuSamples = new long[256]; // microseconds
private int _cpuSampleCursor;
private uint _gpuQueryOpaque;
private uint _gpuQueryTransparent;
private readonly long[] _gpuSamples = new long[256]; // microseconds
private int _gpuSampleCursor;
private bool _gpuQueriesInitialized;
public WbDrawDispatcher(
GL gl,
Shader shader,
TextureCache textures,
WbMeshAdapter meshAdapter,
EntitySpawnAdapter entitySpawnAdapter)
EntitySpawnAdapter entitySpawnAdapter,
BindlessSupport bindless)
{
ArgumentNullException.ThrowIfNull(gl);
ArgumentNullException.ThrowIfNull(shader);
@ -103,7 +145,10 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
_meshAdapter = meshAdapter;
_entitySpawnAdapter = entitySpawnAdapter;
_instanceVbo = _gl.GenBuffer();
_bindless = bindless ?? throw new ArgumentNullException(nameof(bindless));
_instanceSsbo = _gl.GenBuffer();
_batchSsbo = _gl.GenBuffer();
_indirectBuffer = _gl.GenBuffer();
}
public static Matrix4x4 ComposePartWorldMatrix(
@ -126,6 +171,16 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
bool diag = string.Equals(Environment.GetEnvironmentVariable("ACDREAM_WB_DIAG"), "1", StringComparison.Ordinal);
if (diag && !_gpuQueriesInitialized)
{
_gpuQueryOpaque = _gl.GenQuery();
_gpuQueryTransparent = _gl.GenQuery();
_gpuQueriesInitialized = true;
}
// Always run the CPU stopwatch — cheap; only logged under diag.
_cpuStopwatch.Restart();
// Camera world-space position for front-to-back sort (perf #2). The view
// matrix is the inverse of the camera's world transform, so the world
// translation lives in the inverse's translation row.
@ -235,23 +290,24 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
// Nothing visible — skip the GL pass entirely.
if (anyVao == 0)
{
_cpuStopwatch.Stop();
if (diag) MaybeFlushDiag();
return;
}
// ── Phase 2: lay matrices out contiguously, assign per-group offsets,
// split into opaque/translucent + compute sort keys ─────────
// ── Phase 3: assign FirstInstance per group, lay matrices contiguously, sort opaque ──
int totalInstances = 0;
foreach (var grp in _groups.Values) totalInstances += grp.Matrices.Count;
if (totalInstances == 0)
{
_cpuStopwatch.Stop();
if (diag) MaybeFlushDiag();
return;
}
int needed = totalInstances * 16;
if (_instanceBuffer.Length < needed)
_instanceBuffer = new float[needed + 256 * 16]; // headroom
if (_instanceData.Length < needed)
_instanceData = new float[needed + 256 * 16];
_opaqueDraws.Clear();
_translucentDraws.Clear();
@ -268,17 +324,17 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
// position for front-to-back sort (perf #2). Cheap heuristic; works
// well when instances of one group are spatially coherent
// (typical for trees in one landblock area, NPCs at one spawn).
var firstM = grp.Matrices[0];
var grpPos = new Vector3(firstM.M41, firstM.M42, firstM.M43);
var first = grp.Matrices[0];
var grpPos = new Vector3(first.M41, first.M42, first.M43);
grp.SortDistance = Vector3.DistanceSquared(camPos, grpPos);
for (int i = 0; i < grp.Matrices.Count; i++)
{
WriteMatrix(_instanceBuffer, cursor * 16, grp.Matrices[i]);
WriteMatrix(_instanceData, cursor * 16, grp.Matrices[i]);
cursor++;
}
if (grp.Translucency == TranslucencyKind.Opaque || grp.Translucency == TranslucencyKind.ClipMap)
if (IsOpaque(grp.Translucency))
_opaqueDraws.Add(grp);
else
_translucentDraws.Add(grp);
@ -290,90 +346,141 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
// Foundry interior).
_opaqueDraws.Sort(static (a, b) => a.SortDistance.CompareTo(b.SortDistance));
// ── Phase 3: one upload of all matrices ─────────────────────────────
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
fixed (float* p = _instanceBuffer)
_gl.BufferData(BufferTargetARB.ArrayBuffer,
(nuint)(totalInstances * 16 * sizeof(float)), p, BufferUsageARB.DynamicDraw);
// ── Phase 4: build IndirectGroupInput list (opaque sorted, then translucent),
// fill via BuildIndirectArrays ──────────────────────────────────
int totalDraws = _opaqueDraws.Count + _translucentDraws.Count;
if (_batchData.Length < totalDraws)
_batchData = new BatchData[totalDraws + 64];
if (_indirectCommands.Length < totalDraws)
_indirectCommands = new DrawElementsIndirectCommand[totalDraws + 64];
// ── Phase 4: bind VAO once (modern rendering shares one global VAO) ──
EnsureInstanceAttribs(anyVao);
var groupInputs = new List<IndirectGroupInput>(totalDraws);
foreach (var g in _opaqueDraws) groupInputs.Add(ToInput(g));
foreach (var g in _translucentDraws) groupInputs.Add(ToInput(g));
// Cast _batchData (private BatchData) to public-mirror BatchDataPublic for BuildIndirectArrays.
// Layout is asserted at test time (BatchDataPublic_LayoutMatchesPrivateBatchData test).
var batchPublic = new BatchDataPublic[totalDraws];
var layout = BuildIndirectArrays(groupInputs, _indirectCommands, batchPublic);
// Copy back into _batchData
for (int i = 0; i < totalDraws; i++)
{
_batchData[i] = new BatchData
{
TextureHandle = batchPublic[i].TextureHandle,
TextureLayer = batchPublic[i].TextureLayer,
Flags = batchPublic[i].Flags,
};
}
_opaqueDrawCount = layout.OpaqueCount;
_transparentDrawCount = layout.TransparentCount;
_transparentByteOffset = layout.TransparentByteOffset;
// ── Phase 5: upload three buffers ───────────────────────────────────
fixed (float* ip = _instanceData)
UploadSsbo(_instanceSsbo, 0, ip, totalInstances * 16 * sizeof(float));
fixed (BatchData* bp = _batchData)
UploadSsbo(_batchSsbo, 1, bp, totalDraws * sizeof(BatchData));
fixed (DrawElementsIndirectCommand* cp = _indirectCommands)
{
_gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
_gl.BufferData(BufferTargetARB.DrawIndirectBuffer,
(nuint)(totalDraws * sizeof(DrawElementsIndirectCommand)), cp, BufferUsageARB.DynamicDraw);
}
// ── Phase 6: bind global VAO once ───────────────────────────────────
_gl.BindVertexArray(anyVao);
// ── Phase 5: opaque + ClipMap pass (front-to-back sorted) ───────────
if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal))
_gl.Disable(EnableCap.CullFace);
foreach (var grp in _opaqueDraws)
// ── Phase 7: opaque pass ─────────────────────────────────────────────
if (_opaqueDrawCount > 0)
{
_shader.SetInt("uTranslucencyKind", (int)grp.Translucency);
DrawGroup(grp);
_gl.Disable(EnableCap.Blend);
_gl.DepthMask(true);
_shader.SetInt("uRenderPass", 0);
_gl.BindBuffer(BufferTargetARB.DrawIndirectBuffer, _indirectBuffer);
if (diag && _gpuQueriesInitialized) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque);
_gl.MultiDrawElementsIndirect(
PrimitiveType.Triangles,
DrawElementsType.UnsignedShort,
(void*)0,
(uint)_opaqueDrawCount,
(uint)DrawCommandStride);
if (diag && _gpuQueriesInitialized) _gl.EndQuery(QueryTarget.TimeElapsed);
}
// ── Phase 6: translucent pass ───────────────────────────────────────
_gl.Enable(EnableCap.Blend);
_gl.DepthMask(false);
if (string.Equals(Environment.GetEnvironmentVariable("ACDREAM_NO_CULL"), "1", StringComparison.Ordinal))
// ── Phase 8: transparent pass ────────────────────────────────────────
if (_transparentDrawCount > 0)
{
_gl.Disable(EnableCap.CullFace);
}
else
{
_gl.Enable(EnableCap.CullFace);
_gl.CullFace(TriangleFace.Back);
_gl.FrontFace(FrontFaceDirection.Ccw);
_gl.Enable(EnableCap.Blend);
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
_gl.DepthMask(false);
_shader.SetInt("uRenderPass", 1);
if (diag && _gpuQueriesInitialized) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryTransparent);
_gl.MultiDrawElementsIndirect(
PrimitiveType.Triangles,
DrawElementsType.UnsignedShort,
(void*)_transparentByteOffset,
(uint)_transparentDrawCount,
(uint)DrawCommandStride);
if (diag && _gpuQueriesInitialized) _gl.EndQuery(QueryTarget.TimeElapsed);
_gl.DepthMask(true);
_gl.Disable(EnableCap.Blend);
}
foreach (var grp in _translucentDraws)
{
switch (grp.Translucency)
{
case TranslucencyKind.Additive:
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.One);
break;
case TranslucencyKind.InvAlpha:
_gl.BlendFunc(BlendingFactor.OneMinusSrcAlpha, BlendingFactor.SrcAlpha);
break;
default:
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
break;
}
_shader.SetInt("uTranslucencyKind", (int)grp.Translucency);
DrawGroup(grp);
}
_gl.DepthMask(true);
_gl.Disable(EnableCap.Blend);
_gl.Disable(EnableCap.CullFace);
_gl.BindVertexArray(0);
_cpuStopwatch.Stop();
if (diag)
{
_drawsIssued += _opaqueDraws.Count + _translucentDraws.Count;
long cpuUs = _cpuStopwatch.ElapsedTicks * 1_000_000L / System.Diagnostics.Stopwatch.Frequency;
_cpuSamples[_cpuSampleCursor] = cpuUs;
_cpuSampleCursor = (_cpuSampleCursor + 1) % _cpuSamples.Length;
// Poll the GPU timer queries without blocking. They were issued earlier in
// this same frame, so the result may not be ready yet; if it is not, drop
// the sample (don't stall the CPU waiting for the GPU).
if (_gpuQueriesInitialized)
{
_gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.ResultAvailable, out int avail);
if (avail != 0)
{
_gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.Result, out ulong opaqueNs);
_gl.GetQueryObject(_gpuQueryTransparent, QueryObjectParameterName.Result, out ulong transNs);
long gpuUs = (long)((opaqueNs + transNs) / 1000UL);
_gpuSamples[_gpuSampleCursor] = gpuUs;
_gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length;
}
}
_drawsIssued += _opaqueDrawCount + _transparentDrawCount;
_instancesIssued += totalInstances;
MaybeFlushDiag();
}
}
private void DrawGroup(InstanceGroup grp)
{
_gl.ActiveTexture(TextureUnit.Texture0);
_gl.BindTexture(TextureTarget.Texture2D, grp.TextureHandle);
_gl.BindBuffer(BufferTargetARB.ElementArrayBuffer, grp.Ibo);
private static IndirectGroupInput ToInput(InstanceGroup g) => new(
IndexCount: g.IndexCount,
FirstIndex: g.FirstIndex,
BaseVertex: g.BaseVertex,
InstanceCount: g.InstanceCount,
FirstInstance: g.FirstInstance,
TextureHandle: g.BindlessTextureHandle,
TextureLayer: g.TextureLayer,
Translucency: g.Translucency);
// BaseInstance offsets the per-instance attribute fetches into our
// shared instance VBO so each group reads its own slice. Requires
// GL_ARB_base_instance (GL 4.2+); WB requires 4.3 so this is available.
_gl.DrawElementsInstancedBaseVertexBaseInstance(
PrimitiveType.Triangles,
(uint)grp.IndexCount,
DrawElementsType.UnsignedShort,
(void*)(grp.FirstIndex * sizeof(ushort)),
(uint)grp.InstanceCount,
grp.BaseVertex,
(uint)grp.FirstInstance);
private unsafe void UploadSsbo(uint ssbo, uint binding, void* data, int byteCount)
{
_gl.BindBuffer(BufferTargetARB.ShaderStorageBuffer, ssbo);
_gl.BufferData(BufferTargetARB.ShaderStorageBuffer, (nuint)byteCount, data, BufferUsageARB.DynamicDraw);
_gl.BindBufferBase(BufferTargetARB.ShaderStorageBuffer, binding, ssbo);
}
private void MaybeFlushDiag()
@@ -381,13 +488,41 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
long now = Environment.TickCount64;
if (now - _lastLogTick > 5000)
{
long cpuMed = MedianMicros(_cpuSamples);
long cpuP95 = Percentile95Micros(_cpuSamples);
long gpuMed = MedianMicros(_gpuSamples);
long gpuP95 = Percentile95Micros(_gpuSamples);
Console.WriteLine(
$"[WB-DIAG] entSeen={_entitiesSeen} entDrawn={_entitiesDrawn} meshMissing={_meshesMissing} drawsIssued={_drawsIssued} instances={_instancesIssued} groups={_groups.Count}");
$"[WB-DIAG] entSeen={_entitiesSeen} entDrawn={_entitiesDrawn} meshMissing={_meshesMissing} drawsIssued={_drawsIssued} instances={_instancesIssued} groups={_groups.Count} " +
$"cpu_us={cpuMed}m/{cpuP95}p95 gpu_us={gpuMed}m/{gpuP95}p95");
_entitiesSeen = _entitiesDrawn = _meshesMissing = _drawsIssued = _instancesIssued = 0;
_lastLogTick = now;
// Don't reset the sample buffers — they're a moving window of the
// last 256 frames; clearing per 5s flush would lose recent history.
}
}
private static long MedianMicros(long[] samples)
{
var copy = (long[])samples.Clone();
Array.Sort(copy);
int nz = 0;
foreach (var v in copy) if (v > 0) nz++;
if (nz == 0) return 0;
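// Non-zero samples occupy the last nz slots of the ascending sort; take their middle element.
return copy[copy.Length - nz + nz / 2];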
}
private static long Percentile95Micros(long[] samples)
{
var copy = (long[])samples.Clone();
Array.Sort(copy);
int nz = 0;
foreach (var v in copy) if (v > 0) nz++;
if (nz == 0) return 0;
int idx = copy.Length - 1 - (int)(nz * 0.05);
return copy[idx];
}
private void ClassifyBatches(
ObjectRenderData renderData,
ulong gfxObjId,
@@ -413,12 +548,16 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
: TranslucencyKind.Opaque;
}
uint texHandle = ResolveTexture(entity, meshRef, batch, palHash);
ulong texHandle = ResolveTexture(entity, meshRef, batch, palHash);
if (texHandle == 0) continue;
// TextureLayer is always 0 for per-instance composites; non-zero when
// WB atlas is adopted in N.6+ and batches reference a shared atlas layer.
uint texLayer = 0;
var key = new GroupKey(
batch.IBO, batch.FirstIndex, (int)batch.BaseVertex,
batch.IndexCount, texHandle, translucency);
batch.IndexCount, texHandle, texLayer, translucency);
if (!_groups.TryGetValue(key, out var grp))
{
@@ -428,7 +567,8 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
FirstIndex = batch.FirstIndex,
BaseVertex = (int)batch.BaseVertex,
IndexCount = batch.IndexCount,
TextureHandle = texHandle,
BindlessTextureHandle = texHandle,
TextureLayer = texLayer,
Translucency = translucency,
};
_groups[key] = grp;
@@ -437,10 +577,8 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
}
}
private uint ResolveTexture(WorldEntity entity, MeshRef meshRef, ObjectRenderBatch batch, ulong palHash)
private ulong ResolveTexture(WorldEntity entity, MeshRef meshRef, ObjectRenderBatch batch, ulong palHash)
{
// WB stores the surface id on batch.Key.SurfaceId (TextureKey struct);
// batch.SurfaceId is unset (zero) for batches built by ObjectMeshManager.
uint surfaceId = batch.Key.SurfaceId;
if (surfaceId == 0 || surfaceId == 0xFFFFFFFF) return 0;
@@ -451,34 +589,16 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
if (entity.PaletteOverride is not null)
{
// perf #4: pass the entity-precomputed palette hash so TextureCache
// can skip its internal HashPaletteOverride for repeat lookups
// within the same character.
return _textures.GetOrUploadWithPaletteOverride(
return _textures.GetOrUploadWithPaletteOverrideBindless(
surfaceId, origTexOverride, entity.PaletteOverride, palHash);
}
else if (hasOrigTexOverride)
{
return _textures.GetOrUploadWithOrigTextureOverride(surfaceId, overrideOrigTex);
return _textures.GetOrUploadWithOrigTextureOverrideBindless(surfaceId, overrideOrigTex);
}
else
{
return _textures.GetOrUpload(surfaceId);
}
}
private void EnsureInstanceAttribs(uint vao)
{
if (!_patchedVaos.Add(vao)) return;
_gl.BindVertexArray(vao);
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
for (uint row = 0; row < 4; row++)
{
uint loc = 3 + row;
_gl.EnableVertexAttribArray(loc);
_gl.VertexAttribPointer(loc, 4, VertexAttribPointerType.Float, false, 64, (void*)(row * 16));
_gl.VertexAttribDivisor(loc, 1);
return _textures.GetOrUploadBindless(surfaceId);
}
}
@@ -494,15 +614,138 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
{
if (_disposed) return;
_disposed = true;
_gl.DeleteBuffer(_instanceVbo);
_gl.DeleteBuffer(_instanceSsbo);
_gl.DeleteBuffer(_batchSsbo);
_gl.DeleteBuffer(_indirectBuffer);
if (_gpuQueriesInitialized)
{
_gl.DeleteQuery(_gpuQueryOpaque);
_gl.DeleteQuery(_gpuQueryTransparent);
}
}
// ── Public types + helpers for BuildIndirectArrays (Task 9) ─────────────
//
// These are public so the pure-CPU unit tests in AcDream.Core.Tests can
// exercise BuildIndirectArrays without needing a GL context.
/// <summary>
/// Stride in bytes of <c>DrawElementsIndirectCommand</c> in the indirect buffer.
/// 5 × <c>uint</c> = 20 bytes. Tests and callers reference this symbolically
/// rather than hard-coding <c>20</c>; a layout change is caught by the
/// <c>DrawCommandStride_MatchesStructSize</c> unit test.
/// </summary>
public const int DrawCommandStride = 20; // sizeof(DrawElementsIndirectCommand): 5 × uint
/// <summary>
/// Public view of the per-group inputs to <see cref="BuildIndirectArrays"/> — used in tests.
/// </summary>
public readonly record struct IndirectGroupInput(
int IndexCount,
uint FirstIndex,
int BaseVertex,
int InstanceCount,
int FirstInstance,
ulong TextureHandle,
uint TextureLayer,
TranslucencyKind Translucency);
/// <summary>
/// Public mirror of the per-group <see cref="BatchData"/> uploaded to the SSBO.
/// Tests verify the layout. Same field shape as the private BatchData.
/// </summary>
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct BatchDataPublic
{
public ulong TextureHandle;
public uint TextureLayer;
public uint Flags;
}
/// <summary>Result of <see cref="BuildIndirectArrays"/>.</summary>
public readonly record struct IndirectLayoutResult(
int OpaqueCount,
int TransparentCount,
int TransparentByteOffset);
/// <summary>
/// Lays out the indirect commands + parallel BatchData array contiguously:
/// opaque section first (caller sorts before calling), transparent section second.
/// Pure CPU, no GL state. Caller passes pre-sized scratch arrays.
/// </summary>
/// <remarks>
/// Classification: Opaque + ClipMap → opaque pass (ClipMap uses discard, not
/// blending). Everything else (AlphaBlend, Additive, InvAlpha) → transparent pass.
/// </remarks>
public static IndirectLayoutResult BuildIndirectArrays(
IReadOnlyList<IndirectGroupInput> groups,
DrawElementsIndirectCommand[] indirectScratch,
BatchDataPublic[] batchScratch)
{
int opaqueCount = 0;
int transparentCount = 0;
foreach (var g in groups)
{
if (IsOpaque(g.Translucency)) opaqueCount++;
else transparentCount++;
}
int oi = 0; // opaque write cursor (fills [0..opaqueCount))
int ti = opaqueCount; // transparent write cursor (fills [opaqueCount..end))
foreach (var g in groups)
{
var dec = new DrawElementsIndirectCommand
{
Count = (uint)g.IndexCount,
InstanceCount = (uint)g.InstanceCount,
FirstIndex = g.FirstIndex,
BaseVertex = g.BaseVertex,
BaseInstance = (uint)g.FirstInstance,
};
var bd = new BatchDataPublic
{
TextureHandle = g.TextureHandle,
TextureLayer = g.TextureLayer,
Flags = 0,
};
if (IsOpaque(g.Translucency))
{
indirectScratch[oi] = dec;
batchScratch[oi] = bd;
oi++;
}
else
{
indirectScratch[ti] = dec;
batchScratch[ti] = bd;
ti++;
}
}
return new IndirectLayoutResult(opaqueCount, transparentCount, opaqueCount * DrawCommandStride);
}
/// <summary>
/// Public test shim for <see cref="IsOpaque"/>. Locks in the N.5 Decision 2
/// translucency partition: Opaque + ClipMap → opaque indirect; AlphaBlend +
/// Additive + InvAlpha → transparent indirect.
/// </summary>
public static bool IsOpaquePublic(TranslucencyKind t) => IsOpaque(t);
private static bool IsOpaque(TranslucencyKind t)
=> t == TranslucencyKind.Opaque || t == TranslucencyKind.ClipMap;
// ────────────────────────────────────────────────────────────────────────
private readonly record struct GroupKey(
uint Ibo,
uint FirstIndex,
int BaseVertex,
int IndexCount,
uint TextureHandle,
ulong BindlessTextureHandle,
uint TextureLayer,
TranslucencyKind Translucency);
private sealed class InstanceGroup
@@ -511,7 +754,8 @@ public sealed unsafe class WbDrawDispatcher : IDisposable
public uint FirstIndex;
public int BaseVertex;
public int IndexCount;
public uint TextureHandle;
public ulong BindlessTextureHandle; // 64-bit (was uint TextureHandle in N.4)
public uint TextureLayer; // 0 for per-instance composites; non-zero when WB atlas is adopted in N.6+
public TranslucencyKind Translucency;
public int FirstInstance; // offset into the shared instance SSBO (in instances, not bytes)
public int InstanceCount;

@@ -1,39 +0,0 @@
namespace AcDream.App.Rendering.Wb;
/// <summary>
/// Process-lifetime cache of <c>ACDREAM_USE_WB_FOUNDATION</c> env var.
/// Read once at static-init time; all consumers import this rather than
/// re-reading the env var per call (env-var lookups on Windows are not
/// free at hot-path cadence).
///
/// <para>
/// <b>Default-on as of Phase N.4 ship (2026-05-08).</b> The WB foundation
/// (<c>WbMeshAdapter</c> + <c>WbDrawDispatcher</c>) is the production
/// rendering path. Set <c>ACDREAM_USE_WB_FOUNDATION=0</c> to fall back
/// to the legacy <c>InstancedMeshRenderer</c> path — kept as an escape
/// hatch until N.6 fully replaces it.
/// </para>
///
/// <para>
/// Per-instance customized content (server <c>CreateObject</c> entities
/// with palette / texture overrides) routes through
/// <see cref="TextureCache.GetOrUploadWithPaletteOverride"/> regardless
/// of the flag — the flag controls which DRAW path consumes those
/// textures.
/// </para>
/// </summary>
public static class WbFoundationFlag
{
private static bool _isEnabled =
System.Environment.GetEnvironmentVariable("ACDREAM_USE_WB_FOUNDATION") != "0";
public static bool IsEnabled => _isEnabled;
/// <summary>
/// FOR TESTS ONLY. Forces <see cref="IsEnabled"/> to <c>true</c> so
/// integration tests can exercise the WB adapter path without having to
/// set the env var before static initialisation. Never call from
/// production code.
/// </summary>
internal static void ForTestsOnly_ForceEnable() => _isEnabled = true;
}

@@ -144,7 +144,7 @@ public sealed class GpuWorldState
}
_loaded[landblock.LandblockId] = landblock;
if (WbFoundationFlag.IsEnabled && _wbSpawnAdapter is not null)
if (_wbSpawnAdapter is not null)
_wbSpawnAdapter.OnLandblockLoaded(_loaded[landblock.LandblockId]);
RebuildFlatView();
}
@@ -195,7 +195,7 @@ public sealed class GpuWorldState
public void RemoveLandblock(uint landblockId)
{
if (WbFoundationFlag.IsEnabled && _wbSpawnAdapter is not null)
if (_wbSpawnAdapter is not null)
_wbSpawnAdapter.OnLandblockUnloaded(landblockId);
// Rescue persistent entities before removal. These get appended

@@ -0,0 +1,32 @@
using AcDream.App.Rendering;
using AcDream.App.Rendering.Wb;
using DatReaderWriter;
using Xunit;
namespace AcDream.Core.Tests.Rendering;
/// <summary>
/// Lightweight unit tests for <see cref="TextureCache"/>'s bindless path.
/// We can't construct a real TextureCache in a headless test (it requires a
/// live GL context), so this file documents contracts that future engineers
/// should preserve. Real bindless integration is verified at Task 14's
/// visual gate.
/// </summary>
public sealed class TextureCacheBindlessTests
{
[Fact]
public void Contract_BindlessMethodsThrowWithoutBindlessSupport()
{
// The actual throw lives in TextureCache.EnsureBindlessAvailable
// and is reached only via GL-bound Bindless* method calls. The
// contract is: if the dispatcher (which requires bindless) ever
// gets a TextureCache constructed without BindlessSupport, it
// should fail-fast with InvalidOperationException — NOT silently
// route a draw to handle 0 (which would produce a non-resident
// GPU fault).
//
// This test is a marker. Future engineers: do not weaken
// EnsureBindlessAvailable to swallow the missing dependency.
Assert.True(true, "Contract documented in TextureCache.EnsureBindlessAvailable");
}
}

@@ -19,16 +19,9 @@ namespace AcDream.Core.Tests.Rendering.Wb;
/// </summary>
public sealed class PendingSpawnIntegrationTests
{
/// <summary>
/// Force-enable WbFoundationFlag for this test class.
/// GpuWorldState gates its adapter calls on this static-cached flag;
/// calling the internal test hook lets us exercise the full integration
/// path without needing the env var set before process startup.
/// </summary>
static PendingSpawnIntegrationTests()
{
WbFoundationFlag.ForTestsOnly_ForceEnable();
}
// N.5 ship amendment: WbFoundationFlag was deleted — GpuWorldState
// no longer gates adapter calls on the flag; they are unconditional
// when the adapter is non-null. No static ctor hook needed.
[Fact]
public void LiveEntity_ParkedBeforeLandblock_DrainsButIsNotRegisteredWithAdapter()

@@ -0,0 +1,113 @@
using System.Numerics;
using AcDream.App.Rendering.Wb;
using AcDream.Core.Meshing;
using Xunit;
namespace AcDream.Core.Tests.Rendering.Wb;
/// <summary>
/// Pure CPU test of <see cref="WbDrawDispatcher.BuildIndirectArrays"/>.
/// Verifies that a synthetic group set lays out into the indirect buffer
/// + parallel batch data with opaque section first, transparent second,
/// per-group fields propagated correctly.
/// </summary>
public sealed class WbDrawDispatcherIndirectBuilderTests
{
[Fact]
public void TwoOpaqueGroupsAndOneTransparent_LaysOutContiguouslyOpaqueFirst()
{
// Arrange — three groups: 2 opaque (12+1 instances) + 1 transparent (12 instances)
var groups = new List<WbDrawDispatcher.IndirectGroupInput>
{
new(IndexCount: 100, FirstIndex: 0, BaseVertex: 0, InstanceCount: 12, FirstInstance: 0, TextureHandle: 0xAA, TextureLayer: 0, Translucency: TranslucencyKind.Opaque),
new(IndexCount: 200, FirstIndex: 100, BaseVertex: 0, InstanceCount: 12, FirstInstance: 12, TextureHandle: 0xBB, TextureLayer: 0, Translucency: TranslucencyKind.AlphaBlend),
new(IndexCount: 50, FirstIndex: 300, BaseVertex: 100, InstanceCount: 1, FirstInstance: 24, TextureHandle: 0xCC, TextureLayer: 0, Translucency: TranslucencyKind.Opaque),
};
var indirect = new DrawElementsIndirectCommand[16];
var batch = new WbDrawDispatcher.BatchDataPublic[16];
// Act
var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch);
// Assert layout
Assert.Equal(2, result.OpaqueCount);
Assert.Equal(1, result.TransparentCount);
Assert.Equal(2 * WbDrawDispatcher.DrawCommandStride, result.TransparentByteOffset); // two opaque commands precede the transparent section
// Opaque section, in input order (Task 10 callers sort)
Assert.Equal(100u, indirect[0].Count);
Assert.Equal(0u, indirect[0].FirstIndex);
Assert.Equal(0, indirect[0].BaseVertex);
Assert.Equal(12u, indirect[0].InstanceCount);
Assert.Equal(0u, indirect[0].BaseInstance);
Assert.Equal(50u, indirect[1].Count);
Assert.Equal(300u, indirect[1].FirstIndex);
Assert.Equal(100, indirect[1].BaseVertex);
Assert.Equal(1u, indirect[1].InstanceCount);
Assert.Equal(24u, indirect[1].BaseInstance);
// Transparent section
Assert.Equal(200u, indirect[2].Count);
Assert.Equal(100u, indirect[2].FirstIndex);
Assert.Equal(12u, indirect[2].InstanceCount);
Assert.Equal(12u, indirect[2].BaseInstance);
// BatchData parallel — same indices as indirect
Assert.Equal(0xAAul, batch[0].TextureHandle);
Assert.Equal(0xCCul, batch[1].TextureHandle);
Assert.Equal(0xBBul, batch[2].TextureHandle);
}
[Fact]
public void EmptyGroupList_ProducesZeroCounts()
{
var groups = new List<WbDrawDispatcher.IndirectGroupInput>();
var indirect = new DrawElementsIndirectCommand[0];
var batch = new WbDrawDispatcher.BatchDataPublic[0];
var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch);
Assert.Equal(0, result.OpaqueCount);
Assert.Equal(0, result.TransparentCount);
Assert.Equal(0, result.TransparentByteOffset);
}
[Fact]
public void ClipMapTreatedAsOpaque()
{
// ClipMap surfaces (alpha-cutout) belong with the opaque pass
// because the discard handles transparency, not blending.
var groups = new List<WbDrawDispatcher.IndirectGroupInput>
{
new(IndexCount: 10, FirstIndex: 0, BaseVertex: 0, InstanceCount: 1, FirstInstance: 0, TextureHandle: 0x1, TextureLayer: 0, Translucency: TranslucencyKind.ClipMap),
};
var indirect = new DrawElementsIndirectCommand[4];
var batch = new WbDrawDispatcher.BatchDataPublic[4];
var result = WbDrawDispatcher.BuildIndirectArrays(groups, indirect, batch);
Assert.Equal(1, result.OpaqueCount);
Assert.Equal(0, result.TransparentCount);
}
[Fact]
public void BatchDataPublic_LayoutMatchesPrivateBatchData()
{
// Task 10 will use MemoryMarshal.Cast<BatchData, BatchDataPublic> to
// expose the dispatcher's per-frame BatchData[] scratch to BuildIndirectArrays
// without copying. The cast is only safe if the structs have identical
// layout (size, field offsets). Both use [StructLayout(Sequential, Pack=8)].
Assert.Equal(16, System.Runtime.CompilerServices.Unsafe.SizeOf<WbDrawDispatcher.BatchDataPublic>());
Assert.Equal(0, (int)System.Runtime.InteropServices.Marshal.OffsetOf<WbDrawDispatcher.BatchDataPublic>(nameof(WbDrawDispatcher.BatchDataPublic.TextureHandle)));
Assert.Equal(8, (int)System.Runtime.InteropServices.Marshal.OffsetOf<WbDrawDispatcher.BatchDataPublic>(nameof(WbDrawDispatcher.BatchDataPublic.TextureLayer)));
Assert.Equal(12, (int)System.Runtime.InteropServices.Marshal.OffsetOf<WbDrawDispatcher.BatchDataPublic>(nameof(WbDrawDispatcher.BatchDataPublic.Flags)));
}
[Fact]
public void DrawCommandStride_MatchesStructSize()
{
Assert.Equal(WbDrawDispatcher.DrawCommandStride, System.Runtime.CompilerServices.Unsafe.SizeOf<DrawElementsIndirectCommand>());
}
}

@@ -0,0 +1,25 @@
using AcDream.App.Rendering.Wb;
using AcDream.Core.Meshing;
using Xunit;
namespace AcDream.Core.Tests.Rendering.Wb;
/// <summary>
/// Locks in the N.5 translucency partition contract (spec Decision 2).
/// If the partition drifts, the dispatcher's opaque and transparent indirect
/// passes will silently draw groups in the wrong pass: a visible regression
/// that is hard to spot in code review.
/// </summary>
public sealed class WbDrawDispatcherTranslucencyTests
{
[Theory]
[InlineData(TranslucencyKind.Opaque, true)]
[InlineData(TranslucencyKind.ClipMap, true)]
[InlineData(TranslucencyKind.AlphaBlend, false)]
[InlineData(TranslucencyKind.Additive, false)]
[InlineData(TranslucencyKind.InvAlpha, false)]
public void IsOpaque_PartitionsByKind(TranslucencyKind kind, bool expected)
{
Assert.Equal(expected, WbDrawDispatcher.IsOpaquePublic(kind));
}
}