Final cross-cutting review of N.5 found that Task 15's deletion of mesh_instanced.vert/.frag left InstancedMeshRenderer orphaned: ACDREAM_USE_WB_FOUNDATION=0 silently rendered terrain+sky only, with no entities. The SHIP commit's "[x] ACDREAM_USE_WB_FOUNDATION=0 still works" claim was inaccurate.

Resolution: formal retirement of the legacy renderer path within N.5 instead of deferring to N.6.

Deleted:
- src/AcDream.App/Rendering/InstancedMeshRenderer.cs
- src/AcDream.App/Rendering/StaticMeshRenderer.cs
- src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs

GameWindow simplified: capability detection is unconditional; missing bindless throws NotSupportedException with a clear message at startup. WbDrawDispatcher + mesh_modern shader load are mandatory after init. No escape hatch.

GpuWorldState simplified: WbFoundationFlag.IsEnabled guards on AddLandblock/RemoveLandblock removed; adapter calls are unconditional when the adapter is non-null.

PendingSpawnIntegrationTests updated: WbFoundationFlag.ForTestsOnly_ForceEnable static ctor removed (flag is gone; adapter calls are unconditional).

The ApplyLoadedTerrain physics-data loop was also simplified: the EnsureUploaded sub-loop that fed InstancedMeshRenderer is gone; _pendingCellMeshes is now explicitly cleared to prevent unbounded accumulation (the worker thread still populates it, but WB handles EnvCell geometry through its own pipeline).

Spec §2 Decision 5 + §10 Out-of-Scope updated. Plan ship-amendment section added. Roadmap updated (N.5 ships with retirement; N.6 scope narrowed to perf-only). CLAUDE.md "WB integration cribs" updated. Perf baseline doc updated. WbDrawDispatcher class summary docstring corrected to describe the as-shipped SSBO + multi-draw-indirect path. ISSUES.md #51 updated (terrain not in N.5 scope; deferred to N.7).

Bindless support is now a hard requirement. Modern desktop GPUs universally expose GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters; if a user hits the NotSupportedException, that's a real bug report worth investigating, not a silent fallback.

Build: 0 errors, 0 warnings.
Tests: 71/71 (Wb+MatrixComposition+TextureCacheBindless filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Phase N.5 perf baseline

**Captured:** 2026-05-08, against N.5 head (post-Task 12) on local machine.
**Method:** `ACDREAM_WB_DIAG=1` + character at Holtburg spawn position +
roaming. Numbers below are 5-second window medians from `[WB-DIAG]`.

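To make the "5-second window median" and p95 figures concrete, here is a minimal sketch of how one window of per-frame timings could be reduced to those two numbers. It is an illustration only: the function name `window_stats`, the nearest-rank percentile rule, and the sample values are assumptions, and nothing here is taken from the actual `[WB-DIAG]` implementation.

```python
import statistics


def window_stats(samples_us):
    """Return (median, p95) for one window of per-frame timings in microseconds.

    Hypothetical helper: the real [WB-DIAG] reduction may differ.
    """
    if not samples_us:
        raise ValueError("window has no samples")
    ordered = sorted(samples_us)
    median = statistics.median(ordered)
    # Nearest-rank p95 using integer math: the smallest sample such that
    # at least 95% of the window is at or below it.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return median, p95


# Made-up per-frame dispatcher timings, not measured data.
demo = [1200, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1300]
demo_median, demo_p95 = window_stats(demo)
print(demo_median, demo_p95)
```

A median is less sensitive than a mean to the occasional long frame (for example a landblock streaming in), which is presumably why the table reports median and p95 rather than an average.
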
## Holtburg courtyard (steady state)

| Metric | N.5 measured | N.4 (estimated*) | Gate |
|---|---|---|---|
| CPU dispatcher (median) | **1227 µs / frame** | ≥2500 µs / frame | ≤70% of N.4 → **PASS** |
| CPU dispatcher (p95) | 1303 µs / frame | n/a | n/a |
| GPU rendering (median) | unmeasured (see below) | n/a | within ±10%: **DEFERRED** |
| `drawsIssued` per 5s | 4.85M (= 1662 groups × ~580 fps × 5 s) | far higher per frame | n/a |
| `drawsIssued` per pass (CPU GL calls) | **2** (1 opaque + 1 transparent indirect) | ~hundreds per pass | ≤5 → **PASS** |
| `groups` (working set) | 1662 | ~similar | sanity |
| Frame rate (inferred) | ~810 fps | ~100-200 fps | substantial uplift |

*N.4 baseline NOT measured directly in this run. The "≥2500 µs / frame"
estimate assumes N.4's per-group glBindTexture + glBindBuffer +
glDrawElementsInstancedBaseVertexBaseInstance hot path costs ≥1.5 µs per
group and N.4 has ~1700 groups in this scene, putting the GL portion alone
at ~2.5 ms before adding the entity-walk overhead. N.5's measurement
includes ALL dispatcher work (entity walk + group bucketing + 3 SSBO
uploads + 2 indirect calls + state changes) at 1230 µs total, roughly
half of the lower-bound estimate.

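The arithmetic behind the estimate and the table can be checked in a few lines. The sketch below uses only numbers quoted above; the variable names are illustrative. The last two computations are an assumption about how the two frame-rate figures arise: the `drawsIssued` total implies roughly 580 fps, while the reciprocal of the dispatcher median gives roughly 815 fps, which appears to match the "~810 fps (inferred)" row. The document does not state how that row was derived.

```python
# Inputs quoted from the table and footnote above.
N5_MEDIAN_US = 1227          # measured N.5 dispatcher median, µs per frame
N4_GROUPS = 1700             # approximate N.4 group count in this scene
N4_COST_PER_GROUP_US = 1.5   # assumed lower-bound GL cost per group, µs
N4_ESTIMATE_US = 2500        # rounded N.4 lower bound used in the table
GATE_RATIO = 0.70            # spec gate: N.5 must be <= 70% of N.4

# GL-only lower bound for N.4, before any entity-walk overhead.
n4_gl_only_us = N4_GROUPS * N4_COST_PER_GROUP_US

# Gate check against the rounded estimate.
ratio = N5_MEDIAN_US / N4_ESTIMATE_US
gate_passes = ratio <= GATE_RATIO

# drawsIssued per window = groups x frames per second x window length.
DRAWS_PER_WINDOW = 4.85e6
GROUPS = 1662
WINDOW_S = 5
fps_from_draws = DRAWS_PER_WINDOW / (GROUPS * WINDOW_S)

# Assumption: frame rate if the dispatcher were the only per-frame cost.
fps_from_cpu = 1e6 / N5_MEDIAN_US

print(f"N.4 GL-only lower bound: {n4_gl_only_us:.0f} µs")
print(f"N.5 / N.4 ratio: {ratio:.2f} (gate passes: {gate_passes})")
print(f"fps implied by drawsIssued: {fps_from_draws:.0f}")
print(f"fps implied by dispatcher median: {fps_from_cpu:.0f}")
```

Because the N.4 figure is a lower bound, the true ratio can only be smaller than the one computed here, so the gate result does not depend on the exact per-group cost as long as the 1.5 µs assumption holds.
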
## Acceptance gates (spec §8.3)

- [x] **Visual identity to N.4**: confirmed at Task 10 USER GATE. Holtburg
  courtyard renders identically, no missing entities, no z-fighting, no
  exploded parts.
- [x] **CPU dispatcher time ≤ 70% of N.4**: N.5 measures 1.23 ms/frame
  median; estimated N.4 ≥2.5 ms/frame; **comfortably under 70%**.
- [ ] **GPU rendering time within ±10% of N.4**: DEFERRED. The
  `GL_TIME_ELAPSED` query polling never reports `avail != 0` in our
  single-frame poll loop; the driver hasn't finalized the result by the
  time we check. The fix is double-buffering (issue queryA on frame N,
  read result on frame N+2). N.6 perf polish item.
- [x] **`drawsIssued` ≤ 5 per pass (CPU GL calls)**: exactly 2 indirect
  calls per frame regardless of scene size.
- [x] **All tests green**: 70/70 in
  `FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition`.
  8 pre-existing failures in `MotionInterpreter` / `BSPStepUp` /
  `PositionManager` / `PlayerMovementController` / `Dispatcher` are
  carry-forward from before N.5 and unrelated to rendering.
- [N/A] **`ACDREAM_USE_WB_FOUNDATION=0` still works**: escape hatch
  formally retired in N.5 ship amendment. `InstancedMeshRenderer`,
  `StaticMeshRenderer`, and `WbFoundationFlag` deleted. Missing
  bindless throws `NotSupportedException` at startup with a clear
  error message. No fallback path.

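The "no fallback path" behaviour in the last gate amounts to a fail-fast capability check at startup. A minimal, language-neutral sketch of that pattern follows. It is not the project's `GameWindow` code: `require_extensions` and `MissingCapabilityError` are hypothetical names standing in for the real check and for `NotSupportedException`, and the two extension names are the ones cited in the ship amendment.

```python
# Extensions the ship amendment names as hard requirements.
REQUIRED_EXTENSIONS = (
    "GL_ARB_bindless_texture",
    "GL_ARB_shader_draw_parameters",
)


class MissingCapabilityError(RuntimeError):
    """Hypothetical stand-in for the NotSupportedException described above."""


def require_extensions(available, required=REQUIRED_EXTENSIONS):
    """Raise immediately if any required extension is absent.

    `available` is whatever list of extension strings the GL context
    reports; how that list is obtained is outside this sketch.
    """
    present = set(available)
    missing = [name for name in required if name not in present]
    if missing:
        raise MissingCapabilityError(
            "Required OpenGL extensions are not supported: "
            + ", ".join(missing)
        )


# Example with a made-up extension list that lacks bindless textures.
try:
    require_extensions(["GL_ARB_shader_draw_parameters"])
except MissingCapabilityError as exc:
    print(exc)
```

Listing every missing extension in the message, rather than stopping at the first, makes a user's bug report more useful when the exception does fire.
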
## Visual verification (Task 14)

- [x] **Holtburg courtyard**: PASS at Task 10 USER GATE.
- [ ] **Foundry interior / dense static-object scene**: TODO Task 14.
- [ ] **Indoor → outdoor cell transition**: TODO Task 14.
- [ ] **Drudge / character close-up (Issue #47 close-detail mesh)**: TODO Task 14.
- [ ] **Magic content (Decision 2 additive fallback check)**: TODO Task 14.
- [ ] **Long-session sanity**: DEFERRED (N.6 watchlist; not load-bearing for ship).

## Open follow-ups for N.6

1. **GPU timer query double-buffering**: the current single-frame poll
   pattern never sees `QueryResultAvailable=true`. Issue queryA on frame N,
   queryB on frame N+1, read queryA on frame N+2. ~30 lines of state.
2. **Direct N.4 vs N.5 perf comparison**: re-run against a checkout of the
   N.4 SHIP commit (`c445364`) for a side-by-side measurement. Not
   load-bearing but useful for the N.6 ship message.
3. **Persistent-mapped buffers**: Decision 7 deferral. If profiling shows
   the per-frame `glBufferData` cost is the residual hot spot, layer it on
   top of the modern path.
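
The double-buffering scheme in follow-up 1 can be sketched independently of any GL binding. The code below is a simulation under stated assumptions, not the planned implementation: `TimerQueryRing` and `FakeBackend` are hypothetical names, and the fake backend simply pretends a result becomes available one frame after the query ends. In a real renderer the four backend methods would presumably map onto `glBeginQuery(GL_TIME_ELAPSED, id)`, `glEndQuery(GL_TIME_ELAPSED)`, `glGetQueryObjectiv(id, GL_QUERY_RESULT_AVAILABLE, ...)` and `glGetQueryObjectui64v(id, GL_QUERY_RESULT, ...)`.

```python
class TimerQueryRing:
    """Alternate between two queries; read each one two frames after issue."""

    def __init__(self, backend, query_ids=(0, 1)):
        self._backend = backend
        self._ids = tuple(query_ids)
        self._in_flight = [False] * len(self._ids)
        self._frame = 0
        self.last_gpu_ns = None  # most recent completed measurement

    def begin_frame(self):
        slot = self._frame % len(self._ids)
        qid = self._ids[slot]
        # This slot was issued len(ids) frames ago, so the driver has had
        # time to finish it. If it is still not ready, skip the sample
        # rather than stall the CPU waiting for it.
        if self._in_flight[slot] and self._backend.result_available(qid):
            self.last_gpu_ns = self._backend.result_ns(qid)
        self._backend.begin(qid)
        self._in_flight[slot] = True

    def end_frame(self):
        self._backend.end(self._ids[self._frame % len(self._ids)])
        self._frame += 1


class FakeBackend:
    """Simulated driver: a result is ready one frame after the query ends."""

    def __init__(self):
        self.frame = 0
        self._ended_at = {}
        self._cost_ns = {}

    def begin(self, qid):
        self._ended_at.pop(qid, None)

    def end(self, qid):
        self._ended_at[qid] = self.frame
        self._cost_ns[qid] = 1000 + self.frame  # made-up GPU cost

    def result_available(self, qid):
        return qid in self._ended_at and self.frame > self._ended_at[qid]

    def result_ns(self, qid):
        return self._cost_ns[qid]


backend = FakeBackend()
ring = TimerQueryRing(backend)
readings = []
for frame in range(5):
    backend.frame = frame
    ring.begin_frame()
    # ... draw calls for this frame would go here ...
    ring.end_frame()
    readings.append(ring.last_gpu_ns)
print(readings)
```

The first two frames produce no reading because nothing has been in flight long enough; from the third frame on, each frame reports the GPU time of the frame two steps earlier. That lag is harmless for a windowed median like the one `[WB-DIAG]` reports.
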