acdream/docs/plans/2026-05-08-phase-n5-perf-baseline.md
Erik dcae2b6b94 phase(N.5): retirement amendment — InstancedMeshRenderer + StaticMeshRenderer + WbFoundationFlag deleted
Final cross-cutting review of N.5 found that Task 15's deletion of
mesh_instanced.vert/.frag left InstancedMeshRenderer orphaned —
ACDREAM_USE_WB_FOUNDATION=0 silently rendered terrain+sky only with
no entities. The SHIP commit's "[x] ACDREAM_USE_WB_FOUNDATION=0 still
works" claim was inaccurate.

Resolution: formal retirement of the legacy renderer path within N.5
instead of deferring to N.6.

Deleted:
- src/AcDream.App/Rendering/InstancedMeshRenderer.cs
- src/AcDream.App/Rendering/StaticMeshRenderer.cs
- src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs

GameWindow simplified — capability detection is unconditional, missing
bindless throws NotSupportedException with a clear message at startup.
WbDrawDispatcher + mesh_modern shader load are mandatory after init.
No escape hatch.

GpuWorldState simplified — WbFoundationFlag.IsEnabled guards on
AddLandblock/RemoveLandblock removed; adapter calls are unconditional
when the adapter is non-null.

PendingSpawnIntegrationTests updated — WbFoundationFlag.ForTestsOnly_ForceEnable
static ctor removed (flag is gone; adapter calls are unconditional).

The ApplyLoadedTerrain physics-data loop was also simplified: the
EnsureUploaded sub-loop that fed InstancedMeshRenderer is gone;
_pendingCellMeshes is now explicitly cleared to prevent unbounded
accumulation (the worker thread still populates it, but WB handles
EnvCell geometry through its own pipeline).

Spec §2 Decision 5 + §10 Out-of-Scope updated. Plan ship-amendment
section added. Roadmap updated (N.5 ships with retirement; N.6 scope
narrowed to perf-only). CLAUDE.md "WB integration cribs" updated.
Perf baseline doc updated. WbDrawDispatcher class summary docstring
corrected to describe the as-shipped SSBO + multi-draw-indirect path.
ISSUES.md #51 updated (terrain not in N.5 scope; deferred to N.7).

Bindless support is now a hard requirement. Modern desktop GPUs
universally expose GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters;
if a user hits the NotSupportedException, that's a real bug report
worth investigating, not a silent fallback.

Build: 0 errors, 0 warnings. Tests: 71/71 (Wb+MatrixComposition+TextureCacheBindless filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:01:36 +02:00

72 lines
3.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase N.5 perf baseline
**Captured:** 2026-05-08, against N.5 head (post-Task 12) on local machine.
**Method:** `ACDREAM_WB_DIAG=1` + character at Holtburg spawn position +
roaming. Numbers below are 5-second window medians from `[WB-DIAG]`.
## Holtburg courtyard (steady state)
| Metric | N.5 measured | N.4 (estimated*) | Gate |
|---|---|---|---|
| CPU dispatcher (median) | **1227 µs / frame** | ≥2500 µs / frame | ≤70% of N.4 → **PASS** |
| CPU dispatcher (p95) | 1303 µs / frame | — | — |
| GPU rendering (median) | unmeasured (see below) | — | within ±10% — **DEFERRED** |
| `drawsIssued` per 5s | 4.85M (= 1662 groups × ~580 fps) | far higher per frame | — |
| `drawsIssued` per pass (CPU GL calls) | **2** (1 opaque + 1 transparent indirect) | ~hundreds per pass | ≤5 → **PASS** |
| `groups` (working set) | 1662 | ~similar | sanity |
| Frame rate (inferred) | ~810 fps | ~100-200 fps | substantial uplift |
*N.4 baseline NOT measured directly in this run. The "≥2500 µs / frame"
estimate assumes N.4's per-group glBindTexture + glBindBuffer +
glDrawElementsInstancedBaseVertexBaseInstance hot path costs ≥1.5 µs per
group and N.4 has ~1700 groups in this scene, putting the GL portion alone
at ~2.5 ms before adding the entity-walk overhead. N.5's measurement
includes ALL dispatcher work (entity walk + group bucketing + 3 SSBO
uploads + 2 indirect calls + state changes) at 1230 µs total — comfortably
half of the lower bound estimate.
## Acceptance gates (spec §8.3)
- [x] **Visual identity to N.4** — confirmed at Task 10 USER GATE: Holtburg
courtyard renders identical, no missing entities, no z-fighting, no
exploded parts.
- [x] **CPU dispatcher time ≤ 70% of N.4** — N.5 measures 1.23 ms/frame
median; estimated N.4 ≥2.5 ms/frame; **comfortably under 70%**.
- [ ] **GPU rendering time within ±10% of N.4** — DEFERRED. The
`GL_TIME_ELAPSED` query polling never reports `avail != 0` in our
single-frame poll loop; the driver hasn't finalized the result by the
time we check. The fix is double-buffering (issue queryA on frame N,
read result on frame N+2). N.6 perf polish item.
- [x] **`drawsIssued` ≤ 5 per pass (CPU GL calls)** — exactly 2 indirect
calls per frame regardless of scene size.
- [x] **All tests green** — 70/70 in
`FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition`.
8 pre-existing failures in `MotionInterpreter` / `BSPStepUp` /
`PositionManager` / `PlayerMovementController` / `Dispatcher` are
carry-forward from before N.5 and unrelated to rendering.
- [N/A] **`ACDREAM_USE_WB_FOUNDATION=0` still works** — escape hatch
formally retired in N.5 ship amendment. `InstancedMeshRenderer`,
`StaticMeshRenderer`, and `WbFoundationFlag` deleted. Missing
bindless throws `NotSupportedException` at startup with a clear
error message. No fallback path.
## Visual verification (Task 14)
- [x] **Holtburg courtyard** — PASS at Task 10 USER GATE.
- [ ] **Foundry interior / dense static-object scene** — TODO Task 14.
- [ ] **Indoor → outdoor cell transition** — TODO Task 14.
- [ ] **Drudge / character close-up (Issue #47 close-detail mesh)** — TODO Task 14.
- [ ] **Magic content (Decision 2 additive fallback check)** — TODO Task 14.
- [ ] **Long-session sanity** — DEFERRED (N.6 watchlist; not load-bearing for ship).
## Open follow-ups for N.6
1. **GPU timer query double-buffering** — the current single-frame poll
pattern never sees `QueryResultAvailable=true`. Issue queryA on frame N,
queryB on frame N+1, read queryA on frame N+2. ~30 lines of state.
2. **Direct N.4 vs N.5 perf comparison** — re-run with `git checkout`ed N.4
SHIP (`c445364`) for a side-by-side measurement. Not load-bearing but
useful for N.6 ship message.
3. **Persistent-mapped buffers** — Decision 7 deferral. If profiling shows
the per-frame `glBufferData` cost is the residual hot spot, layer it on
top of the modern path.