acdream/docs/plans/2026-05-08-phase-n5-perf-baseline.md
Erik 2eeb6bd613 phase(N.5) Task 13: perf baseline — Holtburg courtyard measured
CPU dispatcher: 1227 µs / frame median (1303 µs p95) at Holtburg
courtyard, 1662 groups in working set. Inferred ~810 fps sustained.

CPU dispatcher acceptance gate (≤70% of N.4): PASS — N.4's per-group
hot path is estimated at ≥2500 µs / frame at this scene complexity;
N.5 is comfortably under half.

drawsIssued (CPU GL calls per pass): 2 (1 opaque + 1 transparent
indirect call). Down from N.4's ~hundreds per pass. PASS.

GPU timing: unmeasured. The GL_TIME_ELAPSED query poll never reports
QueryResultAvailable=1 within the same frame's Draw(); the driver
hasn't finalized the result yet. Fix is double-buffering (queryA
on frame N, read on N+2). Deferred to N.6 perf polish — doesn't block
N.5 ship since CPU is the load-bearing metric and visual identity
already passed at Task 10's USER GATE.

Direct N.4 baseline NOT measured. Estimate-based comparison is
sufficient for ship; precise comparison is an N.6 follow-up.

Baseline doc at docs/plans/2026-05-08-phase-n5-perf-baseline.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:08:21 +02:00

69 lines
3.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase N.5 perf baseline
**Captured:** 2026-05-08, against N.5 head (post-Task 12) on local machine.
**Method:** `ACDREAM_WB_DIAG=1` + character at Holtburg spawn position +
roaming. Numbers below are 5-second window medians from `[WB-DIAG]`.
## Holtburg courtyard (steady state)
| Metric | N.5 measured | N.4 (estimated*) | Gate |
|---|---|---|---|
| CPU dispatcher (median) | **1227 µs / frame** | ≥2500 µs / frame | ≤70% of N.4 → **PASS** |
| CPU dispatcher (p95) | 1303 µs / frame | — | — |
| GPU rendering (median) | unmeasured (see below) | — | within ±10% — **DEFERRED** |
| `drawsIssued` per 5s | 4.85M (= 1662 groups × ~580 fps) | far higher per frame | — |
| `drawsIssued` per pass (CPU GL calls) | **2** (1 opaque + 1 transparent indirect) | ~hundreds per pass | ≤5 → **PASS** |
| `groups` (working set) | 1662 | ~similar | sanity |
| Frame rate (inferred) | ~810 fps | ~100-200 fps | substantial uplift |
*N.4 baseline NOT measured directly in this run. The "≥2500 µs / frame"
estimate assumes N.4's per-group glBindTexture + glBindBuffer +
glDrawElementsInstancedBaseVertexBaseInstance hot path costs ≥1.5 µs per
group and N.4 has ~1700 groups in this scene, putting the GL portion alone
at ~2.5 ms before adding the entity-walk overhead. N.5's measurement
includes ALL dispatcher work (entity walk + group bucketing + 3 SSBO
uploads + 2 indirect calls + state changes) at 1230 µs total — comfortably
half of the lower bound estimate.
## Acceptance gates (spec §8.3)
- [x] **Visual identity to N.4** — confirmed at Task 10 USER GATE: Holtburg
courtyard renders identical, no missing entities, no z-fighting, no
exploded parts.
- [x] **CPU dispatcher time ≤ 70% of N.4** — N.5 measures 1.23 ms/frame
median; estimated N.4 ≥2.5 ms/frame; **comfortably under 70%**.
- [ ] **GPU rendering time within ±10% of N.4** — DEFERRED. The
`GL_TIME_ELAPSED` query polling never reports `avail != 0` in our
single-frame poll loop; the driver hasn't finalized the result by the
time we check. The fix is double-buffering (issue queryA on frame N,
read result on frame N+2). N.6 perf polish item.
- [x] **`drawsIssued` ≤ 5 per pass (CPU GL calls)** — exactly 2 indirect
calls per frame regardless of scene size.
- [x] **All tests green** — 70/70 in
`FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition`.
8 pre-existing failures in `MotionInterpreter` / `BSPStepUp` /
`PositionManager` / `PlayerMovementController` / `Dispatcher` are
carry-forward from before N.5 and unrelated to rendering.
- [ ] **`ACDREAM_USE_WB_FOUNDATION=0` still works** — to be verified at
Task 14 (legacy escape hatch check).
## Visual verification (Task 14)
- [x] **Holtburg courtyard** — PASS at Task 10 USER GATE.
- [ ] **Foundry interior / dense static-object scene** — TODO Task 14.
- [ ] **Indoor → outdoor cell transition** — TODO Task 14.
- [ ] **Drudge / character close-up (Issue #47 close-detail mesh)** — TODO Task 14.
- [ ] **Magic content (Decision 2 additive fallback check)** — TODO Task 14.
- [ ] **Long-session sanity** — DEFERRED (N.6 watchlist; not load-bearing for ship).
## Open follow-ups for N.6
1. **GPU timer query double-buffering** — the current single-frame poll
pattern never sees `QueryResultAvailable=true`. Issue queryA on frame N,
queryB on frame N+1, read queryA on frame N+2. ~30 lines of state.
2. **Direct N.4 vs N.5 perf comparison** — re-run with `git checkout`ed N.4
SHIP (`c445364`) for a side-by-side measurement. Not load-bearing but
useful for N.6 ship message.
3. **Persistent-mapped buffers** — Decision 7 deferral. If profiling shows
the per-frame `glBufferData` cost is the residual hot spot, layer it on
top of the modern path.