acdream/docs/plans/2026-05-08-phase-n5-perf-baseline.md
Erik dcae2b6b94 phase(N.5): retirement amendment — InstancedMeshRenderer + StaticMeshRenderer + WbFoundationFlag deleted
Final cross-cutting review of N.5 found that Task 15's deletion of
mesh_instanced.vert/.frag left InstancedMeshRenderer orphaned —
ACDREAM_USE_WB_FOUNDATION=0 silently rendered terrain+sky only with
no entities. The SHIP commit's "[x] ACDREAM_USE_WB_FOUNDATION=0 still
works" claim was inaccurate.

Resolution: formal retirement of the legacy renderer path within N.5
instead of deferring to N.6.

Deleted:
- src/AcDream.App/Rendering/InstancedMeshRenderer.cs
- src/AcDream.App/Rendering/StaticMeshRenderer.cs
- src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs

GameWindow simplified — capability detection is unconditional, missing
bindless throws NotSupportedException with a clear message at startup.
WbDrawDispatcher + mesh_modern shader load are mandatory after init.
No escape hatch.

GpuWorldState simplified — WbFoundationFlag.IsEnabled guards on
AddLandblock/RemoveLandblock removed; adapter calls are unconditional
when the adapter is non-null.

PendingSpawnIntegrationTests updated — WbFoundationFlag.ForTestsOnly_ForceEnable
static ctor removed (flag is gone; adapter calls are unconditional).

The ApplyLoadedTerrain physics-data loop was also simplified: the
EnsureUploaded sub-loop that fed InstancedMeshRenderer is gone;
_pendingCellMeshes is now explicitly cleared to prevent unbounded
accumulation (the worker thread still populates it, but WB handles
EnvCell geometry through its own pipeline).

Spec §2 Decision 5 + §10 Out-of-Scope updated. Plan ship-amendment
section added. Roadmap updated (N.5 ships with retirement; N.6 scope
narrowed to perf-only). CLAUDE.md "WB integration cribs" updated.
Perf baseline doc updated. WbDrawDispatcher class summary docstring
corrected to describe the as-shipped SSBO + multi-draw-indirect path.
ISSUES.md #51 updated (terrain not in N.5 scope; deferred to N.7).

Bindless support is now a hard requirement. Modern desktop GPUs
universally expose GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters;
if a user hits the NotSupportedException, that's a real bug report
worth investigating, not a silent fallback.

Build: 0 errors, 0 warnings. Tests: 71/71 (Wb+MatrixComposition+TextureCacheBindless filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 22:01:36 +02:00

Phase N.5 perf baseline

Captured: 2026-05-08, against N.5 head (post-Task 12) on a local machine. Method: ACDREAM_WB_DIAG=1 with the character at the Holtburg spawn position, then roaming. Numbers below are 5-second window medians from [WB-DIAG].
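For readers who want to reproduce the summary statistics from raw per-frame timings, here is a minimal Python sketch of a per-window median and p95. It is an illustration only: the function name `window_stats`, the nearest-rank p95 definition, and the sample values are assumptions, and the actual [WB-DIAG] code may compute these differently.

```python
import statistics


def window_stats(samples_us):
    """Return (median, p95) for one window of per-frame timings in microseconds.

    Assumption: p95 uses the nearest-rank definition; [WB-DIAG] may differ.
    """
    if not samples_us:
        raise ValueError("window is empty")
    ordered = sorted(samples_us)
    median = statistics.median(ordered)
    # Nearest-rank p95 with integer arithmetic: rank = ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return median, p95


# Hypothetical samples, NOT taken from a real [WB-DIAG] capture.
example = [1200, 1210, 1225, 1230, 1250, 1300, 1190, 1220, 1240, 1400]
median_us, p95_us = window_stats(example)
print(median_us, p95_us)
```

With only ten samples the nearest-rank p95 is simply the largest value; a real 5-second window holds thousands of frames, so the estimate is much less sensitive to a single outlier.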

Holtburg courtyard (steady state)

| Metric | N.5 measured | N.4 (estimated*) | Gate |
|---|---|---|---|
| CPU dispatcher (median) | 1227 µs / frame | ≥2500 µs / frame | ≤70% of N.4 → PASS |
| CPU dispatcher (p95) | 1303 µs / frame | | |
| GPU rendering (median) | unmeasured (see below) | | within ±10% → DEFERRED |
| drawsIssued per 5s | 4.85M (= 1662 groups × ~580 fps) | far higher per frame | |
| drawsIssued per pass (CPU GL calls) | 2 (1 opaque + 1 transparent indirect) | ~hundreds per pass | ≤5 → PASS |
| groups (working set) | 1662 | ~similar | sanity |
| Frame rate (inferred) | ~810 fps | ~100-200 fps | substantial uplift |

*N.4 baseline was NOT measured directly in this run. The "≥2500 µs / frame" estimate assumes that N.4's per-group glBindTexture + glBindBuffer + glDrawElementsInstancedBaseVertexBaseInstance hot path costs ≥1.5 µs per group and that N.4 has ~1700 groups in this scene, which puts the GL portion alone at ~2.5 ms (1.5 µs × 1700 ≈ 2550 µs) before adding the entity-walk overhead. N.5's measurement includes ALL dispatcher work (entity walk + group bucketing + 3 SSBO uploads + 2 indirect calls + state changes) at ~1230 µs total, just under half of that lower-bound estimate.
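The arithmetic behind the estimate and the CPU gate can be checked with a short Python sketch. All inputs are the figures quoted above; the variable names are illustrative. The last two lines are an assumption about how the frame-rate figures might be derived (the doc does not state the method): one from drawsIssued per 5 s, the other as the reciprocal of the dispatcher median.

```python
# Figures quoted in this document.
N5_MEDIAN_US = 1227           # measured CPU dispatcher median, µs / frame
N4_COST_PER_GROUP_US = 1.5    # assumed lower bound per group for N.4
N4_GROUPS = 1700              # approximate N.4 group count in this scene
N4_ESTIMATE_US = 2500         # rounded lower bound used in the table
GATE_RATIO = 0.70             # spec gate: N.5 <= 70% of N.4

# GL portion of N.4 alone, before entity-walk overhead.
n4_lower_bound_us = N4_COST_PER_GROUP_US * N4_GROUPS

# N.5 as a fraction of the (estimated) N.4 dispatcher time.
ratio = N5_MEDIAN_US / N4_ESTIMATE_US
gate_passes = ratio <= GATE_RATIO

# Assumed derivations of the two frame-rate figures (not stated in the doc).
fps_from_draws = 4.85e6 / (5 * 1662)       # drawsIssued per 5 s / (5 s * groups)
fps_from_dispatcher = 1e6 / N5_MEDIAN_US   # upper bound if the dispatcher were the only cost

print(f"N.4 lower bound: {n4_lower_bound_us:.0f} µs")
print(f"N.5 / N.4 estimate: {ratio:.1%}, gate passes: {gate_passes}")
print(f"fps from draws: {fps_from_draws:.0f}, fps from dispatcher: {fps_from_dispatcher:.0f}")
```

Under these assumptions the two frame-rate figures come from different inputs (about 584 fps versus about 815 fps), so they need not agree; the dispatcher-derived number ignores all other per-frame work.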

Acceptance gates (spec §8.3)

  • Visual identity to N.4 — confirmed at Task 10 USER GATE: Holtburg courtyard renders identical, no missing entities, no z-fighting, no exploded parts.
  • CPU dispatcher time ≤ 70% of N.4 — N.5 measures 1.23 ms/frame median; estimated N.4 ≥2.5 ms/frame; comfortably under 70%.
  • GPU rendering time within ±10% of N.4 — DEFERRED. The GL_TIME_ELAPSED query polling never reports avail != 0 in our single-frame poll loop; the driver hasn't finalized the result by the time we check. The fix is double-buffering (issue queryA on frame N, read result on frame N+2). N.6 perf polish item.
  • drawsIssued ≤ 5 per pass (CPU GL calls) — exactly 2 indirect calls per frame regardless of scene size.
  • All tests green — 70/70 in FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition. 8 pre-existing failures in MotionInterpreter / BSPStepUp / PositionManager / PlayerMovementController / Dispatcher are carry-forward from before N.5 and unrelated to rendering.
  • [N/A] ACDREAM_USE_WB_FOUNDATION=0 still works — escape hatch formally retired in N.5 ship amendment. InstancedMeshRenderer, StaticMeshRenderer, and WbFoundationFlag deleted. Missing bindless throws NotSupportedException at startup with a clear error message. No fallback path.

Visual verification (Task 14)

  • Holtburg courtyard — PASS at Task 10 USER GATE.
  • Foundry interior / dense static-object scene — TODO Task 14.
  • Indoor → outdoor cell transition — TODO Task 14.
  • Drudge / character close-up (Issue #47 close-detail mesh) — TODO Task 14.
  • Magic content (Decision 2 additive fallback check) — TODO Task 14.
  • Long-session sanity — DEFERRED (N.6 watchlist; not load-bearing for ship).

Open follow-ups for N.6

  1. GPU timer query double-buffering — the current single-frame poll pattern never sees QueryResultAvailable=true. Issue queryA on frame N, queryB on frame N+1, read queryA on frame N+2. ~30 lines of state.
  2. Direct N.4 vs N.5 perf comparison — re-run with N.4 SHIP (c445364) checked out for a side-by-side measurement. Not load-bearing, but useful for the N.6 ship message.
  3. Persistent-mapped buffers — Decision 7 deferral. If profiling shows the per-frame glBufferData cost is the residual hot spot, layer it on top of the modern path.
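The double-buffering idea in follow-up 1 can be modelled without a GL context. The Python sketch below is a simulation only: `FakeQuery`, `run`, and the fixed two-frame result latency are hypothetical stand-ins, not the project's code or real driver behaviour. It shows why reusing a single query every frame never yields a result, while alternating two queries yields one result per frame after a two-frame delay.

```python
class FakeQuery:
    """Stand-in for a GL timer query; its result lands `latency` frames after issue."""

    def __init__(self, latency_frames):
        self.latency = latency_frames
        self.issued_frame = None
        self.value_us = None

    def issue(self, frame, value_us):
        # Re-issuing discards any result that was still pending.
        self.issued_frame = frame
        self.value_us = value_us

    def available(self, frame):
        return (self.issued_frame is not None
                and frame - self.issued_frame >= self.latency)


def run(frames, ring_size, latency_frames=2):
    """Return the (issued_frame, value_us) pairs that were actually read back."""
    ring = [FakeQuery(latency_frames) for _ in range(ring_size)]
    results = []
    for frame in range(frames):
        slot = ring[frame % ring_size]
        # Read the slot's previous result (if ready) before reusing it.
        if slot.available(frame):
            results.append((slot.issued_frame, slot.value_us))
        slot.issue(frame, value_us=1000 + frame)  # hypothetical GPU time
    return results


single = run(frames=6, ring_size=1)  # one query reused every frame
double = run(frames=6, ring_size=2)  # queryA / queryB alternating
print(single)
print(double)
```

In a real implementation the availability check would still be a `GL_QUERY_RESULT_AVAILABLE` poll rather than a fixed frame count, since driver latency varies; a ring larger than two is a cheap safeguard if results sometimes arrive later.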