docs(perf #N6.1): apply final-review fixes — spec, baseline doc, issue #55

Final code review of slice 1 flagged one Important issue (the spec's
"zero cost when off" claim for the surface-dump path is technically
violated: _uploadMetadata always writes one dict entry per upload
regardless of env var) plus minor doc/consistency gaps.

Applied:

1. Spec §5 "Cost when off": dropped the "Zero" claim; replaced with
   "Negligible — one Dictionary write per upload (~30-50 KB at
   Holtburg) plus a hash-table write per upload. Expensive work (file
   I/O, histogram construction) is still env-gated." This matches
   reality.

2. Baseline doc §5: rewrote from "Raw logs (scratch, can be deleted)",
   which referenced files that were never preserved in this worktree,
   to "Reproducing the measurements" with the actual PowerShell launch
   commands. Honest about the raw logs not being kept; the captured
   medians in section 2 are the canonical record.

3. New issue #55 filed in docs/ISSUES.md: static-entity slow path
   reports ~1.45M meshMissing/5s at r4 standstill, drops to ~0 when
   walking. LOW severity (no visible regression), hypothesis points at
   a "permanently-missing entity gets re-classified every frame"
   pattern that Tier 1 cache doesn't cover.

4. Roadmap shipped table: renamed "N.6.1" row to "N.6 slice 1" to match
   every other artifact's naming. Search-discoverability fix.

None of these change the slice's conclusion or next-phase recommendation
(C.1.5 first, then reduced-scope slice 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 76ca3ffca8
commit 41981c4d74
4 changed files with 60 additions and 11 deletions
@@ -63,7 +63,7 @@
| N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6 (retired early in N.5 ship amendment). | Live ✓ |
| N.5 | Modern rendering path — lifted `WbDrawDispatcher` onto bindless textures (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch data @ binding=1, indirect commands) + 2 indirect draw calls (opaque + transparent). ~12-15 GL calls per frame regardless of group count, down from hundreds-of-per-group in N.4. CPU dispatcher: 1.23 ms/frame median at Holtburg courtyard (1662 groups, ~810 fps sustained). All textures on the WB modern path use 1-layer `Texture2DArray` + `sampler2DArray`. Legacy callers keep `Texture2D` / `sampler2D` via the parallel `TextureCache` path until N.6 retires them. Three gotchas captured in memory: texture target lock-in, bindless Dispose order (two-phase non-resident before delete), GL_TIME_ELAPSED double-buffering. **Ship amendment 2026-05-08:** legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, `WbFoundationFlag`) retired within N.5 — modern path is mandatory; missing bindless throws `NotSupportedException` at startup. N.6 scope narrowed accordingly. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. | Live ✓ |
| N.5b | Terrain on the modern rendering path — `TerrainModernRenderer` replaces `TerrainChunkRenderer` (the latter plus `TerrainRenderer` + `terrain.vert/.frag` deleted). Single global VBO/EBO with slot allocator (one slot per landblock), per-frame `DrawElementsIndirectCommand[]` upload + `glMultiDrawElementsIndirect`, bindless atlas handles passed as `uvec2` uniforms reconstructed via `sampler2DArray(handle)`. **Path C** chosen: mirrors WB's `TerrainRenderManager` pattern but consumes `LandblockMesh.Build` so retail's `FSplitNESW` formula is preserved (closes ISSUE #51). Path A killed by 49.98% measured divergence between WB's `CalculateSplitDirection` and retail's at addr `00531d10`; Path B (fork-patch WB) rejected for permanent maintenance burden. Perf at Holtburg radius=5 (commit `da56063`): modern 6.4-7.0 µs / 9-14 µs p95 vs legacy 1.5 µs / 3.0 µs — **modern is ~4× SLOWER on CPU at radius=5** because legacy's 16×16-LB chunking collapsed visible LBs to one `glDrawElements`. Architectural wins (zero `glBindTexture`/frame, constant-cost dispatch, per-LB frustum cull) manifest at higher radius (A.5 territory). Spec acceptance criterion 5 ("≥10% lower CPU at radius=5") amended via `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`. Three gotchas captured in memory: `uniform sampler2DArray` + `glProgramUniformHandleARB` GL_INVALID_OPERATIONs on at least one driver (use `uniform uvec2` + `sampler2DArray(handle)` constructor instead — N.5's mesh_modern pattern); `MaybeFlushTerrainDiag` median-calc underflow on first sample; visual gates need actual visual confirmation, not assent. Plan archived at `docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`. | Live ✓ |
| N.6.1 | Phase N.6 slice 1 — GPU timing fix + radius=12 perf baseline. Fixed the gpu_us double-buffering bug in `WbDrawDispatcher` (ring-of-3 query slots, read-before-overwrite, vendor-neutral across AMD/NVIDIA/Intel desktop GL). Added env-gated `ACDREAM_DUMP_SURFACES=1` one-shot surface-format histogram dump in `TextureCache` for the atlas-opportunity audit. Captured authoritative baseline at Holtburg radii 4 / 8 / 12 (standstill + walking) with the now-working `gpu_us` diagnostic; baseline doc concludes CPU dominates GPU by 30–50× at every radius and recommends C.1.5 next then reduced-scope slice 2 (atlas + persistent-mapped buffers dropped). Baseline numbers at [docs/plans/2026-05-11-phase-n6-perf-baseline.md](2026-05-11-phase-n6-perf-baseline.md). Plan archived at `docs/superpowers/plans/2026-05-11-phase-n6-slice1.md`. | Live ✓ |
| N.6 slice 1 | GPU timing fix + radius=12 perf baseline. Fixed the gpu_us double-buffering bug in `WbDrawDispatcher` (ring-of-3 query slots, read-before-overwrite, vendor-neutral across AMD/NVIDIA/Intel desktop GL). Added env-gated `ACDREAM_DUMP_SURFACES=1` one-shot surface-format histogram dump in `TextureCache` for the atlas-opportunity audit. Captured authoritative baseline at Holtburg radii 4 / 8 / 12 (standstill + walking) with the now-working `gpu_us` diagnostic; baseline doc concludes CPU dominates GPU by 30–50× at every radius and recommends C.1.5 next then reduced-scope slice 2 (atlas + persistent-mapped buffers dropped). Baseline numbers at [docs/plans/2026-05-11-phase-n6-perf-baseline.md](2026-05-11-phase-n6-perf-baseline.md). Plan archived at `docs/superpowers/plans/2026-05-11-phase-n6-slice1.md`. | Live ✓ |

Plus polish that doesn't get its own phase number:
- FlyCamera default speed lowered + Shift-to-boost

@@ -165,14 +165,29 @@ default N₁=4; worth revisiting if a future quality preset wants N₁=8 as defa
cpu_us median > 5,000 µs or gpu_us p95 > 8,000 µs, re-open the escalation question.
Otherwise, hold the C.1.5 → reduced-slice-2 sequence.
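
The decision rule above can be written down as a small check. This is a minimal sketch in Python (used here only for illustration; it is not part of the project), and the function and constant names are hypothetical. Only the two thresholds come from the text, and the start of the sentence that introduces them is cut off by the hunk context, so the exact trigger condition is an assumption.

```python
# Thresholds quoted in the text above, in microseconds.
CPU_US_MEDIAN_LIMIT = 5_000
GPU_US_P95_LIMIT = 8_000


def should_reopen_escalation(cpu_us_median, gpu_us_p95):
    """Return True when either measured value exceeds its threshold.

    Hypothetical helper: the names are illustrative, not project API.
    """
    return cpu_us_median > CPU_US_MEDIAN_LIMIT or gpu_us_p95 > GPU_US_P95_LIMIT


# Made-up inputs, not measurements from the baseline doc.
print(should_reopen_escalation(1_250, 40))  # False
print(should_reopen_escalation(5_200, 40))  # True
```

Both comparisons are strict, so a value exactly at a threshold keeps the C.1.5 → reduced-slice-2 sequence in place.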

## §5. Raw logs
## §5. Reproducing the measurements

Scratch logs from this measurement run (not committed; can be deleted once the doc is
reviewed):
Raw `[WB-DIAG]` output from each run was inspected live during measurement and the
median of the last three steady-state lines from each scenario was transcribed into §2.
The raw launch logs were not preserved — the captured medians in §2 are the canonical
record. To reproduce on the same hardware:

- `baseline-r4-stand.log`, `baseline-r4-walk.log`
- `baseline-r8-stand.log`, `baseline-r8-walk.log`
- `baseline-r12-stand.log`, `baseline-r12-walk.log`
- `baseline-surfaces.log` (launch log for `ACDREAM_DUMP_SURFACES=1` run)
- `baseline-surfaces.txt` (copy of `%LOCALAPPDATA%\acdream\n6-surfaces.txt`)
- `task1-verify.log` (Task 1 manual verification log)
```powershell
$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call"
$env:ACDREAM_LIVE = "1"
$env:ACDREAM_TEST_HOST = "127.0.0.1"
$env:ACDREAM_TEST_PORT = "9000"
$env:ACDREAM_TEST_USER = "testaccount"
$env:ACDREAM_TEST_PASS = "testpassword"
$env:ACDREAM_WB_DIAG = "1"
$env:ACDREAM_STREAM_RADIUS = "4" # or 8, 12
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline.log"
```

Stand still for ~30 s at the target radius (60 s at radius 12 to let streaming settle),
or walk N→E→S→W across one landblock. Then `Select-String -Path baseline.log -Pattern
"\[WB-DIAG\]" | Select-Object -Last 3` captures the steady-state numbers.
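
The "median of the last three steady-state lines" step can also be scripted instead of read off by eye. The following is a minimal sketch in Python (used only for illustration), assuming each `[WB-DIAG]` line carries `key=value` numeric fields such as `cpu_us=...` and `gpu_us=...`. That line shape is an assumption, as are the helper name `steady_state_medians` and the sample values, which are made up and are not measurements.

```python
import re
from statistics import median

# Assumed field shape: name=number, e.g. "cpu_us=1250". Not confirmed by the source.
FIELD = re.compile(r"(\w+)=(\d+(?:\.\d+)?)")


def steady_state_medians(log_text, last_n=3):
    """Median of each numeric field over the last `last_n` [WB-DIAG] lines."""
    diag_lines = [ln for ln in log_text.splitlines() if "[WB-DIAG]" in ln]
    samples = {}
    for ln in diag_lines[-last_n:]:
        for key, value in FIELD.findall(ln):
            samples.setdefault(key, []).append(float(value))
    return {key: median(values) for key, values in samples.items()}


# Made-up log excerpt in the assumed format.
sample_log = "\n".join([
    "startup noise",
    "[WB-DIAG] cpu_us=9000 gpu_us=300",
    "[WB-DIAG] cpu_us=1200 gpu_us=40",
    "[WB-DIAG] cpu_us=1300 gpu_us=38",
    "[WB-DIAG] cpu_us=1250 gpu_us=42",
])
print(steady_state_medians(sample_log))  # {'cpu_us': 1250.0, 'gpu_us': 40.0}
```

Only the last three matching lines contribute, so the early warm-up sample in the excerpt does not affect the result.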

For the surface histogram, also set `$env:ACDREAM_DUMP_SURFACES = "1"`, stay in-world
~30 s after streaming has loaded ≥100 textures (the cache-size gate), then read
`$env:LOCALAPPDATA\acdream\n6-surfaces.txt`.