docs(perf #N6.1): apply final-review fixes — spec, baseline doc, issue #55
Final code review of slice 1 flagged one Important issue (the spec's "zero cost when off" claim for the surface-dump path is technically violated: `_uploadMetadata` always writes one dict entry per upload regardless of the env var) plus minor doc/consistency gaps. Applied:

1. Spec §5 "Cost when off": dropped the "Zero" claim; replaced with "Negligible — one Dictionary write per upload (~30-50 KB at Holtburg) plus a hash-table write per upload. Expensive work (file I/O, histogram construction) is still env-gated." This matches reality.
2. Baseline doc §5: rewrote "Raw logs (scratch, can be deleted)", which referenced files that were never preserved in this worktree, as "Reproducing the measurements" with the actual PowerShell launch commands. The section is now honest about the raw logs not being kept; the captured medians in section 2 are the canonical record.
3. Filed new issue #55 in docs/ISSUES.md: the static-entity slow path reports ~1.45M meshMissing/5s at r4 standstill, dropping to ~0 when walking. LOW severity (no visible regression); the hypothesis points at a "permanently-missing entity gets re-classified every frame" pattern that the Tier 1 cache doesn't cover.
4. Roadmap shipped table: renamed the "N.6.1" row to "N.6 slice 1" to match every other artifact's naming (search-discoverability fix).

None of these change the slice's conclusion or next-phase recommendation (C.1.5 first, then reduced-scope slice 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 76ca3ffca8
commit 41981c4d74
4 changed files with 60 additions and 11 deletions
@@ -46,6 +46,40 @@ Copy this block when adding a new issue:
# Active issues

## #55 — Static-entity slow path reports ~1.45M `meshMissing` per 5s at r4 standstill

**Status:** OPEN
**Severity:** LOW (no visible regression; affects a diagnostic counter, not rendered output)
**Filed:** 2026-05-11
**Component:** rendering / `WbDrawDispatcher` static-entity classification path

**Description:** During the Phase N.6 slice 1 baseline measurement (`docs/plans/2026-05-11-phase-n6-perf-baseline.md` §2),
the radius=4 standstill scenario reported `meshMissing ≈ 1,450,000` per 5-second
`[WB-DIAG]` window. The same scenario while walking drops to near zero (`meshMissing = 0`
in the steady state) as new landblocks stream in and previously missing meshes resolve.
This suggests the static-entity slow path's mesh-load lifecycle lags behind newly
streamed content but eventually catches up, whereas the standstill case keeps re-counting
the same set of entities with unresolved meshes for the whole run. The counter is
incremented every frame, so the absolute number scales with FPS: at the measured ~150 FPS
that is ~290K reports/s, or ~1,900 entities reported on every frame.
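The FPS scaling can be sanity-checked with a few lines of arithmetic, written in Python purely for illustration (the renderer itself is C#). The constants are the figures quoted above; the model of one increment per unresolved entity per frame is the issue's working assumption, not a measured fact. The result (~1,933) is where the "~1,900 entities" figure comes from, and the last line shows why the raw counter can't be compared across runs at different frame rates.

```python
# Back-of-envelope check of the #55 figures. Constants come from the issue text;
# "one increment per unresolved entity per frame" is an assumption.
MESH_MISSING_PER_WINDOW = 1_450_000  # meshMissing per [WB-DIAG] window, r4 standstill
WINDOW_SECONDS = 5                   # length of one [WB-DIAG] window
MEASURED_FPS = 150                   # approximate frame rate during the run


def reports_per_second(per_window: int, window_s: int) -> float:
    """Convert a windowed counter into a per-second rate."""
    return per_window / window_s


def entities_per_frame(rate_per_s: float, fps: float) -> float:
    """Entities counted each frame, if each one bumps the counter once per frame."""
    return rate_per_s / fps


def window_count(entities: float, fps: float, window_s: int) -> float:
    """Counter value the same entity set would produce at another frame rate."""
    return entities * fps * window_s


rate = reports_per_second(MESH_MISSING_PER_WINDOW, WINDOW_SECONDS)
entities = entities_per_frame(rate, MEASURED_FPS)
print(f"{rate:,.0f} reports/s, ~{entities:,.0f} entities per frame")
# → 290,000 reports/s, ~1,933 entities per frame
print(f"same entities at 60 FPS: {window_count(entities, 60, WINDOW_SECONDS):,.0f} per window")
# → same entities at 60 FPS: 580,000 per window
```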

**Root cause / status:** Not investigated. Hypothesis: an entity classification path
counts mesh-missing on every frame for static entities whose `MeshRef` resolution races
the streaming loader. The Tier 1 cache (#53) is populated only for entities whose
classification succeeded, so persistently failing entities take the slow path every frame
indefinitely and bump `meshMissing` each time. If true, the fix is either (a) cache the
"this entity's mesh genuinely doesn't exist" result so we stop re-checking, or (b)
defer classification of the entity until its `MeshRef` resolves.
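Option (a) amounts to adding a negative cache next to the positive one. The Python sketch below is illustrative only: `ClassificationCache`, `classify`, `invalidate`, and the `resolve` callback are hypothetical names loosely modeled on the Tier 1 `EntityClassificationCache`, not its real API. The `invalidate` hook matters because a pure negative cache would also hide meshes that arrive late, which is exactly the race the hypothesis describes; wiring it to the streaming loader is effectively option (b).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

# Hypothetical slow-path lookup: returns a classification, or None when the
# entity's mesh cannot be resolved.
Resolver = Callable[[int], Optional[str]]


@dataclass
class ClassificationCache:
    """Positive + negative cache; a hypothetical stand-in, not the real C# type."""
    resolved: Dict[int, str] = field(default_factory=dict)  # entity id -> class
    missing: Set[int] = field(default_factory=set)          # ids with no mesh
    mesh_missing: int = 0                                   # diagnostic counter

    def classify(self, entity_id: int, resolve: Resolver) -> Optional[str]:
        if entity_id in self.resolved:      # positive hit: skip the slow path
            return self.resolved[entity_id]
        if entity_id in self.missing:       # negative hit: skip it, don't recount
            return None
        result = resolve(entity_id)         # slow path, at most once per entity
        if result is None:
            self.mesh_missing += 1
            self.missing.add(entity_id)
        else:
            self.resolved[entity_id] = result
        return result

    def invalidate(self, entity_id: int) -> None:
        """Forget a negative entry, e.g. when streaming later delivers the mesh."""
        self.missing.discard(entity_id)


# 150 frames with one resolvable entity (1) and one that never resolves (2).
cache = ClassificationCache()
lookup: Resolver = {1: "static"}.get
for _frame in range(150):
    for eid in (1, 2):
        cache.classify(eid, lookup)
print(cache.mesh_missing)  # → 1 (an uncached slow path would have counted 150)
```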

**Files:** `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` (the slow path that
increments `_meshesMissing`), `src/AcDream.App/Rendering/Wb/EntityClassificationCache.cs`
(the Tier 1 cache, which likely needs to learn about "permanently missing" entries).

**Acceptance:** `meshMissing` should drop to near zero within ~5 seconds of streaming
settling, at every radius/motion combination, instead of staying at ~1.45M/5s indefinitely at standstill.
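The acceptance criterion can be phrased as a predicate over successive `[WB-DIAG]` windows. This is a hypothetical Python helper, not part of the codebase: the issue does not define "near zero", so `near_zero` is an assumed threshold, and one 5-second window of grace stands in for "within ~5 seconds of streaming settling". The second trace is synthetic test input, not a measurement.

```python
from typing import Sequence


def meets_acceptance(windows: Sequence[int], settle_index: int,
                     near_zero: int = 100, grace_windows: int = 1) -> bool:
    """Check #55's acceptance criterion against per-window meshMissing counts.

    windows[i]    -- meshMissing reported by the i-th 5 s [WB-DIAG] window
    settle_index  -- index of the window in which streaming settled
    near_zero     -- assumed threshold; the issue leaves "near zero" undefined
    grace_windows -- windows allowed to drain after settling (1 window ~ 5 s)
    """
    tail = list(windows)[settle_index + grace_windows:]
    # Require at least one post-grace window so an empty tail can't pass vacuously.
    return len(tail) > 0 and all(count <= near_zero for count in tail)


standstill_r4 = [1_450_000] * 6        # #55's reported per-window level, held flat
settled_trace = [500_000, 0, 0, 0]     # synthetic trace, not a measurement
print(meets_acceptance(standstill_r4, settle_index=2))  # → False
print(meets_acceptance(settled_trace, settle_index=0))  # → True
```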

---

## #50 — Road-edge tree at 0xA9B1 visible in acdream but not retail

**Status:** OPEN

@@ -63,7 +63,7 @@
| N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6 (retired early in N.5 ship amendment). | Live ✓ |
| N.5 | Modern rendering path — lifted `WbDrawDispatcher` onto bindless textures (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch data @ binding=1, indirect commands) + 2 indirect draw calls (opaque + transparent). ~12-15 GL calls per frame regardless of group count, down from hundreds of per-group draw calls in N.4. CPU dispatcher: 1.23 ms/frame median at Holtburg courtyard (1662 groups, ~810 fps sustained). All textures on the WB modern path use 1-layer `Texture2DArray` + `sampler2DArray`. Legacy callers keep `Texture2D` / `sampler2D` via the parallel `TextureCache` path until N.6 retires them. Three gotchas captured in memory: texture target lock-in, bindless Dispose order (two-phase non-resident before delete), GL_TIME_ELAPSED double-buffering. **Ship amendment 2026-05-08:** legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, `WbFoundationFlag`) retired within N.5 — modern path is mandatory; missing bindless throws `NotSupportedException` at startup. N.6 scope narrowed accordingly. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. | Live ✓ |
| N.5b | Terrain on the modern rendering path — `TerrainModernRenderer` replaces `TerrainChunkRenderer` (the latter plus `TerrainRenderer` + `terrain.vert/.frag` deleted). Single global VBO/EBO with slot allocator (one slot per landblock), per-frame `DrawElementsIndirectCommand[]` upload + `glMultiDrawElementsIndirect`, bindless atlas handles passed as `uvec2` uniforms reconstructed via `sampler2DArray(handle)`. **Path C** chosen: mirrors WB's `TerrainRenderManager` pattern but consumes `LandblockMesh.Build` so retail's `FSplitNESW` formula is preserved (closes ISSUE #51). Path A killed by 49.98% measured divergence between WB's `CalculateSplitDirection` and retail's at addr `00531d10`; Path B (fork-patch WB) rejected for permanent maintenance burden. Perf at Holtburg radius=5 (commit `da56063`): modern 6.4-7.0 µs / 9-14 µs p95 vs legacy 1.5 µs / 3.0 µs — **modern is ~4× SLOWER on CPU at radius=5** because legacy's 16×16-LB chunking collapsed visible LBs to one `glDrawElements`. Architectural wins (zero `glBindTexture`/frame, constant-cost dispatch, per-LB frustum cull) manifest at higher radius (A.5 territory). Spec acceptance criterion 5 ("≥10% lower CPU at radius=5") amended via `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`. Three gotchas captured in memory: `uniform sampler2DArray` + `glProgramUniformHandleARB` GL_INVALID_OPERATIONs on at least one driver (use `uniform uvec2` + `sampler2DArray(handle)` constructor instead — N.5's mesh_modern pattern); `MaybeFlushTerrainDiag` median-calc underflow on first sample; visual gates need actual visual confirmation, not assent. Plan archived at `docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`. | Live ✓ |
| N.6.1 | Phase N.6 slice 1 — GPU timing fix + radius=12 perf baseline. Fixed the gpu_us double-buffering bug in `WbDrawDispatcher` (ring-of-3 query slots, read-before-overwrite, vendor-neutral across AMD/NVIDIA/Intel desktop GL). Added env-gated `ACDREAM_DUMP_SURFACES=1` one-shot surface-format histogram dump in `TextureCache` for the atlas-opportunity audit. Captured authoritative baseline at Holtburg radii 4 / 8 / 12 (standstill + walking) with the now-working `gpu_us` diagnostic; baseline doc concludes CPU dominates GPU by 30–50× at every radius and recommends C.1.5 next then reduced-scope slice 2 (atlas + persistent-mapped buffers dropped). Baseline numbers at [docs/plans/2026-05-11-phase-n6-perf-baseline.md](2026-05-11-phase-n6-perf-baseline.md). Plan archived at `docs/superpowers/plans/2026-05-11-phase-n6-slice1.md`. | Live ✓ |
| N.6 slice 1 | GPU timing fix + radius=12 perf baseline. Fixed the gpu_us double-buffering bug in `WbDrawDispatcher` (ring-of-3 query slots, read-before-overwrite, vendor-neutral across AMD/NVIDIA/Intel desktop GL). Added env-gated `ACDREAM_DUMP_SURFACES=1` one-shot surface-format histogram dump in `TextureCache` for the atlas-opportunity audit. Captured authoritative baseline at Holtburg radii 4 / 8 / 12 (standstill + walking) with the now-working `gpu_us` diagnostic; baseline doc concludes CPU dominates GPU by 30–50× at every radius and recommends C.1.5 next then reduced-scope slice 2 (atlas + persistent-mapped buffers dropped). Baseline numbers at [docs/plans/2026-05-11-phase-n6-perf-baseline.md](2026-05-11-phase-n6-perf-baseline.md). Plan archived at `docs/superpowers/plans/2026-05-11-phase-n6-slice1.md`. | Live ✓ |
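The ring-of-3 fix follows a common pattern for GPU timer queries: a result arrives some frames after its query is issued, so each frame first reads the slot it is about to reuse and only then overwrites it. The Python model below makes no GL calls; `FakeTimerQuery`, `GpuTimerRing`, and the fixed `latency` (frames until a result is available) are hypothetical simplifications, and the sketch does not claim to reproduce the exact failure mode of the original double-buffering bug. With the assumed 3-frame latency, two slots never have a result ready when they are reused, while three always do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeTimerQuery:
    """Stand-in for one GL_TIME_ELAPSED query object (no GL involved)."""
    issued_frame: int = -1     # -1 means the slot has never been used
    gpu_us: float = 0.0        # value the "driver" will report for it


class GpuTimerRing:
    """N query slots reused round-robin, with read-before-overwrite."""

    def __init__(self, slots: int, latency: int) -> None:
        self.slots = [FakeTimerQuery() for _ in range(slots)]
        self.latency = latency                 # frames until a result is ready
        self.readings: List[float] = []
        self.dropped = 0                       # reads attempted before ready

    def frame(self, now: int, measured_gpu_us: float) -> Optional[float]:
        slot = self.slots[now % len(self.slots)]
        value: Optional[float] = None
        if slot.issued_frame >= 0:             # read the old result first...
            if now - slot.issued_frame >= self.latency:
                value = slot.gpu_us
                self.readings.append(value)
            else:
                self.dropped += 1              # a real driver would stall or lose it
        slot.issued_frame = now                # ...then reuse the slot this frame
        slot.gpu_us = measured_gpu_us
        return value


def run(slots: int, latency: int, frames: int = 10) -> GpuTimerRing:
    ring = GpuTimerRing(slots, latency)
    for f in range(frames):
        ring.frame(f, 100.0 + f)               # synthetic per-frame GPU times
    return ring


for slots in (2, 3):
    r = run(slots, latency=3)
    print(slots, "slots:", len(r.readings), "read,", r.dropped, "dropped")
# → 2 slots: 0 read, 8 dropped
# → 3 slots: 7 read, 0 dropped
```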

Plus polish that doesn't get its own phase number:
- FlyCamera default speed lowered + Shift-to-boost

@@ -165,14 +165,29 @@ default N₁=4; worth revisiting if a future quality preset wants N₁=8 as defa
cpu_us median > 5,000 µs or gpu_us p95 > 8,000 µs, re-open the escalation question.
Otherwise, hold the C.1.5 → reduced-slice-2 sequence.

## §5. Raw logs
## §5. Reproducing the measurements

Scratch logs from this measurement run (not committed; can be deleted once the doc is
reviewed):
Raw `[WB-DIAG]` output from each run was inspected live during measurement, and the
median of the last three steady-state lines from each scenario was transcribed into §2.
The raw launch logs were not preserved; the captured medians in §2 are the canonical
record. To reproduce on the same hardware:

- `baseline-r4-stand.log`, `baseline-r4-walk.log`
- `baseline-r8-stand.log`, `baseline-r8-walk.log`
- `baseline-r12-stand.log`, `baseline-r12-walk.log`
- `baseline-surfaces.log` (launch log for `ACDREAM_DUMP_SURFACES=1` run)
- `baseline-surfaces.txt` (copy of `%LOCALAPPDATA%\acdream\n6-surfaces.txt`)
- `task1-verify.log` (Task 1 manual verification log)
```powershell
$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call"
$env:ACDREAM_LIVE = "1"
$env:ACDREAM_TEST_HOST = "127.0.0.1"
$env:ACDREAM_TEST_PORT = "9000"
$env:ACDREAM_TEST_USER = "testaccount"
$env:ACDREAM_TEST_PASS = "testpassword"
$env:ACDREAM_WB_DIAG = "1"
$env:ACDREAM_STREAM_RADIUS = "4" # or 8, 12
# --no-build runs the existing Debug build; run `dotnet build -c Debug` first if sources changed.
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline.log"
```
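The six baseline scenarios (radii 4, 8, and 12, each standing and walking) differ only in `ACDREAM_STREAM_RADIUS` and the log name. A hypothetical Python helper that builds the same environment and command line as the PowerShell block without launching anything: the log names reuse the `baseline-r<radius>-<motion>.log` scheme from the old scratch-log list, and the standstill dwell times are the ~30 s / 60 s figures given in the notes for this section. To actually launch a run, one would merge `s.env(...)` into `os.environ` and pass it to `subprocess.run(ARGV, env=...)`, with `dotnet` on PATH.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Copied from the PowerShell block; ACDREAM_DAT_DIR is supplied per machine.
BASE_ENV: Dict[str, str] = {
    "ACDREAM_LIVE": "1",
    "ACDREAM_TEST_HOST": "127.0.0.1",
    "ACDREAM_TEST_PORT": "9000",
    "ACDREAM_TEST_USER": "testaccount",
    "ACDREAM_TEST_PASS": "testpassword",
    "ACDREAM_WB_DIAG": "1",
}
ARGV: List[str] = ["dotnet", "run", "--project", r"src\AcDream.App\AcDream.App.csproj",
                   "--no-build", "-c", "Debug"]


@dataclass(frozen=True)
class Scenario:
    radius: int   # value for ACDREAM_STREAM_RADIUS
    motion: str   # "stand" or "walk"

    @property
    def log_name(self) -> str:
        return f"baseline-r{self.radius}-{self.motion}.log"

    @property
    def settle_seconds(self) -> Optional[int]:
        """Standstill dwell time; walking runs cover one landblock instead."""
        if self.motion != "stand":
            return None
        return 60 if self.radius == 12 else 30

    def env(self, dat_dir: str) -> Dict[str, str]:
        return {**BASE_ENV, "ACDREAM_DAT_DIR": dat_dir,
                "ACDREAM_STREAM_RADIUS": str(self.radius)}


SCENARIOS = [Scenario(r, m) for r in (4, 8, 12) for m in ("stand", "walk")]
for s in SCENARIOS:
    print(s.log_name, s.settle_seconds)
```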

Stand still for ~30 s at the target radius (60 s at radius 12 to let streaming settle),
or walk N→E→S→W across one landblock. Then
`Select-String -Path baseline.log -Pattern "\[WB-DIAG\]" | Select-Object -Last 3`
prints the last three `[WB-DIAG]` lines; their median is the steady-state figure recorded in §2.
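The median-of-last-three reduction used for §2 can be sketched as below. Because the `[WB-DIAG]` line format isn't reproduced in this doc, the hypothetical helper starts from values already extracted from those lines rather than parsing them. Taking the median rather than the mean keeps a single stray hitch in one window from skewing the recorded number; the sample values are synthetic.

```python
from statistics import median
from typing import Sequence


def steady_state_median(samples: Sequence[float], last_n: int = 3) -> float:
    """Median of the last `last_n` per-window values (e.g. cpu_us or gpu_us)."""
    if len(samples) < last_n:
        raise ValueError(f"need at least {last_n} samples, got {len(samples)}")
    return median(list(samples)[-last_n:])


# Synthetic per-window values: the early windows still include streaming spikes.
samples = [50.0, 12.0, 10.0, 11.0, 9.0]
print(steady_state_median(samples))  # → 10.0
```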

For the surface histogram, also set `$env:ACDREAM_DUMP_SURFACES = "1"`, stay in-world
~30 s after streaming has loaded ≥100 textures (the cache-size gate), then read
`$env:LOCALAPPDATA\acdream\n6-surfaces.txt`.

@@ -196,7 +196,7 @@ Plus rollups at the end:

### Cost when off

Zero — gated by the env-var check. The dump method is only called from a guarded `if` in `GameWindow.cs`.
Negligible — one `Dictionary<uint, …>` write per `UploadRgba8`/`UploadRgba8AsLayer1Array` call (the `_uploadMetadata` insertion is unconditional, so the dump path doesn't have to query GL state when it does fire). At Holtburg with 760 textures the table occupies ~30–50 KB of process memory, and the per-upload hash-table write is invisible at runtime with no GC pressure. The expensive work (file I/O, histogram construction) is gated by the env-var check inside `TickSurfaceHistogramDumpIfEnabled` and only runs when `ACDREAM_DUMP_SURFACES=1`.
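The split between unconditional bookkeeping and the env-gated dump can be sketched as follows. This Python model is hypothetical: `SurfaceTracker`, `upload`, and `tick_dump_if_enabled` only mirror the roles of `_uploadMetadata`, `UploadRgba8`, and `TickSurfaceHistogramDumpIfEnabled`. The ≥100-texture gate and one-shot behavior come from the baseline doc and roadmap; the format names and the output layout are invented for illustration.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, Optional, Tuple

DUMP_ENV_VAR = "ACDREAM_DUMP_SURFACES"
MIN_TEXTURES = 100  # cache-size gate mentioned in the reproduction notes


class SurfaceTracker:
    """Always record cheap per-upload metadata; do expensive work only on demand."""

    def __init__(self) -> None:
        self.upload_metadata: Dict[int, Tuple[str, int, int]] = {}  # id -> (fmt, w, h)
        self.dumped = False

    def upload(self, texture_id: int, fmt: str, width: int, height: int) -> None:
        # Unconditional: one dict write per upload, so the dump never needs GL queries.
        self.upload_metadata[texture_id] = (fmt, width, height)

    def tick_dump_if_enabled(self, out_path: Path) -> Optional[Counter]:
        if os.environ.get(DUMP_ENV_VAR) != "1" or self.dumped:
            return None                  # the common "off" path: one cheap check
        if len(self.upload_metadata) < MIN_TEXTURES:
            return None                  # wait until enough textures have streamed in
        histogram = Counter(self.upload_metadata.values())
        lines = [f"{count}\t{fmt}\t{w}x{h}" for (fmt, w, h), count in histogram.most_common()]
        out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        self.dumped = True               # one-shot
        return histogram


tracker = SurfaceTracker()
for i in range(120):
    tracker.upload(i, "DXT1" if i % 3 else "A8R8G8B8", 256, 256)
out = Path(tempfile.gettempdir()) / "n6-surfaces-sketch.txt"
os.environ.pop(DUMP_ENV_VAR, None)
print(tracker.tick_dump_if_enabled(out))                   # → None (env var unset)
os.environ[DUMP_ENV_VAR] = "1"
print(sorted(tracker.tick_dump_if_enabled(out).values()))  # → [40, 80]
```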

---