diff --git a/docs/ISSUES.md b/docs/ISSUES.md
index 3565c30..e6d52f4 100644
--- a/docs/ISSUES.md
+++ b/docs/ISSUES.md
@@ -46,6 +46,40 @@ Copy this block when adding a new issue:
 
 # Active issues
 
+## #55 — Static-entity slow path reports ~1.45M `meshMissing` per 5s at r4 standstill
+
+**Status:** OPEN
+**Severity:** LOW (no visible regression — affects a diagnostic counter, not rendered output)
+**Filed:** 2026-05-11
+**Component:** rendering / `WbDrawDispatcher` static-entity classification path
+
+**Description:** During the Phase N.6 slice 1 baseline measurement (`docs/plans/2026-05-11-phase-n6-perf-baseline.md` §2),
+the radius=4 standstill scenario reported `meshMissing ≈ 1,450,000` per 5-second
+`[WB-DIAG]` window. The same radius while walking drops to near-zero (`meshMissing = 0`
+in the steady state) as new landblocks stream in and previously-missing meshes resolve.
+This suggests the static-entity slow path's mesh-load lifecycle has some delay before
+populating for newly-streamed content but eventually catches up; the standstill case
+keeps re-counting the same set of entities-with-unresolved-meshes for the duration of
+the run. The counter is per-frame, so the absolute number scales with FPS — at the
+measured ~150 FPS that's ~290K reports/s, or ~1900 entities each reported each frame.
+
+**Root cause / status:** Not investigated. Hypothesis: an entity classification path
+counts mesh-missing on every frame for static entities whose `MeshRef` resolution races
+the streaming loader. The Tier 1 cache (#53) populates only for entities whose
+classification succeeded, so persistently-failing entities run the slow path every frame
+forever and bump `meshMissing` every time. If true, the fix is either (a) cache the
+"this entity's mesh genuinely doesn't exist" result so we stop re-checking, or (b)
+defer classification of the entity until its `MeshRef` resolves.
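+
+As a quick check of the per-frame figure in the description, the arithmetic can be written
+out as follows. It assumes the ~150 FPS rate held for the whole 5-second window and that each
+unresolved entity bumps the counter exactly once per frame; neither assumption has been verified.
+
+$$
+\frac{1{,}450{,}000\ \text{reports}}{5\ \text{s}} = 290{,}000\ \text{reports/s},
+\qquad
+\frac{290{,}000\ \text{reports/s}}{150\ \text{frames/s}} \approx 1{,}933\ \text{reports/frame}
+$$
+
+If an entity can be counted more than once per frame, the number of distinct entities is
+lower than the ~1900 quoted above.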
+
+**Files:** `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` (the slow path that
+increments `_meshesMissing`), `src/AcDream.App/Rendering/Wb/EntityClassificationCache.cs`
+(the Tier 1 cache — likely needs to learn about "permanently missing" entries).
+
+**Acceptance:** `meshMissing` should drop to near-zero within ~5 seconds of streaming
+settle at any radius/motion combination, not stay at ~1.45M/5s indefinitely at standstill.
+
+---
+
 ## #50 — Road-edge tree at 0xA9B1 visible in acdream but not retail
 
 **Status:** OPEN
diff --git a/docs/plans/2026-04-11-roadmap.md b/docs/plans/2026-04-11-roadmap.md
index cab23c6..91b674a 100644
--- a/docs/plans/2026-04-11-roadmap.md
+++ b/docs/plans/2026-04-11-roadmap.md
@@ -63,7 +63,7 @@
 | N.4 | Rendering pipeline foundation — adopted WB's `ObjectMeshManager` as the production mesh pipeline behind `ACDREAM_USE_WB_FOUNDATION` (default-on). `WbMeshAdapter` is the single seam (owns `ObjectMeshManager`, drains the staged-upload queue per frame, populates `AcSurfaceMetadataTable` with per-batch translucency / luminosity / fog metadata). `WbDrawDispatcher` is the production draw path: groups all visible (entity, batch) pairs, single-uploads the matrix buffer, fires one `glDrawElementsInstancedBaseVertexBaseInstance` per group with `BaseInstance` slicing into the shared instance VBO. `LandblockSpawnAdapter` + `EntitySpawnAdapter` bridge spawn lifecycle to WB ref-counts (atlas tier vs per-instance). Perf wins shipped as part of N.4: per-entity frustum cull, opaque front-to-back sort, palette-hash memoization (compute once per entity, reuse across batches). Visual verification at Holtburg passed: scenery + connected characters with full close-detail geometry (Issue #47 regression resolved). Legacy `InstancedMeshRenderer` retained as `ACDREAM_USE_WB_FOUNDATION=0` escape hatch until N.6 (retired early in N.5 ship amendment). | Live ✓ |
 | N.5 | Modern rendering path — lifted `WbDrawDispatcher` onto bindless textures (`GL_ARB_bindless_texture`) + `glMultiDrawElementsIndirect`. Per-frame entity rendering: 3 SSBO uploads (instance matrices @ binding=0, batch data @ binding=1, indirect commands) + 2 indirect draw calls (opaque + transparent). ~12-15 GL calls per frame regardless of group count, down from hundreds-of-per-group in N.4. CPU dispatcher: 1.23 ms/frame median at Holtburg courtyard (1662 groups, ~810 fps sustained). All textures on the WB modern path use 1-layer `Texture2DArray` + `sampler2DArray`. Legacy callers keep `Texture2D` / `sampler2D` via the parallel `TextureCache` path until N.6 retires them. Three gotchas captured in memory: texture target lock-in, bindless Dispose order (two-phase non-resident before delete), GL_TIME_ELAPSED double-buffering. **Ship amendment 2026-05-08:** legacy renderers (`InstancedMeshRenderer`, `StaticMeshRenderer`, `WbFoundationFlag`) retired within N.5 — modern path is mandatory; missing bindless throws `NotSupportedException` at startup. N.6 scope narrowed accordingly. Plan archived at `docs/superpowers/plans/2026-05-08-phase-n5-modern-rendering.md`. | Live ✓ |
 | N.5b | Terrain on the modern rendering path — `TerrainModernRenderer` replaces `TerrainChunkRenderer` (the latter plus `TerrainRenderer` + `terrain.vert/.frag` deleted). Single global VBO/EBO with slot allocator (one slot per landblock), per-frame `DrawElementsIndirectCommand[]` upload + `glMultiDrawElementsIndirect`, bindless atlas handles passed as `uvec2` uniforms reconstructed via `sampler2DArray(handle)`. **Path C** chosen: mirrors WB's `TerrainRenderManager` pattern but consumes `LandblockMesh.Build` so retail's `FSplitNESW` formula is preserved (closes ISSUE #51). Path A killed by 49.98% measured divergence between WB's `CalculateSplitDirection` and retail's at addr `00531d10`; Path B (fork-patch WB) rejected for permanent maintenance burden. Perf at Holtburg radius=5 (commit `da56063`): modern 6.4-7.0 µs / 9-14 µs p95 vs legacy 1.5 µs / 3.0 µs — **modern is ~4× SLOWER on CPU at radius=5** because legacy's 16×16-LB chunking collapsed visible LBs to one `glDrawElements`. Architectural wins (zero `glBindTexture`/frame, constant-cost dispatch, per-LB frustum cull) manifest at higher radius (A.5 territory). Spec acceptance criterion 5 ("≥10% lower CPU at radius=5") amended via `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`. Three gotchas captured in memory: `uniform sampler2DArray` + `glProgramUniformHandleARB` GL_INVALID_OPERATIONs on at least one driver (use `uniform uvec2` + `sampler2DArray(handle)` constructor instead — N.5's mesh_modern pattern); `MaybeFlushTerrainDiag` median-calc underflow on first sample; visual gates need actual visual confirmation, not assent. Plan archived at `docs/superpowers/plans/2026-05-09-phase-n5b-terrain-modern.md`. | Live ✓ |
-| N.6.1 | Phase N.6 slice 1 — GPU timing fix + radius=12 perf baseline. Fixed the gpu_us double-buffering bug in `WbDrawDispatcher` (ring-of-3 query slots, read-before-overwrite, vendor-neutral across AMD/NVIDIA/Intel desktop GL). Added env-gated `ACDREAM_DUMP_SURFACES=1` one-shot surface-format histogram dump in `TextureCache` for the atlas-opportunity audit. Captured authoritative baseline at Holtburg radii 4 / 8 / 12 (standstill + walking) with the now-working `gpu_us` diagnostic; baseline doc concludes CPU dominates GPU by 30–50× at every radius and recommends C.1.5 next then reduced-scope slice 2 (atlas + persistent-mapped buffers dropped). Baseline numbers at [docs/plans/2026-05-11-phase-n6-perf-baseline.md](2026-05-11-phase-n6-perf-baseline.md). Plan archived at `docs/superpowers/plans/2026-05-11-phase-n6-slice1.md`. | Live ✓ |
+| N.6 slice 1 | GPU timing fix + radius=12 perf baseline. Fixed the gpu_us double-buffering bug in `WbDrawDispatcher` (ring-of-3 query slots, read-before-overwrite, vendor-neutral across AMD/NVIDIA/Intel desktop GL). Added env-gated `ACDREAM_DUMP_SURFACES=1` one-shot surface-format histogram dump in `TextureCache` for the atlas-opportunity audit. Captured authoritative baseline at Holtburg radii 4 / 8 / 12 (standstill + walking) with the now-working `gpu_us` diagnostic; baseline doc concludes CPU dominates GPU by 30–50× at every radius and recommends C.1.5 next then reduced-scope slice 2 (atlas + persistent-mapped buffers dropped). Baseline numbers at [docs/plans/2026-05-11-phase-n6-perf-baseline.md](2026-05-11-phase-n6-perf-baseline.md). Plan archived at `docs/superpowers/plans/2026-05-11-phase-n6-slice1.md`. | Live ✓ |
 
 Plus polish that doesn't get its own phase number:
 - FlyCamera default speed lowered + Shift-to-boost
diff --git a/docs/plans/2026-05-11-phase-n6-perf-baseline.md b/docs/plans/2026-05-11-phase-n6-perf-baseline.md
index ba5f0a1..93870e7 100644
--- a/docs/plans/2026-05-11-phase-n6-perf-baseline.md
+++ b/docs/plans/2026-05-11-phase-n6-perf-baseline.md
@@ -165,14 +165,29 @@ default N₁=4; worth revisiting if a future quality preset wants N₁=8 as defa
 cpu_us median > 5,000 µs or gpu_us p95 > 8,000 µs, re-open the escalation question. Otherwise, hold the C.1.5 → reduced-slice-2 sequence.
 
-## §5. Raw logs
+## §5. Reproducing the measurements
 
-Scratch logs from this measurement run (not committed; can be deleted once the doc is
-reviewed):
+Raw `[WB-DIAG]` output from each run was inspected live during measurement and the
+median of the last three steady-state lines from each scenario was transcribed into §2.
+The raw launch logs were not preserved — the captured medians in §2 are the canonical
+record. To reproduce on the same hardware:
 
-- `baseline-r4-stand.log`, `baseline-r4-walk.log`
-- `baseline-r8-stand.log`, `baseline-r8-walk.log`
-- `baseline-r12-stand.log`, `baseline-r12-walk.log`
-- `baseline-surfaces.log` (launch log for `ACDREAM_DUMP_SURFACES=1` run)
-- `baseline-surfaces.txt` (copy of `%LOCALAPPDATA%\acdream\n6-surfaces.txt`)
-- `task1-verify.log` (Task 1 manual verification log)
+```powershell
+$env:ACDREAM_DAT_DIR = "$env:USERPROFILE\Documents\Asheron's Call"
+$env:ACDREAM_LIVE = "1"
+$env:ACDREAM_TEST_HOST = "127.0.0.1"
+$env:ACDREAM_TEST_PORT = "9000"
+$env:ACDREAM_TEST_USER = "testaccount"
+$env:ACDREAM_TEST_PASS = "testpassword"
+$env:ACDREAM_WB_DIAG = "1"
+$env:ACDREAM_STREAM_RADIUS = "4" # or 8, 12
+dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath "baseline.log"
+```
+
+Stand still for ~30 s at the target radius (60 s at radius 12 to let streaming settle),
+or walk N→E→S→W across one landblock. Then `Select-String -Path baseline.log -Pattern
+"\[WB-DIAG\]" | Select-Object -Last 3` captures the steady-state numbers.
+
+For the surface histogram, also set `$env:ACDREAM_DUMP_SURFACES = "1"`, stay in-world
+~30 s after streaming has loaded ≥100 textures (the cache-size gate), then read
+`$env:LOCALAPPDATA\acdream\n6-surfaces.txt`.
diff --git a/docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md b/docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md
index 3c35307..ec80af2 100644
--- a/docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md
+++ b/docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md
@@ -196,7 +196,7 @@ Plus rollups at the end:
 
 ### Cost when off
 
-Zero — gated by the env-var check. The dump method is only called from a guarded `if` in `GameWindow.cs`.
+Negligible — one `Dictionary` write per `UploadRgba8`/`UploadRgba8AsLayer1Array` call (the `_uploadMetadata` insertion is unconditional so the dump path doesn't have to query GL state when it does fire). At Holtburg with 760 textures that's ~30–50 KB of process memory and one hash-table write per upload — invisible at runtime, no GC pressure. The expensive work (file I/O, histogram construction) is gated by the env-var check inside `TickSurfaceHistogramDumpIfEnabled` and only runs when `ACDREAM_DUMP_SURFACES=1`.
 
 ---