Commit graph

3 commits

Author SHA1 Message Date
Erik
41981c4d74 docs(perf #N6.1): apply final-review fixes — spec, baseline doc, issue #55
Final code review of slice 1 flagged one Important issue (the spec's
"zero cost when off" claim for the surface-dump path is technically
violated — _uploadMetadata always writes one dict entry per upload
regardless of env var) plus minor doc/consistency gaps. Applied:

1. Spec §5 "Cost when off": dropped the "Zero" claim; replaced with
   "Negligible — one Dictionary write per upload (~30-50 KB at Holtburg)
   plus a hash-table write per upload. Expensive work (file I/O,
   histogram construction) is still env-gated." This matches reality.

2. Baseline doc §5: rewrote from "Raw logs (scratch, can be deleted)"
   referencing files that were never preserved in this worktree, to
   "Reproducing the measurements" with the actual PowerShell launch
   commands. Honest about the raw logs not being kept; the captured
   medians in section 2 are the canonical record.

3. New issue #55 filed in docs/ISSUES.md — static-entity slow path
   reports ~1.45M meshMissing/5s at r4 standstill, drops to ~0 when
   walking. LOW severity (no visible regression), hypothesis points
   at a "permanently-missing entity gets re-classified every frame"
   pattern that Tier 1 cache doesn't cover.

4. Roadmap shipped table: renamed "N.6.1" row to "N.6 slice 1" to
   match every other artifact's naming. Search-discoverability fix.

None of these change the slice's conclusion or next-phase
recommendation (C.1.5 first, then reduced-scope slice 2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 12:51:10 +02:00
Erik
76ca3ffca8 docs(perf #N6.1): apply code-quality review fixes to baseline doc
Code-quality review on commit 13abf96 flagged 3 Important issues in
the baseline document plus 2 minor roadmap consistency gaps. Applied
all of them:

1. The "CPU scales superlinearly with N₁" claim was imprecise because
   CPU growth (4.0×) is actually sublinear vs near-LB count (7.7×).
   Clarified: CPU grows more than linearly with radius N₁ but
   sublinearly with visible-LB count; frustum cull discards most far
   LBs early. The outer per-LB walk still scales with N₁, which is
   what Tier 2's persistent groups address.

2. The "40-50% memory footprint reduction from atlas packing" estimate
   was asserted without derivation and likely too optimistic given all
   surfaces are already power-of-two and same-format (RGBA8). Replaced
   with a more honest bound: "low-MB to ~10 MB absolute saving" with
   explicit per-array metadata overhead reasoning. Conclusion is
   unchanged — atlas adoption still isn't justified given GPU
   under-utilization.

3. The "spec §6 threshold for atlas is >30%" citation pointed at text
   that doesn't exist in the spec. Replaced with "A conventional
   rule-of-thumb" so a future reader doesn't chase a phantom citation.

Plus roadmap consistency:

M1: The N.6 slice 1 bullet now uses the canonical "✓ SHIPPED — Title.
   Shipped YYYY-MM-DD." prefix that every other shipped phase uses.
M2: Added N.6.1 row to the shipped table at the top of the roadmap
   (lines ~55-66) so the at-a-glance shipped list is complete.

None of these change the conclusion or the next-phase recommendation
(C.1.5 first, then reduced N.6 slice 2). The fixes improve doc accuracy
and future-readability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 12:43:35 +02:00
Erik
13abf96a5e docs(perf): Phase N.6 slice 1 — radius=12 baseline + surface dump path
Capture authoritative CPU+GPU dispatch numbers at Holtburg with the
gpu_us diagnostic now working (commit 25cb147). Three radii (4/8/12)
x two motion modes (standstill/walking) + a surface-format histogram
from ACDREAM_DUMP_SURFACES=1.

Adds env-gated one-shot dump path (TextureCache.TickSurfaceHistogramDumpIfEnabled,
called from GameWindow.OnRender) that fires once after both (a) frame
600 of the session AND (b) the upload-metadata dict reaches 100 entries
-- the cache-size gate prevents the dump from firing during pre-world
GUI ticks where OnRender spins at high rates but no scenery has streamed.
Output writes to %LOCALAPPDATA%\acdream\n6-surfaces.txt with a try/catch
around the I/O so disk-full / permission errors don't crash mid-measurement.

Baseline document at docs/plans/2026-05-11-phase-n6-perf-baseline.md
documents:
- CPU dominates GPU by 30-50x at every radius (strongly CPU-bound)
- GPU wildly under-utilized (max gpu_us p95 ~600us vs 16,600us frame budget)
- CPU scales superlinearly with N1 (Tier 1 cache wins on inner loop but
  not outer LB walk)
- Surface atlas opportunity high (59% of textures in top-3 triples) but
  win is memory-only since GPU isn't bottlenecked

Recommendation: C.1.5 (PES emitter wiring) next, then a reduced-scope
N.6 slice 2 (drop atlas + persistent-mapped buffers -- not justified by
the GPU under-utilization observed).

Roadmap entry amended to split N.6 into slice 1 (shipped) and slice 2
(planned, reduced scope, deferred until after C.1.5).

Spec: docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md.
Plan: docs/superpowers/plans/2026-05-11-phase-n6-slice1.md (Task 4).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 12:34:10 +02:00