Final code review of slice 1 flagged one Important issue plus minor
doc/consistency gaps. The Important issue: the spec's "zero cost when
off" claim for the surface-dump path does not hold, because
_uploadMetadata always writes one dict entry per upload regardless of
the env var. Fixes applied:
1. Spec §5 "Cost when off": dropped the "Zero" claim; replaced with
"Negligible — one Dictionary write per upload (~30-50 KB at Holtburg)
plus a hash-table write per upload. Expensive work (file I/O,
histogram construction) is still env-gated." This matches reality.
2. Baseline doc §5: replaced "Raw logs (scratch, can be deleted)", which
referenced files that were never preserved in this worktree, with
"Reproducing the measurements", which lists the actual PowerShell
launch commands. The section now states that the raw logs were not
kept; the captured medians in section 2 are the canonical record.
3. New issue #55 filed in docs/ISSUES.md: the static-entity slow path
reports ~1.45M meshMissing/5s at r4 standstill and drops to ~0 when
walking. LOW severity (no visible regression). The hypothesis is a
"permanently-missing entity gets re-classified every frame" pattern
that the Tier 1 cache doesn't cover.
4. Roadmap shipped table: renamed the "N.6.1" row to "N.6 slice 1" to
match every other artifact's naming, so searches for the slice name
find it.
None of these change the slice's conclusion or next-phase
recommendation (C.1.5 first, then reduced-scope slice 2).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-quality review on commit 13abf96 flagged 3 Important issues in
the baseline document plus 2 minor roadmap consistency gaps. Applied
all of them:
1. The "CPU scales superlinearly with N₁" claim was imprecise: CPU
growth (4.0×) is sublinear relative to near-LB count (7.7×).
Clarified: CPU grows more than linearly with radius N₁ but
sublinearly with visible-LB count, because the frustum cull discards
most far LBs early. The outer per-LB walk still scales with N₁, which
is what Tier 2's persistent groups address.
2. The "40-50% memory footprint reduction from atlas packing" estimate
was asserted without derivation and likely too optimistic given all
surfaces are already power-of-two and same-format (RGBA8). Replaced
with a more honest bound: "low-MB to ~10 MB absolute saving" with
explicit per-array metadata overhead reasoning. Conclusion is
unchanged — atlas adoption still isn't justified given GPU
under-utilization.
3. The "spec §6 threshold for atlas is >30%" citation pointed at text
that doesn't exist in the spec. Replaced with "A conventional
rule-of-thumb" so a future reader doesn't chase a phantom citation.
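
The scaling correction in item 1 can be checked with a few lines of
arithmetic. This is a sketch under two assumptions that the baseline doc
does not state here: that the 4.0× and 7.7× ratios compare radius 4
against radius 12, and that the near-LB count is the full (2·N₁+1)²
window (that window does reproduce the 7.7× figure). The log-log slope
is an illustrative metric chosen for this sketch, not one taken from
the baseline doc.

```python
import math


def growth_exponent(ratio_y: float, ratio_x: float) -> float:
    """Log-log slope k such that y grows like x**k between two samples."""
    return math.log(ratio_y) / math.log(ratio_x)


# Assumed endpoints: radius 4 and radius 12 (hypothetical pairing).
r_lo, r_hi = 4, 12
radius_ratio = r_hi / r_lo

# Assumed near-LB window: a (2*N1 + 1)^2 square of LBs around the player.
lb_ratio = (2 * r_hi + 1) ** 2 / (2 * r_lo + 1) ** 2  # 625 / 81

cpu_ratio = 4.0  # CPU growth quoted in item 1

k_radius = growth_exponent(cpu_ratio, radius_ratio)  # > 1 means superlinear
k_lb = growth_exponent(cpu_ratio, lb_ratio)          # < 1 means sublinear

print(f"LB ratio {lb_ratio:.1f}x, "
      f"exponent vs radius {k_radius:.2f}, vs LB count {k_lb:.2f}")
```

An exponent above 1 against radius and below 1 against LB count is
consistent with the clarified wording: both statements hold at once
because LB count itself grows roughly quadratically with radius.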
Plus roadmap consistency:
M1: The N.6 slice 1 bullet now uses the canonical "✓ SHIPPED — Title.
Shipped YYYY-MM-DD." prefix that every other shipped phase uses.
M2: Added N.6.1 row to the shipped table at the top of the roadmap
(lines ~55-66) so the at-a-glance shipped list is complete.
None of these change the conclusion or the next-phase recommendation
(C.1.5 first, then reduced N.6 slice 2). The fixes improve doc accuracy
and future-readability.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture authoritative CPU+GPU dispatch numbers at Holtburg now that the
gpu_us diagnostic works (commit 25cb147). Measurements cover three
radii (4/8/12) × two motion modes (standstill/walking), plus a
surface-format histogram from ACDREAM_DUMP_SURFACES=1.
Adds an env-gated one-shot dump path
(TextureCache.TickSurfaceHistogramDumpIfEnabled, called from
GameWindow.OnRender) that fires once, after both (a) frame 600 of the
session AND (b) the upload-metadata dict reaching 100 entries. The
cache-size gate prevents the dump from firing during pre-world GUI
ticks, where OnRender spins at high rates but no scenery has streamed.
Output is written to %LOCALAPPDATA%\acdream\n6-surfaces.txt, with a
try/catch around the I/O so disk-full / permission errors don't crash
mid-measurement.
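
The gating described above can be modelled roughly as follows. This is a
Python sketch of the control flow only, not the C# implementation:
SurfaceDumpSketch, record_upload, tick, enabled_from_env, and the
(width, height, format) histogram key are hypothetical names and
choices made for illustration, and treating "after frame 600" as
frame >= 600 is an assumption.

```python
import os
from collections import Counter
from pathlib import Path

MIN_FRAME = 600    # gate (a): frames elapsed in the session (assumed >=)
MIN_ENTRIES = 100  # gate (b): upload-metadata entries


def enabled_from_env() -> bool:
    """True when the dump is requested via ACDREAM_DUMP_SURFACES=1."""
    return os.environ.get("ACDREAM_DUMP_SURFACES") == "1"


class SurfaceDumpSketch:
    """Hypothetical stand-in for the texture cache's dump path."""

    def __init__(self, out_path: Path, enabled: bool):
        self.out_path = out_path
        self.enabled = enabled
        self.upload_metadata = {}  # filled on every upload, gated or not
        self.dumped = False

    def record_upload(self, handle: int, width: int, height: int, fmt: str) -> None:
        # Always-on bookkeeping: one dict write per upload.
        self.upload_metadata[handle] = (width, height, fmt)

    def tick(self, frame: int) -> bool:
        """Call once per frame; returns True on the frame that attempts the dump."""
        if not self.enabled or self.dumped:
            return False
        if frame < MIN_FRAME or len(self.upload_metadata) < MIN_ENTRIES:
            return False
        self.dumped = True  # one-shot, even if the write below fails
        histogram = Counter(self.upload_metadata.values())
        try:
            lines = [f"{w}x{h} {fmt}: {n}"
                     for (w, h, fmt), n in histogram.most_common()]
            self.out_path.parent.mkdir(parents=True, exist_ok=True)
            self.out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        except OSError as exc:
            # Disk-full / permission errors are reported, never raised.
            print(f"surface dump skipped: {exc}")
        return True
```

In this sketch the histogram construction and file I/O sit behind the
env gate, while record_upload runs unconditionally. Marking the dump as
done before the write is a choice made here so that a failing disk is
not retried every frame; the real code may behave differently.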
Baseline document at docs/plans/2026-05-11-phase-n6-perf-baseline.md
documents:
- CPU dominates GPU by 30-50x at every radius (strongly CPU-bound)
- GPU wildly under-utilized (max gpu_us p95 ~600us vs 16,600us frame budget)
- CPU scales superlinearly with N1 (Tier 1 cache wins on inner loop but
not outer LB walk)
- Surface atlas opportunity high (59% of textures in top-3 triples) but
win is memory-only since GPU isn't bottlenecked
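
The under-utilization bullet above reduces to a single division. The
sketch below uses only the two figures quoted in that bullet; the
variable names are illustrative, and reading 16,600us as a roughly
60 FPS frame budget is an assumption.

```python
FRAME_BUDGET_US = 16_600  # frame budget quoted above (assumed ~60 FPS)
GPU_P95_US = 600          # max gpu_us p95 quoted above

gpu_share = GPU_P95_US / FRAME_BUDGET_US    # fraction of the budget used
headroom = FRAME_BUDGET_US / GPU_P95_US     # how many times the work would fit

print(f"GPU share of frame budget: {gpu_share:.1%}")
print(f"GPU headroom: about {headroom:.0f}x")
```

With the GPU using only a few percent of the frame budget, changes that
mainly reduce GPU cost have little room to move frame time, which is
the reasoning behind the recommendation that follows.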
Recommendation: C.1.5 (PES emitter wiring) next, then a reduced-scope
N.6 slice 2 that drops atlas + persistent-mapped buffers, neither of
which is justified by the GPU under-utilization observed.
Roadmap entry amended to split N.6 into slice 1 (shipped) and slice 2
(planned, reduced scope, deferred until after C.1.5).
Spec: docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md.
Plan: docs/superpowers/plans/2026-05-11-phase-n6-slice1.md (Task 4).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>