Captured 2026-05-10 during Phase A.5 polish discussion. User asked why
the 9070 XT @ 1440p doesn't hit Unreal-level FPS for an old game like
AC. Answer: architectural — we rebuild the entire draw plan from
scratch every frame instead of caching pre-baked static-world data.
Tier 1 (entity-classification cache) lands as A.5 polish (separate
commit). Tiers 2 + 3 documented here for future scheduling:
- Tier 2 — Static/dynamic split with persistent groups
~2-week phase. Static entities (~95% of the world) get permanent
GPU-resident matrix slots, populated at spawn and dirty-tracked for
delta upload. Per-frame CPU cost for statics drops to LB-cull plus a
dirty-flag check. Estimated entity-dispatcher cost: 3.5 ms → 0.5-1 ms
median; 400-600 FPS at standstill, radius=12.
- Tier 3 — GPU-side culling (compute pre-pass)
~1-month phase. Per-instance frustum culling moves to a GPU compute
shader, which writes a draw-indirect buffer that the rasterizer
consumes. Estimated CPU dispatcher cost: ~0.05 ms (essentially free);
600-1000+ FPS at standstill, radius=12.
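The Tier 2 dirty-tracking idea can be sketched as follows (C++ for illustration; `StaticMatrixPool` and all names here are hypothetical, not from the codebase): slots are assigned permanently at spawn, and dirty slots are coalesced into contiguous ranges so the per-frame delta upload is a few buffer writes instead of one per entity.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: permanent GPU-resident slot per static entity,
// with dirty slots coalesced into contiguous upload ranges. The caller
// would issue one buffer write (e.g. glBufferSubData) per range.
struct UploadRange { uint32_t first; uint32_t count; };

class StaticMatrixPool {
public:
    // Assign a permanent slot at spawn; the slot index never changes.
    uint32_t Spawn() {
        uint32_t slot = static_cast<uint32_t>(matrices_.size());
        matrices_.emplace_back();   // identity placeholder
        dirty_.push_back(slot);     // initial upload needed
        return slot;
    }

    // Rare path: a "static" entity changed (map edit etc.); re-upload.
    void MarkDirty(uint32_t slot) { dirty_.push_back(slot); }

    // Per frame: sort, dedupe, and coalesce dirty slots into
    // contiguous ranges for delta upload.
    std::vector<UploadRange> FlushDirty() {
        std::vector<UploadRange> ranges;
        if (dirty_.empty()) return ranges;
        std::sort(dirty_.begin(), dirty_.end());
        dirty_.erase(std::unique(dirty_.begin(), dirty_.end()), dirty_.end());
        UploadRange cur{dirty_[0], 1};
        for (size_t i = 1; i < dirty_.size(); ++i) {
            if (dirty_[i] == cur.first + cur.count) {
                ++cur.count;        // extend the contiguous run
            } else {
                ranges.push_back(cur);
                cur = {dirty_[i], 1};
            }
        }
        ranges.push_back(cur);
        dirty_.clear();
        return ranges;
    }

private:
    struct Mat4 { float m[16]; };
    std::vector<Mat4> matrices_;    // GPU mirror lives in a GL buffer
    std::vector<uint32_t> dirty_;
};
```

With this shape, a frame where nothing moved costs one empty-vector check, which is what gets the static path down to cull-plus-flag-check.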
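What the Tier 3 compute pre-pass would compute, written as a CPU-side C++ reference for illustration (the real version would be a compute shader with one thread per instance atomically appending to a GPU buffer; the sphere-vs-plane test and all names are assumptions, not the actual implementation):

```cpp
#include <cstdint>
#include <vector>

// Plane stored as xyz = inward-facing normal, w = distance term,
// so a point p is in front when dot(n, p) + w >= 0.
struct Vec4 { float x, y, z, w; };
// Per-instance bounding sphere plus mesh location in the index buffer.
struct Instance { float cx, cy, cz, radius; uint32_t firstIndex, indexCount; };

// Matches the GL DrawElementsIndirectCommand layout.
struct DrawIndirectCmd {
    uint32_t count, instanceCount, firstIndex, baseVertex, baseInstance;
};

// Conservative sphere-vs-frustum test against 6 inward-facing planes.
static bool SphereVisible(const Instance& inst, const Vec4 planes[6]) {
    for (int i = 0; i < 6; ++i) {
        float d = planes[i].x * inst.cx + planes[i].y * inst.cy +
                  planes[i].z * inst.cz + planes[i].w;
        if (d < -inst.radius) return false;  // fully behind this plane
    }
    return true;
}

// Emulates the compute dispatch: each surviving instance appends one
// indirect draw command; the rasterizer would then consume the buffer
// (e.g. via glMultiDrawElementsIndirect) with no CPU round trip.
std::vector<DrawIndirectCmd> CullToIndirect(
        const std::vector<Instance>& instances, const Vec4 planes[6]) {
    std::vector<DrawIndirectCmd> cmds;
    for (uint32_t i = 0; i < static_cast<uint32_t>(instances.size()); ++i) {
        const Instance& inst = instances[i];
        if (!SphereVisible(inst, planes)) continue;
        cmds.push_back({inst.indexCount, 1, inst.firstIndex, 0, i});
    }
    return cmds;
}
```

The ~0.05 ms CPU estimate follows from the dispatcher's job shrinking to one compute dispatch plus one indirect multi-draw call, regardless of instance count.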
Doc captures effort estimates, sub-decisions, risks, mitigations, and
scheduling triggers for each tier. Also notes the architectural
ceiling (~800-1500 FPS for a C# + GL client; reaching native engine
performance requires becoming a different engine).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>