# Phase A.5 — Two-tier Streaming + Horizon LOD — Cold-Start Handoff

**Created:** 2026-05-10, immediately after N.5b ship.

**Audience:** the next agent picking up streaming + horizon-LOD work.

**Purpose:** brief you on where N.5b left things, what A.5 actually has to do to make the world look and feel great, and the load-bearing facts the brainstorm should be informed by.

---

## TL;DR

N.5b just shipped: outdoor terrain rendering is on bindless + multi-draw indirect via `TerrainModernRenderer`. Dispatch cost stays constant as the visible landblock count grows: radius=5 and radius=15 issue the same number of GL calls for terrain.

**A.5's actual goal — verbatim from the user, 2026-05-09:**

> "I just want great smooth HIGH fps visuals. Should look great. As long
> as it scales and we get very high FPS"

That reframes priorities. We are NOT optimizing the inner loop at radius=5 (it's solved). We're scaling visual reach + scene density without the client falling off a perf cliff.

**Concretely, A.5 ships three things:**

1. **Two-tier streaming.** The near tier (≤ N₁ landblocks) loads everything as today (terrain + scenery + EnvCells + collision). The far tier (N₁ < r ≤ N₂) loads the terrain mesh ONLY: no scenery generation, no collision, no entity registration.
2. **Per-LB entity bucketing for the WB dispatcher.** Today the entity dispatcher walks every loaded entity each frame for AABB cull: ~16K entities for ~4.3 ms/frame (≈0.27 µs per entity, cull and dispatch combined), dominating the frame budget. Bucket entities by landblock so the cull is hierarchical: cull the LB first, then only walk entities inside surviving LBs.
3. **Off-thread mesh build.** `LandblockMesh.Build` currently runs on the render thread when a new LB streams in. At today's radius=5 this is invisible; at A.5's larger N₂ it becomes a visible frame-time spike when 4-5 LBs stream in simultaneously. Move the build to a worker pool; hand finished `LandblockMeshData` back via a queue.
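
The tier decision behind item 1 can be sketched in a few lines, including the boundary hysteresis raised in brainstorm question 5. This is a minimal sketch under two assumptions. First, landblocks are addressed by integer (x, y) grid coordinates. Second, the streaming ring is a square (Chebyshev distance), which is what the 121-LB count at radius=5 implies. `Tier`, `next_tier`, the hysteresis margin `h`, and the other names are hypothetical and are not the existing `StreamingLoader` API. Python is used only for brevity.

```python
from enum import Enum


class Tier(Enum):
    UNLOADED = 0
    FAR = 1   # terrain mesh only
    NEAR = 2  # terrain + scenery + EnvCells + collision


def ring_distance(a, b):
    """Chebyshev distance between two (x, y) landblock coordinates: the rings are squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def next_tier(current, dist, n1, n2, h):
    """Choose a landblock's tier from its ring distance to the player's landblock.

    Promotion happens as soon as the LB is inside a radius; demotion only once it is
    more than h landblocks past that radius, so pacing along a boundary doesn't
    reload scenery (or terrain) over and over.
    """
    if dist <= n1:
        return Tier.NEAR
    if current == Tier.NEAR and dist <= n1 + h:
        return Tier.NEAR  # keep scenery until clearly outside N1
    if dist <= n2:
        return Tier.FAR
    if current != Tier.UNLOADED and dist <= n2 + h:
        return Tier.FAR  # keep the terrain mesh until clearly outside N2
    return Tier.UNLOADED


def tier_counts(n1, n2):
    """(near, far) landblock counts for a stationary player, ignoring hysteresis."""
    near = (2 * n1 + 1) ** 2
    return near, (2 * n2 + 1) ** 2 - near


print(tier_counts(5, 15))  # (121, 840)
```

With N₁=5 and N₂=15 this gives 121 near-tier and 840 far-tier landblocks, matching the 121 → ~961 counts in the baseline section; only the 121 register entities. Because the thresholds are asymmetric, an LB sitting at N₁ + 1 keeps its scenery if it was already near and does not gain scenery if it was far.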

The headline win you're shooting for: **radius=15 sustains the user's target FPS in Holtburg with no streaming hitches.**

---

## Where N.5b left things

### Branch state (relative to main)

After N.5b ships:

- N.5b SHIP at `08b7362` (final commit; appended SHIP record to plan)
- Roadmap entry, issue #51 closure, perf baseline doc all in place at `083c10c`
- Legacy `TerrainChunkRenderer` + `TerrainRenderer` + `terrain.vert/.frag` deleted at `7dfa2af`. **The modern path is the only path.**

### Captured perf baseline (load-bearing for A.5's "what's actually hot")

From `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`, measured 2026-05-09 at Holtburg town dueling field, radius=5, ~30 s standstill:

| Subsystem | cpu_us median per frame | Notes |
|---|---|---|
| **Entity dispatcher** (`WbDrawDispatcher`) | **~4,300** | 86% of frame budget. ~16K entities walked for AABB cull. THIS is the bottleneck. |
| Terrain dispatcher (`TerrainModernRenderer`) | ~6.4 | <1% of frame. Constant-cost regardless of radius (proved in N.5b). |
| Everything else (sky, particles, ImGui, swap, audio) | ~700 | Small. |

**Actual FPS at radius=5 in Holtburg: ~200 fps** (frame time ≈ 5 ms). NOT the "810 fps" inferred from the N.5 ship doc. That figure was 1/dispatcher_ms, which reflects only the WB dispatcher's CPU cost in isolation, not real frame time.

### What naive radius increase does

If you simply raised `ACDREAM_STREAM_RADIUS` to 15 today without A.5:

- Loaded landblocks: 121 → ~961 (8× more). Acceptable.
- Loaded entities: ~16K → ~125K (linear scaling with LB count). **NOT acceptable.** Scaling the measured ~4.3 ms linearly (≈0.27 µs × 125K entities) puts the entity dispatcher alone at ~34 ms/frame. That is under 30 FPS before anything else in the frame runs, a collapse from today's ~200 fps.
- Memory footprint: a similar 8× explosion in scenery instance buffers.

So the perf cliff is real and immediate. A.5 has to address it BEFORE the radius can be safely raised.

### What N.5b set up that A.5 inherits

- **Modern terrain dispatcher.** `TerrainModernRenderer` is O(1) GL calls in radius.
  As you add far-tier LBs (terrain only), the terrain dispatcher cost stays flat (~6 µs/frame). This is the one subsystem that doesn't need any A.5 work; it just scales.
- **Slot allocator for terrain GPU buffers.** Already grows by power-of-two doubling. Will absorb radius=15 (~961 slots × ~15 KB each = ~14 MB) without manual tuning.
- **`[TERRAIN-DIAG]` instrumentation.** Reports per-frame median + p95 in microseconds. Use this to confirm A.5 doesn't regress terrain perf.
- **Conformance sentinel.** `TerrainModernConformanceTests` proves visual mesh Z agrees with `TerrainSurface.SampleZFromHeightmap` to within 0.015 mm. Don't break this: physics ↔ visual agreement must hold across both tiers.
- **Bindless atlas.** `TerrainAtlas.GetBindlessHandles()`. The far tier shares the atlas (it's region-wide), so there is zero atlas-related per-LB cost.

---

## The brainstorm questions (the hard calls A.5 has to make)

These are the questions to resolve in the brainstorm step. Bring them to the user with options + recommendation; don't prejudge.

### 1. Tier radii: what are N₁ and N₂?

- **N₁** = near-tier radius (everything loads). Today's default `STREAM_RADIUS`. Probably stays at 5 (or maybe 4; maybe 3).
- **N₂** = far-tier radius (terrain mesh only). Could be 8, 12, 15, 20.

Tradeoffs: bigger N₂ = more world visible = looks better. But each far-tier LB still costs ~16 KB of GPU memory + a frustum-cull AABB test + a slot allocation. At N₂=15, that's ~961 LBs × 16 KB = ~15 MB of GPU memory (cheap) + ~961 cull tests (also cheap: ~1 ms total even at a pessimistic 1 µs per test, and we'll do this per-LB cull anyway as part of question 3 below).

Verify against retail: attach cdb and check how many landblocks retail keeps loaded around a given vantage point. Probably a radius of around 10-12, per the AC2D references and the holtburger client's behavior.

### 2. Far tier: terrain only? Or also impostor scenery?

Two options:

- **Terrain only** (cleanest). Beyond N₁, no trees, no rocks. The skyline is the terrain mesh against the sky.
- **Impostor scenery** (more retail-like).
  Beyond N₁, generate flat billboards or low-poly trees instead of full meshes. Adds substantial complexity (billboard pipeline, mesh-LOD generation, per-camera-angle rotation).

Recommendation: start with terrain-only. Add impostors only if the horizon looks wrong (too bare). Retail definitely has SOME distant scenery, but the cutoff is gradual; we can match it later if needed.

### 3. Entity bucketing structure

Today: `WbDrawDispatcher` keeps a flat dictionary of all entities and walks all of them per frame. To bucket by LB, we need:

- A `Dictionary` keyed by landblock ID whose values are the entities in that LB
- On `AddEntity(...)`, also stash the entity in its LB bucket (the spawn flow already knows the LB context)
- On `RemoveEntity(...)`, remove it from the LB bucket too
- Per frame: cull at LB granularity first; then cull entities only inside surviving LBs

LB-level AABBs are already computed (per the existing `_visibleSlots` logic in `TerrainModernRenderer`). The same AABB applies to entities, modulo a Z-range bump for trees/buildings.

Open question: do entities outside a known LB exist? (Items dropped on the ground? Ephemeral effects? Player projectiles?) If yes, they need a fallback "unknown LB" bucket that's still walked every frame. Probably small.

### 4. Where does the off-thread mesh build land?

Today `LandblockMesh.Build` runs synchronously inside `OnLandblockLoaded` on the render thread. To move it off:

- The `StreamingLoader` worker thread (already async for dat reads) signals "LB X is ready"
- A new worker pool consumes that signal, builds the mesh on a worker thread, and posts the finished `LandblockMeshData` to a `ConcurrentQueue`
- The render thread drains the queue at the start of each frame, calling `_terrain.AddLandblock(...)` for each ready mesh

Gotcha: the `TerrainBlendingContext` is shared. It is read-only (built once at startup), but re-check that during the thread-safety audit. Also `_surfaceCache`: it is currently a plain `Dictionary` populated lazily by `TerrainBlending.BuildSurface`.
Either lock it, replace it with `ConcurrentDictionary`, or pre-populate it with all known palCodes at startup.

### 5. Streaming hysteresis at the tier boundary

As the player moves, LBs cross the N₁ boundary in both directions. An LB that moves outward from the near tier to the far tier needs to:

- Drop its scenery (unregister entities)
- Drop its EnvCells
- Keep its terrain mesh (it's still in the far tier)

When it moves back inward, the LB needs its scenery + EnvCells reloaded. Hysteresis (don't churn at the exact boundary) is needed. The streaming loader already has hysteresis for full LB load/unload; A.5 extends that with a separate hysteresis radius for the scenery/entity layer.

### 6. Visual quality wins to ride along

A.5 is the natural place to land 2-3 nearly-free quality wins:

- **Mipmapped terrain atlas + 16x anisotropic filtering.** Today the atlas samples with `GL_LINEAR` and has no mipmaps, so distant terrain shimmers. ~half-day fix. Big visible improvement at the far tier.
- **Tree alpha-test → alpha-to-coverage with MSAA.** Today tree edges are a binary cutoff and look pixelated. A2C with MSAA smooths them. ~one day.
- **Correct depth-write for transparent foliage.** Some scenery passes may be writing depth incorrectly; confirm + fix.

These are not strictly required for A.5 to ship, but they amplify the "looks great" payoff.

### 7. Acceptance metrics

The user's goal is "smooth + high FPS + great-looking + scales." Pin this concretely:

- Target FPS at the final N₁/N₂ configuration: ≥ the user's monitor refresh rate (probably 144 or 240 Hz). Capture before/after numbers in a perf baseline doc parallel to N.5b's.
- No frame-time spikes > 5 ms during streaming (record a 60-second trace running through Holtburg → North Yanshi).
- Visual horizon visible at the new N₂. Capture screenshots from the same vantage point at the start of A.5 (before) and at ship (after) for the SHIP record.

### 8. What's NOT in A.5

A.5 does not need to ship:

- GPU-side culling (compute-shader cull). Bigger lift; N.6 territory.
- Persistent-mapped indirect buffer. N.6 territory.
- Sky / particles / EnvCells migration. Separate N.7+ phases.
- Shadow mapping. Separate visual phase.

Don't let scope creep pull these in.

---

## Files to read before brainstorming

In rough order of relevance:

1. **`docs/research/2026-05-09-phase-n5b-handoff.md`** — N.5b's handoff (read for context on what was just shipped + the structure of these handoff docs).
2. **`docs/plans/2026-05-09-phase-n5b-perf-baseline.md`** — captured perf numbers + the architectural reasoning for what A.5 inherits.
3. **`memory/project_phase_n5b_state.md`** — three high-value gotchas captured during N.5b (especially #1: the bindless uniform-sampler driver quirk; A.5 won't directly need this, but it's the prior art for any new shader code in the phase).
4. **`docs/plans/2026-04-11-roadmap.md`** A.5 entry — the original A.5 description.
5. **The streaming loader** — `src/AcDream.Core/World/StreamingLoader.cs` (or wherever it lives; grep for `OnLandblockLoaded`). Understand the existing ring + hysteresis logic before extending it.
6. **WB dispatcher entity flow** — `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`: the lines covering `Draw` (the per-entity walk) and `EntitySpawnAdapter` (where entities get registered). The bucketing change lands here.
7. **`LandblockMesh.Build`** — `src/AcDream.Core/Terrain/LandblockMesh.cs`. Its inputs (heightmap, ctx, surfaceCache) determine what the worker thread needs. ~150 lines.
8. **WB's `SceneryRenderManager`** — `references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/SceneryRenderManager.cs`. Has a render-distance cap; informs N₁ vs N₂ defaults.
9. **`TerrainModernRenderer`** — `src/AcDream.App/Rendering/TerrainModernRenderer.cs`. Don't modify; confirm the slot allocator handles radius=15 cleanly.

---

## Acceptance criteria for the whole phase

1. Build green; existing tests stay green; N.5b's conformance sentinel still passes (visual mesh Z = TerrainSurface Z within 1 mm).
2. **Far-tier LBs render terrain visibly past N₁** in user-driven visual verification.
3. **Per-frame entity-dispatcher cpu_us at radius=N₁ drops** vs today (the bucketing should help even at the current radius).
4. **Per-frame entity-dispatcher cpu_us with both tiers loaded (near tier to N₁, far tier to N₂) is bounded**: it does NOT scale linearly with total loaded LBs. Specifically, the bucketed cull should cost < 1.5× today's despite the far-tier LBs loading.
5. **No streaming hitch > 5 ms** when running at run speed, with LBs crossing the N₁ and N₂ tier boundaries simultaneously (capture a 60 s trace).
6. **`[TERRAIN-DIAG]` cpu_us stays flat** as N₂ grows (regression check: the terrain dispatcher must stay O(1), as proven in N.5b).
7. Visual identity at near-tier: no scenery missing inside N₁, no z-fighting, no cell-boundary wobble (the N.5b sentinel still applies).
8. SHIP record + perf baseline + memory entry written, mirroring N.5b's pattern.

---

## What you'll be doing in the first 30 minutes

1. Read this handoff in full.
2. Read `docs/research/2026-05-09-phase-n5b-handoff.md` for the structural pattern.
3. Read `docs/plans/2026-05-09-phase-n5b-perf-baseline.md` for the captured numbers A.5 inherits.
4. Read `memory/project_phase_n5b_state.md` for gotchas.
5. Verify the build is green: `dotnet build`.
6. Verify the N.5b ship is intact: `dotnet test --filter "FullyQualifiedName~TerrainSlot|FullyQualifiedName~TerrainModernConformance|FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless"` (target ≥114 passing, 0 failures).
7. Capture a baseline radius=5 frame trace yourself (one launch, 30 s standstill at Holtburg dueling field) so you have a "before" number in your own measurement environment rather than just trusting N.5b's number.
8. Invoke `superpowers:brainstorming` with the user. Walk through the 8 brainstorm questions above. Present each with options + my recommendation; don't prejudge.
9. After agreement, write the spec; then the plan; then execute task-by-task using `superpowers:subagent-driven-development`.
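
Step 7 and acceptance criteria 3-5 each come down to a few numbers per captured trace. The sketch below shows one way to do that reduction, assuming the trace is a list of per-frame times in microseconds. Three choices in it are assumptions, not taken from the `[TERRAIN-DIAG]` code: the nearest-rank percentile, treating a "hitch" as a frame that runs more than 5 ms over the trace median, and every function name.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(frame_us, hitch_us=5_000):
    """Reduce per-frame times (microseconds) to the numbers a perf baseline reports."""
    median = statistics.median(frame_us)
    return {
        "median_us": median,
        "p95_us": percentile(frame_us, 95),
        "fps_at_median": 1_000_000 / median,
        # Assumption: a "hitch" is a frame that runs more than hitch_us over the median.
        "hitches": sum(1 for t in frame_us if t - median > hitch_us),
    }


def dispatcher_within_budget(before_us, after_us, max_ratio=1.5):
    """Criterion 4: the bucketed entity cull must cost less than max_ratio x today's."""
    return after_us < max_ratio * before_us


trace = [5_000] * 19 + [12_000]  # 19 steady frames and one streaming spike
print(summarize(trace))
```

Running `summarize` on the radius=5 baseline trace and on the A.5 trace from the same vantage point yields directly comparable before/after rows for the perf doc. `dispatcher_within_budget` encodes criterion 4's < 1.5× bound.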

Don't skip the brainstorm. The N₁/N₂ values, the bucketing-structure trade-offs, and the worker-thread design are real decisions with downstream consequences that need user input, not "the agent makes a call and goes."

---

## Things to NOT do

- **Don't raise `ACDREAM_STREAM_RADIUS` without A.5's tiered loading in place.** The entity-cull cliff is immediate and severe (see "What naive radius increase does" above).
- **Don't put scenery in the far tier just to "look more retail" without a billboard/impostor pipeline.** Full-detail scenery in the far tier is what causes the cull cliff.
- **Don't move `LandblockMesh.Build` to a worker thread without first auditing `TerrainBlendingContext` + `_surfaceCache` for thread safety.** Concurrent writes to the surfaceCache will produce silently-wrong terrain blending.
- **Don't break the N.5b conformance sentinel.** If A.5 changes how meshes are built (e.g., for the worker thread), the conformance test must still pass; it's the load-bearing physics ↔ visual Z agreement guard.
- **Don't bundle GPU-side culling, persistent-mapped buffers, or shadow mapping into A.5.** Those are N.6+ territory; A.5 is "make the world look big and not stutter."
- **Don't ship without honest perf numbers.** If A.5 doesn't actually hit its FPS target, document why and ship N.6 next instead of papering over it. The N.5b precedent is honest reporting.
- **Don't skip the visual verification gate.** Same lesson as N.5b's black-terrain regression: "go" doesn't mean "verified." The user must actually launch the client at radius=N₂ and confirm the horizon looks great + FPS hits target.
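
The worker-thread item above and brainstorm question 4 share one structure. Workers share a guarded cache, finished meshes cross a thread-safe queue, and only the render thread touches the renderer. The sketch below illustrates that structure in Python. `SurfaceCache`, `MeshHandoff`, `build_mesh`, and `add_landblock` are hypothetical stand-ins for `_surfaceCache`, `LandblockMesh.Build`, and `_terrain.AddLandblock`, not their real signatures.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class SurfaceCache:
    """Lazily populated cache shared by workers, guarded by a lock (stand-in for _surfaceCache)."""

    def __init__(self, build_surface):
        self._build_surface = build_surface
        self._entries = {}
        self._lock = threading.Lock()

    def get(self, pal_code):
        # Building under the lock serializes first-time misses; pre-populating all
        # known palCodes at startup would avoid that cost entirely.
        with self._lock:
            if pal_code not in self._entries:
                self._entries[pal_code] = self._build_surface(pal_code)
            return self._entries[pal_code]


class MeshHandoff:
    """Builds landblock meshes on a worker pool; the render thread drains the results."""

    def __init__(self, build_mesh, workers=4):
        self._build_mesh = build_mesh
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._ready = queue.Queue()  # thread-safe FIFO, the role a ConcurrentQueue plays

    def on_landblock_ready(self, lb_id, heightmap):
        """Called from the streaming side once an LB's dat data has been read."""
        # Exceptions raised by a build are captured on the returned Future;
        # real code should observe and log them rather than drop them.
        return self._pool.submit(self._build, lb_id, heightmap)

    def _build(self, lb_id, heightmap):
        self._ready.put((lb_id, self._build_mesh(lb_id, heightmap)))

    def drain(self, add_landblock):
        """Render thread, start of frame: hand every finished mesh to the renderer."""
        handed = 0
        while True:
            try:
                lb_id, mesh = self._ready.get_nowait()
            except queue.Empty:
                return handed
            add_landblock(lb_id, mesh)
            handed += 1

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

In the C# client, the natural equivalents are `ConcurrentQueue<T>` for the handoff and, for the cache, one of the three options question 4 lists: a lock, `ConcurrentDictionary`, or startup pre-population.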

---

## Reference: where the FPS budget actually goes today

For brainstorming purposes, the per-frame breakdown at radius=5 / Holtburg (real measurement, 2026-05-09):

```
~5,000 µs  total frame time (= 200 fps)
├── 4,300 µs  WbDrawDispatcher entity cull + dispatch  ← THE BOTTLENECK
│             ~16K entity AABB tests / frame
│             A.5's entity bucketing attacks this directly
├──     6 µs  TerrainModernRenderer
│             O(1) in radius. Won't grow with A.5. Already solved.
├──  ~700 µs  Sky, particles, ImGui, audio, swap-buffers, misc
│             Mostly fixed cost; some VSync-related
└──  rest     GPU side (we don't measure this — query plumbing deferred to N.6).
              Could be substantial.
```

The first action of A.5 is to recognize that the perf claim "810 fps" from N.5 was misleading. Don't repeat the mistake — measure the actual frame time, not just one subsystem.

---

Good luck. The phase is meaty (~2 weeks) but the structural work is well-shaped: tiered streaming has clear boundaries, entity bucketing is an isolated dispatcher change, off-thread mesh build is a well-understood worker pattern. The hard call is the N₁/N₂ values, and that's a brainstorm question — bring it to the user with data.
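
The sketch below shows the hierarchical cull from brainstorm question 3, the change aimed at the 4,300 µs line in the breakdown. Entities live in per-LB buckets. Each occupied LB's box is tested once, and only entities in surviving LBs are tested individually. A fallback bucket holds entities with no LB and is always walked. The sketch assumes axis-aligned boxes and takes visibility as a predicate, so a real frustum test could be plugged in; the test uses box overlap as a stand-in. `AABB`, `BucketedEntities`, `UNKNOWN_LB`, and the method names are hypothetical, not `WbDrawDispatcher`'s API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    def overlaps(self, other):
        return all(self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i] for i in range(3))


UNKNOWN_LB = None  # fallback bucket for entities with no landblock; walked every frame


class BucketedEntities:
    """Entities grouped by landblock so the cull can reject a whole LB with one test."""

    def __init__(self):
        self._buckets = {}    # lb_id -> {entity_id: AABB}
        self._lb_bounds = {}  # lb_id -> AABB covering terrain plus tall scenery
        self._owner = {}      # entity_id -> lb_id, so removal needs no search

    def set_landblock_bounds(self, lb_id, bounds):
        self._lb_bounds[lb_id] = bounds

    def add(self, entity_id, bounds, lb_id=UNKNOWN_LB):
        self._buckets.setdefault(lb_id, {})[entity_id] = bounds
        self._owner[entity_id] = lb_id

    def remove(self, entity_id):
        lb_id = self._owner.pop(entity_id)
        bucket = self._buckets[lb_id]
        del bucket[entity_id]
        if not bucket:
            del self._buckets[lb_id]

    def visible(self, is_visible):
        """Return (visible entity ids, number of per-entity tests performed)."""
        result, tests = [], 0
        for lb_id, bucket in self._buckets.items():
            lb_box = self._lb_bounds.get(lb_id)
            if lb_id is not UNKNOWN_LB and lb_box is not None and not is_visible(lb_box):
                continue  # whole landblock is outside the view: skip all its entities
            for entity_id, box in bucket.items():
                tests += 1
                if is_visible(box):
                    result.append(entity_id)
        return result, tests
```

Per-frame work becomes one test per occupied LB plus one per entity in surviving LBs. Far-tier LBs add nothing here because they never register entities, which is what keeps acceptance criterion 4's bound reachable.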