acdream/docs/research/2026-05-10-phase-a5-handoff.md
Erik f7f88674e1 docs(A.5): cold-start handoff for the next session
Records what N.5b shipped, where the actual FPS bottleneck lives
(WbDrawDispatcher entity cull at ~4.3ms/frame, 86% of frame budget;
terrain dispatcher is now <1% of frame), and what A.5 has to do to
make the world look big without falling off a perf cliff.

Three concrete A.5 deliverables:
1. Two-tier streaming (near = full, far = terrain-only)
2. Per-LB entity bucketing in WbDrawDispatcher
3. Off-thread LandblockMesh.Build to avoid streaming hitches at higher
   radius

Eight brainstorm questions for the next session, plus acceptance
criteria, files-to-read list, and explicit "don't do" warnings (don't
raise STREAM_RADIUS without tiering in place; don't put scenery in
far tier without an impostor pipeline; don't break the N.5b conformance
sentinel; etc.).

User's stated goal verbatim: "great smooth HIGH fps visuals. Should
look great. As long as it scales and we get very high FPS." This
reframes priorities away from radius=5 micro-optimization toward
visual scale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 21:11:46 +02:00

# Phase A.5 — Two-tier Streaming + Horizon LOD — Cold-Start Handoff
**Created:** 2026-05-10, immediately after N.5b ship.
**Audience:** the next agent picking up streaming + horizon-LOD work.
**Purpose:** brief you on where N.5b left things, what A.5 actually has to do
to make the world look and feel great, and the load-bearing facts the
brainstorm should be informed by.
---
## TL;DR
N.5b just shipped: outdoor terrain rendering is on bindless + multi-draw
indirect via `TerrainModernRenderer`. Constant-cost dispatch as the
visible landblock count grows — radius=5 vs radius=15 are the same number
of GL calls for terrain.
**A.5's actual goal — verbatim from the user, 2026-05-09:**
> "I just want great smooth HIGH fps visuals. Should look great. As long
> as it scales and we get very high FPS"
That reframes priorities. We are NOT optimizing the inner loop at radius=5
(it's solved). We're scaling visual reach + scene density without the
client falling off a perf cliff.
**Concretely, A.5 ships three things:**
1. **Two-tier streaming.** Near tier (≤ N₁ landblocks) loads everything as
today (terrain + scenery + EnvCells + collision). Far tier (N₁ < r ≤ N₂)
loads terrain mesh ONLY. No scenery generation, no collision, no
entity registration for the far tier.
2. **Per-LB entity bucketing for the WB dispatcher.** Today the entity
dispatcher walks every loaded entity each frame for AABB cull:
~16K entities @ ~1µs/test = 4.3ms/frame, dominating the frame budget.
Bucket entities by landblock so the cull is hierarchical: cull the LB
first, then only walk entities inside surviving LBs.
3. **Off-thread mesh build.** `LandblockMesh.Build` currently runs on the
render thread when a new LB streams in. At today's radius=5 this is
invisible; at A.5's higher N it becomes a visible frame-time spike
when 4-5 LBs stream simultaneously. Move the build to a worker pool;
hand finished `LandblockMeshData` back via a queue.
The headline win you're shooting for: **radius=15 sustains the user's
target FPS in Holtburg with no streaming hitches.**
---
## Where N.5b left things
### Branch state (relative to main)
After N.5b ships:
- N.5b SHIP at `08b7362` (final commit; appended SHIP record to plan)
- Roadmap entry, issue #51 closure, perf baseline doc all in place at `083c10c`
- Legacy `TerrainChunkRenderer` + `TerrainRenderer` + `terrain.vert/.frag`
deleted at `7dfa2af`. **The modern path is the only path.**
### Captured perf baseline (load-bearing for A.5's "what's actually hot")
From `docs/plans/2026-05-09-phase-n5b-perf-baseline.md`, measured
2026-05-09 at Holtburg town dueling field, radius=5, ~30s standstill:
| Subsystem | cpu_us median per frame | Notes |
|---|---|---|
| **Entity dispatcher** (`WbDrawDispatcher`) | **~4,300** | 86% of frame budget. ~16K entities walked for AABB cull. THIS is the bottleneck. |
| Terrain dispatcher (`TerrainModernRenderer`) | ~6.4 | <1% of frame. Constant-cost regardless of radius (proved in N.5b). |
| Everything else (sky, particles, ImGui, swap, audio) | ~700 | Small. |
**Actual FPS at radius=5 in Holtburg: ~200 fps** (frame time 5ms).
NOT the "810 fps" inferred from the N.5 ship doc (that was 1/dispatcher_ms,
which is only the WB dispatcher CPU cost in isolation, not real frame time).
### What naive radius increase does
If you simply raised `ACDREAM_STREAM_RADIUS` to 15 today without A.5:
- Loaded landblocks: 121 → ~961 (8× more). Acceptable.
- Loaded entities: ~16K → ~125K (linear scaling with LB count). **NOT
acceptable.** At ~1µs per AABB cull, the entity dispatcher would take
~125ms/frame = 8 FPS. Slideshow.
- Memory footprint: similar 8× explosion in scenery instance buffers.
So the perf cliff is real and immediate. A.5 has to address it BEFORE
the radius can be safely raised.
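
The arithmetic behind that cliff can be reproduced in a few lines. This is a back-of-the-envelope sketch in Python (the client itself is C#), assuming a square ring of (2r + 1)² landblocks, entity count proportional to landblock count, and this document's rough ~1µs-per-test figure. The function and constant names are illustrative only and do not exist in the codebase.

```python
# Back-of-the-envelope model of a naive radius increase.
# All names here are illustrative; none exist in the acdream codebase.

ENTITIES_AT_RADIUS_5 = 16_000  # ~16K entities, from the N.5b perf baseline
CULL_COST_US = 1.0             # this document's rough per-AABB-test estimate


def loaded_landblocks(radius: int) -> int:
    """Landblocks in a square ring: (2r + 1)^2 (assumed ring shape)."""
    side = 2 * radius + 1
    return side * side


def naive_entity_cull(radius: int, cull_cost_us: float = CULL_COST_US):
    """Return (landblocks, entities, cull_ms, fps_ceiling) with no tiering."""
    lbs = loaded_landblocks(radius)
    # Assume entity count scales linearly with loaded landblocks.
    entities = ENTITIES_AT_RADIUS_5 * lbs / loaded_landblocks(5)
    cull_ms = entities * cull_cost_us / 1000.0
    return lbs, round(entities), cull_ms, 1000.0 / cull_ms


lbs, entities, cull_ms, fps = naive_entity_cull(15)
print(f"radius=15: {lbs} LBs, ~{entities} entities, "
      f"~{cull_ms:.0f} ms cull, ~{fps:.0f} FPS ceiling")
```

Under these assumptions the model lands at roughly 127K entities and a single-digit FPS ceiling, in line with the estimates above. Swap in a measured per-test cost once you have one from your own trace.
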
### What N.5b set up that A.5 inherits
- **Modern terrain dispatcher.** `TerrainModernRenderer` is O(1) GL calls
in radius. As you add far-tier LBs (terrain only), the terrain
dispatcher cost stays flat (~6µs/frame). This is the one subsystem
that doesn't need any A.5 work; it just scales.
- **Slot allocator for terrain GPU buffers.** Already grows by power-of-two
doubling. Will absorb radius=15 (~961 slots × ~15 KB each = ~14 MB)
without manual tuning.
- **`[TERRAIN-DIAG]` instrumentation.** Reports per-frame median + p95 in
microseconds. Use this to confirm A.5 doesn't regress terrain perf.
- **Conformance sentinel.** `TerrainModernConformanceTests` proves visual
mesh Z agrees with `TerrainSurface.SampleZFromHeightmap` to 0.015 mm.
Don't break this: physics ↔ visual agreement must hold across both
tiers.
- **Bindless atlas.** `TerrainAtlas.GetBindlessHandles()`. The far tier
shares the atlas (it's region-wide). Zero atlas-related per-LB cost.
---
## The brainstorm questions (the hard calls A.5 has to make)
These are the questions to resolve in the brainstorm step. Bring them to
the user with options + recommendation; don't prejudge.
### 1. Tier radii: what are N₁ and N₂?
- **N₁** = near-tier radius (everything loads). Today's default `STREAM_RADIUS`.
Probably stays at 5 (or maybe 4; maybe 3).
- **N₂** = far-tier radius (terrain mesh only). Could be 8, 12, 15, 20.
Tradeoffs: bigger N₂ = more world visible = looks better. But each far-tier
LB still costs ~16 KB GPU memory + a frustum cull AABB + a slot allocation.
At N₂=15, that's ~961 LBs × 16 KB = ~15 MB GPU mem (cheap) + ~961 cull
tests (cheap, ~1ms total at 1µs each, and we'll do this per-LB cull
anyway as part of #2 below).
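
To compare candidate N₂ values during the brainstorm, the same estimate can be tabulated. A rough Python sketch, assuming the ~16 KB per LB and ~1µs per LB-level cull test quoted above and a square ring of (2·N₂ + 1)² landblocks (near-tier LBs included, as in the 961 figure). `far_tier_budget` is a hypothetical helper, not an existing function.

```python
def far_tier_budget(n2: int, kb_per_lb: float = 16.0, cull_us_per_lb: float = 1.0):
    """Return (landblocks, gpu_mb, cull_ms) for a given far-tier radius.

    kb_per_lb and cull_us_per_lb default to this document's rough estimates.
    """
    lbs = (2 * n2 + 1) ** 2                   # square ring, near tier included
    gpu_mb = lbs * kb_per_lb / 1024.0         # terrain mesh memory
    cull_ms = lbs * cull_us_per_lb / 1000.0   # one AABB test per landblock
    return lbs, gpu_mb, cull_ms


for n2 in (8, 12, 15, 20):
    lbs, gpu_mb, cull_ms = far_tier_budget(n2)
    print(f"N2={n2:>2}: {lbs:>4} LBs, ~{gpu_mb:.1f} MB GPU, "
          f"~{cull_ms:.2f} ms LB-level cull")
```

Both costs grow quadratically with N₂ but stay small in absolute terms across this range, which is why the limiting factor is entity count rather than terrain.
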
Verify against retail: cdb attach + check how many landblocks retail keeps
loaded at a given vantage point. Probably around 10-12 per the AC2D
references and the holtburger client's behavior.
### 2. Far tier: terrain only? Or also impostor scenery?
Two options:
- **Terrain only** (cleanest). Beyond N₁, no trees, no rocks. Skyline is the
terrain mesh against the sky.
- **Impostor scenery** (more retail-like). Beyond N₁, generate flat
billboards or low-poly trees instead of full meshes. Adds substantial
complexity (billboard pipeline, mesh-LOD generation, per-camera-angle
rotation).
Recommendation: start with terrain-only. Add impostors only if the
horizon looks wrong (too bare). Retail definitely has SOME distant
scenery but the cutoff is gradual; we can match it later if needed.
### 3. Entity bucketing structure
Today: `WbDrawDispatcher` keeps a flat dictionary of all entities and
walks all of them per frame. To bucket by LB, we need:
- A `Dictionary<uint, List<EntityHandle>>` keyed by landblock ID
- On `AddEntity(...)`, also stash it in the LB bucket (the spawn flow
already knows the LB context)
- On `RemoveEntity(...)`, remove from the LB bucket too
- Per frame: cull at LB granularity first; then cull entities only inside
surviving LBs
LB-level AABBs are already computed (per the existing `_visibleSlots`
logic in `TerrainModernRenderer`; the same AABB applies to entities,
modulo a Z-range bump for trees/buildings).
Open question: do entities outside a known LB exist? (Items dropped on the
ground? Ephemeral effects? Player projectiles?) If yes, they need a
fallback "unknown LB" bucket that's still walked every frame. Probably
small.
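
The hierarchical cull described above can be sketched as follows. This is an illustrative Python model, not the dispatcher's actual code: `BucketedEntities`, `Aabb`, and the `UNKNOWN_LB` sentinel are hypothetical names, the landblock ids are arbitrary, and a 2D view box stands in for the real frustum test.

```python
from collections import defaultdict
from dataclasses import dataclass

UNKNOWN_LB = 0xFFFFFFFF  # hypothetical bucket id for entities with no known landblock


@dataclass(frozen=True)
class Aabb:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def overlaps(self, other: "Aabb") -> bool:
        return (self.min_x <= other.max_x and self.max_x >= other.min_x and
                self.min_y <= other.max_y and self.max_y >= other.min_y)


class BucketedEntities:
    def __init__(self):
        self._lb_bounds = {}               # lb_id -> Aabb of the landblock
        self._buckets = defaultdict(dict)  # lb_id -> {entity_id: Aabb}

    def set_landblock_bounds(self, lb_id, bounds):
        self._lb_bounds[lb_id] = bounds

    def add_entity(self, lb_id, entity_id, bounds):
        self._buckets[lb_id][entity_id] = bounds

    def remove_entity(self, lb_id, entity_id):
        bucket = self._buckets.get(lb_id)
        if bucket is not None:
            bucket.pop(entity_id, None)
            if not bucket:
                del self._buckets[lb_id]

    def cull(self, view):
        """Return (visible entity ids, number of per-entity tests performed)."""
        visible, tests = [], 0
        for lb_id, bucket in self._buckets.items():
            lb_bounds = self._lb_bounds.get(lb_id)
            # Buckets with no bounds (e.g. UNKNOWN_LB) are walked every frame.
            if lb_bounds is not None and not lb_bounds.overlaps(view):
                continue  # whole landblock rejected with a single test
            for entity_id, box in bucket.items():
                tests += 1
                if box.overlaps(view):
                    visible.append(entity_id)
        return visible, tests


world = BucketedEntities()
world.set_landblock_bounds(1, Aabb(0, 0, 192, 192))    # arbitrary ids and extents
world.set_landblock_bounds(2, Aabb(192, 0, 384, 192))
world.add_entity(1, 101, Aabb(10, 10, 12, 12))
world.add_entity(2, 102, Aabb(200, 10, 202, 12))
world.add_entity(UNKNOWN_LB, 103, Aabb(5, 5, 6, 6))

visible, tests = world.cull(Aabb(0, 0, 100, 100))
print(sorted(visible), tests)
```

In the demo, the entity in landblock 2 is never tested individually because its landblock fails the coarse test, while the unbucketed entity is always walked. Whether the real fallback bucket stays small depends on the open question above.
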
### 4. Where does the off-thread mesh build land?
Today `LandblockMesh.Build` runs synchronously inside `OnLandblockLoaded`
on the render thread. To move it off:
- `StreamingLoader` worker thread (already async for dat reads) signals
"LB X is ready"
- A new worker pool consumes that signal, builds the mesh on a worker
thread, posts the finished `LandblockMeshData` to a `ConcurrentQueue`
- Render thread drains the queue at the start of each frame, calling
`_terrain.AddLandblock(...)` for each ready mesh
Gotcha: the `TerrainBlendingContext` is shared. Need to confirm it's
read-only (it is built once at startup). Also, `_surfaceCache` is
currently a plain `Dictionary` populated lazily by `TerrainBlending.BuildSurface`.
Either lock it, replace with `ConcurrentDictionary`, or pre-populate with
all known palCodes at startup.
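
The hand-off pattern above (build on a worker, post to a queue, drain on the render thread) can be sketched like this. It is a minimal Python illustration, not the C# design: `MeshBuildPipeline`, `build_mesh`, and the `max_per_frame` cap are assumptions introduced here, and `queue.SimpleQueue` plays the role that `ConcurrentQueue` would play in the client.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def build_mesh(lb_id, heightmap):
    """Stand-in for LandblockMesh.Build: a pure function of its inputs."""
    return {"lb_id": lb_id, "vertex_count": len(heightmap), "max_z": max(heightmap)}


class MeshBuildPipeline:
    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._ready = queue.SimpleQueue()  # thread-safe hand-off to the render thread

    def on_landblock_loaded(self, lb_id, heightmap):
        # Called when streaming reports "LB X is ready"; never builds inline.
        self._pool.submit(self._build_and_post, lb_id, heightmap)

    def _build_and_post(self, lb_id, heightmap):
        self._ready.put(build_mesh(lb_id, heightmap))

    def drain(self, add_landblock, max_per_frame=4):
        """Render thread, start of frame: upload up to max_per_frame meshes."""
        drained = 0
        while drained < max_per_frame:
            try:
                mesh = self._ready.get_nowait()
            except queue.Empty:
                break
            add_landblock(mesh)
            drained += 1
        return drained

    def shutdown(self):
        self._pool.shutdown(wait=True)


pipeline = MeshBuildPipeline(workers=2)
for lb_id in (1, 2, 3):
    pipeline.on_landblock_loaded(lb_id, [0.0, 1.5, 3.0])
pipeline.shutdown()  # demo only: wait until every build has been posted

uploaded = []
while pipeline.drain(uploaded.append):
    pass
print(sorted(mesh["lb_id"] for mesh in uploaded))
```

The per-frame cap is one possible way to keep GPU uploads from bunching up into their own hitch; it is not something the current code is known to do. The sketch also sidesteps the shared-cache problem by making `build_mesh` a pure function, which is exactly the property the `_surfaceCache` audit has to establish for the real build.
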
### 5. Streaming hysteresis at the tier boundary
When the player moves and LBs cross N₁, the near tier sheds them and the far tier picks them up.
LBs that were near-tier need to:
- Drop their scenery (unregister entities)
- Drop their EnvCells
- Keep the terrain mesh (still in far tier)
When the player crosses back: the LB needs scenery + EnvCells re-loaded.
Hysteresis (don't churn at the exact boundary) is needed.
The streaming loader already has hysteresis for full LB load/unload. A.5
extends that: a separate hysteresis radius for the scenery/entity layer.
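
One way to express that second hysteresis band is a small state function per landblock. The sketch below is a Python illustration under stated assumptions: distance is measured in landblocks with a square (Chebyshev) ring, promotion happens at the nominal radius, and demotion only once the LB is more than `margin` landblocks past it. `Tier`, `next_tier`, and `margin` are hypothetical names, not existing streaming-loader API.

```python
from enum import Enum


class Tier(Enum):
    NEAR = "near"          # terrain + scenery + EnvCells + collision
    FAR = "far"            # terrain mesh only
    UNLOADED = "unloaded"


def chebyshev_distance(lb, center):
    """Distance in landblocks for a square streaming ring (assumed shape)."""
    return max(abs(lb[0] - center[0]), abs(lb[1] - center[1]))


def next_tier(current, distance, n1, n2, margin=1):
    """Promote at the nominal radius; demote only past radius + margin."""
    if current is Tier.NEAR:
        if distance <= n1 + margin:
            return Tier.NEAR               # inside the hysteresis band: keep scenery
        return Tier.FAR if distance <= n2 + margin else Tier.UNLOADED
    if current is Tier.FAR:
        if distance <= n1:
            return Tier.NEAR               # re-load scenery + EnvCells
        return Tier.FAR if distance <= n2 + margin else Tier.UNLOADED
    # Currently unloaded: load only at the nominal radii.
    if distance <= n1:
        return Tier.NEAR
    return Tier.FAR if distance <= n2 else Tier.UNLOADED


d = chebyshev_distance((6, -2), (0, 0))
print(d, next_tier(Tier.NEAR, d, n1=5, n2=15), next_tier(Tier.FAR, d, n1=5, n2=15))
```

An LB sitting at distance 6 keeps whichever tier it already has, so walking back and forth across the N₁ line does not repeatedly tear down and rebuild scenery.
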
### 6. Visual quality wins to ride along
A.5 is the natural place to land 2-3 nearly-free quality wins:
- **Mipmapped terrain atlas + anisotropic 16x.** Today the atlas is
`GL_LINEAR` with no mipmaps; distant terrain shimmers. ~half-day fix.
Big visible improvement at far tier.
- **Tree alpha-test → alpha-to-coverage with MSAA.** Today tree edges are
binary cutoff and pixel-edged. A2C with MSAA fixes them. ~one day.
- **Correct depth-write for transparent foliage.** Some scenery passes
may be writing depth incorrectly; confirm + fix.
These are not strictly required for A.5 to ship, but they amplify the
"looks great" payoff.
### 7. Acceptance metrics
The user's goal is "smooth + high FPS + great-looking + scales." Pin
this concretely:
- Target FPS at radius (whatever final N₁ + N₂): user's monitor refresh
(probably 144 or 240 Hz). Capture before/after numbers in a perf
baseline doc parallel to N.5b's.
- No frame-time spikes > 5ms during streaming (record a 60-second
trace running through Holtburg → North Yanshi).
- Visual horizon visible at the new N₂. Capture screenshots from the
same vantage point at the start of A.5 (before) and at ship (after)
for the SHIP record.
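
Checking a recorded trace against these metrics could look like the sketch below. It is a Python illustration with assumptions called out: the trace is a plain list of per-frame times in microseconds (the real capture format is not specified here), and a "spike" is interpreted as a frame that exceeds the trace median by more than 5 ms. `summarize_trace` is a hypothetical helper.

```python
import statistics


def summarize_trace(frame_times_us, spike_threshold_us=5000.0):
    """Summarize a frame-time trace (microseconds per frame)."""
    ordered = sorted(frame_times_us)
    median = statistics.median(ordered)
    # Simple nearest-rank style p95; fine for a trace with thousands of frames.
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    # Assumption: a "spike" is a frame that exceeds the median by the threshold.
    spikes = [t for t in frame_times_us if t - median > spike_threshold_us]
    return {
        "median_us": median,
        "p95_us": p95,
        "fps_at_median": 1_000_000.0 / median,
        "spike_count": len(spikes),
        "worst_us": ordered[-1],
    }


# Synthetic trace: steady 5 ms frames with one 11 ms streaming hitch.
trace = [5000.0] * 98 + [5200.0, 11000.0]
summary = summarize_trace(trace)
print(summary)
```

If the intended reading of "spike" is an absolute frame time rather than an excess over the median, only the list comprehension changes; settle that definition with the user before recording the "before" numbers.
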
### 8. What's NOT in A.5
A.5 does not need to ship:
- GPU-side culling (compute-shader cull). Bigger lift; N.6 territory.
- Persistent-mapped indirect buffer. N.6 territory.
- Sky / particles / EnvCells migration. Separate N.7+ phases.
- Shadow mapping. Separate visual phase.
Don't let scope creep pull these in.
---
## Files to read before brainstorming
In rough order of relevance:
1. **`docs/research/2026-05-09-phase-n5b-handoff.md`** — N.5b's handoff
(read for context on what was just shipped + the structure of these
handoff docs).
2. **`docs/plans/2026-05-09-phase-n5b-perf-baseline.md`** — captured
perf numbers + the architectural reasoning for what A.5 inherits.
3. **`memory/project_phase_n5b_state.md`** — three high-value gotchas
captured during N.5b (especially #1: bindless uniform-sampler driver
quirk; A.5 won't directly need this, but it's the prior art for any
new shader code in the phase).
4. **`docs/plans/2026-04-11-roadmap.md`** A.5 entry — the original A.5
description.
5. **The streaming loader**: `src/AcDream.Core/World/StreamingLoader.cs`
(or wherever it lives; grep for `OnLandblockLoaded`). Understand the
existing ring + hysteresis logic before extending it.
6. **WB dispatcher entity flow**:
`src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` lines covering
`Draw` (the per-entity walk) and `EntitySpawnAdapter` (where entities
get registered). The bucketing change lands here.
7. **`LandblockMesh.Build`** — `src/AcDream.Core/Terrain/LandblockMesh.cs`.
Its inputs (heightmap, ctx, surfaceCache) determine what the worker
thread needs. ~150 lines.
8. **WB's `SceneryRenderManager`**:
`references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/SceneryRenderManager.cs`.
Has a render-distance cap; informs N₁ vs N₂ defaults.
9. **`TerrainModernRenderer`** —
`src/AcDream.App/Rendering/TerrainModernRenderer.cs`. Don't modify;
confirm the slot allocator handles radius=15 cleanly.
---
## Acceptance criteria for the whole phase
1. Build green; existing tests stay green; N.5b's conformance sentinel
still passes (visual mesh Z = TerrainSurface Z within 1mm).
2. **Far-tier LBs render terrain visibly past N₁** in user-driven visual
verification.
3. **Per-frame entity-dispatcher cpu_us at radius=N₁ drops** vs today
(the bucketing should help even at the current radius).
4. **Per-frame entity-dispatcher cpu_us at radius (N₁+N₂) is bounded**
— does NOT scale linearly with total loaded LBs. Specifically:
bucketed cull should be < 1.5× today's cost despite far-tier LBs
loading.
5. **No streaming hitch > 5ms** when running at run-speed across N₁/N₂
tier boundaries simultaneously (capture a 60s trace).
6. **`[TERRAIN-DIAG]` cpu_us stays flat** as N₂ grows; the terrain
dispatcher is proven O(1) (regression check).
7. Visual identity at near-tier (no scenery missing inside N₁; no
z-fighting; no cell-boundary wobble; N.5b sentinel still applies).
8. SHIP record + perf baseline + memory entry written, mirroring N.5b's
pattern.
---
## What you'll be doing in the first 30 minutes
1. Read this handoff in full.
2. Read `docs/research/2026-05-09-phase-n5b-handoff.md` for the structural
pattern.
3. Read `docs/plans/2026-05-09-phase-n5b-perf-baseline.md` for the captured
numbers A.5 inherits.
4. Read `memory/project_phase_n5b_state.md` for gotchas.
5. Verify build is green: `dotnet build`.
6. Verify N.5b ship is intact: `dotnet test --filter "FullyQualifiedName~TerrainSlot|FullyQualifiedName~TerrainModernConformance|FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless"` (target 114 passing, 0 failures).
7. Capture a baseline radius=5 frame trace yourself (one launch, 30s
standstill at Holtburg dueling field) so you have a "before" number
in your own measurement environment, not just trusting N.5b's number.
8. Invoke `superpowers:brainstorming` with the user. Walk through the
8 brainstorm questions above. Present each with options + my
recommendation; don't prejudge.
9. After agreement, write the spec; then the plan; then execute
task-by-task using `superpowers:subagent-driven-development`.
Don't skip the brainstorm. The N₁/N₂ values, the bucketing structure
trade-offs, and the worker-thread design are real decisions with
downstream consequences that need user input, not "the agent makes a
call and goes."
---
## Things to NOT do
- **Don't raise `ACDREAM_STREAM_RADIUS` without A.5's tiered loading
in place.** The entity-cull cliff is immediate and severe (8 FPS at
naive radius=15).
- **Don't put scenery in the far tier just to "look more retail" without
a billboard/impostor pipeline.** Full-detail scenery in the far tier
is what causes the cull cliff.
- **Don't move `LandblockMesh.Build` to a worker thread without first
auditing `TerrainBlendingContext` + `_surfaceCache` for thread
safety.** Concurrent writes to the surfaceCache will produce
silently-wrong terrain blending.
- **Don't break the N.5b conformance sentinel.** If A.5 changes how
meshes are built (e.g., for the worker thread), the conformance
test must still pass; it's the load-bearing physics ↔ visual Z
agreement guard.
- **Don't bundle GPU-side culling, persistent-mapped buffers, or shadow
mapping into A.5.** Those are N.6+ territory; A.5 is "make the world
look big and not stutter."
- **Don't ship without honest perf numbers.** If A.5 doesn't actually
hit its FPS target, document why and ship N.6 next instead of
papering over it. The N.5b precedent is honest reporting.
- **Don't skip the visual verification gate.** Same lesson from N.5b's
black-terrain regression: "go" doesn't mean "verified." User must
actually launch the client at radius=N₂ and confirm the horizon
looks great + FPS hits target.
---
## Reference: where the FPS budget actually goes today
For brainstorming purposes, the per-frame breakdown at radius=5 / Holtburg
(real measurement, 2026-05-09):
```
~5,000 µs total frame time (= 200 fps)
├── 4,300 µs WbDrawDispatcher entity cull + dispatch ← THE BOTTLENECK
│ ~16K entity AABB tests / frame
│ A.5's entity bucketing attacks this directly
├── 6 µs TerrainModernRenderer
│ O(1) in radius. Won't grow with A.5. Already solved.
├── ~700 µs Sky, particles, ImGui, audio, swap-buffers, misc
│ Mostly fixed cost; some VSync-related
└── rest GPU side (we don't measure this — query plumbing
deferred to N.6). Could be substantial.
```
The first action of A.5 is to recognize that the perf claim "810 fps"
from N.5 was misleading. Don't repeat the mistake: measure the actual
frame time, not just one subsystem.
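
The difference between the two ways of computing FPS is easy to see with the CPU-side numbers from the breakdown above. A small Python sketch (GPU time is not included because it is not measured yet; the dictionary keys are just labels):

```python
# CPU-side per-frame costs in microseconds, from the radius=5 Holtburg breakdown.
CPU_BREAKDOWN_US = {
    "WbDrawDispatcher": 4300.0,
    "TerrainModernRenderer": 6.0,
    "sky/particles/ImGui/audio/swap/misc": 700.0,
}


def fps(frame_time_us: float) -> float:
    return 1_000_000.0 / frame_time_us


total_us = sum(CPU_BREAKDOWN_US.values())
whole_frame_fps = fps(total_us)
dispatcher_only_fps = fps(CPU_BREAKDOWN_US["WbDrawDispatcher"])
dispatcher_share = CPU_BREAKDOWN_US["WbDrawDispatcher"] / total_us

print(f"whole frame:     {total_us:.0f} us -> {whole_frame_fps:.0f} fps")
print(f"dispatcher only: {dispatcher_only_fps:.0f} 'fps' (overstated)")
print(f"dispatcher share of measured CPU frame: {dispatcher_share:.0%}")
```

Inverting a single subsystem's cost always overstates FPS, and the gap widens as that subsystem gets faster, so perf claims for A.5 should be derived from whole-frame time.
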
---
Good luck. The phase is meaty (~2 weeks) but the structural work is
well-shaped: tiered streaming has clear boundaries, entity bucketing is
an isolated dispatcher change, off-thread mesh build is a well-understood
worker pattern. The hard call is the N₁/N₂ values, and that's a
brainstorm question; bring it to the user with data.