docs(perf): Phase N.6 slice 1 — spec for gpu_us fix + radius=12 baseline
Brainstormed design for the first slice of Phase N.6 (perf polish). Slice 1 ships two commits: (1) fix the GPU timing query double-buffering in WbDrawDispatcher (cross-vendor ring of 3, read-before-overwrite), (2) add an env-gated surface-format histogram dump + capture the radius=12 perf baseline at Holtburg. Slice 2 (TextureCache cleanup + shader migration + optional persistent-mapped buffers) is deferred until after C.1.5 (PES emitter wiring), with the next-phase decision to be made on the baseline numbers slice 1 produces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
175ad14f8b
commit
05d590cd54
1 changed files with 335 additions and 0 deletions
335
docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md
Normal file
335
docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md
Normal file
|
|
@ -0,0 +1,335 @@
|
|||
# Phase N.6 slice 1 — GPU timing fix + radius=12 perf baseline (design)
|
||||
|
||||
**Created:** 2026-05-11.
|
||||
**Status:** approved design, ready for implementation plan.
|
||||
**Phase context:** Phase N.6 (perf polish) split into two slices on 2026-05-11 — this is slice 1. Slice 2 (legacy `TextureCache` cleanup + shader migration + optional persistent-mapped buffers) is deferred until after C.1.5 (PES emitter wiring), and gets its own spec then.
|
||||
**Roadmap entry:** [docs/plans/2026-04-11-roadmap.md](../../plans/2026-04-11-roadmap.md) lines 690-705 (to be amended in commit 2 to reflect the slice split).
|
||||
|
||||
---
|
||||
|
||||
## §1. Problem
|
||||
|
||||
`WbDrawDispatcher` runs `glBeginQuery(GL_TIME_ELAPSED, …) … glEndQuery` around the opaque and transparent indirect draws, then immediately polls `glGetQueryObject(…, ResultAvailable, …)` **on the same frame** to read the result. The GPU has not finished executing the draw by the time the polling call runs, so `avail` is always 0, the sample is dropped, and the `_gpuSamples` ring stays all-zero forever. The user sees `gpu_us=0m/0p95` in every `[WB-DIAG]` line under `ACDREAM_WB_DIAG=1`.
|
||||
|
||||
Verified at [src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs:849-859](../../../src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs#L849).
|
||||
|
||||
Without this fix:
|
||||
- Every future perf decision (Tier 2 vs Tier 3 vs slice 2 vs do-nothing) is made on CPU-only data.
|
||||
- We cannot tell whether the dispatcher is CPU-bound or GPU-bound at radius=12.
|
||||
- We cannot validate that N.5/N.5b/Tier 1 changes actually moved GPU time.
|
||||
|
||||
This slice ships the GPU-timing fix and uses the now-working diagnostic to produce one authoritative perf baseline document so the next phase decision (slice 2 vs C.1.5 vs Tier 2/3) is data-driven.
|
||||
|
||||
---
|
||||
|
||||
## §2. Goals and non-goals
|
||||
|
||||
### Goals
|
||||
|
||||
1. `[WB-DIAG]` reports non-zero `gpu_us` for the entity dispatcher's opaque+transparent passes at Holtburg radius=12 with `ACDREAM_WB_DIAG=1`.
|
||||
2. The fix works on AMD, NVIDIA, and Intel desktop OpenGL drivers without vendor-specific code paths.
|
||||
3. Produce a baseline document at `docs/plans/2026-05-11-phase-n6-perf-baseline.md` with CPU and GPU numbers across radii 4 / 8 / 12 (standstill + walking), a surface-format histogram, and a memory snapshot.
|
||||
4. The baseline document closes with a recommendation paragraph: should the next phase be N.6 slice 2 (perf cleanup), C.1.5 (PES wiring), or escalation to Tier 2 (static/dynamic split). Rationale grounded in the captured numbers.
|
||||
5. `dotnet build` and `dotnet test` green; no functional regression in the rendering path.
|
||||
|
||||
### Non-goals
|
||||
|
||||
- Persistent-mapped buffers (`BufferSubData` → `GL_MAP_PERSISTENT_BIT`). Deferred to slice 2 unless the baseline shows it's a hot spot.
|
||||
- Legacy `TextureCache` cleanup, `mesh.frag` orphan deletion, sky/UI text shader migration to bindless. All deferred to slice 2.
|
||||
- WB atlas adoption / texture-array consolidation. Deferred to slice 2 pending the surface histogram from goal 3.
|
||||
- Adding GPU queries to terrain / sky / particle / debug-line passes. Slice 1 keeps query scope to the existing two queries inside `WbDrawDispatcher` (opaque-pass + transparent-pass).
|
||||
- GPU compute culling. That's Tier 3 of [docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md](../../plans/2026-05-10-perf-tiers-2-3-roadmap.md), separate roadmap.
|
||||
|
||||
---
|
||||
|
||||
## §3. Design decisions (from brainstorming, 2026-05-11)
|
||||
|
||||
| # | Decision | Rationale |
|
||||
|---|---|---|
|
||||
| Q1 | **Ring of 3 query-pair slots** (not ring of 2) | Vendor-neutral. NVIDIA drivers with triple-buffering + vsync can queue ~3 frames ahead; AMD typically 1–2; Intel iGPUs vary. Ring of 2 plus `ResultAvailable` guard works everywhere but drops more samples on deeper queues. Ring of 3 collects samples reliably across all desktop drivers. Cost: one extra `GLuint` query pair (~12 bytes of GPU state) plus one frame of latency on the printed value, which is invisible because the diagnostic is a 256-frame moving-window median. |
|
||||
| Q2 | **Read-before-issue, same-slot pattern** | On frame N, attempt to read slot `N%3` (which contains frame N-3's result — the *oldest* unread data, ~50 ms ago at 60 fps) *before* overwriting it with frame N's queries. Reading the oldest data maximizes the chance that `ResultAvailable=1` across all desktop drivers. Use `ResultAvailable` as a guard — if not ready, skip the sample. `MedianMicros` already computes over the non-zero subset, so dropped samples don't poison the result. |
|
||||
| Q3 | **Keep query scope unchanged** — just the two existing queries (opaque-pass + transparent-pass for the WB dispatcher) | Slice 1 is "fix what's broken," not "expand instrumentation." Adding terrain / sky / particle queries is slice-2-or-later work and would inflate this slice past the half-day budget. |
|
||||
| Q4 | **Surface-format histogram via env-gated one-shot dump** (`ACDREAM_DUMP_SURFACES=1`) | The atlas-adoption decision in slice 2 needs to know whether enough surfaces share dimensions/format to make consolidation worthwhile. A one-time dump on first frame to a fixed file path is cheap to implement, zero cost when off, and lets the user re-run cheaply when needed. Output goes to `%LOCALAPPDATA%\acdream\n6-surfaces.txt` (not stdout) to avoid spamming the launch log. |
|
||||
| Q5 | **Two commits, not one** | Commit 1 is the GPU-timing fix (code change, regression-bisectable). Commit 2 is the surface-dump path + baseline document (docs + env-gated diag). Keeping them separate means a future bisect for a GPU-timing regression doesn't land on a doc commit. |
|
||||
| Q6 | **Baseline measurement is Holtburg + High preset only** (per the user's hardware) | Slice 1 doesn't pretend to be a cross-hardware perf survey. It's one canonical measurement on the dev machine. The document template captures setup explicitly so a NVIDIA / lower-end run can be added later without re-architecting the doc. |
|
||||
|
||||
---
|
||||
|
||||
## §4. Change 1 — GPU query double-buffering
|
||||
|
||||
### Files touched
|
||||
|
||||
- `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs` — single-file change, ~30 LOC delta.
|
||||
|
||||
### Current state (verified)
|
||||
|
||||
```csharp
|
||||
// Field declarations near line 155:
|
||||
private uint _gpuQueryOpaque;
|
||||
private uint _gpuQueryTransparent;
|
||||
private readonly long[] _gpuSamples = new long[256];
|
||||
private bool _gpuQueriesInitialized;
|
||||
|
||||
// Init at line ~347:
|
||||
if (diag && !_gpuQueriesInitialized) {
|
||||
_gpuQueryOpaque = _gl.GenQuery();
|
||||
_gpuQueryTransparent = _gl.GenQuery();
|
||||
_gpuQueriesInitialized = true;
|
||||
}
|
||||
|
||||
// Around the opaque draw at line ~774:
|
||||
if (diag && _gpuQueriesInitialized) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque);
|
||||
… opaque indirect draw …
|
||||
if (diag && _gpuQueriesInitialized) _gl.EndQuery(QueryTarget.TimeElapsed);
|
||||
|
||||
// Same pattern around transparent draw at line ~823.
|
||||
|
||||
// Read at line ~849 — BUG: same frame, never ready:
|
||||
if (_gpuQueriesInitialized) {
|
||||
_gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.ResultAvailable, out int avail);
|
||||
if (avail != 0) {
|
||||
_gl.GetQueryObject(_gpuQueryOpaque, QueryObjectParameterName.Result, out ulong opaqueNs);
|
||||
_gl.GetQueryObject(_gpuQueryTransparent, QueryObjectParameterName.Result, out ulong transNs);
|
||||
long gpuUs = (long)((opaqueNs + transNs) / 1000UL);
|
||||
_gpuSamples[_gpuSampleCursor] = gpuUs;
|
||||
_gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length;
|
||||
}
|
||||
}
|
||||
|
||||
// Dispose at line ~1140:
|
||||
if (_gpuQueriesInitialized) {
|
||||
_gl.DeleteQuery(_gpuQueryOpaque);
|
||||
_gl.DeleteQuery(_gpuQueryTransparent);
|
||||
}
|
||||
```
|
||||
|
||||
### Target state
|
||||
|
||||
```csharp
|
||||
private const int GpuQueryRingDepth = 3;
|
||||
private readonly uint[] _gpuQueryOpaque = new uint[GpuQueryRingDepth];
|
||||
private readonly uint[] _gpuQueryTransparent = new uint[GpuQueryRingDepth];
|
||||
private int _gpuQueryFrameIndex; // increments every frame we issue queries
|
||||
private bool _gpuQueriesInitialized;
|
||||
|
||||
// Init:
|
||||
if (diag && !_gpuQueriesInitialized) {
|
||||
for (int i = 0; i < GpuQueryRingDepth; i++) {
|
||||
_gpuQueryOpaque[i] = _gl.GenQuery();
|
||||
_gpuQueryTransparent[i] = _gl.GenQuery();
|
||||
}
|
||||
_gpuQueriesInitialized = true;
|
||||
}
|
||||
|
||||
// Compute the slot index for this frame. We read this slot's previous
|
||||
// contents (frame N-3's queries — the oldest data in the ring) and then
|
||||
// overwrite it with this frame's queries.
|
||||
int slot = _gpuQueryFrameIndex % GpuQueryRingDepth;
|
||||
|
||||
// Read frame N-3's result BEFORE overwriting. Gated on "we've completed
|
||||
// at least one full ring of writes" so we don't read uninitialized slots
|
||||
// during warm-up.
|
||||
if (_gpuQueriesInitialized && _gpuQueryFrameIndex >= GpuQueryRingDepth) {
|
||||
_gl.GetQueryObject(_gpuQueryOpaque[slot], QueryObjectParameterName.ResultAvailable, out int avail);
|
||||
if (avail != 0) {
|
||||
_gl.GetQueryObject(_gpuQueryOpaque[slot], QueryObjectParameterName.Result, out ulong opaqueNs);
|
||||
_gl.GetQueryObject(_gpuQueryTransparent[slot], QueryObjectParameterName.Result, out ulong transNs);
|
||||
long gpuUs = (long)((opaqueNs + transNs) / 1000UL);
|
||||
_gpuSamples[_gpuSampleCursor] = gpuUs;
|
||||
_gpuSampleCursor = (_gpuSampleCursor + 1) % _gpuSamples.Length;
|
||||
}
|
||||
// If avail==0 the sample is dropped silently. MedianMicros already
|
||||
// computes over the non-zero subset, so dropped samples don't poison
|
||||
// the median.
|
||||
}
|
||||
|
||||
// Issue this frame's queries into the same slot — overwriting the data
|
||||
// we just (attempted to) read.
|
||||
if (diag && _gpuQueriesInitialized) _gl.BeginQuery(QueryTarget.TimeElapsed, _gpuQueryOpaque[slot]);
|
||||
… opaque indirect draw …
|
||||
if (diag && _gpuQueriesInitialized) _gl.EndQuery(QueryTarget.TimeElapsed);
|
||||
|
||||
… same for transparent with _gpuQueryTransparent[slot] …
|
||||
|
||||
_gpuQueryFrameIndex++;
|
||||
|
||||
// Dispose: loop over the ring.
|
||||
```
|
||||
|
||||
### Behavior
|
||||
|
||||
- Frames 0, 1, 2 issue queries but no reads happen (the `>= RingDepth` gate skips them).
|
||||
- Frame 3 reads frame 0's queries (oldest in ring) and writes new queries into slot 0. Frame 4 reads frame 1's, etc.
|
||||
- Steady-state: each frame's queries are read exactly once, three frames after they were issued. Frames 0/1/2's queries are intentionally lost (startup artifact, ~50 ms of measurement).
|
||||
- The diagnostic prints over a 256-frame moving window — at 200 fps that's ~1.3 s of history, so the first valid `gpu_us` median appears within ~2 s of moving.
|
||||
|
||||
### Diag interaction
|
||||
|
||||
`MaybeFlushDiag` already prints every 5 s; no change there.
|
||||
|
||||
`MedianMicros` already filters non-zero samples; no change there.
|
||||
|
||||
The user-visible behavior change: `gpu_us=Xm/Yp95` numbers in `[WB-DIAG]` reflect real GPU draw time for the entity dispatcher's two indirect calls.
|
||||
|
||||
---
|
||||
|
||||
## §5. Change 2 — Surface-format histogram one-shot dump
|
||||
|
||||
### Files touched
|
||||
|
||||
- `src/AcDream.App/Rendering/TextureCache.cs` — add an env-gated dump method, ~40 LOC.
|
||||
- One caller in `GameWindow.cs` (first-frame hook) — ~5 LOC.
|
||||
|
||||
### Trigger
|
||||
|
||||
Env var `ACDREAM_DUMP_SURFACES=1`. When set, on **frame index 600** of the session (~10 s at 60 fps, ~3 s at 200 fps — both well past streaming settle at radius≤12), iterate all entries in the bindless caches (`_bindlessBySurfaceId`, `_bindlessByOverridden`, `_bindlessByPalette`) and emit a histogram to `%LOCALAPPDATA%\acdream\n6-surfaces.txt`. One-shot — fires once per session at the exact frame, no repeats. The user can re-launch to capture a fresh snapshot.
|
||||
|
||||
### Output schema
|
||||
|
||||
Per entry, one line: `surfaceId(uint32 hex), width(uint16), height(uint16), format(string), byteCount(uint32)`.
|
||||
|
||||
Plus rollups at the end:
|
||||
- Count by `(width × height)` bucket — answers "how many distinct dimension pairs?".
|
||||
- Count by source `SurfaceFormat` (INDEX16, BGRA, DXT1, etc.).
|
||||
- Total bytes (sum of `width × height × 4` for RGBA8 uploads).
|
||||
- Top 10 most-shared `(width, height, format)` triples by count — this is the atlas-opportunity input.
|
||||
|
||||
### Cost when off
|
||||
|
||||
Zero — gated by the env-var check. The dump method is only called from a guarded `if` in `GameWindow.cs`.
|
||||
|
||||
---
|
||||
|
||||
## §6. Change 3 — Baseline document
|
||||
|
||||
### File
|
||||
|
||||
`docs/plans/2026-05-11-phase-n6-perf-baseline.md`.
|
||||
|
||||
### Setup section
|
||||
|
||||
- Hardware: Radeon RX 9070 XT (the user's machine).
|
||||
- Resolution: 1440p.
|
||||
- Quality preset: High (default).
|
||||
- Connection: live ACE at `127.0.0.1:9000`, character `+Acdream` at Holtburg.
|
||||
- Sky: clear midday, controlled via `F7` to remove weather noise.
|
||||
- Build: Debug (matches the user's normal launch).
|
||||
- Date measured: 2026-05-11.
|
||||
|
||||
### Measurements
|
||||
|
||||
Three radii: 4, 8, 12. Two motion modes per radius: standstill (camera anchored 30 s) and walking (`+Acdream` walks N→E→S→W across one landblock, 30 s).
|
||||
|
||||
Per radius/mode, capture from `[WB-DIAG]` and the window title:
|
||||
- CPU dispatcher: `cpu_us` median, p95.
|
||||
- GPU dispatcher: `gpu_us` median, p95 (now real).
|
||||
- FPS.
|
||||
- Entities seen / drawn.
|
||||
- Groups.
|
||||
- Frame time (window title).
|
||||
|
||||
### Memory snapshot
|
||||
|
||||
One-time output from the `ACDREAM_DUMP_SURFACES=1` run, summarized:
|
||||
- Total surfaces in cache.
|
||||
- Total GPU texture bytes.
|
||||
- Dimension distribution (top 10 by count).
|
||||
- Format distribution.
|
||||
- Atlas-opportunity score: percentage of surfaces in the top-3 dimension buckets.
|
||||
|
||||
### Conclusion section
|
||||
|
||||
A recommendation paragraph addressing:
|
||||
1. Is the entity dispatcher CPU-bound or GPU-bound at radius=12?
|
||||
2. Does `gpu_us` p95 leave headroom or is the GPU saturated?
|
||||
3. Does the atlas-opportunity score justify slice-2 atlas work?
|
||||
4. Given (1)–(3), what should the next phase be? Slice 2 (perf cleanup), C.1.5 (PES emitter wiring), or escalation to Tier 2 (static/dynamic split)?
|
||||
|
||||
The paragraph is opinionated — the next phase decision should be obvious from the numbers, not require a separate debate.
|
||||
|
||||
---
|
||||
|
||||
## §7. Test plan
|
||||
|
||||
### Automated tests (none new)
|
||||
|
||||
This slice is intentionally test-light:
|
||||
- The GPU-timing fix has no observable behavior in tests — it only changes a diagnostic readout. No new unit tests.
|
||||
- The surface-dump path is env-gated diag; no need to lock its output format in tests.
|
||||
- Existing 1688 tests must remain green. `WbDrawDispatcher` tests (bucketing, indirect-command construction, classification cache) must not be perturbed.
|
||||
|
||||
### Manual verification
|
||||
|
||||
1. Launch live with `ACDREAM_WB_DIAG=1`. Walk Holtburg for ~30 s. Confirm `[WB-DIAG]` prints `gpu_us=Xm/Yp95` with X > 0 within ~5 s.
|
||||
2. Launch live with `ACDREAM_DUMP_SURFACES=1 ACDREAM_WB_DIAG=1`. Wait ~10 s for streaming to settle. Open `%LOCALAPPDATA%\acdream\n6-surfaces.txt`. Confirm it contains a non-empty histogram.
|
||||
3. Run the baseline measurement procedure end-to-end. Confirm the document populates with real numbers, not placeholders.
|
||||
|
||||
---
|
||||
|
||||
## §8. Sequencing / ship gates
|
||||
|
||||
### Commit 1 — GPU query fix
|
||||
|
||||
**Message:** `feat(perf): Phase N.6 slice 1 — fix gpu_us double-buffering in WbDrawDispatcher`
|
||||
|
||||
**Scope:** `WbDrawDispatcher.cs` changes only. Build green, tests green, manual verification step 1 from §7 passes.
|
||||
|
||||
**Gate:** if `gpu_us` still reports 0 after ~10 s of movement, do NOT proceed to commit 2. Bump ring depth to 4 or investigate driver behavior before continuing.
|
||||
|
||||
### Commit 2 — Baseline doc + surface dump
|
||||
|
||||
**Message:** `docs(perf): Phase N.6 slice 1 — radius=12 baseline + surface dump path`
|
||||
|
||||
**Scope:** `TextureCache.cs` dump method, `GameWindow.cs` hook, `docs/plans/2026-05-11-phase-n6-perf-baseline.md`, and the roadmap amendment at `docs/plans/2026-04-11-roadmap.md` lines 690-705 (split N.6 into slice 1 / slice 2 in the bullet list).
|
||||
|
||||
**Gate:** manual verification steps 2 and 3 from §7 pass; baseline document's conclusion paragraph is filled in (not "TBD"); roadmap update lands in the same commit.
|
||||
|
||||
---
|
||||
|
||||
## §9. Acceptance criteria
|
||||
|
||||
1. `[WB-DIAG]` reports non-zero `gpu_us` for the entity dispatcher's opaque+transparent passes at Holtburg radius=12 with `ACDREAM_WB_DIAG=1`.
|
||||
2. The fix uses only core OpenGL 3.3+ features (`GL_TIME_ELAPSED`, `glGetQueryObject`, `GL_QUERY_RESULT_AVAILABLE`). No vendor-specific extensions.
|
||||
3. `docs/plans/2026-05-11-phase-n6-perf-baseline.md` exists, contains numbers (not placeholders) for the 3 radii × 2 motion modes, contains the surface histogram summary, and closes with a recommendation paragraph.
|
||||
4. The roadmap entry at `docs/plans/2026-04-11-roadmap.md:690-705` is amended to reflect the slice split.
|
||||
5. `dotnet build` succeeds with no new warnings.
|
||||
6. `dotnet test` succeeds with the existing pass/fail baseline (1688 passing, ~8 pre-existing physics/input failures unchanged).
|
||||
7. No visible regression in the rendering path — Holtburg outdoor, day/night cycle, entity rendering, transparent surfaces all look the same as before the change.
|
||||
|
||||
---
|
||||
|
||||
## §10. Risks
|
||||
|
||||
| Risk | Likelihood | Mitigation |
|
||||
|---|---|---|
|
||||
| `ResultAvailable` is 0 even for frame N-3 (driver queues 4+ frames ahead) | Low — would be unusual on desktop GL | Sample is dropped silently; diagnostic prints zeros; user reports it. Fix: bump `GpuQueryRingDepth` to 4. No regression in the render path itself. |
|
||||
| Query-pair allocation leaks across init/Dispose cycles | Low | Dispose loop deletes the full ring; existing pattern just gains an array index. |
|
||||
| Surface-dump path fires before streaming settles, gets a sparse picture | Medium | Document the procedure as "wait ~10 s after entering world before reading the file." The dump path itself can also be re-runnable if needed (deferred unless slice 1 hits this in practice). |
|
||||
| Conclusion paragraph in the baseline document is hard to write because the numbers don't clearly favor one direction | Medium — this is the slice's whole purpose | Acknowledge the ambiguity in the document and propose a "slice 1 conclusion plus a short re-brainstorm with the user" flow. The slice still ships if the numbers force a re-brainstorm; the value is in having the numbers, not in pre-deciding the answer. |
|
||||
| Hidden vendor-specific behavior in `GL_TIME_ELAPSED` produces non-comparable numbers across hardware | Low — `GL_TIME_ELAPSED` is nanosecond-accurate per spec | Document the measurement hardware explicitly in the baseline doc setup section so future runs on different GPUs can be tagged appropriately. |
|
||||
|
||||
---
|
||||
|
||||
## §11. Out of scope / future work
|
||||
|
||||
These are explicitly NOT in slice 1, listed here so the next phase has a clean shopping list:
|
||||
|
||||
- **Slice 2 — `TextureCache` cleanup.** Delete orphan `mesh.frag` (verify zero callers post-N.5 amendment). Delete dead entity-style legacy caches (`_handlesByOverridden`, `_handlesByPalette`) that no live renderer reads. Decide on bindless-everywhere vs legacy-island for the remaining `sampler2D` consumers (sky, UI text, particles).
|
||||
- **Slice 2 — Particle shader migration.** Tied to C.1.5 outcome; particles migrate after C.1.5 lands more visible content to regression-test against.
|
||||
- **Slice 2 — Persistent-mapped buffers.** Conditional on slice 1's baseline showing `BufferSubData` as a hot spot.
|
||||
- **Slice 2 — WB atlas adoption.** Conditional on slice 1's surface histogram showing a real opportunity.
|
||||
- **C.1.5 — PES emitter wiring.** Portals, chimneys, fireplaces. Separate phase; gets its own brainstorm/spec.
|
||||
- **Tier 2 — static/dynamic split with persistent groups.** Separate roadmap at [docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md](../../plans/2026-05-10-perf-tiers-2-3-roadmap.md).
|
||||
- **Tier 3 — GPU compute culling.** Depends on Tier 2 first. Same roadmap.
|
||||
- **Cross-vendor perf comparison.** Slice 1 is one machine. A NVIDIA companion run is a backlog item, not in scope.
|
||||
|
||||
---
|
||||
|
||||
## §12. References
|
||||
|
||||
- Existing dispatcher code: [src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs](../../../src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs).
|
||||
- Existing texture cache: [src/AcDream.App/Rendering/TextureCache.cs](../../../src/AcDream.App/Rendering/TextureCache.cs).
|
||||
- Prior perf baseline (style template): [docs/plans/2026-05-09-phase-n5b-perf-baseline.md](../../plans/2026-05-09-phase-n5b-perf-baseline.md).
|
||||
- Roadmap N.6 entry: [docs/plans/2026-04-11-roadmap.md:690-705](../../plans/2026-04-11-roadmap.md).
|
||||
- Perf tiers 2/3 alternative path: [docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md](../../plans/2026-05-10-perf-tiers-2-3-roadmap.md).
|
||||
- Phase C.1 plan with C.1.5 scope: [docs/plans/2026-04-27-phase-c1-pes-particles.md:285-295](../../plans/2026-04-27-phase-c1-pes-particles.md).
|
||||
Loading…
Add table
Add a link
Reference in a new issue