After the post-A.5 lifestone (#52) + JobKind plumbing (#54) work shipped, only Priority 3 (Tier 1 entity-classification cache retry, ISSUE #53) remains. This handoff captures the audit insights gathered during the #52 investigation that the original post-A.5 handoff didn't have:
- MeshRef is a `readonly record struct`; its fields can NOT be mutated in place. The actual per-frame mutation for animated entities is the entire MeshRefs LIST replacement at GameWindow.cs:7474-7553. This reframes the cache design.
- _animatedEntities dict at GameWindow.cs:160 is the source of truth for which entities go through the per-frame rebuild path.
- Static entity = entity.Id NOT in _animatedEntities. Its MeshRefs is the same instance from spawn until rare events (ObjDesc / palette swap / part hide / scale apply).
- Recommended cache approach: static-only with explicit invalidation hooks on the network/spawn-time write sites enumerated in the doc.
Doc covers: where main is, what shipped this session, why the first Tier 1 attempt failed, the pre-started audit, cache design options, acceptance criteria, files to read, workflow for the next session, and things-to-NOT-do.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase Post-A.5 — Tier 1 Retry (ISSUE #53) — Cold-Start Handoff
Created: 2026-05-10, immediately after closing ISSUES #52 (lifestone) + #54 (JobKind plumbing) and merging to main.
Audience: the next agent picking up Priority 3 of the Post-A.5 polish phase.
Purpose: drop straight into the Tier 1 entity-classification cache retry without re-litigating what the prior session settled.
TL;DR
Post-A.5 polish was sized at three priorities. 2 of 3 shipped to main during the 2026-05-10 session; only Priority 3 (Tier 1 retry, ISSUE #53) remains. Tier 1 is the biggest perf headroom in the post-A.5 phase: it should drop the entity dispatcher cpu_us median from ~3.5 ms to ~1-1.5 ms, putting the dispatcher inside the spec's 2.0 ms budget and unlocking ~300-400 FPS at standstill.
The first Tier 1 attempt (commit 3639a6f, reverted at 9b49009) broke animation. The next attempt MUST start with an animation-mutation audit. This handoff has the audit pre-started — there's specific evidence captured below that the previous handoff didn't have.
Sized: ~5-7 days including audit + design + spec + implementation + visual gate.
Where main is
- `main` HEAD: `da08490`, merge of `claude/cranky-varahamihira-fe423f`. Includes the lifestone fix + JobKind plumbing.
- CLAUDE.md "Currently in flight" updated to "Post-A.5 polish — Tier 1 retry (only remaining priority)".
- `docs/ISSUES.md` has both #52 and #54 in Recently closed with full root-cause writeups; only #53 remains in Active issues.
- N.5b conformance sentinel: 94/94. Full suite: 1688/1696 passing (8 pre-existing physics/input failures unchanged across all session work).
Recent commit chain on main (newest first):
| SHA | Subject |
|---|---|
| da08490 | Merge branch 'claude/cranky-varahamihira-fe423f' — Post-A.5 polish: close #52 (lifestone) + #54 (JobKind plumbing) |
| 9a55354 | docs(post-A.5 #54): close JobKind plumbing issue + update CLAUDE.md flight status |
| bf31e59 | fix(streaming): close #54 — plumb JobKind through BuildLandblockForStreaming |
| b19f1d1 | docs(post-A.5 #52): close lifestone issue + update CLAUDE.md flight status |
| e40159f | fix(render): close #52 — lifestone visible (alpha-test + cull + uDrawIDOffset) |
| c111312 | docs(post-A.5): cold-start handoff for the next session (the prior handoff this work used) |
What shipped this session
Priority 1 — ISSUE #52 (lifestone missing) — closed by e40159f
Three independent root causes regressed with the WB rendering migration (Phase N.5 retirement amendment, commit dcae2b6, 2026-05-08):
- Alpha-test discard in the `mesh_modern.frag` transparent pass killed high-α pixels of dat-flagged transparent surfaces. The lifestone crystal core (surface `0x080011DE`) decoded with α≥0.95, so 100% of fragments were discarded. Fix: remove the `α >= 0.95` discard from the transparent pass; keep the `α < 0.05` discard as a fragment-cost optimization.
- Cull state regression: `WbDrawDispatcher.Draw` Phase 8 had no GL cull state. Phase 9.2's `Enable(CullFace) + Back + CCW` setup (commit `6f1971a`, 2026-04-11) was lost when the legacy `StaticMeshRenderer` was deleted. Closed-shell translucents composited back-faces over front-faces in iteration order under `DepthMask(false)`. Fix: re-establish Phase 9.2's GL state at the top of Phase 8.
- `uDrawIDOffset` indexing bug: `gl_DrawIDARB` resets to 0 at the start of each `glMultiDrawElementsIndirect`, so the transparent pass was reading `Batches[0..transparentCount)` (the OPAQUE section) instead of `Batches[opaqueCount..end)`. The lifestone flickered to whatever opaque batch sorted to index 0 each frame. Fix: add `uniform int uDrawIDOffset` to `mesh_modern.vert`, set per-pass in the dispatcher (0 for opaque, `_opaqueDrawCount` for transparent). Mirrors WB's `BaseObjectRenderManager.cs:845`.
User-confirmed visually via +Acdream test character at the Holtburg outdoor lifestone (Z=94 platform).
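The indexing arithmetic behind the third root cause can be sketched on the CPU side. This is a hedged illustration in C#, not the shader or dispatcher code: `DrawIdOffsetDemo`, `BatchIndex`, and `ResolvePass` are hypothetical names, and the string array stands in for the `Batches` buffer.

```csharp
using System;

// Hypothetical CPU-side model of the batch lookup; the real lookup lives in mesh_modern.vert.
public static class DrawIdOffsetDemo
{
    // The draw ID restarts at 0 for every multi-draw call, so the per-pass
    // offset has to be added before indexing into the shared batch buffer.
    public static int BatchIndex(int drawId, int drawIdOffset) => drawIdOffset + drawId;

    // Returns the batches a pass would read for draw IDs 0..drawCount-1.
    public static string[] ResolvePass(string[] batches, int drawCount, int drawIdOffset)
    {
        var seen = new string[drawCount];
        for (int drawId = 0; drawId < drawCount; drawId++)
            seen[drawId] = batches[BatchIndex(drawId, drawIdOffset)];
        return seen;
    }
}
```

With batches `{ "opaque0", "opaque1", "crystal" }` and two opaque draws, a transparent pass of one draw with offset 0 reads `opaque0` (the bug), while offset 2 reads `crystal` (the fix).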
Priority 2 — ISSUE #54 (JobKind plumbing) — closed by bf31e59
LandblockStreamer.cs primary ctor signature changed from Func<uint, LoadedLandblock?> to Func<uint, LandblockStreamJobKind, LoadedLandblock?>. A back-compat overload preserves the old signature for the 5 ctor sites in LandblockStreamerTests.cs (no test changes needed). BuildLandblockForStreaming(uint, JobKind) in GameWindow.cs early-outs for LoadFar with a heightmap-only path. The Bug A post-load entity strip in LandblockStreamer.HandleJob is retained as a Debug.Assert + Release safety net.
Per-LB worker cost on far-tier dropped from ~tens of ms (full hydration including LandBlockInfo + SceneryGenerator + interior cells) to ~sub-ms (single LandBlock dat read).
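The back-compat overload pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the `LandblockStreamer` source: `StreamerSketch`, `StreamJobKind`, `Landblock`, and `LoadNear` are hypothetical stand-ins (only `LoadFar` is named in this handoff).

```csharp
#nullable enable
using System;

public enum StreamJobKind { LoadNear, LoadFar }               // stand-in for LandblockStreamJobKind
public sealed record Landblock(uint Id, bool HeightmapOnly);  // stand-in for LoadedLandblock

public sealed class StreamerSketch
{
    private readonly Func<uint, StreamJobKind, Landblock?> _build;

    // Primary ctor: the builder now receives the job kind.
    public StreamerSketch(Func<uint, StreamJobKind, Landblock?> build) =>
        _build = build ?? throw new ArgumentNullException(nameof(build));

    // Back-compat overload: existing callers keep passing a kind-agnostic builder.
    public StreamerSketch(Func<uint, Landblock?> legacyBuild)
        : this(Wrap(legacyBuild)) { }

    private static Func<uint, StreamJobKind, Landblock?> Wrap(Func<uint, Landblock?> legacy)
    {
        if (legacy is null) throw new ArgumentNullException(nameof(legacy));
        return (id, _) => legacy(id); // the job kind is ignored on the legacy path
    }

    public Landblock? Run(uint id, StreamJobKind kind) => _build(id, kind);
}
```

A one-parameter lambda binds to the legacy overload and a two-parameter lambda binds to the primary ctor, which is why existing test call sites would compile unchanged under this shape.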
Memory entry from this session
feedback_wb_migration_state_audit.md — captures the meta-lesson that WB-migration phases need a systematic GL-state and shader-uniform diff vs the legacy renderer being replaced. Future phases at risk: Sky/Particles modern path migration, EnvCell modern path, Shadow mapping. Also captures the workflow lesson: when the user says "we had this nailed down before", the first move is git log -- <legacy file> BEFORE adding new diagnostic instrumentation.
Priority 3 — ISSUE #53 — Tier 1 entity-classification cache retry
What the first attempt was and why it failed
Commit 3639a6f (reverted at 9b49009) cached meshRef.PartTransform baked into per-(entity, batch) classification at first-frame visit. For static entities this is stable; for animated entities the cache froze the pose and NPCs/players stopped animating. Some buildings also showed at wrong positions (likely entities incorrectly flagged as animated).
The "trust MeshRefs as the source of truth" comment in the dispatcher gave false confidence. MeshRefs IS the source of truth, but it's mutated EVERY frame for animated entities.
The audit (PRE-STARTED in the prior session — read this carefully)
The previous handoff and ISSUE #53 describe the bug as "AnimationSequencer mutates meshRef.PartTransform every frame to apply the current skeletal pose." That framing is technically wrong in a way that matters for the retry design. Discovered during the post-A.5 lifestone session:
- `MeshRef` at `src/AcDream.Core/World/MeshRef.cs:15` is a `readonly record struct`; its fields cannot be mutated in place: `public readonly record struct MeshRef(uint GfxObjId, Matrix4x4 PartTransform)`
- The actual per-frame mutation for animated entities is the entire `MeshRefs` LIST replacement at `src/AcDream.App/Rendering/GameWindow.cs:7474-7553`: `var newMeshRefs = new List<AcDream.Core.World.MeshRef>(partCount);`, then a loop building per-part transforms from `sequencer.Advance(dt)`, then `ae.Entity.MeshRefs = newMeshRefs;`
- The source of truth for which entities go through that per-frame path is the `_animatedEntities` dictionary at `GameWindow.cs:160`: `private readonly Dictionary<uint, AnimatedEntity> _animatedEntities = new();` Population: `_animatedEntities[entity.Id] = new AnimatedEntity{...}` at GameWindow.cs:2724 (spawn). Removal: `_animatedEntities.Remove(...)` at GameWindow.cs:2935 (despawn).
Therefore: a static entity is one whose Id is NOT in _animatedEntities. Its MeshRefs list is the same instance from spawn until rare events (ObjDesc / palette swap / part hide). Other static-entity write sites that must be invalidation-aware:
- `src/AcDream.App/Rendering/GameWindow.cs:2333` and `:2365`: ObjDescEvent / AnimPartChange events rebuild a `MeshRef` element. Network-driven, infrequent.
- `src/AcDream.App/Rendering/GameWindow.cs:2524`: entity scale apply at spawn (one-shot).
- Lines 4682-4924, 4996-5074: dat-side hydration paths in OnLoad / scenery / interior. Spawn-time only.
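The audit findings above hinge on the difference between mutating a record in place and swapping the whole list. A minimal sketch of that distinction, assuming a settable `List<MeshRef>` property: `EntitySketch` and `MeshRefDemo` are hypothetical, and only the `MeshRef` declaration is taken from this handoff.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public readonly record struct MeshRef(uint GfxObjId, Matrix4x4 PartTransform);

public sealed class EntitySketch   // hypothetical stand-in for the real entity type
{
    public List<MeshRef> MeshRefs { get; set; } = new();
}

public static class MeshRefDemo
{
    // Same shape as the per-frame rebuild: build a fresh list, then swap it in.
    public static void RebuildPose(EntitySketch entity, Matrix4x4 pose)
    {
        var newMeshRefs = new List<MeshRef>(entity.MeshRefs.Count);
        foreach (var part in entity.MeshRefs)
            newMeshRefs.Add(part with { PartTransform = pose }); // `with` copies; the old record is untouched
        entity.MeshRefs = newMeshRefs;
    }
}
```

After `RebuildPose`, anything that held the old list still sees the old transforms, which is consistent with a cache built from a first-frame snapshot freezing the pose.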
What this means for cache design
The cleanest design is now clearer than the original handoff suggested:
Recommended approach (option a from the original handoff): static-only cache with explicit invalidation hooks.
- Cache the (entity, batch) → InstanceGroup-key + model-matrix mapping for entities where `_animatedEntities.ContainsKey(entity.Id) == false`.
- Animated entities skip the cache entirely; they go through today's per-frame `ClassifyBatches` path.
- Invalidate the cache for an entity on:
  - ObjDesc / AnimPartChange events (`GameWindow.cs:2333, 2365`): rebuild that entity's cache entry.
  - Palette override changes (rare; usually only on initial server spawn or a re-equip event).
  - Entity despawn: drop the cache entry.
- Static entities never animate. The dispatcher's per-frame work for cached entities reduces from "walk + classify all batches" to "walk + lookup-and-emit-pre-classified".
Why this is safer than the first attempt: the first attempt cached the POSE (model matrix) for every entity, animated ones included. This attempt caches only the group key, texture handle, blend mode, and per-part `meshRef.PartTransform * entityWorld`, and only for the spawn-time-stable (static) subset. Animation never enters the cache surface.
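The recommended approach can be sketched as a small generic cache. This is a hedged sketch, not a proposed final API: `StaticClassificationCache`, `GetOrClassify`, and `Invalidate` are hypothetical names, and `TEntry` stands in for whatever pre-classified batch data the spec settles on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a static-only cache with explicit invalidation.
public sealed class StaticClassificationCache<TEntry>
{
    private readonly Dictionary<uint, TEntry> _entries = new();
    private readonly Func<uint, bool> _isAnimated;

    // isAnimated is assumed to be backed by something like _animatedEntities.ContainsKey.
    public StaticClassificationCache(Func<uint, bool> isAnimated) =>
        _isAnimated = isAnimated ?? throw new ArgumentNullException(nameof(isAnimated));

    public int Count => _entries.Count;

    public TEntry GetOrClassify(uint entityId, Func<TEntry> classify)
    {
        if (_isAnimated(entityId))
        {
            _entries.Remove(entityId); // drop a stale entry if the entity became animated
            return classify();         // animated entities are classified every call, never stored
        }
        if (!_entries.TryGetValue(entityId, out var entry))
        {
            entry = classify();        // first visit for a static entity
            _entries[entityId] = entry;
        }
        return entry;
    }

    // To be called from the ObjDesc / palette / despawn write sites.
    public bool Invalidate(uint entityId) => _entries.Remove(entityId);
}
```

Passing the animated check in as a delegate keeps the cache testable without a `GameWindow`; whether the real dispatcher should own the lookup directly is a question for the spec.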
Cache design options reconsidered
(a) Static-only cache (recommended). As described above. Clean invariant: animated entities skip the cache; static entities go through it. Requires careful enumeration of all writes to entity.MeshRefs for static entities (see audit list above) so each one fires invalidation.
(b) Dynamic-aware cache with invalidation hooks. Cache everything but expose InvalidateEntity(uint) / RefreshEntityPalette(uint) hooks; wire from network handlers. More complex but might let some animated entities also benefit if their per-frame mutations are localized. NOT RECOMMENDED for a first retry — error-prone and the first attempt already failed at this scope.
(c) Static-only + animated-bypass + DEBUG cross-check. Like (a), but in DEBUG builds, log a warning every frame if a cached entity's MeshRefs reference no longer matches the cached snapshot (catches mis-classified dynamics). Belt-and-suspenders. Recommended IF you're nervous about the audit being incomplete.
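The cross-check in option (c) could look roughly like the sketch below. `MeshRefsSnapshot` is a hypothetical helper; it compares both the list reference and the element values, on the assumption that some write sites may replace a single element in place rather than swap the list.

```csharp
using System.Collections.Generic;

// Hypothetical DEBUG-only helper: remembers what a cached entity's list looked like.
public sealed class MeshRefsSnapshot<T>
{
    private readonly IReadOnlyList<T> _list;
    private readonly T[] _elements;

    public MeshRefsSnapshot(IReadOnlyList<T> list)
    {
        _list = list;
        _elements = new T[list.Count];
        for (int i = 0; i < list.Count; i++) _elements[i] = list[i];
    }

    // False when the list instance was swapped or any element changed since capture.
    public bool Matches(IReadOnlyList<T> current)
    {
        if (!ReferenceEquals(_list, current)) return false;
        if (current.Count != _elements.Length) return false;
        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < _elements.Length; i++)
            if (!comparer.Equals(current[i], _elements[i])) return false;
        return true;
    }
}
```

A mismatch on a cached entity would indicate either a missed invalidation hook or an entity that should have been in `_animatedEntities`. The element walk costs time per entity per frame, which fits the handoff's DEBUG-only framing.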
Acceptance criteria (from the original handoff, refined)
- Build green; existing 999+ tests pass (1688/1696 on main as of this session); 8 pre-existing physics/input failures stay at 8.
- 1-3 new tests covering: cache hit for static entity (lookup), cache bypass for animated entity (no-op), cache invalidation on entity despawn, cache invalidation on ObjDesc/palette event.
- N.5b conformance sentinel intact (89+ tests; in this session it's 94/94 — must stay clean).
- Visual gate: launch + walk Holtburg → North Yanshi at horizon-safe preset; confirm:
  - Animation works (NPCs, player character animate normally, including the lifestone crystal closed by #52).
  - Buildings at correct positions.
  - No new visual regressions.
- Perf gate (with `[WB-DIAG]` under `ACDREAM_WB_DIAG=1`):
  - Entity dispatcher cpu_us median drops from ~3.5 ms to ≤2.0 ms (matches spec budget).
  - p95 stays ≤2.5 ms.
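The perf gate reduces to two percentile checks over captured samples. A minimal sketch, assuming the `cpu_us` values have already been parsed out of the `[WB-DIAG]` output into an array of microseconds (the log format is not specified here); `FrameStats`, `Percentile`, and `PassesPerfGate` are hypothetical names.

```csharp
using System;

public static class FrameStats
{
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    // For an even sample count, p = 50 returns the lower of the two middle samples.
    public static double Percentile(double[] samples, double p)
    {
        if (samples is null || samples.Length == 0)
            throw new ArgumentException("at least one sample is required", nameof(samples));
        if (p <= 0 || p > 100)
            throw new ArgumentOutOfRangeException(nameof(p), "p must be in (0, 100]");

        var sorted = (double[])samples.Clone(); // leave the caller's array untouched
        Array.Sort(sorted);
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[rank - 1];
    }

    // Thresholds from the acceptance criteria: median <= 2.0 ms, p95 <= 2.5 ms, in microseconds.
    public static bool PassesPerfGate(double[] cpuUs) =>
        Percentile(cpuUs, 50) <= 2000.0 && Percentile(cpuUs, 95) <= 2500.0;
}
```

Nearest-rank is one of several percentile conventions; whichever the spec picks, the baseline and the post-change capture should use the same one.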
Files to read before brainstorming
In rough order:
1. This handoff end-to-end: captures audit insights from the prior session that the original handoff didn't have.
2. `docs/research/2026-05-10-post-a5-polish-handoff.md`: the prior handoff. §"Priority 3" has the original (slightly outdated) framing of the bug. Read for context but trust THIS handoff's audit insights over its.
3. `docs/ISSUES.md` issue #53: the issue's own description (now updated post-#52/#54 close).
4. `docs/superpowers/specs/2026-05-09-phase-a5-two-tier-streaming-design.md`: A.5 spec for the entity dispatcher's data-flow context (esp. §4.10 Quality Preset and §11 deferred items).
5. `docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md`: the perf-tier roadmap. Tier 1 is in scope; Tier 2 + Tier 3 are explicitly NOT (those are dedicated multi-week phases).
6. `memory/feedback_wb_migration_state_audit.md`: the new memory entry on WB migration state-loss patterns. Tier 1 doesn't touch the WB migration directly, but the meta-lesson "audit before assume" is exactly what this priority needs.
7. `memory/project_phase_a5_state.md`: the 5 gotchas. Critical for avoiding the same traps, especially #3 (caching mutable per-frame state breaks animation silently), the exact bug the first Tier 1 attempt hit.
8. `src/AcDream.Core/World/MeshRef.cs`: confirm the `readonly record struct` shape; understand that "mutating PartTransform" actually means "replacing the whole MeshRef record."
9. `src/AcDream.App/Rendering/GameWindow.cs:7340-7560`: the per-frame animation rebuild loop. Read this end-to-end for the audit. Find every line that writes to `entity.MeshRefs` for animated entities.
10. `src/AcDream.App/Rendering/GameWindow.cs:160` + lines 2710-2760, 2920-2940: `_animatedEntities` declaration + spawn/despawn population.
11. `src/AcDream.App/Rendering/Wb/WbDrawDispatcher.cs`: `Draw` and `ClassifyBatches`. Where the cache will land.
12. `src/AcDream.Core/Physics/AnimationSequencer.cs`: the per-frame animation engine. Audit any field it mutates that the dispatcher reads.
13. `src/AcDream.Core/Physics/AnimationHookRouter.cs`: secondary mutation source via animation hooks.
Workflow for the next session
1. Read this handoff in full.
2. Verify build green: `dotnet build`. Verify ~1688 tests pass: `dotnet test --no-build`. Verify N.5b sentinel: filter `TerrainSlot|TerrainModernConformance|Wb|MatrixComposition|TextureCacheBindless|SplitFormulaDivergence` → expect 94 passing.
3. Read the files above in order. Especially deep on §"Files to read" #8-#13.
4. Audit step (1-2 days): open a fresh research note `docs/research/2026-05-10-tier1-mutation-audit.md` and write down:
   - Every code path that writes `entity.MeshRefs = ...` for any entity.
   - Tag each as STATIC (one-shot at spawn or rare event) or DYNAMIC (per-frame).
   - For each STATIC write, identify the trigger (network event, scale apply, etc.) and design the invalidation hook.
   - For each DYNAMIC write, confirm it fires only for entities in `_animatedEntities` (which means cache bypass is the right answer).
5. Spec (~1 day): brainstorm the cache design with the user (use `superpowers:brainstorming`). Write `docs/superpowers/specs/2026-05-10-issue-53-tier1-cache-design.md`. Include the audit findings, the chosen cache approach (probably option (a)), the invariants, the invalidation API, the test plan, the perf-gate measurement plan.
6. Implement (~2-3 days): TDD via `superpowers:test-driven-development`. Tests first for cache hit/miss/invalidation, then implementation in `WbDrawDispatcher`. Wire invalidation hooks into the relevant write sites in `GameWindow.cs`.
7. Visual gate: launch + walk; confirm animation works on a moving NPC; confirm static buildings/scenery still render at correct positions; confirm lifestone (closed by #52) still renders.
8. Perf gate: capture `[WB-DIAG]` cpu_us median + p95 with `ACDREAM_WB_DIAG=1` at horizon-safe preset (NEAR=4, FAR=12). Compare to today's ~3.5 ms baseline; expect ≤2.0 ms.
9. Ship: commit, close #53 in ISSUES.md, update CLAUDE.md "Currently in flight" (this would close out the post-A.5 polish phase entirely), update memory with any new gotchas captured during the audit/implementation.
10. Next phase after #53 ships: N.6 (perf polish) per the roadmap. Or escalate to Tier 2 (static/dynamic split with persistent groups) per `docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md` if Tier 1 alone doesn't hit the perf target.
Things to NOT do
- Don't skip the audit. The whole reason the first attempt failed was that the audit was implicit and incomplete. The audit step should produce a written list of every MeshRefs write site, classified static vs dynamic, before any cache code is written.
- Don't bundle Tier 2 or Tier 3 into this phase. Those are dedicated multi-week phases per `docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md`. If the audit reveals Tier 1 alone can't hit the perf target, file a follow-up issue and escalate as a separate phase.
- Don't re-add the `Tier1` cache that was reverted. Start fresh after the audit. Cherry-picking commit `3639a6f` reintroduces the animation freeze.
- Don't break the N.5b conformance sentinel. Run the filter on every commit: `dotnet test --no-build --filter "FullyQualifiedName~TerrainSlot|FullyQualifiedName~TerrainModernConformance|FullyQualifiedName~Wb|FullyQualifiedName~MatrixComposition|FullyQualifiedName~TextureCacheBindless|FullyQualifiedName~SplitFormulaDivergence"`. Expect 94 passing, 0 failures.
- Don't skip the visual gate. Animation has been the highest-risk regression in this codebase repeatedly (Tier 1 first attempt, the lifestone crystal in this session, the foundry statue earlier). Confirm visually with a moving animated NPC, a stationary building, and the lifestone before declaring done.
- Don't trust "it was working in prod before." That was the first Tier 1 attempt's posture. The audit is what makes it actually safe.
Reference: Tier 1 perf math
Per the perf-tier roadmap and A.5 final state:
- Today (post-A.5 ship + #52/#54): entity dispatcher cpu_us median ~3.5 ms at radius=12 on Radeon RX 9070 XT @ 1440p. ~200-240 FPS at standstill.
- After Tier 1: ~1.0-1.5 ms median expected. ~300-400 FPS at standstill. Inside the spec's 2.0 ms budget.
- After Tier 2 (separate phase): ~0.5-1.0 ms. ~400-600 FPS.
- After Tier 3 (GPU compute culling, separate phase): ~0.05 ms. ~600-1000+ FPS.
Tier 1 is the lowest-risk, highest-leverage perf win remaining for the post-A.5 polish phase.
Good luck. The audit is the load-bearing thing — invest in it. The implementation is mechanical once the audit is solid.
Holler at the user if the audit reveals any write site that doesn't fit the static/dynamic dichotomy cleanly.