acdream/docs/research/2026-05-10-tier1-mutation-audit.md
Erik 4abb838729 docs(post-A.5 #53): Tier 1 retry — mutation audit + cache design spec
The audit at docs/research/2026-05-10-tier1-mutation-audit.md enumerates
every entity.MeshRefs write site (5 STATIC at hydration, 1 DYNAMIC at
GameWindow.cs:7580 inside TickAnimations) and verifies that all 7
Position/Rotation write sites only touch entities in _animatedEntities.
Establishes the load-bearing invariant: an entity's renderer state is
stable from spawn to despawn iff entity.Id is NOT in _animatedEntities.

The spec at docs/superpowers/specs/2026-05-10-issue-53-tier1-cache-design.md
locks in the design from brainstorming on 2026-05-10:
- Static-only cache + DEBUG cross-check (option c) — catches future
  regressions of the prior bug class without paying perf cost in Release
- Separate EntityClassificationCache class injected into WbDrawDispatcher
- Cache the rest pose, not the full model matrix (Position/Rotation read
  live each frame so Release stays correct even if the invariant breaks)
- Pre-flatten Setup multi-parts at populate time (the bulk of the win)
- 15 new tests covering all invalidation paths + DEBUG cross-check +
  Setup pre-flatten + lifecycle pin

Closes the audit + design steps of the post-A.5 polish Priority 3 work.
Implementation plan owned by superpowers:writing-plans next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:50:26 +02:00

18 KiB
Raw Blame History

Tier 1 entity-classification cache — mutation audit

Created: 2026-05-10, opening move of the ISSUE #53 retry session. Purpose: enumerate every code path that writes to WorldEntity.MeshRefs (the dispatcher's load-bearing per-entity input) and every adjacent state read by WbDrawDispatcher.ClassifyBatches / model-matrix composition, classify each as STATIC or DYNAMIC, and design the cache invalidation surface BEFORE touching renderer code.

This audit is the load-bearing prerequisite the prior Tier 1 attempt (commit 3639a6f, reverted at 9b49009) skipped. Cache design follows from the audit, not the other way around.


TL;DR — the invariant

An entity's MeshRefs reference, Position, Rotation, PaletteOverride, HiddenPartsMask, ParentCellId, and Scale are stable from spawn to despawn IF AND ONLY IF the entity is NOT in GameWindow._animatedEntities.

That is the invariant the cache rides on. Animated entities (player + remote NPCs/players + animated dat scenery like the lifestone crystal) get a fresh MeshRefs list every frame from TickAnimations plus per-frame Position/Rotation writes from physics/dead-reckoning. Everything else — stabs, scenery, cell-mesh entities, interior static objects — touches none of those fields after construction.

The cache should hold per-entity classification ONLY for entities whose Id is not in _animatedEntities at lookup time. Animated entities go through today's per-frame classification path unchanged.


§1. entity.MeshRefs = ... write sites (the core question)

WorldEntity.MeshRefs is IReadOnlyList<MeshRef> with a set accessor (see src/AcDream.Core/World/WorldEntity.cs:28). MeshRef itself is a readonly record struct (src/AcDream.Core/World/MeshRef.cs:15) — its fields cannot be mutated in place. So every "MeshRefs change" is a whole-list replacement, not a per-element edit.

Six write sites total in src/. Five STATIC, one DYNAMIC.

Site 1 — OnLiveEntitySpawnedLocked (server-spawned entity hydration)

src/AcDream.App/Rendering/GameWindow.cs:2578MeshRefs = meshRefs in the WorldEntity { … } constructor.

Classification: STATIC at first spawn.

Trigger: server's 0xF745 CreateObject for any entity (NPC, monster, player, item, statue, lifestone). Also re-runs from OnLiveAppearanceUpdated (server's 0xF625 ObjDescEvent) → spawn dedup at top of OnLiveEntitySpawnedLocked invokes RemoveLiveEntityByServerGuid, then re-spawns. Each invocation gets a NEW local entity.Id from _liveEntityIdCounter++ (line 2573).

Implication for cache: ObjDescEvent isn't a "mutate existing entity" event — it's a despawn+respawn pair. The despawn path (next subsection) clears the cache for the old Id; the respawn populates fresh under the new Id. The cache never sees a stale entry for a still-active Id from this path.

Pre-construction parts[…] mutations at lines 2333 and 2365 (AnimPartChanges + DIDDegrade resolver) edit the local parts list before it becomes the meshRefs argument; they're not separate write sites.

Site 2 — BuildLandblockForStreaming (stab hydration)

src/AcDream.App/Rendering/GameWindow.cs:4748MeshRefs = meshRefs constructing dat-stab entities.

Classification: STATIC at hydration. Worker-thread only.

Trigger: streaming worker's near-tier load path (LandblockStreamJobKind.LoadNear or PromoteToNear). Single-GfxObj stabs use Matrix4x4.Identity; multi-part Setups go through SetupMesh.Flatten to produce per-part MeshRefs.

Lifetime: lives until the entity's owning landblock is demoted (Near→Far) or unloaded — see Site invalidation §3.2.

Site 3 — BuildSceneryEntitiesForStreaming (procedural scenery)

src/AcDream.App/Rendering/GameWindow.cs:4951MeshRefs = meshRefs for trees / rocks / bushes / fences.

Classification: STATIC at hydration. Worker-thread only.

Lifetime: identical to Site 2.

Site 4 — Interior cell-mesh entity

src/AcDream.App/Rendering/GameWindow.cs:5023MeshRefs = new[] { cellMeshRef } for the EnvCell's own room geometry as a renderable entity.

Classification: STATIC at hydration.

Site 5 — Interior static-object entity

src/AcDream.App/Rendering/GameWindow.cs:5083MeshRefs = meshRefs for static objects placed inside an EnvCell (furniture, fixtures).

Classification: STATIC at hydration.

Site 6 — TickAnimations per-frame rebuild

src/AcDream.App/Rendering/GameWindow.cs:7580ae.Entity.MeshRefs = newMeshRefs; after constructing a fresh List<MeshRef>(partCount) at line 7501 from sequencer.Advance(dt) output.

Classification: DYNAMIC every frame.

Trigger: per-frame iteration over _animatedEntities.Values inside TickAnimations. If entity.Id ∈ _animatedEntities, this loop runs for that entity every frame (subject to motion-table presence). If entity.Id ∉ _animatedEntities, this loop never runs for it.

Consequence: any cache that captures entity.MeshRefs[i].PartTransform for an entity in _animatedEntities will freeze the pose. This is exactly what the prior Tier 1 attempt did.


§2. _animatedEntities membership transitions

_animatedEntities at GameWindow.cs:160 is the gating dict. The cache's "static" predicate is ! _animatedEntities.ContainsKey(entity.Id).

Population

  • GameWindow.cs:2724_animatedEntities[entity.Id] = new AnimatedEntity { … } at server-spawn for entities with a non-empty motion table + a resolvable idle cycle.
  • GameWindow.cs:7685_animatedEntities[pe.Id] = ae; in UpdatePlayerAnimation to re-add the local player entity if a prior UpdateMotion removed it (the "Phase 6.8 stationary remove" pattern). This is the only path that can flip an entity from STATIC to ANIMATED mid-life.

Removal

  • GameWindow.cs:2935_animatedEntities.Remove(existingEntity.Id) inside RemoveLiveEntityByServerGuid. Fires for 0xF747 DeleteObject and as the dedup leg of OnLiveAppearanceUpdated.

Cache implication

Membership IS NOT cached by the dispatcher. The cache lookup checks _animatedEntities.ContainsKey(entity.Id) at lookup time. If the player flips STATIC→ANIMATED mid-session (Site 7685 above), a stale cache entry would still exist for the player Id but never be read; the next despawn (Site 2935) clears it. No special-casing needed.

The reverse flip (ANIMATED→STATIC, e.g. a ground-state demote) leaves no cache entry; the dispatcher takes the cache-miss path on the first frame and populates fresh. Also no special-casing needed.


§3. Position / Rotation write sites (matters for the cached model matrix)

The dispatcher composes model = meshRef.PartTransform * entityWorld for non-Setup entities, and model = restPose * meshRef.PartTransform * entityWorld for Setup multi-parts (with entityWorld = Rotation × Translation). If Position or Rotation changes for a STATIC entity, a cached model matrix would be stale.

Audit shows: every Position/Rotation write site in GameWindow.cs operates on entities that are in _animatedEntities. Static entities never have these fields touched after construction.

Line Context Animated?
3992-3993 entity.SetPosition(worldPos); entity.Rotation = rot; (player physics snap-on-arrival) YES — entity is the local player
4116 entity.SetPosition(rmState.Body.Position); (remote dead-reckon snap branch) YES — remote NPC/player
4230 same context, near-enqueue branch YES
4362-4363 remote dead-reckon physics tick body sync YES
4407-4408 local player position snap (teleport / GoHome) YES
7045-7046 ae.Entity.SetPosition(rm.Body.Position); ae.Entity.Rotation = rm.Body.Orientation; (TickAnimations body sync) YES — ae.Entity is in _animatedEntities by definition
7373-7374 same body-sync context, fall-through path YES

No Position/Rotation writes happen on entities that are NOT in _animatedEntities. Confirmed via grep.


§4. Other entity fields read by the dispatcher

WbDrawDispatcher.Draw and ClassifyBatches read these WorldEntity fields beyond MeshRefs, Position, Rotation:

Field Mutability Cache impact
PaletteOverride init-only (WorldEntity.cs:37) Stable post-spawn → safe to fold into cache key / texHandle resolution
HiddenPartsMask init-only (WorldEntity.cs:73) Stable; doesn't apply to dispatcher anyway (animation tick handles part-hide via s_hidePartIndex debug global, animated path only)
ParentCellId init-only (WorldEntity.cs:45) Stable; visibility filter input
AabbMin/AabbMax/AabbDirty Mutated lazily by RefreshAabb (WorldEntity.cs:79-91) on AabbDirty flag, set by SetPosition Read by WalkEntitiesInto, NOT used by classification. AABB stays static for static entities (Position never changes → never marked dirty after first refresh)
MeshRefs[i].SurfaceOverrides init-only on the MeshRef record struct Stable for the lifetime of the MeshRef list (Sites 1-5)
MeshRefs[i].GfxObjId Stable (readonly record struct) Forms part of the cache key
MeshRefs[i].PartTransform Stable for STATIC entities (the list is replaced atomically in Site 6 only for ANIMATED entities) Cacheable for STATIC entities

No hidden mutability surface. The cache is safe for entities outside _animatedEntities.


§5. Cache invalidation events (the wire-up)

The cache is keyed by entity.Id. Only TWO event sources can invalidate a cached entry:

§5.1 Per-entity despawn (live server entities)

GameWindow.cs:2933-2935_worldState.RemoveEntityByServerGuid(serverGuid); _worldGameState.RemoveById(...); _animatedEntities.Remove(...);

This block fires for:

  • 0xF747 DeleteObject (server explicitly says entity is gone).
  • 0xF625 ObjDescEvent (dedup leg before respawn).

Hook: add _wbDrawDispatcher.InvalidateEntity(existingEntity.Id) to this block.

§5.2 Landblock demote / unload (static dat entities)

src/AcDream.App/Streaming/GpuWorldState.cs:373RemoveEntitiesFromLandblock(landblockId) clears the entity list for a landblock. Called from StreamingController.Tick at StreamingController.cs:116 for ToDemote (Near→Far) and via _enqueueUnload for ToUnload.

Hook: add _wbDrawDispatcher.InvalidateLandblock(landblockId) adjacent to the RemoveEntitiesFromLandblock call. Walk the LB's pre-removal entity list; invalidate each Id.

Implementation note: RemoveEntitiesFromLandblock already has the entity list in scope before zeroing it — adding the invalidation walk inside the method (or via a callback) is cheap. Alternative: StreamingController walks the LB's entries before invoking RemoveEntitiesFromLandblock. Either works; brainstorming will pick.

§5.3 No other invalidation paths needed

Confirmed:

  • MarkPersistent (GameWindow.cs:2024) — keeps player Id pinned across LB unloads. No MeshRefs change.
  • DrainRescued (GameWindow.cs:5885) — re-attaches rescued persistent entities. No MeshRefs change.
  • RelocateEntity (GameWindow.cs:6026) — moves entity between landblocks. Doesn't change MeshRefs/Position/Rotation. Safe.
  • AddEntitiesToExistingLandblock (GpuWorldState.cs:401) — Far→Near promotion adds entities. New entries get cache-miss naturally.

AnimationSequencer (src/AcDream.Core/Physics/AnimationSequencer.cs) does NOT write to entity.MeshRefs or entity.Position/entity.Rotation directly. It produces PartTransform[] frames consumed by TickAnimations. Confirmed via grep — only docstring mention of MeshRef. Sequencer is safe to ignore for cache design.

Core library has zero entity.MeshRefs = ... writes. All writes are in the App layer, all in GameWindow.cs. Confirmed via grep.


Pre-spec recommendation; final design picks settle in the brainstorming session.

// Per-(entity, partIdx, batchIdx) classification result.
private readonly record struct CachedBatch(
    GroupKey Key,                  // bucket identity
    ulong    BindlessTextureHandle, // resolved texture (via palette + override)
    Matrix4x4 RestPose);            // meshRef.PartTransform (or restPose * meshRef.PartTransform for Setup)

// Per-entity cache value.
private sealed class EntityCache
{
    public List<CachedBatch> Batches = new(); // ordered: (part, batch) flat
    public uint LandblockHint;                 // for InvalidateLandblock
}

// Cache state.
private readonly Dictionary<uint /*entityId*/, EntityCache> _entityCache = new();

// Hot path:
//   if (_animatedEntities.ContainsKey(entity.Id)) → today's path (full ClassifyBatches)
//   else if (_entityCache.TryGetValue(entity.Id, out var cached)) →
//        for each batch: append (cached.RestPose * entityWorld) to its group's matrices
//   else → ClassifyBatches once, populate cache, then same fast path next frame.

Per-frame static cost: dictionary lookup + per-batch matrix multiply + matrices.Add. No texture resolution, no group-key construction, no metaTable lookup.

Worst case: if every entity is animated (e.g. a city full of NPCs), the cache adds one ContainsKey lookup per visible entity vs today's path. Negligible overhead. In practice ~10K entities total at radius=12 with ~50 animated → 99.5% cache hit rate on the static path.

Risk surface: the cache invariant rests on TWO claims, both verified in the audit above:

  1. STATIC entity Position / Rotation never mutate post-spawn. Verified §3.
  2. STATIC entity MeshRefs reference never changes post-spawn. Verified §1 (only Site 6 writes, only for animated entities).

If either claim breaks in a future change (e.g. someone adds an "earthquake" effect that mutates static-tree positions), the cache will quietly serve stale matrices. Defense:

  • DEBUG-only assertion in the cache hit path: Debug.Assert(!_animatedEntities.ContainsKey(entity.Id)).
  • DEBUG-only cross-check: in DEBUG builds, in the cache-hit path, also recompute the live model matrix and compare against cached.RestPose * entityWorld. Log a warning if they differ. Catches the "someone added a new mutation site" failure mode without paying the cost in Release.

(Belt-and-suspenders option (c) from the original handoff. Recommended for the first retry given the prior bug.)


§7. What does NOT need to be in the cache design

  • Texture invalidation on bindless handle change. Bindless handles are issued on first texture upload and remain valid for the texture's lifetime. TextureCache doesn't evict entries during normal play (only on shutdown). Static-entity texture handles never change.
  • GfxObj re-decode. WbMeshAdapter.TryGetRenderData returns the same ObjectRenderData instance for a given gfxObjId for the session. Static-entity batches never change.
  • SurfaceOverrides reactivation. Init-only on MeshRef, set at Site 1's hydration time, stable for the MeshRef's lifetime.
  • Per-frame Time / dt inputs. The dispatcher doesn't read time. Texture animation (e.g. animated UV scrolls) happens in the shader from gl_Time-equivalent uniforms, not from cached state.

§8. Open questions for brainstorming

These need a user decision before I write the spec:

  1. Where do InvalidateEntity / InvalidateLandblock live? On WbDrawDispatcher (cache lives there)? On a new EntityClassificationCache class injected into the dispatcher (separation of concerns; testable in isolation)? My lean: separate class, dispatcher gets it via ctor.
  2. Static-only (option a) vs static-only + DEBUG cross-check (option c)? Cross-check costs nothing in Release and catches the exact bug class that bit us last time. My lean: option (c).
  3. Cache the full model matrix or the rest pose? Full matrix saves a per-frame multiply but bakes Position/Rotation into the cache (theoretically violatable). Rest pose is safer + costs ~one mat4 mult per batch. My lean: rest pose.
  4. Setup multi-part flattening: cache the per-part setupPart.PartTransform * meshRef.PartTransform product? Today's Draw walks renderData.SetupParts per-frame even though that list is per-GfxObj-immutable. The cache could pre-flatten into the batch list. My lean: yes — that's where the visible CPU win is.
  5. Test plan: where do new tests live? tests/AcDream.Core.Tests/Rendering/Wb/EntityClassificationCacheTests.cs? Pure-CPU tests on the cache class without GL state? My lean: yes, separate test file in the existing Wb test directory.

§9. Sentinel + baseline (verified at audit start, 2026-05-10)

  • dotnet build: green (after git submodule update --init for the WorldBuilder ref tree, which was missing in this fresh worktree).
  • dotnet test --no-build: 1688 passing, 8 pre-existing failures in AcDream.Core.Tests. Matches the post-#52/#54 baseline in the handoff.
  • N.5b sentinel filter (TerrainSlot|TerrainModernConformance|Wb|MatrixComposition|TextureCacheBindless|SplitFormulaDivergence): 94/94 passing. Matches the post-#52/#54 baseline.

These are the floors the Tier 1 retry must keep clean throughout.