acdream/docs/research/2026-05-10-tier1-mutation-audit.md
Erik 4abb838729 docs(post-A.5 #53): Tier 1 retry — mutation audit + cache design spec
The audit at docs/research/2026-05-10-tier1-mutation-audit.md enumerates
every entity.MeshRefs write site (5 STATIC at hydration, 1 DYNAMIC at
GameWindow.cs:7580 inside TickAnimations) and verifies that all 7
Position/Rotation write sites only touch entities in _animatedEntities.
Establishes the load-bearing invariant: an entity's renderer state is
stable from spawn to despawn iff entity.Id is NOT in _animatedEntities.

The spec at docs/superpowers/specs/2026-05-10-issue-53-tier1-cache-design.md
locks in the design from brainstorming on 2026-05-10:
- Static-only cache + DEBUG cross-check (option c) — catches future
  regressions of the prior bug class without paying perf cost in Release
- Separate EntityClassificationCache class injected into WbDrawDispatcher
- Cache the rest pose, not the full model matrix (Position/Rotation read
  live each frame so Release stays correct even if the invariant breaks)
- Pre-flatten Setup multi-parts at populate time (the bulk of the win)
- 15 new tests covering all invalidation paths + DEBUG cross-check +
  Setup pre-flatten + lifecycle pin

Closes the audit + design steps of the post-A.5 polish Priority 3 work.
Implementation plan owned by superpowers:writing-plans next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 16:50:26 +02:00

246 lines
18 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Tier 1 entity-classification cache — mutation audit
**Created:** 2026-05-10, opening move of the ISSUE #53 retry session.
**Purpose:** enumerate every code path that writes to `WorldEntity.MeshRefs` (the dispatcher's load-bearing per-entity input) and every adjacent state read by `WbDrawDispatcher.ClassifyBatches` / model-matrix composition, classify each as STATIC or DYNAMIC, and design the cache invalidation surface BEFORE touching renderer code.
This audit is the load-bearing prerequisite the prior Tier 1 attempt (commit `3639a6f`, reverted at `9b49009`) skipped. Cache design follows from the audit, not the other way around.
---
## TL;DR — the invariant
> **An entity's `MeshRefs` reference, `Position`, `Rotation`, `PaletteOverride`, `HiddenPartsMask`, `ParentCellId`, and `Scale` are stable from spawn to despawn IF AND ONLY IF the entity is NOT in `GameWindow._animatedEntities`.**
That is the invariant the cache rides on. Animated entities (player + remote NPCs/players + animated dat scenery like the lifestone crystal) get a fresh `MeshRefs` list every frame from `TickAnimations` plus per-frame `Position`/`Rotation` writes from physics/dead-reckoning. Everything else — stabs, scenery, cell-mesh entities, interior static objects — touches none of those fields after construction.
The cache should hold per-entity classification ONLY for entities whose `Id` is not in `_animatedEntities` at lookup time. Animated entities go through today's per-frame classification path unchanged.
---
## §1. `entity.MeshRefs = ...` write sites (the core question)
`WorldEntity.MeshRefs` is `IReadOnlyList<MeshRef>` with a `set` accessor (see [src/AcDream.Core/World/WorldEntity.cs:28](../../src/AcDream.Core/World/WorldEntity.cs#L28)). `MeshRef` itself is a `readonly record struct` ([src/AcDream.Core/World/MeshRef.cs:15](../../src/AcDream.Core/World/MeshRef.cs#L15)) — its fields cannot be mutated in place. So every "MeshRefs change" is a whole-list replacement, not a per-element edit.
Six write sites total in `src/`. Five STATIC, one DYNAMIC.
### Site 1 — `OnLiveEntitySpawnedLocked` (server-spawned entity hydration)
**[src/AcDream.App/Rendering/GameWindow.cs:2578](../../src/AcDream.App/Rendering/GameWindow.cs#L2578)** — `MeshRefs = meshRefs` in the `WorldEntity { … }` constructor.
**Classification:** **STATIC** at first spawn.
**Trigger:** server's `0xF745 CreateObject` for any entity (NPC, monster, player, item, statue, lifestone). Also re-runs from `OnLiveAppearanceUpdated` (server's `0xF625 ObjDescEvent`) → spawn dedup at top of `OnLiveEntitySpawnedLocked` invokes `RemoveLiveEntityByServerGuid`, then re-spawns. Each invocation gets a NEW local `entity.Id` from `_liveEntityIdCounter++` (line 2573).
**Implication for cache:** ObjDescEvent isn't a "mutate existing entity" event — it's a despawn+respawn pair. The despawn path (next subsection) clears the cache for the old Id; the respawn populates fresh under the new Id. The cache never sees a stale entry for a still-active Id from this path.
**Pre-construction `parts[…]` mutations** at lines 2333 and 2365 (AnimPartChanges + DIDDegrade resolver) edit the *local* `parts` list before it becomes the `meshRefs` argument; they're not separate write sites.
### Site 2 — `BuildLandblockForStreaming` (stab hydration)
**[src/AcDream.App/Rendering/GameWindow.cs:4748](../../src/AcDream.App/Rendering/GameWindow.cs#L4748)** — `MeshRefs = meshRefs` constructing dat-stab entities.
**Classification:** **STATIC** at hydration. Worker-thread only.
**Trigger:** streaming worker's near-tier load path (`LandblockStreamJobKind.LoadNear` or `PromoteToNear`). Single-GfxObj stabs use `Matrix4x4.Identity`; multi-part Setups go through `SetupMesh.Flatten` to produce per-part MeshRefs.
**Lifetime:** lives until the entity's owning landblock is demoted (Near→Far) or unloaded — see Site invalidation §3.2.
### Site 3 — `BuildSceneryEntitiesForStreaming` (procedural scenery)
**[src/AcDream.App/Rendering/GameWindow.cs:4951](../../src/AcDream.App/Rendering/GameWindow.cs#L4951)** — `MeshRefs = meshRefs` for trees / rocks / bushes / fences.
**Classification:** **STATIC** at hydration. Worker-thread only.
**Lifetime:** identical to Site 2.
### Site 4 — Interior cell-mesh entity
**[src/AcDream.App/Rendering/GameWindow.cs:5023](../../src/AcDream.App/Rendering/GameWindow.cs#L5023)** — `MeshRefs = new[] { cellMeshRef }` for the EnvCell's own room geometry as a renderable entity.
**Classification:** **STATIC** at hydration.
### Site 5 — Interior static-object entity
**[src/AcDream.App/Rendering/GameWindow.cs:5083](../../src/AcDream.App/Rendering/GameWindow.cs#L5083)** — `MeshRefs = meshRefs` for static objects placed inside an EnvCell (furniture, fixtures).
**Classification:** **STATIC** at hydration.
### Site 6 — `TickAnimations` per-frame rebuild
**[src/AcDream.App/Rendering/GameWindow.cs:7580](../../src/AcDream.App/Rendering/GameWindow.cs#L7580)** — `ae.Entity.MeshRefs = newMeshRefs;` after constructing a fresh `List<MeshRef>(partCount)` at line 7501 from `sequencer.Advance(dt)` output.
**Classification:** **DYNAMIC** every frame.
**Trigger:** per-frame iteration over `_animatedEntities.Values` inside `TickAnimations`. If `entity.Id ∈ _animatedEntities`, this loop runs for that entity every frame (subject to motion-table presence). If `entity.Id ∉ _animatedEntities`, this loop never runs for it.
**Consequence:** any cache that captures `entity.MeshRefs[i].PartTransform` for an entity in `_animatedEntities` will freeze the pose. **This is exactly what the prior Tier 1 attempt did.**
---
## §2. `_animatedEntities` membership transitions
`_animatedEntities` at [GameWindow.cs:160](../../src/AcDream.App/Rendering/GameWindow.cs#L160) is the gating dict. The cache's "static" predicate is `! _animatedEntities.ContainsKey(entity.Id)`.
### Population
- **[GameWindow.cs:2724](../../src/AcDream.App/Rendering/GameWindow.cs#L2724)** — `_animatedEntities[entity.Id] = new AnimatedEntity { … }` at server-spawn for entities with a non-empty motion table + a resolvable idle cycle.
- **[GameWindow.cs:7685](../../src/AcDream.App/Rendering/GameWindow.cs#L7685)** — `_animatedEntities[pe.Id] = ae;` in `UpdatePlayerAnimation` to *re-add* the local player entity if a prior `UpdateMotion` removed it (the "Phase 6.8 stationary remove" pattern). This is the only path that can flip an entity from STATIC to ANIMATED mid-life.
### Removal
- **[GameWindow.cs:2935](../../src/AcDream.App/Rendering/GameWindow.cs#L2935)** — `_animatedEntities.Remove(existingEntity.Id)` inside `RemoveLiveEntityByServerGuid`. Fires for `0xF747 DeleteObject` and as the dedup leg of `OnLiveAppearanceUpdated`.
### Cache implication
Membership IS NOT cached by the dispatcher. The cache lookup checks `_animatedEntities.ContainsKey(entity.Id)` at lookup time. If the player flips STATIC→ANIMATED mid-session (Site 7685 above), a stale cache entry would still exist for the player Id but never be read; the next despawn (Site 2935) clears it. No special-casing needed.
The reverse flip (ANIMATED→STATIC, e.g. a ground-state demote) leaves no cache entry; the dispatcher takes the cache-miss path on the first frame and populates fresh. Also no special-casing needed.
---
## §3. Position / Rotation write sites (matters for the cached model matrix)
The dispatcher composes `model = meshRef.PartTransform * entityWorld` for non-Setup entities, and `model = restPose * meshRef.PartTransform * entityWorld` for Setup multi-parts (with `entityWorld = Rotation × Translation`). If `Position` or `Rotation` changes for a STATIC entity, a cached model matrix would be stale.
Audit shows: **every Position/Rotation write site in `GameWindow.cs` operates on entities that are in `_animatedEntities`.** Static entities never have these fields touched after construction.
| Line | Context | Animated? |
|---|---|---|
| 3992-3993 | `entity.SetPosition(worldPos); entity.Rotation = rot;` (player physics snap-on-arrival) | YES — `entity` is the local player |
| 4116 | `entity.SetPosition(rmState.Body.Position);` (remote dead-reckon snap branch) | YES — remote NPC/player |
| 4230 | same context, near-enqueue branch | YES |
| 4362-4363 | remote dead-reckon physics tick body sync | YES |
| 4407-4408 | local player position snap (teleport / GoHome) | YES |
| 7045-7046 | `ae.Entity.SetPosition(rm.Body.Position); ae.Entity.Rotation = rm.Body.Orientation;` (TickAnimations body sync) | YES — `ae.Entity` is in `_animatedEntities` by definition |
| 7373-7374 | same body-sync context, fall-through path | YES |
No Position/Rotation writes happen on entities that are NOT in `_animatedEntities`. Confirmed via grep.
---
## §4. Other entity fields read by the dispatcher
`WbDrawDispatcher.Draw` and `ClassifyBatches` read these `WorldEntity` fields beyond `MeshRefs`, `Position`, `Rotation`:
| Field | Mutability | Cache impact |
|---|---|---|
| `PaletteOverride` | `init`-only ([WorldEntity.cs:37](../../src/AcDream.Core/World/WorldEntity.cs#L37)) | Stable post-spawn → safe to fold into cache key / texHandle resolution |
| `HiddenPartsMask` | `init`-only ([WorldEntity.cs:73](../../src/AcDream.Core/World/WorldEntity.cs#L73)) | Stable; doesn't apply to dispatcher anyway (animation tick handles part-hide via `s_hidePartIndex` debug global, animated path only) |
| `ParentCellId` | `init`-only ([WorldEntity.cs:45](../../src/AcDream.Core/World/WorldEntity.cs#L45)) | Stable; visibility filter input |
| `AabbMin/AabbMax/AabbDirty` | Mutated lazily by `RefreshAabb` ([WorldEntity.cs:79-91](../../src/AcDream.Core/World/WorldEntity.cs#L79)) on `AabbDirty` flag, set by `SetPosition` | Read by `WalkEntitiesInto`, NOT used by classification. AABB stays static for static entities (Position never changes → never marked dirty after first refresh) |
| `MeshRefs[i].SurfaceOverrides` | `init`-only on the MeshRef record struct | Stable for the lifetime of the MeshRef list (Sites 1-5) |
| `MeshRefs[i].GfxObjId` | Stable (`readonly record struct`) | Forms part of the cache key |
| `MeshRefs[i].PartTransform` | Stable for STATIC entities (the list is replaced atomically in Site 6 only for ANIMATED entities) | Cacheable for STATIC entities |
No hidden mutability surface. The cache is safe for entities outside `_animatedEntities`.
---
## §5. Cache invalidation events (the wire-up)
The cache is keyed by `entity.Id`. Only TWO event sources can invalidate a cached entry:
### §5.1 Per-entity despawn (live server entities)
**[GameWindow.cs:2933-2935](../../src/AcDream.App/Rendering/GameWindow.cs#L2933)** — `_worldState.RemoveEntityByServerGuid(serverGuid); _worldGameState.RemoveById(...); _animatedEntities.Remove(...);`
This block fires for:
- `0xF747 DeleteObject` (server explicitly says entity is gone).
- `0xF625 ObjDescEvent` (dedup leg before respawn).
**Hook:** add `_wbDrawDispatcher.InvalidateEntity(existingEntity.Id)` to this block.
### §5.2 Landblock demote / unload (static dat entities)
**[src/AcDream.App/Streaming/GpuWorldState.cs:373](../../src/AcDream.App/Streaming/GpuWorldState.cs#L373)** — `RemoveEntitiesFromLandblock(landblockId)` clears the entity list for a landblock. Called from `StreamingController.Tick` at [StreamingController.cs:116](../../src/AcDream.App/Streaming/StreamingController.cs#L116) for `ToDemote` (Near→Far) and via `_enqueueUnload` for `ToUnload`.
**Hook:** add `_wbDrawDispatcher.InvalidateLandblock(landblockId)` adjacent to the `RemoveEntitiesFromLandblock` call. Walk the LB's pre-removal entity list; invalidate each Id.
Implementation note: `RemoveEntitiesFromLandblock` already has the entity list in scope before zeroing it — adding the invalidation walk inside the method (or via a callback) is cheap. Alternative: `StreamingController` walks the LB's entries before invoking `RemoveEntitiesFromLandblock`. Either works; brainstorming will pick.
### §5.3 No other invalidation paths needed
Confirmed:
- `MarkPersistent` ([GameWindow.cs:2024](../../src/AcDream.App/Rendering/GameWindow.cs#L2024)) — keeps player Id pinned across LB unloads. No MeshRefs change.
- `DrainRescued` ([GameWindow.cs:5885](../../src/AcDream.App/Rendering/GameWindow.cs#L5885)) — re-attaches rescued persistent entities. No MeshRefs change.
- `RelocateEntity` ([GameWindow.cs:6026](../../src/AcDream.App/Rendering/GameWindow.cs#L6026)) — moves entity between landblocks. Doesn't change MeshRefs/Position/Rotation. Safe.
- `AddEntitiesToExistingLandblock` ([GpuWorldState.cs:401](../../src/AcDream.App/Streaming/GpuWorldState.cs#L401)) — Far→Near promotion adds entities. New entries get cache-miss naturally.
`AnimationSequencer` ([src/AcDream.Core/Physics/AnimationSequencer.cs](../../src/AcDream.Core/Physics/AnimationSequencer.cs)) does NOT write to `entity.MeshRefs` or `entity.Position`/`entity.Rotation` directly. It produces `PartTransform[]` frames consumed by `TickAnimations`. Confirmed via grep — only docstring mention of `MeshRef`. Sequencer is safe to ignore for cache design.
`Core` library has zero `entity.MeshRefs = ...` writes. All writes are in the App layer, all in `GameWindow.cs`. Confirmed via grep.
---
## §6. Recommended cache shape (for brainstorming, not yet committed)
Pre-spec recommendation; final design picks settle in the brainstorming session.
```csharp
// Per-(entity, partIdx, batchIdx) classification result.
private readonly record struct CachedBatch(
GroupKey Key, // bucket identity
ulong BindlessTextureHandle, // resolved texture (via palette + override)
Matrix4x4 RestPose); // meshRef.PartTransform (or restPose * meshRef.PartTransform for Setup)
// Per-entity cache value.
private sealed class EntityCache
{
public List<CachedBatch> Batches = new(); // ordered: (part, batch) flat
public uint LandblockHint; // for InvalidateLandblock
}
// Cache state.
private readonly Dictionary<uint /*entityId*/, EntityCache> _entityCache = new();
// Hot path:
// if (_animatedEntities.ContainsKey(entity.Id)) → today's path (full ClassifyBatches)
// else if (_entityCache.TryGetValue(entity.Id, out var cached)) →
// for each batch: append (cached.RestPose * entityWorld) to its group's matrices
// else → ClassifyBatches once, populate cache, then same fast path next frame.
```
**Per-frame static cost:** dictionary lookup + per-batch matrix multiply + matrices.Add. No texture resolution, no group-key construction, no metaTable lookup.
**Worst case:** if every entity is animated (e.g. a city full of NPCs), the cache adds one `ContainsKey` lookup per visible entity vs today's path. Negligible overhead. In practice ~10K entities total at radius=12 with ~50 animated → 99.5% cache hit rate on the static path.
**Risk surface:** the cache invariant rests on TWO claims, both verified in the audit above:
1. STATIC entity Position / Rotation never mutate post-spawn. Verified §3.
2. STATIC entity MeshRefs reference never changes post-spawn. Verified §1 (only Site 6 writes, only for animated entities).
If either claim breaks in a future change (e.g. someone adds an "earthquake" effect that mutates static-tree positions), the cache will quietly serve stale matrices. Defense:
- **DEBUG-only assertion** in the cache hit path: `Debug.Assert(!_animatedEntities.ContainsKey(entity.Id))`.
- **DEBUG-only cross-check**: in DEBUG builds, in the cache-hit path, also recompute the live model matrix and compare against `cached.RestPose * entityWorld`. Log a warning if they differ. Catches the "someone added a new mutation site" failure mode without paying the cost in Release.
(Belt-and-suspenders option (c) from the original handoff. Recommended for the first retry given the prior bug.)
---
## §7. What does NOT need to be in the cache design
- **Texture invalidation on bindless handle change.** Bindless handles are issued on first texture upload and remain valid for the texture's lifetime. `TextureCache` doesn't evict entries during normal play (only on shutdown). Static-entity texture handles never change.
- **GfxObj re-decode.** `WbMeshAdapter.TryGetRenderData` returns the same `ObjectRenderData` instance for a given `gfxObjId` for the session. Static-entity batches never change.
- **`SurfaceOverrides` reactivation.** Init-only on `MeshRef`, set at Site 1's hydration time, stable for the MeshRef's lifetime.
- **Per-frame `Time` / `dt` inputs.** The dispatcher doesn't read time. Texture animation (e.g. animated UV scrolls) happens in the shader from `gl_Time`-equivalent uniforms, not from cached state.
---
## §8. Open questions for brainstorming
These need a user decision before I write the spec:
1. **Where do `InvalidateEntity` / `InvalidateLandblock` live?** On `WbDrawDispatcher` (cache lives there)? On a new `EntityClassificationCache` class injected into the dispatcher (separation of concerns; testable in isolation)? My lean: separate class, dispatcher gets it via ctor.
2. **Static-only (option a) vs static-only + DEBUG cross-check (option c)?** Cross-check costs nothing in Release and catches the exact bug class that bit us last time. My lean: option (c).
3. **Cache the full model matrix or the rest pose?** Full matrix saves a per-frame multiply but bakes Position/Rotation into the cache (theoretically violatable). Rest pose is safer + costs ~one mat4 mult per batch. My lean: rest pose.
4. **Setup multi-part flattening: cache the per-part `setupPart.PartTransform * meshRef.PartTransform` product?** Today's `Draw` walks `renderData.SetupParts` per-frame even though that list is per-GfxObj-immutable. The cache could pre-flatten into the batch list. My lean: yes — that's where the visible CPU win is.
5. **Test plan: where do new tests live?** `tests/AcDream.Core.Tests/Rendering/Wb/EntityClassificationCacheTests.cs`? Pure-CPU tests on the cache class without GL state? My lean: yes, separate test file in the existing Wb test directory.
---
## §9. Sentinel + baseline (verified at audit start, 2026-05-10)
- `dotnet build`: green (after `git submodule update --init` for the WorldBuilder ref tree, which was missing in this fresh worktree).
- `dotnet test --no-build`: 1688 passing, 8 pre-existing failures in `AcDream.Core.Tests`. Matches the post-#52/#54 baseline in the handoff.
- N.5b sentinel filter (`TerrainSlot|TerrainModernConformance|Wb|MatrixComposition|TextureCacheBindless|SplitFormulaDivergence`): 94/94 passing. Matches the post-#52/#54 baseline.
These are the floors the Tier 1 retry must keep clean throughout.