Code-quality review on Task 1 (commit a7c9800) flagged an asymmetric
diag gate: the read-before-overwrite block at the top of the dispatcher
was not gated on diag, but the frame-counter increment and BeginQuery
calls were. If a maintainer toggled ACDREAM_WB_DIAG from "1" to "" mid-
session, _gpuQueryFrameIndex would freeze (gated inside if(diag)) while
the read kept firing every frame at the same slot — producing duplicate
stale samples.
Add diag to the read block's outer condition so the read/issue/increment
trio is symmetric. One-line change; behavior under the normal usage
pattern (env var set at launch, never toggled) is unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dispatcher's GPU TimeElapsed queries were polled in the same frame
as the indirect draw, so glGetQueryObject(ResultAvailable) always
returned 0 and gpu_us in [WB-DIAG] was stuck at 0m/0p95.
Replace the 2 single-handle queries with ring-of-3 arrays and move the
result read to BEFORE issuing the next frame's queries into the same
slot — at frame N we read slot N%3 which holds frame N-3's queries
(oldest in the ring, ~50ms old at 60fps and definitely done across all
desktop GL drivers). Vendor-neutral: AMD/NVIDIA/Intel desktop GL all
work without driver-specific code.
The gpuQuerySlot variable is hoisted to function scope (just before
Phase 7 opaque pass) so both the opaque and transparent passes
reference the same slot — the plan placed it inside the opaque-pass
if-block, which would have been out of scope for the transparent
BeginQuery; corrected in the implementation.
No new tests — the change is purely a diagnostic readout fix, no
observable behavior in the rendering path. Build green; tests at
baseline (1711 passing, 8 pre-existing physics/MotionInterpreter
failures unchanged). Manual gpu_us verification still pending in-world.
Spec: docs/superpowers/specs/2026-05-11-phase-n6-slice1-design.md (§4).
Plan: docs/superpowers/plans/2026-05-11-phase-n6-slice1.md (Task 1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported (cache enabled, post-c55acdc): drudge statue renders fully
but many trees are missing branches. Cache-disabled A/B run rendered trees
correctly. So the bug is in the cache wiring.
Root cause: c55acdc's `currentEntityIncomplete = false;` reset fired
UNCONDITIONALLY at the top of every iteration. For a tree with MeshRefs
[trunk valid, branches null, leaves valid], the tuple sequence is:
- tuple 0 (trunk): no flag set
- tuple 1 (branches): TryGetRenderData null → set flag, continue
- tuple 2 (leaves): unconditional reset → flag = false (WRONG)
- end-of-entity: flag is false, scratch has trunk+leaves batches but NOT
branches → MaybeFlushOnEntityChange populates a PARTIAL cache entry
- cache hits forever serve trunk+leaves with no branches
Drudge happened to render correctly because its missing MeshRef was at the
END of its MeshRefs list — no later tuple reset the flag.
Adds a per-tuple `prevTupleEntityId` tracker for entity-change detection,
updated UNCONDITIONALLY at end of each tuple (including tuples that skip
via null renderData). The flag-reset block now fires ONLY on actual entity
change. Within the same entity, the flag accumulates across tuples.
Also includes ACDREAM_DISABLE_TIER1_CACHE=1 diagnostic env-var added
inline (was stashed previously) for future A/B testing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported: the drudge statue on top of the Foundry (a multi-part
live-spawned entity with AnimPartChange + texChanges) renders only
PARTIALLY — some parts visible, some missing.
Root cause: the dispatcher's slow path skips a MeshRef when
_meshAdapter.TryGetRenderData returns null (mesh still async-decoding
via ObjectMeshManager.PrepareMeshDataAsync). The classified-batches
collector accumulates only the MeshRefs that DID resolve. At entity
boundary, the cache populates with the PARTIAL set. Frame-2 cache hits
serve that partial entry forever — even after the missing mesh loads,
the cache continues to skip those parts because classification never
reruns for cached entities.
Fix: track currentEntityIncomplete during the foreach. Set it true on
any null renderData. At entity boundary (and at end-of-loop), if the
flag is set, DROP the accumulated populate scratch instead of writing
it to the cache. The slow path retries on the next frame; once all
meshes have loaded, the populate fires correctly with the complete
classification.
Adds a regression test pinning the contract — incomplete entities
produce zero cache entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User confirmed via A/B test (ACDREAM_DISABLE_TIER1_CACHE=1) that the
visual bug — buildings rendering up in the air outside Holtburg — is in
the cache wiring, not elsewhere. The matrix math (restPose * entityWorld
== model) was provably correct, so the bug had to be cache key collision.
Stabs were namespaced in commit 71d0edc, but scenery (0x80LLBB00 +
localIndex) and interior (0x40LLBB00 + localCounter) still have the
same 256-overflow risk. Dense LBs outside Holtburg (forest, urban) push
localIndex past 255, wrapping into the lbY byte and creating cross-LB
collisions.
Fix: change the cache key from uint entityId to (uint, uint) tuple of
(EntityId, LandblockHint). The cache is now correct-by-construction
regardless of any hydration path's Id-generation strategy. Defensive
against future regressions in any ID namespace.
InvalidateEntity becomes a sweep (was O(1)), but it's called rarely
(only on live-entity despawn). InvalidateLandblock was already a sweep.
Updated 14 existing cache tests + 1 dispatcher integration test to thread
landblockHint through TryGet / DebugCrossCheck calls.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds EntityClassificationCache.DebugCrossCheck(entityId, liveBatches) that
asserts cached state matches a live re-classification. Wires a simpler
predicate assert into WbDrawDispatcher's cache-hit branch (asserts
isAnimated == false on cache hit). Tests #13a and #13b cover the
batch-count mismatch and clean-match cases via a custom TraceListener
that captures Debug.Assert calls.
Zero cost in Release. In DEBUG, the assert fires immediately if a future
regression mutates static-entity state outside the audit's known write
sites — the same failure mode that bit the prior Tier 1 attempt.
Phase 4 complete. Cache + invalidation + safety net all in place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 10 (commit 0cbef3c) called ApplyCacheHit inside the per-(entity, partIdx)
foreach loop, but cachedEntry.Batches is flat across all MeshRefs of the
entity. For a 3-MeshRef static building on frame 2: 3 tuples times 6 cached
batches per call = 18 instances drawn instead of 6. Severe Z-fighting and
3x perf hit on every multi-part static entity (buildings, statues, multi-
MeshRef NPCs).
This is the symmetric mirror of the Task 9 bug fixed at 00fa8ae. Both
spec section 5.2 and the plan describe the foreach as per-entity, but
_walkScratch has been per-tuple since Task 6. The implementation
faithfully ported the buggy spec.
Fix: track lastHitEntityId; the cache-hit fast path fires only on the
first tuple of each entity, and subsequent tuples skip the iteration
body via continue. Adds a regression test pinning the per-entity
amplification invariant.
Caught by code review (subagent-driven-development) before Phase 3
dispatched. The bug was invisible in the no-multi-frame-test 1702/8
baseline; would have manifested as visible Z-fighting on every multi-
part building on second-and-subsequent frames once Task 13 perf gate
captured live runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WbDrawDispatcher.Draw now branches on cache hit before running classification:
on hit, walks the cached flat batch list and appends RestPose times entityWorld
to the matching groups; on miss, runs today's classification and populates
the cache (Task 9). Animated entities skip the cache entirely.
Adds dispatcher integration tests #11 (static entity populates + reuses)
and #12 (animated bypasses) per spec test plan section 7.2, plus the
multi-MeshRef regression test that would have caught the bug fixed in
commit 00fa8ae (cache populate must flush at entity boundary, not per-tuple).
Phase 2 (dispatcher integration) complete. End-to-end caching now live.
Invalidation hooks (Phase 3) ensure correctness across despawns + LB demotes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 9 (commit 2f489a8) called _cache.Populate inside the per-tuple
foreach loop, but _walkScratch contains one tuple per (entity, MeshRefIndex)
and the cache is keyed by entity.Id. For multi-MeshRef entities (multi-part
Setup buildings, statues, multi-MeshRef NPCs), each iteration's Populate
OVERWROTE the previous one — only the last MeshRef's batches survived.
The bug was invisible at commit time because Task 10 had not landed
(cache populates but isn't read). It would have manifested the moment
Task 10 wired the cache-hit fast path: every multi-part static building
in Holtburg would render as N stacked copies of its last part.
Fix: restructure the per-entity loop with a flush-on-entity-change pattern.
Track the previous entity's Id; when the iteration moves to a different
entity, flush the previous entity's accumulated _populateScratch via one
Populate call. After the loop, flush the final entity. _populateScratch
is now cleared at flush time, not per-iteration.
Caught by code review (subagent-driven-development) before Task 10 dispatched.
Verified: 1699/8 baseline preserved, sentinel 105/105 unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructures Draw's per-entity loop: animated entities still skip the
cache entirely, but static entities now collect their classification into
_populateScratch and call cache.Populate at the end of the iteration.
Cache fast-path (skip slow classification on cache hit) lands in Task 10.
This intermediate state is verifiable: behavior unchanged, but the cache
is being populated as entities render. Diagnostic-friendly split.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ClassifyBatches now accepts a restPose parameter (the model-matrix
component without entityWorld baked in) and an optional collector. When
collector is non-null, each classified batch is appended as a CachedBatch
record. Defaults preserve today's behavior. Used in Task 9 to populate
the cache on a static-entity miss.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the cache as a constructor parameter on WbDrawDispatcher and a
private field on GameWindow. The cache is passed through but not yet
consumed by Draw — that wires up in Task 9 (cache miss / populate) and
Task 10 (cache hit / fast path).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the walk scratch tuple from (entity, meshRefIndex) to
(entity, meshRefIndex, landblockId). The dispatcher's per-entity loop now
has the landblock id available for EntityClassificationCache.Populate's
landblockHint argument (consumed in Task 9). No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical refactor: GroupKey was a private nested record struct on
WbDrawDispatcher. The upcoming EntityClassificationCache (ISSUE #53) needs
to store GroupKey inside CachedBatch records, so it must be visible to
both the dispatcher and the cache. Promoting to internal at file scope is
the smallest change that achieves this.
No behavior change. 1688 tests pass; 8 pre-existing failures unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three root causes regressed the Holtburg lifestone since the WB rendering
migration (Phase N.5 retirement amendment, commit dcae2b6, 2026-05-08).
All confirmed via temporary [LIFESTONE-DIAG] instrumentation and visually
verified by the user through the +Acdream test character.
1. **Alpha-test discard** in mesh_modern.frag transparent pass killed
high-α pixels of dat-flagged transparent surfaces. Native AC
transparent surfaces routinely include effectively-opaque pixels —
e.g. the lifestone crystal core (surface 0x080011DE) — that compose
correctly under (SrcAlpha, 1-SrcAlpha) blending. The original N.5
§2 rationale ("high-α belongs in opaque pass") doesn't hold for
surfaces flagged transparent at the dat level: those pixels can't
reach the opaque pass at all. Fix: remove `α >= 0.95 discard` from
the transparent pass, keep `α < 0.05 discard` as a fragment-cost
optimization (skip totally-empty pixels).
2. **Cull state** for the transparent pass was unset by
WbDrawDispatcher after the N.5 retirement amendment deleted
StaticMeshRenderer.cs (which had the Phase 9.2 setup at commit
6f1971a, 2026-04-11). Closed-shell translucents — lifestone crystal,
glow gems — need GL_CULL_FACE + GL_BACK + GL_CCW in the transparent
pass; otherwise back faces composite over front faces in iteration
order under DepthMask(false). Fix: re-establish Phase 9.2's exact
GL state setup at the top of Phase 8.
3. **uDrawIDOffset uniform** was missing from mesh_modern.vert.
gl_DrawIDARB resets to 0 at the start of each
glMultiDrawElementsIndirect call, so the transparent pass — which
begins later in the indirect buffer — was fetching
Batches[0..transparentCount) instead of its actual section at
Batches[opaqueCount..end). The lifestone crystal ended up reading
the FIRST OPAQUE batch's TextureHandle every frame; as the camera
moved and the front-to-back opaque sort reordered which group
landed at BatchData[0], the crystal's apparent texture flickered to
whatever sat first — typically the player character's body parts.
Fix: add `uniform int uDrawIDOffset` to the vertex shader, change
Batches[gl_DrawIDARB] → Batches[uDrawIDOffset + gl_DrawIDARB], and
set the uniform per-pass in WbDrawDispatcher (0 for opaque,
_opaqueDrawCount for transparent). Mirrors WorldBuilder's
BaseObjectRenderManager.cs line 845.
Tests: 1688/1696 passing (8 pre-existing physics/input failures
unchanged). N.5b conformance sentinel 94/94 clean.
Visual: Holtburg lifestone now renders with the spinning blue crystal
correctly composed over the pedestal. Other transparent content (glass,
particle effects, NPC clothing) is unaffected — the same uniform fix
applies globally and is correct for all transparent draws.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per docs/plans/2026-05-10-perf-tiers-2-3-roadmap.md Tier 1: cache the
per-(entity, meshRef, batch) classification (TextureCache lookup,
GroupKey hash, _groups dict insert) so the per-frame Draw inner loop
becomes "look up cache → walk assignments → append matrix to group's
Matrices list."
For static entities (~95% of world: trees, rocks, buildings, scenery),
the answer never changes between frames. Cache once at first visit;
reuse permanently. Per-frame work for static drops from 4 expensive
operations per (meshRef, batch) to 1 list-append.
Estimated entity dispatcher: 3.5ms → ~1-1.5ms median at radius=12.
Should land inside the 2.0ms spec budget.
Implementation:
- New EntityClassificationCache class (per-meshRef list of cached
(group ref, baked-PartTransform) tuples) keyed by entity.Id.
- ClassifyEntity does the one-time work; result populates _groups and
the cache.
- Draw inner loop: cache lookup → for each assignment, model =
PartTransform × entityWorld; group.Matrices.Add(model).
- Cache miss when ClassifyEntity finds NO mesh loaded yet (Vao == 0)
→ don't store; retry next frame. Avoids cache thrash during the
streaming-in window.
- Public InvalidateEntity(uint id) + ClearEntityCache() for explicit
invalidation hooks. Wiring (palette swap on ObjDescEvent, MeshRefs
hot-swap) is post-A.5 follow-up — for now, cache-stale entities
show their pre-swap appearance until next respawn.
Tier 2 (static/dynamic split with persistent groups) and Tier 3 (GPU
compute culling) tracked in the roadmap doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
T17's WalkEntities helper allocated a fresh List<(WorldEntity, int)>
per frame to hold the (entity, meshRefIndex) pairs that pass visibility
filters. At ~10K entities × ~3 mesh refs = ~30K tuples × 16 bytes =
~480 KB / frame of GC pressure on the render thread. The implementer's
self-review flagged this as a future N.6 optimization; the post-T26
diagnostic showed it materially contributing to the perf regression
(though Bug A — far-tier entity load — was the dominant factor).
Refactor: split WalkEntities into two overloads.
- WalkEntities(...) — test-friendly, allocates a fresh ToDraw list per
call. Tests keep using this signature unchanged.
- WalkEntitiesInto(..., scratch, ref result) — no-alloc, clears + populates
a caller-provided scratch list. Draw uses this with a per-dispatcher
_walkScratch field reused across frames.
Test count unchanged (40 streaming + 8 bucketing tests still pass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GameWindow.OnLoad resolves QualitySettings.From(_persistedDisplay.Quality)
+ WithEnvOverrides() immediately after LoadAndApplyPersistedSettings, stores
result in _resolvedQuality field. All six quality dimensions applied:
- NearRadius / FarRadius: replace old T16 env-var-only block; preset drives
the radii, legacy ACDREAM_STREAM_RADIUS override still honoured.
- MsaaSamples: WindowOptions.Samples reads from startup quality resolution
in Run() (pre-window-create read from SettingsStore). MSAA cannot change
at runtime; ReapplyQualityPreset logs a restart-required warning if the
new preset would change it.
- AnisotropicLevel: TerrainAtlas.SetAnisotropic() called after Build() and
again in ReapplyQualityPreset. Temporarily removes bindless residency
before the GL TexParameter call, re-makes resident after.
- AlphaToCoverage: WbDrawDispatcher.AlphaToCoverage property gates the
glEnable/glDisable(SampleAlphaToCoverage) pair around the opaque pass.
- MaxCompletionsPerFrame: set on StreamingController after construction
and after each mid-session restart.
ReapplyQualityPreset(QualityPreset) method handles mid-session changes
(Settings panel Quality dropdown Save): rebuilds streamer + controller for
radius changes, toggles A2C and aniso immediately, logs MSAA restart caveat.
onSaveDisplay callback updated to call ReapplyQualityPreset when Quality
field changes.
TerrainModernRenderer.Atlas property added to expose the atlas for
mid-session aniso updates.
991 tests passing, 8 pre-existing failures unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Phase A.5 spec §2 acceptance criterion 6: entity dispatcher median
≤ 2.0ms; terrain dispatcher median ≤ 1.0ms at standstill. When the
median exceeds the budget, prefix the DIAG line with " BUDGET_OVER" so
the regression is grep-friendly during perf testing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Phase A.5 spec §4.9.2: ClipMap foliage uses binary alpha-cutoff.
At N₂=12 horizon distance the pixel-stepped silhouettes are visible.
A2C with MSAA 4x produces smooth retail-faithful tree edges.
GL context now requests Samples=4. WbDrawDispatcher's opaque pass
toggles GL_SAMPLE_ALPHA_TO_COVERAGE on/off around the multi-draw
indirect call. mesh_modern.frag's opaque pass now discards only
truly-empty (α<0.05) so the GPU derives sample mask from coverage;
transparent pass boundary logic is unchanged.
MSAA audit: no custom FBOs found — all rendering uses default
framebuffer. Sky/particles/ImGui are all MSAA-compatible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Phase A.5 spec §4.6 Change #1: when an LB is invisible AND
animatedEntityIds is non-empty, the inner loop walked every entity
in the LB just to find the few animated ones. At ~10.7K entities
(N1=4) that is wasted iteration cost per frame.
Extracted a pure-CPU internal static WalkEntities helper. When LB
is invisible: iterate animatedEntityIds directly and look each up
in a per-LB AnimatedById dictionary (typically <50 animated vs
~10K total). When LB is visible: walk all entities as before.
GpuWorldState.LandblockEntries now yields an AnimatedById map as a
5th tuple field alongside the AABB tuple. Dictionary is built on
each yield (cheap — ~132 entities/LB max). A caching layer is out
of A.5 scope.
WbDrawDispatcher.Draw signature updated to consume the 5-tuple.
GameWindow.cs call site passes _worldState.LandblockEntries which
now yields the 5-tuple — no change needed there.
8 new tests in WbDrawDispatcherBucketingTests cover T17 Change #1
(invisible LB / animated set / neverCull / null frustum) and
T18 Change #2 guard tests (cached AABB / dirty flag / animated bypass).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final cross-cutting review of N.5 found that Task 15's deletion of
mesh_instanced.vert/.frag left InstancedMeshRenderer orphaned —
ACDREAM_USE_WB_FOUNDATION=0 silently rendered terrain+sky only with
no entities. The SHIP commit's "[x] ACDREAM_USE_WB_FOUNDATION=0 still
works" claim was inaccurate.
Resolution: formal retirement of the legacy renderer path within N.5
instead of deferring to N.6.
Deleted:
- src/AcDream.App/Rendering/InstancedMeshRenderer.cs
- src/AcDream.App/Rendering/StaticMeshRenderer.cs
- src/AcDream.App/Rendering/Wb/WbFoundationFlag.cs
GameWindow simplified — capability detection is unconditional, missing
bindless throws NotSupportedException with a clear message at startup.
WbDrawDispatcher + mesh_modern shader load are mandatory after init.
No escape hatch.
GpuWorldState simplified — WbFoundationFlag.IsEnabled guards on
AddLandblock/RemoveLandblock removed; adapter calls are unconditional
when the adapter is non-null.
PendingSpawnIntegrationTests updated — WbFoundationFlag.ForTestsOnly_ForceEnable
static ctor removed (flag is gone; adapter calls are unconditional).
The ApplyLoadedTerrain physics-data loop was also simplified: the
EnsureUploaded sub-loop that fed InstancedMeshRenderer is gone;
_pendingCellMeshes is now explicitly cleared to prevent unbounded
accumulation (the worker thread still populates it, but WB handles
EnvCell geometry through its own pipeline).
Spec §2 Decision 5 + §10 Out-of-Scope updated. Plan ship-amendment
section added. Roadmap updated (N.5 ships with retirement; N.6 scope
narrowed to perf-only). CLAUDE.md "WB integration cribs" updated.
Perf baseline doc updated. WbDrawDispatcher class summary docstring
corrected to describe the as-shipped SSBO + multi-draw-indirect path.
ISSUES.md #51 updated (terrain not in N.5 scope; deferred to N.7).
Bindless support is now a hard requirement. Modern desktop GPUs
universally expose GL_ARB_bindless_texture + GL_ARB_shader_draw_parameters;
if a user hits the NotSupportedException, that's a real bug report
worth investigating, not a silent fallback.
Build: 0 errors, 0 warnings. Tests: 71/71 (Wb+MatrixComposition+TextureCacheBindless filter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds median + 95th-percentile CPU + GPU dispatch time to the existing
5-second [WB-DIAG] rollup. CPU via Stopwatch (always running, cheap;
only logged under ACDREAM_WB_DIAG=1). GPU via two GL_TIME_ELAPSED
queries (opaque + transparent) wrapping each glMultiDrawElementsIndirect,
polled non-blocking via QueryResultAvailable on the next frame.
Sample window is 256 frames per signal; median + p95 reported.
Numbers populate the SHIP commit's perf table at Task 19.
Silk.NET naming note: GL_TIME_ELAPSED queries use QueryTarget.TimeElapsed
(confirmed present in Silk.NET.OpenGL 2.23.0 DLL). The 64-bit result is
read via GetQueryObject(..., out ulong) which dispatches to
glGetQueryObjectui64v; the int overload (glGetQueryObjectiv) is used for
the ResultAvailable poll, matching WorldBuilder's VisibilityManager pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Locks in Decision 2 (Opaque + ClipMap → opaque indirect; AlphaBlend +
Additive + InvAlpha → transparent indirect). Catches future refactors
that drift the partition — silent visual regression otherwise (groups
rendered in the wrong pass with the wrong blend state).
Adds public static IsOpaquePublic shim on WbDrawDispatcher; the
underlying IsOpaque stays private.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces WbDrawDispatcher's per-group glDrawElementsInstancedBaseVertexBaseInstance
loop with two glMultiDrawElementsIndirect calls (opaque + transparent).
Per-frame uploads three SSBOs:
- _instanceSsbo @ binding=0 (mat4 per instance, indexed by gl_BaseInstanceARB + gl_InstanceID)
- _batchSsbo @ binding=1 (BatchData per group, indexed by gl_DrawIDARB)
- _indirectBuffer (DrawElementsIndirectCommand[] — opaque first, transparent second)
GameWindow swaps the shader load to mesh_modern when _bindlessSupport
is non-null. Capability detection + shader load now run in the right
order (capability before TextureCache + before Shader).
Deletes the obsolete DrawGroup stub, EnsureInstanceAttribs, _instanceBuffer,
_patchedVaos. ClassifyBatches + ResolveTexture already migrated in
Task 8 to use ulong bindless handles.
BuildIndirectArrays (Task 9) wired in: _opaqueDraws + _translucentDraws
are flattened into IndirectGroupInput[], laid out via the helper into
contiguous indirect commands + parallel BatchData[]. opaqueByteOffset=0,
transparentByteOffset = opaqueCount × DrawCommandStride.
Visual verification (USER GATE) PASS: Holtburg courtyard renders
identical to N.4 — terrain, scenery, characters, NPCs all visible
without artifacts. [N.5] modern path capabilities present + mesh_modern
shader loaded log lines confirm the boot path. [WB-DIAG] hot-path
counters show healthy entity/draw activity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught:
- sizeofDEIC was a local; promoted to public const DrawCommandStride
so tests can reference it symbolically.
- BatchDataPublic layout invariant (size + field offsets) wasn't
asserted in tests. Added BatchDataPublic_LayoutMatchesPrivateBatchData
+ DrawCommandStride_MatchesStructSize tests to gate Task 10's
MemoryMarshal.Cast<BatchData, BatchDataPublic> safety.
- Plan doc updated: BatchDataPublic spec was Pack=4 (wrong — must
match private BatchData's Pack=8 for the cast to work). Implementation
was already correct; plan now matches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure CPU helper that lays out a group list into a contiguous indirect
buffer (DrawElementsIndirectCommand[]) and parallel BatchData[] —
opaque section first, transparent section second. Returns counts +
byte offset for the transparent section.
Tests cover: spec §5 walk-through layout; empty group list edge case;
ClipMap classification (treated as opaque, not transparent).
Static + public so tests can exercise without a GL context. Task 10
wires it into the rewritten Draw() method.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces uint TextureHandle (32-bit GL name) with ulong
BindlessTextureHandle (64-bit) in InstanceGroup + GroupKey + ResolveTexture
return type. Adds TextureLayer (always 0 for per-instance composites,
becomes meaningful when WB atlas is adopted in N.6).
ClassifyBatches now calls TextureCache.GetOrUpload*Bindless variants —
these return Texture2DArray-backed bindless handles (Task 3 work).
DrawGroup body throws NotImplementedException — Task 10 rewrites the
whole Draw() method to use glMultiDrawElementsIndirect, which makes
DrawGroup obsolete. CPU-only tests don't invoke DrawGroup so the build
+ test gates stay green; visual launch fails until Task 10 (intentional).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code quality review caught that BatchData uses Pack=4 but contains a
ulong field. With the current field order (TextureHandle first), offset
0 is always 8-byte aligned so std430 works. But adding a 4-byte field
before TextureHandle without bumping Pack would silently misalign the
GPU struct. Pack=8 makes the alignment requirement explicit and adds
a comment documenting expected std430 offsets.
No runtime change — current offsets (0/8/12) are identical under both
Pack values for this field order.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds DrawElementsIndirectCommand struct (20-byte layout for
glMultiDrawElementsIndirect). Replaces _instanceVbo field on
WbDrawDispatcher with three buffers: _instanceSsbo (mat4[]),
_batchSsbo (BatchData[]), _indirectBuffer (DEIC[]). Adds BindlessSupport
constructor parameter — non-null required since the dispatcher is only
constructed when WB foundation is on (which implies bindless is present
per Task 6 capability detection).
Existing Draw() method substitutes _instanceVbo -> _instanceSsbo for
compile. Behavior is temporarily wrong (SSBO bound as ArrayBuffer for
per-vertex attribs); Tasks 9-10 fully rewrite the draw loop and the
per-frame uploads to use BindBufferBase + glMultiDrawElementsIndirect.
GameWindow construction site updated to add _bindlessSupport guard and
pass it as the new last argument to the constructor. Dispatcher is only
constructed when bindless is guaranteed present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four small wins on top of the grouped-instanced refactor.
1. Drop unused animState lookup. Was a side-effect-free
_entitySpawnAdapter.GetState call per per-instance entity, made
redundant by the Issue #47 fix that trusts MeshRefs.
2. Front-to-back sort opaque groups. Squared distance from camera to
each group's first-instance translation; ascending sort. Lets the
GPU's depth test reject fragments behind closer geometry — real
win on dense scenes (Holtburg courtyard, Foundry interior).
3. Per-entity AABB frustum cull. 5m-radius AABB check per entity
before walking parts. Skips work for distant entities even when
their landblock is partially visible. Animated entities (other
characters, NPCs, monsters) bypass — they always need per-frame
work for animation regardless. Conservative radius covers typical
entity bounds; large outliers stay landblock-culled.
4. Memoize palette hash per entity. TextureCache.HashPaletteOverride
is now internal; new GetOrUploadWithPaletteOverride overload takes
a precomputed hash. The dispatcher computes it ONCE per entity and
reuses across every (part, batch) lookup, avoiding the per-batch
FNV-1a fold over SubPalettes. Trees / scenery without palette
overrides skip entirely (palHash stays 0).
Visual output unchanged; FPS up further, especially in dense scenes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three bugs surfaced and resolved during Task 26 visual verification.
1. **No-scenery + exploded characters**: WB's modern rendering path
(GL 4.3 + bindless) packs every mesh into a single global VAO/VBO/IBO
(GlobalMeshBuffer). Each batch references its slice via FirstIndex
(offset into IBO) + BaseVertex (offset into VBO). The dispatcher's
DrawElementsInstanced(indices=0) read offset 0 of the global IBO
for every entity — drawing the same first triangle from every
entity position. Switched to glDrawElementsInstancedBaseVertex(
BaseInstance) with the batch's offsets. Scenery + connected
characters now render correctly.
2. **Issue #47 character regression**: Adjustment 6 stored
AnimPartChanges on WorldEntity.PartOverrides using the raw
server-sent NewModelId (no degrade resolver applied). The
dispatcher's animState.ResolvePartGfxObj override path then
clobbered MeshRefs (which GameWindow's spawn code correctly
resolves to close-detail meshes via GfxObjDegradeResolver).
Result: humanoids drew low-detail (~14 verts/17 polys) base
meshes instead of close-detail (~32 verts/60 polys), losing
bicep / shoulder / back geometry. Fix: trust MeshRefs as the
source of truth and don't re-apply animState overrides at draw
time. AnimatedEntityState's overrides only matter for hot-swap
appearance updates (0xF625) which today rebuild MeshRefs anyway.
3. **Performance — sub-100 FPS on Holtburg**: per-entity
single-instance draws meant ~16K glDraw calls/frame plus a
64-byte glBufferSubData per call. Refactored to grouped
instanced rendering: bucket all (entity, batch) pairs by
GroupKey(Ibo, FirstIndex, BaseVertex, IndexCount, TextureHandle,
Translucency); upload all matrices in ONE BufferData call;
one glDrawElementsInstancedBaseVertexBaseInstance per group
with BaseInstance pointing at the group's slice in the shared
instance VBO. Down from ~16K to a few hundred draws/frame
(~30× fewer). Bind VAO once per frame (modern WB shares one
global VAO). Removed redundant per-draw VertexAttribPointer
(VAO captures that state).
Result: Holtburg renders correctly with characters showing full
detail; FPS climbed substantially. Two more bugs (mesh loading
+ batch.Key.SurfaceId) were fixed in the prior commit (943652d).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task 26 visual verification surfaced three bugs in the dispatcher.
Two are fixed here; the third is documented as a remaining issue.
1. WB's IncrementRefCount only bumps a usage counter — it does NOT
trigger mesh loading. Fixed in WbMeshAdapter.IncrementRefCount:
call PrepareMeshDataAsync(id, isSetup: false) on first registration.
Result auto-enqueues to _stagedMeshData (line 510 of WB's
ObjectMeshManager) which Tick() drains onto the GPU.
2. EntitySpawnAdapter never registered per-instance entity meshes
with WB. LandblockSpawnAdapter only registers atlas-tier
(ServerGuid == 0); per-instance entities fell through. Fixed by
adding optional IWbMeshAdapter constructor param + tracking unique
GfxObj ids per server-guid for IncrementRefCount on OnCreate /
DecrementRefCount on OnRemove.
3. WbDrawDispatcher.ResolveTexture used batch.SurfaceId which WB
never populates (line 1746 of ObjectMeshManager only sets
batch.Key — the TextureKey struct that has SurfaceId). Switched
to batch.Key.SurfaceId.
Plus diagnostic counters (ACDREAM_WB_DIAG=1) for entity-seen / drawn
/ mesh-missing / draws-issued counts.
Status: with these fixes the dispatcher now issues real draw calls
(~16K/frame, validated via diagnostic). However visual verification
shows characters appear "exploded" (parts spaced too far apart) and
scenery (trees/rocks/fences/buildings) does not appear. Root cause
analysis pending — Adjustment 7 in the plan documents the deferred
work. Flag stays default-off; legacy renderer remains the
production path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WbDrawDispatcher draws all entities through WB's ObjectRenderData
(VAO/VBO per GfxObj, per-batch IBO) using acdream's TextureCache for
texture resolution. Two-pass rendering (opaque+ClipMap, then
translucent) matching the existing InstancedMeshRenderer pattern.
Per-entity single-instance drawing for N.4 simplicity — true
instancing grouping deferred to N.6.
Atlas-tier entities: mesh from WB, texture from TextureCache via
batch SurfaceId. Per-instance-tier entities: AnimatedEntityState
drives part overrides + hidden-parts, palette/surface overrides
resolve through TextureCache's composite-key caches.
Side-table population (Task 23 folded in): WbMeshAdapter now takes
DatCollection and populates AcSurfaceMetadataTable on first
IncrementRefCount per GfxObj. The side-table provides TranslucencyKind
(critical for ClipMap alpha-test on vegetation) plus Luminosity,
Diffuse, SurfOpacity, NeedsUvRepeat, DisableFog for sky-pass and
lighting.
GameWindow wiring: when WbFoundationFlag is enabled, WbDrawDispatcher
draws everything and InstancedMeshRenderer is skipped. Flag-off path
is unchanged.
Matrix composition: restPose * animOverride * entityWorld, matching
the spec. Three MatrixCompositionTests verify the contract.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>