acdream/src/AcDream.App/Rendering/InstancedMeshRenderer.cs
Erik 3308cddda7 fix(movement+anim+session): clothing dedup, motion wire format, jump-skill default
Three separate fixes landed today, each addressing a specific bug the
user observed during live play:

1. NPC clothing changes by camera angle (InstancedMeshRenderer)
   - Group key was (GfxObjId) only, so every humanoid NPC using the
     same body mesh piled into one instance group; only the first
     instance's texture was used for the entire DrawInstanced batch,
     so which NPC's palette "won" changed as frustum culling and
     iteration order shuffled entries.
   - Now keyed by (GfxObjId, PaletteHash ^ SurfaceOverridesHash)
     so only compatible instances batch; each unique appearance gets
     its own draw call. Perf hit is small (humanoid NPCs each emit
     one more draw call); visually every NPC is now stable.

2. GpuWorldState dedup on respawn
   - Server re-sends CreateObject for the same guid on visibility
     refresh / landblock crossing / appearance update. AppendLiveEntity
     was blindly appending each time, so GpuWorldState accumulated
     multiple copies of the same entity, each with its own
     PaletteOverride / MeshRefs. That alone wasn't the clothing bug
     (that was #1) but it would have caused other overlap problems
     downstream.
   - Added RemoveEntityByServerGuid + WorldGameState.RemoveById;
     OnLiveEntitySpawnedLocked calls both before creating the new
     entity so respawns replace cleanly.

3. Motion wire format — run animation sync with retail observers
   - ACE's MovementData constructor only computes interpState.ForwardSpeed
     on the WalkForward/WalkBackwards branch; every other ForwardCommand
     falls into `else` and passes through WITHOUT speed set, giving
     observers speed=0. Sending RunForward directly meant retail
     clients saw us "run in place" while position drifted forward.
   - Wire: always WalkForward + HoldKey.Run for running. ACE
     auto-upgrades to RunForward with creature.GetRunRate() for
     broadcast — correct command + correct speed at observers.
   - Added per-axis FORWARD_HOLD_KEY / SIDE_STEP_HOLD_KEY /
     TURN_HOLD_KEY so every active axis carries HoldKey.Run when
     running (matches holtburger's build_motion_state_raw_motion_state).
   - Added LocalAnimationCommand to MovementResult so our own
     client still plays the RunForward cycle locally while the wire
     stays WalkForward. Wire vs. local animation command are now
     decoupled.
   - Walk-backward wire command changed from WalkForward@-0.65 to
     WalkBackward@1.0 (holtburger pattern).
   - Strafe speed changed from 0.5 to 1.0 on wire AND local physics
     (matches retail sidestep pace).

4. Jump height default + env-var tuning
   - Default jumpSkill bumped from 100 → 200 (jump ≈ 3m at full
     charge, closer to retail feel for a mid-level character).
   - ACDREAM_RUN_SKILL and ACDREAM_JUMP_SKILL env vars now override
     the defaults so the user can tune per-character until we parse
     PlayerDescription and plumb real skill values through.

5. JustLanded signal on MovementResult
   - Tracks airborne→grounded transition so future animation code
     can fire the landing cycle when we land. Just a bool flag for
     now — no consumer yet (the proper action-queue path will use it).

Not in this commit: jump animation itself. An earlier attempt to
SetCycle(Jump=0x2500003b) fed an Action-type motion into the SubState
cycle resolver, which produced a "torso" mis-render. Reverted. The
proper fix is porting the retail motion action-queue semantics into
AnimationSequencer — see docs/research/deepdives/r03-motion-animation.md
for the spec. That's the next session's work.

470 tests pass, build clean.
2026-04-18 15:01:32 +02:00

534 lines
23 KiB
C#
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

// src/AcDream.App/Rendering/InstancedMeshRenderer.cs
//
// True instanced rendering for static-object meshes.
// Groups entities by GfxObjId. All instance model matrices are written into
// a single shared instance VBO once per frame. Each sub-mesh is drawn with
// DrawElementsInstanced — one GL draw call per (GfxObj × sub-mesh) instead
// of one per entity. For a scene with N unique GfxObjs and M total entities
// this reduces draw calls from M*subMeshes to N*subMeshes.
//
// Matrix layout:
// System.Numerics.Matrix4x4 is row-major. Written to the float[] buffer in
// natural memory order (M11..M44). The GLSL shader reads 4 vec4 attributes
// (aInstanceRow0-3) and constructs mat4(row0, row1, row2, row3). Because
// GLSL mat4() takes column vectors, the rows of the C# matrix become the
// columns of the GLSL mat4 — which is the same transpose that UniformMatrix4
// with transpose=false produces. Visual result is identical to the old
// SetMatrix4("uModel", ...) path.
//
// Architecture note: public API matches StaticMeshRenderer so GameWindow only
// needs to update the shader and uniform setup at the call sites.
using System.Numerics;
using System.Runtime.InteropServices;
using AcDream.Core.Meshing;
using AcDream.Core.Terrain;
using AcDream.Core.World;
using Silk.NET.OpenGL;
namespace AcDream.App.Rendering;
public sealed unsafe class InstancedMeshRenderer : IDisposable
{
private readonly GL _gl;
private readonly Shader _shader;
private readonly TextureCache _textures;
// One GPU bundle per unique GfxObj id. Each GfxObj can have multiple sub-meshes.
private readonly Dictionary<uint, List<SubMeshGpu>> _gpuByGfxObj = new();
// Shared instance VBO — filled every frame with all instance model matrices.
private readonly uint _instanceVbo;
// Per-frame scratch: reused float buffer for instance matrix data.
// 16 floats per mat4. Grown on demand; never shrunk.
private float[] _instanceBuffer = new float[256 * 16]; // start at 256 instances
// ── Instance grouping scratch ─────────────────────────────────────────────
//
// Reused every frame to avoid per-frame allocation.
//
// **Group key = (GfxObjId, PaletteOverrideHash, SurfaceOverridesHash).**
//
// An earlier implementation grouped on <c>GfxObjId</c> alone and resolved
// the per-sub-mesh texture from the first instance in the group — which
// is fine for scenery where every tree shares the same palette, but
// utterly broken for NPCs: every humanoid uses the same base body
// GfxObjs and they all piled into one group, so the first NPC's palette
// was used for every NPC in the frame. Frustum culling + iteration
// order meant that "first NPC" changed as the camera turned — producing
// the "NPC clothing changes when I turn" symptom.
//
// Now we also key by the entity's PaletteOverride + per-MeshRef
// SurfaceOverrides signature so only entities that decode to the
// SAME texture for every sub-mesh can share a batch. Entities with
// unique appearance fall to single-instance groups (still correct,
// marginally slower than true instancing).
private readonly Dictionary<GroupKey, InstanceGroup> _groups = new();
private readonly record struct GroupKey(uint GfxObjId, ulong TextureSignature);
public InstancedMeshRenderer(GL gl, Shader shader, TextureCache textures)
{
_gl = gl;
_shader = shader;
_textures = textures;
_instanceVbo = _gl.GenBuffer();
}
// ── Upload ────────────────────────────────────────────────────────────────
public void EnsureUploaded(uint gfxObjId, IReadOnlyList<GfxObjSubMesh> subMeshes)
{
if (_gpuByGfxObj.ContainsKey(gfxObjId))
return;
var list = new List<SubMeshGpu>(subMeshes.Count);
foreach (var sm in subMeshes)
list.Add(UploadSubMesh(sm));
_gpuByGfxObj[gfxObjId] = list;
}
private SubMeshGpu UploadSubMesh(GfxObjSubMesh sm)
{
uint vao = _gl.GenVertexArray();
_gl.BindVertexArray(vao);
// ── Vertex buffer (positions, normals, UVs) ───────────────────────────
uint vbo = _gl.GenBuffer();
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, vbo);
fixed (void* p = sm.Vertices)
_gl.BufferData(BufferTargetARB.ArrayBuffer,
(nuint)(sm.Vertices.Length * sizeof(Vertex)), p, BufferUsageARB.StaticDraw);
uint stride = (uint)sizeof(Vertex);
_gl.EnableVertexAttribArray(0);
_gl.VertexAttribPointer(0, 3, VertexAttribPointerType.Float, false, stride, (void*)0);
_gl.EnableVertexAttribArray(1);
_gl.VertexAttribPointer(1, 3, VertexAttribPointerType.Float, false, stride, (void*)(3 * sizeof(float)));
_gl.EnableVertexAttribArray(2);
_gl.VertexAttribPointer(2, 2, VertexAttribPointerType.Float, false, stride, (void*)(6 * sizeof(float)));
// Note: location 3 (uint TerrainLayer) is NOT used by mesh_instanced.vert;
// that slot is reserved for per-instance mat4 row 0 from the instance VBO.
// ── Index buffer ──────────────────────────────────────────────────────
uint ebo = _gl.GenBuffer();
_gl.BindBuffer(BufferTargetARB.ElementArrayBuffer, ebo);
fixed (void* p = sm.Indices)
_gl.BufferData(BufferTargetARB.ElementArrayBuffer,
(nuint)(sm.Indices.Length * sizeof(uint)), p, BufferUsageARB.StaticDraw);
// ── Per-instance model matrix (locations 3-6) ─────────────────────────
// Bind the shared instance VBO. The VAO captures this binding at each
// attribute location. At draw time we re-call VertexAttribPointer with
// the per-group byte offset (to address different groups in the VBO
// without DrawElementsInstancedBaseInstance).
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
// mat4 = 4 × vec4, stride = 64 bytes, divisor = 1 (advance once per instance)
for (uint row = 0; row < 4; row++)
{
uint loc = 3 + row;
_gl.EnableVertexAttribArray(loc);
_gl.VertexAttribPointer(loc, 4, VertexAttribPointerType.Float, false, 64, (void*)(row * 16));
_gl.VertexAttribDivisor(loc, 1);
}
_gl.BindVertexArray(0);
return new SubMeshGpu
{
Vao = vao,
Vbo = vbo,
Ebo = ebo,
IndexCount = sm.Indices.Length,
SurfaceId = sm.SurfaceId,
Translucency = sm.Translucency,
};
}
// ── Draw ──────────────────────────────────────────────────────────────────
public void Draw(ICamera camera,
IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList<WorldEntity> Entities)> landblockEntries,
FrustumPlanes? frustum = null,
uint? neverCullLandblockId = null,
HashSet<uint>? visibleCellIds = null)
{
_shader.Use();
var vp = camera.View * camera.Projection;
_shader.SetMatrix4("uViewProjection", vp);
// Lighting uniforms matching ACME StaticObject.vert.
var lightDir = Vector3.Normalize(new Vector3(0.5f, 0.3f, -0.3f));
_shader.SetVec3("uLightDirection", lightDir);
_shader.SetFloat("uAmbientIntensity", 0.45f);
// ── Collect and group instances ───────────────────────────────────────
CollectGroups(landblockEntries, frustum, neverCullLandblockId, visibleCellIds);
// ── Build and upload the instance buffer ──────────────────────────────
// Count total instances.
int totalInstances = 0;
foreach (var grp in _groups.Values)
totalInstances += grp.Count;
// Grow the scratch buffer if needed.
int needed = totalInstances * 16;
if (_instanceBuffer.Length < needed)
_instanceBuffer = new float[needed + 256 * 16]; // extra headroom
// Write all groups contiguously. Record each group's starting offset
// (in units of instances, not bytes) so we can address them at draw time.
int instanceOffset = 0;
foreach (var grp in _groups.Values)
{
grp.BufferOffset = instanceOffset;
foreach (ref readonly var inst in CollectionsMarshal.AsSpan(grp.Entries))
WriteMatrix(_instanceBuffer, instanceOffset++ * 16, inst.Model);
}
// Upload all instance data in a single DynamicDraw call.
if (totalInstances > 0)
{
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
fixed (void* p = _instanceBuffer)
_gl.BufferData(BufferTargetARB.ArrayBuffer,
(nuint)(totalInstances * 16 * sizeof(float)), p, BufferUsageARB.DynamicDraw);
}
// ── Pass 1: Opaque + ClipMap ──────────────────────────────────────────
foreach (var (key, grp) in _groups)
{
if (!_gpuByGfxObj.TryGetValue(key.GfxObjId, out var subMeshes))
continue;
bool hasOpaqueSubMesh = false;
foreach (var sub in subMeshes)
{
if (sub.Translucency == TranslucencyKind.Opaque ||
sub.Translucency == TranslucencyKind.ClipMap)
{
hasOpaqueSubMesh = true;
break;
}
}
if (!hasOpaqueSubMesh) continue;
// For this group, instance data starts at grp.BufferOffset in the VBO.
// We need to tell the VAO to read from that offset.
uint byteOffset = (uint)(grp.BufferOffset * 64); // 64 bytes per mat4
foreach (var sub in subMeshes)
{
if (sub.Translucency != TranslucencyKind.Opaque &&
sub.Translucency != TranslucencyKind.ClipMap)
continue;
_shader.SetInt("uTranslucencyKind", (int)sub.Translucency);
// Bind VAO + re-point instance attributes to the group's slice
// in the shared VBO. This updates the VAO's stored offset for
// locations 3-6 without touching the vertex or index bindings.
_gl.BindVertexArray(sub.Vao);
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
for (uint row = 0; row < 4; row++)
{
_gl.VertexAttribPointer(3 + row, 4, VertexAttribPointerType.Float,
false, 64, (void*)(byteOffset + row * 16));
}
// Resolve texture from the first instance (all instances in this
// group share the same GfxObj so they have compatible overrides
// only in the degenerate case of mixed-palette entities using the
// same GfxObj — rare enough to accept the approximation here).
if (grp.Count == 0) continue;
var firstEntry = grp.Entries[0];
uint tex = ResolveTex(firstEntry.Entity, firstEntry.MeshRef, sub);
_gl.ActiveTexture(TextureUnit.Texture0);
_gl.BindTexture(TextureTarget.Texture2D, tex);
_gl.DrawElementsInstanced(PrimitiveType.Triangles,
(uint)sub.IndexCount,
DrawElementsType.UnsignedInt,
(void*)0,
(uint)grp.Count);
}
}
// ── Pass 2: Translucent (AlphaBlend, Additive, InvAlpha) ─────────────
_gl.Enable(EnableCap.Blend);
_gl.DepthMask(false);
_gl.Enable(EnableCap.CullFace);
_gl.CullFace(TriangleFace.Back);
_gl.FrontFace(FrontFaceDirection.Ccw);
foreach (var (key, grp) in _groups)
{
if (!_gpuByGfxObj.TryGetValue(key.GfxObjId, out var subMeshes))
continue;
bool hasTranslucentSubMesh = false;
foreach (var sub in subMeshes)
{
if (sub.Translucency != TranslucencyKind.Opaque &&
sub.Translucency != TranslucencyKind.ClipMap)
{
hasTranslucentSubMesh = true;
break;
}
}
if (!hasTranslucentSubMesh) continue;
uint byteOffset = (uint)(grp.BufferOffset * 64);
foreach (var sub in subMeshes)
{
if (sub.Translucency == TranslucencyKind.Opaque ||
sub.Translucency == TranslucencyKind.ClipMap)
continue;
switch (sub.Translucency)
{
case TranslucencyKind.Additive:
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.One);
break;
case TranslucencyKind.InvAlpha:
_gl.BlendFunc(BlendingFactor.OneMinusSrcAlpha, BlendingFactor.SrcAlpha);
break;
default: // AlphaBlend
_gl.BlendFunc(BlendingFactor.SrcAlpha, BlendingFactor.OneMinusSrcAlpha);
break;
}
_shader.SetInt("uTranslucencyKind", (int)sub.Translucency);
_gl.BindVertexArray(sub.Vao);
_gl.BindBuffer(BufferTargetARB.ArrayBuffer, _instanceVbo);
for (uint row = 0; row < 4; row++)
{
_gl.VertexAttribPointer(3 + row, 4, VertexAttribPointerType.Float,
false, 64, (void*)(byteOffset + row * 16));
}
if (grp.Count == 0) continue;
var firstEntry = grp.Entries[0];
uint tex = ResolveTex(firstEntry.Entity, firstEntry.MeshRef, sub);
_gl.ActiveTexture(TextureUnit.Texture0);
_gl.BindTexture(TextureTarget.Texture2D, tex);
_gl.DrawElementsInstanced(PrimitiveType.Triangles,
(uint)sub.IndexCount,
DrawElementsType.UnsignedInt,
(void*)0,
(uint)grp.Count);
}
}
// Restore default GL state.
_gl.DepthMask(true);
_gl.Disable(EnableCap.Blend);
_gl.Disable(EnableCap.CullFace);
_gl.BindVertexArray(0);
}
// ── Grouping ──────────────────────────────────────────────────────────────
/// <summary>
/// Iterates all visible landblock entries and groups every (entity, meshRef)
/// pair by GfxObjId. Clears previous frame's groups before filling.
/// </summary>
private void CollectGroups(
IEnumerable<(uint LandblockId, Vector3 AabbMin, Vector3 AabbMax, IReadOnlyList<WorldEntity> Entities)> landblockEntries,
FrustumPlanes? frustum,
uint? neverCullLandblockId,
HashSet<uint>? visibleCellIds)
{
foreach (var grp in _groups.Values)
grp.Entries.Clear();
foreach (var entry in landblockEntries)
{
if (frustum is not null &&
entry.LandblockId != neverCullLandblockId &&
!FrustumCuller.IsAabbVisible(frustum.Value, entry.AabbMin, entry.AabbMax))
continue;
foreach (var entity in entry.Entities)
{
if (entity.MeshRefs.Count == 0)
continue;
// Step 4: portal visibility filter. If we have a visible cell set,
// skip interior entities whose parent cell isn't visible.
// visibleCellIds == null means camera is outdoors → show all interiors.
if (entity.ParentCellId.HasValue && visibleCellIds is not null
&& !visibleCellIds.Contains(entity.ParentCellId.Value))
continue;
var entityRoot =
Matrix4x4.CreateFromQuaternion(entity.Rotation) *
Matrix4x4.CreateTranslation(entity.Position);
// Hash the entity's PaletteOverride once — shared by every
// MeshRef on this entity, so we compute it outside the loop.
ulong palHash = HashPaletteOverride(entity.PaletteOverride);
foreach (var meshRef in entity.MeshRefs)
{
if (!_gpuByGfxObj.ContainsKey(meshRef.GfxObjId))
continue;
var model = meshRef.PartTransform * entityRoot;
// Texture signature = palette hash ^ surface-overrides hash.
// Two instances can share a batch only when their ResolveTex
// would return identical handles for every sub-mesh — that
// means identical palette AND identical surface overrides.
ulong surfHash = HashSurfaceOverrides(meshRef.SurfaceOverrides);
ulong texSig = palHash ^ surfHash;
var key = new GroupKey(meshRef.GfxObjId, texSig);
if (!_groups.TryGetValue(key, out var group))
{
group = new InstanceGroup();
_groups[key] = group;
}
group.Entries.Add(new InstanceEntry(model, entity, meshRef));
}
}
}
}
private static ulong HashPaletteOverride(AcDream.Core.World.PaletteOverride? p)
{
if (p is null) return 0UL;
ulong h = 0xCBF29CE484222325UL;
const ulong prime = 0x100000001B3UL;
h = (h ^ p.BasePaletteId) * prime;
foreach (var sp in p.SubPalettes)
{
h = (h ^ sp.SubPaletteId) * prime;
h = (h ^ sp.Offset) * prime;
h = (h ^ sp.Length) * prime;
}
return h;
}
/// <summary>
/// Order-independent hash of a SurfaceOverrides dictionary. XOR of each
/// (key, value) pair keeps the result stable regardless of Dictionary
/// iteration order, so two instances whose override maps contain the
/// same pairs will hash identically.
/// </summary>
private static ulong HashSurfaceOverrides(IReadOnlyDictionary<uint, uint>? overrides)
{
if (overrides is null || overrides.Count == 0) return 0UL;
ulong acc = 0UL;
foreach (var kvp in overrides)
{
ulong pair = ((ulong)kvp.Key << 32) | kvp.Value;
acc ^= pair;
}
// Fold with a prime so the zero case doesn't collide with "empty".
return (acc ^ 0xCBF29CE484222325UL) * 0x100000001B3UL;
}
// ── Matrix write ──────────────────────────────────────────────────────────
/// <summary>
/// Writes a System.Numerics Matrix4x4 into <paramref name="buf"/> starting
/// at <paramref name="offset"/> as 16 consecutive floats in row-major order
/// (the C# natural memory layout). The GLSL shader reads each 4-float row
/// as a column of the mat4 — identical to what UniformMatrix4(transpose=false)
/// produces for the uniform path.
/// </summary>
private static void WriteMatrix(float[] buf, int offset, in Matrix4x4 m)
{
buf[offset + 0] = m.M11; buf[offset + 1] = m.M12; buf[offset + 2] = m.M13; buf[offset + 3] = m.M14;
buf[offset + 4] = m.M21; buf[offset + 5] = m.M22; buf[offset + 6] = m.M23; buf[offset + 7] = m.M24;
buf[offset + 8] = m.M31; buf[offset + 9] = m.M32; buf[offset + 10] = m.M33; buf[offset + 11] = m.M34;
buf[offset + 12] = m.M41; buf[offset + 13] = m.M42; buf[offset + 14] = m.M43; buf[offset + 15] = m.M44;
}
// ── Texture resolution ────────────────────────────────────────────────────
private uint ResolveTex(WorldEntity entity, MeshRef meshRef, SubMeshGpu sub)
{
uint overrideOrigTex = 0;
bool hasOrigTexOverride = meshRef.SurfaceOverrides is not null
&& meshRef.SurfaceOverrides.TryGetValue(sub.SurfaceId, out overrideOrigTex);
uint? origTexOverride = hasOrigTexOverride ? overrideOrigTex : (uint?)null;
if (entity.PaletteOverride is not null)
{
return _textures.GetOrUploadWithPaletteOverride(
sub.SurfaceId, origTexOverride, entity.PaletteOverride);
}
else if (hasOrigTexOverride)
{
return _textures.GetOrUploadWithOrigTextureOverride(sub.SurfaceId, overrideOrigTex);
}
else
{
return _textures.GetOrUpload(sub.SurfaceId);
}
}
// ── Disposal ──────────────────────────────────────────────────────────────
public void Dispose()
{
foreach (var subs in _gpuByGfxObj.Values)
{
foreach (var sub in subs)
{
_gl.DeleteBuffer(sub.Vbo);
_gl.DeleteBuffer(sub.Ebo);
_gl.DeleteVertexArray(sub.Vao);
}
}
_gl.DeleteBuffer(_instanceVbo);
_gpuByGfxObj.Clear();
_groups.Clear();
}
// ── Private types ─────────────────────────────────────────────────────────
private sealed class SubMeshGpu
{
public uint Vao;
public uint Vbo;
public uint Ebo;
public int IndexCount;
public uint SurfaceId;
public TranslucencyKind Translucency;
}
/// <summary>
/// All instances of one GfxObj for this frame, plus their starting offset
/// in the shared instance VBO (in units of instances, not bytes).
/// </summary>
private sealed class InstanceGroup
{
public readonly List<InstanceEntry> Entries = new();
public int BufferOffset;
public int Count => Entries.Count;
}
private readonly struct InstanceEntry
{
public readonly Matrix4x4 Model;
public readonly WorldEntity Entity;
public readonly MeshRef MeshRef;
public InstanceEntry(Matrix4x4 model, WorldEntity entity, MeshRef meshRef)
{
Model = model;
Entity = entity;
MeshRef = meshRef;
}
}
}