feat(render): Phase U.3 — GPU clip-plane gate (gl_ClipDistance), no-clip default

Adds the GPU mechanism to clip drawing to a per-cell screen-space convex
region via gl_ClipDistance, consumed by the mesh + terrain vertex shaders.
This is the MECHANISM only — every instance defaults to slot 0 (no-clip /
pass-all) and terrain to count 0, so the running game renders IDENTICALLY to
pre-U.3 (verified: offline launch compiles both shaders and reaches steady
state; no GL errors). U.4 populates real clip data from portal visibility.

Binding contract (define once, both sides obey):
- mesh_modern.vert: SSBO binding=2 CellClip[] (shared per-frame regions, slot 0
  reserved no-clip) + SSBO binding=3 uint[] per-instance slot, indexed by the
  IDENTICAL gl_BaseInstanceARB+gl_InstanceID used for binding=0. binding=0/1
  untouched.
- terrain_modern.vert: UBO binding=2 TerrainClip { int count; vec4 planes[8]; }
  for the single OutsideView region (UBO namespace; SceneLighting is UBO
  binding=1, so binding=2 is free and does not collide with the mesh SSBO
  binding=2). count 0 = ungated.
- Both redeclare out gl_PerVertex { vec4 gl_Position; float gl_ClipDistance[8]; }
  and set unused planes (i >= count) to +1.0 so they pass everything.
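
As a CPU-side reference for the rule in the last bullet, the sketch below mirrors how a vertex shader could fill gl_ClipDistance: active planes get dot(plane, gl_Position), unused planes (i >= count) get +1.0. The names ClipDistanceSketch, Evaluate, and IsInside are hypothetical and not part of this commit; the dot-product form is taken from the AppendSlot documentation in ClipFrame.cs. Real GL clipping works per primitive with interpolated distances, so the per-vertex IsInside check is only a simplification.

```csharp
using System;
using System.Numerics;

// Hypothetical CPU mirror of the shader-side clip-distance rule (illustration only).
public static class ClipDistanceSketch
{
    public const int MaxPlanes = 8;

    // Active planes (i < count) produce dot(plane, clipPos);
    // unused planes produce +1.0 so they never clip anything.
    public static float[] Evaluate(ReadOnlySpan<Vector4> planes, int count, Vector4 clipPos)
    {
        var distances = new float[MaxPlanes];
        int active = Math.Clamp(count, 0, Math.Min(planes.Length, MaxPlanes));
        for (int i = 0; i < MaxPlanes; i++)
            distances[i] = i < active ? Vector4.Dot(planes[i], clipPos) : 1.0f;
        return distances;
    }

    // A vertex is kept when every clip distance is non-negative.
    public static bool IsInside(float[] distances)
    {
        foreach (float d in distances)
            if (d < 0f) return false;
        return true;
    }
}
```

With count 0 every entry is +1.0, which is why slot 0 and a terrain count of 0 leave rendering ungated.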

CellClip std430 layout (144 bytes/slot): count@0, 3 pad uints@4/8/12,
planes[8]@16 (vec4 stride 16). Terrain UBO std140: count@0 (padded to 16),
planes[8]@16 → 144 bytes. Verified by ClipFrameLayoutTests (8 new tests).
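
The offsets above follow from simple arithmetic, sketched here as a small helper. ClipLayoutSketch and its members are hypothetical names used only for illustration; the constants restate the numbers in this message (16-byte header, 16-byte vec4 stride, 8 planes).

```csharp
// Hypothetical helper restating the byte layout described in the commit message.
public static class ClipLayoutSketch
{
    public const int Vec4Bytes = 16;
    // std430: count + 3 pad uints; std140: int count padded to 16. Both are 16 bytes.
    public const int HeaderBytes = 16;
    public const int MaxPlanes = 8;
    public const int StrideBytes = HeaderBytes + MaxPlanes * Vec4Bytes; // 144

    // Byte offset of the start of a slot in the mesh SSBO.
    public static int SlotOffset(int slot) => slot * StrideBytes;

    // Byte offset of planes[plane] inside the given slot.
    public static int PlaneOffset(int slot, int plane)
        => SlotOffset(slot) + HeaderBytes + plane * Vec4Bytes;
}
```

For example, plane 7 of slot 0 starts at 16 + 7 * 16 = 128 and ends at 144, and plane 3 of slot 2 starts at 2 * 144 + 16 + 3 * 16 = 352.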

Pieces:
- ClipFrame: per-frame container + uploader for the SHARED clip data (binding=2
  SSBO + terrain UBO). NoClip() = slot 0 + terrain count 0. AppendSlot /
  SetTerrainClip pack std430/std140 bytes for U.4. UploadShared binds both.
- WbDrawDispatcher + EnvCellRenderer: each owns its binding=3 zero buffer
  (all-zeros sized to its instance count → slot 0), re-binds binding=2 from the
  shared ClipFrame id (or an internal no-clip fallback if unwired) before MDI.
  gl_ClipDistance is per-vertex, so the single glMultiDrawElementsIndirect per
  group is preserved — no draw splitting.
- TerrainModernRenderer: binds the terrain clip UBO (shared or no-clip fallback)
  before its draw.
- GameWindow: glEnable(GL_CLIP_DISTANCE0..7) once at init (unused planes pass-all
  so always-on avoids per-draw thrash); per frame builds ClipFrame.NoClip(),
  UploadShared, and hands the buffer ids to the three renderers (tiny diff; U.4
  swaps NoClip() for the real portal-visibility frame).

Gate: dotnet build green; App suite 134/134; offline launch confirms both
shaders compile + link with no GL errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Erik 2026-05-30 17:27:30 +02:00
parent 0b125830fe
commit bf2e559369
8 changed files with 797 additions and 1 deletions

@@ -0,0 +1,276 @@
// ClipFrame.cs
//
// Phase U.3: the GPU-side container + uploader for the SHARED per-frame clip
// data consumed by mesh_modern.vert (SSBO binding=2) and terrain_modern.vert
// (UBO binding=2). This is the "shared" half of the U.3 clip mechanism; the
// per-instance slot index buffer (SSBO binding=3) is PER-RENDERER and owned by
// each renderer (WbDrawDispatcher / EnvCellRenderer), parallel to its instance
// buffer — it is NOT here.
//
// === The contract (both shader sides obey) ===================================
//   binding=2 mesh SSBO holds an array of CellClip, one per "slot":
//     struct CellClip { uint count; uint _p0; uint _p1; uint _p2; vec4 planes[8]; };
//   std430 layout: count at byte 0, three pad uints at 4/8/12, planes[8] at 16
//   (vec4 stride 16) → 144 bytes per slot. Slot 0 is RESERVED = no-clip (count 0).
//   binding=2 terrain UBO holds the single OutsideView region:
//     layout(std140) { int uTerrainClipCount; vec4 uTerrainClipPlanes[8]; };
//   std140 layout: count at byte 0 (padded to 16), planes[8] at 16 → 144 bytes.
//
// In U.3 a ClipFrame is built via NoClip(): one slot (slot 0, count 0) and a
// terrain count of 0. Everything renders exactly as before. U.4 populates real
// slots from a PortalVisibilityFrame (one CellClip per visible cell) and sets the
// terrain OutsideView planes, then points each renderer's per-instance slot
// buffer at the right slots.
//
// Pure CPU byte-packing + a thin GL upload. NO GL types appear except in
// UploadShared. The byte layout is asserted by ClipFrameLayoutTests so a silent
// std430/std140 drift can't reach the GPU.

using System;
using System.Collections.Generic;
using System.Numerics;
using Silk.NET.OpenGL;

namespace AcDream.App.Rendering;

/// <summary>
/// Per-frame container + uploader for the SHARED clip data: the binding=2 mesh
/// SSBO (one <c>CellClip</c> per slot, slot 0 reserved no-clip) and the binding=2
/// terrain UBO (the single OutsideView region). See the file header for the exact
/// std430 / std140 byte layout. Per-instance slot buffers (binding=3) are owned by
/// each renderer, not here.
/// </summary>
public sealed class ClipFrame : IDisposable
{
    // ---- Layout constants (mirror mesh_modern.vert + terrain_modern.vert) ----

    /// <summary>Max planes per clip region; matches the shader's <c>planes[8]</c>
    /// and GL's guaranteed <c>GL_MAX_CLIP_DISTANCES &gt;= 8</c>.</summary>
    public const int MaxPlanes = 8;

    /// <summary>std430 stride of one <c>CellClip</c>: 16 (count + 3 pad uints) +
    /// 8 × 16 (vec4 planes) = 144 bytes.</summary>
    public const int CellClipStrideBytes = 16 + MaxPlanes * 16; // 144

    /// <summary>Byte offset of <c>planes[0]</c> within a <c>CellClip</c> (after the
    /// count + 3 pad uints).</summary>
    public const int CellClipPlanesOffset = 16;

    /// <summary>std140 size of the terrain UBO block: int count padded to 16, then
    /// 8 × 16 (vec4 planes) = 144 bytes. Same number as the SSBO stride by
    /// coincidence of the 16-byte vec4 rule, but a DIFFERENT layout family.</summary>
    public const int TerrainUboBytes = 16 + MaxPlanes * 16; // 144

    /// <summary>SSBO binding index for the shared per-cell clip regions
    /// (mesh_modern.vert binding=2).</summary>
    public const uint MeshClipSsboBinding = 2;

    /// <summary>UBO binding index for the terrain OutsideView clip region
    /// (terrain_modern.vert binding=2). UBO namespace, distinct from the SSBO
    /// binding=2 above.</summary>
    public const uint TerrainClipUboBinding = 2;

    // ---- CPU-side state ------------------------------------------------------

    // Packed std430 bytes for clipRegions[]. Always holds at least slot 0.
    private byte[] _regionBytes;
    private int _slotCount;

    // Packed std140 bytes for the terrain UBO (always TerrainUboBytes long).
    private readonly byte[] _terrainBytes = new byte[TerrainUboBytes];

    // ---- GL-side state (lazily created on first UploadShared) ----------------
    private uint _regionSsbo;
    private uint _terrainUbo;
    private bool _glInitialized;
    private bool _disposed;

    private ClipFrame(byte[] regionBytes, int slotCount)
    {
        _regionBytes = regionBytes;
        _slotCount = slotCount;
        // Terrain defaults to count 0 (ungated). _terrainBytes is already all
        // zeros, which encodes count=0 + zeroed (unused) planes.
    }

    /// <summary>
    /// The U.3 default frame: exactly slot 0 (no-clip, count 0) and a terrain
    /// count of 0. The whole scene renders ungated, identical to pre-U.3. U.4
    /// replaces this with a frame built from real portal visibility.
    /// </summary>
    public static ClipFrame NoClip()
    {
        // One slot, all zeros: count=0 ⇒ shader passes every plane.
        var bytes = new byte[CellClipStrideBytes];
        return new ClipFrame(bytes, slotCount: 1);
    }

    /// <summary>Number of clip slots currently packed (always &gt;= 1; slot 0 is
    /// the reserved no-clip slot).</summary>
    public int SlotCount => _slotCount;

    /// <summary>The shared mesh-clip SSBO id, or 0 before the first
    /// <see cref="UploadShared"/>. Renderers may bind this directly if they don't
    /// receive it via a parameter; <see cref="UploadShared"/> already binds it to
    /// <see cref="MeshClipSsboBinding"/>.</summary>
    public uint RegionSsbo => _regionSsbo;

    /// <summary>The terrain-clip UBO id, or 0 before the first
    /// <see cref="UploadShared"/>. Handed to <see cref="TerrainModernRenderer"/>
    /// so it can re-bind binding=2 (UBO namespace) before its draw.</summary>
    public uint TerrainUbo => _terrainUbo;

    /// <summary>
    /// Append one clip region (becomes the next slot index) from a
    /// <see cref="ClipPlaneSet"/>. Only the convex-plane case is supported in
    /// U.3: <c>Count &gt; 0</c> packs that many planes; <c>Count == 0</c> packs a
    /// no-clip region (pass-all). The scissor / nothing-visible fallbacks that
    /// <see cref="ClipPlaneSet"/> can carry are deferred to U.4 (which will draw
    /// the AABB box or skip the cell on the CPU side, not via this slot). Returns
    /// the new slot's index.
    /// </summary>
    public int AppendSlot(ClipPlaneSet set)
    {
        int count = Math.Min(set.Count, MaxPlanes);
        if (count == 0)
            return AppendSlot(ReadOnlySpan<Vector4>.Empty);

        Span<Vector4> planes = stackalloc Vector4[count];
        for (int i = 0; i < count; i++)
            planes[i] = set.Planes[i];
        return AppendSlot(planes);
    }

    /// <summary>
    /// Append one clip region from a raw plane list. <paramref name="planes"/>
    /// length 0 packs a no-clip (pass-all) region; otherwise up to
    /// <see cref="MaxPlanes"/> planes are packed (extras ignored). Each plane is
    /// <c>(nx, ny, 0, dw)</c> in clip space; a clip-space vertex is inside iff
    /// <c>dot(plane, gl_Position) &gt;= 0</c> for every plane. Returns the new
    /// slot index.
    /// </summary>
    public int AppendSlot(ReadOnlySpan<Vector4> planes)
    {
        int count = Math.Min(planes.Length, MaxPlanes);
        int slot = _slotCount;
        int byteOffset = slot * CellClipStrideBytes;
        EnsureRegionCapacity(byteOffset + CellClipStrideBytes);

        // count (uint) at byteOffset; the 3 pad uints stay zero.
        WriteUInt(_regionBytes, byteOffset, (uint)count);
        for (int i = 0; i < count; i++)
        {
            int po = byteOffset + CellClipPlanesOffset + i * 16;
            WriteVec4(_regionBytes, po, planes[i]);
        }

        _slotCount++;
        return slot;
    }

    /// <summary>
    /// Set the terrain OutsideView clip region (the single region the terrain
    /// shader gates against). <paramref name="planes"/> length 0 ungates terrain
    /// (count 0). U.3 callers never touch this; <see cref="NoClip"/> leaves it
    /// at count 0. U.4 calls it with the OutsideView planes.
    /// </summary>
    public void SetTerrainClip(ReadOnlySpan<Vector4> planes)
    {
        // std140: the int count is padded to 16 bytes, so planes[0] starts at 16.
        // This equals CellClipPlanesOffset numerically, but the UBO uses a
        // different layout family, so the offset is named separately here.
        const int terrainPlanesOffset = 16;

        int count = Math.Min(planes.Length, MaxPlanes);
        Array.Clear(_terrainBytes);
        WriteInt(_terrainBytes, 0, count);
        for (int i = 0; i < count; i++)
            WriteVec4(_terrainBytes, terrainPlanesOffset + i * 16, planes[i]);
    }

    // GL instance captured on the first UploadShared so Dispose can delete the
    // buffers it created. Null until then.
    private GL? _gl;

    /// <summary>
    /// Upload the shared mesh-clip SSBO (binding=2) and the terrain-clip UBO
    /// (binding=2, UBO namespace) and bind both to their binding points.
    /// Intended to be called once per frame; calling it again re-uploads the
    /// same bytes. Creates the GL buffers lazily on first call.
    /// </summary>
    public unsafe void UploadShared(GL gl)
    {
        ArgumentNullException.ThrowIfNull(gl);
        ObjectDisposedException.ThrowIf(_disposed, this);

        if (!_glInitialized)
        {
            _gl = gl;
            _regionSsbo = gl.GenBuffer();
            _terrainUbo = gl.GenBuffer();
            _glInitialized = true;
        }

        int regionByteCount = _slotCount * CellClipStrideBytes;
        gl.BindBuffer(BufferTargetARB.ShaderStorageBuffer, _regionSsbo);
        fixed (byte* p = _regionBytes)
        {
            gl.BufferData(BufferTargetARB.ShaderStorageBuffer,
                (nuint)regionByteCount, p, BufferUsageARB.DynamicDraw);
        }
        gl.BindBufferBase(BufferTargetARB.ShaderStorageBuffer, MeshClipSsboBinding, _regionSsbo);

        gl.BindBuffer(BufferTargetARB.UniformBuffer, _terrainUbo);
        fixed (byte* p = _terrainBytes)
        {
            gl.BufferData(BufferTargetARB.UniformBuffer,
                (nuint)TerrainUboBytes, p, BufferUsageARB.DynamicDraw);
        }
        gl.BindBufferBase(BufferTargetARB.UniformBuffer, TerrainClipUboBinding, _terrainUbo);
    }

    /// <summary>
    /// Deletes the GL buffers created by <see cref="UploadShared"/>, if any.
    /// Call on the thread that owns the GL context, after the frame's draws
    /// that read these buffers have been issued.
    /// </summary>
    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;

        // ClipFrame is a per-frame transient in U.3 (NoClip() each frame), so
        // buffers left alive here would accumulate until GL context teardown.
        // Delete them only if UploadShared actually ran and created them.
        if (_glInitialized && _gl is not null)
        {
            _gl.DeleteBuffer(_regionSsbo);
            _gl.DeleteBuffer(_terrainUbo);
            _regionSsbo = 0;
            _terrainUbo = 0;
            _glInitialized = false;
        }
    }

    // ---- byte helpers (little-endian; matches x86/x64 GPU upload) ------------

    private void EnsureRegionCapacity(int requiredBytes)
    {
        if (_regionBytes.Length >= requiredBytes) return;
        int newLen = Math.Max(requiredBytes, _regionBytes.Length * 2);
        Array.Resize(ref _regionBytes, newLen);
    }

    private static void WriteUInt(byte[] dst, int offset, uint value)
    {
        dst[offset + 0] = (byte)(value & 0xFF);
        dst[offset + 1] = (byte)((value >> 8) & 0xFF);
        dst[offset + 2] = (byte)((value >> 16) & 0xFF);
        dst[offset + 3] = (byte)((value >> 24) & 0xFF);
    }

    private static void WriteInt(byte[] dst, int offset, int value)
        => WriteUInt(dst, offset, unchecked((uint)value));

    private static void WriteVec4(byte[] dst, int offset, Vector4 v)
    {
        WriteFloat(dst, offset + 0, v.X);
        WriteFloat(dst, offset + 4, v.Y);
        WriteFloat(dst, offset + 8, v.Z);
        WriteFloat(dst, offset + 12, v.W);
    }

    private static void WriteFloat(byte[] dst, int offset, float value)
    {
        uint bits = BitConverter.SingleToUInt32Bits(value);
        WriteUInt(dst, offset, bits);
    }

    // ---- Test seams ----------------------------------------------------------

    /// <summary>Test seam: the packed std430 region bytes (slot 0..SlotCount-1).
    /// Read-only snapshot used by ClipFrameLayoutTests to assert the byte layout.</summary>
    internal ReadOnlySpan<byte> RegionBytesForTest => _regionBytes.AsSpan(0, _slotCount * CellClipStrideBytes);

    /// <summary>Test seam: the packed std140 terrain UBO bytes.</summary>
    internal ReadOnlySpan<byte> TerrainBytesForTest => _terrainBytes;
}
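
The hand-rolled little-endian helpers in ClipFrame can be cross-checked against the framework's BinaryPrimitives. The sketch below packs a single 144-byte slot the same way AppendSlot is documented to (count at byte 0, planes from byte 16 at a 16-byte stride) and reads a plane back. ClipPackSketch, PackSlot, and ReadPlane are hypothetical names; this is not the ClipFrameLayoutTests code, which is not shown in this file.

```csharp
using System;
using System.Buffers.Binary;
using System.Numerics;

// Hypothetical stand-alone packer used to illustrate the slot byte layout.
public static class ClipPackSketch
{
    public const int MaxPlanes = 8;
    public const int PlanesOffset = 16;                          // after count + 3 pad uints
    public const int StrideBytes = PlanesOffset + MaxPlanes * 16; // 144

    // Packs one slot: uint count at byte 0, then up to 8 vec4 planes.
    public static byte[] PackSlot(ReadOnlySpan<Vector4> planes)
    {
        var bytes = new byte[StrideBytes]; // pad uints and unused planes stay zero
        int count = Math.Min(planes.Length, MaxPlanes);
        BinaryPrimitives.WriteUInt32LittleEndian(bytes.AsSpan(0, 4), (uint)count);
        for (int i = 0; i < count; i++)
        {
            Span<byte> dst = bytes.AsSpan(PlanesOffset + i * 16, 16);
            BinaryPrimitives.WriteSingleLittleEndian(dst.Slice(0, 4), planes[i].X);
            BinaryPrimitives.WriteSingleLittleEndian(dst.Slice(4, 4), planes[i].Y);
            BinaryPrimitives.WriteSingleLittleEndian(dst.Slice(8, 4), planes[i].Z);
            BinaryPrimitives.WriteSingleLittleEndian(dst.Slice(12, 4), planes[i].W);
        }
        return bytes;
    }

    // Reads planes[plane] back out of a packed slot.
    public static Vector4 ReadPlane(ReadOnlySpan<byte> slot, int plane)
    {
        ReadOnlySpan<byte> src = slot.Slice(PlanesOffset + plane * 16, 16);
        return new Vector4(
            BinaryPrimitives.ReadSingleLittleEndian(src.Slice(0, 4)),
            BinaryPrimitives.ReadSingleLittleEndian(src.Slice(4, 4)),
            BinaryPrimitives.ReadSingleLittleEndian(src.Slice(8, 4)),
            BinaryPrimitives.ReadSingleLittleEndian(src.Slice(12, 4)));
    }
}
```

A layout test along these lines would pack a known plane, then assert the count bytes and the plane's float bytes at the expected offsets, which is the kind of drift the file header says ClipFrameLayoutTests guards against.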