close #125: bounded upload retry kills the sticky-drop debt (failed GL uploads were never re-staged)

The GL root cause was fixed in fcade06 (the gpu_us query-ring stale
errors). This closes the remaining design debt: a genuinely-failed
UploadMeshData was dropped permanently.

Exact mechanism (traced this session): UploadMeshData's catch returns
null, the staged item is already consumed, and _renderData stays empty -
but the prepared data lingers in _cpuMeshCache, so the #128 EnsureLoaded
re-arm hits PrepareMeshDataAsync's CPU-cache short-circuit
(ObjectMeshManager.cs:448-453) which returns the cached data WITHOUT
re-staging it for upload. The mesh stays invisible until CPU-cache
eviction - session-sticky under low cache pressure (the in-tower
scenario).

Fix: the per-frame Tick drain (WbMeshAdapter) now re-stages a failed
upload for the NEXT frame via ObjectMeshManager.UploadOrRequeue, bounded
by MaxUploadRetries (3). The check is UploadAttempts < MaxUploadRetries,
so the cap is 3 upload attempts in total (the first try plus at most 2
re-stages). The attempt counter lives on the ObjectMeshData object so it
resets to 0 naturally on re-prepare. Re-stages are
collected and re-enqueued AFTER the drain loop, never inside it, so a
deterministic failure cannot spin the queue within a single frame; past
the cap it gives up with a loud [up-retry] ... giving up line - a
genuine GL defect now surfaces instead of the old silent permanent drop
or an unbounded retry storm. Retail loads content synchronously and has
no such failure mode; this converges the async pipeline toward that
guarantee.

The uncaught GenerateMipmaps path (open-question c) is INTENTIONALLY
left to surface errors - a blanket catch there would mask future real
defects (no-workarounds rule), and its known trigger was retired in fcade06.

No visual gate (robustness). Build green; App.Tests 264 + WbMeshAdapter
tests green. No GL-context test seam exists for the upload path, so the
bounded retry is verified by construction + the regression suite.
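
The "by construction" argument can be sketched without a GL context. The
model below is a simplified, hypothetical stand-in, not the shipped code:
FakeMesh and DrainModel are illustration-only names, the upload is replaced
by a delegate, and the HasRenderData race check is left out. It only mirrors
the counting and the collect-then-re-enqueue shape of the drain.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ObjectMeshData: only the retry counter matters here.
sealed class FakeMesh { public int UploadAttempts; }

static class DrainModel
{
    public const int MaxUploadRetries = 3; // same value as the real constant

    // Mirrors the counting in UploadOrRequeue; `upload` replaces the GL call.
    public static bool UploadOrRequeue(FakeMesh m, Func<FakeMesh, bool> upload)
    {
        if (upload(m)) return false;                 // success: nothing to re-stage
        m.UploadAttempts++;
        return m.UploadAttempts < MaxUploadRetries;  // false once the cap is reached
    }

    // One Tick: drain everything staged, re-enqueue failures only after the loop.
    // Returns how many upload attempts were made during this frame.
    public static int Tick(Queue<FakeMesh> staged, Func<FakeMesh, bool> upload)
    {
        int attempts = 0;
        List<FakeMesh>? requeue = null;
        while (staged.TryDequeue(out var m))
        {
            attempts++;
            if (UploadOrRequeue(m, upload))
                (requeue ??= new()).Add(m);
        }
        if (requeue is not null)
            foreach (var m in requeue)
                staged.Enqueue(m);
        return attempts;
    }
}

static class Program
{
    static void Main()
    {
        var staged = new Queue<FakeMesh>();
        var mesh = new FakeMesh();
        staged.Enqueue(mesh);

        int frames = 0;
        while (staged.Count > 0)
        {
            // Deterministic failure: the upload delegate never succeeds.
            int attempts = DrainModel.Tick(staged, _ => false);
            if (attempts != 1) throw new Exception("queue spun within one frame");
            frames++;
        }
        Console.WriteLine($"{frames} frames, {mesh.UploadAttempts} attempts");
        // prints "3 frames, 3 attempts"
    }
}
```

Because failures are re-enqueued only after the while loop, an item is
attempted at most once per frame, and the counter check empties the queue
after the third failure.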

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Erik 2026-06-13 10:27:26 +02:00
parent bf18a54369
commit 8682a8db70
3 changed files with 81 additions and 8 deletions


@@ -62,6 +62,24 @@ namespace AcDream.App.Rendering.Wb {
public VertexPositionNormalTexture[] Vertices { get; set; } = Array.Empty<VertexPositionNormalTexture>();
public List<MeshBatchData> Batches { get; set; } = new();
/// <summary>
/// #125 (2026-06-12): GL upload-retry counter. A failed
/// <see cref="ObjectMeshManager.UploadMeshData"/> (returns null from its
/// catch) used to be dropped permanently — the staged item was consumed,
/// no render data was produced, and the prepared data lingered in the CPU
/// cache where <c>PrepareMeshDataAsync</c>'s cache-hit short-circuit
/// returned it without ever re-staging it for upload (session-sticky
/// invisible mesh, one [wb-error] line). The drain loop now re-stages a
/// failed upload for the NEXT frame until it has been attempted
/// <see cref="ObjectMeshManager.MaxUploadRetries"/> times in total (at most
/// MaxUploadRetries - 1 re-stages). The counter lives on the mesh-data object
/// so it resets to 0 naturally whenever the id is re-prepared (fresh object),
/// and bounds a deterministic GL failure to a few loud lines instead of a
/// silent permanent drop OR an unbounded per-frame retry storm. Retail
/// loads content synchronously and has no such failure mode — this
/// converges our async pipeline toward that guarantee.
/// </summary>
public int UploadAttempts;
/// <summary>For EnvCell: the geometry of the cell itself.</summary>
public ObjectMeshData? EnvCellGeometry { get; set; }
@@ -216,6 +234,32 @@ namespace AcDream.App.Rendering.Wb {
private readonly ConcurrentQueue<ObjectMeshData> _stagedMeshData = new();
public ConcurrentQueue<ObjectMeshData> StagedMeshData => _stagedMeshData;
/// <summary>#125: maximum number of upload attempts for one mesh-data object
/// (the first try plus its re-stages) before giving up loudly. Small: a
/// transient GL error clears on the next frame; anything that fails this
/// many times is a genuine defect to surface, not retry forever. See
/// <see cref="ObjectMeshData.UploadAttempts"/>.</summary>
public const int MaxUploadRetries = 3;
/// <summary>
/// #125: drain one staged upload, returning whether it should be
/// re-staged for a later frame. The caller (the per-frame Tick drain)
/// collects the re-stages and re-enqueues them AFTER the drain loop —
/// never inside it — so a deterministic failure can't spin the queue in
/// a single frame. Increments the mesh-data's own attempt counter (resets
/// on re-prepare) and gives up loudly past <see cref="MaxUploadRetries"/>.
/// </summary>
public bool UploadOrRequeue(ObjectMeshData meshData) {
if (UploadMeshData(meshData) is not null)
return false; // success (incl. legitimate 0-vertex → empty render data)
if (HasRenderData(meshData.ObjectId))
return false; // raced to present by another path
meshData.UploadAttempts++;
if (meshData.UploadAttempts < MaxUploadRetries)
return true; // re-stage for next frame
Console.WriteLine($"[up-retry] 0x{meshData.ObjectId:X10} upload failed {meshData.UploadAttempts}x — giving up (was the #125 silent sticky drop; a GL error is being surfaced, not hidden)");
return false;
}
// Cache for decoded textures to avoid redundant BCn decoding
private readonly ConcurrentQueue<uint> _decodedTextureLru = new();
private readonly ConcurrentDictionary<uint, byte[]> _decodedTextureCache = new();


@@ -244,10 +244,21 @@ public sealed class WbMeshAdapter : IDisposable, IWbMeshAdapter
if (_disposed) return;
_graphicsDevice!.ProcessGLQueue();
// #125: drain staged uploads; a FAILED upload (UploadMeshData returned
// null from its catch) is re-staged for a LATER frame, not dropped. The
// re-stages are collected and re-enqueued AFTER the loop — re-enqueuing
// inside the while would let a deterministic failure spin the queue in a
// single frame. UploadOrRequeue bounds the retries (MaxUploadRetries) so
// a genuine defect surfaces loudly instead of the old silent sticky drop.
List<ObjectMeshData>? requeue = null;
while (_meshManager!.StagedMeshData.TryDequeue(out var meshData))
{
-_meshManager.UploadMeshData(meshData);
+if (_meshManager.UploadOrRequeue(meshData))
+(requeue ??= new()).Add(meshData);
}
if (requeue is not null)
foreach (var m in requeue)
_meshManager.StagedMeshData.Enqueue(m);
bool texProbe = AcDream.Core.Rendering.RenderingDiagnostics.ProbeTexFlushEnabled;
var pendingBefore = texProbe