acdream/docs/research/2026-05-19-indoor-cell-rendering-cause.md
Erik b838eccb38 feat(wb): ConsoleErrorLogger + cause report — H1 swallowed-exception confirmed
Phase 2 diagnostic chain identified the EXACT cause of 26/123 Holtburg
cells silently failing in WB's PrepareEnvCellMeshData:
ArgumentOutOfRangeException thrown from Setup.Unpack inside
DatReaderWriter when WB calls TryGet<Setup>(stab.Id, ...) on a stab id
whose prefix is GfxObj (0x01xxxxxx), not Setup (0x02xxxxxx).
DatReaderWriter finds the file in Portal's tree (GfxObjs and Setups
share tree-lookups), attempts to parse GfxObj bytes as Setup format,
throws OOR. Exception bubbles to PrepareMeshData's outer try/catch
which silently swallows + returns null. Entire cell fails to upload.

This commit lands the diagnostic infrastructure that surfaced the bug:

- WbMeshAdapter: replaced NullLogger<ObjectMeshManager> with a small
  Console-backed ConsoleErrorLogger<T> private class. Filters to
  LogLevel.Error+. WB's existing _logger.LogError(ex, ...) at the
  swallow site now writes [wb-error] lines with type + message + top 5
  stack frames. Bridges WB's intentional log point to acdream's console.
- WbMeshAdapter: extended [indoor-upload] NULL_RESULT probe with
  reader-divergence diagnostic (ourCellDb.TryGet, wbResolveId.Count,
  wbSelectedType, wbDbIsPortal, wbDbTryGet<EnvCell>, hadRenderData).
  Made it possible to rule out cache-hits and reader-divergence as
  causes before identifying the real one.
- Cause report at docs/research/2026-05-19-indoor-cell-rendering-cause.md
  documents the full chain: 55 ArgumentOutOfRangeException stack traces
  captured in one launch, all from PrepareEnvCellMeshData line 1223.

The fix itself (1-line guard at WB's TryGet<Setup> call site) is applied
to references/WorldBuilder/.../ObjectMeshManager.cs — which is a git
submodule. Will be committed separately to the WB submodule after
visual verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 13:00:18 +02:00


Indoor Cell Rendering — Phase 2 Cause Report

Date: 2026-05-19
Predecessor: Phase 1 capture confirmed H1 (silent failure in WB).
Capture method: Phase 2's ContinueWith + ConsoleErrorLogger injected into WB's ObjectMeshManager surfaced the exception WB was silently catching.

Cause

Single failure mode: ArgumentOutOfRangeException thrown from DatReaderWriter.DBObjs.Setup.Unpack at WB's ObjectMeshManager.cs:1223:

// For EnvCell static objects, we need to manually collect emitters if they are Setups
if (_dats.Portal.TryGet<Setup>(stab.Id, out var stabSetup)) {  // ← throws

WB iterates envCell.StaticObjects and calls TryGet<Setup> on every stab id without checking whether the id carries the Setup prefix (0x02xxxxxx) or the GfxObj prefix (0x01xxxxxx). When stab.Id is a GfxObj, DatReaderWriter finds the file (the Portal dat holds both GfxObjs and Setups under the same tree lookup) and attempts to deserialize the GfxObj bytes as a Setup record. The Setup format is structurally different, so the parse fails early: inside QualifiedDataId.Unpack, DatBinReader.ReadBytesInternal throws ArgumentOutOfRangeException.
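Why reading one record type's bytes as another ends in an out-of-range read can be illustrated with a toy reader. This is a minimal sketch under stated assumptions: the `[count][payload]` layout and the name `ReadCountedPayload` are hypothetical and are not the real Setup or GfxObj formats, nor DatReaderWriter's API. It only shows the general mechanism, where a field that the wrong layout never wrote is interpreted as a length.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical toy layout, NOT the real Setup format:
// [uint32 little-endian count][count payload bytes].
static byte[] ReadCountedPayload(byte[] data)
{
    uint count = BinaryPrimitives.ReadUInt32LittleEndian(data);
    int remaining = data.Length - 4;
    if (count > remaining)
        throw new ArgumentOutOfRangeException(nameof(data),
            $"count {count} exceeds the {remaining} bytes that remain");
    return data.AsSpan(4, (int)count).ToArray();
}

// Bytes that really follow the toy layout parse cleanly.
byte[] wellFormed = { 2, 0, 0, 0, 0xAA, 0xBB };
Console.WriteLine(ReadCountedPayload(wellFormed).Length);

// Bytes written in some other layout: the first four bytes were never a count,
// so the reader sees a huge length and fails before producing any record.
byte[] otherLayout = { 0x01, 0x00, 0x00, 0x01, 0xAA, 0xBB };
try
{
    ReadCountedPayload(otherLayout);
}
catch (ArgumentOutOfRangeException ex)
{
    Console.WriteLine($"{ex.GetType().Name}: {ex.ParamName}");
}
```

The point of the sketch is that the lookup succeeding says nothing about whether the bytes match the requested type; only the id prefix does.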

The exception bubbles up to PrepareMeshData's outer try/catch at line 589:

catch (Exception ex) {
    _logger.LogError(ex, "Error preparing mesh data for 0x{Id:X16}", id);
    return null;  // ← swallows exception, returns null
}

The entire EnvCell upload fails silently. The cell's room geometry (floor / walls / ceiling) never reaches _renderData, so the dispatcher skips drawing it. Static objects inside the cell (which acdream hydrates separately) still render — they have their own GfxObj uploads.

This also explains the user's "objects below ground" observation: with the floor mesh missing, you see the cell's static objects (tables / chairs / fireplaces) through where the floor should be. Visually they appear "below ground."
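The silent failure can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not WB's code: `PrepareSwallowing` is a hypothetical helper that mimics a catch block returning null, and the `TaskCompletionSource` mirrors the `tcs.TrySetResult(null)` behavior noted in the Phase 2 summary. It shows why a continuation that inspects `Task.Exception` sees nothing when the exception is caught before the task completes.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a prepare step that catches its own exceptions.
static object? PrepareSwallowing(Func<object> work)
{
    try
    {
        return work();
    }
    catch (Exception ex)
    {
        // Without a real logger behind this line, the failure leaves no trace.
        Console.WriteLine($"[swallowed] {ex.GetType().Name}");
        return null; // the caller cannot tell "failed" from "no data"
    }
}

var tcs = new TaskCompletionSource<object?>();
tcs.TrySetResult(PrepareSwallowing(() => throw new ArgumentOutOfRangeException()));

// The task completed successfully with a null result; it never faulted.
Task<object?> task = tcs.Task;
Console.WriteLine($"faulted={task.IsFaulted}, resultIsNull={task.Result is null}");
```

From the caller's side the only observable signal is the null result, which is why the exception had to be surfaced at the log call inside the catch block rather than at the task boundary.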

Sample evidence

55 NULL_RESULT cells were captured across seven landblocks (0xA5B4, 0xA7B4, 0xA8B2, 0xA9B0, 0xA9B2, 0xA9B3, 0xA9B4). All 55 share the same exception type and stack frames:

[wb-error] Error preparing mesh data for 0x00000000A9B20114
[wb-error]   ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
[wb-error]   at DatReaderWriter.DBObjs.Setup.Unpack(DatBinReader reader)
[wb-error]   at DatReaderWriter.DatDatabase.TryGet[T](UInt32 fileId, T& value)
[wb-error]   at WorldBuilder.Shared.Services.DefaultDatDatabase.TryGet[T](UInt32 fileId, T& value)
[wb-error]   at Chorizite.OpenGLSDLBackend.Lib.ObjectMeshManager.PrepareEnvCellMeshData(...) line 1223
[wb-error]   at Chorizite.OpenGLSDLBackend.Lib.ObjectMeshManager.PrepareMeshData(...) line 571

For Holtburg (0xA9B4) specifically: 123 requested → 97 completed + 26 silently failed. The 26 failures all match this exception signature. The first interior cell 0xA9B40100 is among them — exactly where the user reported a missing floor.

Why the other hypotheses were ruled out

Phase 1 ruled out H2-H6 via the captured probe data. Phase 2's diagnostic walk then checked each precondition for the failing cells:

  1. ourCellDb.TryGet=True — acdream's DatCollection finds the cell.
  2. wbResolveId.Count=1 — WB's ResolveId also finds it.
  3. wbSelectedType=EnvCell — type classification is correct.
  4. wbDbTryGet<EnvCell>=True — the cell record IS loadable by WB.
  5. hadRenderData=False at request time — no pre-existing cache hit.

All preconditions for a successful upload were met. The failure was in a downstream emitter-collection step (line 1223) that's tangential to the cell's own geometry — but its exception silently kills the entire upload.

Fix

One-line WB fork patch. Check the id's prefix byte (the high byte, 0x02 for Setup) before calling TryGet<Setup>:

// Before:
if (_dats.Portal.TryGet<Setup>(stab.Id, out var stabSetup)) {

// After:
if ((stab.Id & 0xFF000000u) == 0x02000000u
    && _dats.Portal.TryGet<Setup>(stab.Id, out var stabSetup)) {

For GfxObj-prefixed stabs (which have no DefaultScript and no emitters anyway), the branch is now skipped correctly. For Setup-prefixed stabs, behavior is unchanged.

This is in our WB fork at references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs:1230. The patch should be upstreamed — it's a real WB bug.
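The effect of the guard can be checked without the dat files. In this sketch, `IsSetupId` wraps the same mask-and-compare as the patch, while `FakeTryGetSetup` and the two sample ids are hypothetical: the fake lookup throws for any non-Setup id, imitating the failure described above, so the counter shows that the guard short-circuits before the lookup is ever reached.

```csharp
using System;

// Same test as the patch: the high byte of the id must be 0x02.
static bool IsSetupId(uint id) => (id & 0xFF000000u) == 0x02000000u;

int lookups = 0;

// Hypothetical stand-in for the Setup lookup: it throws on non-Setup ids,
// imitating what happens when GfxObj bytes are parsed as a Setup.
bool FakeTryGetSetup(uint id)
{
    lookups++;
    if (!IsSetupId(id))
        throw new ArgumentOutOfRangeException(nameof(id));
    return true;
}

// Illustrative ids only: one GfxObj-prefixed, one Setup-prefixed.
uint[] stabIds = { 0x01000ABCu, 0x02000ABCu };

int setupsFound = 0;
foreach (uint id in stabIds)
{
    // && short-circuits, so the lookup never runs for the GfxObj id.
    if (IsSetupId(id) && FakeTryGetSetup(id))
        setupsFound++;
}

Console.WriteLine($"lookups={lookups}, setupsFound={setupsFound}");
```

Keeping the guard at the call site, rather than catching the exception there, avoids paying for a failed parse on every GfxObj stab.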

Verification approach

After applying the fix:

  1. Re-launch with ACDREAM_PROBE_INDOOR_UPLOAD=1.
  2. Walk Holtburg.
  3. Expect: zero [wb-error] lines, zero [indoor-upload] NULL_RESULT lines. Previously-failing cells now have [indoor-upload] completed lines.
  4. Visual: floor renders in Holtburg Inn; objects no longer appear "below ground."
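Step 3 can be checked mechanically once the console output is saved to a file. The following is a rough sketch, assuming only the tags quoted in this report (`[wb-error]`, `[indoor-upload] NULL_RESULT`, `[indoor-upload] completed`); `Summarize` is a hypothetical helper, and the rest of each sample line is elided because the full probe format is not reproduced here.

```csharp
using System;
using System.Linq;

// Counts the three line kinds that the verification steps care about.
static (int WbErrors, int NullResults, int Completed) Summarize(string[] lines) => (
    lines.Count(l => l.StartsWith("[wb-error]", StringComparison.Ordinal)),
    lines.Count(l => l.Contains("[indoor-upload] NULL_RESULT", StringComparison.Ordinal)),
    lines.Count(l => l.Contains("[indoor-upload] completed", StringComparison.Ordinal)));

// Hypothetical pre-fix capture; only the tags are taken from this report.
string[] capture =
{
    "[indoor-upload] completed ...",
    "[indoor-upload] NULL_RESULT ...",
    "[wb-error] Error preparing mesh data for 0x00000000A9B20114",
};

var summary = Summarize(capture);

// The fix is considered verified only when both failure counters are zero.
bool fixVerified = summary.WbErrors == 0 && summary.NullResults == 0;
Console.WriteLine($"wb-error={summary.WbErrors} null_result={summary.NullResults} " +
                  $"completed={summary.Completed} fixVerified={fixVerified}");
```

For a real run, the array could come from `System.IO.File.ReadAllLines` on wherever the client's console output was redirected; that path is left to the reader.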

Phase 1 → Phase 2 chain summary

The diagnostic-driven approach worked end-to-end:

  • Phase 1: Added 5 probes. Identified that 26 Holtburg cells silently fail. Confirmed H1 class of bug. Could not pinpoint without exception data.
  • Phase 2 Task 1: Wrapped PrepareMeshDataAsync in a continuation to capture Task.Exception. Found that the task was never faulted — tcs.TrySetResult(null) ran instead. Hypothesized exception was swallowed inside PrepareMeshData.
  • Phase 2 cause-narrowing diagnostics: Added ourCellDb.TryGet + wbResolveId.Count + wbSelectedType + wbDbIsPortal + wbDbTryGet<EnvCell> + hadRenderData checks. Each iteration narrowed the cause class.
  • Phase 2 final probe: Replaced WB's NullLogger with a Console-backed ConsoleErrorLogger. WB's existing _logger.LogError(ex, ...) call at the catch block immediately surfaced 55 ArgumentOutOfRangeException stack traces with file:line locations. Cause definitively identified in one capture.
  • Phase 2 fix: One-line guard at the throwing call site.

Total runtime: ~3 client launches to nail it.
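For reference, a Console-backed error logger of the kind used in the final probe might look like the sketch below. This is not the code landed in WbMeshAdapter; it is a minimal reconstruction from the description in this report (filter to Error and above, print a `[wb-error]` prefix, exception type and message, and the top five stack frames). It assumes the Microsoft.Extensions.Logging.Abstractions package is referenced, and it uses `string` as the category type purely for the demonstration.

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.Logging;

// Demonstration: a Warning is filtered out, an Error with an exception is printed.
ILogger<string> logger = new ConsoleErrorLogger<string>();
logger.LogWarning("this line is below Error and is dropped");
try
{
    throw new ArgumentOutOfRangeException();
}
catch (Exception ex)
{
    logger.LogError(ex, "Error preparing mesh data for 0x{Id:X16}", 0xA9B20114UL);
}

// Sketch of a logger that forwards Error-and-above entries to the console.
sealed class ConsoleErrorLogger<T> : ILogger<T>
{
    // Scopes are not needed for this probe, so none are tracked.
    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

    public bool IsEnabled(LogLevel logLevel) =>
        logLevel >= LogLevel.Error && logLevel != LogLevel.None;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        if (!IsEnabled(logLevel)) return;

        Console.WriteLine($"[wb-error] {formatter(state, exception)}");
        if (exception is null) return;

        Console.WriteLine($"[wb-error]   {exception.GetType().Name}: {exception.Message}");

        // Keep the output short: only the top five stack frames.
        var frames = (exception.StackTrace ?? string.Empty)
            .Split('\n', StringSplitOptions.RemoveEmptyEntries)
            .Take(5);
        foreach (var frame in frames)
            Console.WriteLine($"[wb-error]   {frame.Trim()}");
    }
}
```

The design choice worth noting is that nothing in WB had to change for this probe: the log call at the catch block already existed, and swapping the injected logger was enough to make it visible.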