acdream/docs/research/2026-05-19-indoor-cell-rendering-cause.md
Erik b838eccb38 feat(wb): ConsoleErrorLogger + cause report — H1 swallowed-exception confirmed
Phase 2 diagnostic chain identified the EXACT cause of 26/123 Holtburg
cells silently failing in WB's PrepareEnvCellMeshData:
ArgumentOutOfRangeException thrown from Setup.Unpack inside
DatReaderWriter when WB calls TryGet<Setup>(stab.Id, ...) on a stab id
whose prefix is GfxObj (0x01xxxxxx), not Setup (0x02xxxxxx).
DatReaderWriter finds the file in Portal's tree (GfxObjs and Setups
share tree-lookups), attempts to parse GfxObj bytes as Setup format,
throws OOR. Exception bubbles to PrepareMeshData's outer try/catch
which silently swallows + returns null. Entire cell fails to upload.

This commit lands the diagnostic infrastructure that surfaced the bug:

- WbMeshAdapter: replaced NullLogger<ObjectMeshManager> with a small
  Console-backed ConsoleErrorLogger<T> private class. Filters to
  LogLevel.Error+. WB's existing _logger.LogError(ex, ...) at the
  swallow site now writes [wb-error] lines with type + message + top 5
  stack frames. Bridges WB's intentional log point to acdream's console.
- WbMeshAdapter: extended [indoor-upload] NULL_RESULT probe with
  reader-divergence diagnostic (ourCellDb.TryGet, wbResolveId.Count,
  wbSelectedType, wbDbIsPortal, wbDbTryGet<EnvCell>, hadRenderData).
  Made it possible to rule out cache-hits and reader-divergence as
  causes before identifying the real one.
- Cause report at docs/research/2026-05-19-indoor-cell-rendering-cause.md
  documents the full chain: 55 ArgumentOutOfRangeException stack traces
  captured in one launch, all from PrepareEnvCellMeshData line 1223.

The fix itself (1-line guard at WB's TryGet<Setup> call site) is applied
to references/WorldBuilder/.../ObjectMeshManager.cs — which is a git
submodule. Will be committed separately to the WB submodule after
visual verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 13:00:18 +02:00

94 lines
5.8 KiB
Markdown

# Indoor Cell Rendering — Phase 2 Cause Report
**Date:** 2026-05-19
**Predecessor:** Phase 1 capture confirmed H1 (silent failure in WB).
**Capture method:** Phase 2's `ContinueWith` + `ConsoleErrorLogger` injected into WB's `ObjectMeshManager` surfaced the exception WB was silently catching.
## Cause
**Single failure mode:** `ArgumentOutOfRangeException` thrown from `DatReaderWriter.DBObjs.Setup.Unpack` at WB's [`ObjectMeshManager.cs:1223`](../../references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs:1223):
```csharp
// For EnvCell static objects, we need to manually collect emitters if they are Setups
if (_dats.Portal.TryGet<Setup>(stab.Id, out var stabSetup)) { // ← throws
```
WB iterates `envCell.StaticObjects` and **blindly calls `TryGet<Setup>` on every stab id**, regardless of whether the id is actually a Setup-prefix (`0x02xxxxxx`) or a GfxObj-prefix (`0x01xxxxxx`). When stab.Id is a GfxObj, `DatReaderWriter` finds the file (Portal dat has both GfxObjs and Setups under the same tree-lookup) and attempts to deserialize the GfxObj bytes as a Setup record. The Setup format is structurally different — early parse fails inside `QualifiedDataId.Unpack``DatBinReader.ReadBytesInternal` throws `ArgumentOutOfRangeException`.
The exception bubbles up to `PrepareMeshData`'s outer try/catch at line 589:
```csharp
catch (Exception ex) {
_logger.LogError(ex, "Error preparing mesh data for 0x{Id:X16}", id);
return null; // ← swallows exception, returns null
}
```
The entire EnvCell upload fails silently. The cell's room geometry (floor / walls / ceiling) never reaches `_renderData`, so the dispatcher skips drawing it. Static objects inside the cell (which acdream hydrates separately) still render — they have their own GfxObj uploads.
**This also explains the user's "objects below ground" observation:** with the floor mesh missing, you see the cell's static objects (tables / chairs / fireplaces) through where the floor should be. Visually they appear "below ground."
## Sample evidence
55 NULL_RESULT cells captured at multiple landblocks (`0xA5B4`, `0xA7B4`, `0xA8B2`, `0xA9B0`, `0xA9B2`, `0xA9B3`, `0xA9B4`). All 55 share the same exception type and stack frame:
```
[wb-error] Error preparing mesh data for 0x00000000A9B20114
[wb-error] ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
[wb-error] at DatReaderWriter.DBObjs.Setup.Unpack(DatBinReader reader)
[wb-error] at DatReaderWriter.DatDatabase.TryGet[T](UInt32 fileId, T& value)
[wb-error] at WorldBuilder.Shared.Services.DefaultDatDatabase.TryGet[T](UInt32 fileId, T& value)
[wb-error] at Chorizite.OpenGLSDLBackend.Lib.ObjectMeshManager.PrepareEnvCellMeshData(...) line 1223
[wb-error] at Chorizite.OpenGLSDLBackend.Lib.ObjectMeshManager.PrepareMeshData(...) line 571
```
For Holtburg (`0xA9B4`) specifically: 123 requested → 97 completed + 26 silently failed. The 26 failures all match this exception signature. The first interior cell `0xA9B40100` is among them — exactly where the user reported a missing floor.
## Why the other hypotheses were ruled out
Phase 1 ruled out H2-H6 via the captured probe data. Phase 2's diagnostic walk:
1. `ourCellDb.TryGet=True` — acdream's DatCollection finds the cell.
2. `wbResolveId.Count=1` — WB's ResolveId also finds it.
3. `wbSelectedType=EnvCell` — type classification is correct.
4. `wbDbTryGet<EnvCell>=True` — the cell record IS loadable by WB.
5. `hadRenderData=False` at request time — no pre-existing cache hit.
All preconditions for a successful upload were met. The failure was in a downstream emitter-collection step (line 1223) that's tangential to the cell's own geometry — but its exception silently kills the entire upload.
## Fix
**One-line WB fork patch.** Pre-check the Setup-prefix bit before calling `TryGet<Setup>`:
```csharp
// Before:
if (_dats.Portal.TryGet<Setup>(stab.Id, out var stabSetup)) {
// After:
if ((stab.Id & 0xFF000000u) == 0x02000000u
&& _dats.Portal.TryGet<Setup>(stab.Id, out var stabSetup)) {
```
For GfxObj-prefixed stabs (which have no `DefaultScript` and no emitters anyway), the branch is now skipped correctly. For Setup-prefixed stabs, behavior is unchanged.
This is in our WB fork at [`references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs:1230`](../../references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs:1230). The patch should be upstreamed — it's a real WB bug.
## Verification approach
After applying the fix:
1. Re-launch with `ACDREAM_PROBE_INDOOR_UPLOAD=1`.
2. Walk Holtburg.
3. Expect: zero `[wb-error]` lines, zero `[indoor-upload] NULL_RESULT` lines. Previously-failing cells now have `[indoor-upload] completed` lines.
4. Visual: floor renders in Holtburg Inn; objects no longer appear "below ground."
## Phase 1 → Phase 2 chain summary
The diagnostic-driven approach worked end-to-end:
- **Phase 1:** Added 5 probes. Identified that 26 Holtburg cells silently fail. Confirmed H1 class of bug. Could not pinpoint without exception data.
- **Phase 2 Task 1:** Wrapped `PrepareMeshDataAsync` in a continuation to capture `Task.Exception`. Found that the task was never faulted — `tcs.TrySetResult(null)` ran instead. Hypothesized exception was swallowed inside `PrepareMeshData`.
- **Phase 2 cause-narrowing diagnostics:** Added `ourCellDb.TryGet` + `wbResolveId.Count` + `wbSelectedType` + `wbDbIsPortal` + `wbDbTryGet<EnvCell>` + `hadRenderData` checks. Each iteration narrowed the cause class.
- **Phase 2 final probe:** Replaced WB's `NullLogger` with a Console-backed `ConsoleErrorLogger`. WB's existing `_logger.LogError(ex, ...)` call at the catch block immediately surfaced 55 ArgumentOutOfRangeException stack traces with file:line locations. **Cause definitively identified in one capture.**
- **Phase 2 fix:** One-line guard at the throwing call site.
Total runtime: ~3 client launches to nail it.