acdream/docs/superpowers/specs/2026-05-19-phase2-indoor-cell-rendering-fix-design.md
Erik 251763b2c4 docs(spec): Phase 2 indoor cell rendering fix design
Three components:
1. WbMeshAdapter wraps the PrepareMeshDataAsync task with a continuation
   that surfaces faulted-task exceptions + null-result cases for EnvCell
   IDs only (gated by ProbeIndoorUploadEnabled). Two new log shapes:
   [indoor-upload] FAILED  cellId=0x... exception=<TypeName>: <Message>
                            stack=[<top 3 frames>]
   [indoor-upload] NULL_RESULT cellId=0x...
2. Capture procedure: re-launch at Holtburg with the probe on, grep for
   FAILED/NULL_RESULT lines, get definitive per-cell cause for the 26
   missing-completion cells from Phase 1's capture.
3. Targeted fix: code change matching whichever exception type / null
   pattern dominates. Fix shape is data-driven — see the contingency
   table in the spec.

WB's catch at ObjectMeshManager.cs:589 already calls _logger.LogError,
but WbMeshAdapter constructs the manager with NullLogger.Instance, so
the log is dropped. Our continuation surfaces the same data scoped to
EnvCells only (avoids the thousands of GfxObj/Setup log lines a real
logger would emit during landblock streaming).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:18:04 +02:00

9.6 KiB
Raw Blame History

Indoor Cell Rendering Fix — Phase 2 Design

Status: Brainstormed 2026-05-19. Awaiting user review. Scope: Surface the silent failure in WB's PrepareEnvCellMeshData for 26/123 Holtburg cells, then implement the targeted fix. Predecessor: Phase 1 (docs/superpowers/specs/2026-05-19-indoor-cell-rendering-fix-design.md) shipped the five [indoor-*] probes that confirmed hypothesis H1. Capture evidence: docs/research/2026-05-19-indoor-cell-rendering-probe-capture.md.


1. What we know

Phase 1's ACDREAM_PROBE_INDOOR_ALL=1 capture at Holtburg 0xA9B4 proved:

  • 123 EnvCells requested via WbMeshAdapter.IncrementRefCount → only 97 complete.
  • 26 cells silently fail. They get [indoor-upload] requested but never [indoor-upload] completed.
  • The dispatcher then tries to draw them, TryGetRenderData returns null, draw is silently skipped → user sees missing floor.
  • The first interior cell 0xA9B40100 (likely the inn entry or another major building anchor) is among the 26.

The smoking gun is in WB's ObjectMeshManager.PrepareMeshData:

catch (Exception ex) {
    _logger.LogError(ex, "Error preparing mesh data for 0x{Id:X16}", id);
    return null;
}

WB logs the exception via its injected _logger. But WbMeshAdapter.cs:71 constructs ObjectMeshManager with NullLogger<ObjectMeshManager>.Instance — so the log goes to /dev/null. The exception type and message are lost.

2. Solution — three components

Component 1 — Exception-surfacing wrap

Capture the Task<ObjectMeshData?> returned by _meshManager.PrepareMeshDataAsync(id, isSetup: false) and attach a continuation that, for EnvCell IDs only, logs the failure cause.

Three logged outcomes:

  • Task faulted[indoor-upload] FAILED cellId=0x... exception=<TypeName>: <Message> stack=[<top 3 frames>]. Unwrap AggregateException.InnerException for cleaner output.
  • Task succeeded with null result[indoor-upload] NULL_RESULT cellId=0x.... WB's deliberate null-return path (e.g., ResolveId returned empty, type was Unknown).
  • Task succeeded with non-null result → no extra log. The existing Tick() drain already emits [indoor-upload] completed.

The continuation:

  • Runs on TaskScheduler.Default (ThreadPool) so it doesn't block the render thread.
  • Only attached for EnvCell IDs (gated by RenderingDiagnostics.IsEnvCellId(id)) when ProbeIndoorUploadEnabled is true — zero cost when off.
  • Captures cellId (a ulong value) only; no instance closure leakage.
  • Truncates stack trace to top 3 frames.

Concrete code shape:

if (_metadataPopulated.Add(id))
{
    PopulateMetadata(id);
    var prepTask = _meshManager.PrepareMeshDataAsync(id, isSetup: false);

    if (RenderingDiagnostics.IsEnvCellId(id) && RenderingDiagnostics.ProbeIndoorUploadEnabled)
    {
        _pendingEnvCellRequests.Add(id);
        Console.WriteLine($"[indoor-upload] requested cellId=0x{id:X8}");

        ulong cellId = id;
        _ = prepTask.ContinueWith(t =>
        {
            if (t.IsFaulted && t.Exception is not null)
            {
                var ex = t.Exception.InnerException ?? t.Exception;
                var stack = (ex.StackTrace ?? "").Split('\n')
                    .Take(3).Select(s => s.Trim()).Where(s => s.Length > 0);
                Console.WriteLine(
                    $"[indoor-upload] FAILED cellId=0x{cellId:X8} " +
                    $"exception={ex.GetType().Name}: {ex.Message} " +
                    $"stack=[{string.Join(" | ", stack)}]");
            }
            else if (t.IsCompletedSuccessfully && t.Result is null)
            {
                Console.WriteLine($"[indoor-upload] NULL_RESULT cellId=0x{cellId:X8}");
            }
        }, TaskScheduler.Default);
    }
}

using System.Linq; and using System.Threading.Tasks; may need adding (likely already present).

Component 2 — Capture procedure

Standard launch:

$env:ACDREAM_PROBE_INDOOR_UPLOAD = "1"
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath launch.log

Walk into Holtburg Inn, walk into nearby buildings whose cells were on the missing-26 list (0xA9B40100, 0xA9B40111, etc.). Close gracefully.

Analyze:

Get-Content launch.log |
    Where-Object { $_ -match '\[indoor-upload\] (FAILED|NULL_RESULT)' } |
    Select-Object -Unique

Expected output: a per-cell list of distinct exception types or null-return signals. Most cells likely share 13 root causes.

Component 3 — Targeted fix (shape unknown until Component 2 captures)

Once Component 2 reveals the exception type + message, the fix is one localized code change. Likely shapes:

Captured cause Fix shape
Texture decode Exception (e.g. KeyNotFoundException on surface ID) Guard at WbMeshAdapter.PopulateMetadata or pre-validate surfaces; possibly patch WB fork.
KeyNotFoundException for missing Environment / CellStruct Log + skip cell with a sentinel render-data; report which dat is stale.
NullReferenceException in PrepareCellStructMeshData Add null guard at the specific call site.
WB internal logic bug Fork patch to WB.
NULL_RESULT (ResolveId returned empty / type was Unknown) Investigate dat file integrity; possibly user needs a dat update.

The fix is one or two code edits, lands as a single commit, and is followed by a re-launch verifying:

  • [indoor-upload] FAILED / NULL_RESULT lines disappear for the previously-failing cells.
  • [indoor-upload] completed appears for those cells.
  • Visual verification: floor renders in Holtburg Inn.

3. Edge cases

Scenario Behavior
Probe toggled off mid-session Continuation still emits if attached at request time. Acceptable — capturing the cause once matters more than honoring runtime toggle.
Continuation fires after adapter disposed Harmless console write on dying process. No memory leak; closure captures only the ulong cellId.
Same cell requested twice _metadataPopulated.Add(id) guards; continuation attaches exactly once. Re-streaming after Remove+Add keeps the sticky set. First failure is what we want.
Cancellation t.IsCanceled is neither IsFaulted nor IsCompletedSuccessfully. Continuation silently skips. Acceptable — cancellation isn't a failure cause.
Task.Result on faulted task Re-throws AggregateException. Our gate else if (t.IsCompletedSuccessfully && t.Result is null) ensures we never read Result without a clean success state.
WB's _logger.LogError for the same exception WbMeshAdapter passes NullLogger — WB's log goes nowhere. Our continuation is what surfaces it. Discussed below.

Why not just inject a real logger into ObjectMeshManager? Could replace NullLogger<ObjectMeshManager>.Instance with a real logger that writes to Console.WriteLine. Tradeoff:

  • Real logger: simpler, leverages WB's existing _logger.LogError call → catches GfxObj + Setup + EnvCell failures.
  • Our continuation: scoped to EnvCell IDs only → less noise.

Going with the continuation approach because:

  1. The probe flag is already in place.
  2. Phase 2 is targeted at EnvCells.
  3. Real-logger would emit thousands of GfxObj/Setup log lines during landblock streaming, drowning the EnvCell signal.

We can revisit if a future debugging session calls for broader visibility.


4. Testing strategy

Unit tests

None for Component 1 — the continuation is straight wiring around an async API; the logic is "if faulted, log; if null result, log." Testing requires either mocking Task<ObjectMeshData?> (low value) or running a real WB instance (impractical in unit tests).

Visual verification (end-to-end)

Component 2's capture procedure is the verification mechanism:

  1. Build green.
  2. Launch with probe flag on, walk into Holtburg.
  3. Confirm [indoor-upload] FAILED or NULL_RESULT lines appear for ~26 cells.
  4. Apply Component 3's fix.
  5. Re-launch, re-walk Holtburg.
  6. Acceptance: previously-failing cells now produce [indoor-upload] completed lines AND the user can see the floor in Holtburg Inn.

5. What's NOT in this phase

  • Tightening IsEnvCellId false-positives (flagged in Phase 1 capture note). Deferred — doesn't block Phase 2 since the upload probe gates on the correct path.
  • Cell collision symptoms (no wall collision when exiting, weird open-air collisions). Separate investigation phase.
  • Stab-leak-through-walls (Phase 1 Task 3). Deferred.
  • Broader WB logger injection for GfxObj/Setup failures. Open if we ever want broader diagnostic visibility.

6. Acceptance criteria

  • WbMeshAdapter.IncrementRefCount captures the prep task and attaches a continuation for EnvCell IDs.
  • Continuation logs [indoor-upload] FAILED cellId=0x... exception=<TypeName>: <Message> stack=[...] for faulted tasks.
  • Continuation logs [indoor-upload] NULL_RESULT cellId=0x... for clean-null returns.
  • dotnet build clean. dotnet test clean (no new failures; pre-existing 8 physics/input failures unchanged).
  • Capture launched, FAILED/NULL_RESULT lines appear for the previously-missing cells, distinct causes identified.
  • Component 3 fix designed and implemented for each distinct cause.
  • Re-capture confirms [indoor-upload] completed appears for cells previously missing.
  • Visual verification: floor renders in Holtburg Inn.
  • Roadmap updated with Phase 2 shipped.
  • Commit messages cite the captured exception types + the fix rationale.