acdream/docs/superpowers/plans/2026-05-19-phase2-indoor-cell-rendering-fix.md
Erik e9cc9cb228 plan: Phase 2 indoor cell rendering fix
Six tasks:
1. Add exception-surfacing ContinueWith in WbMeshAdapter.IncrementRefCount
   for EnvCell ids when ProbeIndoorUploadEnabled is on. Logs
   [indoor-upload] FAILED + [indoor-upload] NULL_RESULT.
2. Capture procedure: user walks Holtburg with the probe on; analyze log.
3. Write cause report documenting the captured exception type(s).
4. Apply targeted fix (4a/4b/4c/4d sub-shapes for the 4 most-likely causes
   — choice driven by Task 2's data). Or 4d: re-design if cause is none
   of the above.
5. Verification: re-capture confirms completed lines, user visually
   confirms floor in Holtburg Inn.
6. Roadmap update.

Tasks 2 and 5 are user-driven (must walk the client). Tasks 1, 3, 4, 6
can be subagent-dispatched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:20:57 +02:00


Indoor Cell Rendering Fix — Phase 2 Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Surface WB's silent PrepareEnvCellMeshData failures via an exception-capturing continuation in WbMeshAdapter, identify the root cause for the 26 missing-completion cells, then implement the targeted fix that lands the indoor floor rendering.

Architecture: WbMeshAdapter.IncrementRefCount captures the Task<ObjectMeshData?> returned by WB's PrepareMeshDataAsync and attaches a ContinueWith that logs faulted-task exceptions + clean-null results for EnvCell IDs only. Gated by the existing ProbeIndoorUploadEnabled flag — zero cost when off. Component 3 (the actual fix) is data-driven: the captured exception type + message determines the surgical code change.

Tech Stack: C# .NET 10, Silk.NET OpenGL, WorldBuilder's Chorizite.OpenGLSDLBackend.Lib.ObjectMeshManager. xUnit for any unit tests.

Spec: docs/superpowers/specs/2026-05-19-phase2-indoor-cell-rendering-fix-design.md. Phase 1 capture: docs/research/2026-05-19-indoor-cell-rendering-probe-capture.md.


File Structure

| File | Status | Responsibility |
|---|---|---|
| src/AcDream.App/Rendering/Wb/WbMeshAdapter.cs | MODIFY (Task 1) | Capture prepTask from PrepareMeshDataAsync. Attach a ContinueWith for EnvCell IDs that emits [indoor-upload] FAILED on faulted tasks and [indoor-upload] NULL_RESULT on clean-null returns. |
| launch.log (and the user's walk-through) | NEW (Task 2) | Captured probe output. Drives Component 3's fix shape. Not committed. |
| docs/research/2026-05-19-indoor-cell-rendering-cause.md | NEW (Task 3) | One-page report documenting the captured exception type(s) + the chosen fix shape. Becomes Phase 2's "design closure" doc. |
| TBD-by-data (Component 3) | MODIFY (Task 4) | Fix shape depends on captured cause. Likely candidates: WbMeshAdapter.PopulateMetadata, CellMesh.Build, a guard at the dat-access call site, or a small WB fork patch. |
| docs/research/2026-05-19-indoor-cell-rendering-verification.md | NEW (Task 5) | Post-fix verification record: previously-missing cells now emit [indoor-upload] completed, visual confirmation. |
| docs/plans/2026-04-11-roadmap.md | MODIFY (Task 6) | Roadmap update: Phase 2 shipped, link to spec + research notes. |

Task 1: Add exception-surfacing continuation in WbMeshAdapter

Files:

  • Modify: src/AcDream.App/Rendering/Wb/WbMeshAdapter.cs

  • Step 1: Add using System.Linq; and using System.Threading.Tasks; if missing

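Open src/AcDream.App/Rendering/Wb/WbMeshAdapter.cs. Verify both using System.Linq; and using System.Threading.Tasks; are present at the top, and add them if not. (If the project enables ImplicitUsings, both namespaces are already global usings and nothing needs adding.)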

  • Step 2: Replace the fire-and-forget call with a captured task + continuation

Find the IncrementRefCount method (around line 116). The current block looks like:

public void IncrementRefCount(ulong id)
{
    if (_isUninitialized || _meshManager is null) return;
    _meshManager.IncrementRefCount(id);

    if (_metadataPopulated.Add(id))
    {
        PopulateMetadata(id);

        // WB's IncrementRefCount alone only bumps a usage counter; it does
        // NOT trigger mesh loading. We must explicitly call PrepareMeshDataAsync
        // so the background workers actually decode the GfxObj. The result
        // auto-enqueues into _stagedMeshData (ObjectMeshManager line 510),
        // which Tick() drains onto the GPU. Until that completes,
        // TryGetRenderData(id) returns null and the dispatcher silently
        // skips the entity — standard streaming flicker.
        //
        // isSetup: false — acdream's MeshRefs already carry expanded
        // per-part GfxObj ids (0x01XXXXXX). WB's Setup-expansion path is
        // unused.
        _ = _meshManager.PrepareMeshDataAsync(id, isSetup: false);

        // [indoor-upload] requested probe — only for EnvCell ids.
        if (RenderingDiagnostics.IsEnvCellId(id) && RenderingDiagnostics.ProbeIndoorUploadEnabled)
        {
            _pendingEnvCellRequests.Add(id);
            Console.WriteLine($"[indoor-upload] requested cellId=0x{id:X8}");
        }
    }
}

Replace the _metadataPopulated.Add(id) block body with this exact content (note: the _ = _meshManager.PrepareMeshDataAsync(...) line becomes var prepTask = ... — capture the task instead of discarding it):

        PopulateMetadata(id);

        // WB's IncrementRefCount alone only bumps a usage counter; it does
        // NOT trigger mesh loading. We must explicitly call PrepareMeshDataAsync
        // so the background workers actually decode the GfxObj. The result
        // auto-enqueues into _stagedMeshData (ObjectMeshManager line 510),
        // which Tick() drains onto the GPU. Until that completes,
        // TryGetRenderData(id) returns null and the dispatcher silently
        // skips the entity — standard streaming flicker.
        //
        // isSetup: false — acdream's MeshRefs already carry expanded
        // per-part GfxObj ids (0x01XXXXXX). WB's Setup-expansion path is
        // unused.
        var prepTask = _meshManager.PrepareMeshDataAsync(id, isSetup: false);

        // [indoor-upload] requested probe — only for EnvCell ids.
        if (RenderingDiagnostics.IsEnvCellId(id) && RenderingDiagnostics.ProbeIndoorUploadEnabled)
        {
            _pendingEnvCellRequests.Add(id);
            Console.WriteLine($"[indoor-upload] requested cellId=0x{id:X8}");

            // Phase 2: surface what WB silently drops.
            // ObjectMeshManager.PrepareMeshData has a try/catch at line 589
            // that calls _logger.LogError on exceptions and returns null.
            // We construct ObjectMeshManager with NullLogger, so that log
            // goes nowhere. Caveat: an exception caught there completes the
            // task successfully with a null result, so it is reported below
            // as NULL_RESULT (without exception details); FAILED fires only
            // for exceptions that escape WB's catch. Scoped to EnvCell ids.
            // Runs on the ThreadPool; non-blocking. Zero cost when the probe
            // is off.
            ulong cellId = id;
            _ = prepTask.ContinueWith(t =>
            {
                if (t.IsFaulted && t.Exception is not null)
                {
                    var ex = t.Exception.InnerException ?? t.Exception;
                    var stack = (ex.StackTrace ?? "").Split('\n')
                        .Take(3).Select(s => s.Trim()).Where(s => s.Length > 0);
                    Console.WriteLine(
                        $"[indoor-upload] FAILED cellId=0x{cellId:X8} " +
                        $"exception={ex.GetType().Name}: {ex.Message} " +
                        $"stack=[{string.Join(" | ", stack)}]");
                }
                else if (t.IsCompletedSuccessfully && t.Result is null)
                {
                    Console.WriteLine($"[indoor-upload] NULL_RESULT cellId=0x{cellId:X8}");
                }
            }, TaskScheduler.Default);
        }
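A standalone sketch of the three outcomes the continuation distinguishes. This is not acdream code: `PrepareAsync` and `Classify` are hypothetical stand-ins, and a `string` replaces `ObjectMeshData`. It assumes only what the comment above describes, namely that WB's catch block turns an exception into a null return. Under that assumption, an exception caught inside the worker completes the task successfully and is reported as NULL_RESULT rather than FAILED, which matters when reading Task 2's capture.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for WB's PrepareMeshDataAsync. The real method
// returns Task<ObjectMeshData?>; a string is enough to show the task states.
static async Task<string?> PrepareAsync(string mode)
{
    await Task.Yield(); // complete asynchronously, like a background worker
    if (mode == "throw")
        throw new InvalidOperationException("surface 0x08001234 not found");
    if (mode == "swallow")
    {
        try { throw new InvalidOperationException("surface 0x08001234 not found"); }
        catch (InvalidOperationException) { return null; } // caught, logged nowhere
    }
    return "mesh";
}

// Same classification as the Task 1 continuation; null means "no probe line".
static string? Classify(Task<string?> t, ulong cellId)
{
    if (t.IsFaulted && t.Exception is not null)
    {
        var ex = t.Exception.InnerException ?? t.Exception;
        var stack = (ex.StackTrace ?? "").Split('\n')
            .Take(3).Select(s => s.Trim()).Where(s => s.Length > 0);
        return $"[indoor-upload] FAILED cellId=0x{cellId:X8} " +
               $"exception={ex.GetType().Name}: {ex.Message} " +
               $"stack=[{string.Join(" | ", stack)}]";
    }
    if (t.IsCompletedSuccessfully && t.Result is null)
        return $"[indoor-upload] NULL_RESULT cellId=0x{cellId:X8}";
    return null;
}

var results = new List<string?>();
foreach (var (mode, cellId) in new[] { ("throw", 0xA9B40100UL), ("swallow", 0xA9B40111UL), ("ok", 0xA9B40112UL) })
{
    // Continuation on the default (thread-pool) scheduler, as in Task 1.
    string? line = await PrepareAsync(mode).ContinueWith(t => Classify(t, cellId), TaskScheduler.Default);
    results.Add(line);
    Console.WriteLine(line ?? $"(no probe line) cellId=0x{cellId:X8}");
}
```

Only the "throw" case yields a FAILED line with a type and stack. If the real capture shows nothing but NULL_RESULT for the failing cells, the exception detail is still being swallowed upstream of the continuation.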
  • Step 3: Build

Run: dotnet build src/AcDream.App/AcDream.App.csproj -c Debug
Expected: 0 errors, 0 warnings (any new warning about a discarded task is addressed by the _ = prepTask.ContinueWith(...) assignment).

  • Step 4: Run tests (sanity)

Run: dotnet test tests/AcDream.Core.Tests/AcDream.Core.Tests.csproj --filter "FullyQualifiedName~Rendering" -c Debug --nologo
Expected: all 130 Rendering tests still pass. The change doesn't touch any tested code path (WbMeshAdapter.IncrementRefCount isn't covered by unit tests). Don't pass --no-build: Step 3 builds only AcDream.App, so the test project's binaries may be stale or missing.

  • Step 5: Commit
git add src/AcDream.App/Rendering/Wb/WbMeshAdapter.cs
git commit -m "$(cat <<'EOF'
feat(wb): surface WB-swallowed exceptions for EnvCell upload failures

Phase 1 confirmed 26/123 Holtburg cells silently fail in WB's
PrepareEnvCellMeshData / PrepareMeshData. WB's catch block at
ObjectMeshManager.cs:589 calls _logger.LogError(ex, ...) — but we
construct ObjectMeshManager with NullLogger, so the log is dropped.

Capture the Task from PrepareMeshDataAsync (previously fire-and-forget)
and attach a ContinueWith that, for EnvCell ids only when the probe
is on, logs:

  [indoor-upload] FAILED cellId=0x... exception=<Type>: <Message>
                          stack=[<top 3 frames>]
  [indoor-upload] NULL_RESULT cellId=0x...

Runs on ThreadPool — non-blocking. Zero cost when ProbeIndoorUploadEnabled
is off. AggregateException is unwrapped to InnerException for readability.
Stack truncated to top 3 frames.

Next: capture procedure, identify cause, target the fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Task 2: Capture procedure — run client, identify cause

This task is operator-driven, not subagent-driven: the user (not a subagent) walks the client, and the controller's role is limited to launching the client and analyzing the log.

Files:

  • New: launch.log (transient — not committed)

  • Step 1: Full solution build (sanity)

Run: dotnet build AcDream.slnx -c Debug --nologo 2>&1 | tail -10
Expected: Build succeeded. 0 Error(s).

  • Step 2: Gracefully close any prior AcDream.App instance
$proc = Get-Process -Name AcDream.App -ErrorAction SilentlyContinue
if ($proc) {
    $proc | ForEach-Object { $_.CloseMainWindow() | Out-Null }
    $proc | ForEach-Object { if (-not $_.WaitForExit(5000)) { Stop-Process -Id $_.Id -Force } }
    Start-Sleep -Seconds 3
}
  • Step 3: Launch with ACDREAM_PROBE_INDOOR_UPLOAD=1
$env:ACDREAM_DAT_DIR             = "$env:USERPROFILE\Documents\Asheron's Call"
$env:ACDREAM_LIVE                = "1"
$env:ACDREAM_TEST_HOST           = "127.0.0.1"
$env:ACDREAM_TEST_PORT           = "9000"
$env:ACDREAM_TEST_USER           = "testaccount"
$env:ACDREAM_TEST_PASS           = "testpassword"
$env:ACDREAM_DEVTOOLS            = "1"
$env:ACDREAM_PROBE_INDOOR_UPLOAD = "1"
$logPath = "launch.log"
Remove-Item $logPath -ErrorAction SilentlyContinue
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath $logPath

Run in background via run_in_background: true.

  • Step 4: User walks Holtburg

User waits for the client to reach in-world (~8-12 s), then:

  • Walks into Holtburg Inn (where the floor was missing in Phase 1).

  • Walks into 2-3 other nearby buildings to capture varied failure causes.

  • Closes the client window with the close button (graceful — NOT taskkill).

  • Step 5: Analyze the log

$lines = Get-Content launch.log | Where-Object { $_ -match '\[indoor-upload\] (FAILED|NULL_RESULT)' }
Write-Host "Total failure lines: $($lines.Count)"
Write-Host ""
Write-Host "=== Distinct exception types (FAILED) ==="
$lines | Where-Object { $_ -match '\[indoor-upload\] FAILED' } |
    ForEach-Object { if ($_ -match 'exception=(\w+):') { $matches[1] } } |
    Group-Object | Sort-Object Count -Descending | Format-Table -AutoSize

Write-Host "=== NULL_RESULT line count ==="
($lines | Where-Object { $_ -match 'NULL_RESULT' }).Count

Write-Host ""
Write-Host "=== Sample FAILED lines ==="
$lines | Where-Object { $_ -match '\[indoor-upload\] FAILED' } | Select-Object -First 10
Write-Host ""
Write-Host "=== Sample NULL_RESULT lines ==="
$lines | Where-Object { $_ -match '\[indoor-upload\] NULL_RESULT' } | Select-Object -First 5

Verify the previously-failing cells (from Phase 1: 0xA9B40100, 0xA9B40111, 0xA9B40112, etc.) now appear in either FAILED or NULL_RESULT.

If they DON'T appear:

  • Confirm the probe flag is on ($env:ACDREAM_PROBE_INDOOR_UPLOAD should read "1", and launch.log should contain [indoor-upload] requested lines).
  • Confirm the user actually walked into the failing cells.
  • Possible BUG: the continuation isn't firing — check Task 1's edits for typos.
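If the PowerShell one-liners get unwieldy, the same summary can be sketched in C#: group FAILED lines by exception type, count distinct cells rather than lines, and list which expected Phase 1 cells never appeared. The regex assumes the exact line shapes Task 1 prints; `Summarise` is a hypothetical helper and the sample lines are illustrative, not captured data.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Matches the FAILED / NULL_RESULT lines printed by Task 1's continuation.
var probeLine = new Regex(
    @"\[indoor-upload\] (?<kind>FAILED|NULL_RESULT) cellId=0x(?<cell>[0-9A-F]{8})(?: exception=(?<type>\w+):)?");

// Hypothetical summariser: distinct cells per cause, plus expected cells never seen.
(Dictionary<string, SortedSet<ulong>> ByCause, List<ulong> Missing) Summarise(
    IEnumerable<string> logLines, IEnumerable<ulong> expectedCells)
{
    var groups = new Dictionary<string, SortedSet<ulong>>();
    foreach (var entry in logLines)
    {
        var m = probeLine.Match(entry);
        if (!m.Success) continue;
        // FAILED lines are keyed by exception type; NULL_RESULT has none.
        string key = m.Groups["kind"].Value == "FAILED" ? m.Groups["type"].Value : "NULL_RESULT";
        ulong cell = Convert.ToUInt64(m.Groups["cell"].Value, 16);
        if (!groups.TryGetValue(key, out var set))
            groups[key] = set = new SortedSet<ulong>();
        set.Add(cell); // a set, so repeated lines for one cell count once
    }
    var seenCells = groups.Values.SelectMany(s => s).ToHashSet();
    var absent = expectedCells.Where(c => !seenCells.Contains(c)).ToList();
    return (groups, absent);
}

// Illustrative lines only, not a real capture.
var sample = new[]
{
    "[indoor-upload] requested cellId=0xA9B40100",
    "[indoor-upload] FAILED cellId=0xA9B40100 exception=KeyNotFoundException: example stack=[]",
    "[indoor-upload] NULL_RESULT cellId=0xA9B40111",
    "[indoor-upload] NULL_RESULT cellId=0xA9B40111",
};
var (byCause, missing) = Summarise(sample, new ulong[] { 0xA9B40100, 0xA9B40111, 0xA9B40112 });
foreach (var (cause, ids) in byCause)
    Console.WriteLine($"{cause}: {ids.Count} distinct cell(s)");
Console.WriteLine("Not seen: " + string.Join(", ", missing.Select(c => $"0x{c:X8}")));
```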

Task 3: Write the cause report

Files:

  • Create: docs/research/2026-05-19-indoor-cell-rendering-cause.md

  • Step 1: Write the report based on Task 2's output

Create the file with this structure (replace bracketed sections with captured data):

# Indoor Cell Rendering — Phase 2 Cause Report

**Date:** 2026-05-19
**Predecessor:** Phase 1 capture confirmed H1 (silent failure in WB).
**Capture method:** Task 1's `ContinueWith` surfaced WB's swallowed exceptions for EnvCell IDs.

## Cause(s)

[Replace this section with the captured findings. Example shape:]

Two distinct failure modes captured at Holtburg:

1. **`KeyNotFoundException` — N cells affected** — Exception thrown from `PrepareCellStructMeshData` line XXX when trying to look up surface `0x08001234`. Affected cells: `0xA9B40100`, `0xA9B40111`, ...

2. **`NULL_RESULT` — M cells affected** — WB's `ResolveId` returned empty for `EnvironmentId 0x0D00XXXX`, causing `PrepareEnvCellMeshData` to skip the cellGeometry branch and produce an empty result. Affected cells: ...

[OR if only one cause is observed:]

Single failure mode: [exception type] thrown in [location] for all 26 cells. Root cause: [analysis].

## Sample log lines

[paste 5-10 actual captured FAILED / NULL_RESULT lines here]


## Proposed fix

[Concrete code change for each distinct cause. For example:]

- For `KeyNotFoundException` on surface lookup: add a null-guard in `WbMeshAdapter.PopulateMetadata` AND skip the failing surface in our acdream-side processing.
- For `NULL_RESULT` from missing `EnvironmentId`: log + skip with a sentinel render-data so the dispatcher gracefully draws nothing instead of failing silently.

Each fix is a single-file change. Task 4 of this plan implements them.

## Verification approach

After Task 4's fix:
- Re-launch with the same probe flag.
- Confirm previously-failing cells now emit `[indoor-upload] completed` lines.
- Visual: floor renders in Holtburg Inn.
  • Step 2: Commit
git add docs/research/2026-05-19-indoor-cell-rendering-cause.md
git commit -m "$(cat <<'EOF'
docs(research): Phase 2 cause report — <one-line summary of finding>

Captured at Holtburg with the ContinueWith-based exception surfacer
from Task 1. <Describe finding in 2-3 sentences: which exception types
fired, for how many cells, the root cause.>

Fix shape decided: <one sentence>. Implemented in next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Task 4: Apply the targeted fix

The fix shape is unknown until Task 2 captures. This task's code is data-driven. The plan below lists the three most likely fix shapes (4a-4c) plus a fallback (4d); the implementer picks the matching one(s) and implements them.

4a — If the cause is KeyNotFoundException / missing dat record

Most likely path: WB's PrepareCellStructMeshData calls _dats.Portal.TryGet<Surface>(surfaceId, out var surface), gets false, then crashes when later code assumes non-null.

Files:

  • Modify: TBD by exception stack — likely a WB fork patch OR a guard at our acdream call site.

  • Step 1: Open the throwing file based on the exception stack trace

The probe line will show:

stack=[at PrepareCellStructMeshData in ObjectMeshManager.cs:line | at PrepareEnvCellMeshData in ObjectMeshManager.cs:line | ...]

Open that file at that line. Confirm the missing-dat-record assumption.

  • Step 2: Patch shape (WB fork, if in WB)

In references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs, add a null-guard at the throwing line:

// Pre-Phase-2: WB assumed every surface in envCell.Surfaces was
// resolvable. Some Holtburg cells reference surfaces that aren't in the
// loaded portal dat, causing a NullRef in the throwing line below.
// Guard: skip the surface if it doesn't resolve.
if (!_dats.Portal.TryGet<Surface>(surfaceId, out var surface))
{
    continue;  // or: surface = _fallbackSurface; whichever fits
}

(Exact code depends on the stack. The implementer reads the actual throwing line and adapts.)
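To make the guard's effect concrete, here is a self-contained sketch in which a `Dictionary<uint, string>` stands in for the portal dat (an assumption; WB's real lookup is `_dats.Portal.TryGet<Surface>`). An indexer lookup on a missing id throws `KeyNotFoundException`; ignoring a failed `TryGet` and dereferencing the result would throw `NullReferenceException` instead. Either way the guard has the same shape: check, skip, record. `ResolveUnguarded`, `ResolveGuarded`, and the surface ids are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the loaded portal dat: surface id -> surface.
var portal = new Dictionary<uint, string>
{
    [0x08000001] = "wall surface",
    [0x08000002] = "floor surface",
};

// Unguarded: assumes every referenced surface resolves.
List<string> ResolveUnguarded(IEnumerable<uint> surfaceIds)
{
    var resolved = new List<string>();
    foreach (var id in surfaceIds)
        resolved.Add(portal[id]); // KeyNotFoundException on a missing id
    return resolved;
}

// Guarded: skip unresolvable surfaces and record which ones were skipped.
List<string> ResolveGuarded(IEnumerable<uint> surfaceIds, List<uint> skipped)
{
    var resolved = new List<string>();
    foreach (var id in surfaceIds)
    {
        if (!portal.TryGetValue(id, out var surface))
        {
            skipped.Add(id);
            continue;
        }
        resolved.Add(surface);
    }
    return resolved;
}

uint[] cellSurfaces = { 0x08000001, 0x08001234, 0x08000002 };
var skippedIds = new List<uint>();
var guarded = ResolveGuarded(cellSurfaces, skippedIds);
Console.WriteLine($"guarded: {guarded.Count} resolved, skipped 0x{skippedIds[0]:X8}");

string? unguardedError = null;
try { ResolveUnguarded(cellSurfaces); }
catch (KeyNotFoundException ex) { unguardedError = ex.GetType().Name; }
Console.WriteLine($"unguarded: {unguardedError}");
```

Recording the skipped ids (rather than silently dropping them) keeps the cause visible in the probe output after the fix lands.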

  • Step 3: Build, capture, verify
dotnet build src/AcDream.App/AcDream.App.csproj -c Debug

Then re-run Task 2's launch + capture. Confirm:

  • Previously-failing cells now have [indoor-upload] completed lines.

  • No new [indoor-upload] FAILED lines for those cells.

  • Step 4: Commit

git add references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs
# OR whatever file was patched
git commit -m "$(cat <<'EOF'
fix(wb): null-guard for missing surface in PrepareCellStructMeshData

Phase 2 capture found <N> Holtburg cells silently failing with
<ExceptionType> thrown at <file>:<line> when WB tried to look up
surface 0x... that isn't resolvable in the loaded portal dat.

Patch: <one-sentence description of the guard>.

Visual-verified: floor now renders in Holtburg Inn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

4b — If the cause is NULL_RESULT (clean null return from WB)

WB's PrepareMeshData returns null without throwing. Examined paths in the WB source:

  • Line 568: _dats.Portal.TryGet<Environment>(envId, ...) fails → returns null.
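  • Line 583: type == DBObjType.Unknown (ResolveId didn't classify the record) → returns null.
  • Line 589: the try/catch logs the exception to NullLogger, then returns null → a swallowed exception also surfaces as NULL_RESULT, not FAILED.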

Files:

  • Modify: probably WbMeshAdapter to detect and log, then either accept the cell as "no geometry" gracefully OR investigate the dat issue.

  • Step 1: Read which path triggered

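Look at the NULL_RESULT cells' EnvironmentId values. If the EnvironmentId looks corrupt or out of range, the dat is the issue. If it looks valid, suspect WB's ResolveId for that record. If the catch path at line 589 is the suspect instead, temporarily construct ObjectMeshManager with a real ILogger rather than NullLogger so its LogError call prints the swallowed exception.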
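One way to make "looks corrupt or out of range" mechanical, sketched under an assumption not stated elsewhere in this plan: AC Environment records occupy 0x0D000000-0x0D00FFFF, and an EnvCell stores only the low 16 bits of its Environment id. Verify that convention against the dat library before relying on it. `Diagnose`, `ToEnvironmentDid`, and the `known` set (standing in for the ids present in the loaded portal dat) are hypothetical.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Assumption (AC dat convention, not taken from acdream or WB code):
// Environment records occupy 0x0D000000-0x0D00FFFF.
const uint EnvironmentBase = 0x0D000000;

// Rebuild a full Environment id from the 16-bit value an EnvCell stores.
uint ToEnvironmentDid(ushort environmentId) => EnvironmentBase | environmentId;

// Bucket a NULL_RESULT cell's Environment id per Task 4b Step 1.
string Diagnose(uint envDid, ISet<uint> knownEnvironmentDids)
{
    if ((envDid & 0xFFFF0000) != EnvironmentBase)
        return "corrupt";     // outside the Environment range: dat-side problem
    return knownEnvironmentDids.Contains(envDid)
        ? "resolvable"        // record exists: suspect WB's ResolveId
        : "missing";          // plausible id, but no such record in the dat
}

var known = new HashSet<uint> { ToEnvironmentDid(0x0123) };
foreach (var did in new uint[] { 0x0D000123, 0x0D00BEEF, 0xD0001234 })
    Console.WriteLine($"0x{did:X8}: {Diagnose(did, known)}");
```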

  • Step 2: Add a guard at our acdream call site OR patch WB

Depending on the finding:

  • If dat is genuinely missing data: skip the cell with a warning. Don't try to render its mesh. Log once via memory.

  • If WB's ResolveId mis-classifies: patch WB or work around by pre-checking with our own _dats.Get<EnvCell>(envCellId) before calling IncrementRefCount.
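If the skip-with-warning branch is taken, the warning should fire once per cell, not once per frame or per retry. A minimal thread-safe sketch, assuming warnings may be raised from thread-pool continuations like Task 1's; `WarnOnce` and the `SKIP` log tag are hypothetical names, not existing acdream code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Cells already warned about. ConcurrentDictionary.TryAdd is atomic,
// so two concurrent callers for the same cell can't both succeed.
var warnedCells = new ConcurrentDictionary<ulong, byte>();
int warningsWritten = 0;

// Hypothetical helper: emit one warning per cell, then stay quiet.
void WarnOnce(ulong cellId, string reason)
{
    if (!warnedCells.TryAdd(cellId, 0))
        return; // this cell was already reported
    Interlocked.Increment(ref warningsWritten);
    Console.WriteLine($"[indoor-upload] SKIP cellId=0x{cellId:X8} reason={reason}");
}

// Simulate many concurrent reports for one cell plus one for another.
Parallel.For(0, 100, _ => WarnOnce(0xA9B40111, "environment record missing"));
WarnOnce(0xA9B40112, "environment record missing");
Console.WriteLine($"warnings written: {warningsWritten}"); // prints "warnings written: 2"
```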

  • Step 3: Build, capture, verify, commit (same pattern as 4a Steps 3-4).

4c — If the cause is a NullReferenceException in our code path

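Less likely but possible: PopulateMetadata or CellMesh.Build throws, for example when invoked from a worker thread. Note that PopulateMetadata runs synchronously inside IncrementRefCount before PrepareMeshDataAsync is called, so an exception there propagates to the caller rather than appearing as an [indoor-upload] FAILED line; only code running inside the WB task can surface through Task 1's continuation.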

Files:

  • Modify: the specific acdream file the stack trace points to.

  • Step 1: Read the throwing line

  • Step 2: Add the appropriate null-guard

  • Step 3: Build, capture, verify, commit.

4d — If the cause is something else entirely

If the captured exception type doesn't match 4a-4c, STOP and re-design. The fix shape needs the implementer's judgment + possibly a fresh brainstorm session. Don't paper over the cause with a generic try/catch.


Task 5: Verification + visualization

Files:

  • Create: docs/research/2026-05-19-indoor-cell-rendering-verification.md

  • Step 1: Re-launch with the probe and re-walk Holtburg

Same as Task 2 Steps 2-4, but expectation flipped: [indoor-upload] FAILED / NULL_RESULT lines for previously-failing cells should NOT appear; [indoor-upload] completed lines should appear instead.
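The flipped expectation can be checked mechanically: every previously-failing cell needs a `completed` line and no FAILED or NULL_RESULT line. A sketch, assuming the probe lines keep the `cellId=0x` plus eight uppercase hex digits shape shown in this plan; `Verify` is a hypothetical helper and the sample lines are illustrative.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Matches the outcome lines for a cell (requested lines are ignored).
var probe = new Regex(
    @"\[indoor-upload\] (?<kind>completed|FAILED|NULL_RESULT) cellId=0x(?<cell>[0-9A-F]{8})");

// Hypothetical checker: one message per problem; an empty list means pass.
List<string> Verify(IEnumerable<string> logLines, IEnumerable<ulong> targetCells)
{
    var kindsByCell = new Dictionary<ulong, HashSet<string>>();
    foreach (var entry in logLines)
    {
        var m = probe.Match(entry);
        if (!m.Success) continue;
        ulong id = Convert.ToUInt64(m.Groups["cell"].Value, 16);
        if (!kindsByCell.TryGetValue(id, out var kinds))
            kindsByCell[id] = kinds = new HashSet<string>();
        kinds.Add(m.Groups["kind"].Value);
    }

    var found = new List<string>();
    foreach (var cell in targetCells)
    {
        kindsByCell.TryGetValue(cell, out var seen);
        if (seen is null || !seen.Contains("completed"))
            found.Add($"0x{cell:X8}: no completed line");
        if (seen is not null && (seen.Contains("FAILED") || seen.Contains("NULL_RESULT")))
            found.Add($"0x{cell:X8}: still reports a failure");
    }
    return found;
}

// Illustrative lines only, not a real capture.
var sample = new[]
{
    "[indoor-upload] requested cellId=0xA9B40100",
    "[indoor-upload] completed cellId=0xA9B40100 uploadOk=True",
    "[indoor-upload] NULL_RESULT cellId=0xA9B40111",
};
var report = Verify(sample, new ulong[] { 0xA9B40100, 0xA9B40111 });
Console.WriteLine(report.Count == 0 ? "PASS" : string.Join(Environment.NewLine, report));
```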

  • Step 2: Visual verification by user

User walks into Holtburg Inn AND the other buildings whose cells were previously missing. Expected: floors visible, no missing geometry.

  • Step 3: Write the verification report

Create the file documenting:

# Indoor Cell Rendering — Phase 2 Verification

**Date:** 2026-05-19
**Outcome:** Floor renders in Holtburg Inn.

## Probe re-capture

After Task 4's fix:
- Previously-failing cells: <list, e.g. `0xA9B40100`, `0xA9B40111`, ...>
- Now emit `[indoor-upload] completed cellId=0x... isSetup=True hasEnvCellGeom=True cellGeomVerts=<N> uploadOk=True`
- No new `[indoor-upload] FAILED` or `NULL_RESULT` lines for these cells.

## Visual confirmation

User walked into:
- Holtburg Inn — floor visible. ✓
- <other buildings tested> — floor visible. ✓

## Regressions checked

- Outdoor terrain still renders correctly. ✓
- NPCs, mobs, scenery still render. ✓
- No new build warnings, no new test failures.

## Closes

This concludes Phase 2 of the indoor cell rendering fix.
  • Step 4: Commit
git add docs/research/2026-05-19-indoor-cell-rendering-verification.md
git commit -m "$(cat <<'EOF'
docs(research): Phase 2 verification — floor renders in Holtburg Inn

Post-fix re-capture confirms previously-failing cells now emit
[indoor-upload] completed. Visual verification by user confirms
floors visible in Holtburg Inn and <other tested buildings>.

Phase 2 complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Task 6: Roadmap update

Files:

  • Modify: docs/plans/2026-04-11-roadmap.md

  • Step 1: Read the roadmap's "shipped" section

Open docs/plans/2026-04-11-roadmap.md. Find the section listing recently-shipped phases (likely near the top, in a "shipped" table or chronological list).

  • Step 2: Add an entry for Phase 2 indoor cell rendering fix

Add an entry matching the existing pattern of shipped-row entries. Example shape:

| <next row number> | 2026-05-19 | Indoor cell rendering — Phase 1 (diagnostics) + Phase 2 (fix) | Surfaced + fixed WB's silent failure for 26/123 Holtburg cells. Spec at [phase 1](../superpowers/specs/2026-05-19-indoor-cell-rendering-fix-design.md) + [phase 2](../superpowers/specs/2026-05-19-phase2-indoor-cell-rendering-fix-design.md). Cause: <one-line>. Fix: <one-line>. Visual-verified at Holtburg Inn. |

(Read the actual existing row format and match it.)

  • Step 3: Commit
git add docs/plans/2026-04-11-roadmap.md
git commit -m "$(cat <<'EOF'
docs(roadmap): Phase 2 indoor cell rendering fix shipped

Phase 1 diagnostics + Phase 2 fix landed today. Indoor floor rendering
restored for Holtburg cells previously missing due to WB silent
failure. Spec, plan, and verification documents committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Acceptance Criteria

  • Task 1 commits: WbMeshAdapter.IncrementRefCount attaches the continuation. dotnet build clean.
  • Task 2 capture: [indoor-upload] FAILED or NULL_RESULT lines fire for previously-failing cells. Distinct cause(s) identified.
  • Task 3 cause report: documented in docs/research/2026-05-19-indoor-cell-rendering-cause.md.
  • Task 4 fix: applied + committed. Build clean. Tests clean (no new failures; pre-existing 8 physics/input failures unchanged).
  • Task 5 verification: post-fix probe re-capture confirms [indoor-upload] completed for previously-failing cells. User visually confirms floor renders in Holtburg Inn.
  • Task 6 roadmap update: shipped row added.

Subagent dispatch notes

  • Task 1 is mechanical (well-specified code edit) — dispatch to Sonnet.
  • Task 2 is operator-driven — the controller (parent) drives the launch + capture, not a subagent. The user MUST walk the client.
  • Task 3 is analytical (interpret captured data) — controller writes inline, or dispatch a Sonnet subagent with the captured log as context.
  • Task 4 is judgment-intensive (fix shape depends on data) — controller writes inline. If complex, a fresh brainstorm may be needed.
  • Task 5 is similar to Task 2 (user-driven walk + analysis).
  • Task 6 is mechanical — dispatch to Sonnet OR controller writes inline.