docs(spec): Phase 2 indoor cell rendering fix design
Three components:
1. WbMeshAdapter wraps the PrepareMeshDataAsync task with a continuation
that surfaces faulted-task exceptions + null-result cases for EnvCell
IDs only (gated by ProbeIndoorUploadEnabled). Two new log shapes:
[indoor-upload] FAILED cellId=0x... exception=<TypeName>: <Message>
stack=[<top 3 frames>]
[indoor-upload] NULL_RESULT cellId=0x...
2. Capture procedure: re-launch at Holtburg with the probe on, grep for
FAILED/NULL_RESULT lines, get definitive per-cell cause for the 26
missing-completion cells from Phase 1's capture.
3. Targeted fix: code change matching whichever exception type / null
pattern dominates. Fix shape is data-driven — see the contingency
table in the spec.
WB's catch at ObjectMeshManager.cs:589 already calls _logger.LogError,
but WbMeshAdapter constructs the manager with NullLogger.Instance, so
the log is dropped. Our continuation surfaces the same data scoped to
EnvCells only (avoids the thousands of GfxObj/Setup log lines a real
logger would emit during landblock streaming).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
9f152d9754
commit
251763b2c4
1 changed files with 189 additions and 0 deletions
|
|
@ -0,0 +1,189 @@
|
|||
# Indoor Cell Rendering Fix — Phase 2 Design
|
||||
|
||||
**Status:** Brainstormed 2026-05-19. Awaiting user review.
|
||||
**Scope:** Surface the silent failure in WB's `PrepareEnvCellMeshData` for 26/123 Holtburg cells, then implement the targeted fix.
|
||||
**Predecessor:** Phase 1 (`docs/superpowers/specs/2026-05-19-indoor-cell-rendering-fix-design.md`) shipped the five `[indoor-*]` probes that confirmed hypothesis H1.
|
||||
**Capture evidence:** `docs/research/2026-05-19-indoor-cell-rendering-probe-capture.md`.
|
||||
|
||||
---
|
||||
|
||||
## 1. What we know
|
||||
|
||||
Phase 1's `ACDREAM_PROBE_INDOOR_ALL=1` capture at Holtburg `0xA9B4` proved:
|
||||
|
||||
- 123 EnvCells requested via `WbMeshAdapter.IncrementRefCount` → only **97 complete**.
|
||||
- **26 cells** silently fail. They get `[indoor-upload] requested` but never `[indoor-upload] completed`.
|
||||
- The dispatcher then tries to draw them, `TryGetRenderData` returns null, draw is silently skipped → user sees **missing floor**.
|
||||
- The first interior cell `0xA9B40100` (likely the inn entry or another major building anchor) is among the 26.
|
||||
|
||||
The smoking gun is in WB's [`ObjectMeshManager.PrepareMeshData`](../../../references/WorldBuilder/Chorizite.OpenGLSDLBackend/Lib/ObjectMeshManager.cs):
|
||||
|
||||
```csharp
|
||||
catch (Exception ex) {
|
||||
_logger.LogError(ex, "Error preparing mesh data for 0x{Id:X16}", id);
|
||||
return null;
|
||||
}
|
||||
```
|
||||
|
||||
WB logs the exception via its injected `_logger`. But [`WbMeshAdapter.cs:71`](../../../src/AcDream.App/Rendering/Wb/WbMeshAdapter.cs:71) constructs `ObjectMeshManager` with `NullLogger<ObjectMeshManager>.Instance` — so the log goes to `/dev/null`. The exception type and message are lost.
|
||||
|
||||
## 2. Solution — three components
|
||||
|
||||
### Component 1 — Exception-surfacing wrap
|
||||
|
||||
Capture the `Task<ObjectMeshData?>` returned by `_meshManager.PrepareMeshDataAsync(id, isSetup: false)` and attach a continuation that, for EnvCell IDs only, logs the failure cause.
|
||||
|
||||
Three logged outcomes:
|
||||
|
||||
- **Task faulted** → `[indoor-upload] FAILED cellId=0x... exception=<TypeName>: <Message> stack=[<top 3 frames>]`. Unwrap `AggregateException.InnerException` for cleaner output.
|
||||
- **Task succeeded with null result** → `[indoor-upload] NULL_RESULT cellId=0x...`. WB's deliberate null-return path (e.g., `ResolveId` returned empty, type was `Unknown`).
|
||||
- **Task succeeded with non-null result** → no extra log. The existing `Tick()` drain already emits `[indoor-upload] completed`.
|
||||
|
||||
The continuation:
|
||||
- Runs on `TaskScheduler.Default` (`ThreadPool`) so it doesn't block the render thread.
|
||||
- Only attached for EnvCell IDs (gated by `RenderingDiagnostics.IsEnvCellId(id)`) when `ProbeIndoorUploadEnabled` is true — zero cost when off.
|
||||
- Captures `cellId` (a `ulong` value) only; no instance closure leakage.
|
||||
- Truncates stack trace to top 3 frames.
|
||||
|
||||
Concrete code shape:
|
||||
|
||||
```csharp
|
||||
if (_metadataPopulated.Add(id))
|
||||
{
|
||||
PopulateMetadata(id);
|
||||
var prepTask = _meshManager.PrepareMeshDataAsync(id, isSetup: false);
|
||||
|
||||
if (RenderingDiagnostics.IsEnvCellId(id) && RenderingDiagnostics.ProbeIndoorUploadEnabled)
|
||||
{
|
||||
_pendingEnvCellRequests.Add(id);
|
||||
Console.WriteLine($"[indoor-upload] requested cellId=0x{id:X8}");
|
||||
|
||||
ulong cellId = id;
|
||||
_ = prepTask.ContinueWith(t =>
|
||||
{
|
||||
if (t.IsFaulted && t.Exception is not null)
|
||||
{
|
||||
var ex = t.Exception.InnerException ?? t.Exception;
|
||||
var stack = (ex.StackTrace ?? "").Split('\n')
|
||||
.Take(3).Select(s => s.Trim()).Where(s => s.Length > 0);
|
||||
Console.WriteLine(
|
||||
$"[indoor-upload] FAILED cellId=0x{cellId:X8} " +
|
||||
$"exception={ex.GetType().Name}: {ex.Message} " +
|
||||
$"stack=[{string.Join(" | ", stack)}]");
|
||||
}
|
||||
else if (t.IsCompletedSuccessfully && t.Result is null)
|
||||
{
|
||||
Console.WriteLine($"[indoor-upload] NULL_RESULT cellId=0x{cellId:X8}");
|
||||
}
|
||||
}, TaskScheduler.Default);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`using System.Linq;` and `using System.Threading.Tasks;` may need adding (likely already present).
|
||||
|
||||
### Component 2 — Capture procedure
|
||||
|
||||
Standard launch:
|
||||
|
||||
```powershell
|
||||
$env:ACDREAM_PROBE_INDOOR_UPLOAD = "1"
|
||||
dotnet run --project src\AcDream.App\AcDream.App.csproj --no-build -c Debug 2>&1 | Tee-Object -FilePath launch.log
|
||||
```
|
||||
|
||||
Walk into Holtburg Inn, walk into nearby buildings whose cells were on the missing-26 list (`0xA9B40100`, `0xA9B40111`, etc.). Close gracefully.
|
||||
|
||||
Analyze:
|
||||
|
||||
```powershell
|
||||
Get-Content launch.log |
|
||||
Where-Object { $_ -match '\[indoor-upload\] (FAILED|NULL_RESULT)' } |
|
||||
Select-Object -Unique
|
||||
```
|
||||
|
||||
Expected output: a per-cell list of distinct exception types or null-return signals. Most cells likely share 1–3 root causes.
|
||||
|
||||
### Component 3 — Targeted fix (shape unknown until Component 2 captures)
|
||||
|
||||
Once Component 2 reveals the exception type + message, the fix is one localized code change. Likely shapes:
|
||||
|
||||
| Captured cause | Fix shape |
|
||||
|---|---|
|
||||
| Texture decode `Exception` (e.g. `KeyNotFoundException` on surface ID) | Guard at `WbMeshAdapter.PopulateMetadata` or pre-validate surfaces; possibly patch WB fork. |
|
||||
| `KeyNotFoundException` for missing `Environment` / `CellStruct` | Log + skip cell with a sentinel render-data; report which dat is stale. |
|
||||
| `NullReferenceException` in `PrepareCellStructMeshData` | Add null guard at the specific call site. |
|
||||
| WB internal logic bug | Fork patch to WB. |
|
||||
| `NULL_RESULT` (ResolveId returned empty / type was Unknown) | Investigate dat file integrity; possibly user needs a dat update. |
|
||||
|
||||
The fix is one or two code edits, lands as a single commit, and is followed by a re-launch verifying:
|
||||
- `[indoor-upload] FAILED` / `NULL_RESULT` lines disappear for the previously-failing cells.
|
||||
- `[indoor-upload] completed` appears for those cells.
|
||||
- Visual verification: floor renders in Holtburg Inn.
|
||||
|
||||
---
|
||||
|
||||
## 3. Edge cases
|
||||
|
||||
| Scenario | Behavior |
|
||||
|---|---|
|
||||
| Probe toggled off mid-session | Continuation still emits if attached at request time. Acceptable — capturing the cause once matters more than honoring runtime toggle. |
|
||||
| Continuation fires after adapter disposed | Harmless console write on dying process. No memory leak; closure captures only the `ulong` cellId. |
|
||||
| Same cell requested twice | `_metadataPopulated.Add(id)` guards; continuation attaches exactly once. Re-streaming after Remove+Add keeps the sticky set. First failure is what we want. |
|
||||
| Cancellation | `t.IsCanceled` is neither `IsFaulted` nor `IsCompletedSuccessfully`. Continuation silently skips. Acceptable — cancellation isn't a failure cause. |
|
||||
| `Task.Result` on faulted task | Re-throws AggregateException. Our gate `else if (t.IsCompletedSuccessfully && t.Result is null)` ensures we never read Result without a clean success state. |
|
||||
| WB's `_logger.LogError` for the same exception | WbMeshAdapter passes `NullLogger` — WB's log goes nowhere. Our continuation is what surfaces it. Discussed below. |
|
||||
|
||||
**Why not just inject a real logger into `ObjectMeshManager`?** Could replace `NullLogger<ObjectMeshManager>.Instance` with a real logger that writes to `Console.WriteLine`. Tradeoff:
|
||||
|
||||
- Real logger: simpler, leverages WB's existing `_logger.LogError` call → catches GfxObj + Setup + EnvCell failures.
|
||||
- Our continuation: scoped to EnvCell IDs only → less noise.
|
||||
|
||||
Going with the continuation approach because:
|
||||
1. The probe flag is already in place.
|
||||
2. Phase 2 is targeted at EnvCells.
|
||||
3. Real-logger would emit thousands of GfxObj/Setup log lines during landblock streaming, drowning the EnvCell signal.
|
||||
|
||||
We can revisit if a future debugging session calls for broader visibility.
|
||||
|
||||
---
|
||||
|
||||
## 4. Testing strategy
|
||||
|
||||
### Unit tests
|
||||
|
||||
None for Component 1 — the continuation is straight wiring around an async API; the logic is "if faulted, log; if null result, log." Testing requires either mocking `Task<ObjectMeshData?>` (low value) or running a real WB instance (impractical in unit tests).
|
||||
|
||||
### Visual verification (end-to-end)
|
||||
|
||||
Component 2's capture procedure is the verification mechanism:
|
||||
|
||||
1. Build green.
|
||||
2. Launch with probe flag on, walk into Holtburg.
|
||||
3. Confirm `[indoor-upload] FAILED` or `NULL_RESULT` lines appear for ~26 cells.
|
||||
4. Apply Component 3's fix.
|
||||
5. Re-launch, re-walk Holtburg.
|
||||
6. **Acceptance:** previously-failing cells now produce `[indoor-upload] completed` lines AND the user can see the floor in Holtburg Inn.
|
||||
|
||||
---
|
||||
|
||||
## 5. What's NOT in this phase
|
||||
|
||||
- Tightening `IsEnvCellId` false-positives (flagged in Phase 1 capture note). Deferred — doesn't block Phase 2 since the upload probe gates on the correct path.
|
||||
- Cell collision symptoms (no wall collision when exiting, weird open-air collisions). Separate investigation phase.
|
||||
- Stab-leak-through-walls (Phase 1 Task 3). Deferred.
|
||||
- Broader WB logger injection for GfxObj/Setup failures. Open if we ever want broader diagnostic visibility.
|
||||
|
||||
---
|
||||
|
||||
## 6. Acceptance criteria
|
||||
|
||||
- [ ] `WbMeshAdapter.IncrementRefCount` captures the prep task and attaches a continuation for EnvCell IDs.
|
||||
- [ ] Continuation logs `[indoor-upload] FAILED cellId=0x... exception=<TypeName>: <Message> stack=[...]` for faulted tasks.
|
||||
- [ ] Continuation logs `[indoor-upload] NULL_RESULT cellId=0x...` for clean-null returns.
|
||||
- [ ] `dotnet build` clean. `dotnet test` clean (no new failures; pre-existing 8 physics/input failures unchanged).
|
||||
- [ ] Capture launched, FAILED/NULL_RESULT lines appear for the previously-missing cells, distinct causes identified.
|
||||
- [ ] Component 3 fix designed and implemented for each distinct cause.
|
||||
- [ ] Re-capture confirms `[indoor-upload] completed` appears for cells previously missing.
|
||||
- [ ] Visual verification: floor renders in Holtburg Inn.
|
||||
- [ ] Roadmap updated with Phase 2 shipped.
|
||||
- [ ] Commit messages cite the captured exception types + the fix rationale.
|
||||
Loading…
Add table
Add a link
Reference in a new issue