# Indoor render HANG — root cause: `PortalVisibilityBuilder.Build` non-termination — 2026-06-06

> Report-only investigation (user chose "investigate more first"). **No code changed.**
> Worktree `thirsty-goldberg-51bb9b`. This blocks the verbatim-DrawCells port's Task 2
> visual gate: every indoor frame can freeze here.

## Symptom

Three launches of the client all **froze** (`AppHangB1`, Windows Event Log) within seconds-to-minutes of the camera being indoors at the Holtburg cottage. Not a crash — no access violation, no managed exception. The captured managed stack of the frozen render thread (`hang-stack.txt`, via `dotnet-stack`) shows it **CPU-spinning**:

```
CPU_TIME
CellView.Add(ViewPolygon)
PortalVisibilityBuilder.AddRegion(CellView, List)
PortalVisibilityBuilder.Build(...)
RetailPViewRenderer.DrawInside(...)
GameWindow.OnRender(...)
```

App.Tests 207/207 and Core 1331/4/1 are green; the bug is invisible to the suite (see §Evidence).

## Verdict

**It is NOT Task 2 (the verbatim-DrawCells / grey fix).** `Build(...)` runs at the very top of `DrawInside` ([RetailPViewRenderer.cs:43](../../src/AcDream.App/Rendering/RetailPViewRenderer.cs)), **before** any line Task 2 touched, and the call is byte-identical pre/post-change. Task 2's draw logic was independently confirmed correct in the run-1 log: `[render-sig] draw=[…]` equalled `ids=[…]` with `miss=[]`, and `[shell]` showed every visible cell drawing textured (`zh=0`). The grey fix works.

**Root cause:** `PortalVisibilityBuilder.Build`'s portal BFS does not terminate for real cottage geometry. It **re-enqueues a popped cell every time that cell's `CellView` grows**: `queued.Remove(cell.CellId)` on pop ([:122](../../src/AcDream.App/Rendering/PortalVisibilityBuilder.cs)) + `if (grew && queued.Add(neighbourId))` on grow ([:289](../../src/AcDream.App/Rendering/PortalVisibilityBuilder.cs)). Termination therefore depends entirely on growth stopping.
Growth is gated only by `CellView.Add`'s **exact-match dedup** (`SamePolygon`, eps `1e-4`, [PortalView.cs:79](../../src/AcDream.App/Rendering/PortalView.cs)). The **near-side portal clip** (`ClipPortalAgainstView` → `PortalProjection.ProjectToClip` → `ClipToRegion`, [:474/:485](../../src/AcDream.App/Rendering/PortalVisibilityBuilder.cs)) produces a polygon that is a hair different on each `A↔B` reciprocal round (float drift through the homogeneous project→clip round-trip with a non-identity cell transform). The dedup never matches the drifted near-duplicate → the region grows without bound → the cell re-enqueues forever → `CellView.Polygons` grows to N, and `CellView.Add`'s O(N) dedup scan makes the whole thing O(N²) → frozen.

## Evidence

1. **Captured stack** pins the spin to `CellView.Add ← AddRegion ← Build`, pure managed `CPU_TIME` (not a GL call, not blocked, not a fault).
2. **The code already documents this exact failure** at [:694-697](../../src/AcDream.App/Rendering/PortalVisibilityBuilder.cs): the *reciprocal* clip deliberately stays on the float-stable `ProjectToNdc` path *because* "per-round float drift defeated the CellView SamePolygon dedup, inflating a tight A<->B reciprocal view to ~4x its area." The **near-side** clip ([:474](../../src/AcDream.App/Rendering/PortalVisibilityBuilder.cs)) did not get the same treatment — it uses `ProjectToClip`.
3. **The only bound was removed this session.** [:74](../../src/AcDream.App/Rendering/PortalVisibilityBuilder.cs): "Fixpoint termination replacing the old `MaxReprocessPerCell` hard cap." The fixpoint never converges under drift; with the cap gone there is no other bound (no iteration cap, no max-polygon cap, no time bound).
4. **It's the dirty-tree rewire the handoff said to KEEP.** `git diff --stat`: `PortalVisibilityBuilder.cs +426/−45` and `PortalProjection.cs +111` are **uncommitted**. `ProjectToClip` is part of the new `PortalProjection` lines.
   The handoff (`2026-06-06-verbatim-drawcells-port-pickup-handoff.md`) lists this rewire as the faithful foundation to preserve and says "the clip math is already faithful — do not harden the w-clip." The clip is faithful in the *picture* it computes; it is the *non-termination* that is broken.
5. **Why the suite is green:** `PortalVisibilityBuilderTests` build cells with `WorldTransform = Matrix4x4.Identity` and axis-aligned quads in 2-cell **chains** (`cam → ground → exit`). No `A↔B` cycle, no transform-induced drift → the project→clip round-trip is exact → the dedup collapses duplicates → the BFS converges. The real cottage is a **cyclic** cell cluster (`0x016F–0x0175`, mutual portals) with **non-identity** transforms → drift + cycle → non-termination. The suite cannot reach the failing case.
6. **Why run 1 survived 113 frames then froze:** `Build` converges at most camera poses; only specific poses create the non-converging drift cycle. The freeze coincided with the metastable doorway flip (`[render-sig] stable` went 39→0, visible-cell count 5→4) one frame before the log ended.

## Hypotheses (ranked)

1. **(confirmed)** Non-terminating BFS: re-enqueue-on-grow + `ProjectToClip` drift defeats the `SamePolygon` dedup → unbounded `CellView` growth. Falsify: a re-process cap, a drift-tolerant dedup, or `ProjectToNdc` on the near-side clip all make `Build` terminate.
2. *(ruled out)* GPU/driver hang from a malformed draw — the stack is pure managed `CPU_TIME` in `CellView.Add`, never a GL call; no fault.
3. *(ruled out)* Probe-output stdout saturation — disproven: the probe-free run also hung.
4. *(ruled out)* Task 2 — `Build` is upstream of every Task 2 line and unchanged by it.

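
The mechanism in hypothesis 1 can be reproduced in a toy model. The sketch below is an illustration under stated assumptions, not the project's code: `ToyCellView` and `DriftDemo` are hypothetical names, polygons are plain 2D quads, and the per-round drift of `1e-3` is an arbitrary value picked to exceed the `1e-4` epsilon quoted for `SamePolygon` (the report does not quantify the real drift). It shows that an epsilon-exact dedup holds the view at one polygon when every re-add is exact, and lets it grow by one polygon per round when each re-add is slightly shifted.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Toy model only: ToyCellView is a hypothetical stand-in, not the project's CellView.
public sealed class ToyCellView
{
    private const float Eps = 1e-4f; // epsilon the report quotes for SamePolygon
    private readonly List<Vector2[]> _polygons = new List<Vector2[]>();

    public int Count => _polygons.Count;

    // Returns true when the view grew, i.e. no stored polygon matched within Eps.
    public bool Add(Vector2[] polygon)
    {
        foreach (Vector2[] existing in _polygons) // O(N) scan on every Add
        {
            if (SamePolygon(existing, polygon)) return false;
        }
        _polygons.Add(polygon);
        return true;
    }

    private static bool SamePolygon(Vector2[] a, Vector2[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (MathF.Abs(a[i].X - b[i].X) > Eps || MathF.Abs(a[i].Y - b[i].Y) > Eps)
                return false;
        }
        return true;
    }
}

public static class DriftDemo
{
    // Re-adds the same unit quad once per round, shifted in X by round * driftPerRound
    // (an assumed drift model), and returns how many polygons the view ends up holding.
    public static int Run(float driftPerRound, int rounds)
    {
        var view = new ToyCellView();
        for (int r = 0; r < rounds; r++)
        {
            float d = r * driftPerRound;
            view.Add(new[]
            {
                new Vector2(-0.5f + d, -0.5f), new Vector2(0.5f + d, -0.5f),
                new Vector2(0.5f + d, 0.5f), new Vector2(-0.5f + d, 0.5f),
            });
        }
        return view.Count;
    }
}
```

In this model `DriftDemo.Run(0f, 50)` leaves a single polygon in the view, while `DriftDemo.Run(1e-3f, 50)` leaves 50, one per round, and each `Add` scans all of them.
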
## Fix options (all additive — none reverts the dirty tree)

| | Fix | Touches | Pro | Con |
|---|---|---|---|---|
| **A** *(rec.)* | **Drift-tolerant dedup**: round clipped polygon vertices to a small grid (≈`1e-3`) before `AddRegion`, or widen/snap `SamePolygon`'s match, so near-duplicates collapse → growth converges. | `CellView`/`AddRegion` | Fixes the actual root cause ("drift defeats dedup"); keeps the faithful `ProjectToClip`; preserves growth-propagation. ~10 lines. | Tolerance is a tuning constant (pick conservatively; over-merge = minor over-tighten). |
| **B** | **Restore a re-process bound** (`MaxReprocessPerCell`-style cap on the BFS). | `Build` loop | Smallest; guarantees termination; doesn't touch clip. | A guard, not a root fix; may under-include a late-growing view. The user's "no workarounds" rule applies — this is the band-aid. |
| **C** | **Near-side clip on `ProjectToNdc`** (what the reciprocal clip already uses). | `ClipPortalAgainstView` | Removes the drift source directly; consistent with `:694`. | Steps on this session's homogeneous near-eye clip work; the handoff's "don't harden the w-clip" is closest to here. |

**Recommended next step:** approve **A** (drift-tolerant dedup) — it closes the precise mechanism the code half-acknowledges at `:694`, terminates structurally, and leaves the faithful clip path intact. Implement in a follow-up (not report-only) session, then re-run the Task 2 visual gate (probe-free) at the cottage + cellar.

## What this is NOT

- **NOT** Task 2 / the grey fix — that is verified working (`draw==ids`, `miss=[]`, textured shells).
- **NOT** a wrong-pixels / unfaithful-projection bug — it's a **termination** bug. The handoff's "the clip math is faithful, don't harden the w-clip" is about projection *correctness*; this is BFS *convergence*. Don't chase the w-clip.
- **NOT** a GPU/shader/driver hang and **NOT** the probe firehose (both ruled out by the stack and the probe-free repro).

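
For reference, Option A from the fix table (snap before dedup) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the change itself: `SnappedCellView` and `SnapDemo` are hypothetical names, polygons are 2D, the grid size `1e-3` is the approximate value the table suggests, and keying a `HashSet` on grid indices is just one way to implement the snap, chosen here for brevity. A vertex that sits close to a grid-cell boundary can still round into different cells from one round to the next; the sketch does not handle that case.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Text;

// Hypothetical stand-in for CellView; illustrates snap-before-dedup only.
public sealed class SnappedCellView
{
    private const float Grid = 1e-3f; // assumed grid size (the table suggests ~1e-3)
    private readonly HashSet<string> _keys = new HashSet<string>();

    public int Count => _keys.Count;

    // Returns true when the snapped polygon was not seen before (the view grew).
    public bool Add(IReadOnlyList<Vector2> polygon)
    {
        var key = new StringBuilder();
        foreach (Vector2 v in polygon)
        {
            // Snap each coordinate to its nearest grid index so near-duplicates share a key.
            long gx = (long)MathF.Round(v.X / Grid);
            long gy = (long)MathF.Round(v.Y / Grid);
            key.Append(gx).Append(',').Append(gy).Append(';');
        }
        return _keys.Add(key.ToString());
    }
}

public static class SnapDemo
{
    // Re-adds the same unit quad once per round, shifted in X by round * driftPerRound
    // (an assumed drift model), and returns the number of distinct snapped polygons.
    public static int Run(float driftPerRound, int rounds)
    {
        var view = new SnappedCellView();
        for (int r = 0; r < rounds; r++)
        {
            float d = r * driftPerRound;
            view.Add(new[]
            {
                new Vector2(-0.5f + d, -0.5f), new Vector2(0.5f + d, -0.5f),
                new Vector2(0.5f + d, 0.5f), new Vector2(-0.5f + d, 0.5f),
            });
        }
        return view.Count;
    }
}
```
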
---

## Reassessment — is the dirty-tree builder rewire sound? (post Option A)

Option A (drift-tolerant `CellView.Add` dedup, `CellViewDedupTests` green) was implemented and the client relaunched. Result: the hang **moved out of `CellView.Add`** (A worked for its target) but **relocated to `ScreenPolygonClip.ClipByEdge`** via `ApplyReciprocalClip` (second captured stack, `hang-stack2.txt`). `ScreenPolygonClip.Intersect`/`ClipByEdge` are **both bounded `for` loops** — they cannot spin on one call — so the spin is the **outer `Build` BFS** still not terminating and calling them a runaway number of times. **Option A is necessary but not sufficient.**

### Git evidence (what the dirty rewire changed re: termination)

- **HEAD (committed)** near-side portal clip = `PortalProjection.ProjectToNdc` (float-stable; `git show HEAD:` line 146). **The dirty rewire switched it to `ProjectToClip`** (`ClipPortalAgainstView`, dirty line 474) — the homogeneous near-eye clip, introduced to fix the near/grazing-doorway flap/void.
- The `MaxReprocessPerCell` **hard cap was removed earlier** (committed Phase U.2a `d880775`), replaced by "fixpoint termination." **Neither HEAD nor the dirty tree has a hard iteration bound.**
- The dirty rewire's own comment (`PortalVisibilityBuilder.cs:519-522`) documents that `ProjectToClip` "produced per-round float drift that defeated the CellView SamePolygon dedup" — and applied that lesson **only to the reciprocal clip** (kept on `ProjectToNdc`), leaving the **near-side** clip on the drift-prone `ProjectToClip`.

### Soundness verdict

The builder's termination model is **unsound by construction.** It relies on the clipped regions reaching a geometric fixpoint — re-clipping a cell's view reproduces *exactly-equal* polygons that the dedup recognises — with **no hard iteration bound.** That only holds if the clip is float-stable.
`ProjectToClip` (needed for faithful near-doorway projection) injects per-round drift, so re-clipping never reproduces an exactly-equal polygon, the dedup never catches it, and the re-enqueue-on-grow flood never converges → infinite loop.

**You cannot have BOTH faithful near-doorway projection (`ProjectToClip`) AND convergence-via-exact-dedup-without-a-bound.** HEAD got away with it because `ProjectToNdc` was stable enough to converge (and it sealed — user-verified); the dirty switch tipped it into non-termination. The rewire fixed the *projection* and, apparently never having been launched, shipped a hang.

A's drift-tolerant dedup *narrows* the gap but cannot *close* it: for some geometry the per-round drift exceeds any fixed snap grid, so growth still produces new keys forever. Only a **hard bound** guarantees termination.

### Paths (for the user to choose)

| | Path | Termination | Projection fidelity | Risk |
|---|---|---|---|---|
| **1** *(rec.)* | Keep `ProjectToClip` + add **enqueue-once** bound (D) — the builder's own comment already calls enqueue-once "the hard termination guarantee"; the re-enqueue-on-grow is the bug. Keep A. | Guaranteed (≤N pops) | Full (faithful doorway clip kept) | Minor under-inclusion of late growth → visual-verify; widen to a cap if needed |
| **2** | Keep `ProjectToClip` + add a **re-process cap** (B, restore `MaxReprocessPerCell`). Keep A. | Guaranteed (≤N×K) | Full | Less faithful than enqueue-once; a tuning constant |
| **3** | **Revert** the near-side `ProjectToClip → ProjectToNdc` (back to HEAD). | Restored (HEAD converged) | **Loses** the rewire's near-doorway fix → reintroduces the flap/void (separate bug) | Throws away this session's projection work; contradicts the keep-the-dirty-tree directive |

A bound (paths 1/2) is the sound fix: it makes termination independent of clip drift, so the faithful `ProjectToClip` projection AND guaranteed termination coexist.

**Recommendation: path 1** (enqueue-once + keep A), visual-verify for under-inclusion. Reverting (path 3) only trades the hang back for the flap/void.
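
The difference between the re-enqueue-on-grow loop and the two bounded paths can be illustrated on a toy portal graph. This is a sketch under assumptions, not the builder's code: `QueuePolicy` and `ToyPortalBfs` are hypothetical names, every visit is modelled as "the neighbour's view grew" (the worst case that drift produces), the cap value `4` is arbitrary, and the safety cap exists only so that the unbounded policy returns at all.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names; a toy model of the queueing policies, not PortalVisibilityBuilder.
public enum QueuePolicy { ReEnqueueOnGrow, EnqueueOnce, ReprocessCap }

public static class ToyPortalBfs
{
    // Returns how many cells were popped before the queue drained or safetyCap was hit.
    public static int CountPops(
        IReadOnlyDictionary<uint, uint[]> portals, uint cameraCell,
        QueuePolicy policy, int maxReprocessPerCell = 4, int safetyCap = 1000)
    {
        var queue = new Queue<uint>();
        var queued = new HashSet<uint> { cameraCell };
        var processed = new Dictionary<uint, int>(); // pops per cell
        queue.Enqueue(cameraCell);
        int pops = 0;

        while (queue.Count > 0 && pops < safetyCap)
        {
            uint cell = queue.Dequeue();
            pops++;
            processed.TryGetValue(cell, out int seen); // seen stays 0 for a first visit
            processed[cell] = seen + 1;

            // Enqueue-once keeps the cell marked forever; the other policies
            // clear the mark on pop so that later growth can re-queue it.
            if (policy != QueuePolicy.EnqueueOnce) queued.Remove(cell);

            if (!portals.TryGetValue(cell, out var neighbours)) continue;
            foreach (uint n in neighbours)
            {
                bool grew = true; // assumed worst case: the neighbour's view always grows
                processed.TryGetValue(n, out int timesProcessed);
                bool underCap = policy != QueuePolicy.ReprocessCap
                    || timesProcessed < maxReprocessPerCell;
                if (grew && underCap && queued.Add(n)) queue.Enqueue(n);
            }
        }
        return pops;
    }
}
```

On a three-cell chain with mutual portals, `EnqueueOnce` pops each cell exactly once (3 pops, the ≤N bound), `ReprocessCap` stops within N×K pops, and `ReEnqueueOnGrow` keeps running until it hits the safety cap.
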