acdream/docs/superpowers/specs/2026-06-08-portal-flood-membership-stability-design.md
Erik d6aa526dd3 diag(render/physics): flap root-caused to physics rest µm-jitter; refute prior diagnoses
Apparatus + handoff for the indoor flap. Confirmed (primary evidence): the flap is the
portal-flood clip being µm-sensitive at the threshold, driven by a ~1-8µm jitter in the
player RenderPosition (physics resting position not bit-stable; Lerp surfaces it). REFUTES
the 2026-06-07 see-through/EnvCell/outdoor-node diagnosis (ModelId GfxObj 0x01000A2B IS the
solid exterior) AND an enqueue-once attempt (retail propagates late slices via AddToCell;
the existing PropagatesNewSlicesToExit test caught it; reverted). Adds: Build determinism
test, A8CellAudit gfxobj dump, [pv-input] 6dp probe + [render-sig] outRoot/bshell fields.
No functional fix shipped. Next: higher-precision physics rest trace -> port retail
kill_velocity/contact rest-stability. Canonical: docs/research/2026-06-08-flap-rootcause-physics-rest-handoff.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 09:16:12 +02:00


# Portal-Flood Membership Stability — the indoor "flap" root-cause fix
**Date:** 2026-06-08
**Branch:** `claude/thirsty-goldberg-51bb9b`
**Status:** ⚠️ **§4 (enqueue-once) REFUTED 2026-06-08** — retail propagates late slices via `AddToCell`
(decomp :433494); the existing `Build_ViewGrowthAfterDoneCell_PropagatesNewSlicesToExit` test encodes
that and enqueue-once broke it (reverted). The flap's confirmed root is the **physics resting position
µm-jitter** (§6 contingency, now the active direction). **CANONICAL PICKUP:**
`docs/research/2026-06-08-flap-rootcause-physics-rest-handoff.md`. Keep §1-§3 (mechanism + retail
grounding) as accurate diagnosis; treat §4-§5 as a refuted approach.
---
## 1. Summary
The indoor render **flap** (textures "battling" at the doorway threshold) is **portal-flood
set-membership instability**: from a *stable* viewer cell, the PView BFS includes or excludes a
deeper cell cluster frame-to-frame, redrawing a different set each frame. The fix is a **verbatim
port of retail's enqueue-once traversal** (`PView::ConstructView`/`AddViewToPortals`): a cell is
enqueued **only on first discovery**; later view-growth into an already-discovered cell is unioned
**in place** (retail `AddToCell`/`FixCellList`) and **never re-enqueues or re-clips** that cell's
portals. This removes acdream's `MaxReprocessPerCell` **re-enqueue fixpoint** — the documented
per-round `ProjectToClip` **drift** that lets µm viewpoint jitter re-discover/undiscover the deep
cluster. Localized to `PortalVisibilityBuilder`; no overlap-predicate, no added robustness, no
camera/movement/physics/clip-math change. (Contingency: if a residual flap survives — the deep
portal's *first* clip being knife-edge under µm jitter independent of drift — the next
retail-faithful step is bit-stabilizing the viewpoint at rest; see §6.)
---
## 2. Root cause — confirmed with primary evidence
### 2.1 What the flap actually is
Live `[render-sig]` + `[pv-input]` capture at the Holtburg cottage threshold (landblock `0xA9B4`),
standing at the doorway:
- The render root is **stable** (`root=0xA9B40170`, `outRoot=n`, i.e. an interior viewer cell — NOT
the outdoor node, NOT a root toggle).
- The flood cell set **oscillates frame-to-frame**: `ids=[0170,0171,0172,0173,0174,0175]` (6) ↔
`ids=[0170,0171]` (2). The deeper cluster `{0172,0173,0174,0175}` pops in/out.
- The oscillation occurs **at a byte-identical (to cm) eye AND player position** — e.g. three
consecutive frames at eye `(155.55,15.45,96.05)`, player `(155.40,13.20,94.00)` with flood
`6,2,6`.
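
The oscillation listed above can be quantified from a capture by counting how often the visible-cell set changes between consecutive frames. The sketch below is a minimal illustration, not project code: `FloodStability` and `CountMembershipFlips` are hypothetical names, and the frames are assumed to have already been parsed out of the `[render-sig]` output into collections of cell ids.

```csharp
using System;
using System.Collections.Generic;

public static class FloodStability
{
    // Counts consecutive-frame pairs whose visible-cell sets differ.
    // Each frame is the collection of cell ids reported for that frame;
    // ordering inside a frame is ignored.
    public static int CountMembershipFlips(IReadOnlyList<IReadOnlyCollection<ushort>> frames)
    {
        int flips = 0;
        for (int i = 1; i < frames.Count; i++)
        {
            var previous = new HashSet<ushort>(frames[i - 1]);
            if (!previous.SetEquals(frames[i]))
                flips++;
        }
        return flips;
    }
}
```

Fed the `6,2,6` sequence from the capture (the six-cell set, then `[0170,0171]`, then the six-cell set again), this returns 2; a stable flood returns 0.
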
### 2.2 Why it flips — the mechanism
1. `PortalVisibilityBuilder.Build` is a **pure** static function with all-fresh per-call state
(new `frame`/`todo`/`queued`/`popCounts` every call). Proven deterministic by
`PortalVisibilityBuilderTests.Build_IsDeterministic_IdenticalInputsGiveIdenticalVisibleSet`
(passes). **So for identical inputs the output cannot flip** → the flip requires a varying input.
2. The high-precision `[pv-input]` probe (6 dp) shows the camera eye and the **player
`RenderPosition` carry perpetual ~1-8 µm float jitter every frame** even "standing still"
(e.g. player `94.000000 ↔ 94.000008`, eye `96.248863 ↔ 96.248871`). At most poses this is
harmless; the flood is stable.
3. The per-portal clip is a faithful homogeneous port of retail's `polyClipFinish`
(`PortalProjection.ProjectToClip`/`ClipToRegion`, w-aware Sutherland-Hodgman). But the
**re-enqueue fixpoint** (`MaxReprocessPerCell`) re-clips a cell's view each round, and the
codebase documents that this **drifts per round** (`PortalVisibilityBuilder.cs:43,151,732`:
"ProjectToClip drift keeps a view growing forever").
4. At the threshold pose a deeper portal is **grazing** (oblique / near the eye) → it projects to a
thin sliver. The per-round drift + the µm viewpoint jitter flip `ClipToRegion`'s surviving-vertex
count across the `<3` boundary (PortalProjection.cs:118/121) → `clippedRegion.Count` flips
`0 ↔ N` → the cull at **`PortalVisibilityBuilder.cs:235`**
(`if (clippedRegion.Count == 0 && !EyeInsidePortalOpening) continue;`) drops the deeper cluster
on the empty-clip frames → flood `2 ↔ 6` → the flap.
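
The knife-edge in step 4 can be reproduced in isolation. The sketch below is a deliberately simplified stand-in, assuming a 2D polygon clipped against a single half-plane rather than the homogeneous w-aware clip in `ClipToRegion`; `SliverClipDemo` and its members are hypothetical names. It shows only that a thin sliver grazing a clip boundary goes from three surviving vertices to none when it shifts by a few micrometres.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class SliverClipDemo
{
    // Sutherland-Hodgman clip of a convex polygon against the half-plane x <= maxX.
    public static List<Vector2> ClipToMaxX(IReadOnlyList<Vector2> polygon, float maxX)
    {
        var output = new List<Vector2>();
        for (int i = 0; i < polygon.Count; i++)
        {
            Vector2 current = polygon[i];
            Vector2 next = polygon[(i + 1) % polygon.Count];
            bool currentInside = current.X <= maxX;
            bool nextInside = next.X <= maxX;

            if (currentInside)
                output.Add(current);

            // The edge crosses the boundary: emit the intersection point.
            if (currentInside != nextInside)
            {
                float t = (maxX - current.X) / (next.X - current.X);
                output.Add(Vector2.Lerp(current, next, t));
            }
        }
        return output;
    }

    // Mirrors the "fewer than 3 surviving vertices means an empty clip" gate.
    public static bool SurvivesClip(IReadOnlyList<Vector2> polygon, float maxX)
        => ClipToMaxX(polygon, maxX).Count >= 3;

    // A thin triangle whose only vertex near the boundary sits at x = leftEdge.
    public static Vector2[] Sliver(float leftEdge) => new[]
    {
        new Vector2(leftEdge, 0f),
        new Vector2(leftEdge + 0.01f, -0.5f),
        new Vector2(leftEdge + 0.01f, 0.5f),
    };
}
```

With `maxX = 1f`, `Sliver(1f - 8e-6f)` survives with three vertices while `Sliver(1f + 8e-6f)` clips to nothing, so a membership gate keyed on the surviving-vertex count flips on a 16 µm shift of the input.
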
### 2.3 Why prior fixes did not work
- **boom-snap** (camera stabilization, shipped): the jitter is sub-cm and **perpetual** (it is in the
player `RenderPosition`, propagating to the camera); snapping the boom distance did not make the
viewpoint bit-exact, so the knife-edge still flips.
- **w-space clip** (`ProjectToClip`/`ClipToRegion`, shipped): this made the *single* clip robust, but
the instability is in the **re-clip drift across rounds** + the membership gate's dependence on the
surviving-vertex count, not in a single clip.
- **viewer-cell dead-zone** (tried, reverted): the root does not toggle here (`root=0170` stable), so
a root-resolution dead-zone is irrelevant to this symptom.
### 2.4 What this REFUTES (the 2026-06-07 handoff diagnosis)
The predecessor handoff
(`docs/research/2026-06-07-cutover-flip-render-residuals-diagnosis-handoff.md`) is **wrong** on its
load-bearing claims; do not act on its F1/F2:
- "See-through walls from outside" — **not reproduced**: standing outside with the door closed is
**stable** (user visual gate, 2026-06-08).
- "The walls ARE the EnvCell shells; the ModelId is a partial frame" — **refuted**: the cottage
ModelId GfxObj `0x01000A2B` is a full closed exterior (76 render polys, bbox 20×18×10.4 m, 46
outward-facing walls + roof; cross-checked vs the physics BSP + retail `DrawBuilding`). The EnvCell
shells are interior-facing room surfaces. **F2 (build EnvCell back faces / double-side) targets the
wrong geometry.**
- "Oscillation = outdoor-node flood instability (1↔13)" — **corrected**: it is the *indoor* flood
(`outRoot=n`, stable root) swinging **2↔6**. F1 targeted the wrong root.
- "branch=RetailPViewInside every frame proves the flap is gone" — **tautological**: post-flip
`clipRoot = viewerRoot ?? _outdoorNode` is essentially never null, so the `branch` label can no
longer report `OutdoorRoot`. It proves nothing.
---
## 3. Retail grounding
Retail `PView::ConstructView` (decomp `acclient_2013_pseudo_c.txt:433750`): a cell becomes a draw-set
member the moment it is popped from the todo list (`:433783`). A neighbour is enqueued only if the
per-portal `ConstructView` (`:433827`) passes: the **side-test** (`:433832-433849`, `dot(viewpoint,
planeN)+d` vs a 0.2 mm epsilon → POSITIVE/IN_PLANE/NEGATIVE) **AND** `GetClip` (`:432344`) returns a
**non-empty** clip (`:433858` `if (arg3 != 0)`). `GetClip` projects via `xformStart` and clips via
`ACRender::polyClipFinish` (`:702749`).
So retail gates membership on a non-empty clip **too** — it never flaps because (a) it processes each
cell **once** (enqueue-once; no re-clip drift) and (b) its viewpoint is **bit-stable at rest** (the
authoritative local position does not move). acdream diverges on **both** (re-enqueue drift + µm
viewpoint jitter), and the two combine at the grazing portal.
The fix restores retail's traversal **verbatim** — enqueue-once on first discovery, union-in-place on
growth — so acdream stops diverging from `AddViewToPortals` and the per-round re-clip drift disappears.
No new predicate, no added robustness.
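
The side-test half of the gate described above can be sketched as a three-way classification. This is a minimal illustration under stated assumptions, not the decompiled routine: `PortalSide` and `PortalSideTest` are hypothetical names, positions are assumed to be in metres, and the symmetric band around zero is an assumption about how the 0.2 mm epsilon is applied.

```csharp
using System.Numerics;

public enum PortalSide { Negative, InPlane, Positive }

public static class PortalSideTest
{
    // 0.2 mm expressed in metres (assumed unit).
    public const float Epsilon = 0.0002f;

    // Classifies the viewpoint against the plane dot(p, planeNormal) + planeD = 0.
    public static PortalSide Classify(Vector3 viewpoint, Vector3 planeNormal, float planeD)
    {
        float signedDistance = Vector3.Dot(viewpoint, planeNormal) + planeD;
        if (signedDistance > Epsilon) return PortalSide.Positive;
        if (signedDistance < -Epsilon) return PortalSide.Negative;
        return PortalSide.InPlane; // within the epsilon band on either side
    }
}
```

For the plane `y = 0` with normal `(0, 1, 0)`, a viewpoint 1 mm above classifies as `Positive`, 0.1 mm above as `InPlane`, and 1 mm below as `Negative`. Note that the 0.2 mm band is far wider than the µm-scale jitter, which is consistent with the instability arising in the clip rather than in the side-test.
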
---
## 4. The fix (design)
**Principle:** membership is set by **first discovery** in distance-priority order (retail
`InsCellTodoList` in the `AddViewToPortals` `update_count == 0` branch, decomp `:433478`). A cell
already discovered is **never re-enqueued and never re-clipped**; later view-growth into it is unioned
**in place** and only refines that cell's own draw clip / draw-list position (retail `AddToCell` +
`FixCellList`, `:433492-433502`). The drift-prone re-clip loop is deleted, so µm viewpoint jitter can
no longer re-discover/undiscover a cell.
**Change A — enqueue-once (the core fix), `PortalVisibilityBuilder.cs` ~308-327.**
Today a neighbour is RE-enqueued whenever its view `grew`, capped by `MaxReprocessPerCell`:

    bool grew = AddRegion(nview, clippedRegion);             // union in place (= retail AddToCell)
    if (grew && popCounts[neighbourId] < MaxReprocessPerCell // RE-ENQUEUE on growth: the divergence
        && queued.Add(neighbourId))
        todo.Insert(neighbour, dist);

New: enqueue a neighbour **only on first discovery** (no `CellViews` / `processedViewCounts` entry
yet). On growth into an already-discovered neighbour, union in place (keep `AddRegion`) and update its
draw-list position if already drawn (port `FixCellList`), but **do not** re-insert it into the todo
list. Remove `MaxReprocessPerCell`, `popCounts`, and the per-pop cap; enqueue-once terminates by
construction (≤ N cells), matching retail's `cell_view_done` guarantee (`:433784`).
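
The traversal shape proposed in Change A can be sketched on a toy graph. This is an illustration under loud assumptions, not the `PortalVisibilityBuilder` code: `EnqueueOnceFlood` is a hypothetical name, each cell's "view" is modelled as the set of parent cells that contributed to it (standing in for clipped regions), and the clip test is assumed to always pass. It uses `PriorityQueue<TElement, TPriority>`, which requires .NET 6 or later.

```csharp
using System;
using System.Collections.Generic;

public static class EnqueueOnceFlood
{
    // portals[cell] lists the neighbours reachable through that cell's portals,
    // each with the distance used as the todo-list priority.
    public static (List<int> Order, Dictionary<int, int> PopCounts) Flood(
        int root, IReadOnlyDictionary<int, (int Neighbour, float Distance)[]> portals)
    {
        var order = new List<int>();
        var popCounts = new Dictionary<int, int>();
        var views = new Dictionary<int, HashSet<int>> { [root] = new HashSet<int> { root } };
        var todo = new PriorityQueue<int, float>();
        todo.Enqueue(root, 0f);

        while (todo.TryDequeue(out int cell, out _))
        {
            order.Add(cell);
            popCounts.TryGetValue(cell, out int pops);
            popCounts[cell] = pops + 1;

            if (!portals.TryGetValue(cell, out var exits))
                continue;

            foreach (var (neighbour, distance) in exits)
            {
                if (views.TryGetValue(neighbour, out var view))
                {
                    // Already discovered: union in place, never re-enqueue.
                    view.Add(cell);
                    continue;
                }
                // First discovery: create the view and enqueue exactly once.
                views[neighbour] = new HashSet<int> { cell };
                todo.Enqueue(neighbour, distance);
            }
        }
        return (order, popCounts);
    }
}
```

On a diamond with a back edge (0→1, 0→2, 1→3, 2→3, 3→0), every cell is popped exactly once and the loop terminates without any per-cell cap, because a cell can enter the todo list only while it has no view entry.
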
**Change B — exit-portal / `OutsideView` contribution stays first-process.** Retail contributes a
cell's exit-portal slice to `OutsideView` once, when the cell is processed; there is no re-enqueue
path in `AddViewToPortals` to re-contribute a grown view. acdream's `OutsideView` contribution
(line 256) already happens at process time, so removing the re-enqueue makes it match retail.
**Regression watch:** the re-enqueue was added 2026-06-07 "to propagate late-discovered slices to exit
portals", which retail does **not** do, so dropping it is faithful, but a look-in / outside-view
slice could shrink. The existing OutsideView tests (`Builder_Cellar_WindowClippedToStairwell`, the
look-in tests) must stay green; if one shrinks, the fix is retail's `AddToCell`/`FixCellList` ordering,
**not** reinstating the re-enqueue.
**`EyeInsidePortalOpening` (line 235-244) is unchanged by this fix.** It is a separate near-degenerate
single-clip guard (eye standing in a doorway), orthogonal to the re-enqueue, and stays as-is. **No
overlap predicate is introduced.**
**Why this is the flap fix, not a band-aid:** the re-enqueue re-clips a popped cell's portals from its
*grown* (drifted) view and can therefore **add or drop** the deep `0172-0175` cluster as the drift
walks across the clip boundary under µm jitter. Enqueue-once decides the cluster's membership **once**,
at first discovery, from the cell's clean first-accumulated view: the same decision retail makes.
---
## 5. Verification (TDD)
The flap itself is float-drift-dependent (it manifests only under live µm jitter at a specific grazing
geometry), so the **visual gate is the acceptance**; the unit layer pins enqueue-once correctness and
guards regressions.
1. **Enqueue-once correctness + termination (new).** A multi-path fixture in
`PortalVisibilityBuilderTests`: a **diamond** (a cell reachable from two parents, so its view grows
after first discovery) and a **cycle** (portals looping back). Assert the flood (a) **terminates
with `MaxReprocessPerCell` removed**, (b) yields a **deduped** `OrderedVisibleCells`, and (c) each
reachable cell is present exactly once. This is the property the re-enqueue cap was protecting;
enqueue-once provides it by construction. If a per-cell pop counter is cheap to surface, also assert
**each cell is popped ≤ 1 time** (RED under the re-enqueue, GREEN after); this is the direct enqueue-once signal.
2. **No membership regression on known geometries.** `Build_EyeStandingInInteriorPortal_FloodsNeighbour`,
`Build_CollapsedInteriorPortalNearEyeBeyondHalfMeter_FloodsNeighbour`,
`Build_DegeneratePortalToTheSide_NotFlooded_NoOverInclusion` (#95 guard), `Build_IsDeterministic_*`,
and the cellar/window/look-in tests stay **green** (re-enqueue and enqueue-once agree on
non-drifting geometry; if one changes, that is the §4 Change-B regression to handle retail's way,
NOT by reinstating the re-enqueue).
3. **Visual gate (user) — the acceptance.** At the cottage doorway threshold, hold still: the 2↔6
oscillation is gone; the deeper rooms render steadily through the door; walking in/out stays
seamless. Re-run the `[pv-input]`/`[render-sig]` probes to confirm `ids=`/flood is stable while
standing still.
`dotnet build` + `dotnet test` green before the visual gate.
---
## 6. Scope / non-goals
- **In scope:** `PortalVisibilityBuilder` enqueue logic: enqueue-once on first discovery; remove the
`MaxReprocessPerCell` re-enqueue, `popCounts`, and the per-pop cap; union-in-place + draw-list
re-position on growth (port retail `AddToCell`/`FixCellList`); the new + existing tests.
- **Non-goals (explicitly deferred):**
  - **No overlap predicate / no added robustness:** this is a verbatim retail port, not a new
    membership rule. `EyeInsidePortalOpening` (line 235) is untouched.
  - **No clip-math rewrite** (`ProjectToClip`/`ClipToRegion` stay).
  - **No camera / movement / interpolation / physics changes** in this step.
- **Contingency (next retail-faithful step, only if a residual flap survives the visual gate):**
bit-stabilize the viewpoint at rest. The live `[pv-input]` probe shows the player `RenderPosition`
carries ~1-8 µm float noise at rest (e.g. Z `94.000000 ↔ 94.000008`), which retail's authoritative
local position does not. If enqueue-once leaves a residual flicker (the deep portal's *first* clip is
knife-edge under that jitter), trace the jitter to its source (interpolation residual vs physics
contact-settling) and make the local-player viewpoint bit-stable at rest, matching retail. Scoped as
a separate step because it touches the movement/physics path; do it only if measured necessary.
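
One arithmetic observation may help when tracing the jitter. Assuming positions are stored as 32-bit floats in metres (an assumption, not something the probes confirm), the spacing between adjacent representable values near 94 is about 7.6 µm, so the `94.000000 ↔ 94.000008` example is the size of a single-step change in the last bit. The helper below is a hypothetical sketch (`FloatSpacing.UlpMicrometres` is not project code) that computes that spacing with `MathF.BitIncrement`.

```csharp
using System;

public static class FloatSpacing
{
    // Gap between `value` and the next representable float above it,
    // reported in micrometres (assuming `value` is in metres).
    public static double UlpMicrometres(float value)
        => ((double)MathF.BitIncrement(value) - value) * 1e6;
}
```

`FloatSpacing.UlpMicrometres(94f)` is approximately 7.63, and `FloatSpacing.UlpMicrometres(13.2f)` is approximately 0.95, which brackets the observed ~1-8 µm range. This does not identify the source of the jitter; it only suggests the noise is at last-bit scale, so a higher-precision trace should compare raw bit patterns rather than formatted decimals.
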
---
## 7. Apparatus (diagnostic probes added this session)
- **Keep:** `PortalVisibilityBuilderTests.Build_IsDeterministic_*` (regression value);
`tools/A8CellAudit` `gfxobj` dump mode (reusable).
- **Strip after the fix is visually verified:** the `[pv-input]` probe + `RenderingDiagnostics.ProbePvInputEnabled`
(GameWindow.cs / RenderingDiagnostics.cs), the `outRoot=`/`bshell=` fields added to `[render-sig]`,
and `launch-bshell-probe.ps1` / `launch-pvinput.ps1`. All env-var-gated and inert when off; safe to
leave until the visual gate passes, then remove.
---
## 8. References
- Diagnosis evidence + refutation: this session's `[render-sig]`/`[pv-input]` captures (cottage
threshold), the `Build_IsDeterministic` test, the GfxObj `0x01000A2B` render-geometry dump.
- Retail decomp: `PView::ConstructView` `:433750`/`:433827`, `PView::GetClip` `:432344`,
`ACRender::polyClipFinish` `:702749` (`docs/research/named-retail/acclient_2013_pseudo_c.txt`).
- Superseded: `docs/research/2026-06-07-cutover-flip-render-residuals-diagnosis-handoff.md` (wrong on
see-through / EnvCell-walls / outdoor-node; see §2.4).
- Memory to correct: `project_indoor_flap_rootcause`, `reference_render_pipeline_state`.