docs: Phase A8 — session 2 handoff (pool fix shipped + 4 partial fixes + residuals)
After 5 visual gates, the session shipped 5 commits closing real bugs (pool aliasing was the catastrophic root cause), but residual symptoms (transparent floor, texture warping, flickering, distortion) didn't yield to surgical fixes. Per the systematic-debugging skill's >=3-failures rule, stop and capture state.

Doc covers:
- Pool aliasing root cause + fix (the big win; closes session 1's visual chaos).
- Sky-when-building, LiveDynamic, Landblock→None: all real bug closures.
- Apparatus state (GL state probe + per-cell audit + pool diagnostics).
- Three theories for the residual issues (FrontFace=CW global match to WB / per-poly Stippling audit / WB side-by-side render).
- Pickup prompt for next session with ranked options.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent `d5deeb3314`, commit `e415bb3863`

1 changed file with 218 additions and 0 deletions
`docs/research/2026-05-28-a8-session-2-shipped-and-handoff.md`
# Phase A8 — Session 2: pool fix shipped, 4 more fixes shipped, residual visuals remain (2026-05-28 PM)
## TL;DR for next session

The session-1 handoff said "BUILD APPARATUS, NOT MORE SPECULATIVE FIXES." I
built apparatus (per-step GL state probe + per-cell mesh audit + pool
diagnostics) AND, before the apparatus was used, audited
`EnvCellRenderer.cs` line by line against WB source. The audit found **two
high-confidence bugs** (pool aliasing) in 30 minutes; these were the
root cause of the post-Wave-5 catastrophic visual chaos. The pool fix shipped
(`9559726`) and the visual went from "thin black diagonal sliver, GPU
100%, 10 FPS, can't see anything" to "walls + objects + sky render
cleanly, FPS normal."

Counting the pool fix, five targeted fixes shipped across visual gates #1-#5.
The first four closed real bugs. The fifth (the cull-restore revert) was based
on a hypothesis that the [draworder] probe data invalidated: gate #5 showed
cull state was already off at Step 3 before `EnvCellRenderer.Render` ran, so
the propagation theory didn't apply.

**Per the systematic-debugging skill's `≥3-failures → question architecture`
rule, I stopped and wrote this handoff rather than shipping a 6th speculative
fix.** The remaining symptoms (transparent floor, texture warping,
distortion) point to architectural-level issues that need a different
investigation approach.
||||
## Visual progress chronicle

| Gate | Symptoms reported | Cause (if known) |
|------|-------------------|------------------|
| Pre-session (from session-1 handoff) | "Thin black diagonal sliver, GPU 100%, 10 FPS, can't see anything" | Pool aliasing (cleared by session-2 commit `9559726`) |
| Gate #1 (`375f9a7`, before the sky fix) | Walls + objects render, no flicker, FPS normal. No sky through windows. Character + doors missing. Floor missing. Purple tint on walls. | Pool fixed (huge win). LiveDynamic/sky/cull not yet addressed. |
| Gate #2 (sky fix + audit probe) | Sky visible through windows ✓. Character + doors still missing. Floor still missing. Purple tint persists. | Sky fix worked. Audit dumped per-cell render data. |
| Gate #3 (LiveDynamic + cull-disable A/B) | Character + doors visible ✓. Floor sometimes visible. See-through head (cull-off side effect). | LiveDynamic fix worked. Cull-disable proved cull was hiding the floor. |
| Gate #4 (Landblock→None + cull-restore) | "BROKEN textures, floor is now transparent"; sky visible through the floor | Cull-restore at exit propagated cull-back to the dispatcher's IndoorPass, culling the cottage shell's floor poly. |
| Gate #5 (revert cull-restore) | "No change at all, textures warped, missing textures, floors transparent and flickering" | Revert didn't help: the [draworder] probe shows cull was already off at Step 3 entry, so removing my cull-restore at exit doesn't change the inherited state. |
## What's shipped this session

| SHA | Description | Status |
|-----|-------------|--------|
| `9559726` | Pool aliasing root cause fix (Clear + PostPreparePoolIndex + nested-Setup detection) + 4 regression tests + audit findings doc | **KEPT: closes the post-Wave-5 chaos** |
| `375f9a7` | Full GL state probe + pool diagnostics extension (option-1 apparatus) | **KEPT: apparatus** |
| `772d69c` | Sky-when-cameraInsideBuilding fix + per-cell audit probe | **KEPT: sky through windows works** |
| `b19f3c1` | LiveDynamic dispatcher call in indoor branch + `ACDREAM_A8_DISABLE_CULL` A/B gate | **KEPT: chars + doors visible inside** |
| `0940d79` | Cell-mesh Landblock CullMode → None + cull-state restore at exit | **PARTIALLY KEPT: Landblock→None is good; cull-restore was wrong (reverted in `d5deeb3`)** |
| `d5deeb3` | Revert cull-restore at EnvCellRenderer exit | **KEPT: leaves cull-off propagating** |
## What's still wrong (visual gate #5 state)

User-reported symptoms with the kill-switch ON (`ACDREAM_A8_INDOOR_BRANCH=1`):

1. **Floor transparent**: sky color is visible where the floor should be. The
   cell mesh has a Landblock→None override that should render cell polys
   double-sided, but the floor poly either (a) isn't in the upload, (b)
   has wrong winding/orientation, or (c) is rendered but fails the depth test
   or is discarded by the alpha test.

2. **Texture warping**: vague but visible in screenshots. Some surfaces
   show the wrong texture, or the texture appears stretched/distorted.

3. **Flickering**: surfaces alternate between visible and invisible across
   frames. Could be Z-fighting (cell mesh vs cottage shell at the same depth),
   alpha-test threshold instability, or an animated camera causing
   per-frame frustum-test results to differ.

4. **General distortion**: the overall scene "looks broken." Possibly the
   purple tint on lighting (mentioned in gates #1-#3, not explicitly in #5).
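
To put a number on the Z-fighting hypothesis in item 3, here is a minimal C# sketch of the standard depth-buffer resolution estimate. It assumes a conventional perspective projection with `glDepthRange(0, 1)` and a fixed-point depth buffer; the near/far planes (0.1 and 1000) and the 24-bit depth format are placeholder assumptions, not acdream's actual projection. Surfaces separated by less than one step at a given distance (including nominally coplanar ones, such as a cell poly drawn over a shell poly) can resolve differently as the camera moves.

```csharp
using System;

// Eye-space distance covered by one step of a `bits`-deep fixed-point depth buffer
// at eye distance z. Window depth under a standard perspective projection is
// d(z) = f/(f-n) * (1 - n/z), so dd/dz = f*n / ((f-n) * z^2) and
// dz ≈ 2^-bits / (dd/dz) = z^2 * (f - n) / (f * n * 2^bits).
double DepthStep(double z, double near, double far, int bits) =>
    z * z * (far - near) / (far * near * Math.Pow(2, bits));

// Placeholder projection (assumption): near = 0.1, far = 1000, 24-bit depth.
foreach (double z in new[] { 1.0, 5.0, 20.0 })
    Console.WriteLine($"z = {z,4}: one depth step = {DepthStep(z, 0.1, 1000.0, 24):E2} units");
```

The step grows with z², so fighting that appears only at a distance points to precision, while fighting that persists up close points to geometry that is genuinely coplanar.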
## Apparatus state

These probes are wired and active when their env vars are set:

- `ACDREAM_PROBE_VIS=1` emits `[draworder]` (per-step GL state),
  `[stencil]` (per stencil mark/punch), `[buildings]` (camera-building
  list), and `[envcells]` (cells + tris + pool stats).
- `ACDREAM_A8_AUDIT=1` dumps render data once per (cellId, gfxObjId) pair:
  batch count, total IndexCount, CullModes encountered,
  IsTransparent + IsAdditive flags, and the `BindlessTextureHandle == 0` count.
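
The `ACDREAM_A8_AUDIT=1` behavior (one dump per (cellId, gfxObjId) pair, however many frames draw that pair) can be sketched as an env-var check plus a seen-set. This is a minimal C# illustration, not the probe's actual code: `ShouldDumpAudit` and `auditedPairs` are hypothetical names, and the real probe may key or cache differently.

```csharp
using System;
using System.Collections.Generic;

// Read once; the probe stays off unless the env var is exactly "1" (assumed convention).
bool auditEnabled = Environment.GetEnvironmentVariable("ACDREAM_A8_AUDIT") == "1";

// Pairs that have already produced an [a8-audit] line.
var auditedPairs = new HashSet<(uint CellId, ulong GfxObjId)>();

// True only the first time a (cellId, gfxObjId) pair is seen while auditing is on.
// HashSet.Add returns false for a pair that is already present; when auditing is
// off, && short-circuits and nothing is recorded.
bool ShouldDumpAudit(uint cellId, ulong gfxObjId) =>
    auditEnabled && auditedPairs.Add((cellId, gfxObjId));

// Hypothetical per-frame call site: only the first frame prints.
for (int frame = 0; frame < 3; frame++)
    if (ShouldDumpAudit(0xA9B4013F, 0x7F852B220B93AD))
        Console.WriteLine($"[a8-audit] cell=0xA9B4013F frame={frame}");
```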
Sample audit data captured in gate #2 (`a8-visual-gate-2.log`):

```
[a8-audit] cell=0xA9B4013F gfx=0x7F852B220B93AD instances=1 isSetup=False batches=4 totalIdx=144 cull=[Landblock] translucent=0 additive=0 zeroHandle=0
```

Every cell mesh batch has CullMode=Landblock (uniform). Render data
loads correctly (no nulls, no zero handles).
Sample [draworder] data captured in gate #5 (`a8-visual-gate-5.log`):

```
[draworder] frame=155 step=3 stencil=off depthFn=0x201 depthMask=True cull=off(back) blend=0x302/0x303 sFunc=0x207:1:0xFF sOp=0x1E00/0x1E00/0x1E01 sMask=0x1 cMask=(RGB-) vao=0 prog=6
```

Cull is OFF at Step 3 entry: Step 1's `gl.Disable(EnableCap.CullFace)`
already disabled it, so my cull-restore-at-exit revert had no effect on the
incoming state.
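
The hex fields in `[draworder]` lines are raw OpenGL enum values. A small C# helper that rewrites the known ones to their GL names makes captured logs faster to scan. The table covers only the values in the gate-#5 sample plus the remaining depth functions, and `DecodeDrawOrder` is a hypothetical name. Decoded, the sample reads `depthFn=GL_LESS`, `blend=GL_SRC_ALPHA/GL_ONE_MINUS_SRC_ALPHA`, `sFunc=GL_ALWAYS:1:0xFF`, and `sOp=GL_KEEP/GL_KEEP/GL_REPLACE`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

// OpenGL enum values (from the core GL headers) that show up in [draworder] lines.
var glNames = new Dictionary<long, string>
{
    [0x0200] = "GL_NEVER",     [0x0201] = "GL_LESS",    [0x0202] = "GL_EQUAL",
    [0x0203] = "GL_LEQUAL",    [0x0204] = "GL_GREATER", [0x0205] = "GL_NOTEQUAL",
    [0x0206] = "GL_GEQUAL",    [0x0207] = "GL_ALWAYS",
    [0x0302] = "GL_SRC_ALPHA", [0x0303] = "GL_ONE_MINUS_SRC_ALPHA",
    [0x1E00] = "GL_KEEP",      [0x1E01] = "GL_REPLACE",
};

// Replaces every 0x... token whose value is a known GL enum; masks such as 0xFF
// and 0x1 are not in the table and are left untouched.
string DecodeDrawOrder(string line) =>
    Regex.Replace(line, "0x[0-9A-Fa-f]+", m =>
        long.TryParse(m.Value.Substring(2), NumberStyles.HexNumber, CultureInfo.InvariantCulture, out long v)
            && glNames.TryGetValue(v, out var name)
            ? name
            : m.Value);

const string sample = "[draworder] frame=155 step=3 stencil=off depthFn=0x201 depthMask=True cull=off(back) blend=0x302/0x303 sFunc=0x207:1:0xFF sOp=0x1E00/0x1E00/0x1E01 sMask=0x1 cMask=(RGB-) vao=0 prog=6";
Console.WriteLine(DecodeDrawOrder(sample));
```

`GL_LESS` rejects a fragment whose depth equals the stored value, which is worth keeping in mind for the coplanar-surface flicker hypothesis.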
## Root-cause analysis: why the speculative fixes can't close it

### Theory A: AC's polygon winding requires `glFrontFace(CW)`

WB sets `glFrontFace(GLEnum.CW)` globally at
[GameScene.cs:843](references/WorldBuilder/Chorizite.OpenGLSDLBackend/GameScene.cs:843).
Our `WbDrawDispatcher.cs:1056` sets `glFrontFace(CCW)` in the transparent
pass, with a comment claiming "our fan triangulation emits pos-side polys
as (0, i, i+1) — CCW." But the actual triangulation in
`BuildCellStructPolygonIndices` ([ObjectMeshManager.cs:1518-1586](src/AcDream.App/Rendering/Wb/ObjectMeshManager.cs:1518))
emits `(i, i-1, 0)`, the REVERSE of `(0, i, i+1)`. The comment is wrong
about our actual winding.

If AC's polys are wound CCW as seen from their PosSurface side (the "front"
side in retail convention), our triangulation produces CW-from-PosSurface
triangles. WB's `FrontFace=CW` makes CW the front face, so cull-back removes
the back side correctly. Our `FrontFace=CCW` makes CCW the front face, so
cull-back removes the WRONG side, hiding polys whose PosSurface is
camera-facing.
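
The winding argument can be checked numerically. This C# sketch triangulates a counter-clockwise unit square (a hypothetical stand-in for a cell polygon viewed from its PosSurface side, so object-space and on-screen winding agree) with both index patterns and prints each triangle's signed area: positive means CCW, negative means CW. The `(0, i, i+1)` fan yields positive areas; the `(i, i-1, 0)` fan yields the same triangles with negative areas, so under `glCullFace(GL_BACK)` the second fan survives only with `glFrontFace(GL_CW)`. The helper names are hypothetical; only the two index patterns come from the analysis above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cell polygon: a unit square listed counter-clockwise.
var poly = new (double X, double Y)[] { (0, 0), (1, 0), (1, 1), (0, 1) };

// Fan the stale dispatcher comment describes: (0, i, i+1).
List<(int A, int B, int C)> FanForward(int n)
{
    var tris = new List<(int A, int B, int C)>();
    for (int i = 1; i + 1 < n; i++) tris.Add((0, i, i + 1));
    return tris;
}

// Fan BuildCellStructPolygonIndices emits, per the audit: (i, i-1, 0).
List<(int A, int B, int C)> FanReversed(int n)
{
    var tris = new List<(int A, int B, int C)>();
    for (int i = 2; i < n; i++) tris.Add((i, i - 1, 0));
    return tris;
}

// Twice the signed area of a triangle: > 0 is counter-clockwise, < 0 is clockwise.
double SignedArea2((double X, double Y)[] v, (int A, int B, int C) t) =>
    (v[t.B].X - v[t.A].X) * (v[t.C].Y - v[t.A].Y)
  - (v[t.C].X - v[t.A].X) * (v[t.B].Y - v[t.A].Y);

// With glCullFace(GL_BACK) enabled, a triangle is kept only if its on-screen
// winding matches glFrontFace.
bool SurvivesBackCull(double area2, bool frontFaceIsCcw) =>
    frontFaceIsCcw ? area2 > 0 : area2 < 0;

foreach (var t in FanForward(poly.Length))
    Console.WriteLine($"(0, i, i+1) {t}: area2 = {SignedArea2(poly, t)}, kept with CCW front = {SurvivesBackCull(SignedArea2(poly, t), true)}");
foreach (var t in FanReversed(poly.Length))
    Console.WriteLine($"(i, i-1, 0) {t}: area2 = {SignedArea2(poly, t)}, kept with CW front = {SurvivesBackCull(SignedArea2(poly, t), false)}");
```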
**Verification approach**: change `FrontFace` to CW globally (matching
WB at GameScene.cs:843) and audit every consumer (sky, particles, UI,
translucent crystal mesh) for impact. The dispatcher's CCW setting at
line 1056 carries a comment about a Phase 9.2 fix (lifestone crystal
see-through hollow interior); that fix may have papered over the
underlying FrontFace mismatch instead of fixing it properly.

**Risk**: changing FrontFace globally might reintroduce the
hollow-interior bug for closed-shell translucent meshes. This needs a
careful audit and possibly a per-renderer FrontFace push/pop.
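
One shape the per-renderer FrontFace push/pop could take, sketched in C# under stated assumptions: the frame default becomes CW (matching WB), and a renderer that still needs CCW (for example the lifestone crystal pass) switches locally and restores on exit, even if drawing throws. `currentFrontFace` stands in for the real `gl.FrontFace` call so the sketch runs without a GL context; `WithFrontFace` and the stack are hypothetical, not existing acdream APIs. The two constants are the real `GL_CW`/`GL_CCW` enum values.

```csharp
using System;
using System.Collections.Generic;

const int GL_CW = 0x0900, GL_CCW = 0x0901;

// Hypothetical CPU-side mirror of the GL front-face state; a real version would
// also forward each change to gl.FrontFace.
int currentFrontFace = GL_CW;              // assumed global default, matching WB's GameScene
var frontFaceStack = new Stack<int>();

void PushFrontFace(int mode) { frontFaceStack.Push(currentFrontFace); currentFrontFace = mode; }
void PopFrontFace() => currentFrontFace = frontFaceStack.Pop();

// Runs one renderer's draws under its own winding convention and always
// restores the previous state, even if a draw throws.
void WithFrontFace(int mode, Action draw)
{
    PushFrontFace(mode);
    try { draw(); }
    finally { PopFrontFace(); }
}

// Example: the translucent crystal pass keeps CCW locally; the rest of the frame stays CW.
int seenInsideCrystalPass = 0;
WithFrontFace(GL_CCW, () => seenInsideCrystalPass = currentFrontFace);
Console.WriteLine($"inside = 0x{seenInsideCrystalPass:X}, after = 0x{currentFrontFace:X}");
```

Tracking the value CPU-side avoids a `glGet` round trip per renderer; the trade-off is that any direct FrontFace call that bypasses the stack desynchronizes the tracked value.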
### Theory B: Cell polys' floor is filtered out at upload time

`PrepareCellStructMeshData` ([ObjectMeshManager.cs:1295-1306](src/AcDream.App/Rendering/Wb/ObjectMeshManager.cs:1295)):

```csharp
// PosSurface side: uploaded unless the poly is flagged NoPos.
if (!poly.Stippling.HasFlag(StipplingType.NoPos))
    AddSurfaceToBatch(poly, poly.PosSurface, false);

// NegSurface side: uploaded for explicit Negative/Both, or for Clockwise-sided
// polys that are not flagged NoNeg.
bool hasNeg = poly.Stippling.HasFlag(StipplingType.Negative) ||
              poly.Stippling.HasFlag(StipplingType.Both) ||
              (!poly.Stippling.HasFlag(StipplingType.NoNeg) && poly.SidesType == CullMode.Clockwise);
if (hasNeg)
    AddSurfaceToBatch(poly, poly.NegSurface, true);
```

For a floor poly with `Stippling=NoPos`, `SidesType=Landblock`, and no
`Negative`/`Both` flag, NEITHER side is uploaded, so the poly never renders.
Plausible if AC encodes floor polys this way.

**Verification approach**: dump per-poly Stippling, SidesType, PosSurface,
and NegSurface values for cells. Add this to the audit probe.
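
A minimal C# sketch of that audit extension: it mirrors the two conditionals quoted above and flags polys that reach neither batch. The bit values for the Stippling flags and the SidesType constants are placeholder assumptions (check the dat library's real enums before reusing); only the bit tests matter, and `Has` reproduces `Enum.HasFlag` semantics (every bit of the flag must be set). The `[a8-poly]` tag, helper names, and example poly/surface IDs are hypothetical.

```csharp
using System;

// Placeholder values (assumption), not the real StipplingType / CullMode enums.
const int Positive = 0x1, Negative = 0x2, Both = Positive | Negative, NoPos = 0x4, NoNeg = 0x8;
const int SidesClockwise = 1, SidesLandblock = 3;

// Same semantics as Enum.HasFlag: every bit of `flag` must be set.
bool Has(int stippling, int flag) => (stippling & flag) == flag;

// Mirrors the two conditionals in PrepareCellStructMeshData.
(bool Pos, bool Neg) UploadedSides(int stippling, int sidesType)
{
    bool pos = !Has(stippling, NoPos);
    bool neg = Has(stippling, Negative) || Has(stippling, Both)
            || (!Has(stippling, NoNeg) && sidesType == SidesClockwise);
    return (pos, neg);
}

// One audit line per polygon; "NO-UPLOAD" marks polys that reach neither batch.
string AuditPolyLine(uint cellId, int polyId, int stippling, int sidesType, uint posSurface, uint negSurface)
{
    var (pos, neg) = UploadedSides(stippling, sidesType);
    string verdict = pos || neg ? $"pos={pos} neg={neg}" : "NO-UPLOAD";
    return $"[a8-poly] cell=0x{cellId:X8} poly={polyId} stippling=0x{stippling:X} sides={sidesType} " +
           $"posSurf=0x{posSurface:X} negSurf=0x{negSurface:X} {verdict}";
}

// Made-up poly and surface IDs for the NoPos + Landblock case described above.
Console.WriteLine(AuditPolyLine(0xA9B4013F, 7, NoPos, SidesLandblock, 0x08000010, 0));
```

Emitting one such line per poly for a few cells and grepping for `NO-UPLOAD` would confirm or rule out Theory B directly.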
### Theory C: cottage shell has no floor poly + cell mesh's floor is broken

In retail AC, the cottage's "shell" GfxObj (from `info.Buildings[i].ModelId`)
contains walls + roof + door frame. The floor is provided entirely by the
cell's CellStruct PosSurface polygons. If our cell mesh's floor poly is
broken (winding, missing, wrong texture), nothing else fills in.

**Verification approach**: run WB's executable against the same dat,
take a screenshot from the same camera position inside the same cottage,
and diff it against our screenshot. This identifies whether the floor comes
from the cell mesh or from somewhere else.
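
For the diff step, a minimal C# sketch, assuming both screenshots are already decoded to tightly packed RGBA8 buffers of the same size (PNG decoding is left out so the sketch needs no imaging library). It reports the fraction of pixels in a row band whose largest channel difference exceeds a tolerance; restricting the band to the rows where the floor should be keeps unrelated differences out of the number. `MismatchFraction` and its parameters are hypothetical.

```csharp
using System;

// Fraction of pixels in rows [rowStart, rowEnd) whose largest per-channel
// difference exceeds `tolerance`. Both buffers are RGBA8, row-major, width*height*4 bytes.
double MismatchFraction(byte[] ours, byte[] wb, int width, int height,
                        int rowStart, int rowEnd, int tolerance)
{
    if (ours.Length != width * height * 4 || wb.Length != ours.Length)
        throw new ArgumentException("both buffers must be width * height * 4 bytes");
    if (rowStart < 0 || rowEnd > height || rowStart >= rowEnd)
        throw new ArgumentOutOfRangeException(nameof(rowStart), "empty or out-of-range row band");

    int mismatched = 0;
    for (int y = rowStart; y < rowEnd; y++)
        for (int x = 0; x < width; x++)
        {
            int i = (y * width + x) * 4;
            int maxDiff = 0;
            for (int c = 0; c < 4; c++)
                maxDiff = Math.Max(maxDiff, Math.Abs(ours[i + c] - wb[i + c]));
            if (maxDiff > tolerance) mismatched++;
        }
    return (double)mismatched / ((rowEnd - rowStart) * width);
}
```

For example, `MismatchFraction(ours, wb, w, h, 2 * h / 3, h, 8)` compares the bottom third of each frame. If one capture comes from `glReadPixels` (bottom-up rows) and the other from a top-down image file, flip one of them first.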
## Process retrospective: what worked this session

1. **Audit BEFORE apparatus**: a line-by-line read of EnvCellRenderer vs
   WB source found the pool bug in 30 min. The session-1 handoff warned that
   subagent-written code had never been audited; that was the right warning.

2. **Apparatus shipped alongside the fix**: the GL state probe + audit dumps
   captured concrete data that informed subsequent fixes. Gates #1-#5
   all relied on probe data, not visuals alone.

3. **Stopping after fix #5**: per the systematic-debugging skill. The
   alternative (a 6th speculative attempt) would have either burned more
   user testing cycles or shipped another band-aid.
## What this session did NOT do (in scope for next session)

- Match WB's `glFrontFace(CW)` globally + audit consumers.
- Inspect per-poly Stippling/SidesType for cell floors.
- WB renderer side-by-side comparison.
- Investigate purple tint on walls (lighting / scene UBO).
- Investigate texture warping (UV / sampler issues).
- Investigate flickering (Z-fighting / alpha threshold).
- Remove the `ACDREAM_A8_INDOOR_BRANCH` kill-switch (still needed; default
  OFF restores pre-A8 behavior).
## Pickup prompt for next session

> Phase A8 indoor branch is partially working as of `d5deeb3`. Pool
> aliasing root cause is fixed. Sky-through-windows, LiveDynamic chars,
> and cell-mesh double-sided rendering all work. But the floor is transparent
> (sky visible through it), textures warp, and the scene has residual
> distortion + flickering.
>
> Read this doc end-to-end. Then pick ONE of the three theories above
> and verify it before any code change:
>
> 1. **Theory A (FrontFace=CW)**: highest leverage. WB sets CW globally;
>    we set CCW. Audit the translucent crystal + sky shaders' winding
>    assumptions first. If safe, set FrontFace=CW globally and visual-gate.
>
> 2. **Theory B (cell-poly filtered)**: extend the existing
>    `ACDREAM_A8_AUDIT=1` probe to dump per-poly Stippling, SidesType,
>    and PosSurface/NegSurface for a few cells. Live-capture data; check
>    whether any floor poly is "no upload" per the conditional.
>
> 3. **Theory C (WB side-by-side)**: build WB's executable from
>    `references/WorldBuilder/`, point it at the same dat dir, and screenshot
>    the same cottage interior. Compare. This confirms or rules out our cell
>    mesh upload as the source of the bug.
>
> The kill-switch (`ACDREAM_A8_INDOOR_BRANCH=1`) remains the way to
> reproduce the indoor branch. Pre-A8 behavior (kill-switch unset) is
> still the default and unchanged.
>
> User authorization: "use superpowers but DONT stop me for questions,
> be perfect, no bandaids." The "no bandaids" rule is why this session
> stopped at fix #5 and wrote the handoff instead of attempting fix #6.
> Carry that discipline forward.