docs: Phase A8 — session 2 handoff (pool fix shipped + 4 partial fixes + residuals)
After 5 visual gates, the session shipped 5 commits closing real bugs (pool aliasing was the catastrophic root cause), but residual symptoms (transparent floor, texture warping, flickering, distortion) didn't yield to surgical fixes. Per systematic-debugging skill's >=3-failures rule, stop and capture state.

Doc covers:
- Pool aliasing root cause + fix (the big win; closes session-1's visual chaos).
- Sky-when-building, LiveDynamic, Landblock→None: all real bug closures.
- Apparatus state (GL state probe + per-cell audit + pool diagnostics).
- Three theories for the residual issues (FrontFace=CW global match to WB / per-poly Stippling audit / WB side-by-side render).
- Pickup prompt for next session with ranked options.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

parent d5deeb3314
commit e415bb3863
1 changed file with 218 additions and 0 deletions

docs/research/2026-05-28-a8-session-2-shipped-and-handoff.md (218 lines added)
@@ -0,0 +1,218 @@
# Phase A8 — Session 2: pool fix shipped, 4 more fixes shipped, residual visuals remain (2026-05-28 PM)

## TL;DR for next session

The session-1 handoff said "BUILD APPARATUS, NOT MORE SPECULATIVE FIXES." I
built apparatus (per-step GL state probe + per-cell mesh audit + pool
diagnostics) AND, before the apparatus was used, line-by-line audited
`EnvCellRenderer.cs` against WB source. The audit found **two
high-confidence bugs** (pool aliasing) in 30 minutes; these were the
root cause of the post-Wave-5 catastrophic visual chaos. Pool fix shipped
(`9559726`) and the visual went from "thin black diagonal sliver, GPU
100%, 10 FPS, can't see anything" to "walls + objects + sky render
cleanly, FPS normal."

Five more targeted fixes shipped across visual gates #1-#5. The first
four closed real bugs. The fifth (cull-restore revert) was based on a
hypothesis the `[draworder]` probe data invalidated: gate #5 showed cull
state was already off at Step 3 before `EnvCellRenderer.Render` ran, so
the propagation theory didn't apply.

**Per systematic-debugging skill's `≥3-failures → question architecture`
rule, I stopped and wrote this handoff rather than ship a 6th speculative
fix.** The remaining symptoms (transparent floor, texture warping,
flickering, distortion) point to architectural-level issues that need a
different investigation approach.

## Visual progress chronicle

| Gate | Symptoms reported | Cause if known |
|------|-------------------|----------------|
| Pre-session (from session-1 handoff) | "Thin black diagonal sliver, GPU 100%, 10 FPS, can't see anything" | Pool aliasing (cleared by session-2 commit `9559726`) |
| Gate #1 (`375f9a7`, sky fix not yet applied) | Walls + objects render, no flicker, FPS normal. No sky through windows. Char + doors missing. Floor missing. Purple tint on walls. | Pool fixed (huge win). LiveDynamic/sky/cull not yet addressed. |
| Gate #2 (sky fix + audit probe) | Sky visible through windows ✓. Char + doors still missing. Floor still missing. Purple tint still present. | Sky fix worked. Audit dumped per-cell render data. |
| Gate #3 (LiveDynamic + cull-disable A/B) | Char + doors visible ✓. Floor sometimes visible. See-through-head (cull-off side effect). | LiveDynamic fix worked. Cull-disable proved cull was hiding floor. |
| Gate #4 (Landblock→None + cull-restore) | "BROKEN textures, floor is now transparent"; sky visible through floor | Hypothesis at the time (invalidated at gate #5): cull-restore at exit propagated cull-back to dispatcher's IndoorPass, culling cottage shell's floor poly. |
| Gate #5 (revert cull-restore) | "No change at all, textures warped, missing textures, floors transparent and flickering" | Revert didn't help: `[draworder]` probe shows cull was already off at Step 3 entry, so removing my cull-restore at exit doesn't change inherited state. |

## What's shipped this session

| SHA | Description | Status |
|-----|-------------|--------|
| `9559726` | Pool aliasing root cause fix (Clear + PostPreparePoolIndex + nested-Setup detection) + 4 regression tests + audit findings doc | **KEPT: closes the post-Wave-5 chaos** |
| `375f9a7` | Full GL state probe + pool diagnostics extension (option-1 apparatus) | **KEPT: apparatus** |
| `772d69c` | Sky-when-cameraInsideBuilding fix + per-cell audit probe | **KEPT: sky through windows works** |
| `b19f3c1` | LiveDynamic dispatcher call in indoor branch + `ACDREAM_A8_DISABLE_CULL` A/B gate | **KEPT: chars + doors visible inside** |
| `0940d79` | Cell-mesh Landblock CullMode → None + cull-state restore at exit | **PARTIALLY KEPT: Landblock→None is good; cull-restore was wrong (reverted in `d5deeb3`)** |
| `d5deeb3` | Revert cull-restore at EnvCellRenderer exit | **KEPT: leaves cull-off propagating** |

## What's still wrong (visual gate #5 state)

User-reported symptoms with kill-switch ON (`ACDREAM_A8_INDOOR_BRANCH=1`):

1. **Floor transparent**: sky color visible where floor should be. Cell
   mesh has Landblock→None override that should render cell polys
   double-sided, but the floor poly either (a) isn't in the upload, (b)
   has wrong winding/orientation, or (c) is being rendered but z-fails or
   alpha-discards.

2. **Texture warping**: vague but visible in screenshots. Some surfaces
   show the wrong texture, or the texture appears stretched/distorted.

3. **Flickering**: surfaces alternate between visible/invisible across
   frames. Could be Z-fighting (cell mesh vs cottage shell at same depth),
   alpha-test threshold instability, or animated camera causing
   per-frame frustum-test results to differ.

4. **General distortion**: overall scene "looks broken." Possibly purple
   tint on lighting (mentioned in gates #1-#3, not explicitly in #5).

## Apparatus state

These probes are wired in and run when their env vars are set:

- `ACDREAM_PROBE_VIS=1`: emits `[draworder]` (per-step GL state),
  `[stencil]` (per stencil mark/punch), `[buildings]` (camera-building
  list), `[envcells]` (cells + tris + pool stats).
- `ACDREAM_A8_AUDIT=1`: one-shot dump of render data per
  (cellId, gfxObjId) pair: batch count, total IndexCount, CullModes
  encountered, IsTransparent + IsAdditive flags, and the count of
  batches with BindlessTextureHandle == 0.

Sample audit data captured in gate #2 (`a8-visual-gate-2.log`):

```
[a8-audit] cell=0xA9B4013F gfx=0x7F852B220B93AD instances=1 isSetup=False batches=4 totalIdx=144 cull=[Landblock] translucent=0 additive=0 zeroHandle=0
```

Every cell mesh batch has CullMode=Landblock (uniform). Render data
loads correctly (no nulls, no zero handles).

Sample `[draworder]` data captured in gate #5 (`a8-visual-gate-5.log`):

```
[draworder] frame=155 step=3 stencil=off depthFn=0x201 depthMask=True cull=off(back) blend=0x302/0x303 sFunc=0x207:1:0xFF sOp=0x1E00/0x1E00/0x1E01 sMask=0x1 cMask=(RGB-) vao=0 prog=6
```

Cull is OFF at Step 3 entry (Step 1's `gl.Disable(EnableCap.CullFace)`
already disabled it; my cull-restore-at-exit revert had no effect on
incoming state).
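Reading the hex values in a `[draworder]` line by eye is error-prone. The following is a minimal sketch of a decoder, assuming .NET 6 or later. `ProbeLine` and its methods are hypothetical names, not existing code; the table only covers the standard OpenGL enum values that appear in the sample line above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of the codebase: splits a probe line into
// key=value fields and names the raw GL enum values it contains.
public static class ProbeLine
{
    // Standard OpenGL enum values for the constants seen in the sample line.
    private static readonly Dictionary<int, string> GlNames = new()
    {
        [0x0201] = "GL_LESS",
        [0x0207] = "GL_ALWAYS",
        [0x0302] = "GL_SRC_ALPHA",
        [0x0303] = "GL_ONE_MINUS_SRC_ALPHA",
        [0x1E00] = "GL_KEEP",
        [0x1E01] = "GL_REPLACE",
    };

    // "[draworder] frame=155 step=3 ..." -> { "frame": "155", "step": "3", ... }
    // Tokens without '=' (such as the "[draworder]" tag) are skipped.
    public static Dictionary<string, string> Parse(string line)
    {
        var fields = new Dictionary<string, string>();
        foreach (var token in line.Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = token.IndexOf('=');
            if (eq > 0)
                fields[token[..eq]] = token[(eq + 1)..];
        }
        return fields;
    }

    // "0x302/0x303" -> "GL_SRC_ALPHA/GL_ONE_MINUS_SRC_ALPHA".
    // Parts that are not a known hex enum pass through unchanged.
    public static string Decode(string value)
    {
        var parts = value.Split('/');
        for (int i = 0; i < parts.Length; i++)
        {
            string p = parts[i];
            if (p.StartsWith("0x", StringComparison.OrdinalIgnoreCase) &&
                int.TryParse(p.AsSpan(2), NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture, out int n) &&
                GlNames.TryGetValue(n, out var name))
            {
                parts[i] = name;
            }
        }
        return string.Join("/", parts);
    }
}
```

For the gate-#5 sample, `ProbeLine.Decode(fields["depthFn"])` gives `GL_LESS` and `ProbeLine.Decode(fields["sOp"])` gives `GL_KEEP/GL_KEEP/GL_REPLACE`. Compound values such as `sFunc=0x207:1:0xFF` are returned unchanged by this sketch.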

## Root-cause analysis: why the speculative fixes can't close it

### Theory A: AC's polygon winding requires `glFrontFace(CW)`

WB sets `glFrontFace(GLEnum.CW)` globally at
[GameScene.cs:843](references/WorldBuilder/Chorizite.OpenGLSDLBackend/GameScene.cs:843).
Our `WbDrawDispatcher.cs:1056` sets `glFrontFace(CCW)` in the transparent
pass with a comment claiming "our fan triangulation emits pos-side polys
as (0, i, i+1) — CCW." But the actual triangulation in
`BuildCellStructPolygonIndices` ([ObjectMeshManager.cs:1518-1586](src/AcDream.App/Rendering/Wb/ObjectMeshManager.cs:1518))
emits `(i, i-1, 0)`, the REVERSE of `(0, i, i+1)`. The comment is wrong
about our actual winding.

If AC's polys are wound CCW from their PosSurface side (the "front" side
in retail convention), our triangulation produces CW-from-PosSurface
triangles. WB's `FrontFace=CW` makes CW = front, so cull-back removes
the back side correctly. Our `FrontFace=CCW` makes CCW = front, so
cull-back removes the WRONG side, hiding polys whose PosSurface is
camera-facing.
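The claim that the two index orders have opposite winding can be checked numerically. The sketch below is illustrative only: `FanWinding` is a hypothetical name, and the loop bounds are assumptions rather than a copy of `BuildCellStructPolygonIndices`. It builds both fans for a planar polygon and reports the sign of each triangle's area.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: compares the two fan index orders discussed above.
public static class FanWinding
{
    // Fan order the dispatcher comment describes: (0, i, i+1).
    public static List<(int A, int B, int C)> CommentOrder(int vertexCount)
    {
        var tris = new List<(int A, int B, int C)>();
        for (int i = 1; i < vertexCount - 1; i++)
            tris.Add((0, i, i + 1));
        return tris;
    }

    // Fan order the audit found in the code: (i, i-1, 0).
    public static List<(int A, int B, int C)> ActualOrder(int vertexCount)
    {
        var tris = new List<(int A, int B, int C)>();
        for (int i = 2; i < vertexCount; i++)
            tris.Add((i, i - 1, 0));
        return tris;
    }

    // Twice the signed area of a 2D triangle. Positive means CCW and
    // negative means CW, with x to the right and y up.
    public static double SignedArea2(
        (double X, double Y)[] v, (int A, int B, int C) t)
    {
        var (a, b, c) = (v[t.A], v[t.B], v[t.C]);
        return (b.X - a.X) * (c.Y - a.Y) - (c.X - a.X) * (b.Y - a.Y);
    }
}
```

For a unit square listed counter-clockwise, every triangle from `CommentOrder(4)` has positive signed area and every triangle from `ActualOrder(4)` has negative signed area, which is the mismatch Theory A describes.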

**Verification approach**: change `FrontFace` to CW globally (matching
WB at GameScene.cs:843) and audit every consumer (sky, particles, UI,
translucent crystal mesh) for impact. The dispatcher's CCW set at
line 1056 has a comment about a Phase 9.2 fix (lifestone crystal
see-through-hollow-interior); that fix might have papered over the
underlying FrontFace mismatch instead of fixing it properly.

**Risk**: changing FrontFace globally might re-introduce the
hollow-interior bug for closed-shell translucent meshes. Needs careful
audit and possibly per-renderer FrontFace push/pop.
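One possible shape for the per-renderer push/pop is a small disposable scope. This is a sketch under assumptions, not existing code: `Winding`, `IFrontFaceState`, `FrontFaceScope`, and `FakeFrontFaceState` are all hypothetical, and the interface stands in for whatever wraps the real `glFrontFace` call so the example compiles without a GL binding.

```csharp
using System;

// Hypothetical abstraction over the GL front-face state. A real
// implementation would forward Set() to glFrontFace.
public enum Winding { Ccw, Cw }

public interface IFrontFaceState
{
    Winding Current { get; }
    void Set(Winding winding);
}

// Applies a winding for the lifetime of a using-block and restores the
// previous value on Dispose.
public readonly struct FrontFaceScope : IDisposable
{
    private readonly IFrontFaceState _state;
    private readonly Winding _previous;

    public FrontFaceScope(IFrontFaceState state, Winding winding)
    {
        _state = state;
        _previous = state.Current;
        state.Set(winding);
    }

    public void Dispose() => _state.Set(_previous);
}

// In-memory stand-in that records the value instead of touching GL.
public sealed class FakeFrontFaceState : IFrontFaceState
{
    public Winding Current { get; private set; } = Winding.Cw;
    public void Set(Winding winding) => Current = winding;
}
```

A renderer that still needs CCW (for example a closed-shell translucent mesh) could wrap its draw calls in `using (new FrontFaceScope(state, Winding.Ccw)) { ... }` while the global default stays CW. Tracking `Current` on the CPU side avoids querying GL state every frame, at the cost of requiring every front-face change to go through the wrapper.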

### Theory B: Cell floor polys are filtered out at upload time

`PrepareCellStructMeshData` ([ObjectMeshManager.cs:1295-1306](src/AcDream.App/Rendering/Wb/ObjectMeshManager.cs:1295)):

```csharp
// Positive side: uploaded unless NoPos is set.
if (!poly.Stippling.HasFlag(StipplingType.NoPos))
    AddSurfaceToBatch(poly, poly.PosSurface, false);

// Negative side: uploaded when explicitly flagged, or when the poly is
// Clockwise-sided and NoNeg is not set.
bool hasNeg = poly.Stippling.HasFlag(StipplingType.Negative) ||
    poly.Stippling.HasFlag(StipplingType.Both) ||
    (!poly.Stippling.HasFlag(StipplingType.NoNeg) && poly.SidesType == CullMode.Clockwise);
if (hasNeg)
    AddSurfaceToBatch(poly, poly.NegSurface, true);
```

For a floor poly with `Stippling=NoPos + SidesType=Landblock + no
Negative/Both flag`, NEITHER side is uploaded → no rendering at all.
Plausible if AC encodes floor polys this way.

**Verification approach**: dump per-poly Stippling, SidesType,
PosSurface, and NegSurface values for cells. Add to the audit probe.
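Before capturing live data, the conditional can be enumerated offline to see which flag combinations would upload nothing. The sketch below mirrors the quoted logic with local stand-in enums. `Stip`, `Sides`, and `CellPolyUpload` are hypothetical, and the numeric flag values are assumptions; the real `StipplingType` and `CullMode` definitions may differ.

```csharp
using System;
using System.Collections.Generic;

// Local stand-ins so the sketch compiles on its own. Values are assumed.
[Flags]
public enum Stip { None = 0, Positive = 1, Negative = 2, Both = 3, NoPos = 4, NoNeg = 8 }

public enum Sides { None, Clockwise, Landblock }

public static class CellPolyUpload
{
    // Mirrors the conditional quoted above: which sides would be uploaded?
    public static (bool Pos, bool Neg) Decide(Stip stippling, Sides sides)
    {
        bool pos = !stippling.HasFlag(Stip.NoPos);
        bool neg = stippling.HasFlag(Stip.Negative) ||
                   stippling.HasFlag(Stip.Both) ||
                   (!stippling.HasFlag(Stip.NoNeg) && sides == Sides.Clockwise);
        return (pos, neg);
    }

    // Lists every (stippling bits, sides) combination that uploads neither side.
    public static List<(Stip Stippling, Sides Sides)> NoUploadCases()
    {
        var result = new List<(Stip Stippling, Sides Sides)>();
        for (int bits = 0; bits < 16; bits++)
        {
            foreach (Sides sides in Enum.GetValues(typeof(Sides)))
            {
                var (pos, neg) = Decide((Stip)bits, sides);
                if (!pos && !neg)
                    result.Add(((Stip)bits, sides));
            }
        }
        return result;
    }
}
```

Under these assumed values, `Decide(Stip.NoPos, Sides.Landblock)` returns `(false, false)`, matching the "no upload" case described above. Any floor poly in the live dump whose flags land in `NoUploadCases()` would support Theory B.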

### Theory C: cottage shell has no floor poly + cell mesh's floor is broken

In retail AC, the cottage's "shell" GfxObj (from `info.Buildings[i].ModelId`)
contains walls + roof + door frame. The floor is provided entirely by the
cell's CellStruct PosSurface polygons. If our cell mesh's floor poly is
broken (winding, missing, wrong texture), nothing else fills in.

**Verification approach**: run WB's executable against the same dat,
take a screenshot from the same camera position inside the same cottage,
and diff it against our screenshot. This identifies whether the floor
source is the cell mesh or somewhere else.

## Process retrospective: what worked this session

1. **Audit BEFORE apparatus**: line-by-line read of EnvCellRenderer vs
   WB source found the pool bug in 30 min. The handoff doc warned about
   subagent-written code never being audited; that was the right warning.

2. **Apparatus shipped alongside fix**: GL state probe + audit dumps
   captured concrete data that informed subsequent fixes. Gates #1-#5
   all relied on probe data, not on visual inspection alone.

3. **Stopping after 4 fixes**: per systematic-debugging skill. The
   alternative (a 6th speculative attempt) would have either burned more
   user testing cycles or shipped another band-aid.

## What this session did NOT do (in scope for next session)

- Match WB's `glFrontFace(CW)` globally + audit consumers.
- Inspect per-poly Stippling/SidesType for cell floors.
- WB renderer side-by-side comparison.
- Investigate purple tint on walls (lighting / scene UBO).
- Investigate texture warping (UV / sampler issues).
- Investigate flickering (Z-fighting / alpha threshold).
- Remove the `ACDREAM_A8_INDOOR_BRANCH` kill-switch (still needed; default
  OFF restores pre-A8 behavior).

## Pickup prompt for next session

> Phase A8 indoor branch is partially working as of `d5deeb3`. Pool
> aliasing root cause is fixed. Sky-through-windows, LiveDynamic chars,
> cell-mesh double-sided rendering all work. But the floor is transparent
> (sky visible through it), textures warp, and the scene has residual
> distortion + flickering.
>
> Read this doc end-to-end. Then pick ONE of the three theories above
> and verify before any code change:
>
> 1. **Theory A (FrontFace=CW)**: highest-leverage. WB sets CW globally;
>    we set CCW. Audit translucent crystal + sky shaders' winding
>    assumption first. If safe, set FrontFace=CW globally and visual-gate.
>
> 2. **Theory B (cell-poly filtered)**: extend the existing
>    `ACDREAM_A8_AUDIT=1` probe to dump per-poly Stippling, SidesType,
>    and PosSurface/NegSurface for a few cells. Live-capture data; check
>    if any floor poly is "no upload" per the conditional.
>
> 3. **Theory C (WB side-by-side)**: build WB's executable from
>    `references/WorldBuilder/`, point at same dat dir, screenshot same
>    cottage interior. Compare. Confirms or rules out our cell mesh
>    upload as the source of the bug.
>
> The kill-switch (`ACDREAM_A8_INDOOR_BRANCH=1`) remains the way to
> reproduce the indoor branch. Pre-A8 behavior (kill-switch unset) is
> still the default and unchanged.
>
> User authorization: "use superpowers but DONT stop me for questions,
> be perfect, no bandaids." The "no bandaids" rule is why this session
> stopped at fix #5 and wrote the handoff instead of attempting fix #6.
> Carry that discipline forward.