Phase A8 — Session 2: pool fix shipped, 4 more fixes shipped, residual visuals remain (2026-05-28 PM)
TL;DR for next session
The session-1 handoff said "BUILD APPARATUS, NOT MORE SPECULATIVE FIXES." I
built apparatus (per-step GL state probe + per-cell mesh audit + pool
diagnostics) AND, before the apparatus was used, line-by-line audited
EnvCellRenderer.cs against WB source. The audit found two
high-confidence bugs (pool aliasing) in 30 minutes — these were the
root cause of the post-Wave-5 catastrophic visual chaos. Pool fix shipped
(9559726) and the visual went from "thin black diagonal sliver, GPU
100%, 10 FPS, can't see anything" to "walls + objects + sky render
cleanly, FPS normal."
Five more targeted fixes shipped across visual gates #1-#5. The first four closed real bugs. The fifth (the cull-restore revert) rested on a hypothesis that the [draworder] probe data invalidated: gate #5 showed cull state was already off at Step 3, before EnvCellRenderer.Render ran, so the propagation theory did not apply.
Per systematic-debugging skill's ≥3-failures → question architecture
rule, I stopped and wrote this handoff rather than ship a 6th speculative
fix. The remaining symptoms (transparent floor, texture warping,
distortion) point to architectural-level issues that need a different
investigation approach.
Visual progress chronicle
| Gate | Symptoms reported | Cause if known |
|---|---|---|
| Pre-session (from session-1 handoff) | "Thin black diagonal sliver, GPU 100%, 10 FPS, can't see anything" | Pool aliasing (cleared by session-2 commit 9559726) |
| Gate #1 (375f9a7, sky fix not yet applied) | Walls + objects render, no flicker, FPS normal. No sky through windows. Char + doors missing. Floor missing. Purple tint on walls. | Pool fixed (huge win). LiveDynamic/sky/cull not yet addressed. |
| Gate #2 (sky fix + audit probe) | Sky visible through windows ✓. Char + doors still missing. Floor still missing. Purple still. | Sky fix worked. Audit dumped per-cell render data. |
| Gate #3 (LiveDynamic + cull-disable A/B) | Char + doors visible ✓. Floor sometimes visible. See-through-head (cull-off side effect). | LiveDynamic fix worked. Cull-disable proved cull was hiding floor. |
| Gate #4 (Landblock→None + cull-restore) | "BROKEN textures, floor is now transparent" — sky visible through floor | Cull-restore at exit propagated cull-back to dispatcher's IndoorPass, culling cottage shell's floor poly. |
| Gate #5 (revert cull-restore) | "No change at all, textures warped, missing textures, floors transparent and flickering" | Revert didn't help — [draworder] probe shows cull was already off at Step 3 entry, so removing my cull-restore at exit doesn't change inherited state. |
What's shipped this session
| SHA | Description | Status |
|---|---|---|
| 9559726 | Pool aliasing root cause fix (Clear + PostPreparePoolIndex + nested-Setup detection) + 4 regression tests + audit findings doc | KEPT: closes the post-Wave-5 chaos |
| 375f9a7 | Full GL state probe + pool diagnostics extension (option-1 apparatus) | KEPT: apparatus |
| 772d69c | Sky-when-cameraInsideBuilding fix + per-cell audit probe | KEPT: sky through windows works |
| b19f3c1 | LiveDynamic dispatcher call in indoor branch + ACDREAM_A8_DISABLE_CULL A/B gate | KEPT: chars + doors visible inside |
| 0940d79 | Cell-mesh Landblock CullMode → None + cull-state restore at exit | PARTIALLY KEPT: Landblock→None is good; cull-restore was wrong (reverted in d5deeb3) |
| d5deeb3 | Revert cull-restore at EnvCellRenderer exit | KEPT: leaves cull-off propagating |
What's still wrong (visual gate #5 state)
User-reported symptoms with kill-switch ON (ACDREAM_A8_INDOOR_BRANCH=1):
- Floor transparent: sky color visible where the floor should be. The cell mesh has a Landblock→None override that should render cell polys double-sided, but the floor poly either (a) isn't in the upload, (b) has wrong winding/orientation, or (c) is being rendered but z-fails or alpha-discards.
- Texture warping: vague but visible in screenshots. Some surfaces show the wrong texture, or the texture appears stretched/distorted.
- Flickering: surfaces alternate between visible and invisible across frames. Could be Z-fighting (cell mesh vs cottage shell at the same depth), alpha-test threshold instability, or an animated camera causing per-frame frustum-test results to differ.
- General distortion: overall scene "looks broken." Possibly purple tint on lighting (mentioned in gates #1-#3, not explicitly in #5).
Apparatus state
These probes are wired and operate when env vars are set:
- ACDREAM_PROBE_VIS=1: emits [draworder] (per-step GL state), [stencil] (per stencil mark/punch), [buildings] (camera-building list), [envcells] (cells + tris + pool stats).
- ACDREAM_A8_AUDIT=1: one-shot dump of render data per (cellId, gfxObjId) pair: batch count, total IndexCount, CullModes encountered, IsTransparent + IsAdditive flags, BindlessTextureHandle == 0 count.
Sample audit data captured in gate-#2 (a8-visual-gate-2.log):
[a8-audit] cell=0xA9B4013F gfx=0x7F852B220B93AD instances=1 isSetup=False batches=4 totalIdx=144 cull=[Landblock] translucent=0 additive=0 zeroHandle=0
Every cell mesh batch has CullMode=Landblock (uniform). Render data loads correctly (no nulls, no zero handles).
Sample [draworder] data captured in gate-#5 (a8-visual-gate-5.log):
[draworder] frame=155 step=3 stencil=off depthFn=0x201 depthMask=True cull=off(back) blend=0x302/0x303 sFunc=0x207:1:0xFF sOp=0x1E00/0x1E00/0x1E01 sMask=0x1 cMask=(RGB-) vao=0 prog=6
Cull is OFF at Step 3 entry (Step 1's gl.Disable(EnableCap.CullFace)
already disabled it; my cull-restore-at-exit revert had no effect on
incoming state).
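
Reading these lines by eye is error-prone, so a small offline parser can help when scanning a whole gate log. The following is a minimal sketch, not part of the shipped apparatus: the class name `DrawOrderProbe` is hypothetical, and it assumes every field has the `key=value` shape seen in the sample line and that an enabled cull state is printed as something other than `off(...)`. The hex constants are the standard OpenGL enum values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for offline inspection of [draworder] log lines.
public static class DrawOrderProbe
{
    // Standard OpenGL enum values that appear in the sample line.
    private static readonly Dictionary<int, string> GlNames = new Dictionary<int, string>
    {
        { 0x0201, "GL_LESS" },
        { 0x0207, "GL_ALWAYS" },
        { 0x0302, "GL_SRC_ALPHA" },
        { 0x0303, "GL_ONE_MINUS_SRC_ALPHA" },
        { 0x1E00, "GL_KEEP" },
        { 0x1E01, "GL_REPLACE" },
    };

    // Returns the symbolic name if known, otherwise the value as hex.
    public static string GlName(int value) =>
        GlNames.TryGetValue(value, out var name) ? name : "0x" + value.ToString("X");

    // Collects every "key=value" token on the line into a dictionary.
    public static Dictionary<string, string> Parse(string line)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(line, @"(\w+)=(\S+)"))
            fields[m.Groups[1].Value] = m.Groups[2].Value;
        return fields;
    }

    // Assumption: a disabled cull state is always printed with an "off" prefix.
    public static bool CullEnabled(Dictionary<string, string> fields) =>
        fields.TryGetValue("cull", out var cull)
        && !cull.StartsWith("off", StringComparison.Ordinal);
}
```

Feeding the gate-#5 sample line through `Parse` and then `CullEnabled` reports cull as disabled, and `GlName(Convert.ToInt32(fields["depthFn"], 16))` decodes `0x201` as `GL_LESS`.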
Root-cause analysis — why the speculative fixes can't close it
Theory A: AC's polygon winding requires glFrontFace(CW)
WB sets glFrontFace(GLEnum.CW) globally at
GameScene.cs:843.
Our WbDrawDispatcher.cs:1056 sets glFrontFace(CCW) in the transparent
pass with a comment claiming "our fan triangulation emits pos-side polys
as (0, i, i+1) — CCW." But the actual triangulation in
BuildCellStructPolygonIndices (ObjectMeshManager.cs:1518-1586)
emits (i, i-1, 0) — the REVERSE of (0, i, i+1). The comment is wrong
about our actual winding.
If AC's polys are wound CCW from their PosSurface side (the "front" side
in retail convention), our triangulation produces CW-from-PosSurface
triangles. WB's FrontFace=CW makes CW = front, so cull-back removes
the back side correctly. Our FrontFace=CCW makes CCW = front, so
cull-back removes the WRONG side — hiding polys whose PosSurface is
camera-facing.
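
The claim that (i, i-1, 0) is the reverse of (0, i, i+1) can be checked with signed areas on a toy polygon. This is a minimal sketch under stated assumptions: the `WindingCheck` class is hypothetical, the polygon is a 2D convex one listed counter-clockwise, and the reversed loop bounds (i from 2 to n-1) are a guess at how BuildCellStructPolygonIndices iterates; only the two index patterns come from the text above.

```csharp
using System;

// Hypothetical helper: compares the two fan index patterns by signed area.
public static class WindingCheck
{
    // Twice the signed area of triangle (a, b, c) with +x right and +y up:
    // positive means counter-clockwise, negative means clockwise.
    public static double SignedArea2(double[] a, double[] b, double[] c) =>
        (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);

    // Sums the signed areas of a triangle fan over the polygon.
    // reversed == false uses (0, i, i+1); reversed == true uses (i, i-1, 0).
    public static double FanArea(double[][] poly, bool reversed)
    {
        double total = 0;
        if (!reversed)
        {
            for (int i = 1; i < poly.Length - 1; i++)
                total += SignedArea2(poly[0], poly[i], poly[i + 1]);
        }
        else
        {
            for (int i = 2; i < poly.Length; i++)
                total += SignedArea2(poly[i], poly[i - 1], poly[0]);
        }
        return total;
    }
}
```

For a unit square listed counter-clockwise, the (0, i, i+1) fan sums to a positive area and the (i, i-1, 0) fan to the same magnitude with a negative sign, which is the winding flip Theory A depends on.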
Verification approach: change FrontFace to CW globally (matching
WB at GameScene.cs:843) and audit every consumer (sky, particles, UI,
translucent crystal mesh) for impact. The dispatcher's CCW set at
line 1056 has a comment about a Phase 9.2 fix (lifestone crystal
see-through-hollow-interior) — that fix might have papered over the
underlying FrontFace mismatch instead of fixing it properly.
Risk: changing FrontFace globally might re-introduce the hollow-interior bug for closed-shell translucent meshes. Needs careful audit and possibly per-renderer FrontFace push/pop.
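
One way a per-renderer FrontFace push/pop could look is a disposable scope that restores the previous winding on exit. This is only a sketch: `IFrontFaceState`, `FrontFaceScope`, and `FakeFrontFaceState` are hypothetical names invented here, not the project's GL wrapper, and a real implementation would have to read or track the actual GL state rather than an in-memory field.

```csharp
using System;
using System.Collections.Generic;

public enum Winding { CW, CCW }

// Hypothetical minimal abstraction over the GL front-face state.
public interface IFrontFaceState
{
    Winding Current { get; }
    void Set(Winding winding);
}

// Sets a winding on construction and restores the previous one on Dispose.
public readonly struct FrontFaceScope : IDisposable
{
    private readonly IFrontFaceState _state;
    private readonly Winding _previous;

    public FrontFaceScope(IFrontFaceState state, Winding winding)
    {
        _state = state;
        _previous = state.Current;
        state.Set(winding);
    }

    public void Dispose() => _state.Set(_previous);
}

// In-memory fake so the scope can be exercised without a GL context.
public sealed class FakeFrontFaceState : IFrontFaceState
{
    public Winding Current { get; private set; } = Winding.CW;
    public List<Winding> History { get; } = new List<Winding>();

    public void Set(Winding winding)
    {
        Current = winding;
        History.Add(winding);
    }
}
```

A renderer that needs CCW (for example the translucent pass) would wrap its draw calls in `using (new FrontFaceScope(state, Winding.CCW)) { ... }`, leaving the global CW default intact for everything else.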
Theory B: Cell polys' floor is filtered out at upload time
PrepareCellStructMeshData (ObjectMeshManager.cs:1295-1306):
    if (!poly.Stippling.HasFlag(StipplingType.NoPos))
        AddSurfaceToBatch(poly, poly.PosSurface, false);
    bool hasNeg = poly.Stippling.HasFlag(StipplingType.Negative) ||
                  poly.Stippling.HasFlag(StipplingType.Both) ||
                  (!poly.Stippling.HasFlag(StipplingType.NoNeg) && poly.SidesType == CullMode.Clockwise);
    if (hasNeg)
        AddSurfaceToBatch(poly, poly.NegSurface, true);
For a floor poly with Stippling=NoPos + SidesType=Landblock + no Negative/Both flag, NEITHER side is uploaded → no rendering at all.
Plausible if AC encodes floor polys this way.
Verification approach: dump per-poly Stippling + SidesType + PosSurface
+ NegSurface values for cells. Add to the audit probe.
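
The upload conditional can be restated as a pure predicate, which makes the "neither side uploaded" case easy to flag in a per-poly dump. The sketch below is illustrative only: the enum definitions are stand-ins with assumed numeric values (the real dat enums may differ), and the `[a8-audit-poly]` tag and `PolyUploadAudit` class are hypothetical names, not existing probe output.

```csharp
using System;

// Stand-in enums; member names follow the snippet above, numeric values are assumptions.
[Flags]
public enum StipplingType { None = 0, Positive = 1, Negative = 2, Both = 3, NoPos = 4, NoNeg = 8 }

public enum CullMode { Landblock, None, Clockwise }

public static class PolyUploadAudit
{
    // Mirrors the conditional in PrepareCellStructMeshData as quoted above.
    public static (bool Pos, bool Neg) SidesUploaded(StipplingType stippling, CullMode sidesType)
    {
        bool pos = !stippling.HasFlag(StipplingType.NoPos);
        bool neg = stippling.HasFlag(StipplingType.Negative)
                || stippling.HasFlag(StipplingType.Both)
                || (!stippling.HasFlag(StipplingType.NoNeg) && sidesType == CullMode.Clockwise);
        return (pos, neg);
    }

    // Formats one dump line; polys with no uploaded side are marked NO-UPLOAD.
    public static string Describe(int polyIndex, StipplingType stippling, CullMode sidesType)
    {
        var (pos, neg) = SidesUploaded(stippling, sidesType);
        string verdict = (!pos && !neg) ? "NO-UPLOAD" : "ok";
        return $"[a8-audit-poly] poly={polyIndex} stippling={stippling} sides={sidesType} pos={pos} neg={neg} {verdict}";
    }
}
```

With these stand-in values, `SidesUploaded(StipplingType.NoPos, CullMode.Landblock)` returns false for both sides, matching the floor case described above; grepping a live capture for `NO-UPLOAD` would confirm or rule out Theory B.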
Theory C: cottage shell has no floor poly + cell mesh's floor is broken
In retail AC, the cottage's "shell" GfxObj (from info.Buildings[i].ModelId)
contains walls + roof + door frame. The floor is provided entirely by the
cell's CellStruct PosSurface polygons. If our cell mesh's floor poly is
broken (winding, missing, wrong texture), nothing else fills in.
Verification approach: run WB's executable against the same dat, take a screenshot from the same camera position inside the same cottage, diff against our screenshot. Identifies whether the floor source is the cell mesh or somewhere else.
Process retrospective — what worked this session
- Audit BEFORE apparatus: line-by-line read of EnvCellRenderer vs WB source found the pool bug in 30 min. The handoff doc warned about subagent-written code never being audited; that was the right warning.
- Apparatus shipped alongside fix: GL state probe + audit dumps captured concrete data that informed subsequent fixes. Gates #1-#5 all relied on probe data, not pure visual.
- Stopping after 4 fixes: per systematic-debugging skill. The alternative (a 6th speculative attempt) would have either burned more user testing cycles or shipped another band-aid.
What this session did NOT do (in scope for next session)
- Match WB's glFrontFace(CW) globally + audit consumers.
- Inspect per-poly Stippling/SidesType for cell floors.
- WB renderer side-by-side comparison.
- Investigate purple tint on walls (lighting / scene UBO).
- Investigate texture warping (UV / sampler issues).
- Investigate flickering (Z-fighting / alpha threshold).
- Remove the ACDREAM_A8_INDOOR_BRANCH kill-switch (still needed; default OFF restores pre-A8 behavior).
Pickup prompt for next session
Phase A8 indoor branch is partially working as of d5deeb3. Pool aliasing root cause is fixed. Sky-through-windows, LiveDynamic chars, cell-mesh double-sided rendering all work. But the floor is transparent (sky visible through it), textures warp, and the scene has residual distortion + flickering.

Read this doc end-to-end. Then pick ONE of the three theories above and verify before any code change:

Theory A (FrontFace=CW): highest-leverage. WB sets CW globally; we set CCW. Audit translucent crystal + sky shaders' winding assumption first. If safe, set FrontFace=CW globally and visual-gate.

Theory B (cell-poly filtered): extend the existing ACDREAM_A8_AUDIT=1 probe to dump per-poly Stippling + SidesType + PosSurface/NegSurface for a few cells. Live-capture data; check if any floor poly is "no upload" per the conditional.

Theory C (WB side-by-side): build WB's executable from references/WorldBuilder/, point at same dat dir, screenshot same cottage interior. Compare. Confirms or rules out our cell mesh upload as the source of the bug.

The kill-switch (ACDREAM_A8_INDOOR_BRANCH=1) remains the way to reproduce the indoor branch. Pre-A8 behavior (kill-switch unset) is still the default and unchanged.

User authorization: "use superpowers but DONT stop me for questions, be perfect, no bandaids." The "no bandaids" rule is why this session stopped at fix #5 and wrote the handoff instead of attempting fix #6. Carry that discipline forward.