HANDOFF — #105 intermittent missing indoor textures × #110 near-plane correlation
✅ CLOSED 2026-06-10 (same day). Root cause: the per-frame staged-texture flush (WB
GameScene.cs:975 → ObjectMeshManager.GenerateMipmaps()) was dropped in the N.4/O-T4
extraction; fix c787201, znear=0.1 re-landed d4b5c71. §4's "only credible link" (upload
pressure) was exactly right. Read the close-out instead:
2026-06-10-105-110-CLOSED-staged-texture-flush-drop.md. This document is historical.
Date: 2026-06-10 (late). Branch: claude/thirsty-goldberg-51bb9b, HEAD 8bd3492.
Status: #105 struck twice today with the dat-side tripwires SILENT (= GL-side); the
retail near-plane fix (137b4f2, 0.1 m) was bisect-implicated in those two runs and
REVERTED (8bd3492) pending this investigation. One investigation, three payoffs:
attribute + kill the chronic #105, settle whether the near plane is innocent (#110), and
re-land znear=0.1 which closes the §4 corner see-through-wall.
Read this top-to-bottom before touching code. The render digest
(claude-memory/project_render_pipeline_digest.md) carries the distilled state + the
DO-NOT-RETRY table — this doc is the deep-dive for THIS investigation.
0. TL;DR
- #105 (chronic since ~2026-06-08): indoor wall/cell textures intermittently render
  wrong ("missing"; exact appearance, white vs invisible, NOT yet confirmed by the user;
  question outstanding). Today it struck on 2 consecutive launches. The four dat-side
  tripwires ([dat-miss]/[tex-miss]/[tex-skip]/[cell-miss], commit 7433b70) produced ZERO
  output on both bad runs → per the #105 protocol the failure is GL-side: staged
  mesh/texture upload, bindless handle creation/residency, or the per-batch handle
  plumbing, NOT a failed dat read.
- #110: both bad runs were on the near-plane build (znear=0.1, 137b4f2); the very next
  run with znear=1.0 (working-tree bisect) rendered clean. 2-bad-on-0.1 / 1-good-on-1.0
  is suggestive, not conclusive: #105 is intermittent and could have coincided. No
  mechanism is known by which znear touches texturing (see §4 for the honest candidate
  list). The change is reverted; all four cameras carry a ⚠️ comment pointing here.
- The leverage: if #105 is independent (most likely), fixing it exonerates the near
  plane → re-land 0.1 → §4 corner fix complete. If the near plane genuinely raises
  #105's trigger probability (e.g. more close-up geometry → more upload pressure), the
  fix is still in the #105 path; the near plane just becomes the best repro lever.
1. Today's run matrix (the evidence)
| Run (log, worktree root, untracked) | Build (near) | Indoor textures | Notes |
|---|---|---|---|
| flood-fix-gate.log | dac8f6a (1.0) | OK | user gated the flood fix: transitions clean |
| flood-fix-gate2.log | dac8f6a (1.0) | OK (no complaint) | #107 spawn wedge run |
| flood-fix-gate3.log | dac8f6a (1.0) | OK (no complaint) | #107 again |
| nearplane-gate.log | 137b4f2 (0.1) | UNKNOWN | user asked to relaunch without detail |
| nearplane-gate2.log | 137b4f2 (0.1) | MISSING | tripwires silent (checked: 0/0/0/0) |
| nearplane-gate3.log | 137b4f2 (0.1) | MISSING | fresh launch, same |
| nearplane-bisect.log | tree (1.0 on RetailChaseCamera) | OK | the bisect run |
Also relevant: the clean-launch #105 occurrence on 2026-06-09 (35-line log, zero errors,
PRE-near-plane) — proof #105 strikes on znear=1.0 builds too. The near plane cannot be
the sole cause; the open question is independence vs trigger-probability.
2. #105 history — what is already settled (DO NOT REDO)
From docs/research/2026-06-09-dat-reader-thread-safety-investigation.md + the digest:
- Concurrent dat READS are SAFE (Chorizite.DatReaderWriter 2.1.7): source audit + the
  in-tree hammer DatConcurrencyStressTests (~1.1 M concurrent reads, zero anomalies).
  The "thread-unsafe dat reader" lore is refuted for the read path. Do not re-litigate.
- The teardown AccessViolations were dispose-during-read (decode pool + streamer not
  quiesced before DatCollection.Dispose unmapped views). FIXED 8fadf77.
- The "heavy probes cause white walls" framing is PARTIAL at best: a clean 35-line
  launch reproduced white walls; a heavily-probed run rendered fine. Probe load skews
  timing (still avoid ACDREAM_PROBE_FLAP for visual gates) but is not the cause.
- Every silent dat-miss exit is tripwired (7433b70): [dat-miss] (DatCollection returns
  null), [tex-miss]/[tex-skip] (texture resolve/upload skips), [cell-miss] (EnvCell load
  misses). Zero output when healthy. Both of today's bad runs: zero output → the
  dat → decode → staged-data side delivered; the loss is between staging and the draw.
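
The "tripwires silent (0/0/0/0)" check can be done mechanically. The following is a minimal sketch, assuming only that each tripwire line contains its tag as a literal substring; the rest of the log line format is not specified here, and the helper names `count_tripwires` and `is_gl_side` are hypothetical.

```python
# The four tripwire tags named above (commit 7433b70).
TRIPWIRE_TAGS = ("[dat-miss]", "[tex-miss]", "[tex-skip]", "[cell-miss]")


def count_tripwires(lines):
    """Count log lines that contain each tripwire tag.

    Assumption: a tripwire line carries its tag as a literal substring.
    """
    counts = dict.fromkeys(TRIPWIRE_TAGS, 0)
    for line in lines:
        for tag in TRIPWIRE_TAGS:
            if tag in line:
                counts[tag] += 1
    return counts


def is_gl_side(counts):
    """#105 protocol: a bad run with all-zero tripwires points GL-side."""
    return all(n == 0 for n in counts.values())
```

Feed it the decoded lines of a bad-run log; any non-zero count means the dat side did miss and the GL-side attribution does not apply to that run.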
3. The GL-side texture path — anatomy + where it can lose textures
The modern pipeline (N.4/N.5, mandatory — see reference_modern_rendering_pipeline.md):
- Decode/stage: ObjectMeshManager.PrepareMeshDataAsync(id, isSetup) background-decodes
  mesh + texture data → auto-enqueues to _stagedMeshData.
- Drain: WbMeshAdapter.Tick() (render thread, per frame) drains the staged queue,
  creates GL resources, populates AcSurfaceMetadataTable (per-batch translucency /
  luminosity / fog metadata).
- Texture upload: TextureCache GetOrUpload*Bindless → GL texture (parallel
  Texture2DArray uploads via UploadRgba8AsLayer1Array) → glGetTextureHandleARB →
  glMakeTextureHandleResidentARB. Returns the 64-bit handle.
- Per-batch plumbing: the handle lands in ObjectRenderBatch.BindlessTextureHandle.
  - Entities: WbDrawDispatcher Phase 5 uploads _batchSsbo (binding=1,
    (uvec2 handle, uint layer, uint flags) per group).
  - Cell shells: EnvCellRenderer.RenderModernMDIInternal packs
    ModernBatchData { TextureHandle, TextureIndex } → _modernBatchBuffer (SSBO
    binding=1, bound at EnvCellRenderer.cs:1211).
- Sampling: mesh_modern.frag constructs sampler2DArray(handle) from the uvec2. A zero
  handle samples garbage/black/white (undefined); this is the classic "white walls"
  appearance.
Loss candidates between staging and draw (ranked):
- (a) Zero handle at draw time: the batch was prepared before its texture upload
  completed, and nothing back-patches the handle. Known to exist transiently (textures
  pop in); a bug would make it PERSIST. ⚠️ There is an EXISTING probe for exactly
  this: ACDREAM_PROBE_SHELL=1 (RenderingDiagnostics.ProbeShellEnabled) prints, per
  visible cell, gfxObj/batch counts AND zh= (zero-bindless-handle batch count); see
  RenderingDiagnostics.cs:120-130. One bad-run launch with this probe splits the search
  space in half (zh>0 ⇒ upload/handle side; zh==0 ⇒ resident-but-wrong-content or
  sampling/state side).
- (b) Residency loss / never-made-resident: handle non-zero but
  MakeTextureHandleResidentARB skipped or undone → same visual, but zh probe reads 0.
  Needs a residency assert (glIsTextureHandleResidentARB sweep) or RenderDoc.
- (c) Upload raced/dropped under pressure: MaxCompletionsPerFrame (QualityPreset) caps
  streaming completions per frame; a drop/requeue bug under burst load would lose whole
  cells' textures. Would likely show as some cells white, others fine.
- (d) Texture content wrong but handle valid: array-layer mixups (zh==0, content
  white). RenderDoc territory.
4. #110 — what znear can and cannot plausibly do (senior-dev honest list)
znear enters the system in exactly one object: the projection matrix
(CreatePerspectiveFieldOfView(FovY, aspect, znear, 5000f) in RetailChaseCamera /
ChaseCamera / FlyCamera / OrbitCamera — all currently 1.0 with ⚠️ comments).
Downstream consumers of viewProj:
| Consumer | Effect of 0.1 vs 1.0 | Texture relevance |
|---|---|---|
| Rasterization | geometry 0.1–1.0 m from the eye now draws; depth distribution shifts (D24 @ 5 m: ~1.5 µm → ~15 µm) | none direct; z-fighting would flicker, not "lose" textures |
| Frustum culls (terrain/entity/EnvCell prepare) | strictly MORE visible (near plane closer) | more batches prepared per frame → more uploads in flight → raises (a)/(c) trigger probability ← the only credible #105×#110 link found |
| PortalVisibilityBuilder flood | znear changes only the z-row of viewProj; flood clip planes are (nx,ny,0,dw), i.e. x,y,w-based; the flood is conformance-gated (CornerSweep_FloodIsCompleteAndMonotone) | none |
| gl_ClipDistance regions / terrain UBO | x,y,w-based, near-independent | none |
| Doorway scissor | computed from NdcAabb, projection-independent at the box level | none |
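
The D24 figures in the Rasterization row can be reproduced with a short calculation. This is a sketch under stated assumptions: a standard perspective depth mapping with slope dd/dz = far·near / ((far − near)·z²), a 24-bit fixed-point depth buffer, and the 5000 far plane from the camera code; `depth_step_m` is a hypothetical helper name.

```python
def depth_step_m(z, near, far=5000.0, bits=24):
    """World-space distance (metres) covered by one depth-buffer step at eye distance z.

    Assumes the usual perspective mapping, whose slope is
    dd/dz = far * near / ((far - near) * z**2), quantized to 2**bits levels.
    """
    slope = far * near / ((far - near) * z * z)
    return (2.0 ** -bits) / slope


# One depth step at 5 m from the eye, for each near plane.
step_near_1_0 = depth_step_m(5.0, near=1.0)  # about 1.5e-6 m
step_near_0_1 = depth_step_m(5.0, near=0.1)  # about 1.5e-5 m
```

The tenfold coarsening is real but still far below wall-scale geometry, which is consistent with the table's reading that depth precision would cause flicker rather than lost textures.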
Conclusion to verify, not assume: the most credible story is that znear=0.1 makes
close-up geometry (the wall right behind the camera, the doorframe you're brushing)
newly visible, inflating per-frame prepare/upload pressure indoors, which raises the
probability of the pre-existing #105 loss. If true: fixing #105 exonerates the near
plane entirely. The alternative (0.1 breaks texturing via a mechanism not in this table)
needs RenderDoc evidence before being believed.
5. Investigation plan (staged, evidence-first)
Phase A — attribute with the existing probe (cheap, decisive split):
- Launch with ACDREAM_PROBE_SHELL=1 (+ the always-on dat tripwires). Flip-launch until
  a bad run reproduces (today it was 2/3 on the 0.1 build; consider temporarily
  re-applying 0.1 to the working tree as the REPRO LEVER ONLY, clearly uncommitted).
- On a bad run: read [shell] lines for the affected cells. zh>0 ⇒ path (a)/(c):
  zero/never-patched handles; go to Phase C1. zh==0 ⇒ path (b)/(d); go to Phase C2.
- ALSO capture the user's answer: white surfaces vs invisible walls (outstanding
  question; invisible would point at visibility/depth instead and reshape this plan).
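
The zh split can be read off a log without eyeballing it. A minimal sketch, assuming each probe line contains the literal tag `[shell]` and a `zh=<integer>` token; the exact probe line layout is not given here, so the regex and the helper names are assumptions to check against real output.

```python
import re

# Assumed token shape: "zh=" followed by a non-negative integer.
ZH_RE = re.compile(r"\bzh=(\d+)")


def shell_zero_handle_counts(lines):
    """Collect the zh= value from every [shell] probe line that has one."""
    counts = []
    for line in lines:
        if "[shell]" not in line:
            continue
        match = ZH_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def next_phase(zh_counts):
    """Route per Phase A: any zh>0 goes to C1, all-zero goes to C2.

    Returns None when no [shell] lines were found (probe not enabled).
    """
    if not zh_counts:
        return None
    return "C1" if any(n > 0 for n in zh_counts) else "C2"
```

This looks at every probed cell in the log; narrowing to the affected cells still needs whatever cell identifier the real probe line prints.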
Phase B — settle the #110 correlation statistically (parallel, mechanical):
Alternate launches 0.1 / 1.0 (working-tree flip on RetailChaseCamera only), ≥4 runs per
arm, record texture state per run (the [shell] zh= counts make this detectable WITHOUT
user eyes if path (a)). Independence ⇒ bad runs appear in both arms. The 2026-06-09
clean-launch occurrence already proves 1.0 is not immune.
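
One way to put a number on the flip-test is a one-sided Fisher exact test on the bad/good counts per arm. This is a swapped-in statistical technique, not something the plan above prescribes; the sketch uses only the standard library, and `fisher_one_sided` is a hypothetical helper name.

```python
from math import comb


def fisher_one_sided(bad_a, good_a, bad_b, good_b):
    """P(arm A has >= bad_a bad runs) if bad runs are independent of the arm.

    Hypergeometric tail with both margins fixed (one-sided Fisher exact test).
    """
    runs_a = bad_a + good_a
    runs_b = bad_b + good_b
    bad_total = bad_a + bad_b
    denom = comb(runs_a + runs_b, bad_total)
    upper = min(runs_a, bad_total)
    tail = sum(
        comb(runs_a, k) * comb(runs_b, bad_total - k)
        for k in range(bad_a, upper + 1)
    )
    return tail / denom


# Today's attributed runs: 2 bad / 0 good on 0.1, 0 bad / 1 good on 1.0.
p_today = fisher_one_sided(2, 0, 0, 1)

# Most extreme outcome of a 4-runs-per-arm flip-test.
p_extreme = fisher_one_sided(4, 0, 0, 4)
```

Today's split gives p = 1/3, which matches "suggestive, not conclusive". Even a perfectly separated 4-vs-4 result only reaches 1/70, so four runs per arm is close to the minimum that can separate the two stories.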
Phase C — root-cause:
- C1 (zero handles): instrument the staging→handle path: log every batch that reaches
  the draw SSBO with handle==0 (entity + EnvCell sides), plus WbMeshAdapter.Tick drain
  counts and TextureCache upload completions per frame. Find who created a batch before
  its texture and never back-patched. Fix = the back-patch / ordering, NOT a retry loop.
- C2 (valid handles, wrong output): RenderDoc the bad frame (GPU truth): inspect
  binding=1 SSBO contents, handle residency, sampled texture content, and the draw
  state of an affected wall batch.
Phase D — close out: fix #105 root cause → flip-test again (both arms clean) →
re-land znear=0.1 (re-apply the 137b4f2 payload: 4 cameras + restore the retail
citation comments) → user re-gates the §4 corner press (wall must stay solid at the
camera) + a distance scan for any new z-shimmer (none expected; retail ships 0.1).
6. Repro notes + session-ops gotchas (cost real time today)
- Repro spot: Holtburg houses near the player's parked position (the user was trying on a house interior; exact house id unconfirmed — textures were missing across the interior). Frequency today: 2 of 3 launches on the 0.1 build.
- #107 interference: logging in while parked INDOORS wedges the player (stuck in air/wall, 3-for-3 today — filed). For THIS investigation prefer ending test sessions with the character OUTDOORS so logins are clean. If wedged: relaunch; it intermittently recovers.
- ACE session hold: graceful window close ⇒ ~3–5 s; hard kill ⇒ ~3 min of
  session failed (exit 29). The launch protocol + wait loop used all day is in this
  session's transcript; auto-entered player mode is the in-world marker.
- ⚠️ PowerShell 5.1 Get-Content/Set-Content MANGLES UTF-8 source files (reads CP1252,
  writes mojibake; corrupted all four camera files today; recovered via git checkout +
  redoing edits with the Edit tool). Never bulk-edit source with PS5.1 string replace.
- Tee-Object logs are UTF-16LE: Python analyzers must BOM-detect; PowerShell
  Select-String handles them natively.
- Probes: ACDREAM_PROBE_SHELL is heavy-ish (per-prepare dumps), so keep runs short. The
  dat tripwires are always-on and free. NEVER judge visuals under ACDREAM_PROBE_FLAP.
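
For the Python-analyzer gotcha above, a BOM-detecting reader might look like the following sketch. The helper names are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption about how non-Tee-Object logs are written.

```python
import codecs


def decode_log_bytes(raw):
    """Decode log bytes, choosing the codec from the byte-order mark.

    UTF-32 is not handled; its LE BOM also starts with FF FE and would
    be misread here as UTF-16LE.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Assumption: logs without a BOM are UTF-8; bad bytes are replaced, not fatal.
    return raw.decode("utf-8", errors="replace")


def read_log_lines(path):
    """Read a log file as a list of lines regardless of its encoding."""
    with open(path, "rb") as handle:
        return decode_log_bytes(handle.read()).splitlines()
```

Opening a Tee-Object log in plain text mode as UTF-8 would either fail or interleave NUL characters, which silently defeats substring checks such as the tripwire tags.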
7. What ELSE is open (do not drift into these)
Priority order (set 2026-06-10, digest carries it): this investigation (#105+#110) →
#107 indoor-login spawn wedge (physics; ACDREAM_CAPTURE_RESOLVE apparatus ready) →
#108 cellar-ascent grass sweep + #109 far-exit-door oscillation (one render
session, probe captures at their spots) → #99/A6.P4 per-cell shadow architecture
(planned phase). The §4 flood strobe is FIXED (dac8f6a, user-gated) — its conformance
gate (CornerSweep_FloodIsCompleteAndMonotone) and the corner-seal characterization
(CameraCornerSealReplayTests) must stay green through any change here.
8. Today's commit ledger (context for blame/diff archaeology)
| Commit | What |
|---|---|
| 682cba3 | [clip-route] probe apparatus (outdoor flap) |
| c4df241 | outdoor full-world flap FIX (EnvCellRenderer DepthMask(false) leak → depth-clear no-op) |
| df2ef7c | flap close-out doc |
| b21bb28 | corner-seal replay: camera-penetration hypothesis REFUTED (openings, not walls) |
| 482b0de | corner-seal handoff doc |
| dac8f6a | §4 flood strobe FIX (homogeneous reciprocal clip + collinear-aware dedup), user-gated |
| 137b4f2 | near plane 1.0→0.1 (retail znear) + issues #107–#109 filed |
| 8bd3492 | near plane reverted to 1.0 pending #110; #110 filed |
Test baseline: App 223, Core 1377 green + 4 pre-existing #99-era failures
(DoorBugTrajectoryReplay ×2 / DoorCollisionApparatus / BSPStepUp) + 1 skip, UI 420,
Net 294. ACE on 127.0.0.1:9000, testaccount/testpassword, +Acdream.