docs: A6.P3 #98 — new root-cause hypothesis (stale ramp contact plane)

Today's evening session ran from "harness still doesn't reproduce the
cap" → "harness reproduces it" → "wait, the cap is only a symptom, the
real cause is upstream Z drift from the contact plane never refreshing."

The breakthrough question, from the user: "we know how retail OPENs it
from above, how hard can it be to know how to open it from below?" —
which reframed the investigation away from cap-event mechanics (where
six prior attempts looked) and toward "what about our STATE is wrong
when the player is in the cellar but not on the ramp?"

The math: player at cap is 10 m away from the cellar ramp in cell-local
X, but body.ContactPlane is still the ramp's slope plane. AdjustOffset
projects forward motion along that stale slope every tick, lifting Z
by +0.201 m per tick. After enough ticks of horizontal walking, the
head sphere reaches Z=94 and bumps the cottage floor. If the contact
plane refreshed to the flat cellar floor when the player walked off
the ramp, the drift would be zero and the cap would never be reachable.

Next session's task (per the pickup prompt at the bottom of the
findings doc): (1) verify the hypothesis chronologically against the
live capture, (2) find the walkable-refresh gap in
Transition.FindEnvCollisions / SpherePath.SetWalkable, (3) cross-ref
retail's CObjCell::find_env_collisions for the per-tick contact-plane
refresh logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Erik 2026-05-24 06:03:52 +02:00
parent 7729bdcf98
commit bf6d97625c
2 changed files with 240 additions and 46 deletions


@ -872,13 +872,49 @@ Two commits (`cc3afbc` → `97fec19`):
by stashing the cottage helper and reproducing the same flaky range.
Out of scope for this session; tracked as follow-up.
**Next-session move:** investigate the residual +X edge-slide divergence
in `Transition.transitional_insert` / `AdjustOffset`'s handling of a
`cn=(0,0,-1)` head-bump. Live treats it as a Z-only constraint and
slides the remaining XY motion along the cottage floor; harness blocks
the entire move vector instead. The harness's
`LiveCompare_FirstCap_ResidualXMotionDivergence_DocumentsNextInvestigation`
test gives <1s feedback per fix attempt. ~2 hours estimate.
**Evening v3 finding (2026-05-23 PM, even later) — NEW root-cause
hypothesis identified:** the cottage-floor cap is a SYMPTOM. The actual
bug is **stale ramp contact plane causing per-tick Z drift** that makes
the cap reachable in the first place.
Evidence:
- Body's contact plane at cap = ramp's plane (n=(0, 0.7190, 0.6950),
d=-69.5035) from the live capture's `bodyBefore`
- Cellar ramp's actual world XY: X∈[129.7, 131.3], Y∈[10.19, 13.09]
(computed from the cellar cell fixture's vertex data + WorldTransform)
- Player position at cap: world (141.5, 7.22, 92.74) — **10 m away**
from the ramp in cell-local X
- `AdjustOffset` projects the requested motion onto the contact plane
(it removes the component along the plane normal). Math:
dot((0.0266, -0.4022, 0), (0, 0.719, 0.695)) = -0.2892 → projected =
(0.0266, -0.1943, +0.2010). **+0.201 m of Z gain per tick**, applied
because the engine believes the player is on the slope.
- Head sphere top at cap = foot Z + 1.68 = 94.42. Cottage floor at
Z=94.00. **Head sphere exceeds cottage floor by 0.42 m** → cap fires
- If the contact plane refreshed to the flat cellar floor when the
player walked off the ramp, AdjustOffset would produce zero Z gain
(the requested motion has no Z component, and projecting onto a
horizontal plane leaves XY unchanged). No drift, no cap.
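The projection arithmetic in the evidence list can be checked in a few lines. This is a minimal sketch in Python, not the engine's C# `AdjustOffset`: `project_onto_plane` is a hypothetical helper, and it assumes the adjustment is a plain removal of the normal component, as the dot-product line above describes.

```python
def project_onto_plane(move, normal):
    """Remove the component of `move` along the unit `normal` (assumed model)."""
    d = sum(m * n for m, n in zip(move, normal))
    return tuple(m - d * n for m, n in zip(move, normal))

requested = (0.0266, -0.4022, 0.0)    # per-tick request quoted from the live capture
ramp_normal = (0.0, 0.7190, 0.6950)   # stale contact plane (ramp)
flat_normal = (0.0, 0.0, 1.0)         # flat cellar floor

on_ramp = project_onto_plane(requested, ramp_normal)
on_flat = project_onto_plane(requested, flat_normal)

print([round(c, 4) for c in on_ramp])  # Z component is the per-tick lift
print([round(c, 4) for c in on_flat])  # Z component stays zero
```

With the ramp normal the Z component comes out near +0.2010; with the flat-floor normal the request passes through unchanged.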
How this question surfaced: user asked "we know how retail OPENs it
from above, how hard can it be to know how to open it from below?" —
that reframing made the question "what's different about our state
when walking up vs down?" The answer: **nothing, actually — the
cottage geometry is the same. But our contact plane is wrong.** The
six prior fix attempts were all investigating the cap-event mechanics
(step-up, slope projection at the cap, edge-slide, SidesType, +X
residual). None questioned why the contact plane was the ramp at all
when the player was 10 m from the ramp.
**Next-session move:** verify the stale-contact-plane hypothesis
chronologically against the live capture (walk the JSONL records, find
the last tick the player was on the actual ramp, quantify Z drift),
then locate the walkable-refresh code path in
`Transition.FindEnvCollisions` / `SpherePath.SetWalkable` that's
supposed to detect a new walkable polygon under the sphere and
overwrite the contact plane. Retail decomp anchor:
`CObjCell::find_env_collisions`. Full pickup prompt at the bottom of
[`docs/research/2026-05-23-a6-p3-issue98-comparison-harness-findings.md`](docs/research/2026-05-23-a6-p3-issue98-comparison-harness-findings.md).
Original demo scenario (Holtburg Sewer end-to-end) is unreachable: sewer
doesn't exist on this server, and **issue #95** (portal-graph visibility
blowup) blocks any substitute dungeon. Revised M1.5 demo split into


@ -13,11 +13,36 @@ documents the FIRST evidence-driven step in the saga.
## TL;DR
**Updated 2026-05-23 evening v2: apparatus convergence shipped.** The
harness now reproduces the live cottage-floor cap event bit-perfect on
the collision normal. The residual divergence is a single +X-motion
edge-slide gap; everything else round-trips. The session below covers
both arcs (root-cause identification THEN convergence).
**Updated 2026-05-23 evening v3: NEW root-cause hypothesis identified —
STALE RAMP CONTACT PLANE causes per-tick Z drift, which is what makes
the cottage-floor cap reachable in the first place.**
- Player position at cap: world (141.5, 7.2, 92.7). The cellar ramp's
actual world X extent is X∈[129.7, 131.3], so the player is **about
10 m away from the ramp** in cell-local space.
- Body's contact plane: ramp's plane (n=(0, 0.719, 0.695), d=-69.5035).
Stale; should be the flat cellar floor (n=(0,0,1)).
- AdjustOffset projects forward motion along that stale ramp plane.
Mathematically: requested delta (+0.0266, -0.4022, 0) → projected
(+0.0266, -0.1943, +0.2010). **+0.2010 m of Z lift per tick.**
- After enough horizontal-walking ticks, the head sphere rises to
Z=94 and hits the cottage floor's downward-facing back-face polygon.
Cap fires.
- The cap is a SYMPTOM. The root cause is the contact plane not
refreshing when the player walks off the ramp onto the flat cellar
floor. Retail presumably re-finds the walkable plane each tick (still
to be confirmed against the decomp); we're keeping the stale ramp seed.
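For a purely horizontal request the per-tick lift has a closed form. The derivation below assumes `AdjustOffset` simply removes the normal component from the request (a simplification of the engine's logic), with request $\mathbf{r} = (r_x, r_y, 0)$ and unit contact normal $\mathbf{n} = (0, n_y, n_z)$:

```latex
\mathbf{r}' = \mathbf{r} - (\mathbf{r}\cdot\mathbf{n})\,\mathbf{n},
\qquad
\Delta z = -(\mathbf{r}\cdot\mathbf{n})\,n_z = -\,r_y\,n_y\,n_z
         = -(-0.4022)(0.719)(0.695) \approx 0.2010\ \text{m}
```

Under that assumption only the Y component of the walk contributes to the lift, and a flat floor ($n_y = 0$) gives $\Delta z = 0$.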
**This explains why six prior fix attempts missed.** Step-up,
AdjustOffset projection, SidesType, edge-slide, +X residual — all
were investigating the cap event mechanics, not the upstream Z drift
that made the cap reachable. The harness convergence (Section "What
shipped 2026-05-23 evening v2") is still valuable as the deterministic
reproduction infrastructure; the new hypothesis is the **next** thing
to verify against that infrastructure.
(Sections below preserve the evening-v2 arc for context: apparatus +
cap-event reproduction.)
- **Evidence-driven apparatus shipped.** `PhysicsResolveCapture` writes one
JSON Lines record per player ResolveWithTransition call when
@ -317,52 +342,181 @@ range. All 21 issue-#98-relevant tests (12 harness + 4
---
## The stale-contact-plane finding — full evidence (2026-05-23 evening v3)
### How the question led to the answer
User asked: "We know how retail OPENs it from above, how hard can it
be to know how to open it from below?" — the implicit question being
"if walking on the cottage floor from above works fine, why doesn't
walking up from below?"
That reframed the investigation. The cottage floor is the SAME
polygon set whether viewed from above (walking on it) or below
(head-bumping it from the cellar). Retail handles both. If our cap
fires from below, what's different about our state?
Tracing the harness's `LiveCompare_FirstCap_DiagnosticDump` output
revealed:
1. **The contact plane the engine started with**: ramp's plane
`n=(0, 0.7190, 0.6950), d=-69.5035`. From the live capture's
`bodyBefore.contactPlane`.
2. **Cellar ramp's actual world position**: vertices computed from
the cellar cell's fixture put the ramp at world
X∈[129.7, 131.3], Y∈[10.19, 13.09], Z∈[92.5, 95.5]. The ramp is
in the +Y corner of the cellar, ~1.6 m wide.
3. **Player position at cap**: world (141.5, 7.22, 92.74). 10+ m
away from the ramp in X.
4. **The +Z drift math**: `AdjustOffset` projects the requested
motion onto the plane perpendicular to the contact-plane normal:
- requested = (+0.0266, -0.4022, 0)
- dot(requested, ramp normal) = 0·0.0266 + 0.719·(-0.4022) +
0.695·0 = -0.2892
- projected = requested - (-0.2892)·rampNormal =
(+0.0266, -0.1943, +0.2010)
- **+0.2010 m of Z gain per tick**, applied because the contact
plane the engine believes the player is on is the slope.
5. **The cap math**: foot Z at cap = 92.74. Head sphere center at
foot Z + sphereHeight 1.2 = 93.94. Head sphere top at
foot Z + 1.68 = 94.42. **Cottage floor at world Z=94.00.** Head
sphere top exceeds cottage floor by 0.42 m → cap fires from
below.
If the contact plane were the flat cellar floor (n=(0,0,1) at
Z=90.95) instead of the ramp, AdjustOffset's projection would
produce zero Z gain (requested motion has no Z component, projection
onto flat-floor plane preserves XY). No drift, no cap.
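Items 2, 3 and 5 above reduce to two small checks: is the player inside the ramp's footprint, and does the head sphere top poke through the cottage floor? A minimal Python sketch follows. It treats the ramp footprint as an axis-aligned rectangle built from the quoted extents, which is an assumption, and the function names are hypothetical.

```python
RAMP_X = (129.7, 131.3)    # ramp world X extent (from the fixture data above)
RAMP_Y = (10.19, 13.09)    # ramp world Y extent
COTTAGE_FLOOR_Z = 94.00    # world Z of the cottage floor
HEAD_TOP_OFFSET = 1.68     # foot Z to head sphere top

def on_ramp_footprint(x, y):
    """True if (x, y) lies inside the assumed axis-aligned ramp rectangle."""
    return RAMP_X[0] <= x <= RAMP_X[1] and RAMP_Y[0] <= y <= RAMP_Y[1]

def head_overlap(foot_z):
    """Height of the head sphere top above the cottage floor (negative = clear)."""
    return foot_z + HEAD_TOP_OFFSET - COTTAGE_FLOOR_Z

x, y, z = 141.5, 7.22, 92.74          # player position at cap
print(on_ramp_footprint(x, y))        # is the player on the ramp?
print(round(x - RAMP_X[1], 1))        # X gap to the ramp's nearest edge
print(round(head_overlap(z), 2))      # penetration into the cottage floor
```

For the cap position this reports that the player is off the ramp, roughly 10.2 m past its X extent, with the head sphere top about 0.42 m above the cottage floor.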
### Why this fits the user-facing bug
- "Stuck climbing cellar" — the player walks forward, accumulates Z,
bumps cottage floor, can't progress. Matches what the user sees.
- "Pure jump in cellar caps at same Z" — jumping doesn't refresh the
contact plane either. Drift continues. Matches.
- "Six prior fix attempts failed" — all attempted to fix the CAP
mechanics (step-up, slope projection at the cap, edge-slide). None
questioned why the contact plane was the ramp at all.
### What still needs verification (next session's task)
1. **Chronological evidence**: walk the live capture from the start of
the cellar session. When did the player last stand on the actual
ramp? Does `bodyBefore.contactPlane` persist as the ramp's plane
across many ticks of horizontal walking? Quantify the cumulative
Z drift.
2. **The walkable-refresh gap**: where in
`Transition.FindEnvCollisions`, `SpherePath.SetWalkable`, or
related code is the contact plane supposed to be refreshed when the
sphere is over a different walkable polygon? Retail's
`CObjCell::find_env_collisions` is the decomp anchor: find the
path that detects a NEW walkable and overwrites the contact
plane, then find where our engine skips that.
3. **Retail cdb cross-check** (optional, definitive): attach cdb to a
running retail acclient, walk to a cottage cellar, log the
contact plane each tick. If retail's contact plane refreshes
to (0,0,1) when the player walks off the ramp, hypothesis
confirmed.
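The chronological check in item 1 could be scripted roughly as follows. This is a sketch in Python under stated assumptions: only `bodyBefore.contactPlane` is named in this document, so the `position` and `normal` field names and their array layout are hypothetical and need adjusting to the real capture schema. The sample records are synthetic, not capture data.

```python
import json

RAMP_NORMAL = (0.0, 0.7190, 0.6950)
RAMP_X, RAMP_Y = (129.7, 131.3), (10.19, 13.09)

def is_ramp_plane(normal, tol=1e-3):
    return all(abs(a - b) <= tol for a, b in zip(normal, RAMP_NORMAL))

def over_ramp(pos):
    return RAMP_X[0] <= pos[0] <= RAMP_X[1] and RAMP_Y[0] <= pos[1] <= RAMP_Y[1]

def stale_run(lines):
    """Return (ticks, z_drift) for the longest run of records whose contact
    plane is the ramp's while the body is outside the ramp footprint."""
    best = (0, 0.0)
    count, start_z = 0, None
    for line in lines:
        body = json.loads(line)["bodyBefore"]
        pos = body["position"]                    # hypothetical field name
        normal = body["contactPlane"]["normal"]   # hypothetical field name
        if is_ramp_plane(normal) and not over_ramp(pos):
            if count == 0:
                start_z = pos[2]
            count += 1
            if count > best[0]:
                best = (count, pos[2] - start_z)
        else:
            count = 0
    return best

def record(x, y, z, normal):
    return json.dumps({"bodyBefore": {"position": [x, y, z],
                                      "contactPlane": {"normal": normal}}})

# Synthetic records: one tick on the ramp, then three ticks off it
# with the ramp plane still attached.
sample = [
    record(130.0, 11.0, 92.0, [0.0, 0.719, 0.695]),
    record(135.0, 9.0, 92.0, [0.0, 0.719, 0.695]),
    record(138.0, 8.0, 92.2, [0.0, 0.719, 0.695]),
    record(141.0, 7.0, 92.4, [0.0, 0.719, 0.695]),
]
ticks, drift = stale_run(sample)
print(ticks, round(drift, 3))
```

Against a real capture, pass an open file handle instead of `sample`. A long run with steadily growing drift would support the hypothesis; a run of zero would count against it.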
---
## Pickup prompt for next session
```
A6.P3 #98 — apparatus convergence landed, residual X-motion divergence
is next.
A6.P3 #98 — apparatus convergence landed, NEW root-cause hypothesis
(stale ramp contact plane) needs verification.
Read FIRST:
Read FIRST (in order, ~15 min):
1. docs/research/2026-05-23-a6-p3-issue98-comparison-harness-findings.md
(this doc — particularly the "What shipped 2026-05-23 evening v2"
and "The residual divergence" sections)
2. tests/AcDream.Core.Tests/Physics/CellarUpTrajectoryReplayTests.cs
(especially the two LiveCompare_FirstCap_* tests and the
RegisterCottageGfxObj helper)
3. CLAUDE.md "Current A6 phase" block
— start with TL;DR (evening v3 update at top), then the section
"The stale-contact-plane finding — full evidence" near the bottom.
Skip the middle sections (evening v1 + v2 arcs) unless context is
needed.
2. CLAUDE.md "Current A6 phase" block — look for the "Evening v3
finding" paragraph.
3. tests/AcDream.Core.Tests/Physics/CellarUpTrajectoryReplayTests.cs
— the RegisterCottageGfxObj helper + 2 LiveCompare_FirstCap_*
tests are what you'll iterate against.
State both altitudes:
State both altitudes (one sentence each):
Currently working toward: M1.5 — Indoor world feels right
Current phase: A6.P3 — apparatus convergence shipped. Harness now
reproduces the live cottage-floor cap event (cn=(0,0,-1) round-trips
bit-perfect). Residual: +0.0266 m of +X motion lost in the harness's
post-cap slide where live preserves it.
Current phase: A6.P3 — apparatus convergence shipped (cap event
reproduces bit-perfect). New root-cause hypothesis: stale ramp
contact plane causes per-tick Z drift that makes the cap reachable.
Needs verification.
Two concrete next moves:
What was shipped today (3 commits — DO NOT REDO):
- cc3afbc: GfxObj dump infrastructure (ACDREAM_DUMP_GFXOBJS)
- 97fec19: Harness reproduces cottage-floor cap (RegisterCottageGfxObj)
- 7729bdc + (this commit): findings doc + CLAUDE.md updates
(A) Investigate the +X edge-slide divergence in the harness. The
LiveCompare_FirstCap_ResidualXMotionDivergence_DocumentsNextInvestigation
test currently passes asserting the divergence; flipping it should
drive the investigation. Likely target: Transition.transitional_insert
/ AdjustOffset's handling of a cn=(0,0,-1) head-bump — live treats
it as Z-only constraint and edge-slides the remaining XY motion;
harness blocks all motion. Decomp anchor: acclient_2013_pseudo_c.txt
in the find_obj_collisions → adjust_sphere_to_plane chain. ~2 hours
estimate.
The hypothesis with full math:
- Body's contact plane = ramp's plane (n=(0,0.719,0.695), d=-69.5035)
- Player position at cap = world (141.5, 7.22, 92.74)
- Cellar ramp's actual world XY = X∈[129.7, 131.3] — 10m from player
- AdjustOffset projects requested move onto the contact plane (drops the normal component)
- Per-tick Z gain ≈ 0.201m from slope projection on STALE ramp plane
- Accumulates over ticks → head sphere reaches Z=94 → bumps cottage
floor → cap fires
- If contact plane refreshed to flat cellar floor (n=(0,0,1)) when
player walks off ramp, no Z drift, no cap
(B) Attach cdb to retail at the cottage ramp-top, trace the BSP queries,
compare polygon-by-polygon what retail finds vs what acdream finds.
Authoritative for the "how does retail differ?" question but
larger scope (~half day setup + capture).
Concrete next moves (in order):
(A) is recommended — the harness now isolates this divergence to a
specific known XY slide path; the test gives <1s feedback per fix
attempt. (B) becomes valuable if (A) hypothesis chase stalls.
(1) **Verify the hypothesis chronologically.** Walk
a6-issue98-resolve-capture-2.jsonl (or the cottage capture
fixture's full file) from the start. Find when the player last
stood on the actual ramp (within world X∈[129.7, 131.3], Y∈[10.19,
13.09]). Quantify: how many ticks does the body's contact plane
persist as the ramp's plane while the player walks horizontally
away? Compute the cumulative Z drift. Should match observed Z=92.74
at cap if the hypothesis holds. (Probably 30 min PowerShell jq.)
CLAUDE.md rules apply throughout. NO speculative fixes — the saga
already converted from speculation to evidence-driven; keep it that
way.
(2) **Locate the walkable-refresh code path.** In
src/AcDream.Core/Physics/TransitionTypes.cs, search for where
Transition.FindEnvCollisions or SpherePath.SetWalkable is supposed
to detect a new walkable polygon under the sphere and overwrite
the contact plane. The fix likely lives at the call site that
EITHER fails to fire OR fires but doesn't replace the existing
contact plane.
(3) **Cross-ref retail decomp.** acclient_2013_pseudo_c.txt's
CObjCell::find_env_collisions + the walkable-detection chain.
Find the path where retail unconditionally replaces
contact_plane when a new walkable is found. Quote the line
numbers in the fix commit.
(4) **Implement the fix + verify against harness.** The harness's
LiveCompare_FirstCap_HarnessReproducesCottageFloorCapNormal test
currently PASSES asserting the cap reproduces. After the fix,
if the contact plane refreshes correctly, the cap should NOT fire
(no Z drift to make it reachable). The test should start FAILING
— that's the signal the fix works.
(5) **Visual verification (user-side).** Launch acdream live, walk
into a Holtburg cottage, down to the cellar, then back up. The
user-facing bug should resolve if the hypothesis is correct.
Decomp grep targets:
- CObjCell::find_env_collisions
- CPhysicsObj::find_object_collisions
- CTransition::find_walkable
- CSpherePath::set_walkable / walkable_hits_sphere
- OBJECTINFO::object → contact_plane writes
CLAUDE.md rules apply throughout:
- NO speculative fixes — the saga's converted to evidence-driven.
Verify hypothesis with chronological capture BEFORE coding.
- Visual verification belongs to the user.
- If the chronological verification (step 1) shows the contact
plane is NOT actually stale across many ticks, the hypothesis is
wrong — pivot to retail cdb trace (definitive oracle).
Out-of-scope but observed: pre-existing test suite has 8-19 failures
across runs of the same code due to static-state leakage between test
@ -370,6 +524,10 @@ classes (PhysicsResolveCapture, PhysicsDiagnostics statics). Targeted
issue-#98 tests pass deterministically in isolation. Don't touch the
flakiness this session; it's a separate investigation.
Test baseline: harness's 12 CellarUpTrajectoryReplayTests + 4
GfxObjDumpRoundTripTests + 1 new PhysicsDiagnosticsTests + 4
CellDumpRoundTripTests all pass in isolation. Maintain.
Test baseline: 1178 + 8 pre-existing failures (serial run).
Maintain throughout. The previously-failing
LiveCompare_FirstCap_HarnessMissesCottageFloorBecauseCottageGfxObjNotRegistered