acdream/docs/research/2026-05-25-door-bug-cdb-retail-trace-findings.md


# Door bug — retail cdb trace + NegPolyHit dispatch findings
2026-05-25, continuation of door-collision investigation
## TL;DR
cdb was attached to the retail client at a Holtburg cottage door while
the user walked the inside-out, off-center scenario. The trace
identified the function that actually records the collision:
**`SPHEREPATH::set_neg_poly_hit`** fired hundreds of times during the
walk, while `SPHEREPATH::set_collide`,
`COLLISIONINFO::set_collision_normal`,
`COLLISIONINFO::set_sliding_normal`, and `COLLISIONINFO::add_object`
all fired zero times.

In our codebase, `NegPolyHitDispatch` exists but **is never called
from any production code path** — it's dead code. The `path.NegPolyHit`
flag is therefore never set. The downstream handler in
`Transition.TransitionalInsert` was a stub that just cleared the flag.
Two-part fix attempted this session:
1. **`BSPQuery.FindCollisions` Path 5** (Contact branch) restructured
to call `NegPolyHitDispatch` when sphere 0 had a near-miss polygon
set but didn't fully penetrate (mirrors retail's `var_5c != 0` case
at `acclient_2013_pseudo_c.txt:0053a6ce-0053a6fb`).
2. **`Transition.TransitionalInsert` NegPolyHit handler** rewritten
to dispatch to `step_up + step_up_slide` (NegStepUp=true) or
record collision normal + return `Collided` (NegStepUp=false).

**Result: the fix does not fully close the bug.** The user still
squeezes through. The diagnostic `[neg-poly-dispatch]` probe shows zero
hits in production: the BSP Path 5 changes never surface NegPolyHit for
this case.
## Why the fix doesn't fire
Retail's `BSPTREE::find_collisions` calls
`vtable->sphere_intersects_poly(localspace_sphere, var_78_6, var_74_6, var_70_8)`
which:
- **Returns `eax_10`**: non-zero on full sphere-vs-poly hit
- **Writes `var_5c`**: closest polygon pointer, set EVEN ON
NEAR-MISS (BSP traversal sets it when entering a leaf containing
candidate polys, regardless of intersection)

So retail records near-miss polygons during BSP traversal. The
caller dispatches `set_neg_poly_hit(1, var_5c + 0x20)` when sphere 0
returned `eax_10 == 0` but `var_5c != 0`.

Our `SphereIntersectsPolyInternal` only sets `hitPoly` on actual
hits; near-miss polygons are not recorded. The Path 5 branch
`if (hitPoly0 is not null)` is therefore false, so
`NegPolyHitDispatch` is never called, NegPolyHit is never set, and
nothing dispatches in `TransitionalInsert`.
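
The gap can be illustrated with a minimal C# sketch of the sphere-0 decision. `Path5Outcome`, `Path5Sketch`, and `Classify` are hypothetical names used only for illustration, not the project's API; `fullHit` stands in for retail's `eax_10 != 0` and `closestPoly` for `var_5c`.

```csharp
#nullable enable

// Hypothetical outcome labels for the Contact-branch decision (illustration only).
public enum Path5Outcome { NoContact, FullHit, NegPolyHit }

public static class Path5Sketch
{
    // fullHit     ~ retail's eax_10 != 0 (full sphere-vs-poly hit)
    // closestPoly ~ retail's var_5c (may be set even on a near miss)
    public static Path5Outcome Classify(bool fullHit, object? closestPoly)
    {
        if (fullHit) return Path5Outcome.FullHit;

        // Near miss: no full hit, but traversal still reported a polygon.
        return closestPoly is not null
            ? Path5Outcome.NegPolyHit
            : Path5Outcome.NoContact;
    }
}
```

If the polygon output is only ever populated on full hits, `closestPoly` is null whenever `fullHit` is false, so the `NegPolyHit` outcome is unreachable. That matches the zero-hit probe result.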
## The deeper fix needed
Implement retail's "BSP traversal records closest near-miss polygon"
behavior in `SphereIntersectsPolyInternal` (or a sibling). The
function should return TWO outputs:
- `bool hit` — true if sphere fully penetrates a polygon
- `ResolvedPolygon? closestPoly` — set during traversal to the
polygon that the sphere came closest to (in the BSP node walk),
regardless of whether the full intersection test passed

This requires modifying the BSP recursion to track the "closest
considered" polygon. Retail's `sphere_intersects_poly` likely tracks
this as a side effect of testing each candidate polygon during the
traversal.

Once that's in place, the existing Path 5 changes + `TransitionalInsert`
NegPolyHit dispatch should fire correctly and produce the block.
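
The two-output shape described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the retail algorithm: it walks a flat candidate list instead of a BSP tree, tests only the polygon's plane (no edge bounds), and picks "closest" by smallest absolute plane distance. `PlanePoly` and `NearMissSketch` are hypothetical names.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical stand-in for a polygon: only its plane, n·p + D = 0 with unit normal n.
public sealed record PlanePoly(string Name, Vector3 Normal, float D);

public static class NearMissSketch
{
    // Returns true if the sphere overlaps any candidate plane.
    // closestPoly is set to the nearest candidate considered, hit or not.
    public static bool Test(
        IEnumerable<PlanePoly> candidates,
        Vector3 center,
        float radius,
        out PlanePoly? closestPoly)
    {
        closestPoly = null;
        float best = float.MaxValue;
        bool hit = false;

        foreach (var poly in candidates)
        {
            // Absolute distance from the sphere center to the polygon's plane.
            float dist = MathF.Abs(Vector3.Dot(poly.Normal, center) + poly.D);

            // Record the closest candidate regardless of the hit test.
            if (dist < best)
            {
                best = dist;
                closestPoly = poly;
            }

            // Static overlap test (assumption: plane-only, no edge checks).
            if (dist < radius) hit = true;
        }

        return hit;
    }
}
```

The point of the design is that `closestPoly` is assigned before the hit test, so a caller can distinguish "nothing nearby" (null) from "near miss" (non-null, `hit == false`).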
## Second symptom flagged by user (2026-05-25 evening)
User flagged: "we get run a bit into the door as well when it blocks.
That is not retail behavior."

Over-penetration before the block points to our BSP detecting the
collision only after the sphere has already moved into the surface
(static overlap detection), whereas retail uses swept-sphere collision:
it predicts the t-value of first contact along the motion path and
stops the sphere at the surface.

This is the SAME ROOT MECHANISM as the squeeze-through:
sphere_intersects_poly in retail does swept collision with the
motion vector (var_44 = sphere_center - prev_center). Our
`SphereIntersectsPolyInternal` takes a `movement` parameter but the
internal poly-test logic may not actually use it for swept detection.

To verify: read `SphereIntersectsPolyInternal` and check whether it
uses the `movement` vector for swept sphere-vs-poly testing (computing
the t-value where the sphere first contacts the poly along the motion)
or only tests static overlap (sphere center +/- radius against the poly
plane). Retail does the swept test; `var_44` in
`sphere_intersects_poly` is the motion delta.

Single fix needed in the next session. `SphereIntersectsPolyInternal` needs to:
1. Implement swept-sphere-vs-poly detection (use the motion vector)
2. Record the closest-considered polygon for near-miss handling

Both feed into the existing Path 5 + `TransitionalInsert` dispatch
(committed today). Once that single function does its job correctly,
both symptoms should close at once.
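
For reference, the swept test in its simplest form is a sphere moving against an infinite plane. The sketch below is a generic textbook formulation, not a transcription of retail's `sphere_intersects_poly`; `SweptSphereSketch` and `FirstContactT` are hypothetical names, the plane is assumed to be `n·p + d = 0` with unit normal, polygon edge bounds are ignored, and the sphere is assumed to start on the front side.

```csharp
using System.Numerics;

public static class SweptSphereSketch
{
    // Returns the fraction t in [0, 1] of `movement` at which the sphere
    // first touches the plane, or null if there is no contact this step.
    public static float? FirstContactT(
        Vector3 prevCenter,
        Vector3 movement,
        float radius,
        Vector3 normal,
        float d)
    {
        float start = Vector3.Dot(normal, prevCenter) + d; // signed distance at t = 0
        float rate = Vector3.Dot(normal, movement);        // change in signed distance over the step

        if (start < 0f) return null;     // behind the plane: out of scope for this sketch
        if (start <= radius) return 0f;  // already touching at the start of the step
        if (rate >= 0f) return null;     // moving parallel to or away from the plane

        // Solve start + t * rate = radius for t.
        float t = (radius - start) / rate;
        return t <= 1f ? t : (float?)null;
    }
}
```

As a worked case: a sphere of radius 0.25 at the origin moving by (1, 0, 0) toward the plane x = 0.5 (normal (-1, 0, 0), d = 0.5) gives t = 0.25, so the center stops at x = 0.25 with the surface exactly one radius away. A static overlap test run only at the end of the step would instead see the center at x = 1, already past the plane.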
## What the cdb trace proved
| Symbol | v1 hits | v2 hits | v3 hits |
|---|---|---|---|
| `CPhysicsObj::FindObjCollisions` | 161,081 | 196,608 | 196,608 |
| `CCylSphere::collides_with_sphere` | 35,527 | — | — |
| `SPHEREPATH::set_collide` | **0** | — | — |
| `COLLISIONINFO::set_collision_normal` | — | **0** | — |
| `COLLISIONINFO::set_sliding_normal` | — | **0** | — |
| `COLLISIONINFO::add_object` | — | **0** | — |
| `BSPTREE::slide_sphere` | — | — | **0** |
| `CTransition::cliff_slide` | — | — | **0** |
| **`SPHEREPATH::set_neg_poly_hit`** | — | — | **303+ (fires)** |
| `CTransition::insert_into_cell` | — | — | 3,652 |

Retail records collisions almost exclusively via
`SPHEREPATH::set_neg_poly_hit` during normal grounded motion. The
`COLLISIONINFO` normal/sliding setters fire essentially never in
walking-into-walls scenarios. Our investigation premise was wrong;
the cdb data forced the correction.
## Apparatus + scripts committed
- `tools/cdb/door-inside-out.cdb` — v1 (set_collide check)
- `tools/cdb/door-inside-out-v2.cdb` — v2 (COLLISIONINFO family)
- `tools/cdb/door-inside-out-v3.cdb` — v3 (wide net, found
set_neg_poly_hit)
- `tools/cdb/symbol-probe.cdb` — verifies symbol resolution
## Pickup prompt for next session
```
A6.P4 door inside-out: cdb trace + NegPolyHit dispatch landed
(BSPQuery.FindCollisions Path 5 + TransitionalInsert NegPolyHit
branch) but the fix doesn't fire because our SphereIntersectsPolyInternal
doesn't record near-miss polygons. Retail's sphere_intersects_poly
sets a "closest polygon" output even on non-hits via BSP traversal
side-effect; our equivalent only sets it on full hits.
Read docs/research/2026-05-25-door-bug-cdb-retail-trace-findings.md
State both altitudes:
Currently working toward: M1.5 — Indoor world feels right
Current phase: A6.P4 door bug — implement near-miss polygon
recording in SphereIntersectsPolyInternal.
TWO SYMPTOMS to fix simultaneously (same root cause):
(a) Off-center inside-out: sphere walks (or squeezes) past door
(b) When blocked: sphere visibly penetrates the door before stopping
Both = static overlap detection without near-miss recording.
Retail uses swept-sphere-vs-poly intersection (uses motion vector
to compute t-value of first contact, stops sphere at surface)
AND records the closest near-miss polygon during BSP traversal.
First move: read SphereIntersectsPolyInternal in
src/AcDream.Core/Physics/BSPQuery.cs. Check whether the `movement`
param is actually used for swept-sphere-vs-poly testing. If not
(just static overlap), that's symptom (b). Add swept detection
and a "closestPoly" output param set on ANY polygon considered
during traversal (not just hits). That closes symptom (a) too.
Then the Path 5 branch `if (hitPoly0 is not null)` will fire on
near-miss cases, NegPolyHitDispatch will set NegPolyHit, and the
TransitionalInsert dispatch (already landed) will block the sphere
at the surface (swept-detected t-value), not after penetration.
Retail oracle: BSPTREE::find_collisions + sphere_intersects_poly
vtable call at acclient_2013_pseudo_c.txt:0053a630-0053a6fb.
Visual verification: same scenario (Holtburg cottage door,
inside-out, ~50cm off-center). Should block fully, no squeeze-through.
Outside-in should still work. Issue #98 cellar cap must still pass.
```