# Iter 4: CPhysicsObj sweep design (DRAFT, NOT YET IMPLEMENTED)

## Goal

Periodically destroy abandoned CPhysicsObj instances to recover the residual leak documented in §6.1 of REPORT.md.

**Highest-risk patch class** (physics-state mutation, same risk profile as v13, which killed Larsson at 98 min). A long soak per change is mandatory.

## What iter 3 told us

After 13 minutes on Unkle Leo (PID 16044), a typical scan shows:

```
total=971 no_parent=546 no_cell=278 orphan_hash=697 both=234 triple=111
```

So ~11% of all CPhysicsObj instances pass the strict triple predicate. On a fresh client the triple count is ~100 (startup residual). Growth is +1-2 candidates per minute during normal play.

Strict-candidate sample dumps confirm:

- `parent`, `cell`, `hash_next` all NULL ✓
- `part_array` non-NULL (heap allocation that should be freed)
- `shadow_objects.data` non-NULL (heap allocation that should be freed)
- `state` has small bits set (e.g., 0x00000414, normal active flags)

This matches the "abandoned but heap state still allocated" pattern from the v17 owner-vtable diagnostic.

## Candidate destruction call

The engine already has correct teardown:

```c
// EoR 0x005145D0 — CPhysicsObj::Destroy
void __thiscall CPhysicsObj::Destroy(CPhysicsObj* this);
```

Per the v17 owner-diag, `CPhysicsObj::Destroy` correctly tears down all owned heap state (`CPartArray::DestroyParts`, etc.). The leak is that it is never **called** on these abandoned objects.

After Destroy, the CPhysicsObj itself (~408 bytes) needs to be freed via `operator delete`.

## Predicate hardening (BEFORE we destroy anything)

The triple predicate may not be conservative enough. Additional checks before destroy:

1. **`update_time` is stale**: the field at +0xD4 is a long double (8 bytes under MSVC, so effectively a `double` timestamp). If it is less than `now() - 60s`, the object hasn't been touched in a minute. Compare against `timeGetTime()` or a similar engine time global.
2. **`state` is not "currently active"**: need to identify which bits indicate "being processed."
For now, skip if any of the high 16 bits of `state` (mask 0xFFFF0000) is set.
3. **`weenie_obj == NULL`**: at +0x?? (need to verify offset). If a weenie-object still owns this physobj, the engine considers it alive even if other tracking is gone.
4. **`movement_manager == NULL`**: at +0xC4 per acclient.h (LongHashData base 12 + ... + 0xB8 should be it). If there's an active mover, the object is in flight.
5. **`hooks == NULL`**: at +0xE? (animation hooks pending).

The candidate must pass ALL of these AND the iter-3 triple predicate, which makes this stricter than iter 3.

## Safety protocol

1. **Throttle:** max 1 destruction per scan cycle (5 min). Even if 100 candidates qualify, destroy ONE per scan. Surface latent bugs slowly.
2. **Sample-first:** for the first 2 hours, LOG candidate addresses but do NOT destroy. Verify the candidates stay candidates over multiple scans (i.e., they're not transient).
3. **Forensic log:** before each destruction, log the address plus a pre-destroy field dump. If the process crashes afterwards, we have the last destroyed object for forensics.
4. **Kill switch:** check the `LEAKFIX_NO_SWEEP=1` env var at scan start. If set, skip destruction. With the variable unset, the sweep defaults to ON (destroy) once the code lands.
5. **Initial test target:** Unkle Leo (current designated guinea pig per CLAUDE.md). One client only. 4-hour soak before declaring safe.
6. **Failure recovery:** if Unkle Leo crashes within 1 hour of the destruction logic being enabled, set the env var, restart with sweep disabled, mark iter-4 as failed in memory, and do not retry without a redesign.

## Implementation outline (when ready)

```cpp
#include <windows.h>  // __try/__except support, EXCEPTION_EXECUTE_HANDLER
#include <cstddef>    // offsetof
#include <cstdint>    // uint32_t, uintptr_t

// Project helpers assumed to exist elsewhere in the DLL:
// logf, dump_physobj, env_skip_sweep, get_engine_time (address still TBD).

// Partial layout for the 32-bit client. pack(4) keeps update_time at +0xD4;
// default alignment would push the double up to +0xD8.
#pragma pack(push, 4)
struct CPhysicsObj {
    void*    vtable;            // +0x00
    void*    hash_next;         // +0x04
    uint32_t id;                // +0x08
    void*    netblob_list;      // +0x0C
    void*    part_array;        // +0x10
    char     gap_14[20];        // +0x14: player_vector/distance/CYpt (20 bytes, not 12)
    void*    sound_table;       // +0x28
    uint32_t pad_exam;          // +0x2C
    void*    script_manager;    // +0x30
    void*    physics_script;    // +0x34
    uint32_t default_script;    // +0x38
    float    script_intensity;  // +0x3C
    void*    parent;            // +0x40
    void*    children;          // +0x44
    char     position[72];      // +0x48
    void*    cell;              // +0x90
    uint32_t num_shadow;        // +0x94
    char     shadow_arr[16];    // +0x98: DArray
    uint32_t state;             // +0xA8
    uint32_t transient_state;   // +0xAC
    char     gap_b0[20];        // +0xB0: floats
    void*    movement_manager;  // +0xC4
    void*    position_manager;  // +0xC8
    int      last_move_auto;    // +0xCC
    int      jumped_frame;      // +0xD0
    double   update_time;       // +0xD4 (8 bytes)
    // ... remaining fields not mapped yet
    void*    weenie_obj;        // +0x?? TBD: position here is a placeholder only
};
#pragma pack(pop)
static_assert(offsetof(CPhysicsObj, update_time) == 0xD4, "CPhysicsObj layout drift");

// __thiscall target called through __fastcall: `this` goes in ECX, EDX is a dummy.
typedef void (__fastcall *destroy_fn_t)(CPhysicsObj* self, void* edx);
// Global operator delete is a plain __cdecl function taking one pointer.
typedef void (__cdecl *op_delete_fn_t)(void* p);

// reinterpret_cast from an integer is not a constant expression, so no constexpr.
static const destroy_fn_t   CPHYSICSOBJ_DESTROY = reinterpret_cast<destroy_fn_t>(0x005145D0);
static const op_delete_fn_t OP_DELETE           = reinterpret_cast<op_delete_fn_t>(0x005DF15E);

bool is_truly_abandoned(CPhysicsObj* p) {
    // iter-3 triple predicate
    if (p->parent) return false;
    if (p->cell) return false;
    if (p->hash_next) return false;
    if (p->movement_manager) return false;
    // state mask: bits 0..15 are flags we tolerate; high bits suggest
    // active processing
    if ((p->state & 0xFFFF0000) != 0) return false;
    if (p->weenie_obj) return false;  // need offset verified
    // TODO: hooks == NULL check once the +0xE? offset is verified
    // update_time stale check
    double now = get_engine_time();   // need to find this, e.g. 0x????
    if (now - p->update_time < 60.0) return false;
    return true;
}

void sweep_once() {
    if (env_skip_sweep()) return;
    // Walk all CPhysicsObj instances...
    CPhysicsObj* victim = nullptr;
    for (each CPhysicsObj p) {  // pseudocode: enumeration is not specified here
        if (is_truly_abandoned(p)) { victim = p; break; }  // ONLY ONE
    }
    if (!victim) return;
    logf("SWEEP destroying CPhysicsObj @ 0x%p (state=0x%08x)",
         victim, victim->state);
    dump_physobj((uintptr_t)victim);  // pre-destroy forensics
    __try {
        CPHYSICSOBJ_DESTROY(victim, 0);
        OP_DELETE(victim);  // was called as __fastcall, which would pass the pointer in ECX
        logf("SWEEP ok");
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        logf("SWEEP exception: abandoning sweep this scan");
    }
}
```

## Known unknowns to resolve before coding

1. **Engine time global address**: needed for the stale-`update_time` check.
2. **`weenie_obj` offset**: need to read acclient.h carefully or check sample dumps.
3. **State-bit meanings**: which bits indicate "in active processing".
4. **Does `operator delete` work on a CPhysicsObj that has already had Destroy() called?** Destroy probably tears down state but may not free `this`.
5. **What if the object is mid-iteration in some other code?** Destroying it would leave dangling iterators. Need to check that the render loop and update loop don't hold outstanding refs.

These are NOT minor: getting any of them wrong means a v13-class crash.

## Recommended path

1. **Iter 4a (logging-only):** add the harder predicates (`movement_manager`, `weenie_obj`, stale `update_time`, state mask). Log the candidate count passing the harder set and compare it to the iter-3 triple count. If it is much smaller, the predicates are genuinely stricter and we have higher confidence.
2. **Iter 4b (sample-first):** dump 3 candidates that pass the hard set every scan. Verify they look genuinely abandoned across multiple scans.
3. **Iter 4c (destroy 1 per hour, not per scan):** initial mutation test at the slowest possible rate. Soak 8h+ before declaring safe.
4. **Iter 4d (destroy N per scan, where N = current candidate count):** only after 4c passes a 24h soak.

This is a 3-day minimum process if everything goes right. If a v13-class crash happens anywhere, restart from 4a with a redesigned predicate.
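The staged rollout (4a to 4d) and the kill switch from the safety protocol could be combined into one gating function that decides how many candidates a scan may destroy. The following is a minimal sketch, not implemented code: `SweepStage`, `kill_switch_set`, and `destroy_budget` are hypothetical names, timestamps are assumed to be in milliseconds, and the kill switch is assumed to trigger only on the exact value `1`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stage enum mirroring iter 4a..4d from the recommended path.
enum class SweepStage { LogOnly4a, SampleFirst4b, OnePerHour4c, AllPerScan4d };

// Kill switch: true when LEAKFIX_NO_SWEEP is set to "1" in the environment.
inline bool kill_switch_set() {
    const char* v = std::getenv("LEAKFIX_NO_SWEEP");
    return v != nullptr && std::strcmp(v, "1") == 0;
}

// Returns how many candidates the current scan is allowed to destroy.
// now_ms / last_destroy_ms are assumed to come from the same monotonic clock.
inline std::size_t destroy_budget(SweepStage stage, bool kill_switch,
                                  std::uint64_t now_ms,
                                  std::uint64_t last_destroy_ms,
                                  std::size_t candidates) {
    constexpr std::uint64_t kHourMs = 60ull * 60ull * 1000ull;
    if (kill_switch || candidates == 0) return 0;
    switch (stage) {
        case SweepStage::LogOnly4a:      // log counts only
        case SweepStage::SampleFirst4b:  // dump samples only
            return 0;
        case SweepStage::OnePerHour4c:   // one destruction per hour at most
            return (now_ms >= last_destroy_ms &&
                    now_ms - last_destroy_ms >= kHourMs) ? 1 : 0;
        case SweepStage::AllPerScan4d:   // N = current candidate count
            return candidates;
    }
    return 0;
}
```

A scan would call `destroy_budget(stage, kill_switch_set(), now, last, n)` once and destroy at most that many objects, so moving between stages only changes one enum value and the kill switch always wins.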
## Decision gate

Per the soak data on Unkle Leo:

- Triple candidate growth: ~5/5min = 1/min
- After 1 hour without sweep: ~60 abandoned physobjs added
- After 24h: ~1440 abandoned
- At ~1KB heap state per physobj: ~1.4 MB/day from this exact predicate

Compare to the agent's CObjCell-family estimate of 7-8 MB/hr. The triple subset is much smaller than the agent's total, and the set passing the harder predicates will be smaller still.

**Question for the decision-maker (the human):** is recovering ~1-2 MB/day per active client worth a v13-class risk?

Given that the project's 5-day soak target is already met without iter 4, **the honest answer is probably NO**: iter 4 buys marginal improvement at meaningful risk. If the goal is 10-day uptime for heavy looters, iter 4 might help, but the residual is dominated by other classes (CObjCell, gm*UI recycle pool, palette outside v3b's scope).

## Recommendation

**Defer iter 4 indefinitely.** Iter 3 instrumentation gives us data to argue for or against. The DLL form's basic patches (v3b/v5/v11/v14) are what produce the soak win. Adding the sweep is high-risk, low-marginal-reward.

Keep this document for reference in case a future analyst decides the residual leak warrants the risk.
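The decision-gate arithmetic can be reproduced with a few lines of code, which makes it easy to re-run if a later soak gives different rates. This is only a sketch under the rough figures quoted in the decision gate (about 1 candidate per minute, about 1 KB of heap state per object); `LeakEstimate` and `estimate_leak` are hypothetical names, and MB is taken as 1000 KB.

```cpp
#include <cstdint>

// Result of the back-of-envelope leak estimate.
struct LeakEstimate {
    std::uint64_t objects_per_hour;
    std::uint64_t objects_per_day;
    double        megabytes_per_day;  // decimal MB (1 MB = 1000 KB)
};

// candidates_per_min: observed growth of triple-predicate candidates.
// kb_per_object: assumed heap state held by each abandoned object.
inline LeakEstimate estimate_leak(double candidates_per_min, double kb_per_object) {
    LeakEstimate e{};
    e.objects_per_hour  = static_cast<std::uint64_t>(candidates_per_min * 60.0);
    e.objects_per_day   = e.objects_per_hour * 24;
    e.megabytes_per_day = static_cast<double>(e.objects_per_day) * kb_per_object / 1000.0;
    return e;
}
```

With `estimate_leak(1.0, 1.0)` this gives 60 objects per hour, 1440 per day, and about 1.44 MB/day. For scale, the 7-8 MB/hr CObjCell-family estimate works out to 168-192 MB/day, roughly a hundred times larger.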