# Retail AC Memory Leak Hunt

> **Status: COMPLETE 2026-05-22.** Five bugs found and patched in the
> retail AC client. Controlled fleet soak showed the unpatched control
> died at 26h with palette exhaustion; all 14 patched clients survived
> past that point and ran for ≥5-day uptime. The residual ~15 MB/h
> growth was traced to d3d9.dll's internal slab allocator and is
> unfixable from outside d3d9.
>
> If you just want to install: drop `dll/leakfix/dist/leakfix.dll`
> into your AC directory and run
> `python tools/install_leakfix.py "C:\path\to\AC"`. The installer
> patches `acclient.exe`'s import table to load `leakfix.dll` at
> startup. Idempotent — safe to re-run.

---

## What ships

| Patch | Bug | One-line fix |
|---|---|---|
| **v3b** | Palette refcount over-increment in `makeModifiedPalette` | NOP the `inc [eax+0x24]` at two sites |
| **v5** | `RenderSurface::PurgeResource` is a no-op stub | Override vtable slot 2 to call `Destroy()` for real |
| **v11** | Two dangling-pointer dereferences in `delete_contents` + `~GXTri3Mesh` | NULL-check guards |
| **v14** | `CEnvCell::Destroy` leaks the `ClipPlaneList` (just zeros the count) | Replace the 18-byte buggy block with a JMP to a thunk that actually frees the list |
| **v22** | Server-driven AV in the unpacker function at `0x00526A50` (5-client mass crash 2026-05-21 09:00) | Wrap the function in `__try / __except`, return 0 on AV (which the engine already handles as the size-check-failure code path) |

All five plus a crash-handler ship in `leakfix.dll`. Patches are applied
30 seconds after process start (deferred so Decal/UB win their own init
race first). Crash handler is installed immediately so any crashes
during the 30s window are still captured.

## Patch pseudo-code

### v3b — palette refcount over-increment

The engine's palette-cache hit path increments the cached entry's
refcount **twice** (once in the cache lookup, once in the constructor
that wraps it).
Result: refcount grows monotonically; nothing ever hits zero; palettes
accumulate until the 32-bit address space exhausts (~26h on heavy-loot
clients).

```asm
; at 0x0053EFFE (and 0x0053F19C, the sibling overload)
; before patch:                            after patch:
;   ff 40 24   inc dword ptr [eax+0x24]      90 90 90   nop nop nop
```

```c
// effect, expressed in C:
//   before: refcount++ twice per cache hit
//   after:  refcount++ once per cache hit (the outer increment is removed)
```

### v5 — RenderSurface PurgeResource override

`RenderSurface`'s and `RenderTexture`'s `PurgeResource` virtual slot
points at `0x004154A0`, which is `mov al, 1; ret` — a no-op stub. When
the resource manager's purge sweep walks `s_Resources` and calls
`PurgeResource()` on each entry, the call returns "1 = purged" but the
resource's D3D handle + heap state is never touched. Result:
purged-shell accumulation in `s_Resources`.

```c
// before — at slot 2 of the RenderSurface vtable (0x0079A684):
//   PurgeResource = noop_stub;   // 0x004154A0
//   int noop_stub() { return 1; }
//
// after — slot 2 redirected to our thunk in leakfix.dll:
int purge_rendersurface_thunk(RenderSurface* self) {
    RenderSurface::Destroy(self);   // real cleanup
    return 1;                       // engine marks entry purged
}
// same fix mirrored to RenderTexture slot 2 (0x0079C1A0).
```

### v11 — two dangling-pointer crash guards

Two places where the engine dereferences a pointer that's been freed
elsewhere. Both manifest as AVs that take the process down.

**Site 1** — `delete_contents` hash walk (`0x00587126`):

The loop falls through into a dereference of an already-freed bucket
node when the bucket chain was rebuilt mid-walk. Fix: retarget the JMP
so the freed-bucket branch jumps to the epilogue, skipping the deref.

```asm
; before:
eb 07    jmp +0x07    ; into the deref
; after:
eb 42    jmp +0x42    ; into the epilogue (skip deref)
```

**Site 2** — `~GXTri3Mesh` slot 0 deref (`0x005E565D`):

Destructor of `GXTri3Mesh` reads its slot[0] *then* zeros it.
If slot[0] is stale (some other path already freed it), the deref AVs.
Fix: reorder so we zero first; never deref a slot we can't trust.

```asm
; before:                          after:
;   8B 08      mov ecx, [eax]        89 5E 08    mov [esi+8], ebx   ; zero first
;   50         push eax              90 ... 90   nop x6             ; skip deref + call
;   FF 51 08   call [ecx+8]
;   89 5E 08   mov [esi+8], ebx
```

### v14 — CEnvCell::ClipPlaneList leak

`CEnvCell::Destroy` contains an 18-byte cleanup block that **only zeros
`cplane_num`** — never frees the underlying `ClipPlaneList` object
hanging off `[this+0xDC]`. Every cell unload leaks one of these.

Replace the broken block with a `JMP` to a thunk in leakfix.dll that
does the real cleanup:

```c
// thunk pseudo-code:
void v14_clipplane_cleanup_thunk(CEnvCell* self) {
    ClipPlaneListWrapper* outer = self->cplane_wrapper;   // [esi+0xDC]
    if (outer) {
        ClipPlaneList* inner = outer->inner;              // [outer+0x0]
        if (inner) {
            inner->~ClipPlaneList();
            operator delete(inner);
        }
        operator delete[](outer);
        self->cplane_wrapper = nullptr;
    }
    // jump back to V14_RESUME_VA (just past the original 18-byte block)
}
```

### v22 — unpacker stale-pointer SEH guard

A small inline unpacker at `0x00526A50` pulls 4 DWORDs from
`arg1->buffer`. On 2026-05-21 the server fed five clients simultaneously
a buffer pointing into freed/kernel memory; all five AV'd on the 4th
deref.

The engine *already* has a code path for "buffer too small / unpack
failed" (line 1 of the function checks a size field and returns 0). We
just wrap the whole function body in SEH and route AVs to that same
return-0 path.

```c
// 1. Copy the original 73 bytes of the function to executable memory.
// 2. Patch the original entry with JMP rel32 to our wrapper.
int v22_unpacker_wrapper(this, arg1, count) {
    __try {
        return original_copy(this, arg1, count);   // run the real unpacker
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        // log + return 0 (engine treats this as size-check failure)
        return 0;
    }
}
```

## Install

```powershell
# 1. Copy leakfix.dll into your AC directory
Copy-Item .\dll\leakfix\dist\leakfix.dll "C:\Turbine\Asheron's Call\"

# 2. Patch acclient.exe to import leakfix.dll
python tools\install_leakfix.py "C:\Turbine\Asheron's Call"

# 3. Verify
python tools\install_leakfix.py "C:\Turbine\Asheron's Call" verify
```

The installer adds a `.limport` PE section to acclient.exe containing
the rebuilt import table. It backs up the original to
`acclient.exe.bare_original` on first run, and is idempotent.

## Roll back

```powershell
Copy-Item "C:\Turbine\Asheron's Call\acclient.exe.bare_original" `
          "C:\Turbine\Asheron's Call\acclient.exe" -Force
Remove-Item "C:\Turbine\Asheron's Call\leakfix.dll" -ErrorAction Ignore
```

## Files

- `dll/leakfix/src/` — DLL source (C++ with inline asm for the naked thunks)
- `dll/leakfix/dist/leakfix.dll` — current production build (117 KB)
- `dll/leakfix/build.bat` — build script (VS 2022 BuildTools required)
- `tools/install_leakfix.py` — patches acclient.exe to import leakfix.dll
- `tools/check_acclient_imports.py` — verify import table contains leakfix.dll
- `references/` — symbol table, pseudo-C, header for the 2013 client (PDB-backed)

The rest of this document is the original VM operator brief that drove
the investigation. Preserved for context but no longer operationally
relevant — the hunt is done.

---

# Retail AC Memory Leak Hunt — VM Operator Brief

**You are picking this up cold on a freshly-provisioned Windows VM.**
This document is your full mission brief. Read it end-to-end before
running anything, then drive the work autonomously, using
`ScheduleWakeup` (Claude Code) to pace long-running operations between
your active turns.

---

## 1. Mission

Find and patch a memory leak in the retail Asheron's Call client. The
production symptom is a hard crash after ~4–5 days of continuous play on
the **End-of-Retail (EoR, ~Jan 2017) client**.
We don't have symbols for that binary — but we have **full PDB symbols
for the Sept 2013 v11.4186 client**, which almost certainly carries the
same leak (AC was in pure maintenance mode 2013→2017, very little net
new code).

**The hunt happens on the 2013 client (symbolized).**
**The patch ships against the EoR client (via BinDiff-forward).**

### What "done" looks like

1. A specific function in the 2013 client is identified as the leak
   source, with evidence: monotonic UMDH growth across multiple snapshot
   diffs attributed to that function's call stack.
2. The corresponding function in the EoR client is located via BinDiff
   (this step happens on the **host machine**, not the VM — the BNDB
   files live there).
3. A DLL-injection patch is built that hooks the EoR function and plugs
   the leak (typically: adds a missing `delete`/`Release`/decref on a
   known path).
4. A 5+ day soak on EoR with the patch installed completes without the
   OOM crash that reproduces unpatched in the same window.

### Hard scope boundary

This is a self-contained side quest. **Do not** expand it into a general
retail-instrumentation framework, a fork of the controller DLL into a
fully-featured bot, a parallel acdream feature, or "while I'm here"
refactors of the AC2D/Mosswart tooling. Find leak → patch → validate →
ship → done. If you catch yourself reaching for adjacent work, stop and
re-read this paragraph.

---

## 2. Why this works (assumptions you can rely on)

- **Compiler & toolchain stability.** 2013 and EoR were both built with
  the same VC++ family on the same Turbine build farm. Binary structure
  is highly similar.
- **Code stability.** AC went into maintenance after Throne of Destiny
  (2005) and stayed there. Most of the codebase did not change
  meaningfully between 2013 and EoR. A leak severe enough to crash in
  4–5 days has almost certainly been present for many years.
- **PDB → BinDiff path is mature.** `BinDiff` and `Diaphora` routinely
  achieve 80–95% function-match rates across related VC++ binaries. Once
  you identify the leaking function in 2013 (with name), porting the
  symbol forward to EoR is signature-scan-able.

### What you're betting on, and the fallback

- **Primary bet:** the leak repros on the 2013 client. UMDH on the 2013
  client + activity bot reveals it within hours-to-days. You identify a
  named function, hand the name to the host for BinDiff, receive the EoR
  signature back, build the patch DLL, validate.
- **Fallback:** the leak does NOT repro on 2013 — i.e. it was introduced
  after Sept 2013. In that case, you fall back to hunting on the EoR
  client without symbols, using BinDiff-transferred names for whatever
  functions match the 2013 codebase. This is slower but still feasible.

The primary-vs-fallback determination is **Phase 1 Decision Gate** below.

---

## 3. Package contents

```
leak-hunt-vm-2026-05-12/
├── README.md                        ← you are here
├── MANIFEST.md                      ← list of out-of-repo files copied in
├── CLAUDE.md                        ← VM-side project rules (persistent)
├── templates/
│   ├── supervisor.ps1               ← skeleton — start ACE, start client, snapshot loop
│   ├── snapshot.ps1                 ← UMDH single-shot
│   ├── activity-phases.json         ← phase schedule template
│   ├── login.ahk                    ← AutoHotkey login skeleton
│   └── trace.cdb                    ← cdb scripting template
├── tools/
│   ├── check_exe_pdb.py             ← verify binary ↔ PDB GUID match
│   ├── dump_pdb_info.py             ← PDB metadata
│   └── pdb_extract.py               ← regenerate symbols.json if needed
├── pdb/
│   └── acclient.pdb                 ← (29 MB, copied per MANIFEST)
└── references/
    ├── symbols.json                 ← 18,366 named functions + addresses (grep-friendly)
    ├── types.json                   ← 5,371 struct/class type definitions
    ├── acclient.h                   ← verbatim retail header structs
    └── acclient_2013_pseudo_c.txt   ← 64 MB symbolized Binary Ninja pseudo-C
```

The Python tools are stdlib-only (no pip). Everything else is data.

---

## 4. What you need on the VM (one-time, before starting)

If any of these is missing, **ask the user before guessing**.

| Component | Where | Notes |
|---|---|---|
| Retail AC client (2013 v11.4186) | `C:\Turbine\Asheron's Call\` | Standard install path. Verify match with `check_exe_pdb.py` before any other work. The `_NT_SYMBOL_PATH` must include `pdb/`. |
| Retail AC dat files | inside the install | `client_portal.dat`, `client_cell_1.dat`, `client_highres.dat`, `client_local_English.dat` |
| ACE server | `127.0.0.1:9000` on VM | Use ACEmulator from github.com/ACEmulator/ACE. Same config as user's dev box. Confirm it accepts logins before continuing. |
| Test character | on the VM's ACE | Suggested name: `+Leakhunt`. GM-marker `+` so debug commands are available. |
| Windows Debugging Tools | Microsoft Store WinDbg or Win10/11 SDK | Need `cdb.exe`, `umdh.exe`, `gflags.exe`. 32-bit (`x86`) versions — `acclient.exe` is 32-bit. |
| AutoHotkey v2 | autohotkey.com | For login automation. v2 only — templates assume v2 syntax. |
| Sysinternals `procdump` | sysinternals.com | Crash-dump capture. |
| MinHook (optional, for patch DLL) | github.com/TsudaKageyu/minhook | Only needed at Phase 8. Defer. |
| Shared folder or mounted drive | `Z:\` or similar | For passing snapshots back to host. Configure at VM-setup time. |

---

## 5. Configuration questions to ask the user at session start

**Ask these first, before running anything.** They materially affect the harness.

1. **Where is ACE running** — same VM (recommended; snapshot-clean) or
   on the host with VM networking through to it? Default assumption:
   same VM.
2. **What's the AC install path** if it's not the standard
   `C:\Turbine\Asheron's Call\`?
3. **Output flow** — shared folder path? Or push artifacts to a git
   branch (e.g. `leak-hunt-vm/2026-05-12`)? Default: shared folder to
   `Z:\leak-hunt\` on host.
4. **Test character name** on the VM ACE? Default: `+Leakhunt`.
5. **VM specs** — RAM and core count?
   (Affects whether to enable gflags+UST from the start, which costs
   ~20–30% perf.)
6. **EoR binary location on host** — confirm the user has it at
   `C:\Users\erikn\source\repos\acdream\refs\acclient-eor-2024-09-11.bndb`
   (Binary Ninja db). This isn't needed on the VM but is critical for
   Phase 7 BinDiff on the host.
7. **Wake-up cadence preference** — do they want you to use
   `ScheduleWakeup` for hours-long gaps, or stay continuously active?
   Default: ScheduleWakeup for any gap > 30 min.

Save the user's answers as memory entries before proceeding past Phase 0
so a future session can pick up cold.

---

## 6. Phased plan

Each phase has a **goal**, **commands**, **decision gate**, and
**estimated time**. Don't skip ahead. Don't run multiple phases in
parallel until Phase 4.

### Phase 0 — Verify the bench (target: 30 min)

**Goal:** prove the environment can launch AC, log in, and observe memory.

1. `py tools/check_exe_pdb.py "C:\Turbine\Asheron's Call\acclient.exe"`.
   Expect: `=== MATCH: this exe pairs with our acclient.pdb ===`. If
   MISMATCH → stop, ask the user which build they installed.
2. `py tools/dump_pdb_info.py pdb/acclient.pdb`. Confirm GUID
   `9e847e2f-777c-4bd9-886c-22256bb87f32`, age 1.
3. Start ACE locally (`dotnet run` in the ACE checkout, or `ACE.exe` if
   pre-built). Confirm it listens on `127.0.0.1:9000`.
4. Manually launch AC, log in with the test character, walk one step,
   log out. **This proves the bench works before you add instrumentation.**
5. Take a clean Hyper-V / VMware snapshot named `bench-verified`. The
   supervisor will revert to this before each run.

**Decision gate:** can you launch, log in, walk, log out, clean? If no,
fix this before anything else. If yes, proceed.

---

### Phase 1 — Idle baseline + decide hunt platform (target: 4 hours)

**Goal:** does the leak reproduce on the 2013 client when the player
sits at the lifestone doing nothing? If yes, primary plan; if no,
Phase 2 will find the right activity profile.

1. Enable heap allocation tagging: `gflags /i acclient.exe +ust`. This
   is registry-set; survives reboots. (Disable later with
   `gflags /i acclient.exe -ust`.)
2. Set `_NT_SYMBOL_PATH=`.
3. Launch AC via `templates/supervisor.ps1` (which sets env and spawns
   the process). Log in manually OR via `templates/login.ahk`.
4. Walk to a quiet spot (lifestone interior, away from spawn). Sit.
5. `templates/snapshot.ps1 -ProcessId -Out snap_001.txt` immediately.
   Wait 30 min. Take `snap_002`. Repeat for at least 4 hours
   (8 snapshots).
6. `umdh -d snap_001 snap_008 -f:diff_idle.txt`. Read top 20 growing
   stacks. Save the diff to `Z:\leak-hunt\phase1\`.

**Decision gate:**

- **Total committed memory grew >50 MB over 4h idle?** Leak repros at
  idle. Skip Phase 2, jump to Phase 4 (long-soak idle).
- **Total committed grew 5–50 MB?** Leak may need amplification. Proceed
  to Phase 2.
- **Total committed grew <5 MB?** Leak is activity-specific or doesn't
  exist on 2013. Proceed to Phase 2.
- **Memory dropped or oscillates around 0?** No leak signal at idle.
  Phase 2 is where you'll find it (or won't).

Record the baseline growth-rate number in memory:
`leak_hunt_phase1_baseline_mb_per_hour`.

---

### Phase 2 — Activity-phase characterization (target: 1–2 days)

**Goal:** find which player activity causes the leak.

The bot is not yet built — you drive this manually with the
activity-phase template running as an AHK macro, or by playing 30-min
phases yourself if no bot is available.

The five canonical phases (see `templates/activity-phases.json`):

1. **idle** — stand at lifestone, no input
2. **wander** — walk a fixed route around Holtburg
3. **chat** — spam say/tell/global chat
4. **target-cycle** — Tab through nearby NPCs/mobs, no combat
5. **ui-cycle** — open/close inventory, character pane, spells

**Procedure per session:**

1. Start fresh from `bench-verified` snapshot.
2. Run a single phase for 1 hour with snapshots every 15 min.
3. `umdh -d` diff the snapshot pair for that phase.
4. Record growth-rate per phase to memory.
5. Repeat for each phase, single VM, single phase per run. (If user has
   authorized multiple parallel VMs, run different phases simultaneously
   instead of sequentially.)

**Decision gate:** rank phases by growth rate. The top phase is your
target for Phase 4 amplification. Save ranking to memory.

---

### Phase 3 — Controller DLL (target: 1–2 days)

**Goal:** build a small DLL that drives the leaking phase
deterministically and reproducibly, faster than a human can.

**Build approach:**

- C++ DLL, 32-bit, compiled against Visual Studio Build Tools.
- MinHook for function hooking.
- LoadLibrary'd into `acclient.exe` via a small launcher EXE
  (CreateProcess SUSPENDED → WriteProcessMemory of LoadLibrary
  trampoline → ResumeThread). Standard injection pattern.
- Hook a frame-loop function from `symbols.json` — search
  `references/symbols.json` for `CGameLoop`, `WorldFilter`, `Tick`,
  `ProcessFrame` and pick the highest-frequency one with stable
  signature.
- Call retail functions directly via PDB-resolved addresses. Examples:
  `CPhysicsObj::set_velocity`, `CChatManager::SendSay`,
  `CPlayerSystem::SelectTarget`. These take a `this` pointer in `ecx`
  (thiscall) — you'll need a small asm trampoline or use `__thiscall`
  calling-convention helpers.

**The bot's job:**

- Drive the top-ranked Phase 2 activity continuously.
- Emit a heartbeat to a log file every 30s so the supervisor can detect
  wedging.
- Run a self-position watchdog that triggers an automatic restart: if
  `CPhysicsObj::position` hasn't changed in 5 min during a movement
  phase, signal the supervisor to revert and retry.

**Reuse opportunity:** the user maintains MosswartMassacre and
MosswartOverlord — both are AC client DLL-injection projects. **Ask the
user for read access** before designing from scratch; they may have a
working injector + MinHook scaffolding you can port from in hours rather
than days. Do not assume; ask.

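
The heartbeat contract in "The bot's job" can be checked from the supervisor side. What follows is only a minimal sketch, in Python to match the stdlib-only tools in this package. It assumes the bot touches or appends to a single log file on every beat, so the file's modification time stands in for "last heartbeat". The function name `heartbeat_is_stale` and the three-missed-beats tolerance are hypothetical choices, not part of the shipped supervisor.

```python
import os
import time

HEARTBEAT_INTERVAL_S = 30   # from the brief: one heartbeat every 30s
MISSED_BEATS_ALLOWED = 3    # assumption: tolerate a few late beats before calling it a wedge


def heartbeat_is_stale(log_path, now=None,
                       interval_s=HEARTBEAT_INTERVAL_S,
                       missed_allowed=MISSED_BEATS_ALLOWED):
    """Return True if the heartbeat log has not been written recently.

    A missing or unreadable file counts as stale, since the bot
    never started writing (or the path is wrong).
    """
    if now is None:
        now = time.time()
    try:
        last_write = os.path.getmtime(log_path)
    except OSError:
        return True
    return (now - last_write) > interval_s * missed_allowed
```

A supervisor loop could poll this once per snapshot interval and revert to `bench-verified` when it returns `True`. Passing `now` explicitly keeps the check easy to exercise without waiting in real time.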

**Decision gate:** bot runs the leaking phase for 1 hour unattended,
emits heartbeats, produces measurable UMDH growth.

---

### Phase 4 — Long-soak with amplification (target: 12–48 hours)

**Goal:** generate a clean signal — one or two leaking call stacks
visibly dominate the UMDH diff.

1. Revert VM to `bench-verified`.
2. Launch via `supervisor.ps1` with the controller DLL injected.
3. Snapshot every 15 min for 12+ hours.
4. `umdh -d` snap_001 vs snap_N every couple of hours during your active
   turns. Between active turns, use `ScheduleWakeup` with delay
   1800–3600s and the reason `"long-soak snapshot check"`.

**Decision gate:** UMDH diff shows one or more call stacks with
monotonic growth across all adjacent-pair diffs, dominating the total by
≥10× over the next-highest. That's your leak candidate(s).

---

### Phase 5 — Identify the leaking function (target: 2–4 hours)

**Goal:** convert the UMDH call stack into a named function we can study
and patch.

1. The top growing stack will look like:

   ```
   ntdll!RtlAllocateHeap+0x...
   acclient!operator new+0x...
   acclient!CFoo::AllocateBar+0x42
   acclient!CFoo::DoTheThing+0x18
   acclient!CGameLoop::Tick+0x...
   ```

2. The named function is `CFoo::AllocateBar`. Grep
   `references/acclient_2013_pseudo_c.txt` for `CFoo::AllocateBar` to
   read its body.
3. Identify the paired free function (`CFoo::ReleaseBar`, `~CFoo`, etc.)
   and confirm by reading both.
4. Find every call site of `CFoo::AllocateBar` (grep the pseudo-C for
   the function name) and verify each has a matching paired release.
   The one that doesn't is the bug.

**Decision gate:** you have (a) the leaking function name, (b) the
specific call site that doesn't free, (c) a hypothesis for the patch
(typically: add a `delete` or `Release()` on a specific code path). Save
these to memory + a write-up file.

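
The Phase 4 gate (monotonic growth across every adjacent pair, ≥10× over the next-highest stack) can be applied mechanically once per-stack byte counts have been pulled out of the snapshots. The sketch below is one possible reading of that gate, not a UMDH parser. The input shape (a dict mapping a stack label to its byte count at each snapshot) and the names `is_monotonic` and `leak_candidates` are assumptions made for illustration. The default of four snapshots corresponds to three consecutive diffs.

```python
def is_monotonic(series):
    """True if every snapshot is strictly larger than the one before it."""
    return all(later > earlier for earlier, later in zip(series, series[1:]))


def leak_candidates(bytes_by_stack, dominance=10.0, min_snapshots=4):
    """Return the leading stacks that grow monotonically and dominate the rest.

    bytes_by_stack: {stack_label: [bytes at snap 1, bytes at snap 2, ...]}
    Returns [] when the data is ambiguous (no clear dominance, or a top
    stack that is not monotonic).
    """
    growth = {
        stack: series[-1] - series[0]
        for stack, series in bytes_by_stack.items()
        if len(series) >= min_snapshots
    }
    ranked = sorted(growth, key=growth.get, reverse=True)
    for cut in range(1, len(ranked) + 1):
        head, tail = ranked[:cut], ranked[cut:]
        if not is_monotonic(bytes_by_stack[head[-1]]):
            return []  # a top-ranked stack oscillates: not a clean signal
        next_highest = max((growth[s] for s in tail), default=0)
        smallest_in_head = growth[head[-1]]
        if smallest_in_head > 0 and smallest_in_head >= dominance * max(next_highest, 0):
            return head
    return []


if __name__ == "__main__":
    # Hypothetical numbers, using the placeholder stack name from step 1.
    example = {
        "CFoo::AllocateBar": [100, 5100, 10100, 15100],
        "noise_a": [200, 260, 240, 300],
        "noise_b": [50, 50, 60, 55],
    }
    print(leak_candidates(example))
```

An empty result maps onto the "genuinely ambiguous" case where the brief says to stop and ask the user instead of picking a candidate.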

---

### Phase 6 — Cross-reference with retail debugger trace (optional, target: 2 hours)

**Goal:** confirm the leak path is actually hit at runtime in a real
play scenario, not just statically possible.

This step is optional but recommended if the leak path is conditional
(e.g. "only when the chat buffer wraps").

Use the cdb workflow documented in `templates/trace.cdb` and the
retail-debugger section in `CLAUDE.md`. Attach to acclient.exe,
breakpoint on the alloc function with a non-blocking action
(`r $t0=@$t0+1; gc`), let it accumulate for 30 min, count hits,
correlate with UMDH growth bytes.

---

### Phase 7 — BinDiff to EoR (**HOST MACHINE — not VM**)

**Goal:** produce an EoR-binary signature and offset for the leaking
function.

This phase does not happen on the VM. The Binary Ninja databases live on
the host (`refs/acclient-eor-2024-09-11.bndb`,
`refs/acclient_2013-2024-09-11.bndb`).

**You (VM Claude) do:**

1. Write a structured handoff file `Z:\leak-hunt\phase7-handoff.md`
   containing: the function name, its 2013 RVA, the paired release
   function name, the suspected missing-free call site, the call-graph
   context, a 32–48 byte AOB signature with wildcards over relocatable
   operands (cite the byte sequence from the pseudo-C/disassembly).
2. Notify the user that Phase 7 is ready.

**The user (or a Claude session on the host) does:**

3. Load both BNDBs in Binary Ninja, run BinDiff (or use BN's native
   diff). Locate the matching function in EoR.
4. Verify the AOB signature still matches in EoR (small mods are OK —
   adjust wildcards as needed).
5. Write back to `Z:\leak-hunt\phase7-result.md`: EoR RVA, confirmed
   signature, any structural differences worth knowing.

You (VM Claude) resume once that file appears.

---

### Phase 8 — Patch DLL (target: 1 day)

**Goal:** ship a DLL that, when loaded into `acclient.exe` (any build
that matches the signature), plugs the leak.

- Same scaffold as the controller DLL — MinHook + a launcher EXE.
- Hook the leaking function (or its caller, whichever is cleanest).
- Wrap the existing logic so the missing free is performed on the bug
  path.
- Provide a versioned filename and a small README so it's clear which
  client build it targets.
- **Verify on the 2013 client first** — same UMDH soak, expect the top
  growing stack to vanish.

---

### Phase 9 — Multi-day soak validation (target: 5+ days)

**Goal:** prove the patch fixes the production crash.

1. Install the patch DLL injector into the EoR client setup.
2. Launch under the supervisor with snapshots every hour (lower freq —
   we're not hunting now, just confirming).
3. Run a controlled activity profile (the bot's full rotation) for
   5+ days continuous.
4. **Pass:** no OOM crash, committed memory stable or decreasing slope.
   **Fail:** crash before 5 days → back to Phase 5 to find the second
   leak.

---

### Phase 10 — Ship

1. Write up findings:
   - The leak's root cause (one paragraph)
   - The patch's mechanism (one paragraph)
   - The 2013-vs-EoR signature note
   - Validation evidence (UMDH diffs, soak duration, growth-rate plot if
     you have one)
2. Save the writeup to `Z:\leak-hunt\REPORT.md` and to memory.
3. Notify the user. Stop the loop. Done.

---

## 7. Wake-up protocol

Use `ScheduleWakeup` to self-pace between phases. **Default cadence table:**

| Situation | Delay | Why |
|---|---|---|
| Active analysis (reading UMDH diffs, writing code) | none — stay engaged | Full-context work |
| Between snapshots inside a soak (Phase 1/4/9) | 1500–1800 s | Cache stays warm-ish, snapshots accumulate |
| Overnight gap (≥6 h) | 3600 s and chain | One cache miss is cheap vs. burning per-hour |
| Waiting for user (Phase 7 handoff) | 3600 s | Poll for the result file |

Pass the same loop prompt each turn. The `reason` field should identify
the phase and what you'll check (e.g.
`"phase 4 snapshot check — read snap_012 and diff against snap_001"`).

---

## 8. When to stop and ask the user

- **Phase 0 verification fails** (PDB mismatch, login fails, ACE not
  reachable). Don't guess at fixes.
- **The bot wedges and auto-recovery fails twice in a row.**
- **You're about to expand scope** (refactor the supervisor into a
  framework, build a UI for snapshot review, port code into acdream's
  tree). Stop and ask. Default answer is no.
- **You hit a decision gate where the data is genuinely ambiguous**
  (e.g. growth rate is moderate but no single stack dominates).
- **Phase 5 produces a function name that isn't in symbols.json.**
  Probably means an indirect call or vtable dispatch — ask before
  spending hours decoding it.
- **The patch in Phase 9 doesn't validate.** Don't iterate indefinitely;
  surface findings and re-plan.

---

## 9. Memory protocol

Save findings as memory entries so a session that wakes 8 hours later
can resume cold. Specifically:

- `project_leak_hunt.md` — top-level project context, current phase,
  open questions
- `leak_hunt_phase_N.md` — per-phase findings, growth rates, decisions
- `leak_hunt_candidate_.md` — once a function is suspected, everything
  you know about it
- `feedback_leak_hunt_.md` — if the user gives operational feedback
  during this hunt, record it

Update `MEMORY.md` index entry for each. Keep entries short and factual;
long writeups go in `Z:\leak-hunt\` files referenced from memory.

---

## 10. Hard rules (do not violate)

1. **Don't run anything that touches the user's host machine.** The VM
   is isolated for a reason. All output goes through the shared folder.
2. **Don't disable gflags+UST mid-run.** If you need to disable, stop
   the supervisor, disable, take a fresh baseline.
3. **Don't modify `acclient.exe` on disk.** All patches are runtime DLL
   hooks. If you ever feel tempted to binary-patch the exe directly, ask
   the user first.
4. **Don't auto-update `MEMORY.md`** without first saving the underlying
   memory file. The index must point at real files.
5. **Don't claim a leak is found** without the evidence checklist:
   - ≥3 consecutive UMDH diffs showing the same top stack growing
   - The stack is attributed to a named function in `symbols.json`
   - The call site is identified in `acclient_2013_pseudo_c.txt`
   - The hypothesis for the missing free is stated
6. **Don't proceed to Phase 8 without Phase 7 handoff complete.** The
   patch must target the EoR signature, not the 2013 RVA.

---

## 11. First action when you start a fresh session

```
1. Read README.md (this file) end-to-end.
2. Read CLAUDE.md (project rules — concise).
3. Run the configuration questions in §5 by the user.
4. Save their answers as memory.
5. Begin Phase 0.
```

Good hunting.