Five bugs identified and patched in retail Asheron's Call client:
- v3b: palette refcount over-increment (3-byte NOP at two sites)
- v5: RenderSurface PurgeResource no-op stub (vtable slot 2 thunk)
- v11: two dangling-pointer crash guards (NULL-check + reorder)
- v14: CEnvCell::Destroy ClipPlaneList leak (18-byte JMP to cleanup thunk)
- v22: unpacker stale-pointer SEH guard (whole-function __try/__except)
All five ship in leakfix.dll (117 KB, SHA d282f23c…), which is loaded by acclient.exe at process start via PE import table patching by tools/install_leakfix.py.
Controlled 15-client fleet soak: unpatched control died at 26h with palette exhaustion; all 14 patched clients survived past that point and reached ≥5-day uptime.
Residual ~15 MB/h growth traced to d3d9.dll's internal slab allocator (260KB surface backing buffers retained after Release). See REPORT.md §10 for the full investigation; conclusion is that it's unfixable from outside d3d9.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Retail AC Memory Leak Hunt
Status: COMPLETE 2026-05-22. Five bugs found and patched in the retail AC client. Controlled fleet soak showed the unpatched control died at 26h with palette exhaustion; all 14 patched clients survived past that point and reached ≥5 days of uptime. The residual ~15 MB/h growth was traced to d3d9.dll's internal slab allocator and is unfixable from outside d3d9.
If you just want to install: drop `dll/leakfix/dist/leakfix.dll` into your AC directory and run `python tools/install_leakfix.py "C:\path\to\AC"`. The installer patches `acclient.exe`'s import table to load `leakfix.dll` at startup. It is idempotent, so it is safe to re-run.
What ships
| Patch | Bug | One-line fix |
|---|---|---|
| v3b | Palette refcount over-increment in `makeModifiedPalette` | NOP the `inc [eax+0x24]` at two sites |
| v5 | `RenderSurface::PurgeResource` is a no-op stub | Override vtable slot 2 to call `Destroy()` for real |
| v11 | Two dangling-pointer dereferences in `delete_contents` + `~GXTri3Mesh` | NULL-check guards |
| v14 | `CEnvCell::Destroy` leaks the `ClipPlaneList` (just zeros the count) | Replace the 18-byte buggy block with a JMP to a thunk that actually frees the list |
| v22 | Server-driven AV in the unpacker function at `0x00526A50` (5-client mass crash 2026-05-21 09:00) | Wrap the function in `__try` / `__except`, return 0 on AV (which the engine already handles as the size-check-failure code path) |
All five, plus a crash handler, ship in leakfix.dll. The patches are
applied 30 seconds after process start (deferred so Decal/UB win their
own init race first). The crash handler is installed immediately, so
crashes during the 30 s window are still captured.
Patch pseudo-code
v3b — palette refcount over-increment
The engine's palette-cache hit path increments the cached entry's refcount twice (once in the cache lookup, once in the constructor that wraps it). Result: the refcount grows monotonically and never reaches zero, so palettes accumulate until the 32-bit address space is exhausted (~26h on heavy-loot clients).
; at 0x0053EFFE (and 0x0053F19C, the sibling overload)
; before patch: after patch:
; ff 40 24 inc dword ptr [eax+0x24] 90 90 90 nop nop nop
// effect, expressed in C:
// before: refcount++ twice per cache hit
// after: refcount++ once per cache hit (the outer increment is removed)
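Because the fix is a same-length overwrite, the safe way to apply it is to check the original bytes first and refuse on any mismatch. That check also makes re-applying a no-op. Below is a minimal Python sketch of that rule, not code from leakfix.dll, which patches live process memory. It assumes the image is held as a `bytearray` laid out as mapped in memory, so offset = VA minus image base. It also assumes the common 32-bit VC++ default base `0x00400000`. `apply_patch` is a hypothetical helper.

```python
IMAGE_BASE = 0x00400000  # assumption: default VC++ base for 32-bit EXEs

V3B_SITES = (0x0053EFFE, 0x0053F19C)   # makeModifiedPalette and its sibling overload
INC_EAX_24 = bytes.fromhex("ff4024")   # inc dword ptr [eax+0x24]
NOP3 = b"\x90" * 3


def apply_patch(image: bytearray, va: int, expected: bytes, replacement: bytes) -> bool:
    """Overwrite `expected` with `replacement` at virtual address `va`.

    Returns True if the patch was written, False if it was already applied.
    Raises ValueError if the site holds neither form (wrong client build).
    """
    if len(expected) != len(replacement):
        raise ValueError("patch must not change instruction length")
    off = va - IMAGE_BASE
    current = bytes(image[off:off + len(expected)])
    if current == replacement:
        return False  # already applied: idempotent
    if current != expected:
        raise ValueError(f"unexpected bytes at {va:#010x}: {current.hex()}")
    image[off:off + len(replacement)] = replacement
    return True
```

Usage: `for va in V3B_SITES: apply_patch(image, va, INC_EAX_24, NOP3)`. Raising on unexpected bytes is deliberate. On a different client build the sites won't hold `ff 40 24`, and NOPing whatever is there would corrupt code.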
v5 — RenderSurface PurgeResource override
RenderSurface's and RenderTexture's PurgeResource virtual slot
points at 0x004154A0, which is mov al, 1; ret — a no-op stub.
When the resource manager's purge sweep walks s_Resources and calls
PurgeResource() on each entry, the call returns "1 = purged" but
the resource's D3D handle + heap state is never touched. Result:
purged-shell accumulation in s_Resources.
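// before: slot 2 of the RenderSurface vtable (0x0079A684):
//   PurgeResource = noop_stub;          // 0x004154A0
//   bool noop_stub() { return true; }   // mov al, 1; ret
//
// after: slot 2 redirected to our thunk in leakfix.dll.
// PurgeResource is thiscall (this in ecx); __fastcall receives ecx as its
// first parameter, and the unused edx register as its second.
bool __fastcall purge_rendersurface_thunk(RenderSurface* self, void* /*edx*/) {
    self->Destroy();   // real cleanup: releases the D3D handle + heap state
    return true;       // al = 1, so the engine marks the entry purged
}
// same fix mirrored to RenderTexture slot 2 (0x0079C1A0).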
v11 — two dangling-pointer crash guards
Two places where the engine dereferences a pointer that's been freed elsewhere. Both manifest as AVs that take the process down.
Site 1 — delete_contents hash walk (0x00587126):
The loop falls through into a dereference of an already-freed bucket
node when the bucket chain was rebuilt mid-walk. Fix: retarget the
JMP so the freed-bucket branch jumps to the epilogue, skipping the
deref.
; before: eb 07 jmp +0x07 ; into the deref
; after: eb 42 jmp +0x42 ; into the epilogue (skip deref)
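Both byte sequences are short jumps (`EB rel8`). The target of a short jump is measured from the end of the 2-byte instruction. The Python sketch below checks where each version lands. It assumes the JMP opcode itself sits at 0x00587126; the brief gives that address for the site, not necessarily for the JMP.

```python
def short_jmp_target(va: int, insn: bytes) -> int:
    """Return the destination of a 2-byte `EB rel8` jump located at `va`."""
    if len(insn) != 2 or insn[0] != 0xEB:
        raise ValueError("not a short JMP (EB rel8)")
    rel8 = insn[1] - 0x100 if insn[1] >= 0x80 else insn[1]  # sign-extend the displacement
    return va + 2 + rel8  # relative to the next instruction


SITE1_VA = 0x00587126  # assumption: address of the JMP opcode itself
before = short_jmp_target(SITE1_VA, bytes.fromhex("eb07"))
after = short_jmp_target(SITE1_VA, bytes.fromhex("eb42"))
```

Under that assumption, `eb 07` lands at 0x0058712F and `eb 42` at 0x0058716A. Checking that the second address is the function's epilogue in the disassembly is a quick sanity check on the retarget.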
Site 2 — ~GXTri3Mesh slot 0 deref (0x005E565D):
Destructor of GXTri3Mesh reads its slot[0] then zeros it. If
slot[0] is stale (some other path already freed it), the deref AVs.
Fix: reorder so we zero first; never deref a slot we can't trust.
; before: after:
; 8B 08 mov ecx, [eax] 89 5E 08 mov [esi+8], ebx ; zero first
; 50 push eax 90 ... 90 nop x6 ; skip deref + call
; FF 51 08 call [ecx+8]
; 89 5E 08 mov [esi+8], ebx
v14 — CEnvCell::ClipPlaneList leak
CEnvCell::Destroy contains an 18-byte cleanup block that only
zeros cplane_num — never frees the underlying ClipPlaneList
object hanging off [this+0xDC]. Every cell unload leaks one of
these. Replace the broken block with a JMP to a thunk in
leakfix.dll that does the real cleanup:
// thunk pseudo-code:
void v14_clipplane_cleanup_thunk(CEnvCell* self) {
ClipPlaneListWrapper* outer = self->cplane_wrapper; // [esi+0xDC]
if (outer) {
ClipPlaneList* inner = outer->inner; // [outer+0x0]
if (inner) {
inner->~ClipPlaneList();
operator delete(inner);
}
operator delete[](outer);
self->cplane_wrapper = nullptr;
}
self->cplane_num = 0; // the replaced block's only effect; keep it
// jump back to V14_RESUME_VA (just past the original 18-byte block)
}
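Overwriting the 18-byte block means encoding a 5-byte `JMP rel32`: `E9` plus a displacement measured from the end of the JMP. The remaining 13 bytes are padded; they never execute, because the thunk resumes at V14_RESUME_VA. The Python sketch below shows that encoding. `jmp_rel32_patch` and the addresses in the example are hypothetical.

```python
import struct


def jmp_rel32_patch(src_va: int, dst_va: int, block_len: int) -> bytes:
    """Encode `JMP dst_va` at src_va, NOP-padded to exactly block_len bytes."""
    if block_len < 5:
        raise ValueError("JMP rel32 needs 5 bytes")
    rel32 = dst_va - (src_va + 5)  # displacement is relative to the next instruction
    if not -2**31 <= rel32 < 2**31:
        raise ValueError("target out of rel32 range")
    return b"\xE9" + struct.pack("<i", rel32) + b"\x90" * (block_len - 5)


# Hypothetical addresses, for illustration only.
BLOCK_VA = 0x0051F000   # start of the 18-byte block in CEnvCell::Destroy
THUNK_VA = 0x10001000   # v14_clipplane_cleanup_thunk inside leakfix.dll
patch = jmp_rel32_patch(BLOCK_VA, THUNK_VA, 18)
resume_va = BLOCK_VA + 18  # what V14_RESUME_VA would be
```

The same encoding covers step 2 of v22: a `JMP rel32` at the unpacker entry, with `block_len=5`.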
v22 — unpacker stale-pointer SEH guard
A small inline unpacker at 0x00526A50 pulls 4 DWORDs from
arg1->buffer. On 2026-05-21 the server simultaneously fed five
clients a buffer pointing into freed/kernel memory; all five
AV'd on the 4th deref. The engine already has a code path for
"buffer too small / unpack failed" (line 1 of the function checks a
size field and returns 0). We just wrap the whole function body in
SEH and route AVs to that same return-0 path.
// 1. Copy the original 73 bytes of the function to executable memory.
// 2. Patch the original entry with JMP rel32 to our wrapper.
int v22_unpacker_wrapper(void* self, void* arg1, int count) { // param types are placeholders
__try {
return original_copy(self, arg1, count); // run the relocated copy of the real unpacker
} __except (EXCEPTION_EXECUTE_HANDLER) {
// log + return 0 (engine treats this as size-check failure)
return 0;
}
}
Install
# 1. Copy leakfix.dll into your AC directory
Copy-Item .\dll\leakfix\dist\leakfix.dll "C:\Turbine\Asheron's Call\"
# 2. Patch acclient.exe to import leakfix.dll
python tools\install_leakfix.py "C:\Turbine\Asheron's Call"
# 3. Verify
python tools\install_leakfix.py "C:\Turbine\Asheron's Call" verify
The installer adds a .limport PE section to acclient.exe containing
the rebuilt import table. It backs up the original to
acclient.exe.bare_original on first run, and is idempotent.
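The backup-once rule is what makes re-running safe. The pristine binary is captured on the first run and is never overwritten by a later, already-patched copy. The minimal Python sketch below shows that rule. `backup_once` is a hypothetical helper; `install_leakfix.py`'s actual code isn't shown here.

```python
import shutil
from pathlib import Path


def backup_once(exe: Path) -> Path:
    """Copy exe to <name>.bare_original unless a backup already exists."""
    backup = exe.with_name(exe.name + ".bare_original")
    if not backup.exists():
        shutil.copy2(exe, backup)  # first run only: keep the unpatched binary
    return backup
```

Roll back is then just copying `acclient.exe.bare_original` over `acclient.exe`, as below.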
Roll back
Copy-Item "C:\Turbine\Asheron's Call\acclient.exe.bare_original" `
"C:\Turbine\Asheron's Call\acclient.exe" -Force
Remove-Item "C:\Turbine\Asheron's Call\leakfix.dll" -ErrorAction Ignore
Files
- `dll/leakfix/src/`: DLL source (C++ with inline asm for the naked thunks)
- `dll/leakfix/dist/leakfix.dll`: current production build (117 KB)
- `dll/leakfix/build.bat`: build script (VS 2022 BuildTools required)
- `tools/install_leakfix.py`: patches acclient.exe to import leakfix.dll
- `tools/check_acclient_imports.py`: verifies the import table contains leakfix.dll
- `references/`: symbol table, pseudo-C, header for the 2013 client (PDB-backed)
The rest of this document is the original VM operator brief that drove the investigation. Preserved for context but no longer operationally relevant — the hunt is done.
Retail AC Memory Leak Hunt — VM Operator Brief
You are picking this up cold on a freshly-provisioned Windows VM.
This document is your full mission brief. Read it end-to-end before
running anything, then drive the work autonomously, using
ScheduleWakeup (Claude Code) to pace long-running operations between
your active turns.
1. Mission
Find and patch a memory leak in the retail Asheron's Call client. The production symptom is a hard crash after ~4–5 days of continuous play on the End-of-Retail (EoR, ~Jan 2017) client. We don't have symbols for that binary — but we have full PDB symbols for the Sept 2013 v11.4186 client, which almost certainly carries the same leak (AC was in pure maintenance mode 2013→2017, very little net new code).
The hunt happens on the 2013 client (symbolized). The patch ships against the EoR client (via BinDiff-forward).
What "done" looks like
- A specific function in the 2013 client is identified as the leak source, with evidence: monotonic UMDH growth across multiple snapshot diffs attributed to that function's call stack.
- The corresponding function in the EoR client is located via BinDiff (this step happens on the host machine, not the VM — the BNDB files live there).
- A DLL-injection patch is built that hooks the EoR function and plugs the leak (typically: adds a missing delete/Release/decref on a known path).
- A 5+ day soak on EoR with the patch installed completes without the OOM crash that reproduces unpatched in the same window.
Hard scope boundary
This is a self-contained side quest. Do not expand it into a general retail-instrumentation framework, a fork of the controller DLL into a fully-featured bot, a parallel acdream feature, or "while I'm here" refactors of the AC2D/Mosswart tooling. Find leak → patch → validate → ship → done. If you catch yourself reaching for adjacent work, stop and re-read this paragraph.
2. Why this works (assumptions you can rely on)
- Compiler & toolchain stability. 2013 and EoR were both built with the same VC++ family on the same Turbine build farm. Binary structure is highly similar.
- Code stability. AC went into maintenance after Throne of Destiny (2005) and stayed there. Most of the codebase did not change meaningfully between 2013 and EoR. A leak severe enough to crash in 4–5 days has almost certainly been present for many years.
- PDB → BinDiff path is mature. BinDiff and Diaphora routinely achieve 80–95% function-match rates across related VC++ binaries. Once you identify the leaking function in 2013 (with name), porting it forward to EoR comes down to a signature scan.
What you're betting on, and the fallback
- Primary bet: the leak repros on the 2013 client. UMDH on the 2013 client + activity bot reveals it within hours-to-days. You identify a named function, hand the name to the host for BinDiff, receive the EoR signature back, build the patch DLL, validate.
- Fallback: the leak does NOT repro on 2013, i.e. it was introduced after Sept 2013. In that case, you fall back to hunting on the EoR client without symbols, using BinDiff-transferred names for whatever functions match the 2013 codebase. This is slower but still feasible. The primary-vs-fallback call is made at the Phase 1 decision gate below.
3. Package contents
leak-hunt-vm-2026-05-12/
├── README.md ← you are here
├── MANIFEST.md ← list of out-of-repo files copied in
├── CLAUDE.md ← VM-side project rules (persistent)
├── templates/
│ ├── supervisor.ps1 ← skeleton — start ACE, start client, snapshot loop
│ ├── snapshot.ps1 ← UMDH single-shot
│ ├── activity-phases.json ← phase schedule template
│ ├── login.ahk ← AutoHotkey login skeleton
│ └── trace.cdb ← cdb scripting template
├── tools/
│ ├── check_exe_pdb.py ← verify binary ↔ PDB GUID match
│ ├── dump_pdb_info.py ← PDB metadata
│ └── pdb_extract.py ← regenerate symbols.json if needed
├── pdb/
│ └── acclient.pdb ← (29 MB, copied per MANIFEST)
└── references/
├── symbols.json ← 18,366 named functions + addresses (grep-friendly)
├── types.json ← 5,371 struct/class type definitions
├── acclient.h ← verbatim retail header structs
└── acclient_2013_pseudo_c.txt ← 64 MB symbolized Binary Ninja pseudo-C
The Python tools are stdlib-only (no pip). Everything else is data.
4. What you need on the VM (one-time, before starting)
If any of these is missing, ask the user before guessing.
| Component | Where | Notes |
|---|---|---|
| Retail AC client (2013 v11.4186) | `C:\Turbine\Asheron's Call\` | Standard install path. Verify the match with `check_exe_pdb.py` before any other work. `_NT_SYMBOL_PATH` must include `pdb/`. |
| Retail AC dat files | inside the install | `client_portal.dat`, `client_cell_1.dat`, `client_highres.dat`, `client_local_English.dat` |
| ACE server | `127.0.0.1:9000` on VM | Use ACEmulator from github.com/ACEmulator/ACE. Same config as user's dev box. Confirm it accepts logins before continuing. |
| Test character | on the VM's ACE | Suggested name: `+Leakhunt`. The `+` GM marker makes debug commands available. |
| Windows Debugging Tools | Microsoft Store WinDbg or Win10/11 SDK | Need `cdb.exe`, `umdh.exe`, `gflags.exe`. 32-bit (x86) versions, because `acclient.exe` is 32-bit. |
| AutoHotkey v2 | autohotkey.com | For login automation. v2 only; templates assume v2 syntax. |
| Sysinternals procdump | sysinternals.com | Crash-dump capture. |
| MinHook (optional, for the controller and patch DLLs) | github.com/TsudaKageyu/minhook | Not needed until Phase 3 (controller DLL) and Phase 8 (patch DLL). Defer. |
| Shared folder or mounted drive | `Z:\` or similar | For passing snapshots back to host. Configure at VM-setup time. |
5. Configuration questions to ask the user at session start
Ask these first, before running anything. They materially affect the harness.
- Where is ACE running — same VM (recommended; snapshot-clean) or on the host with VM networking through to it? Default assumption: same VM.
- What's the AC install path if it's not the standard `C:\Turbine\Asheron's Call\`?
- Output flow: shared folder path? Or push artifacts to a git branch (e.g. `leak-hunt-vm/2026-05-12`)? Default: shared folder to `Z:\leak-hunt\` on host.
- Test character name on the VM ACE? Default: `+Leakhunt`.
- VM specs: RAM and core count? (Affects whether to enable gflags+UST from the start, which costs ~20–30% perf.)
- EoR binary location on host: confirm the user has it at `C:\Users\erikn\source\repos\acdream\refs\acclient-eor-2024-09-11.bndb` (Binary Ninja db). This isn't needed on the VM but is critical for Phase 7 BinDiff on the host.
- Wake-up cadence preference: do they want you to use `ScheduleWakeup` for hours-long gaps, or stay continuously active? Default: `ScheduleWakeup` for any gap > 30 min.
Save the user's answers as memory entries before proceeding past Phase 0 so a future session can pick up cold.
6. Phased plan
Each phase has a goal, commands, decision gate, and estimated time. Don't skip ahead. Don't run multiple phases in parallel until Phase 4.
Phase 0 — Verify the bench (target: 30 min)
Goal: prove the environment can launch AC, log in, and observe memory.
- `py tools/check_exe_pdb.py "C:\Turbine\Asheron's Call\acclient.exe"`. Expect: `=== MATCH: this exe pairs with our acclient.pdb ===`. If MISMATCH, stop and ask the user which build they installed.
- `py tools/dump_pdb_info.py pdb/acclient.pdb`. Confirm GUID `9e847e2f-777c-4bd9-886c-22256bb87f32`, age 1.
- Start ACE locally (`dotnet run` in the ACE checkout, or `ACE.exe` if pre-built). Confirm it listens on `127.0.0.1:9000`.
- Manually launch AC, log in with the test character, walk one step, log out. This proves the bench works before you add instrumentation.
- Take a clean Hyper-V / VMware snapshot named `bench-verified`. The supervisor will revert to this before each run.
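The exe↔PDB pairing rests on the CodeView `RSDS` record in the exe's debug data. That record holds a 16-byte GUID and an age, and both must equal the values stored in the PDB. The minimal Python sketch below reads that record; it is not `check_exe_pdb.py`'s actual implementation. It takes the shortcut of scanning for the `RSDS` signature instead of walking the PE debug directory, so a stray `RSDS` string earlier in the file could fool it.

```python
import struct
import uuid


def read_rsds(data: bytes):
    """Return (guid, age, pdb_path) from the first RSDS CodeView record in data."""
    i = data.find(b"RSDS")
    if i < 0:
        raise ValueError("no RSDS record found")
    guid = uuid.UUID(bytes_le=data[i + 4:i + 20])   # Windows GUIDs are stored little-endian
    (age,) = struct.unpack_from("<I", data, i + 20)  # DWORD age follows the GUID
    end = data.index(b"\x00", i + 24)                # PDB path is NUL-terminated UTF-8
    path = data[i + 24:end].decode("utf-8", errors="replace")
    return guid, age, path
```

Usage: `read_rsds(Path(exe).read_bytes())`. On the matching 2013 client, the GUID and age should be the ones `dump_pdb_info.py` reports for the PDB.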
Decision gate: can you launch, log in, walk, log out, clean? If no, fix this before anything else. If yes, proceed.
Phase 1 — Idle baseline + decide hunt platform (target: 4 hours)
Goal: does the leak reproduce on the 2013 client when the player sits at the lifestone doing nothing? If yes, primary plan; if no, Phase 2 will find the right activity profile.
- Enable the user-mode stack trace database that UMDH needs: `gflags /i acclient.exe +ust`. This is registry-set and survives reboots. (Disable later with `gflags /i acclient.exe -ust`.)
- Set `_NT_SYMBOL_PATH=<vm-path-to-pdb-dir>`.
- Launch AC via `templates/supervisor.ps1` (which sets env and spawns the process). Log in manually OR via `templates/login.ahk`.
- Walk to a quiet spot (lifestone interior, away from spawn). Sit.
- Run `templates/snapshot.ps1 -ProcessId <pid> -Out snap_001.txt` immediately. Wait 30 min. Take `snap_002`. Repeat until at least 4 hours have elapsed (at 30-min spacing, `snap_009` is the 4 h mark).
- `umdh -d snap_001.txt snap_009.txt -f:diff_idle.txt`. Read the top 20 growing stacks. Save the diff to `Z:\leak-hunt\phase1\`.
Decision gate:
- Total committed memory grew >50 MB over 4h idle? Leak repros at idle. Skip Phase 2, jump to Phase 4 (long-soak idle).
- Total committed grew 5–50 MB? Leak may need amplification. Proceed to Phase 2.
- Total committed grew <5 MB? Leak is activity-specific or doesn't exist on 2013. Proceed to Phase 2.
- Memory dropped or oscillates around 0? No leak signal at idle. Phase 2 is where you'll find it (or won't).
Record the baseline growth-rate number in memory:
leak_hunt_phase1_baseline_mb_per_hour.
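The gate reduces to thresholds on first-to-last growth in total committed memory. The Python sketch below applies them and also yields the MB/hour number to record. It assumes you have already extracted committed-memory readings (in MB) for each snapshot; the brief doesn't specify how to collect them. It also folds the "dropped or oscillates" case into "no net growth".

```python
from typing import Sequence, Tuple


def phase1_gate(committed_mb: Sequence[float], hours: float) -> Tuple[float, str]:
    """Apply the Phase 1 thresholds to idle-soak readings (oldest first)."""
    growth = committed_mb[-1] - committed_mb[0]   # MB over the whole window
    rate = growth / hours                          # value for leak_hunt_phase1_baseline_mb_per_hour
    if growth > 50:
        step = "repros at idle: skip Phase 2, go to Phase 4"
    elif growth >= 5:
        step = "may need amplification: Phase 2"
    elif growth > 0:
        step = "activity-specific or absent on 2013: Phase 2"
    else:
        step = "no idle signal: Phase 2"
    return rate, step
```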
Phase 2 — Activity-phase characterization (target: 1–2 days)
Goal: find which player activity causes the leak. The bot is not yet built — you drive this manually with the activity-phase template running as an AHK macro, or by playing 30-min phases yourself if no bot is available.
The five canonical phases (see templates/activity-phases.json):
- idle — stand at lifestone, no input
- wander — walk a fixed route around Holtburg
- chat — spam say/tell/global chat
- target-cycle — Tab through nearby NPCs/mobs, no combat
- ui-cycle — open/close inventory, character pane, spells
Procedure per session:
- Start fresh from the `bench-verified` snapshot.
- Run a single phase for 1 hour with snapshots every 15 min.
- `umdh -d` diff the snapshot pair for that phase.
- Record growth-rate per phase to memory.
- Repeat for each phase, single VM, single phase per run.
(If user has authorized multiple parallel VMs, run different phases simultaneously instead of sequentially.)
Decision gate: rank phases by growth rate. The top phase is your target for Phase 4 amplification. Save ranking to memory.
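With five snapshots per phase, a least-squares slope is a steadier growth-rate estimate than last-minus-first. The Python sketch below ranks phases that way. It assumes committed-memory readings in MB taken at a fixed interval (15 min by default, per the procedure above); the phase names in the usage come from the canonical list.

```python
from typing import Dict, List, Sequence, Tuple


def slope_mb_per_hour(minutes: Sequence[float], committed_mb: Sequence[float]) -> float:
    """Least-squares slope of committed memory vs time, in MB/hour."""
    n = len(minutes)
    if n < 2:
        raise ValueError("need at least two snapshots")
    mx = sum(minutes) / n
    my = sum(committed_mb) / n
    sxx = sum((x - mx) ** 2 for x in minutes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(minutes, committed_mb))
    return sxy / sxx * 60.0  # MB per minute -> MB per hour


def rank_phases(samples: Dict[str, List[float]], interval_min: float = 15.0) -> List[Tuple[str, float]]:
    """Rank phases by growth rate, fastest-growing first."""
    rates = {
        name: slope_mb_per_hour([i * interval_min for i in range(len(mb))], mb)
        for name, mb in samples.items()
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```

Usage: `rank_phases({"idle": [...], "chat": [...], "ui-cycle": [...]})[0]` is the Phase 4 target. The same slope function can serve the Phase 9 "stable or decreasing slope" check.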
Phase 3 — Controller DLL (target: 1–2 days)
Goal: build a small DLL that drives the leaking phase deterministically and reproducibly, faster than a human can.
Build approach:
- C++ DLL, 32-bit, compiled against Visual Studio Build Tools.
- MinHook for function hooking.
- LoadLibrary'd into `acclient.exe` via a small launcher EXE (CreateProcess SUSPENDED → WriteProcessMemory of LoadLibrary trampoline → ResumeThread). Standard injection pattern.
- Hook a frame-loop function from `symbols.json`: search `references/symbols.json` for `CGameLoop`, `WorldFilter`, `Tick`, `ProcessFrame` and pick the highest-frequency one with a stable signature.
- Call retail functions directly via PDB-resolved addresses. Examples: `CPhysicsObj::set_velocity`, `CChatManager::SendSay`, `CPlayerSystem::SelectTarget`. These take a `this` pointer in `ecx` (thiscall), so you'll need a small asm trampoline or `__thiscall` calling-convention helpers.
The bot's job:
- Drive the top-ranked Phase 2 activity continuously.
- Emit a heartbeat to a log file every 30s so the supervisor can detect wedging.
- Position watchdog with auto-restart: if `CPhysicsObj::position` hasn't changed in 5 min during a movement phase, signal the supervisor to revert and retry.
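On the supervisor side, both checks are elapsed-time comparisons. The Python sketch below assumes each heartbeat carries the current position as any comparable value. The 30 s heartbeat period and the 5 min stall window come from this brief. Treating three missed heartbeats as "wedged" is an assumption.

```python
from typing import Hashable, Optional

HEARTBEAT_PERIOD_S = 30.0                # from the brief
STALE_AFTER_S = 3 * HEARTBEAT_PERIOD_S   # assumption: 3 missed beats = wedged
POSITION_STALL_S = 5 * 60.0              # from the brief: 5 min without movement


class Watchdog:
    """Tracks the bot's heartbeats and reported position; times are in seconds."""

    def __init__(self, now: float) -> None:
        self.last_beat = now
        self.last_move = now
        self.last_pos: Optional[Hashable] = None

    def heartbeat(self, now: float, pos: Hashable) -> None:
        self.last_beat = now
        if pos != self.last_pos:  # any change in position counts as movement
            self.last_pos = pos
            self.last_move = now

    def verdict(self, now: float, movement_phase: bool) -> str:
        if now - self.last_beat > STALE_AFTER_S:
            return "wedged"  # heartbeats stopped: revert and retry
        if movement_phase and now - self.last_move > POSITION_STALL_S:
            return "stuck"   # alive but not moving: revert and retry
        return "ok"
```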
Reuse opportunity: the user maintains MosswartMassacre and MosswartOverlord — both are AC client DLL-injection projects. Ask the user for read access before designing from scratch; they may have a working injector + MinHook scaffolding you can port from in hours rather than days. Do not assume; ask.
Decision gate: bot runs the leaking phase for 1 hour unattended, emits heartbeats, produces measurable UMDH growth.
Phase 4 — Long-soak with amplification (target: 12–48 hours)
Goal: generate a clean signal — one or two leaking call stacks visibly dominate the UMDH diff.
- Revert VM to `bench-verified`.
- Launch via `supervisor.ps1` with the controller DLL injected.
- Snapshot every 15 min for 12+ hours.
- `umdh -d` snap_001 vs snap_N every couple of hours during your active turns. Between active turns, use `ScheduleWakeup` with delay 1800–3600 s and the reason `"long-soak snapshot check"`.
Decision gate: UMDH diff shows one or more call stacks with monotonic growth across all adjacent-pair diffs, dominating the total by ≥10× over the next-highest. That's your leak candidate(s).
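The gate has two parts. Every adjacent-pair diff must be positive for the candidate stack(s), and their growth must be at least 10× the next-highest stack's. The Python sketch below checks both. It assumes UMDH output has already been parsed into outstanding bytes per stack per snapshot; that parsing is not shown, and stack ids are whatever key you choose.

```python
from typing import Dict, List, Sequence


def is_monotonic(bytes_per_snapshot: Sequence[int]) -> bool:
    """True if every adjacent-pair diff is positive."""
    return all(b > a for a, b in zip(bytes_per_snapshot, bytes_per_snapshot[1:]))


def leak_candidates(series: Dict[str, Sequence[int]], ratio: float = 10.0) -> List[str]:
    """Smallest set of top-growing stacks that are all monotonic and whose
    smallest member grew >= ratio x the next-highest stack. Empty if none."""
    ranked = sorted(series, key=lambda k: series[k][-1] - series[k][0], reverse=True)
    growth = [series[k][-1] - series[k][0] for k in ranked]
    for n in range(1, len(ranked) + 1):
        if not is_monotonic(series[ranked[n - 1]]):
            break  # a non-monotonic stack sits among the top growers
        next_growth = growth[n] if n < len(ranked) else 0
        if growth[n - 1] > 0 and growth[n - 1] >= ratio * max(next_growth, 0):
            return ranked[:n]
    return []
```

An empty result with high total growth is the "genuinely ambiguous" case in §8, where the brief says to stop and ask.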
Phase 5 — Identify the leaking function (target: 2–4 hours)
Goal: convert the UMDH call stack into a named function we can study and patch.
- The top growing stack will look like:
  `ntdll!RtlAllocateHeap+0x...`
  `acclient!operator new+0x...`
  `acclient!CFoo::AllocateBar+0x42`
  `acclient!CFoo::DoTheThing+0x18`
  `acclient!CGameLoop::Tick+0x...`
- The named function is `CFoo::AllocateBar`. Grep `references/acclient_2013_pseudo_c.txt` for `CFoo::AllocateBar` to read its body.
- Identify the paired free function (`CFoo::ReleaseBar`, `~CFoo`, etc.) and confirm by reading both.
- Find every call site of `CFoo::AllocateBar` (grep the pseudo-C for the function name) and verify each has a matching paired release. The one that doesn't is the bug.
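Both steps can be scripted. The Python sketch below assumes frames look like the example stack, listed innermost first. The list of allocator prefixes to skip is an assumption to extend as needed. `call_sites` is a plain regex over the pseudo-C, so it also hits the function's own definition line and cannot see calls made through vtables.

```python
import re
from typing import List, Optional, Sequence

# Assumption: frames with these prefixes are allocator plumbing, not the caller.
ALLOCATOR_PREFIXES = ("ntdll!", "acclient!operator new", "acclient!malloc")


def leaking_function(stack: Sequence[str]) -> Optional[str]:
    """Name of the first acclient frame above the allocator frames."""
    for frame in stack:
        frame = frame.strip()
        if frame.startswith(ALLOCATOR_PREFIXES) or not frame.startswith("acclient!"):
            continue
        return frame[len("acclient!"):].split("+", 1)[0]  # drop module and +0x offset
    return None


def call_sites(pseudo_c: str, func: str) -> List[int]:
    """1-based line numbers where `func(` appears as a whole name."""
    pattern = re.compile(r"(?<![\w:])" + re.escape(func) + r"\s*\(")
    return [n for n, line in enumerate(pseudo_c.splitlines(), 1) if pattern.search(line)]
```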
Decision gate: you have (a) the leaking function name, (b) the
specific call site that doesn't free, (c) a hypothesis for the
patch (typically: add a delete or Release() on a specific code
path). Save these to memory + a write-up file.
Phase 6 — Cross-reference with retail debugger trace (optional, target: 2 hours)
Goal: confirm the leak path is actually hit at runtime in a real play scenario, not just statically possible.
This step is optional but recommended if the leak path is conditional
(e.g. "only when the chat buffer wraps"). Use the cdb workflow
documented in templates/trace.cdb and the retail-debugger section
in CLAUDE.md. Attach to acclient.exe, breakpoint on the alloc
function with a non-blocking action (r $t0=@$t0+1; gc), let it
accumulate for 30 min, count hits, correlate with UMDH growth bytes.
Phase 7 — BinDiff to EoR (HOST MACHINE — not VM)
Goal: produce an EoR-binary signature and offset for the leaking function.
This phase does not happen on the VM. The Binary Ninja databases
live on the host (refs/acclient-eor-2024-09-11.bndb,
refs/acclient_2013-2024-09-11.bndb).
You (VM Claude) do:
- Write a structured handoff file `Z:\leak-hunt\phase7-handoff.md` containing: the function name, its 2013 RVA, the paired release function name, the suspected missing-free call site, the call-graph context, and a 32–48 byte AOB signature with wildcards over relocatable operands (cite the byte sequence from the pseudo-C/disassembly).
- Notify the user that Phase 7 is ready.
The user (or a Claude session on the host) does:
1. Load both BNDBs in Binary Ninja, run BinDiff (or use BN's native diff). Locate the matching function in EoR.
2. Verify the AOB signature still matches in EoR (small mods are OK; adjust wildcards as needed).
3. Write back to `Z:\leak-hunt\phase7-result.md`: EoR RVA, confirmed signature, any structural differences worth knowing.
You (VM Claude) resume once that file appears.
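Checking a wildcarded signature takes only a short scan. The Python sketch below works over the raw bytes of a binary. It uses the common `??` wildcard token, which is an assumption; Binary Ninja and other tools have their own syntax.

```python
from typing import List, Optional


def parse_aob(pattern: str) -> List[Optional[int]]:
    """'8B 08 ?? FF' -> [0x8B, 0x08, None, 0xFF]; None matches any byte."""
    return [None if tok in ("?", "??") else int(tok, 16) for tok in pattern.split()]


def aob_find_all(data: bytes, pattern: str) -> List[int]:
    """Every offset in data where the signature matches."""
    sig = parse_aob(pattern)
    anchor = next((i for i, b in enumerate(sig) if b is not None), None)
    if anchor is None:
        raise ValueError("signature has no concrete bytes")
    needle = bytes([sig[anchor]])
    hits: List[int] = []
    start = 0
    while True:
        j = data.find(needle, start + anchor)  # cheap scan for the first fixed byte
        if j < 0:
            return hits
        i = j - anchor
        if i + len(sig) <= len(data) and all(
            b is None or data[i + k] == b for k, b in enumerate(sig)
        ):
            hits.append(i)
        start = i + 1
```

A signature is only usable if it matches exactly once. It should give one hit in the 2013 binary before handoff, and one hit in EoR after any wildcard adjustments. Offsets here are file offsets; converting them to an RVA needs the section table.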
Phase 8 — Patch DLL (target: 1 day)
Goal: ship a DLL that, when loaded into acclient.exe (any
build that matches the signature), plugs the leak.
- Same scaffold as the controller DLL — MinHook + a launcher EXE.
- Hook the leaking function (or its caller, whichever is cleanest).
- Wrap the existing logic so the missing free is performed on the bug path.
- Provide a versioned filename and a small README so it's clear which client build it targets.
- Verify on the 2013 client first — same UMDH soak, expect the top growing stack to vanish.
Phase 9 — Multi-day soak validation (target: 5+ days)
Goal: prove the patch fixes the production crash.
- Install the patch DLL injector into the EoR client setup.
- Launch under the supervisor with snapshots every hour (lower freq — we're not hunting now, just confirming).
- Run a controlled activity profile (the bot's full rotation) for 5+ days continuous.
- Pass: no OOM crash, committed memory stable or decreasing slope. Fail: crash before 5 days → back to Phase 5 to find the second leak.
Phase 10 — Ship
- Write up findings:
- The leak's root cause (one paragraph)
- The patch's mechanism (one paragraph)
- The 2013-vs-EoR signature note
- Validation evidence (UMDH diffs, soak duration, growth-rate plot if you have one)
- Save the writeup to `Z:\leak-hunt\REPORT.md` and to memory.
- Notify the user. Stop the loop. Done.
7. Wake-up protocol
Use ScheduleWakeup to self-pace between phases. Default cadence
table:
| Situation | Delay | Why |
|---|---|---|
| Active analysis (reading UMDH diffs, writing code) | none — stay engaged | Full-context work |
| Between snapshots inside a soak (Phase 1/4/9) | 1500–1800 s | Cache stays warm-ish, snapshots accumulate |
| Overnight gap (≥6 h) | 3600 s and chain | One cache miss is cheap vs. burning per-hour |
| Waiting for user (Phase 7 handoff) | 3600 s | Poll for the result file |
Pass the same loop prompt each turn. The reason field should
identify the phase and what you'll check (e.g. "phase 4 snapshot check — read snap_012 and diff against snap_001").
8. When to stop and ask the user
- Phase 0 verification fails (PDB mismatch, login fails, ACE not reachable). Don't guess at fixes.
- The bot wedges and auto-recovery fails twice in a row.
- You're about to expand scope (refactor the supervisor into a framework, build a UI for snapshot review, port code into acdream's tree). Stop and ask. Default answer is no.
- You hit a decision gate where the data is genuinely ambiguous (e.g. growth rate is moderate but no single stack dominates).
- Phase 5 produces a function name that isn't in symbols.json. Probably means an indirect call or vtable dispatch — ask before spending hours decoding it.
- The patch in Phase 9 doesn't validate. Don't iterate indefinitely; surface findings and re-plan.
9. Memory protocol
Save findings as memory entries so a session that wakes 8 hours later can resume cold. Specifically:
- `project_leak_hunt.md`: top-level project context, current phase, open questions
- `leak_hunt_phase_N.md`: per-phase findings, growth rates, decisions
- `leak_hunt_candidate_<funcname>.md`: once a function is suspected, everything you know about it
- `feedback_leak_hunt_<topic>.md`: if the user gives operational feedback during this hunt, record it
Update MEMORY.md index entry for each. Keep entries short and
factual; long writeups go in Z:\leak-hunt\ files referenced from
memory.
10. Hard rules (do not violate)
- Don't run anything that touches the user's host machine. The VM is isolated for a reason. All output goes through the shared folder.
- Don't disable gflags+UST mid-run. If you need to disable, stop the supervisor, disable, take a fresh baseline.
- Don't modify `acclient.exe` on disk. All patches are runtime DLL hooks. If you ever feel tempted to binary-patch the exe directly, ask the user first.
- Don't auto-update `MEMORY.md` without first saving the underlying memory file. The index must point at real files.
- Don't claim a leak is found without the evidence checklist:
  - ≥3 consecutive UMDH diffs showing the same top stack growing
  - The stack is attributed to a named function in `symbols.json`
  - The call site is identified in `acclient_2013_pseudo_c.txt`
  - The hypothesis for the missing free is stated
- Don't proceed to Phase 8 without Phase 7 handoff complete. The patch must target the EoR signature, not the 2013 RVA.
11. First action when you start a fresh session
1. Read README.md (this file) end-to-end.
2. Read CLAUDE.md (project rules — concise).
3. Ask the user the configuration questions in §5.
4. Save their answers as memory.
5. Begin Phase 0.
Good hunting.