
# Retail AC Memory Leak Hunt

> **Status: COMPLETE 2026-05-22.** Five bugs found and patched in the
> retail AC client. In a controlled fleet soak, the unpatched control
> died at 26h from palette exhaustion; all 14 patched clients survived
> past that point and reached ≥5 days of uptime. The residual ~15 MB/h
> growth was traced to d3d9.dll's internal slab allocator and cannot
> be fixed from outside d3d9.
>
> If you just want to install: drop `dll/leakfix/dist/leakfix.dll`
> into your AC directory and run
> `python tools/install_leakfix.py "C:\path\to\AC"`. The installer
> patches `acclient.exe`'s import table to load `leakfix.dll` at
> startup. Idempotent — safe to re-run.

---

## What ships

| Patch | Bug | One-line fix |
|---|---|---|
| **v3b** | Palette refcount over-increment in `makeModifiedPalette` | NOP the `inc [eax+0x24]` at two sites |
| **v5**  | `RenderSurface::PurgeResource` is a no-op stub | Override vtable slot 2 to call `Destroy()` for real |
| **v11** | Two dangling-pointer dereferences in `delete_contents` + `~GXTri3Mesh` | Retarget one JMP past the deref; zero the slot and NOP the other deref + call |
| **v14** | `CEnvCell::Destroy` leaks the `ClipPlaneList` (just zeros the count) | Replace the 18-byte buggy block with a JMP to a thunk that actually frees the list |
| **v22** | Server-driven access violation (AV) in the unpacker function at `0x00526A50` (5-client mass crash 2026-05-21 09:00) | Wrap the function in `__try / __except`, return 0 on AV (which the engine already handles as the size-check-failure code path) |

All five, plus a crash handler, ship in `leakfix.dll`. Patches are
applied 30 seconds after process start (deferred so Decal/UB win their
own init race first). The crash handler is installed immediately, so
any crashes during the 30 s window are still captured.

## Patch pseudo-code

### v3b — palette refcount over-increment
The engine's palette-cache hit path increments the cached entry's
refcount **twice** (once in the cache lookup, once in the constructor
that wraps it). Result: the refcount grows monotonically, nothing ever
hits zero, and palettes accumulate until the 32-bit address space is
exhausted (~26h on heavy-loot clients).

```asm
; at 0x0053EFFE (and 0x0053F19C, the sibling overload)
;   before patch:                            after patch:
;   ff 40 24    inc dword ptr [eax+0x24]     90 90 90  nop nop nop
```

```c
// effect, expressed in C:
// before: refcount++ twice per cache hit
// after:  refcount++ once per cache hit (the outer increment is removed)
```
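
A byte patch like this is safest when it checks the original bytes before writing, so a re-run or a mismatched build is caught instead of corrupting code. The sketch below illustrates that check on an in-memory buffer. `apply_patch` and the toy buffer are hypothetical names for illustration; this is not the code in `leakfix.dll`, which patches the live process.

```python
def apply_patch(image: bytearray, offset: int, expected: bytes, replacement: bytes) -> bool:
    """Overwrite `expected` with `replacement` at `offset`.

    Returns True if the patch was written, False if it was already applied.
    Raises ValueError if the bytes at `offset` match neither state.
    """
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the instruction length")
    current = bytes(image[offset:offset + len(expected)])
    if current == replacement:
        return False              # already patched: nothing to do
    if current != expected:
        raise ValueError(f"unexpected bytes at {offset:#x}: {current.hex()}")
    image[offset:offset + len(expected)] = replacement
    return True


INC_REFCOUNT = bytes.fromhex("ff4024")   # inc dword ptr [eax+0x24]
NOP3 = bytes.fromhex("909090")           # nop nop nop

# Toy buffer standing in for the code around one patch site (assumption).
code = bytearray(b"\x8b\xc1" + INC_REFCOUNT + b"\xc3")
first = apply_patch(code, 2, INC_REFCOUNT, NOP3)    # writes the NOPs
second = apply_patch(code, 2, INC_REFCOUNT, NOP3)   # detects the existing patch
```

Keeping the replacement the same length as the original instruction is what lets the surrounding code stay untouched.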

### v5 — RenderSurface PurgeResource override
`RenderSurface`'s and `RenderTexture`'s `PurgeResource` virtual slot
points at `0x004154A0`, which is `mov al, 1; ret` — a no-op stub.
When the resource manager's purge sweep walks `s_Resources` and calls
`PurgeResource()` on each entry, the call returns "1 = purged" but
the resource's D3D handle + heap state is never touched. Result:
purged-shell accumulation in `s_Resources`.

```c
// before — at slot 2 of the RenderSurface vtable (0x0079A684):
//   PurgeResource = noop_stub;   // 0x004154A0
//   int noop_stub() { return 1; }
//
// after — slot 2 redirected to our thunk in leakfix.dll:
int purge_rendersurface_thunk(RenderSurface* self) {
    RenderSurface::Destroy(self);   // real cleanup
    return 1;                       // engine marks entry purged
}
// same fix mirrored to RenderTexture slot 2 (0x0079C1A0).
```

### v11 — two dangling-pointer crash guards
Two places where the engine dereferences a pointer that's been freed
elsewhere. Both manifest as AVs that take the process down.

**Site 1** — `delete_contents` hash walk (`0x00587126`):
The loop falls through into a dereference of an already-freed bucket
node when the bucket chain was rebuilt mid-walk. Fix: retarget the
JMP so the freed-bucket branch jumps to the epilogue, skipping the
deref.

```asm
;   before: eb 07    jmp +0x07   ; into the deref
;   after:  eb 42    jmp +0x42   ; into the epilogue (skip deref)
```
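
To see where each form of the jump lands, the target of a short jump (`EB rel8`) is the address of the next instruction plus the signed 8-bit displacement. The sketch below works that out, assuming `0x00587126` is the address of the `JMP` instruction itself (the text above gives it only as the site address).

```python
def short_jmp_target(insn_addr: int, rel8: int) -> int:
    """Target of an x86 `EB rel8` short jump located at insn_addr."""
    if rel8 >= 0x80:                 # rel8 is a signed byte
        rel8 -= 0x100
    # +2 is the length of the EB rel8 instruction itself.
    return (insn_addr + 2 + rel8) & 0xFFFFFFFF


SITE = 0x00587126                    # assumption: address of the JMP itself
before = short_jmp_target(SITE, 0x07)   # original target (the deref)
after = short_jmp_target(SITE, 0x42)    # patched target (the epilogue)
print(hex(before), hex(after))
```

Under that assumption the patch moves the landing point forward by `0x3B` bytes.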

**Site 2** — `~GXTri3Mesh` slot 0 deref (`0x005E565D`):
Destructor of `GXTri3Mesh` reads its slot[0] *then* zeros it. If
slot[0] is stale (some other path already freed it), the deref AVs.
Fix: reorder so we zero first; never deref a slot we can't trust.

```asm
;   before:                          after:
;   8B 08        mov ecx, [eax]      89 5E 08    mov [esi+8], ebx  ; zero first
;   50           push eax            90 ... 90   nop x6            ; skip deref + call
;   FF 51 08     call [ecx+8]
;   89 5E 08     mov [esi+8], ebx
```

### v14 — CEnvCell::ClipPlaneList leak
`CEnvCell::Destroy` contains an 18-byte cleanup block that **only
zeros `cplane_num`** — never frees the underlying `ClipPlaneList`
object hanging off `[this+0xDC]`. Every cell unload leaks one of
these. Replace the broken block with a `JMP` to a thunk in
leakfix.dll that does the real cleanup:

```c
// thunk pseudo-code:
void v14_clipplane_cleanup_thunk(CEnvCell* self) {
    ClipPlaneListWrapper* outer = self->cplane_wrapper;   // [esi+0xDC]
    if (outer) {
        ClipPlaneList* inner = outer->inner;              // [outer+0x0]
        if (inner) {
            inner->~ClipPlaneList();
            operator delete(inner);
        }
        operator delete[](outer);
        self->cplane_wrapper = nullptr;
    }
    // jump back to V14_RESUME_VA (just past the original 18-byte block)
}
```

### v22 — unpacker stale-pointer SEH guard
A small inline unpacker at `0x00526A50` pulls 4 DWORDs from
`arg1->buffer`. On 2026-05-21 the server fed five clients
simultaneously a buffer pointing into freed/kernel memory; all five
AV'd on the 4th deref. The engine *already* has a code path for
"buffer too small / unpack failed" (line 1 of the function checks a
size field and returns 0). We just wrap the whole function body in
SEH and route AVs to that same return-0 path.

```c
// 1. Copy the original 73 bytes of the function to executable memory.
// 2. Patch the original entry with JMP rel32 to our wrapper.
int v22_unpacker_wrapper(this, arg1, count) {
    __try {
        return original_copy(this, arg1, count);  // run the real unpacker
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        // log + return 0 (engine treats this as size-check failure)
        return 0;
    }
}
```
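
Step 2 above needs a 5-byte `JMP rel32` whose displacement is measured from the end of the jump instruction. The sketch below encodes one; `WRAPPER_VA` is a made-up address for illustration, since the real wrapper address depends on where `leakfix.dll` is loaded.

```python
import struct


def encode_jmp_rel32(src: int, dst: int) -> bytes:
    """Encode `JMP rel32` (opcode E9) placed at src so that it lands on dst."""
    rel = (dst - (src + 5)) & 0xFFFFFFFF   # relative to the next instruction
    return b"\xE9" + struct.pack("<I", rel)


UNPACKER_VA = 0x00526A50
WRAPPER_VA = 0x10001000      # hypothetical wrapper address inside leakfix.dll
patch = encode_jmp_rel32(UNPACKER_VA, WRAPPER_VA)
print(patch.hex(" "))
```

One caveat for step 1: if the copied body contains relative calls or jumps to code outside the copied range, their displacements need the same kind of recalculation at the new location. Whether this particular function contains any is not stated here.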

## Install

```powershell
# 1. Copy leakfix.dll into your AC directory
Copy-Item .\dll\leakfix\dist\leakfix.dll "C:\Turbine\Asheron's Call\"

# 2. Patch acclient.exe to import leakfix.dll
python tools\install_leakfix.py "C:\Turbine\Asheron's Call"

# 3. Verify
python tools\install_leakfix.py "C:\Turbine\Asheron's Call" verify
```

The installer adds a `.limport` PE section to acclient.exe containing
the rebuilt import table. It backs up the original to
`acclient.exe.bare_original` on first run, and is idempotent.
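
The backup-on-first-run behavior can be sketched as below. This is an illustration of the idea, not the code in `tools/install_leakfix.py`; `backup_once` is a hypothetical helper, and the demo runs against a stand-in file in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def backup_once(exe: Path, suffix: str = ".bare_original") -> bool:
    """Copy exe to exe+suffix unless that backup already exists."""
    backup = exe.with_name(exe.name + suffix)
    if backup.exists():
        return False              # keep the first (pristine) backup
    shutil.copy2(exe, backup)
    return True


# Demo in a throwaway directory with a stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    exe = Path(tmp) / "acclient.exe"
    exe.write_bytes(b"original")
    made_first = backup_once(exe)
    exe.write_bytes(b"patched")   # simulate the installer rewriting the exe
    made_second = backup_once(exe)
    backup_bytes = (Path(tmp) / "acclient.exe.bare_original").read_bytes()
```

Refusing to overwrite an existing backup is what keeps re-runs safe: the second run would otherwise replace the pristine copy with an already-patched one.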

## Roll back

```powershell
Copy-Item "C:\Turbine\Asheron's Call\acclient.exe.bare_original" `
          "C:\Turbine\Asheron's Call\acclient.exe" -Force
Remove-Item "C:\Turbine\Asheron's Call\leakfix.dll" -ErrorAction Ignore
```

## Files

- `dll/leakfix/src/` — DLL source (C++ with inline asm for the naked thunks)
- `dll/leakfix/dist/leakfix.dll` — current production build (117 KB)
- `dll/leakfix/build.bat` — build script (VS 2022 BuildTools required)
- `tools/install_leakfix.py` — patches acclient.exe to import leakfix.dll
- `tools/check_acclient_imports.py` — verify import table contains leakfix.dll
- `references/` — symbol table, pseudo-C, header for the 2013 client (PDB-backed)

The rest of this document is the original VM operator brief that
drove the investigation. Preserved for context but no longer
operationally relevant — the hunt is done.

---

# Retail AC Memory Leak Hunt — VM Operator Brief

**You are picking this up cold on a freshly-provisioned Windows VM.**
This document is your full mission brief. Read it end-to-end before
running anything, then drive the work autonomously, using
`ScheduleWakeup` (Claude Code) to pace long-running operations between
your active turns.

---

## 1. Mission

Find and patch a memory leak in the retail Asheron's Call client. The
production symptom is a hard crash after ~4–5 days of continuous play
on the **End-of-Retail (EoR, ~Jan 2017) client**. We don't have symbols
for that binary — but we have **full PDB symbols for the Sept 2013
v11.4186 client**, which almost certainly carries the same leak (AC was
in pure maintenance mode 2013→2017, very little net new code).

**The hunt happens on the 2013 client (symbolized).**
**The patch ships against the EoR client (via BinDiff-forward).**

### What "done" looks like

1. A specific function in the 2013 client is identified as the leak
   source, with evidence: monotonic UMDH growth across multiple
   snapshot diffs attributed to that function's call stack.
2. The corresponding function in the EoR client is located via
   BinDiff (this step happens on the **host machine**, not the VM —
   the BNDB files live there).
3. A DLL-injection patch is built that hooks the EoR function and
   plugs the leak (typically: adds a missing `delete`/`Release`/decref
   on a known path).
4. A 5+ day soak on EoR with the patch installed completes without
   the OOM crash that reproduces unpatched in the same window.

### Hard scope boundary

This is a self-contained side quest. **Do not** expand it into a
general retail-instrumentation framework, a fork of the controller
DLL into a fully-featured bot, a parallel acdream feature, or "while
I'm here" refactors of the AC2D/Mosswart tooling. Find leak → patch →
validate → ship → done. If you catch yourself reaching for adjacent
work, stop and re-read this paragraph.

---

## 2. Why this works (assumptions you can rely on)

- **Compiler & toolchain stability.** 2013 and EoR were both built with
  the same VC++ family on the same Turbine build farm. Binary structure
  is highly similar.
- **Code stability.** AC went into maintenance after Throne of
  Destiny (2005) and stayed there. Most of the codebase did not change
  meaningfully between 2013 and EoR. A leak severe enough to crash in
  4–5 days has almost certainly been present for many years.
- **PDB → BinDiff path is mature.** `BinDiff` and `Diaphora` routinely
  achieve 80–95% function-match rates across related VC++ binaries.
  Once you identify the leaking function in 2013 (with name), porting
  the symbol forward to EoR is signature-scan-able.

### What you're betting on, and the fallback

- **Primary bet:** the leak repros on the 2013 client. UMDH on the
  2013 client + activity bot reveals it within hours-to-days. You
  identify a named function, hand the name to the host for BinDiff,
  receive the EoR signature back, build the patch DLL, validate.
- **Fallback:** the leak does NOT repro on 2013 — i.e. it was
  introduced after Sept 2013. In that case, you fall back to hunting
  on the EoR client without symbols, using BinDiff-transferred names
  for whatever functions match the 2013 codebase. This is slower but
  still feasible. The primary-vs-fallback determination is **Phase 1
  Decision Gate** below.

---

## 3. Package contents

```
leak-hunt-vm-2026-05-12/
├── README.md                ← you are here
├── MANIFEST.md              ← list of out-of-repo files copied in
├── CLAUDE.md                ← VM-side project rules (persistent)
├── templates/
│   ├── supervisor.ps1       ← skeleton — start ACE, start client, snapshot loop
│   ├── snapshot.ps1         ← UMDH single-shot
│   ├── activity-phases.json ← phase schedule template
│   ├── login.ahk            ← AutoHotkey login skeleton
│   └── trace.cdb            ← cdb scripting template
├── tools/
│   ├── check_exe_pdb.py     ← verify binary ↔ PDB GUID match
│   ├── dump_pdb_info.py     ← PDB metadata
│   └── pdb_extract.py       ← regenerate symbols.json if needed
├── pdb/
│   └── acclient.pdb         ← (29 MB, copied per MANIFEST)
└── references/
    ├── symbols.json         ← 18,366 named functions + addresses (grep-friendly)
    ├── types.json           ← 5,371 struct/class type definitions
    ├── acclient.h           ← verbatim retail header structs
    └── acclient_2013_pseudo_c.txt  ← 64 MB symbolized Binary Ninja pseudo-C
```

The Python tools are stdlib-only (no pip). Everything else is data.

---

## 4. What you need on the VM (one-time, before starting)

If any of these is missing, **ask the user before guessing**.

| Component | Where | Notes |
|---|---|---|
| Retail AC client (2013 v11.4186) | `C:\Turbine\Asheron's Call\` | Standard install path. Verify match with `check_exe_pdb.py` before any other work. The `_NT_SYMBOL_PATH` must include `pdb/`. |
| Retail AC dat files | inside the install | `client_portal.dat`, `client_cell_1.dat`, `client_highres.dat`, `client_local_English.dat` |
| ACE server | `127.0.0.1:9000` on VM | Use ACEmulator from github.com/ACEmulator/ACE. Same config as user's dev box. Confirm it accepts logins before continuing. |
| Test character | on the VM's ACE | Suggested name: `+Leakhunt`. GM-marker `+` so debug commands are available. |
| Windows Debugging Tools | Microsoft Store WinDbg or Win10/11 SDK | Need `cdb.exe`, `umdh.exe`, `gflags.exe`. 32-bit (`x86`) versions — `acclient.exe` is 32-bit. |
| AutoHotkey v2 | autohotkey.com | For login automation. v2 only — templates assume v2 syntax. |
| Sysinternals `procdump` | sysinternals.com | Crash-dump capture. |
| MinHook (optional, for patch DLL) | github.com/TsudaKageyu/minhook | Only needed at Phase 8. Defer. |
| Shared folder or mounted drive | `Z:\` or similar | For passing snapshots back to host. Configure at VM-setup time. |

---

## 5. Configuration questions to ask the user at session start

**Ask these first, before running anything.** They materially affect
the harness.

1. **Where is ACE running** — same VM (recommended; snapshot-clean) or
   on the host with VM networking through to it? Default assumption:
   same VM.
2. **What's the AC install path** if it's not the standard
   `C:\Turbine\Asheron's Call\`?
3. **Output flow** — shared folder path? Or push artifacts to a git
   branch (e.g. `leak-hunt-vm/2026-05-12`)? Default: shared folder
   to `Z:\leak-hunt\` on host.
4. **Test character name** on the VM ACE? Default: `+Leakhunt`.
5. **VM specs** — RAM and core count? (Affects whether to enable
   gflags+UST from the start, which costs ~20–30% perf.)
6. **EoR binary location on host** — confirm the user has it at
   `C:\Users\erikn\source\repos\acdream\refs\acclient-eor-2024-09-11.bndb`
   (Binary Ninja db). This isn't needed on the VM but is critical for
   Phase 7 BinDiff on the host.
7. **Wake-up cadence preference** — do they want you to use
   `ScheduleWakeup` for hours-long gaps, or stay continuously
   active? Default: ScheduleWakeup for any gap > 30 min.

Save the user's answers as memory entries before proceeding past
Phase 0 so a future session can pick up cold.

---

## 6. Phased plan

Each phase has a **goal**, **commands**, **decision gate**, and
**estimated time**. Don't skip ahead. Don't run multiple phases in
parallel until Phase 4.

### Phase 0 — Verify the bench (target: 30 min)

**Goal:** prove the environment can launch AC, log in, and observe
memory.

1. `py tools/check_exe_pdb.py "C:\Turbine\Asheron's Call\acclient.exe"`.
   Expect: `=== MATCH: this exe pairs with our acclient.pdb ===`.
   If MISMATCH → stop, ask the user which build they installed.
2. `py tools/dump_pdb_info.py pdb/acclient.pdb`. Confirm GUID
   `9e847e2f-777c-4bd9-886c-22256bb87f32`, age 1.
3. Start ACE locally (`dotnet run` in the ACE checkout, or
   `ACE.exe` if pre-built). Confirm it listens on `127.0.0.1:9000`.
4. Manually launch AC, log in with the test character, walk one
   step, log out. **This proves the bench works before you add
   instrumentation.**
5. Take a clean Hyper-V / VMware snapshot named `bench-verified`.
   The supervisor will revert to this before each run.

**Decision gate:** can you launch, log in, walk, log out, clean?
If no, fix this before anything else. If yes, proceed.

---

### Phase 1 — Idle baseline + decide hunt platform (target: 4 hours)

**Goal:** does the leak reproduce on the 2013 client when the player
sits at the lifestone doing nothing? If yes, primary plan; if no,
Phase 2 will find the right activity profile.

1. Enable the user-mode stack trace database (UST), which UMDH needs
   to attribute allocations to call stacks:
   `gflags /i acclient.exe +ust`. This is registry-set; survives
   reboots. (Disable later with `gflags /i acclient.exe -ust`.)
2. Set `_NT_SYMBOL_PATH=<vm-path-to-pdb-dir>`.
3. Launch AC via `templates/supervisor.ps1` (which sets env and
   spawns the process). Log in manually OR via `templates/login.ahk`.
4. Walk to a quiet spot (lifestone interior, away from spawn). Sit.
5. `templates/snapshot.ps1 -ProcessId <pid> -Out snap_001.txt`
   immediately. Wait 30 min. Take `snap_002`. Repeat for at least
   4 hours (9 snapshots at 30-min spacing, `snap_001` to `snap_009`).
6. `umdh -d snap_001.txt snap_009.txt -f:diff_idle.txt`. Read top 20
   growing stacks. Save the diff to `Z:\leak-hunt\phase1\`.

**Decision gate:**
- **Total committed memory grew >50 MB over 4h idle?** Leak repros at
  idle. Skip Phase 2, jump to Phase 4 (long-soak idle).
- **Total committed grew 5–50 MB?** Leak may need amplification.
  Proceed to Phase 2.
- **Total committed grew <5 MB?** Leak is activity-specific or
  doesn't exist on 2013. Proceed to Phase 2.
- **Memory dropped or oscillates around 0?** No leak signal at idle.
  Phase 2 is where you'll find it (or won't).

Record the baseline growth-rate number in memory:
`leak_hunt_phase1_baseline_mb_per_hour`.
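
The gate and the baseline number can be computed mechanically. The sketch below encodes the thresholds from the list above; the function names and the sample readings are made up for illustration, and the "dropped or oscillates" case is folded into the lowest band because both lead to Phase 2.

```python
def growth_rate_mb_per_hour(first_mb: float, last_mb: float, hours: float) -> float:
    """Average committed-memory growth between two readings."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (last_mb - first_mb) / hours


def phase1_gate(growth_mb: float) -> str:
    """Map total committed growth over the 4 h idle run to the next step."""
    if growth_mb > 50:
        return "leak repros at idle: skip Phase 2, go to Phase 4"
    if growth_mb >= 5:
        return "leak may need amplification: go to Phase 2"
    return "activity-specific or absent at idle: go to Phase 2"


# Made-up committed-memory readings (MB), for illustration only.
first_mb, last_mb = 340.0, 412.0
rate = growth_rate_mb_per_hour(first_mb, last_mb, 4.0)
decision = phase1_gate(last_mb - first_mb)
print(f"{rate:.1f} MB/h -> {decision}")
```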

---

### Phase 2 — Activity-phase characterization (target: 1–2 days)

**Goal:** find which player activity causes the leak. The bot is not
yet built — you drive this manually with the activity-phase template
running as an AHK macro, or by playing 30-min phases yourself if no
bot is available.

The five canonical phases (see `templates/activity-phases.json`):

1. **idle** — stand at lifestone, no input
2. **wander** — walk a fixed route around Holtburg
3. **chat** — spam say/tell/global chat
4. **target-cycle** — Tab through nearby NPCs/mobs, no combat
5. **ui-cycle** — open/close inventory, character pane, spells

**Procedure per session:**
1. Start fresh from `bench-verified` snapshot.
2. Run a single phase for 1 hour with snapshots every 15 min.
3. `umdh -d` diff the first and last snapshots for that phase.
4. Record growth-rate per phase to memory.
5. Repeat for each phase, single VM, single phase per run.

(If user has authorized multiple parallel VMs, run different phases
simultaneously instead of sequentially.)

**Decision gate:** rank phases by growth rate. The top phase is your
target for Phase 4 amplification. Save ranking to memory.
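
Ranking is a one-liner once each phase has a growth figure. A minimal sketch, with invented growth values standing in for real measurements:

```python
def rank_phases(growth_mb: dict, hours: float = 1.0) -> list:
    """Return (phase, MB/h) pairs, fastest-growing first."""
    rates = {phase: mb / hours for phase, mb in growth_mb.items()}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Invented per-phase growth (MB over a 1 h run), for illustration only.
sample = {"idle": 1.5, "wander": 22.0, "chat": 3.0,
          "target-cycle": 4.5, "ui-cycle": 9.0}
ranking = rank_phases(sample)
top_phase = ranking[0][0]
```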

---

### Phase 3 — Controller DLL (target: 1–2 days)

**Goal:** build a small DLL that drives the leaking phase
deterministically and reproducibly, faster than a human can.

**Build approach:**
- C++ DLL, 32-bit, compiled against Visual Studio Build Tools.
- MinHook for function hooking.
- LoadLibrary'd into `acclient.exe` via a small launcher EXE
  (CreateProcess SUSPENDED → WriteProcessMemory of LoadLibrary
  trampoline → ResumeThread). Standard injection pattern.
- Hook a frame-loop function from `symbols.json` — search
  `references/symbols.json` for `CGameLoop`, `WorldFilter`, `Tick`,
  `ProcessFrame` and pick the highest-frequency one with stable
  signature.
- Call retail functions directly via PDB-resolved addresses. Examples:
  `CPhysicsObj::set_velocity`, `CChatManager::SendSay`,
  `CPlayerSystem::SelectTarget`. These take a `this` pointer in `ecx`
  (thiscall) — you'll need a small asm trampoline or use
  `__thiscall` calling-convention helpers.

**The bot's job:**
- Drive the top-ranked Phase 2 activity continuously.
- Emit a heartbeat to a log file every 30s so the supervisor can
  detect wedging.
- Run a position watchdog: if `CPhysicsObj::position`
  hasn't changed in 5 min during a movement phase, signal the
  supervisor to revert and retry.
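
The two liveness checks above can be sketched on the supervisor side as follows. The 30 s interval and 5 min window come from the list; the allowance of two missed heartbeats, the function names, and the sample data are assumptions for illustration.

```python
def is_wedged(last_heartbeat: float, now: float,
              interval_s: float = 30.0, missed_allowed: int = 2) -> bool:
    """True if more heartbeats were missed than we tolerate."""
    return (now - last_heartbeat) > interval_s * (missed_allowed + 1)


def position_stalled(samples: list, window_s: float = 300.0) -> bool:
    """samples: (timestamp, position) pairs in time order.

    True if the most recent position has been unchanged for window_s or more.
    """
    if not samples:
        return False
    last_t, last_pos = samples[-1]
    unchanged_since = last_t
    for t, pos in reversed(samples):
        if pos != last_pos:
            break
        unchanged_since = t
    return last_t - unchanged_since >= window_s


# Illustrative samples: the character stops moving at t=100.
track = [(0, (1, 1, 1)), (100, (2, 2, 2)), (200, (2, 2, 2)), (400, (2, 2, 2))]
stalled = position_stalled(track)
wedged = is_wedged(last_heartbeat=0.0, now=91.0)
```

Tolerating a couple of missed heartbeats avoids reverting the VM over a single slow frame or a delayed log flush.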

**Reuse opportunity:** the user maintains MosswartMassacre and
MosswartOverlord — both are AC client DLL-injection projects. **Ask
the user for read access** before designing from scratch; they may
have a working injector + MinHook scaffolding you can port from in
hours rather than days. Do not assume; ask.

**Decision gate:** bot runs the leaking phase for 1 hour
unattended, emits heartbeats, produces measurable UMDH growth.

---

### Phase 4 — Long-soak with amplification (target: 12–48 hours)

**Goal:** generate a clean signal — one or two leaking call stacks
visibly dominate the UMDH diff.

1. Revert VM to `bench-verified`.
2. Launch via `supervisor.ps1` with the controller DLL injected.
3. Snapshot every 15 min for 12+ hours.
4. `umdh -d` snap_001 vs snap_N every couple of hours during your
   active turns. Between active turns, use `ScheduleWakeup` with
   delay 1800–3600s and the reason `"long-soak snapshot check"`.

**Decision gate:** UMDH diff shows one or more call stacks with
monotonic growth across all adjacent-pair diffs, dominating the
total by ≥10× over the next-highest. That's your leak candidate(s).
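
One way to read this gate in code: rank stacks by total growth, find the first point where a stack leads the next one by the dominance factor, and accept the stacks above that point only if each grew in every adjacent snapshot pair. The sketch below implements that reading. The input shape (stack id mapped to outstanding bytes per snapshot) and the names are assumptions, and parsing UMDH output into that shape is left out.

```python
def is_monotonic(counts: list) -> bool:
    """True if every adjacent pair shows growth."""
    return all(b > a for a, b in zip(counts, counts[1:]))


def leak_candidates(stacks: dict, dominance: float = 10.0) -> list:
    """stacks maps a stack id to outstanding bytes at each snapshot."""
    growth = {name: counts[-1] - counts[0] for name, counts in stacks.items()}
    ranked = sorted(growth, key=growth.get, reverse=True)
    for k in range(1, len(ranked)):
        leader, runner_up = growth[ranked[k - 1]], growth[ranked[k]]
        if leader > 0 and leader >= dominance * max(runner_up, 0):
            head = ranked[:k]
            return head if all(is_monotonic(stacks[n]) for n in head) else []
    return []   # no clear gap between stacks: the signal is not clean yet


# Invented byte counts for three stacks across four snapshots.
example = {
    "stack_A": [0, 100_000, 200_000, 300_000],
    "stack_B": [0, 5_000, 3_000, 10_000],
    "stack_C": [0, 1_000, 2_000, 3_000],
}
candidates = leak_candidates(example)
```

An empty result corresponds to the ambiguous case where no stack dominates.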

---

### Phase 5 — Identify the leaking function (target: 2–4 hours)

**Goal:** convert the UMDH call stack into a named function we can
study and patch.

1. The top growing stack will look like:
   ```
   ntdll!RtlAllocateHeap+0x...
   acclient!operator new+0x...
   acclient!CFoo::AllocateBar+0x42
   acclient!CFoo::DoTheThing+0x18
   acclient!CGameLoop::Tick+0x...
   ```
2. The named function is `CFoo::AllocateBar`. Grep
   `references/acclient_2013_pseudo_c.txt` for `CFoo::AllocateBar`
   to read its body.
3. Identify the paired free function (`CFoo::ReleaseBar`,
   `~CFoo`, etc.) and confirm by reading both.
4. Find every call site of `CFoo::AllocateBar` (grep the pseudo-C
   for the function name) and verify each has a matching paired
   release. The one that doesn't is the bug.

**Decision gate:** you have (a) the leaking function name, (b) the
specific call site that doesn't free, (c) a hypothesis for the
patch (typically: add a `delete` or `Release()` on a specific code
path). Save these to memory + a write-up file.

---

### Phase 6 — Cross-reference with retail debugger trace (optional, target: 2 hours)

**Goal:** confirm the leak path is actually hit at runtime in a real
play scenario, not just statically possible.

This step is optional but recommended if the leak path is conditional
(e.g. "only when the chat buffer wraps"). Use the cdb workflow
documented in `templates/trace.cdb` and the retail-debugger section
in `CLAUDE.md`. Attach to acclient.exe, breakpoint on the alloc
function with a non-blocking action (`r $t0=@$t0+1; gc`), let it
accumulate for 30 min, count hits, correlate with UMDH growth bytes.

---

### Phase 7 — BinDiff to EoR (**HOST MACHINE — not VM**)

**Goal:** produce an EoR-binary signature and offset for the leaking
function.

This phase does not happen on the VM. The Binary Ninja databases
live on the host (`refs/acclient-eor-2024-09-11.bndb`,
`refs/acclient_2013-2024-09-11.bndb`).

**You (VM Claude) do:**
1. Write a structured handoff file
   `Z:\leak-hunt\phase7-handoff.md` containing: the function name,
   its 2013 RVA, the paired release function name, the suspected
   missing-free call site, the call-graph context, a 32–48 byte
   AOB signature with wildcards over relocatable operands (cite
   the byte sequence from the pseudo-C/disassembly).
2. Notify the user that Phase 7 is ready.

**The user (or a Claude session on the host) does:**
3. Load both BNDBs in Binary Ninja, run BinDiff (or use BN's native
   diff). Locate the matching function in EoR.
4. Verify the AOB signature still matches in EoR (small mods are
   OK — adjust wildcards as needed).
5. Write back to `Z:\leak-hunt\phase7-result.md`: EoR RVA,
   confirmed signature, any structural differences worth knowing.

You (VM Claude) resume once that file appears.
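
A wildcard AOB signature of the kind described in step 1 can be checked against a binary with a simple scanner. The sketch below is a minimal pure-Python version; the `??` wildcard notation and the sample bytes are assumptions for illustration, not a signature taken from either client.

```python
from typing import List, Optional


def parse_aob(pattern: str) -> List[Optional[int]]:
    """Parse 'E8 ?? ?? ?? ?? 8B 08' into byte values, None for wildcards."""
    sig = [None if tok in ("?", "??") else int(tok, 16) for tok in pattern.split()]
    if not sig:
        raise ValueError("empty signature")
    return sig


def find_aob(data: bytes, pattern: str) -> List[int]:
    """Return every offset in data where the signature matches."""
    sig = parse_aob(pattern)
    hits = []
    for i in range(len(data) - len(sig) + 1):
        if all(b is None or data[i + j] == b for j, b in enumerate(sig)):
            hits.append(i)
    return hits


# Made-up bytes: a call with a relocatable operand followed by fixed code.
blob = bytes.fromhex("90 e8 11 22 33 44 8b 08 50 ff 51 08 c3")
hits = find_aob(blob, "E8 ?? ?? ?? ?? 8B 08")
```

A signature is only usable for the patch if it matches exactly once in the target binary, so the length of `hits` is the thing to check on both the 2013 and EoR executables.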

---

### Phase 8 — Patch DLL (target: 1 day)

**Goal:** ship a DLL that, when loaded into `acclient.exe` (any
build that matches the signature), plugs the leak.

- Same scaffold as the controller DLL — MinHook + a launcher EXE.
- Hook the leaking function (or its caller, whichever is cleanest).
- Wrap the existing logic so the missing free is performed on the
  bug path.
- Provide a versioned filename and a small README so it's clear
  which client build it targets.
- **Verify on the 2013 client first** — same UMDH soak, expect the
  top growing stack to vanish.

---

### Phase 9 — Multi-day soak validation (target: 5+ days)

**Goal:** prove the patch fixes the production crash.

1. Install the patch DLL injector into the EoR client setup.
2. Launch under the supervisor with snapshots every hour (lower
   freq — we're not hunting now, just confirming).
3. Run a controlled activity profile (the bot's full rotation) for
   5+ days continuous.
4. **Pass:** no OOM crash, committed memory stable or decreasing
   slope. **Fail:** crash before 5 days → back to Phase 5 to find
   the second leak.

---

### Phase 10 — Ship

1. Write up findings:
   - The leak's root cause (one paragraph)
   - The patch's mechanism (one paragraph)
   - The 2013-vs-EoR signature note
   - Validation evidence (UMDH diffs, soak duration, growth-rate
     plot if you have one)
2. Save the writeup to `Z:\leak-hunt\REPORT.md` and to memory.
3. Notify the user. Stop the loop. Done.

---

## 7. Wake-up protocol

Use `ScheduleWakeup` to self-pace between phases. **Default cadence
table:**

| Situation | Delay | Why |
|---|---|---|
| Active analysis (reading UMDH diffs, writing code) | none — stay engaged | Full-context work |
| Between snapshots inside a soak (Phase 1/4/9) | 1500–1800 s | Cache stays warm-ish, snapshots accumulate |
| Overnight gap (≥6 h) | 3600 s and chain | One cache miss is cheap vs. burning per-hour |
| Waiting for user (Phase 7 handoff) | 3600 s | Poll for the result file |

Pass the same loop prompt each turn. The `reason` field should
identify the phase and what you'll check (e.g. `"phase 4 snapshot
check — read snap_012 and diff against snap_001"`).

---

## 8. When to stop and ask the user

- **Phase 0 verification fails** (PDB mismatch, login fails, ACE
  not reachable). Don't guess at fixes.
- **The bot wedges and auto-recovery fails twice in a row.**
- **You're about to expand scope** (refactor the supervisor into a
  framework, build a UI for snapshot review, port code into
  acdream's tree). Stop and ask. Default answer is no.
- **You hit a decision gate where the data is genuinely ambiguous**
  (e.g. growth rate is moderate but no single stack dominates).
- **Phase 5 produces a function name that isn't in symbols.json.**
  Probably means an indirect call or vtable dispatch — ask before
  spending hours decoding it.
- **The patch in Phase 9 doesn't validate.** Don't iterate
  indefinitely; surface findings and re-plan.

---

## 9. Memory protocol

Save findings as memory entries so a session that wakes 8 hours later
can resume cold. Specifically:

- `project_leak_hunt.md` — top-level project context, current phase,
  open questions
- `leak_hunt_phase_N.md` — per-phase findings, growth rates, decisions
- `leak_hunt_candidate_<funcname>.md` — once a function is suspected,
  everything you know about it
- `feedback_leak_hunt_<topic>.md` — if the user gives operational
  feedback during this hunt, record it

Update `MEMORY.md` index entry for each. Keep entries short and
factual; long writeups go in `Z:\leak-hunt\` files referenced from
memory.

---

## 10. Hard rules (do not violate)

1. **Don't run anything that touches the user's host machine.** The
   VM is isolated for a reason. All output goes through the shared
   folder.
2. **Don't disable gflags+UST mid-run.** If you need to disable,
   stop the supervisor, disable, take a fresh baseline.
3. **Don't modify `acclient.exe` on disk.** All patches are
   runtime DLL hooks. If you ever feel tempted to binary-patch the
   exe directly, ask the user first.
4. **Don't auto-update `MEMORY.md`** without first saving the
   underlying memory file. The index must point at real files.
5. **Don't claim a leak is found** without the evidence checklist:
   - ≥3 consecutive UMDH diffs showing the same top stack growing
   - The stack is attributed to a named function in `symbols.json`
   - The call site is identified in `acclient_2013_pseudo_c.txt`
   - The hypothesis for the missing free is stated
6. **Don't proceed to Phase 8 without Phase 7 handoff complete.**
   The patch must target the EoR signature, not the 2013 RVA.

---

## 11. First action when you start a fresh session

```
1. Read README.md (this file) end-to-end.
2. Read CLAUDE.md (project rules — concise).
3. Run the configuration questions in §5 by the user.
4. Save their answers as memory.
5. Begin Phase 0.
```

Good hunting.