Initial commit — leak-hunt project complete

Five bugs identified and patched in retail Asheron's Call client:
- v3b: palette refcount over-increment (3-byte NOP at two sites)
- v5: RenderSurface PurgeResource no-op stub (vtable slot 2 thunk)
- v11: two dangling-pointer crash guards (NULL-check + reorder)
- v14: CEnvCell::Destroy ClipPlaneList leak (18-byte JMP to cleanup thunk)
- v22: unpacker stale-pointer SEH guard (whole-function __try/__except)

All five ship in leakfix.dll (117 KB, SHA d282f23c…) which is loaded
by acclient.exe at process start via PE import table patching by
tools/install_leakfix.py.

Controlled 15-client fleet soak: unpatched control died at 26h with
palette exhaustion; all 14 patched clients survived past that point
and reached ≥5-day uptime.

Residual ~15 MB/h growth traced to d3d9.dll's internal slab allocator
(260 KB surface backing buffers retained after Release). See REPORT.md
§10 for the full investigation; conclusion is that it's unfixable from
outside d3d9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
acbot 2026-05-23 21:05:17 +02:00
commit 57b5e43d0e
199 changed files with 1648333 additions and 0 deletions

dll/DESIGN.md Normal file

@ -0,0 +1,319 @@
# leakfix.dll — Standalone Native Patch DLL
## Goal
Consolidate all runtime patches (v3b, v5, v11, v12, v14) **plus** add a
periodic CObjCell/LongHash cleanup sweep that's impossible at the
byte-patching level. Ship as a single native 32-bit DLL + tiny launcher
EXE. No Decal dependency.
## Why now
- Per-client byte patching works but doesn't scale to the residual
~78 MB/hr CPhysicsObj-family leak (requires real cleanup loops, not
inline thunks).
- The Python patchers re-apply on every restart via the monitor —
brittle. A DLL loads with the process.
- Native code = clean crash dumps at real fault sites (no CLR wrapping
like UB's `System.AccessViolationException` issue).
## Tech stack
- **Language:** C++17, MSVC `cl.exe` (verified working: `MSVC 14.44.35207`).
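- **Target:** 32-bit x86 (the toolset `vcvars32` selects). MSVC's x86 default is `/arch:SSE2`; `/arch:IA32` must be passed explicitly if SSE2 code generation is to be avoided.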
- **Runtime:** static link (`/MT`) → no extra runtime DLL dependency.
- **Hooking:** MinHook (small BSD-2-Clause C library, vendored as source under `src/minhook/`) for frame-tick detour.
- **AC struct mirrors:** subset of `references/acclient.h`.
## Project layout
```
dll/
├── DESIGN.md # this file
├── leakfix/
│ ├── build.bat # one-shot build via vcvars32
│ ├── src/
│ │ ├── dllmain.cpp # DllMain, patch application, hook install
│ │ ├── patches.cpp # v3b, v5, v11, v12, v14 application
│ │ ├── thunks.cpp # inline-asm thunks (v14 ClipPlaneList, v5 purge)
│ │ ├── sweep.cpp # periodic CObjCell/LongHash cleanup
│ │ ├── hook.cpp # MinHook wiring for frame-tick detour
│ │ ├── logging.cpp # rolling log file
│ │ ├── ac_addrs.h # EoR address constants
│ │ ├── ac_types.h # struct mirrors
│ │ └── minhook/ # vendored MinHook source
│ └── injector/
│ └── inject.cpp # CreateProcess(suspended) + LoadLibraryA inject
└── test/ # hello.dll already verified
```
## Patch porting plan
Each existing Python patcher becomes a few lines of C++ that runs in
`DllMain` on `DLL_PROCESS_ATTACH`.
### v3b — palette NOP (trivial port)
```cpp
WriteCode(0x0053EFFE, "\x90\x90\x90", 3);
WriteCode(0x0053F19C, "\x90\x90\x90", 3);
```
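`WriteCode` is not defined in this document. A minimal sketch of what it could look like follows, assuming the call shape used above (absolute address, byte string, length). The `#ifdef _WIN32` split is an assumption made for illustration: it lets the same function be exercised against a scratch buffer on any platform when comparing byte output with the Python patcher.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Sketch of a WriteCode helper matching the call shape used above.
// On Windows the target range is made writable for the copy, the old
// protection is restored, and the instruction cache is flushed.
// Elsewhere it is a plain memcpy, which is enough to unit-test the
// byte output against a scratch buffer.
bool WriteCode(std::uintptr_t addr, const char* bytes, std::size_t len) {
    assert(bytes != nullptr && len > 0);
    void* dst = reinterpret_cast<void*>(addr);
#ifdef _WIN32
    DWORD old_protect = 0;
    if (!VirtualProtect(dst, len, PAGE_EXECUTE_READWRITE, &old_protect))
        return false;
#endif
    std::memcpy(dst, bytes, len);
#ifdef _WIN32
    DWORD unused = 0;
    VirtualProtect(dst, len, old_protect, &unused);
    FlushInstructionCache(GetCurrentProcess(), dst, len);
#endif
    return true;
}
```

Returning `bool` rather than `void` lets `DllMain` log a failed patch and continue with the remaining ones instead of faulting on a write to a read-only page.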
### v5 — RenderSurface PurgeResource vtable override
The current 10-byte thunk becomes a real function:
```cpp
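typedef void (__thiscall *DestroyFn)(void* self);
// Integer-to-pointer casts are not constant expressions, so these are
// plain const rather than constexpr.
static const auto RENDERSURFACE_DESTROY = reinterpret_cast<DestroyFn>(0x00444540);
static const auto RENDERTEXTURE_DESTROY = reinterpret_cast<DestroyFn>(0x0044C4F0);
// MSVC rejects __thiscall on free functions (error C3865), so the
// replacements are __fastcall with an unused EDX argument: `this` still
// arrives in ECX and there are no stack arguments to clean up, which is
// what a __thiscall caller expects.
int __fastcall purge_rendersurface(void* self, void* /*edx*/) {
    RENDERSURFACE_DESTROY(self);
    return 1;
}
int __fastcall purge_rendertexture(void* self, void* /*edx*/) {
    RENDERTEXTURE_DESTROY(self);
    return 1;
}
void apply_v5() {
    WriteVtableSlot(0x0079A684, reinterpret_cast<void*>(&purge_rendersurface));
    WriteVtableSlot(0x0079C1A0, reinterpret_cast<void*>(&purge_rendertexture));
}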
```
Replaces VirtualAllocEx + 10-byte thunk with proper function pointers
inside our DLL's .text.
### v11 — NULL-check NOPs
Two byte-level rewrites identical to Python patcher.
### v12 — unpacker validator + dispatch redirect
- Patcher allocates a 29-byte validator thunk + rewrites a dispatch
table entry.
- C++ version: validator becomes a `__declspec(naked)` function;
dispatch table entry becomes a function pointer.
### v14 — CEnvCell ClipPlaneList fix
Replace 18 bytes at `0x0052E661` with a 5-byte JMP into a naked
function:
```cpp
__declspec(naked) void clipplane_cleanup_thunk() {
    __asm {
        pushad
        mov edi, [esi + 0xDC]
        test edi, edi
        jz done
        mov ecx, [edi]
        test ecx, ecx
        jz free_outer
        push ecx
        mov eax, 0x0053C760 ; ClipPlaneList::~ClipPlaneList
        call eax
        pop ecx
        push ecx
        mov eax, 0x005DF15E ; operator delete
        call eax
        add esp, 4
    free_outer:
        push edi
        mov eax, 0x005DF164 ; operator delete[]
        call eax
        add esp, 4
        mov [esi + 0xDC], ebx
    done:
        popad
        push 0x0052E673 ; resume
        ret
    }
}
```
Then install a 5-byte `E9 rel32` from `0x0052E661` to `clipplane_cleanup_thunk`,
followed by 13 NOPs.
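The `rel32` operand of an `E9` jump is measured from the end of the 5-byte instruction, not from the patch site itself. The sketch below builds the full 18-byte replacement under that rule; `make_jmp_patch` and `kPatchLen` are hypothetical names, and the thunk address would be whatever `&clipplane_cleanup_thunk` resolves to once the DLL is loaded.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPatchLen = 18;  // bytes replaced at the patch site

// Build `E9 rel32` followed by NOP padding up to kPatchLen bytes.
// rel32 = target - (site + 5); unsigned wraparound yields the correct
// two's-complement displacement for backward jumps as well.
std::array<std::uint8_t, kPatchLen> make_jmp_patch(std::uint32_t site,
                                                   std::uint32_t target) {
    std::array<std::uint8_t, kPatchLen> patch;
    patch.fill(0x90);  // NOP padding
    const std::uint32_t rel32 = target - (site + 5);
    patch[0] = 0xE9;
    for (std::size_t i = 0; i < 4; ++i)  // little-endian operand
        patch[1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));
    return patch;
}
```

Because the thunk resumes at `0x0052E673`, which is `0x0052E661 + 18`, the 13 padding bytes are never executed; they only keep the disassembly around the site readable.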
## NEW: CObjCell/LongHash cleanup sweep
This is the actual reason for going to a DLL. Byte patches can't
express the logic.
### What we know
- Top owner vtable holding leaked CPhysicsObjs: `0x0079BF64` (= `LongHash<CPhysicsObj>::Node`, 21,553 hits).
- Secondary: `0x007ED3B0` (CObjCell-family containers, `object_list` DArrays) and `0x007CA4DC` (another LongHash family).
- All `CPhysicsObj::Destroy` teardown code is correct when called — the bug is it's never called for these objects.
### Sweep design
```cpp
struct LongHashNode {
    LongHashNode* next;
    uint32_t key;
    void* value; // CPhysicsObj*
};
struct LongHashTable {
    void* vtable;
    LongHashNode** buckets;
    uint32_t bucket_count;
    uint32_t entry_count;
    // ... mirror layout from acclient.h
};
// `prev` always points at the link that currently points at `node`, so
// unlinking is a single store with no special case for the bucket head.
void sweep_physobj_table(LongHashTable* table, uint32_t cutoff_ts) {
    for (uint32_t b = 0; b < table->bucket_count; ++b) {
        LongHashNode** prev = &table->buckets[b];
        LongHashNode* node = *prev;
        while (node) {
            LongHashNode* next = node->next; // read before node is freed
            CPhysicsObj* po = (CPhysicsObj*)node->value;
            if (is_safe_to_destroy(po, cutoff_ts)) {
                *prev = next;
                CPhysicsObj_Destroy(po); // 0x005145D0
                operator_delete(node);   // must be the client's own allocator
                --table->entry_count;
            } else {
                prev = &node->next;
            }
            node = next;
        }
    }
}
```
### Safety predicates (critical — these prevent v13-class crashes)
A CPhysicsObj is "safe to destroy" only if:
1. `po->parent == NULL` (not currently attached to anything live)
2. `po->object_state` indicates dead/destroyed (need to find flag)
3. `po->last_used_timestamp` is older than some threshold (e.g., 60s)
4. `po->cell == NULL` (not in any cell's object list)
5. `po` is NOT referenced from any other table we know about (best-effort scan)
If any predicate is uncertain, leave it. **Conservative wins.**
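The five predicates and the refuse-if-uncertain rule can be sketched as a single function. Everything about the layout here is an assumption: `PhysObjView` is a hypothetical stand-in for the real `CPhysicsObj` mirror, `kStateDeadFlag` is a placeholder because the actual flag has not been found yet, and the third parameter (a callback for predicate 5) is an illustrative addition to the two-argument call used in the sweep.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror: field names follow the predicates above; real
// offsets and the dead-state flag still have to come from acclient.h.
struct PhysObjView {
    void*         parent;
    void*         cell;
    std::uint32_t object_state;
    std::uint32_t last_used_timestamp;  // assumed: same clock as cutoff_ts
};

constexpr std::uint32_t kStateDeadFlag = 0x1;  // placeholder, NOT the real flag

// Hypothetical hook for predicate 5: returns true if any other known
// table still holds a pointer to the object.
using ReferencedElsewhereFn = bool (*)(const PhysObjView*);

bool is_safe_to_destroy(const PhysObjView* po, std::uint32_t cutoff_ts,
                        ReferencedElsewhereFn referenced_elsewhere) {
    if (po == nullptr) return false;
    if (po->parent != nullptr) return false;                     // 1
    if ((po->object_state & kStateDeadFlag) == 0) return false;  // 2
    if (po->last_used_timestamp >= cutoff_ts) return false;      // 3
    if (po->cell != nullptr) return false;                       // 4
    // No scanner available means predicate 5 is uncertain: keep the object.
    if (referenced_elsewhere == nullptr) return false;
    if (referenced_elsewhere(po)) return false;                  // 5
    return true;
}
```

Each check returns `false` on its own, so a predicate that cannot be evaluated yet can be stubbed to reject everything and the sweep stays a no-op until it is filled in.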
### Tick hook
Need to find a function AC calls every frame, hook it via MinHook,
and trigger sweep every N frames (e.g., every 300 frames ≈ 5s at 60fps).
Candidate hook targets to investigate:
- `Render::Render` or main game loop entry
- `Input::ProcessFrame`
- `cm_GameLoop::Tick` (if it exists)
This needs another small investigation. Once found, hook:
```cpp
typedef void (__cdecl *TickFn)();
TickFn original_tick;
void __cdecl hooked_tick() {
    original_tick();
    static int counter = 0;
    if (++counter >= 300) {
        counter = 0;
        sweep_all_physobj_tables();
    }
}
```
## Injection mechanism
### Phase 1 — launcher EXE (development & testing)
```cpp
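#include <windows.h>
char* build_cmdline(int argc, char** argv); // defined elsewhere; writable buffer
int main(int argc, char** argv) {
    const char dll_path[] = "C:\\path\\to\\leakfix.dll";
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessA("acclient.exe", build_cmdline(argc, argv),
                        NULL, NULL, FALSE, CREATE_SUSPENDED,
                        NULL, NULL, &si, &pi))
        return 1;
    // Inject DLL: copy the path (including its terminator) into the target,
    // then run LoadLibraryA there. The launcher must be a 32-bit build so
    // that its LoadLibraryA address is valid in the 32-bit client.
    void* mem = VirtualAllocEx(pi.hProcess, NULL, sizeof(dll_path),
                               MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    HANDLE thr = NULL;
    if (mem && WriteProcessMemory(pi.hProcess, mem, dll_path, sizeof(dll_path), NULL))
        thr = CreateRemoteThread(pi.hProcess, NULL, 0,
            (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("kernel32"), "LoadLibraryA"),
            mem, 0, NULL);
    if (!thr) { TerminateProcess(pi.hProcess, 1); return 1; }
    WaitForSingleObject(thr, INFINITE);
    CloseHandle(thr);
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}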
```
Usage: `leakfix_launch.exe -h server -p port -u user -...` → drops in
as substitute for direct `acclient.exe`.
### Phase 2 — PE import table modification (production)
Patch `acclient.exe`'s PE header to add `leakfix.dll` to its imports.
Then the OS loader pulls our DLL in automatically before AC's
`WinMain` runs. User just runs acclient as normal.
Tool: small Python or C++ utility that does:
- Open PE
- Find IMPORT_DIRECTORY
- Add new IMAGE_IMPORT_DESCRIPTOR pointing at `leakfix.dll`
- Stuff in a fake IAT with a single function (`leakfix_init` exported from our DLL)
- Resave executable
(There are existing tools like `LoadDll`, `PE Bear`, or
`peimporter` we can crib from.)
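The "add new descriptor" step usually cannot be done in place, because the existing descriptor array rarely has slack after its terminator. A common approach is to rebuild the array elsewhere (for example in an appended section) and repoint the import data directory at the copy. The sketch below covers only the array rebuild; `ImportDescriptor` mirrors the five 32-bit fields of `IMAGE_IMPORT_DESCRIPTOR`, while `append_import` and its RVA parameters are hypothetical and assume the caller has already written the DLL name, lookup table, and IAT into the new space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same layout as IMAGE_IMPORT_DESCRIPTOR in winnt.h.
struct ImportDescriptor {
    std::uint32_t original_first_thunk;  // RVA of the import lookup table
    std::uint32_t time_date_stamp;
    std::uint32_t forwarder_chain;
    std::uint32_t name;                  // RVA of the DLL name string
    std::uint32_t first_thunk;           // RVA of the IAT
};
static_assert(sizeof(ImportDescriptor) == 20, "must match the on-disk layout");

// The import directory ends with an all-zero descriptor.
inline bool is_terminator(const ImportDescriptor& d) {
    return d.original_first_thunk == 0 && d.time_date_stamp == 0 &&
           d.forwarder_chain == 0 && d.name == 0 && d.first_thunk == 0;
}

// Copy the existing descriptors (stopping at the terminator or after
// max_entries), append one for the new DLL, then re-terminate.
std::vector<ImportDescriptor> append_import(const ImportDescriptor* table,
                                            std::size_t max_entries,
                                            std::uint32_t lookup_rva,
                                            std::uint32_t name_rva,
                                            std::uint32_t iat_rva) {
    std::vector<ImportDescriptor> out;
    for (std::size_t i = 0; i < max_entries && !is_terminator(table[i]); ++i)
        out.push_back(table[i]);
    out.push_back(ImportDescriptor{lookup_rva, 0, 0, name_rva, iat_rva});
    out.push_back(ImportDescriptor{});  // all-zero terminator
    return out;
}
```

The returned vector is what would be written into the new section; the import entry in the optional header's data directory then needs its RVA and size updated to match.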
## Build setup
```batch
@echo off
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars32.bat"
cl /LD /nologo /O2 /MT /EHsc /std:c++17 /W3 ^
/D_CRT_SECURE_NO_WARNINGS /D_WIN32_WINNT=0x0601 ^
/Fe:leakfix.dll ^
src\dllmain.cpp src\patches.cpp src\thunks.cpp src\sweep.cpp ^
src\hook.cpp src\logging.cpp src\minhook\*.c ^
/link kernel32.lib user32.lib
```
`/MT` avoids needing `vcruntime*.dll` alongside.
## Implementation order
1. ✅ Verify toolchain builds 32-bit DLL (hello.dll)
2. Write `dllmain.cpp` + `patches.cpp` with v3b only — verify same bytes as Python patcher produces, manually inject into a test PID
3. Add v11 (similar simple byte writes)
4. Add v5 (real `__thiscall` purge functions in our DLL .text)
5. Add v12 (more complex but pattern same as v5)
6. Add v14 (inline-asm naked function)
7. Build injector EXE, test full apply-on-attach flow
8. Find frame-tick hook target via Ghidra (separate task)
9. Wire MinHook + sweep skeleton
10. Implement sweep predicates iteratively, very long soak windows per iteration
11. Optional: PE import table patcher for one-launcher-binary UX
## Risk management
- Each patch porting step is verified against the Python patcher's
byte output before merging. No new bytes = no new risk.
- Sweep is the only NEW logic and follows v13 lessons: long soaks,
conservative predicates, refuse-to-destroy-if-uncertain rule.
- Crash dumps land cleanly because we're not crossing managed/unmanaged
boundary.
## What it replaces
- `tools/patch_palette_v3b.py` — runtime-applied at DLL load
- `tools/patch_purge_v5_test.py` — runtime-applied at DLL load
- `tools/patch_v11_test.py` — runtime-applied at DLL load
- `tools/patch_v12_test.py` — runtime-applied at DLL load
- `tools/patch_v14_cenvcell_clipplane.py` — runtime-applied at DLL load
- `tools/fleet_monitor.sh` auto-patching cascade — no longer needed (DLL
applies all on every restart automatically)
Snapshot/HB monitoring stays in place — that's separate from patching.