acbot 57b5e43d0e Initial commit — leak-hunt project complete
Five bugs identified and patched in retail Asheron's Call client:
- v3b: palette refcount over-increment (3-byte NOP at two sites)
- v5: RenderSurface PurgeResource no-op stub (vtable slot 2 thunk)
- v11: two dangling-pointer crash guards (NULL-check + reorder)
- v14: CEnvCell::Destroy ClipPlaneList leak (18-byte JMP to cleanup thunk)
- v22: unpacker stale-pointer SEH guard (whole-function __try/__except)

All five ship in leakfix.dll (117 KB, SHA d282f23c…) which is loaded
by acclient.exe at process start via PE import table patching by
tools/install_leakfix.py.

Controlled 15-client fleet soak: unpatched control died at 26h with
palette exhaustion; all 14 patched clients survived past that point
and reached ≥5-day uptime.

Residual ~15 MB/h growth traced to d3d9.dll's internal slab allocator
(260KB surface backing buffers retained after Release). See REPORT.md
§10 for the full investigation; conclusion is that it's unfixable from
outside d3d9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:07:58 +02:00


# leakfix.dll — Standalone Native Patch DLL
## Goal
Consolidate all runtime patches (v3b, v5, v11, v12, v14) **plus** add a
periodic CObjCell/LongHash cleanup sweep that's impossible at the
byte-patching level. Ship as a single native 32-bit DLL + tiny launcher
EXE. No Decal dependency.
## Why now
- Per-client byte patching works but doesn't scale to the residual
~78 MB/hr CPhysicsObj-family leak (requires real cleanup loops, not
inline thunks).
- The Python patchers re-apply on every restart via the monitor —
brittle. A DLL loads with the process.
- Native code = clean crash dumps at real fault sites (no CLR wrapping
like UB's `System.AccessViolationException` issue).
## Tech stack
- **Language:** C++17, MSVC `cl.exe` (verified working: `MSVC 14.44.35207`).
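- **Target:** 32-bit x86 via the `vcvars32` toolchain. MSVC's x86 default is `/arch:SSE2`; `/arch:IA32` (x87-only floating point) must be passed explicitly if wanted.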
- **Runtime:** static link (`/MT`) → no extra runtime DLL dependency.
- **Hooking:** MinHook (small BSD-licensed C library, vendored as a handful of source files) for the frame-tick detour.
- **AC struct mirrors:** subset of `references/acclient.h`.
## Project layout
```
dll/
├── DESIGN.md                  # this file
├── leakfix/
│   ├── build.bat              # one-shot build via vcvars32
│   ├── src/
│   │   ├── dllmain.cpp        # DllMain, patch application, hook install
│   │   ├── patches.cpp        # v3b, v5, v11, v12, v14 application
│   │   ├── thunks.cpp         # inline-asm thunks (v14 ClipPlaneList, v5 purge)
│   │   ├── sweep.cpp          # periodic CObjCell/LongHash cleanup
│   │   ├── hook.cpp           # MinHook wiring for frame-tick detour
│   │   ├── logging.cpp        # rolling log file
│   │   ├── ac_addrs.h         # EoR address constants
│   │   ├── ac_types.h         # struct mirrors
│   │   └── minhook/           # vendored MinHook source
│   └── injector/
│       └── inject.cpp         # CreateProcess(suspended) + LoadLibraryA inject
└── test/                      # hello.dll already verified
```
## Patch porting plan
Each existing Python patcher becomes a few lines of C++ that run in
`DllMain` on `DLL_PROCESS_ATTACH`.
### v3b — palette NOP (trivial port)
```cpp
WriteCode(0x0053EFFE, "\x90\x90\x90", 3);
WriteCode(0x0053F19C, "\x90\x90\x90", 3);
```
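`WriteCode` isn't spelled out here. Below is a minimal sketch of a guarded version's portable core, assuming the port should refuse to write when a site doesn't hold the bytes the Python patcher expects (for example, a different client build). `patch_checked` and its parameters are hypothetical, and the original bytes at `0x0053EFFE`/`0x0053F19C` aren't recorded in this document, so they would have to come from the Python patcher. In the DLL, the same checks would run against the live image, with the write wrapped in `VirtualProtect` (make the page writable, then restore it) and followed by `FlushInstructionCache`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical portable core of a guarded WriteCode. `image` stands in for
// the mapped module and `image_base` for its load address. Nothing is
// written unless the site holds `expected` (or already holds `replacement`).
bool patch_checked(std::vector<std::uint8_t>& image, std::uint32_t image_base,
                   std::uint32_t va, const std::uint8_t* expected,
                   const std::uint8_t* replacement, std::size_t len) {
    if (va < image_base) return false;                       // below the module
    std::size_t off = va - image_base;
    if (off > image.size() || len > image.size() - off) return false;  // past the end
    std::uint8_t* p = image.data() + off;
    if (std::memcmp(p, replacement, len) == 0) return true;  // already applied
    if (std::memcmp(p, expected, len) != 0) return false;    // unexpected bytes: refuse
    std::memcpy(p, replacement, len);
    return true;
}
```

Treating "already patched" as success keeps re-application harmless if the DLL is ever loaded into a client that the old Python tooling had already byte-patched.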
### v5 — RenderSurface PurgeResource vtable override
The current 10-byte thunk becomes a real function:
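```cpp
// Pointer type for the game's destroy routines: member functions, so `self` goes in ECX.
typedef void (__thiscall *DestroyFn)(void* self);

// Integer-to-pointer casts aren't allowed in constant expressions, so these
// are `static const` rather than `constexpr`.
static const auto RENDERSURFACE_DESTROY = reinterpret_cast<DestroyFn>(0x00444540);
static const auto RENDERTEXTURE_DESTROY = reinterpret_cast<DestroyFn>(0x0044C4F0);

// MSVC doesn't accept __thiscall on free functions, so the replacements use
// __fastcall: `self` still arrives in ECX and the unused second parameter
// absorbs EDX. With no stack arguments the two conventions are compatible.
int __fastcall purge_rendersurface(void* self, void* /*edx*/) {
    RENDERSURFACE_DESTROY(self);
    return 1;
}

int __fastcall purge_rendertexture(void* self, void* /*edx*/) {
    RENDERTEXTURE_DESTROY(self);
    return 1;
}

void apply_v5() {
    // Point each class's PurgeResource vtable slot at the replacement.
    WriteVtableSlot(0x0079A684, (void*)&purge_rendersurface);
    WriteVtableSlot(0x0079C1A0, (void*)&purge_rendertexture);
}
```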
Replaces VirtualAllocEx + 10-byte thunk with proper function pointers
inside our DLL's .text.
### v11 — NULL-check NOPs
Two byte-level rewrites, identical to the Python patcher's.
### v12 — unpacker validator + dispatch redirect
- The Python patcher allocates a 29-byte validator thunk and rewrites a
dispatch table entry.
- C++ version: the validator becomes a `__declspec(naked)` function and the
dispatch table entry becomes a plain function pointer.
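The validate-then-forward redirect can be sketched portably as below. Everything here is an assumption: the real handler's ABI isn't documented in this file (which is why the DLL version is a naked function that preserves the original register and stack contract), and `UnpackFn`, `validate_unpack_args`, its `len >= 4` rule, and the `0` "refused" return value are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical handler signature; the real unpacker entry's ABI is unknown here.
using UnpackFn = int (*)(const std::uint8_t* buf, std::size_t len);

static UnpackFn g_original_unpack = nullptr;  // saved dispatch-table entry

// Hypothetical validator: reject input the original handler can't safely read.
static bool validate_unpack_args(const std::uint8_t* buf, std::size_t len) {
    return buf != nullptr && len >= 4;
}

// Replacement entry: validate first, forward to the original only if sane.
static int validated_unpack(const std::uint8_t* buf, std::size_t len) {
    if (!validate_unpack_args(buf, len)) return 0;  // assumed "refused" value
    return g_original_unpack(buf, len);
}

// Save the original slot, then point it at the validating wrapper.
void redirect_dispatch(UnpackFn* table, std::size_t slot) {
    g_original_unpack = table[slot];
    table[slot] = &validated_unpack;
}
```

Keeping the saved original means accepted calls behave exactly as before; only inputs the validator rejects take the new path.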
### v14 — CEnvCell ClipPlaneList fix
Replace 18 bytes at `0x0052E661` with a 5-byte JMP into a naked
function:
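```cpp
// Reached via the 5-byte JMP installed at 0x0052E661.
__declspec(naked) void clipplane_cleanup_thunk() {
    __asm {
        pushad                      ; save all general-purpose registers
        mov  edi, [esi + 0xDC]      ; edi = pointer stored at esi+0xDC
        test edi, edi
        jz   done                   ; nothing allocated, nothing to free
        mov  ecx, [edi]             ; ecx = ClipPlaneList* held in that block
        test ecx, ecx
        jz   free_outer
        push ecx                    ; save ecx: the __thiscall dtor may clobber it
        mov  eax, 0x0053C760        ; ClipPlaneList::~ClipPlaneList (this in ecx)
        call eax
        pop  ecx
        push ecx                    ; argument for operator delete
        mov  eax, 0x005DF15E        ; operator delete
        call eax
        add  esp, 4                 ; __cdecl: caller pops the argument
    free_outer:
        push edi
        mov  eax, 0x005DF164        ; operator delete[]
        call eax
        add  esp, 4
        mov  [esi + 0xDC], ebx      ; clears the field only if ebx == 0 at the patch site
    done:
        popad                       ; restore registers (including esi, ebx)
        push 0x0052E673             ; resume address = 0x0052E661 + 18
        ret
    }
}
```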
Then install a 5-byte `E9 rel32` from `0x0052E661` to `clipplane_cleanup_thunk`,
followed by 13 NOPs.
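The patch-site bytes can be computed rather than hand-assembled. A minimal sketch (the function name is hypothetical): `E9` takes a 32-bit displacement relative to the end of the 5-byte instruction, and filling the rest of the window with NOPs leaves no half-overwritten instructions behind (the thunk resumes past them at `0x0052E673`).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes for a near JMP (E9 rel32) at `site` to `target`, NOP-padded to
// `total_len`. Returns an empty vector if the window can't hold the JMP.
std::vector<std::uint8_t> make_jmp_patch(std::uint32_t site, std::uint32_t target,
                                         std::size_t total_len) {
    if (total_len < 5) return {};
    std::vector<std::uint8_t> bytes(total_len, 0x90);  // NOP fill
    // Displacement is relative to the next instruction (site + 5); unsigned
    // wraparound gives the two's-complement encoding for backward jumps.
    std::uint32_t rel = target - (site + 5);
    bytes[0] = 0xE9;
    for (int i = 0; i < 4; ++i)
        bytes[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i));  // little-endian
    return bytes;
}
```

`make_jmp_patch(0x0052E661, thunk_address, 18)` yields the `E9 rel32` plus 13 NOPs described above; `0x0052E661 + 18 = 0x0052E673` is the address the thunk pushes before `ret`.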
## NEW: CObjCell/LongHash cleanup sweep
This is the actual reason for going to a DLL. Byte patches can't
express the logic.
### What we know
- Top owner vtable holding leaked CPhysicsObjs: `0x0079BF64` (= `LongHash<CPhysicsObj>::Node`, 21,553 hits).
- Secondary: `0x007ED3B0` (CObjCell-family containers, `object_list` DArrays) and `0x007CA4DC` (another LongHash family).
- All `CPhysicsObj::Destroy` teardown code is correct when called — the bug is it's never called for these objects.
### Sweep design
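```cpp
#include <cstdint>

struct CPhysicsObj;  // opaque here; fields come from the acclient.h mirror

struct LongHashNode {
    LongHashNode* next;
    uint32_t      key;
    void*         value;  // CPhysicsObj*
};

struct LongHashTable {
    void*          vtable;
    LongHashNode** buckets;
    uint32_t       bucket_count;
    uint32_t       entry_count;
    // ... mirror layout from acclient.h
};

// Game functions: Destroy is a member function (__thiscall), operator delete is __cdecl.
typedef void (__thiscall *PhysObjDestroyFn)(CPhysicsObj* self);
typedef void (__cdecl *OperatorDeleteFn)(void* p);
static const auto CPhysicsObj_Destroy = reinterpret_cast<PhysObjDestroyFn>(0x005145D0);
static const auto operator_delete     = reinterpret_cast<OperatorDeleteFn>(0x005DF15E);

bool is_safe_to_destroy(CPhysicsObj* po, uint32_t cutoff_ts);  // safety predicates below

void sweep_physobj_table(LongHashTable* table, uint32_t cutoff_ts) {
    for (uint32_t b = 0; b < table->bucket_count; ++b) {
        // `prev` points at whichever link currently refers to `node`, so
        // unlinking is one store whether or not `node` heads the bucket.
        LongHashNode** prev = &table->buckets[b];
        LongHashNode* node = *prev;
        while (node) {
            LongHashNode* next = node->next;
            CPhysicsObj* po = (CPhysicsObj*)node->value;
            if (is_safe_to_destroy(po, cutoff_ts)) {
                *prev = next;              // unlink before teardown
                CPhysicsObj_Destroy(po);
                operator_delete(node);
                --table->entry_count;
            } else {
                prev = &node->next;        // keep node; advance the link
            }
            node = next;
        }
    }
}
```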
### Safety predicates (critical — these prevent v13-class crashes)
A CPhysicsObj is "safe to destroy" only if:
1. `po->parent == NULL` (not currently attached to anything live)
2. `po->object_state` indicates dead/destroyed (need to find flag)
3. `po->last_used_timestamp` is older than some threshold (e.g., 60s)
4. `po->cell == NULL` (not in any cell's object list)
5. `po` is NOT referenced from any other table we know about (best-effort scan)
If any predicate is uncertain, leave it. **Conservative wins.**
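The "uncertain means leave it" rule can be made explicit with a three-valued result per predicate. A minimal sketch, assuming the predicate inputs are first read out of a CPhysicsObj mirror into a plain struct; `PhysObjFacts`, its field names, `to_verdict`, and `all_predicates_pass` are hypothetical, and each `*_known` flag stands for "the offset or flag for this field has been confirmed".

```cpp
#include <cstdint>

// Each check answers Yes, No, or Unknown; only an all-Yes result allows destruction.
enum class Verdict { Yes, No, Unknown };

// Hypothetical snapshot of the fields the predicates need.
struct PhysObjFacts {
    bool          parent_known;
    std::uint32_t parent;
    bool          cell_known;
    std::uint32_t cell;
    bool          state_known;
    bool          state_is_dead;
    bool          ts_known;
    std::uint32_t last_used_ts;
    Verdict       unreferenced;  // result of the best-effort cross-table scan
};

Verdict to_verdict(bool is_known, bool value) {
    if (!is_known) return Verdict::Unknown;
    return value ? Verdict::Yes : Verdict::No;
}

bool all_predicates_pass(const PhysObjFacts& f, std::uint32_t cutoff_ts) {
    const Verdict checks[] = {
        to_verdict(f.parent_known, f.parent == 0),           // 1. no parent
        to_verdict(f.state_known, f.state_is_dead),          // 2. dead/destroyed state
        to_verdict(f.ts_known, f.last_used_ts < cutoff_ts),  // 3. idle past the cutoff
        to_verdict(f.cell_known, f.cell == 0),               // 4. not in any cell
        f.unreferenced,                                      // 5. no other references
    };
    for (Verdict v : checks)
        if (v != Verdict::Yes) return false;  // No *or* Unknown: leave it alone
    return true;
}
```

The sweep's `is_safe_to_destroy(po, cutoff_ts)` could fill such a struct from `po` and delegate, so a predicate whose field hasn't been located yet blocks destruction instead of being silently skipped.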
### Tick hook
Need to find a function AC calls every frame, hook it via MinHook,
and trigger the sweep every N frames (e.g., every 300 frames ≈ 5s at 60fps).
Candidate hook targets to investigate:
- `Render::Render` or main game loop entry
- `Input::ProcessFrame`
- `cm_GameLoop::Tick` (if it exists)
This needs another small investigation. Once found, hook:
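```cpp
#include "minhook/MinHook.h"  // vendored copy; exact include path assumed

// Placeholder signature: must match the real target once it is found.
typedef void (__cdecl *TickFn)();
TickFn original_tick = nullptr;      // set by MH_CreateHook (trampoline to original)
void sweep_all_physobj_tables();     // sweep.cpp

void __cdecl hooked_tick() {
    original_tick();
    static int counter = 0;
    if (++counter >= 300) {          // ≈ 5 s at 60 fps
        counter = 0;
        sweep_all_physobj_tables();
    }
}

bool install_tick_hook(void* target) {  // hypothetical helper for hook.cpp
    return MH_Initialize() == MH_OK &&
           MH_CreateHook(target, (LPVOID)&hooked_tick,
                         reinterpret_cast<LPVOID*>(&original_tick)) == MH_OK &&
           MH_EnableHook(target) == MH_OK;
}
```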
## Injection mechanism
### Phase 1 — launcher EXE (development & testing)
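```cpp
#include <windows.h>
#include <string>

std::string build_cmdline(int argc, char** argv);  // hypothetical; first token = program name

int main(int argc, char** argv) {
    // Build this launcher as 32-bit: the child receives *our* kernel32!LoadLibraryA
    // address, which is only valid in a process of the same bitness.
    static const char dll_path[] = "C:\\path\\to\\leakfix.dll";
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    std::string cmd = build_cmdline(argc, argv);  // must be writable for CreateProcessA
    if (!CreateProcessA("acclient.exe", &cmd[0], NULL, NULL, FALSE,
                        CREATE_SUSPENDED, NULL, NULL, &si, &pi))
        return 1;
    // Copy exactly the path plus its NUL terminator (not MAX_PATH bytes) into the child.
    void* mem = VirtualAllocEx(pi.hProcess, NULL, sizeof(dll_path),
                               MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    WriteProcessMemory(pi.hProcess, mem, dll_path, sizeof(dll_path), NULL);
    HANDLE thr = CreateRemoteThread(pi.hProcess, NULL, 0,
        (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA"),
        mem, 0, NULL);
    WaitForSingleObject(thr, INFINITE);   // LoadLibraryA (and DllMain) has returned
    CloseHandle(thr);
    VirtualFreeEx(pi.hProcess, mem, 0, MEM_RELEASE);
    ResumeThread(pi.hThread);             // let acclient.exe start normally
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
```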
Usage: `leakfix_launch.exe -h server -p port -u user -...` drops in as a
substitute for launching `acclient.exe` directly.
### Phase 2 — PE import table modification (production)
Patch `acclient.exe`'s PE header to add `leakfix.dll` to its imports.
Then the OS loader pulls our DLL in automatically before AC's
`WinMain` runs. User just runs acclient as normal.
Tool: small Python or C++ utility that does:
- Open PE
- Find IMPORT_DIRECTORY
- Add new IMAGE_IMPORT_DESCRIPTOR pointing at `leakfix.dll`
- Stuff in a fake IAT with a single function (`leakfix_init` exported from our DLL)
- Resave executable
(There are existing tools like `LoadDll`, `PE Bear`, or
`peimporter` we can crib from.)
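The descriptor step can be sketched portably. `ImportDescriptor` mirrors the 20-byte `IMAGE_IMPORT_DESCRIPTOR` from `winnt.h` (same field order, renamed fields); `append_import` is a hypothetical helper. The existing directory ends with an all-zero entry, so the new descriptor goes in front of a fresh terminator. Because the original table rarely has 20 spare bytes after it, the rebuilt array usually lands in a new section, together with the `leakfix.dll` name string and the lookup/address tables for `leakfix_init`.

```cpp
#include <cstdint>
#include <vector>

// Same layout as winnt.h's IMAGE_IMPORT_DESCRIPTOR; all addresses are RVAs.
struct ImportDescriptor {
    std::uint32_t original_first_thunk;  // import lookup table (INT)
    std::uint32_t time_date_stamp;       // 0 unless bound
    std::uint32_t forwarder_chain;
    std::uint32_t name;                  // RVA of the DLL name string
    std::uint32_t first_thunk;           // IAT the loader fills in
};
static_assert(sizeof(ImportDescriptor) == 20, "layout must match the PE spec");

// Existing entries (up to the old zero terminator), then the new one,
// then a fresh all-zero terminator.
std::vector<ImportDescriptor> append_import(const std::vector<ImportDescriptor>& existing,
                                            const ImportDescriptor& added) {
    std::vector<ImportDescriptor> out;
    for (const auto& d : existing) {
        if (d.original_first_thunk == 0 && d.name == 0 && d.first_thunk == 0)
            break;  // reached the old terminator
        out.push_back(d);
    }
    out.push_back(added);
    out.push_back(ImportDescriptor{});  // zero terminator
    return out;
}
```

After writing the array, data-directory entry 1 (`IMAGE_DIRECTORY_ENTRY_IMPORT`) would point at it with `Size = dir.size() * sizeof(ImportDescriptor)`. If the EXE has a bound-import directory (entry 11), it is commonly cleared so the loader doesn't rely on stale bindings.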
## Build setup
```batch
@echo off
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars32.bat"
cl /LD /nologo /O2 /MT /EHsc /std:c++17 /W3 ^
/D_CRT_SECURE_NO_WARNINGS /D_WIN32_WINNT=0x0601 ^
/Fe:leakfix.dll ^
src\dllmain.cpp src\patches.cpp src\thunks.cpp src\sweep.cpp ^
src\hook.cpp src\logging.cpp src\minhook\*.c ^
/link kernel32.lib user32.lib
```
`/MT` avoids needing `vcruntime*.dll` alongside.
## Implementation order
1. ✅ Verify toolchain builds 32-bit DLL (hello.dll)
2. Write `dllmain.cpp` + `patches.cpp` with v3b only — verify same bytes as Python patcher produces, manually inject into a test PID
3. Add v11 (similar simple byte writes)
4. Add v5 (real `__thiscall` purge functions in our DLL .text)
5. Add v12 (more complex but pattern same as v5)
6. Add v14 (inline-asm naked function)
7. Build injector EXE, test full apply-on-attach flow
8. Find frame-tick hook target via Ghidra (separate task)
9. Wire MinHook + sweep skeleton
10. Implement sweep predicates iteratively, very long soak windows per iteration
11. Optional: PE import table patcher for one-launcher-binary UX
## Risk management
- Each patch porting step is verified against the Python patcher's
byte output before merging. No new bytes = no new risk.
- Sweep is the only NEW logic and follows v13 lessons: long soaks,
conservative predicates, refuse-to-destroy-if-uncertain rule.
- Crash dumps land cleanly because we're not crossing the managed/unmanaged
boundary.
## What it replaces
- `tools/patch_palette_v3b.py` — runtime-applied at DLL load
- `tools/patch_purge_v5_test.py` — runtime-applied at DLL load
- `tools/patch_v11_test.py` — runtime-applied at DLL load
- `tools/patch_v12_test.py` — runtime-applied at DLL load
- `tools/patch_v14_cenvcell_clipplane.py` — runtime-applied at DLL load
- `tools/fleet_monitor.sh` auto-patching cascade — no longer needed (DLL
applies all on every restart automatically)
Snapshot/HB monitoring stays in place — that's separate from patching.