Five bugs identified and patched in retail Asheron's Call client:

- v3b: palette refcount over-increment (3-byte NOP at two sites)
- v5: RenderSurface PurgeResource no-op stub (vtable slot 2 thunk)
- v11: two dangling-pointer crash guards (NULL-check + reorder)
- v14: CEnvCell::Destroy ClipPlaneList leak (18-byte JMP to cleanup thunk)
- v22: unpacker stale-pointer SEH guard (whole-function __try/__except)

All five ship in leakfix.dll (117 KB, SHA d282f23c…) which is loaded by
acclient.exe at process start via PE import table patching by
tools/install_leakfix.py.

Controlled 15-client fleet soak: unpatched control died at 26h with palette
exhaustion; all 14 patched clients survived past that point and reached
≥5-day uptime.

Residual ~15 MB/h growth traced to d3d9.dll's internal slab allocator (260KB
surface backing buffers retained after Release). See REPORT.md §10 for the
full investigation; conclusion is that it's unfixable from outside d3d9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# leakfix.dll: Standalone Native Patch DLL

## Goal

Consolidate all runtime patches (v3b, v5, v11, v12, v14) **plus** add a
periodic CObjCell/LongHash cleanup sweep that's impossible at the
byte-patching level. Ship as a single native 32-bit DLL + tiny launcher
EXE. No Decal dependency.

## Why now

- Per-client byte patching works but doesn't scale to the residual
  ~7–8 MB/hr CPhysicsObj-family leak (requires real cleanup loops, not
  inline thunks).
- The Python patchers are re-applied by the monitor on every restart, which
  is brittle. A DLL loads with the process.
- Native code = clean crash dumps at real fault sites (no CLR wrapping
  like UB's `System.AccessViolationException` issue).

## Tech stack

- **Language:** C++17, MSVC `cl.exe` (verified working: `MSVC 14.44.35207`).
- **Target:** 32-bit x86, selected by the `vcvars32` environment. `/arch:IA32` is optional; the x86 compiler defaults to `/arch:SSE2`.
- **Runtime:** static link (`/MT`) → no extra runtime DLL dependency.
- **Hooking:** MinHook (small C library, BSD 2-Clause, vendored as source) for the frame-tick detour.
- **AC struct mirrors:** subset of `references/acclient.h`.

## Project layout

```
dll/
├── DESIGN.md                 # this file
├── leakfix/
│   ├── build.bat             # one-shot build via vcvars32
│   ├── src/
│   │   ├── dllmain.cpp       # DllMain, patch application, hook install
│   │   ├── patches.cpp       # v3b, v5, v11, v12, v14 application
│   │   ├── thunks.cpp        # inline-asm thunks (v14 ClipPlaneList, v5 purge)
│   │   ├── sweep.cpp         # periodic CObjCell/LongHash cleanup
│   │   ├── hook.cpp          # MinHook wiring for frame-tick detour
│   │   ├── logging.cpp       # rolling log file
│   │   ├── ac_addrs.h        # EoR address constants
│   │   ├── ac_types.h        # struct mirrors
│   │   └── minhook/          # vendored MinHook source
│   └── injector/
│       └── inject.cpp        # CreateProcess(suspended) + LoadLibraryA inject
└── test/                     # hello.dll already verified
```

## Patch porting plan

Each existing Python patcher becomes a few lines of C++ that run in
`DllMain` on `DLL_PROCESS_ATTACH`.

### v3b: palette NOP (trivial port)

```cpp
// Overwrite the 3-byte instruction at each site with NOPs.
WriteCode(0x0053EFFE, "\x90\x90\x90", 3);
WriteCode(0x0053F19C, "\x90\x90\x90", 3);
```

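`WriteCode` is not defined in this document. The following is a minimal sketch of what such a helper might look like, assuming it only needs to lift page protection, copy the bytes, and flush the instruction cache. The signature (pointer destination, boolean result) is an assumption, and the non-Windows branch exists only so the copy logic can be exercised off-target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: copy `len` patch bytes over the code at `dst`.
// Returns false if the page protection could not be changed.
inline bool WriteCode(void* dst, const void* bytes, std::size_t len) {
    assert(dst != nullptr && bytes != nullptr);
#ifdef _WIN32
    DWORD old_protect = 0;
    if (!VirtualProtect(dst, len, PAGE_EXECUTE_READWRITE, &old_protect))
        return false;
    std::memcpy(dst, bytes, len);
    DWORD ignored = 0;
    VirtualProtect(dst, len, old_protect, &ignored);  // restore original protection
    // Make sure no stale instruction bytes are executed after the write.
    FlushInstructionCache(GetCurrentProcess(), dst, len);
#else
    // Off-target build (tests only): the destination is an ordinary buffer.
    std::memcpy(dst, bytes, len);
#endif
    return true;
}
```

With this shape, the v3b calls would pass `reinterpret_cast<void*>(0x0053EFFE)` as the destination, and a `false` return could be logged before continuing with the remaining patches.
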
### v5: RenderSurface PurgeResource vtable override

The current 10-byte thunk becomes a real function. MSVC accepts
`__thiscall` on function-pointer types but rejects it on free functions
(error C3865), so the replacements are declared `__fastcall` with an unused
second parameter, which is register-compatible for a call that takes no
stack arguments:

```cpp
typedef void (__thiscall *DestroyFn)(void* self);

// A cast from an integer address is not a constant expression, so these are
// plain constants rather than constexpr.
static const DestroyFn RENDERSURFACE_DESTROY = (DestroyFn)0x00444540;
static const DestroyFn RENDERTEXTURE_DESTROY = (DestroyFn)0x0044C4F0;

// `this` arrives in ECX; EDX is ignored.
int __fastcall purge_rendersurface(void* self, void* /*edx*/) {
    RENDERSURFACE_DESTROY(self);
    return 1;
}

int __fastcall purge_rendertexture(void* self, void* /*edx*/) {
    RENDERTEXTURE_DESTROY(self);
    return 1;
}

void apply_v5() {
    WriteVtableSlot(0x0079A684, (void*)&purge_rendersurface);
    WriteVtableSlot(0x0079C1A0, (void*)&purge_rendertexture);
}
```

This replaces VirtualAllocEx + 10-byte thunk with proper function pointers
inside our DLL's .text.

### v11: NULL-check NOPs

Two byte-level rewrites, identical to what the Python patcher writes.

### v12: unpacker validator + dispatch redirect

- The Python patcher allocates a 29-byte validator thunk and rewrites a
  dispatch table entry.
- C++ version: the validator becomes a `__declspec(naked)` function, and the
  dispatch table entry becomes a pointer to it.

### v14: CEnvCell ClipPlaneList fix

Replace 18 bytes at `0x0052E661` with a 5-byte JMP into a naked
function:

```cpp
__declspec(naked) void clipplane_cleanup_thunk() {
    __asm {
        pushad
        mov  edi, [esi + 0xDC]      ; outer pointer held by the CEnvCell
        test edi, edi
        jz   done
        mov  ecx, [edi]             ; inner ClipPlaneList*
        test ecx, ecx
        jz   free_outer
        push ecx                    ; save it across the destructor call
        mov  eax, 0x0053C760        ; ClipPlaneList::~ClipPlaneList
        call eax
        pop  ecx
        push ecx
        mov  eax, 0x005DF15E        ; operator delete
        call eax
        add  esp, 4
    free_outer:
        push edi
        mov  eax, 0x005DF164        ; operator delete[]
        call eax
        add  esp, 4
        mov  [esi + 0xDC], ebx      ; NOTE: assumes EBX holds 0 at this site
    done:
        popad
        push 0x0052E673             ; resume just past the 18 patched bytes
        ret
    }
}
```

Then install a 5-byte `E9 rel32` from `0x0052E661` to `clipplane_cleanup_thunk`,
followed by 13 NOPs.

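The 18-byte replacement for the v14 site can be computed rather than hand-assembled. A small sketch, assuming a 32-bit address space; `make_jmp_patch` is a hypothetical helper name, and in the DLL `target` would be the address of `clipplane_cleanup_thunk`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build `total_len` bytes for a patch site: E9 rel32 followed by NOP padding.
// rel32 is measured from the end of the 5-byte JMP, i.e. from site + 5.
inline std::vector<std::uint8_t> make_jmp_patch(std::uint32_t site,
                                                std::uint32_t target,
                                                std::size_t total_len) {
    assert(total_len >= 5);
    std::vector<std::uint8_t> patch(total_len, 0x90);  // NOP fill
    patch[0] = 0xE9;                                   // JMP rel32 opcode
    // Unsigned subtraction wraps modulo 2^32, which matches how the CPU
    // interprets a negative (backward) displacement.
    const std::uint32_t rel32 = target - (site + 5);
    for (int i = 0; i < 4; ++i)                        // little-endian encode
        patch[1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));
    return patch;
}
```

For the v14 site the call would be `make_jmp_patch(0x0052E661, thunk_address, 18)`, which yields the JMP plus 13 NOPs; the result can then be compared byte-for-byte against the Python patcher's output before it is written.
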
## NEW: CObjCell/LongHash cleanup sweep

This is the actual reason for going to a DLL. Byte patches can't
express the logic.

### What we know

- Top owner vtable holding leaked CPhysicsObjs: `0x0079BF64` (= `LongHash<CPhysicsObj>::Node`, 21,553 hits).
- Secondary: `0x007ED3B0` (CObjCell-family containers, `object_list` DArrays) and `0x007CA4DC` (another LongHash family).
- All `CPhysicsObj::Destroy` teardown code is correct when called; the bug is that it is never called for these objects.

### Sweep design

```cpp
#include <cstdint>

struct CPhysicsObj;                                  // opaque here; see ac_types.h
bool is_safe_to_destroy(CPhysicsObj* po, uint32_t cutoff_ts);
void CPhysicsObj_Destroy(CPhysicsObj* po);           // wraps 0x005145D0
void operator_delete(void* p);                       // wraps the client's operator delete

struct LongHashNode {
    LongHashNode* next;
    uint32_t      key;
    void*         value;      // CPhysicsObj*
};

struct LongHashTable {
    void*          vtable;
    LongHashNode** buckets;
    uint32_t       bucket_count;
    uint32_t       entry_count;
    // ... mirror layout from acclient.h
};

void sweep_physobj_table(LongHashTable* table, uint32_t cutoff_ts) {
    for (uint32_t b = 0; b < table->bucket_count; ++b) {
        // `prev` always points at the link that leads to `node`, so unlinking
        // is a single store whether `node` is the bucket head or not.
        LongHashNode** prev = &table->buckets[b];
        LongHashNode*  node = *prev;
        while (node) {
            LongHashNode* next = node->next;   // read before node is freed
            CPhysicsObj*  po   = (CPhysicsObj*)node->value;

            if (is_safe_to_destroy(po, cutoff_ts)) {
                *prev = next;                  // unlink
                CPhysicsObj_Destroy(po);       // 0x005145D0
                operator_delete(node);
                --table->entry_count;
            } else {
                prev = &node->next;            // keep node, advance the link
            }
            node = next;
        }
    }
}
```

### Safety predicates (critical: these prevent v13-class crashes)

A CPhysicsObj is "safe to destroy" only if all of the following hold:

1. `po->parent == NULL` (not currently attached to anything live)
2. `po->object_state` indicates dead/destroyed (the flag still needs to be found)
3. `po->last_used_timestamp` is older than some threshold (e.g., 60s)
4. `po->cell == NULL` (not in any cell's object list)
5. `po` is NOT referenced from any other table we know about (best-effort scan)

If any predicate is uncertain, leave the object alone. **Conservative wins.**

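The refuse-if-uncertain rule can be made explicit by giving the state check three outcomes instead of two. The sketch below is illustrative only: `PhysObjView`, `Liveness`, and `is_safe_to_destroy_sketch` are hypothetical names, the fields are placeholders rather than the real `CPhysicsObj` layout, and the cross-table scan (predicate 5) is passed in as a precomputed flag.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder for whatever the dead/destroyed flag turns out to be.
enum class Liveness { Unknown, Alive, Dead };

// Hypothetical view of the fields the predicates need; NOT the real layout.
struct PhysObjView {
    const void*   parent;
    const void*   cell;
    Liveness      state;
    std::uint32_t last_used_ts;
};

// Returns true only when every predicate positively passes.
// Anything unknown or ambiguous counts as "not safe".
inline bool is_safe_to_destroy_sketch(const PhysObjView* po,
                                      std::uint32_t cutoff_ts,
                                      bool referenced_elsewhere) {
    if (po == nullptr)                 return false;
    if (po->parent != nullptr)         return false;  // 1: still attached
    if (po->state != Liveness::Dead)   return false;  // 2: alive or unknown
    if (po->last_used_ts >= cutoff_ts) return false;  // 3: used too recently
    if (po->cell != nullptr)           return false;  // 4: still in a cell
    if (referenced_elsewhere)          return false;  // 5: another table holds it
    return true;
}
```

Until the real state flag is identified, every object would report `Liveness::Unknown`, so the sweep would destroy nothing. That makes it possible to soak the hook and table walk on their own before any destruction is enabled.
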
### Tick hook

We need to find a function AC calls every frame, hook it via MinHook,
and trigger the sweep every N frames (e.g., every 300 frames ≈ 5s at 60fps).

Candidate hook targets to investigate:

- `Render::Render` or main game loop entry
- `Input::ProcessFrame`
- `cm_GameLoop::Tick` (if it exists)

This needs another small investigation. Once found, hook:

```cpp
void sweep_all_physobj_tables();   // defined in sweep.cpp

// Signature and calling convention are placeholders until the real tick
// function is identified.
typedef void (__cdecl *TickFn)();
TickFn original_tick = nullptr;    // trampoline, set when the hook is created

void __cdecl hooked_tick() {
    original_tick();               // run the game's own tick first
    static int counter = 0;
    if (++counter >= 300) {        // roughly every 5 s at 60 fps
        counter = 0;
        sweep_all_physobj_tables();
    }
}
```

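Counting frames ties the sweep interval to the frame rate: 300 frames is 5 s at 60 fps but 10 s at 30 fps. If a steadier cadence turns out to matter, one alternative is to gate on elapsed milliseconds instead. A sketch, with `SweepTimer` as a hypothetical name and the timestamp supplied by the caller (for example from `GetTickCount()`):

```cpp
#include <cassert>
#include <cstdint>

// Fires at most once per `interval_ms`, independent of frame rate.
class SweepTimer {
public:
    explicit SweepTimer(std::uint32_t interval_ms) : interval_ms_(interval_ms) {
        assert(interval_ms > 0);
    }

    // Call once per frame with a millisecond timestamp.
    // Returns true on the frame where a sweep should run.
    bool due(std::uint32_t now_ms) {
        if (!started_) {               // first frame only arms the timer
            started_ = true;
            last_ms_ = now_ms;
            return false;
        }
        // Unsigned subtraction stays correct when the counter wraps.
        if (now_ms - last_ms_ < interval_ms_) return false;
        last_ms_ = now_ms;
        return true;
    }

private:
    std::uint32_t interval_ms_;
    std::uint32_t last_ms_ = 0;
    bool          started_ = false;
};
```

Inside the hooked tick this would replace the frame counter with something like `if (timer.due(GetTickCount())) sweep_all_physobj_tables();`. The frame-count version remains simpler and may well be sufficient.
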
## Injection mechanism

### Phase 1: launcher EXE (development & testing)

The launcher must itself be built 32-bit so that the `LoadLibraryA` address
it looks up matches the one inside the client.

```cpp
#include <windows.h>

char* build_cmdline(int argc, char** argv);  // helper assumed to be defined elsewhere

int main(int argc, char** argv) {
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessA("acclient.exe", build_cmdline(argc, argv),
                        NULL, NULL, FALSE, CREATE_SUSPENDED,
                        NULL, NULL, &si, &pi))
        return 1;

    // Copy the DLL path, including its terminating NUL, into the client.
    const char dll_path[] = "C:\\path\\to\\leakfix.dll";
    void* mem = VirtualAllocEx(pi.hProcess, NULL, sizeof(dll_path),
                               MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!mem || !WriteProcessMemory(pi.hProcess, mem, dll_path, sizeof(dll_path), NULL)) {
        TerminateProcess(pi.hProcess, 1);
        return 1;
    }

    // Run LoadLibraryA(dll_path) inside the client, then let it start.
    HANDLE thr = CreateRemoteThread(pi.hProcess, NULL, 0,
        (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("kernel32"), "LoadLibraryA"),
        mem, 0, NULL);
    if (thr) {
        WaitForSingleObject(thr, INFINITE);
        CloseHandle(thr);
    }
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return thr ? 0 : 1;
}
```

Usage: `leakfix_launch.exe -h server -p port -u user -...`. It is a drop-in
substitute for launching `acclient.exe` directly.

### Phase 2: PE import table modification (production)

Patch `acclient.exe`'s PE header to add `leakfix.dll` to its imports.
Then the OS loader pulls our DLL in automatically before AC's
`WinMain` runs. The user just runs acclient as normal.

Tool: a small Python or C++ utility that does the following:

- Open the PE
- Find the import directory
- Add a new `IMAGE_IMPORT_DESCRIPTOR` pointing at `leakfix.dll`
- Add an import lookup table and IAT with a single entry (`leakfix_init`, exported from our DLL)
- Save the modified executable

(There are existing tools like `LoadDll`, `PE Bear`, or
`peimporter` we can crib from.)

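The descriptor-table step can be illustrated without touching a real PE file. A sketch, assuming the existing descriptors have already been read into memory: `ImportDescriptor` mirrors the field order of `IMAGE_IMPORT_DESCRIPTOR`, `append_import` is a hypothetical helper, and all RVA values are placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order and size as IMAGE_IMPORT_DESCRIPTOR (all values are RVAs).
struct ImportDescriptor {
    std::uint32_t OriginalFirstThunk;  // import lookup table
    std::uint32_t TimeDateStamp;
    std::uint32_t ForwarderChain;
    std::uint32_t Name;                // NUL-terminated DLL name
    std::uint32_t FirstThunk;          // import address table (IAT)
};
static_assert(sizeof(ImportDescriptor) == 20, "descriptor must be 20 bytes");

// The loader stops at the first all-zero descriptor.
inline bool is_terminator(const ImportDescriptor& d) {
    return d.OriginalFirstThunk == 0 && d.TimeDateStamp == 0 &&
           d.ForwarderChain == 0 && d.Name == 0 && d.FirstThunk == 0;
}

// Returns a copy of `table` with `added` appended and exactly one
// all-zero terminator at the end.
inline std::vector<ImportDescriptor> append_import(std::vector<ImportDescriptor> table,
                                                   const ImportDescriptor& added) {
    assert(!is_terminator(added));
    while (!table.empty() && is_terminator(table.back()))
        table.pop_back();                  // drop the old terminator(s)
    table.push_back(added);
    table.push_back(ImportDescriptor{});   // new terminator
    return table;
}
```

Because the grown table usually no longer fits where the original sits, a real patcher would also need to write it somewhere with free space (commonly a new section) and repoint the import entry in the data directory at the new location.
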
## Build setup

```batch
@echo off
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars32.bat"
cl /LD /nologo /O2 /MT /EHsc /std:c++17 /W3 ^
   /D_CRT_SECURE_NO_WARNINGS /D_WIN32_WINNT=0x0601 ^
   /Fe:leakfix.dll ^
   src\dllmain.cpp src\patches.cpp src\thunks.cpp src\sweep.cpp ^
   src\hook.cpp src\logging.cpp src\minhook\*.c ^
   /link kernel32.lib user32.lib
```

`/MT` avoids needing `vcruntime*.dll` alongside.

## Implementation order

1. ✅ Verify toolchain builds 32-bit DLL (hello.dll)
2. Write `dllmain.cpp` + `patches.cpp` with v3b only; verify it writes the same bytes as the Python patcher, then manually inject into a test PID
3. Add v11 (similar simple byte writes)
4. Add v5 (real purge functions in our DLL .text)
5. Add v12 (more complex, but same pattern as v5)
6. Add v14 (inline-asm naked function)
7. Build injector EXE, test full apply-on-attach flow
8. Find frame-tick hook target via Ghidra (separate task)
9. Wire MinHook + sweep skeleton
10. Implement sweep predicates iteratively, with a very long soak window per iteration
11. Optional: PE import table patcher for one-launcher-binary UX

## Risk management

- Each patch porting step is verified against the Python patcher's
  byte output before merging. No new bytes = no new risk.
- Sweep is the only NEW logic and follows v13 lessons: long soaks,
  conservative predicates, refuse-to-destroy-if-uncertain rule.
- Crash dumps land cleanly because we're not crossing the managed/unmanaged
  boundary.

## What it replaces

- `tools/patch_palette_v3b.py`: runtime-applied at DLL load
- `tools/patch_purge_v5_test.py`: runtime-applied at DLL load
- `tools/patch_v11_test.py`: runtime-applied at DLL load
- `tools/patch_v12_test.py`: runtime-applied at DLL load
- `tools/patch_v14_cenvcell_clipplane.py`: runtime-applied at DLL load
- `tools/fleet_monitor.sh` auto-patching cascade: no longer needed (the DLL
  applies everything automatically on every restart)

Snapshot/HB monitoring stays in place; that's separate from patching.