Five bugs identified and patched in retail Asheron's Call client:

- v3b: palette refcount over-increment (3-byte NOP at two sites)
- v5: RenderSurface PurgeResource no-op stub (vtable slot 2 thunk)
- v11: two dangling-pointer crash guards (NULL-check + reorder)
- v14: CEnvCell::Destroy ClipPlaneList leak (18-byte JMP to cleanup thunk)
- v22: unpacker stale-pointer SEH guard (whole-function __try/__except)

All five ship in leakfix.dll (117 KB, SHA d282f23c…) which is loaded by acclient.exe at process start via PE import table patching by tools/install_leakfix.py.

Controlled 15-client fleet soak: unpatched control died at 26h with palette exhaustion; all 14 patched clients survived past that point and reached ≥5-day uptime.

Residual ~15 MB/h growth traced to d3d9.dll's internal slab allocator (260KB surface backing buffers retained after Release). See REPORT.md §10 for the full investigation; conclusion is that it's unfixable from outside d3d9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
leakfix.dll — Standalone Native Patch DLL
Goal
Consolidate all runtime patches (v3b, v5, v11, v12, v14) plus add a periodic CObjCell/LongHash cleanup sweep that's impossible at the byte-patching level. Ship as a single native 32-bit DLL + tiny launcher EXE. No Decal dependency.
Why now
- Per-client byte patching works but doesn't scale to the residual ~7–8 MB/hr CPhysicsObj-family leak (requires real cleanup loops, not inline thunks).
- The Python patchers re-apply on every restart via the monitor — brittle. A DLL loads with the process.
- Native code = clean crash dumps at real fault sites (no CLR wrapping like UB's System.AccessViolationException issue).
Tech stack
- Language: C++17, MSVC cl.exe (verified working: MSVC 14.44.35207).
- Target: 32-bit x86, built from a vcvars32 environment. cl defaults to /arch:SSE2 on x86, so /arch:IA32 has to be passed explicitly if x87-only code generation is wanted.
- Runtime: static link (/MT) → no extra runtime DLL dependency.
- Hooking: MinHook (small C library under the 2-clause BSD license, vendored as source) for frame-tick detour.
- AC struct mirrors: subset of references/acclient.h.
Project layout
dll/
├── DESIGN.md # this file
├── leakfix/
│ ├── build.bat # one-shot build via vcvars32
│ ├── src/
│ │ ├── dllmain.cpp # DllMain, patch application, hook install
│ │ ├── patches.cpp # v3b, v5, v11, v12, v14 application
│ │ ├── thunks.cpp # inline-asm thunks (v14 ClipPlaneList, v5 purge)
│ │ ├── sweep.cpp # periodic CObjCell/LongHash cleanup
│ │ ├── hook.cpp # MinHook wiring for frame-tick detour
│ │ ├── logging.cpp # rolling log file
│ │ ├── ac_addrs.h # EoR address constants
│ │ ├── ac_types.h # struct mirrors
│ │ └── minhook/ # vendored MinHook source
│ └── injector/
│ └── inject.cpp # CreateProcess(suspended) + LoadLibraryA inject
└── test/ # hello.dll already verified
Patch porting plan
Each existing Python patcher becomes a few lines of C++ that run in
DllMain on DLL_PROCESS_ATTACH.
v3b — palette NOP (trivial port)
WriteCode(0x0053EFFE, "\x90\x90\x90", 3);
WriteCode(0x0053F19C, "\x90\x90\x90", 3);
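WriteCode itself is not defined in this document. One way to keep the "same bytes as the Python patcher" guarantee is to check the site before writing. The following is a minimal sketch under that assumption; `patch_checked` and `PatchResult` are hypothetical names, and the sketch operates on an ordinary buffer. Inside the DLL the `memcpy` would presumably be bracketed by VirtualProtect calls to make the code page writable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type for a checked byte patch.
enum class PatchResult { Applied, AlreadyApplied, Mismatch };

// Overwrite `len` bytes at `site` with `patch`, but only if the site currently
// holds `expected` (the unpatched bytes). Re-applying an existing patch is a
// no-op; any other content is left untouched and reported as a mismatch.
PatchResult patch_checked(uint8_t* site, const uint8_t* expected,
                          const uint8_t* patch, size_t len) {
    assert(site && expected && patch && len > 0);
    if (std::memcmp(site, patch, len) == 0) return PatchResult::AlreadyApplied;
    if (std::memcmp(site, expected, len) != 0) return PatchResult::Mismatch;
    std::memcpy(site, patch, len);
    return PatchResult::Applied;
}
```

A mismatch result is a natural place to log and skip the patch rather than write NOPs into an unexpected client build.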
v5 — RenderSurface PurgeResource vtable override
The current 10-byte thunk becomes a real function:
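// The client's own destructors, called with `this` in ECX.
typedef void (__thiscall *DestroyFn)(void* self);
// An integer-to-pointer cast is not a constant expression in standard C++,
// so these are const rather than constexpr.
static const DestroyFn RENDERSURFACE_DESTROY = reinterpret_cast<DestroyFn>(0x00444540);
static const DestroyFn RENDERTEXTURE_DESTROY = reinterpret_cast<DestroyFn>(0x0044C4F0);
// MSVC rejects __thiscall on non-member functions, so the replacements are
// declared __fastcall with an unused EDX parameter. For a method with no stack
// arguments this matches __thiscall: `this` arrives in ECX, nothing to pop.
int __fastcall purge_rendersurface(void* self, void* /*edx*/) {
    RENDERSURFACE_DESTROY(self);
    return 1;
}
int __fastcall purge_rendertexture(void* self, void* /*edx*/) {
    RENDERTEXTURE_DESTROY(self);
    return 1;
}
void apply_v5() {
    WriteVtableSlot(0x0079A684, (void*)&purge_rendersurface);
    WriteVtableSlot(0x0079C1A0, (void*)&purge_rendertexture);
}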
Replaces VirtualAllocEx + 10-byte thunk with proper function pointers inside our DLL's .text.
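WriteVtableSlot is not shown here. As a rough illustration of what a slot swap amounts to, the sketch below models a vtable as a plain array of function pointers and replaces slot 2. All names are hypothetical and the calling convention is simplified to a plain function pointer; the real table presumably sits in the client's read-only data, so the store would need VirtualProtect around it.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for one vtable entry.
using SlotFn = int (*)(void* self);

// Swap a single slot and hand back the previous entry, so the caller can
// log it or restore it when the DLL unloads.
SlotFn write_vtable_slot(SlotFn* slot, SlotFn replacement) {
    assert(slot && replacement);
    SlotFn previous = *slot;
    *slot = replacement;
    return previous;
}

// Models the retail no-op stub: reports nothing purged and frees nothing.
int stub_purge(void*) { return 0; }

// Models the replacement: releases the resource (here, zeroes an int).
int real_purge(void* self) {
    *static_cast<int*>(self) = 0;
    return 1;
}
```

Returning the old pointer keeps the patch reversible, which is useful when comparing patched and unpatched behaviour in the same process.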
v11 — NULL-check NOPs
Two byte-level rewrites, identical to those the Python patcher applies.
v12 — unpacker validator + dispatch redirect
- Patcher allocates a 29-byte validator thunk + rewrites a dispatch table entry.
- C++ version: validator becomes a __declspec(naked) function; dispatch table entry becomes a function pointer.
v14 — CEnvCell ClipPlaneList fix
Replace 18 bytes at 0x0052E661 with a 5-byte JMP into a naked
function:
__declspec(naked) void clipplane_cleanup_thunk() {
    __asm {
        pushad
        mov  edi, [esi + 0xDC]
        test edi, edi
        jz   done
        mov  ecx, [edi]
        test ecx, ecx
        jz   free_outer
        push ecx
        mov  eax, 0x0053C760        ; ClipPlaneList::~ClipPlaneList
        call eax
        pop  ecx
        push ecx
        mov  eax, 0x005DF15E        ; operator delete
        call eax
        add  esp, 4
    free_outer:
        push edi
        mov  eax, 0x005DF164        ; operator delete[]
        call eax
        add  esp, 4
        mov  [esi + 0xDC], ebx
    done:
        popad
        push 0x0052E673             ; resume
        ret
    }
}
Then install a 5-byte E9 rel32 from 0x0052E661 to clipplane_cleanup_thunk,
followed by 13 NOPs.
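The displacement in an E9 jump is measured from the end of the 5-byte instruction, not from its start. A minimal sketch of building the 18-byte overwrite follows; `make_jmp_patch` is a hypothetical helper, and the thunk address in the usage note is made up, since the real one is only known once the DLL is loaded.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build "JMP rel32" followed by NOP padding, N bytes in total, for a jump
// placed at address `site` and landing at address `target`.
template <std::size_t N>
std::array<uint8_t, N> make_jmp_patch(uint32_t site, uint32_t target) {
    static_assert(N >= 5, "E9 rel32 needs five bytes");
    std::array<uint8_t, N> bytes;
    bytes.fill(0x90);                      // NOP padding after the jump
    bytes[0] = 0xE9;                       // JMP rel32 opcode
    // rel32 is relative to the next instruction (site + 5). Unsigned
    // wraparound yields the right two's-complement value for backward jumps.
    const uint32_t rel = target - (site + 5);
    for (int i = 0; i < 4; ++i)
        bytes[1 + i] = static_cast<uint8_t>(rel >> (8 * i));  // little-endian
    return bytes;
}
```

For example, `make_jmp_patch<18>(0x0052E661, 0x10001000)` yields `E9 9A 29 AD 0F` followed by thirteen `90` bytes, because 0x10001000 - 0x0052E666 = 0x0FAD299A.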
NEW: CObjCell/LongHash cleanup sweep
This is the actual reason for going to a DLL. Byte patches can't express the logic.
What we know
- Top owner vtable holding leaked CPhysicsObjs:
  0x0079BF64 (= LongHash<CPhysicsObj>::Node, 21,553 hits).
- Secondary: 0x007ED3B0 (CObjCell-family containers, object_list DArrays) and 0x007CA4DC (another LongHash family).
- All CPhysicsObj::Destroy teardown code is correct when called; the bug is that it's never called for these objects.
Sweep design
struct LongHashNode {
    LongHashNode* next;
    uint32_t key;
    void* value; // CPhysicsObj*
};
struct LongHashTable {
    void* vtable;
    LongHashNode** buckets;
    uint32_t bucket_count;
    uint32_t entry_count;
    // ... mirror layout from acclient.h
};
void sweep_physobj_table(LongHashTable* table, uint32_t cutoff_ts) {
    for (uint32_t b = 0; b < table->bucket_count; ++b) {
        LongHashNode** prev = &table->buckets[b];
        LongHashNode* node = *prev;
        while (node) {
            LongHashNode* next = node->next;
            CPhysicsObj* po = (CPhysicsObj*)node->value;
            if (is_safe_to_destroy(po, cutoff_ts)) {
                *prev = next;
                CPhysicsObj_Destroy(po); // 0x005145D0
                operator_delete(node);
                --table->entry_count;
            } else {
                prev = &node->next;
            }
            node = next;
        }
    }
}
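The unlink-while-iterating pattern in the sweep can be exercised outside the client. The sketch below is a standalone mock, not the real LongHash layout: `Node`, `Table`, and `sweep` are hypothetical, values are plain ints, and the predicate is passed in so the traversal can be tested on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node {                 // mock of a chained hash node
    Node* next;
    uint32_t key;
    int value;
};

struct Table {
    std::vector<Node*> buckets;
    uint32_t entry_count = 0;

    explicit Table(std::size_t n) : buckets(n, nullptr) {}

    void insert(uint32_t key, int value) {
        Node*& head = buckets[key % buckets.size()];
        head = new Node{head, key, value};   // push at the bucket head
        ++entry_count;
    }
};

// Remove every node whose value satisfies should_remove. `link` always points
// at the pointer that refers to the current node, so unlinking is one store
// and bucket heads need no special case.
template <typename Pred>
uint32_t sweep(Table& t, Pred should_remove) {
    uint32_t removed = 0;
    for (Node*& head : t.buckets) {
        Node** link = &head;
        while (Node* node = *link) {
            if (should_remove(node->value)) {
                *link = node->next;          // unlink before freeing
                delete node;
                --t.entry_count;
                ++removed;
            } else {
                link = &node->next;
            }
        }
    }
    return removed;
}
```

Running a mock like this under a sanitizer is a cheap way to catch use-after-free mistakes in the traversal before it ever touches client memory.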
Safety predicates (critical — these prevent v13-class crashes)
A CPhysicsObj is "safe to destroy" only if:
- po->parent == NULL (not currently attached to anything live)
- po->object_state indicates dead/destroyed (need to find flag)
- po->last_used_timestamp is older than some threshold (e.g., 60s)
- po->cell == NULL (not in any cell's object list)
- po is NOT referenced from any other table we know about (best-effort scan)
If any predicate is uncertain, leave it. Conservative wins.
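As a sketch of how the "uncertain means leave it" rule could look in code, the predicate below takes a small snapshot struct rather than the real CPhysicsObj. `PhysObjView` and every field in it are hypothetical stand-ins: the real offsets, and the dead/destroyed flag in particular, still have to be found.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the fields the predicates need.
struct PhysObjView {
    const void* parent;          // non-null: attached to something live
    const void* cell;            // non-null: still in a cell's object list
    bool state_known;            // false until the dead/destroyed flag is identified
    bool state_dead;
    uint32_t last_used_ts;       // same clock as cutoff_ts
    bool referenced_elsewhere;   // result of the best-effort scan of other tables
};

// Every check must pass positively; anything unknown counts as "not safe".
bool is_safe_to_destroy(const PhysObjView* po, uint32_t cutoff_ts) {
    if (po == nullptr) return false;
    if (po->parent != nullptr) return false;
    if (po->cell != nullptr) return false;
    if (!po->state_known || !po->state_dead) return false;
    if (po->last_used_ts >= cutoff_ts) return false;   // used too recently
    if (po->referenced_elsewhere) return false;
    return true;
}
```

Writing the checks as early returns keeps each predicate independently loggable, which helps when a soak run needs to explain why an object was skipped.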
Tick hook
Need to find a function AC calls every frame, hook it via MinHook, and trigger sweep every N frames (e.g., every 300 frames ≈ 5s at 60fps).
Candidate hook targets to investigate:
- Render::Render or main game loop entry
- Input::ProcessFrame
- cm_GameLoop::Tick (if it exists)
This needs another small investigation. Once found, hook:
typedef void (__cdecl *TickFn)();
TickFn original_tick;
void __cdecl hooked_tick() {
    original_tick();
    static int counter = 0;
    if (++counter >= 300) {
        counter = 0;
        sweep_all_physobj_tables();
    }
}
Injection mechanism
Phase 1 — launcher EXE (development & testing)
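int main(int argc, char** argv) {
    const char dll_path[] = "C:\\path\\to\\leakfix.dll";
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessA("acclient.exe", build_cmdline(argc, argv),
                        NULL, NULL, FALSE, CREATE_SUSPENDED,
                        NULL, NULL, &si, &pi))
        return 1;
    // Inject DLL: copy the path (including its NUL) into the target, then run
    // LoadLibraryA on it from a remote thread. Only sizeof(dll_path) bytes are
    // written; copying MAX_PATH bytes would read past the end of the string.
    void* mem = VirtualAllocEx(pi.hProcess, NULL, sizeof(dll_path),
                               MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    WriteProcessMemory(pi.hProcess, mem, dll_path, sizeof(dll_path), NULL);
    HANDLE thr = CreateRemoteThread(pi.hProcess, NULL, 0,
        (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("kernel32"), "LoadLibraryA"),
        mem, 0, NULL);
    WaitForSingleObject(thr, INFINITE);   // DllMain has run once this returns
    CloseHandle(thr);
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}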
Usage: leakfix_launch.exe -h server -p port -u user -... → a drop-in
substitute for launching acclient.exe directly.
Phase 2 — PE import table modification (production)
Patch acclient.exe's PE header to add leakfix.dll to its imports.
Then the OS loader pulls our DLL in automatically before AC's
WinMain runs. User just runs acclient as normal.
Tool: small Python or C++ utility that does:
- Open PE
- Find IMPORT_DIRECTORY
- Add new IMAGE_IMPORT_DESCRIPTOR pointing at leakfix.dll
- Stuff in a fake IAT with a single function (leakfix_init exported from our DLL)
- Resave executable
(There are existing tools like LoadDll, PE Bear, or
peimporter we can crib from.)
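The core of the steps above is growing the import descriptor array by one entry while keeping its all-zero terminator last. The sketch below shows only that step on an in-memory copy; `ImportDescriptor` mirrors the field order of IMAGE_IMPORT_DESCRIPTOR in winnt.h, while `append_import`, `is_terminator`, and every RVA in the test values are hypothetical. Since the grown array usually no longer fits where the original sat, a real tool would likely write it to a new or enlarged section and repoint the import data directory at it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field-for-field mirror of IMAGE_IMPORT_DESCRIPTOR; every field is an RVA
// or a plain 32-bit value.
struct ImportDescriptor {
    uint32_t OriginalFirstThunk;  // RVA of the import lookup table
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;                // RVA of the NUL-terminated DLL name
    uint32_t FirstThunk;          // RVA of the import address table
};
static_assert(sizeof(ImportDescriptor) == 20, "must match the on-disk layout");

bool is_terminator(const ImportDescriptor& d) {
    return d.OriginalFirstThunk == 0 && d.TimeDateStamp == 0 &&
           d.ForwarderChain == 0 && d.Name == 0 && d.FirstThunk == 0;
}

// Return a copy of `table` with `added` placed just before the terminator.
std::vector<ImportDescriptor> append_import(std::vector<ImportDescriptor> table,
                                            const ImportDescriptor& added) {
    assert(!table.empty() && is_terminator(table.back()));
    table.insert(table.end() - 1, added);
    return table;
}
```

Working on a copy keeps the original descriptors byte-identical, so existing imports resolve exactly as before.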
Build setup
@echo off
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars32.bat"
cl /LD /nologo /O2 /MT /EHsc /std:c++17 /W3 ^
/D_CRT_SECURE_NO_WARNINGS /D_WIN32_WINNT=0x0601 ^
/Fe:leakfix.dll ^
src\dllmain.cpp src\patches.cpp src\thunks.cpp src\sweep.cpp ^
src\hook.cpp src\logging.cpp src\minhook\*.c ^
/link kernel32.lib user32.lib
/MT avoids needing vcruntime*.dll alongside.
Implementation order
- ✅ Verify toolchain builds 32-bit DLL (hello.dll)
- Write dllmain.cpp + patches.cpp with v3b only: verify same bytes as Python patcher produces, manually inject into a test PID
- Add v11 (similar simple byte writes)
- Add v5 (real __thiscall purge functions in our DLL .text)
- Add v12 (more complex but pattern same as v5)
- Add v14 (inline-asm naked function)
- Build injector EXE, test full apply-on-attach flow
- Find frame-tick hook target via Ghidra (separate task)
- Wire MinHook + sweep skeleton
- Implement sweep predicates iteratively, very long soak windows per iteration
- Optional: PE import table patcher for one-launcher-binary UX
Risk management
- Each patch porting step is verified against the Python patcher's byte output before merging. No new bytes = no new risk.
- Sweep is the only NEW logic and follows v13 lessons: long soaks, conservative predicates, refuse-to-destroy-if-uncertain rule.
- Crash dumps land cleanly because we're not crossing managed/unmanaged boundary.
What it replaces
- tools/patch_palette_v3b.py: runtime-applied at DLL load
- tools/patch_purge_v5_test.py: runtime-applied at DLL load
- tools/patch_v11_test.py: runtime-applied at DLL load
- tools/patch_v12_test.py: runtime-applied at DLL load
- tools/patch_v14_cenvcell_clipplane.py: runtime-applied at DLL load
- tools/fleet_monitor.sh auto-patching cascade: no longer needed (DLL applies all on every restart automatically)
Snapshot/HB monitoring stays in place — that's separate from patching.