leakhunt/dll/DESIGN.md
acbot 57b5e43d0e Initial commit — leak-hunt project complete
Five bugs identified and patched in retail Asheron's Call client:
- v3b: palette refcount over-increment (3-byte NOP at two sites)
- v5: RenderSurface PurgeResource no-op stub (vtable slot 2 thunk)
- v11: two dangling-pointer crash guards (NULL-check + reorder)
- v14: CEnvCell::Destroy ClipPlaneList leak (18-byte JMP to cleanup thunk)
- v22: unpacker stale-pointer SEH guard (whole-function __try/__except)

All five ship in leakfix.dll (117 KB, SHA d282f23c…) which is loaded
by acclient.exe at process start via PE import table patching by
tools/install_leakfix.py.

Controlled 15-client fleet soak: unpatched control died at 26h with
palette exhaustion; all 14 patched clients survived past that point
and reached ≥5-day uptime.

Residual ~15 MB/h growth traced to d3d9.dll's internal slab allocator
(260KB surface backing buffers retained after Release). See REPORT.md
§10 for the full investigation; conclusion is that it's unfixable from
outside d3d9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 21:07:58 +02:00


leakfix.dll — Standalone Native Patch DLL

Goal

Consolidate all runtime patches (v3b, v5, v11, v12, v14) and add a periodic CObjCell/LongHash cleanup sweep, which cannot be expressed at the byte-patching level. Ship as a single native 32-bit DLL plus a tiny launcher EXE. No Decal dependency.

Why now

  • Per-client byte patching works but doesn't scale to the residual ~78 MB/hr CPhysicsObj-family leak (requires real cleanup loops, not inline thunks).
  • The Python patchers re-apply on every restart via the monitor — brittle. A DLL loads with the process.
  • Native code = clean crash dumps at real fault sites (no CLR wrapping like UB's System.AccessViolationException issue).

Tech stack

  • Language: C++17, MSVC cl.exe (verified working: MSVC 14.44.35207).
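  • Target: 32-bit x86 (the default target under vcvars32). Note that x86 cl.exe defaults to /arch:SSE2; pass /arch:IA32 explicitly if SSE2 code generation must be avoided.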
  • Runtime: static link (/MT) → no extra runtime DLL dependency.
  • Hooking: MinHook (small C library, BSD 2-Clause, vendored as source) for frame-tick detour.
  • AC struct mirrors: subset of references/acclient.h.

Project layout

dll/
├── DESIGN.md          # this file
├── leakfix/
│   ├── build.bat      # one-shot build via vcvars32
│   ├── src/
│   │   ├── dllmain.cpp        # DllMain, patch application, hook install
│   │   ├── patches.cpp        # v3b, v5, v11, v12, v14 application
│   │   ├── thunks.cpp         # inline-asm thunks (v14 ClipPlaneList, v5 purge)
│   │   ├── sweep.cpp          # periodic CObjCell/LongHash cleanup
│   │   ├── hook.cpp           # MinHook wiring for frame-tick detour
│   │   ├── logging.cpp        # rolling log file
│   │   ├── ac_addrs.h         # EoR address constants
│   │   ├── ac_types.h         # struct mirrors
│   │   └── minhook/           # vendored MinHook source
│   └── injector/
│       └── inject.cpp         # CreateProcess(suspended) + LoadLibraryA inject
└── test/                      # hello.dll already verified

Patch porting plan

Each existing Python patcher becomes a few lines of C++ that run in DllMain on DLL_PROCESS_ATTACH.

v3b — palette NOP (trivial port)

WriteCode(0x0053EFFE, "\x90\x90\x90", 3);
WriteCode(0x0053F19C, "\x90\x90\x90", 3);
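
WriteCode is not defined anywhere in this document. A minimal sketch of what it could look like follows, assuming the usual VirtualProtect / memcpy / FlushInstructionCache sequence; the signature and the bool return are assumptions, and the non-Windows branch exists only so the helper can be exercised against an ordinary buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: copies `len` bytes to `addr`, temporarily making the
// target pages writable. Returns false if the protection change fails.
bool WriteCode(std::uintptr_t addr, const void* bytes, std::size_t len) {
    assert(bytes != nullptr && len > 0);
    void* dst = reinterpret_cast<void*>(addr);
#ifdef _WIN32
    DWORD old_protect = 0;
    if (!VirtualProtect(dst, len, PAGE_EXECUTE_READWRITE, &old_protect))
        return false;
    std::memcpy(dst, bytes, len);
    // Restore the original protection and discard stale cached instructions.
    VirtualProtect(dst, len, old_protect, &old_protect);
    FlushInstructionCache(GetCurrentProcess(), dst, len);
#else
    // Fallback for off-target testing: plain copy into a normal buffer.
    std::memcpy(dst, bytes, len);
#endif
    return true;
}
```

Returning a status rather than void lets DllMain log a failed write instead of continuing with a half-applied patch set.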

v5 — RenderSurface PurgeResource vtable override

The current 10-byte thunk becomes a real function:

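typedef void (__thiscall *DestroyFn)(void* self);
// Integer-to-pointer casts are not constant expressions, so use const rather than constexpr.
static const auto RENDERSURFACE_DESTROY = reinterpret_cast<DestroyFn>(0x00444540);
static const auto RENDERTEXTURE_DESTROY = reinterpret_cast<DestroyFn>(0x0044C4F0);

// MSVC rejects __thiscall on free functions (C3865). __fastcall with an unused
// second parameter has the same layout for a no-argument method:
// ECX = this, EDX = ignored, nothing on the stack.
int __fastcall purge_rendersurface(void* self, void* /*edx, unused*/) {
    RENDERSURFACE_DESTROY(self);
    return 1;
}
int __fastcall purge_rendertexture(void* self, void* /*edx, unused*/) {
    RENDERTEXTURE_DESTROY(self);
    return 1;
}

void apply_v5() {
    WriteVtableSlot(0x0079A684, (void*)&purge_rendersurface);
    WriteVtableSlot(0x0079C1A0, (void*)&purge_rendertexture);
}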

Replaces VirtualAllocEx + 10-byte thunk with proper function pointers inside our DLL's .text.

v11 — NULL-check NOPs

Two byte-level rewrites identical to Python patcher.

v12 — unpacker validator + dispatch redirect

  • Patcher allocates a 29-byte validator thunk + rewrites a dispatch table entry.
  • C++ version: validator becomes a __declspec(naked) function; dispatch table entry becomes a function pointer.

v14 — CEnvCell ClipPlaneList fix

Replace 18 bytes at 0x0052E661 with a 5-byte JMP into a naked function:

__declspec(naked) void clipplane_cleanup_thunk() {
    __asm {
        pushad
        mov edi, [esi + 0xDC]
        test edi, edi
        jz done
        mov ecx, [edi]
        test ecx, ecx
        jz free_outer
        push ecx
        mov eax, 0x0053C760           ; ClipPlaneList::~ClipPlaneList
        call eax
        pop ecx
        push ecx
        mov eax, 0x005DF15E           ; operator delete
        call eax
        add esp, 4
    free_outer:
        push edi
        mov eax, 0x005DF164           ; operator delete[]
        call eax
        add esp, 4
        mov [esi + 0xDC], ebx
    done:
        popad
        push 0x0052E673                ; resume
        ret
    }
}

Then install a 5-byte E9 rel32 from 0x0052E661 to clipplane_cleanup_thunk, followed by 13 NOPs.
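
The patch bytes for that step can be computed rather than hand-assembled. The sketch below builds the E9 rel32 plus NOP padding for an arbitrary site, target, and length; make_jmp_patch is a hypothetical name, and the displacement is measured from the end of the 5-byte JMP, as x86 requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds `total_len` bytes: E9 rel32 followed by NOP (0x90) padding.
// rel32 is relative to the address of the instruction after the 5-byte JMP.
std::vector<std::uint8_t> make_jmp_patch(std::uint32_t site, std::uint32_t target,
                                         std::size_t total_len) {
    assert(total_len >= 5);
    std::vector<std::uint8_t> patch(total_len, 0x90);
    // Unsigned subtraction wraps, which also yields the right bits for backward jumps.
    const std::uint32_t rel = target - (site + 5);
    patch[0] = 0xE9;
    patch[1] = static_cast<std::uint8_t>(rel);
    patch[2] = static_cast<std::uint8_t>(rel >> 8);
    patch[3] = static_cast<std::uint8_t>(rel >> 16);
    patch[4] = static_cast<std::uint8_t>(rel >> 24);
    return patch;
}
```

For v14 the call would be make_jmp_patch(0x0052E661, thunk_address, 18), where thunk_address is the runtime address of clipplane_cleanup_thunk cast to a 32-bit integer. The padding ends at 0x0052E673, which is the resume address the thunk pushes.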

NEW: CObjCell/LongHash cleanup sweep

This is the actual reason for going to a DLL. Byte patches can't express the logic.

What we know

  • Top owner vtable holding leaked CPhysicsObjs: 0x0079BF64 (= LongHash<CPhysicsObj>::Node, 21,553 hits).
  • Secondary: 0x007ED3B0 (CObjCell-family containers, object_list DArrays) and 0x007CA4DC (another LongHash family).
  • All CPhysicsObj::Destroy teardown code is correct when called — the bug is it's never called for these objects.

Sweep design

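#include <cstdint>

struct CPhysicsObj;                                    // opaque here; mirror lives in ac_types.h
bool is_safe_to_destroy(CPhysicsObj* po, uint32_t cutoff_ts);
void CPhysicsObj_Destroy(CPhysicsObj* po);             // wrapper around 0x005145D0
void operator_delete(void* p);                         // assumed wrapper around the client's operator delete

struct LongHashNode {
    LongHashNode* next;
    uint32_t      key;
    void*         value;     // CPhysicsObj*
};

struct LongHashTable {
    void*         vtable;
    LongHashNode** buckets;
    uint32_t      bucket_count;
    uint32_t      entry_count;
    // ... mirror layout from acclient.h
};

void sweep_physobj_table(LongHashTable* table, uint32_t cutoff_ts) {
    for (uint32_t b = 0; b < table->bucket_count; ++b) {
        // prev always points at the link that currently refers to node.
        LongHashNode** prev = &table->buckets[b];
        LongHashNode*  node = *prev;
        while (node) {
            LongHashNode* next = node->next;   // read before node may be freed
            CPhysicsObj* po = (CPhysicsObj*)node->value;

            if (is_safe_to_destroy(po, cutoff_ts)) {
                *prev = next;                  // unlink first, then tear down
                CPhysicsObj_Destroy(po);   // 0x005145D0
                operator_delete(node);
                --table->entry_count;
            } else {
                prev = &node->next;
            }
            node = next;
        }
    }
}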
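
Because the sweep is the only new logic, the unlink-while-iterating walk is worth exercising off-client before any soak. The following is a sketch of such a harness, not part of the DLL: Node, sweep_buckets, and push_front are hypothetical test-only names, the payload is an int instead of a CPhysicsObj*, and the predicate is passed in so the traversal can be tested on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node {            // same shape as LongHashNode, with an int payload
    Node*         next;
    std::uint32_t key;
    int           value;
};

// Same walk as the sweep: unlink matching nodes while iterating each chain.
// Returns the number of nodes removed.
template <typename Pred>
std::uint32_t sweep_buckets(std::vector<Node*>& buckets, Pred should_remove) {
    std::uint32_t removed = 0;
    for (Node*& head : buckets) {
        Node** prev = &head;
        Node*  node = *prev;
        while (node) {
            Node* next = node->next;   // read before the node may be freed
            if (should_remove(*node)) {
                *prev = next;          // unlink before freeing
                delete node;
                ++removed;
            } else {
                prev = &node->next;
            }
            node = next;
        }
    }
    return removed;
}

// Test helper: pushes a new node onto the front of bucket `b`.
void push_front(std::vector<Node*>& buckets, std::size_t b,
                std::uint32_t key, int value) {
    assert(b < buckets.size());
    buckets[b] = new Node{buckets[b], key, value};
}
```

Cases worth covering are removal of a chain head, a chain tail, consecutive nodes, and an entire bucket, since those are the paths where a stale prev pointer would corrupt the list.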

Safety predicates (critical — these prevent v13-class crashes)

A CPhysicsObj is "safe to destroy" only if:

  1. po->parent == NULL (not currently attached to anything live)
  2. po->object_state indicates dead/destroyed (need to find flag)
  3. po->last_used_timestamp is older than some threshold (e.g., 60s)
  4. po->cell == NULL (not in any cell's object list)
  5. po is NOT referenced from any other table we know about (best-effort scan)

If any predicate is uncertain, leave it. Conservative wins.
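
A sketch of how the five predicates might combine is shown below. Everything about the struct is an assumption: the field names follow the list above, but the real offsets come from acclient.h, and STATE_DEAD_PLACEHOLDER stands in for the state flag that still has to be found. Predicate 5 is passed in as a bool here, so the signature differs from the two-argument call used in the sweep.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror: field names follow the predicate list, not the real layout.
struct CPhysicsObj {
    void*         parent;
    void*         cell;
    std::uint32_t object_state;
    std::uint32_t last_used_timestamp;
};

constexpr std::uint32_t STATE_DEAD_PLACEHOLDER = 0x1;  // placeholder, not the real flag

// Returns true only when every predicate holds; any doubt means "leave it".
bool is_safe_to_destroy(const CPhysicsObj* po, std::uint32_t cutoff_ts,
                        bool referenced_elsewhere) {
    if (po == nullptr) return false;
    if (po->parent != nullptr) return false;                             // 1: still attached
    if ((po->object_state & STATE_DEAD_PLACEHOLDER) == 0) return false;  // 2: not marked dead
    if (po->last_used_timestamp >= cutoff_ts) return false;              // 3: used too recently
    if (po->cell != nullptr) return false;                               // 4: still in a cell
    if (referenced_elsewhere) return false;                              // 5: known reference
    return true;
}
```

Writing each predicate as an early return keeps the conservative default explicit: a new check can only make the function refuse more often.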

Tick hook

Need to find a function AC calls every frame, hook it via MinHook, and trigger sweep every N frames (e.g., every 300 frames ≈ 5s at 60fps).

Candidate hook targets to investigate:

  • Render::Render or main game loop entry
  • Input::ProcessFrame
  • cm_GameLoop::Tick (if it exists)

This needs another small investigation. Once found, hook:

typedef void (__cdecl *TickFn)();
TickFn original_tick;

void __cdecl hooked_tick() {
    original_tick();
    static int counter = 0;
    if (++counter >= 300) {
        counter = 0;
        sweep_all_physobj_tables();
    }
}

Injection mechanism

Phase 1 — launcher EXE (development & testing)

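int main(int argc, char** argv) {
    const char dll_path[] = "C:\\path\\to\\leakfix.dll";
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessA("acclient.exe", build_cmdline(argc, argv),
                        NULL, NULL, FALSE, CREATE_SUSPENDED,
                        NULL, NULL, &si, &pi))
        return 1;

    // Inject DLL: copy only the path string (with its NUL) into the target.
    void* mem = VirtualAllocEx(pi.hProcess, NULL, sizeof(dll_path),
                               MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    HANDLE thr = NULL;
    if (mem && WriteProcessMemory(pi.hProcess, mem, dll_path, sizeof(dll_path), NULL))
        thr = CreateRemoteThread(pi.hProcess, NULL, 0,
                                 (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("kernel32"), "LoadLibraryA"),
                                 mem, 0, NULL);
    if (!thr) {                      // injection failed: do not run unpatched
        TerminateProcess(pi.hProcess, 1);
        return 1;
    }
    WaitForSingleObject(thr, INFINITE);
    CloseHandle(thr);
    VirtualFreeEx(pi.hProcess, mem, 0, MEM_RELEASE);

    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}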

Usage: leakfix_launch.exe -h server -p port -u user -... → drops in as substitute for direct acclient.exe.

Phase 2 — PE import table modification (production)

Patch acclient.exe's PE header to add leakfix.dll to its imports. Then the OS loader pulls our DLL in automatically before AC's WinMain runs. User just runs acclient as normal.

Tool: small Python or C++ utility that does:

  • Open PE
  • Find IMPORT_DIRECTORY
  • Add new IMAGE_IMPORT_DESCRIPTOR pointing at leakfix.dll
  • Stuff in a fake IAT with a single function (leakfix_init exported from our DLL)
  • Resave executable

(There are existing tools like LoadDll, PE Bear, or peimporter we can crib from.)
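
The descriptor step of that list can be sketched in isolation. The block below only builds the new descriptor array in memory; it does not parse or write a PE file. ImportDescriptor mirrors the documented IMAGE_IMPORT_DESCRIPTOR layout, while append_import and its parameter names are hypothetical. In practice the grown array usually no longer fits where the old one sat, so a patcher would typically place it in a new section and point the import data directory entry at it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field-for-field mirror of IMAGE_IMPORT_DESCRIPTOR (20 bytes, all RVAs).
struct ImportDescriptor {
    std::uint32_t OriginalFirstThunk;  // RVA of the import lookup table
    std::uint32_t TimeDateStamp;
    std::uint32_t ForwarderChain;
    std::uint32_t Name;                // RVA of the NUL-terminated DLL name
    std::uint32_t FirstThunk;          // RVA of the import address table
};
static_assert(sizeof(ImportDescriptor) == 20, "descriptor must be 20 bytes");

// Returns a copy of the descriptor array with one new entry placed before the
// all-zero terminator. `existing` must not include the terminator.
std::vector<ImportDescriptor> append_import(const std::vector<ImportDescriptor>& existing,
                                            std::uint32_t name_rva,
                                            std::uint32_t lookup_rva,
                                            std::uint32_t iat_rva) {
    assert(name_rva != 0 && iat_rva != 0);
    std::vector<ImportDescriptor> out = existing;
    out.push_back(ImportDescriptor{lookup_rva, 0, 0, name_rva, iat_rva});
    out.push_back(ImportDescriptor{0, 0, 0, 0, 0});  // terminator
    return out;
}
```

The lookup table and IAT that the new entry points at each need one thunk for leakfix_init followed by a zero entry, which is the "fake IAT with a single function" from the list.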

Build setup

@echo off
call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars32.bat"
cl /LD /nologo /O2 /MT /EHsc /std:c++17 /W3 ^
   /D_CRT_SECURE_NO_WARNINGS /D_WIN32_WINNT=0x0601 ^
   /Fe:leakfix.dll ^
   src\dllmain.cpp src\patches.cpp src\thunks.cpp src\sweep.cpp ^
   src\hook.cpp src\logging.cpp src\minhook\*.c ^
   /link kernel32.lib user32.lib

/MT avoids needing vcruntime*.dll alongside.

Implementation order

  1. Verify toolchain builds 32-bit DLL (hello.dll)
  2. Write dllmain.cpp + patches.cpp with v3b only — verify same bytes as Python patcher produces, manually inject into a test PID
  3. Add v11 (similar simple byte writes)
  4. Add v5 (real __thiscall purge functions in our DLL .text)
  5. Add v12 (more complex but pattern same as v5)
  6. Add v14 (inline-asm naked function)
  7. Build injector EXE, test full apply-on-attach flow
  8. Find frame-tick hook target via Ghidra (separate task)
  9. Wire MinHook + sweep skeleton
  10. Implement sweep predicates iteratively, very long soak windows per iteration
  11. Optional: PE import table patcher for one-launcher-binary UX

Risk management

  • Each patch porting step is verified against the Python patcher's byte output before merging. No new bytes = no new risk.
  • Sweep is the only NEW logic and follows v13 lessons: long soaks, conservative predicates, refuse-to-destroy-if-uncertain rule.
  • Crash dumps land cleanly because we're not crossing managed/unmanaged boundary.
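
One way to enforce the first bullet at runtime is to classify each patch site before writing to it. The sketch below is an assumption about how that check could look; classify_site and PatchCheck are hypothetical names, and the expected original bytes would have to be taken from the corresponding Python patcher.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

enum class PatchCheck { Unpatched, AlreadyPatched, Unexpected };

// Classifies the bytes at a patch site against what the Python patcher
// expects to find (`original`) and what it writes (`patched`).
PatchCheck classify_site(const void* site, const void* original,
                         const void* patched, std::size_t len) {
    assert(site && original && patched && len > 0);
    if (std::memcmp(site, patched, len) == 0)  return PatchCheck::AlreadyPatched;
    if (std::memcmp(site, original, len) == 0) return PatchCheck::Unpatched;
    // Different client build or a foreign patch: the caller should refuse to write.
    return PatchCheck::Unexpected;
}
```

Writing only on Unpatched and logging the other two outcomes makes the DLL idempotent across restarts and keeps it from writing into a client build whose addresses have shifted.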

What it replaces

  • tools/patch_palette_v3b.py — runtime-applied at DLL load
  • tools/patch_purge_v5_test.py — runtime-applied at DLL load
  • tools/patch_v11_test.py — runtime-applied at DLL load
  • tools/patch_v12_test.py — runtime-applied at DLL load
  • tools/patch_v14_cenvcell_clipplane.py — runtime-applied at DLL load
  • tools/fleet_monitor.sh auto-patching cascade — no longer needed (DLL applies all on every restart automatically)

Snapshot/HB monitoring stays in place — that's separate from patching.