2 commits

Author SHA1 Message Date
Erik
83b020499b docs(research): #9 sweep acclient_function_map.md against PDB symbols
Pure-docs sweep. Cross-checked 63 hand-curated entries in
acclient_function_map.md against docs/research/named-retail/symbols.json
(the PDB-derived authoritative name table) using the new helper at
tools/pdb-extract/check_function_map.py.

Findings:
  - Zero entries matched on both address and name. This confirms
    that the PDB was built from a different revision than the binary
    that produced our Ghidra chunks (the address delta is roughly
    0x800-0xC10 bytes and varies by function cluster). Match by
    NAME, not by raw address.
  - 38 entries corrected by PDB name lookup. The "Was" column
    preserves the old address for traceability against existing
    code comments. The old addresses pointed into the middle of the
    actual function body; the new PDB addresses point to function
    starts.
  - 25 entries have no PDB match. Either the function is inlined or
    non-public (no S_PUB32 record), or our hand-derived name was
    synthesized from call-site analysis and doesn't match the MSVC
    mangled form in the PDB. Several had wrong class assignments
    (e.g. 0x5387C0 was claimed as CTransition::find_collisions but
    is actually CPolygon::polygon_hits_sphere). Flagged for
    re-derivation in acclient_2013_pseudo_c.txt.
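
The three buckets above can be sketched as a name-keyed lookup. This is
a minimal illustration, not the code in check_function_map.py: the
function name, the (address, name) entry shape, and the name -> address
dict are all assumptions, and the sample names and addresses are made
up.

```python
def check_entries(entries, symbols):
    """Classify hand-curated entries against PDB-derived symbols.

    entries: list of (address, name) tuples from the function map
             (hypothetical shape).
    symbols: dict mapping demangled name -> PDB address
             (hypothetical shape).
    """
    exact, corrected, unmatched = [], [], []
    for addr, name in entries:
        pdb_addr = symbols.get(name)  # match by NAME, not by address
        if pdb_addr is None:
            unmatched.append((addr, name))
        elif pdb_addr == addr:
            exact.append((addr, name))
        else:
            # Keep the old address so a "Was" column can be emitted.
            corrected.append((name, pdb_addr, addr))
    return exact, corrected, unmatched


# Made-up sample data, for illustration only.
sample_entries = [(0x1000, "A::f"), (0x2010, "B::g"), (0x3000, "C::h")]
sample_symbols = {"A::f": 0x1000, "B::g": 0x2000}
exact, corrected, unmatched = check_entries(sample_entries, sample_symbols)
```

Keying on the name is what makes the check robust to the per-cluster
address delta between the PDB build and the Ghidra-analyzed binary.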

Pattern: kept the table format with two address columns (PDB +
legacy) so existing code references using the old addresses can
still be looked up. Added a sweep-summary section at the bottom of
the file documenting the methodology + findings.

The helper script at tools/pdb-extract/check_function_map.py is
reusable: re-run it after every PDB regeneration or function map
edit.

Closes #9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 17:44:07 +02:00
Erik
69d884a3d6 tools(pdb-extract): #8 PDB -> symbols.json + types.json sidecar
Pure-Python MSF 7.00 PDB extractor (no deps, stdlib only). Reads
refs/acclient.pdb directly:
  - DBI stream (3) -> symbol record stream index + section header
    stream index
  - Section headers stream (9) -> per-segment image VA bases
  - Symbol record stream (8) -> S_PUB32 records with image VAs
  - TPI stream (2) -> LF_CLASS / LF_STRUCTURE named records (not
    forward-declared), with size leaf + name
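
For reference, the S_PUB32 step can be sketched as follows. This is a
minimal illustration, not the extractor's actual code: iter_pub32 and
section_vas are hypothetical names, and the record layout (u16 length,
u16 kind 0x110E, u32 flags, u32 offset, u16 segment, NUL-terminated
name) is assumed from the public CodeView definitions. Each symbol's
image VA is the base of its 1-based segment plus the record's offset.

```python
import struct

S_PUB32 = 0x110E  # CodeView record kind for public symbols


def iter_pub32(data, section_vas):
    """Yield (image_va, mangled_name) for each S_PUB32 record.

    data:        raw bytes of the symbol record stream.
    section_vas: per-section image VAs; index 0 is segment 1.
    """
    pos = 0
    while pos + 4 <= len(data):
        rec_len, rec_kind = struct.unpack_from("<HH", data, pos)
        end = pos + 2 + rec_len  # rec_len excludes its own 2 bytes
        if rec_len < 2 or end > len(data):
            break  # truncated or corrupt record; stop scanning
        if rec_kind == S_PUB32 and rec_len >= 12:
            _flags, offset, segment = struct.unpack_from("<IIH", data, pos + 4)
            name_start = pos + 14
            name_end = data.find(b"\0", name_start, end)
            if name_end == -1:
                name_end = end
            name = data[name_start:name_end].decode("ascii", "replace")
            if 1 <= segment <= len(section_vas):
                yield section_vas[segment - 1] + offset, name
        pos = end
```

Records of other kinds are skipped by length, so the scan does not need
to understand every symbol type in the stream.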

Includes a best-effort MSVC C++ demangler so symbols.json is
grep-friendly:
  ?EnchantAttribute@CEnchantmentRegistry@@QBEHKAAK@Z
  -> CEnchantmentRegistry::EnchantAttribute

Both demangled `name` + raw `mangled` emitted per entry so callers
can choose. Operator overloads, vtables, and other special forms
where a partial demangle would be misleading are kept mangled.
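
The simple case can be sketched like this. It is a hedged
illustration of the idea, not the committed demangler: demangle_simple
is a hypothetical name, and it only handles plain
`?name@Scope@@signature` symbols, returning everything else unchanged.

```python
def demangle_simple(mangled):
    """Best-effort demangle of ?name@Scope@@... to Scope::name.

    Anything that is not a plain scoped name (operators, vtables,
    templates, back-references) is returned unchanged.
    """
    # "??" introduces special forms such as ??0 (ctor) or ??_7 (vtable).
    if not mangled.startswith("?") or mangled.startswith("??"):
        return mangled
    scope, sep, _signature = mangled[1:].partition("@@")
    if not sep:
        return mangled
    parts = scope.split("@")
    # Template fragments ("?$...") and numeric back-references are not
    # plain identifiers, so a partial demangle would be misleading.
    if not all(part.isidentifier() for part in parts):
        return mangled
    # Mangled scopes are innermost-first; reverse for C++ order.
    return "::".join(reversed(parts))


name = demangle_simple("?EnchantAttribute@CEnchantmentRegistry@@QBEHKAAK@Z")
# name == "CEnchantmentRegistry::EnchantAttribute"
```

The signature after `@@` is dropped on purpose, which keeps the name
grep-friendly; the raw mangled form remains available for anything
that needs the full type information.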

Outputs committed to docs/research/named-retail/:
  - symbols.json (2.9 MB) — 18,366 named public function symbols
  - types.json (506 KB) — 5,371 unique named class/struct records

Spot check (matches discovery agent's earlier finding):
  CEnchantmentRegistry::EnchantAttribute -> 0x00594570 ✓

Updated docs/research/acclient_function_map.md header preamble to
point readers to the new symbols.json as the authoritative name
source; the hand-curated table stays as the cross-port (ACE/ACME)
index. Several addresses there are wrong vs the PDB and will be
swept as part of closing issue #9 (Phase E).

Closes #8 (filed in Phase D's commit). Foundation for the address
sweep + name-driven workflows from here on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 17:31:52 +02:00