From b8b9845f50da7b73eff6b5c0651b1eb7a2ca4d15 Mon Sep 17 00:00:00 2001 From: Erik Date: Sun, 10 May 2026 17:52:26 +0200 Subject: [PATCH] docs(post-A.5): capture holtburger network-stack study + Phase M.0 quick-wins MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Holtburger reference fast-forwarded from 88b19bd to 629695a (+237 commits). Four parallel research agents produced a parity-first-pass between holtburger's network stack and acdream's src/AcDream.Core.Net/. Why captured now: study surfaced six small, high-confidence "Tier 1" fixes that can ship before the bigger M.1-M.8 layer extraction. Most likely fix for the longstanding "remote retail observer sees us not perfect" bug (MoveToState wire-format mismatches). Two transport gaps (no EchoResponse reply, eager port-switch) match recent holtburger fixes (403bc98, 99974cc). One latent bug worth a 5-min check (ISAAC search-mode for out-of-order ENCRYPTED_CHECKSUM packets). Captured as Phase M.0 in the roadmap so the work survives the session and can be picked up later. Existing M.1-M.8 lift unchanged; M.1 marked as partially started since the research note is the parity-map deliverable in draft form. Files: - docs/research/2026-05-10-holtburger-network-stack-study.md (new) — full study with ranked port candidates, recent commits worth knowing, and acdream-vs-holtburger file map. - docs/plans/2026-04-11-roadmap.md — Phase M Plan-of-record updated with 2026-05-10 pointer; M.0 sub-lane added before M.1; M.1 status note. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/plans/2026-04-11-roadmap.md | 40 +++- ...26-05-10-holtburger-network-stack-study.md | 177 ++++++++++++++++++ 2 files changed, 216 insertions(+), 1 deletion(-) create mode 100644 docs/research/2026-05-10-holtburger-network-stack-study.md diff --git a/docs/plans/2026-04-11-roadmap.md b/docs/plans/2026-04-11-roadmap.md index 9b1b89e..e83b5a6 100644 --- a/docs/plans/2026-04-11-roadmap.md +++ b/docs/plans/2026-04-11-roadmap.md @@ -438,13 +438,51 @@ implementation starts. Treat holtburger as the client-behavior oracle for this phase; cross-check wire details against named retail, ACE, Chorizite, and AC2D before porting. +**2026-05-10 update:** holtburger pulled to `629695a` (+237 commits since +last audit). First parity-pass written to +[`docs/research/2026-05-10-holtburger-network-stack-study.md`](../research/2026-05-10-holtburger-network-stack-study.md) +— that doc is the M.1 deliverable in draft form. Study identified six +high-ROI "Tier 1" fixes that are individually small and can ship as a +focused pre-pass before the bigger M.1-M.8 lift; tracked as **M.0** below. +Most relevant recent holtburger commits to consult: `99974cc` (session +crate split + retransmit core), `403bc98` (port-switch race), `336cbad` +(turning + locomotion fix), `797aece` (disconnect carries client_id). + **Sub-lanes:** +- **M.0 — Tier 1 quick-win polish pre-pass.** Six small, high-confidence + fixes that don't require the full M.1-M.8 layer extraction and can ship + as one focused PR (~1 day). Sourced from + [`docs/research/2026-05-10-holtburger-network-stack-study.md`](../research/2026-05-10-holtburger-network-stack-study.md) + §1 Tier 1. May ship independently of M.1-M.8. + 1. **MoveToState wire-format audit** (study §1.1.a-e). Side-by-side + compare `Messages/MoveToState.cs` against holtburger + `client/movement/common.rs:122-186`. Pin: `current_hold_key` always + set, empty `commands[]` on held WASD, `turn_speed` always with + TURN_COMMAND, gait-aware dedup, no `turning` when locomotion ≠ 0. + Likely candidate for the longstanding "remote retail observer sees + us not perfect" bug. + 2. **LoginComplete on every PlayerTeleport** (study §1.2). Currently + only sent on first PlayerCreate. + 3. **EchoRequest → EchoResponse reply** (study §1.3). We parse and + ignore; ACE pings periodically — likely contributor to long-session + timeouts. + 4. **Port-switch race fix** (study §1.4, holtburger commit `403bc98`). + Track pending vs confirmed `_connectEndpoint`. + 5. **Disconnect packet carries client_id** (study §4, holtburger commit + `797aece`). Currently `id = 0`. + 6. **Verify `IsaacRandom` has search-and-stash mode for out-of-order + ENCRYPTED_CHECKSUM packets** (study §1.7, holtburger + `crypto.rs:73-93`). 5-minute check; ~20 LOC port if missing — + latent bug under any UDP reorder event. - **M.1 — Audit & parity map.** Produce a source-by-source comparison of acdream `AcDream.Core.Net` and holtburger `holtburger-session`, `holtburger-protocol`, and `holtburger-core` networking code. Inventory each packet flag, optional header, session transition, control packet, fragment path, game message, and game action. Mark each as `parity`, `partial`, - `missing`, or `intentional divergence`. + `missing`, or `intentional divergence`. **Status (2026-05-10): first pass + done at [`docs/research/2026-05-10-holtburger-network-stack-study.md`](../research/2026-05-10-holtburger-network-stack-study.md); + the formal parity table can extend that doc rather than start from + scratch.** - **M.2 — Layer extraction.** Split the low-level stack under `WorldSession` into testable components: `INetTransport`, `PacketCodec`, `ReliablePacketSession`, `FragmentSession`, `GameMessageSession`, and the diff --git a/docs/research/2026-05-10-holtburger-network-stack-study.md b/docs/research/2026-05-10-holtburger-network-stack-study.md new file mode 100644 index 0000000..ee05549 --- /dev/null +++ b/docs/research/2026-05-10-holtburger-network-stack-study.md @@ -0,0 +1,177 @@ +# Holtburger network stack — study & port candidates for acdream + +**Date:** 2026-05-10 +**Holtburger reference:** github.com/merklejerk/holtburger, vendored at `references/holtburger/`, fast-forwarded from `88b19bd` → `629695a` (237 commits, ~3 months of work). +**Method:** Four parallel research agents — three over holtburger's transport, handshake, and movement; one inventorying acdream's current `src/AcDream.Core.Net/`. Findings cross-referenced and ranked by ROI. + +## TL;DR + +Holtburger has shipped real, citeable fixes since our last pin that we should adopt. The biggest tactical wins are: + +1. **A handful of one-line MoveToState fixes** that are likely candidates for the "remote retail observer sees acdream's player not perfect" issue (#L.X). +2. **Three small handshake/transport corrections** — LoginComplete-on-teleport, EchoResponse reply, port-switch race — each <1 hour and each measurable. +3. **A real retransmit subsystem we're missing entirely.** Our `WorldSession` parses retransmit requests, doesn't honor them, has no resend buffer, and never asks for a resend. Lost packets just vanish. Holtburger's `session/reliability.rs` is the reference-quality pattern. + +Separately, the audit surfaced one painful finding about acdream itself: **roughly half of our outbound `Messages/` library is dead code** — InteractRequests, InventoryActions, SocialActions, AllegianceRequests, CastSpellRequest, AppraiseRequest, and most of CharacterActions are built and unit-tested but have no `WorldSession.Send*` wrapper and no live caller. Phase B.4 (Use/UseWithTarget) per memory shipped, but the audit found no in-app caller. Either we left wiring on the table or there's an integration drift to investigate. + +The remainder of this doc is organized as: ranked port candidates → confirmations of what we got right → traps (where holtburger is wrong or stubbed) → recent commits worth knowing → recommended sequencing → cross-reference file map. + +--- + +## 1. Ranked port candidates (highest ROI first) + +### 1.1 Outbound MoveToState audit — concrete suspects for the "observer not perfect" bug + +Five specific items where holtburger's wire format is likely tighter than ours. Each is a small change in our `Messages/MoveToState.cs` builder; together they're the most likely cause of remote retail observers reporting our player "lagging forward" or "walking when running." + +| # | Suspect | Holtburger reference | +|---|---------|----------------------| +| a | **`current_hold_key` always set on non-stop MoveToState.** Holtburger's drive emit seeds `flags = CURRENT_HOLD_KEY` and writes `current_hold_key = HoldKey::Run`(2) for run, `HoldKey::None`(1) for walk. ACE's relay code may treat its absence as "unknown" and broadcast Walk to observers. | `crates/holtburger-core/src/client/movement/common.rs:151-153` | +| b | **`commands[]` array MUST be empty on held WASD.** Holtburger never puts a `MotionItem` in `commands[]` for held movement — only for transient slash commands like `/dance`. If acdream is putting one in for held W (or letting movement_sequence bump per-frame), every observer's `apply_self_update_motion` re-applies the same sequence as a fresh interpolation start — exactly the symptom. | `system.rs:743-766` (`execute_transient_motion_at`) | +| c | **`turn_speed` always emitted alongside `TURN_COMMAND`.** Holtburger writes 1.5 rad/s for Run, 1.0 rad/s for Walk; the `TURN_SPEED` flag is *always* set whenever `TURN_COMMAND` is. Omitting it lets ACE default to 0 → "smoothly but slowly" turn observed. | `common.rs:184-186, 226-231` | +| d | **Dedup gate must include gait.** Holtburger's `should_send_motion_state_pulse` compares the full `(MotionState, MotionStyle)`. If acdream's dedup is keyed on only `(forward_command, hold_key)` it would suppress the Run→Walk transition (since `forward_command = WalkForward = 0x45000005` for both), explaining the Run↔Walk observer bug specifically. | `system.rs:916-926` | +| e | **Don't emit `turning` field when locomotion is non-zero.** Recent fix in commit `336cbad`: `autonomous_wire_motion_state` no longer emits `turning` when locomotion ≠ 0 (avoids server-side double-correction where it interpolates turn AND locomotes). | `crates/holtburger-core/src/client/movement/common.rs` | + +**Recommended action:** a side-by-side audit of [WorldSession.cs:6067-6089](src/AcDream.Core.Net/WorldSession.cs:6067) (MoveToState builder) and [Messages/MoveToState.cs](src/AcDream.Core.Net/Messages/MoveToState.cs) against holtburger `common.rs:122-186` and `system.rs:710-1000`. File whichever items don't already match as `#L.X.a-e` issues. + +### 1.2 LoginComplete on every PlayerTeleport, not just first PlayerCreate + +Holtburger sends `GameAction::LoginComplete` (0x00A1) **both** on first `PlayerCreate` (0xF746) AND on every `PlayerTeleport` (0xF74A) — no de-dup, server tolerates multiples. acdream sends it only on first PlayerCreate. Likely explains some portal-transition glitches. + +References: holtburger `messages.rs:433-467` (PlayerCreate), `messages.rs:480-487` (PlayerTeleport). acdream sends only at [WorldSession.cs:648](src/AcDream.Core.Net/WorldSession.cs:648). + +**Cost:** ~5 lines. + +### 1.3 EchoRequest → EchoResponse reply + +We parse `EchoRequest` from the optional header but never reply. ACE pings periodically; the missing response is a likely contributor to Network Timeout drops in long sessions. Holtburger handles it inline in the recv-message dispatcher. + +Reference: holtburger `crates/holtburger-session/src/session/receive.rs::finalize_ordered_server_packet` and the optional-header iterator at `crates/holtburger-session/src/optional_header.rs:59-141`. + +**Cost:** ~30 lines (parse the EchoRequest payload, build EchoResponse with mirrored time, send as control packet). + +### 1.4 Port-switch race fix (commit `403bc98`) + +On `ConnectRequest`, our `WorldSession` eagerly sets `_connectEndpoint = port+1`. Holtburger's recent fix introduces `pending_server_source_addr`: the new port is staged but `server_source_addr` is only updated when an actual packet arrives from the new port. ACE deployments occasionally send one more packet from `port` after the activation, and our code drops them. + +References: holtburger `session/auth.rs:42-47` (stage), `session/receive.rs:17-51` (confirm on first packet from new port). + +**Cost:** ~20 lines, one new field on `WorldSession`. + +### 1.5 Non-blocking 200 ms handshake delay + +We use `Thread.Sleep(200)` between receiving ConnectRequest and sending ConnectResponse on `port+1`. Holtburger queues ConnectResponse with `ready_at = Instant::now() + 200ms` and lets the recv loop keep draining during the gap (handles any inbound TimeSync that arrives in the window). + +Reference: holtburger `session/auth.rs:42-66`, queued via `pending_control_packets` flushed by the recv loop. (Their old form, deleted in `99974cc`, used `tokio::time::sleep` and matched our blocking pattern.) + +**Cost:** ~40 lines (small "deferred control packet" queue + flush check). + +### 1.6 AutonomousPosition cadence audit + +We have **three policies** in play, and at least two are wrong: + +- **acdream:** fixed 200 ms heartbeat (per `memory/project_retail_motion_outbound`) +- **holtburger:** fixed 1 s heartbeat, unconditional regardless of motion (`common.rs:22`, `system.rs:858-893`) +- **cdb retail trace (memory):** AutoPos appears gated on actual motion + +Most likely retail wins (cdb is observing real client behavior). If retail truly suppresses AutoPos when stationary, our 5× over-emission triggers ACE-side over-validation and may contribute to the observer-side jitter. **Recommended:** another cdb idle trace to confirm retail's exact behavior, then converge to it. + +### 1.7 Retransmit machinery (entire subsystem) + +Largest delta from holtburger. We are missing: + +- **A retransmit cache.** Holtburger's `MAX_CACHED_PACKETS=512`, LRU-style, drops oldest when full (`reliability.rs:32-37`). +- **Server-requested retransmits.** When the server asks for resends, holtburger re-encrypts with current ISAAC + RETRANSMISSION flag and replays from cache (`reliability.rs:135-186`). +- **Client-issued retransmit requests.** When inbound seq has gaps, holtburger sends `RequestRetransmit` for up to 115 seqs in a 256-seq window, rate-limited to once per second (`MAX_RETRANSMIT_SEQUENCE_IDS=115`, `MAX_RETRANSMIT_SEQUENCE_WINDOW=256`, `REQUEST_RETRANSMIT_INTERVAL=1s`). +- **`Iteration` field handling.** Our `PacketHeader.Iteration` is always 0; holtburger increments on retransmit. +- **`ISAAC::search` for out-of-order ENCRYPTED_CHECKSUM packets.** Out-of-order packets have ISAAC keys that have already advanced. Holtburger scans forward up to 256 keys, stashing each skipped key in `xors: HashSet` for later out-of-order packets to consume via `consume_key_value` (`crypto.rs:73-93`). **A naive port either drops the out-of-order packet or corrupts the ISAAC stream.** If our IsaacRandom doesn't have a search-and-stash mode, this is a latent bug waiting for any UDP loss event. + +Our `WorldSession` class doc explicitly defers this work (`WorldSession.cs:29` "ACK pump, retransmit handling … deferred"). Symptoms when it's missing: any packet loss → silent state divergence, eventual desync, "purple haze" / Network Timeout drops. + +**Cost:** 1-2 days. The whole pattern is in holtburger's `reliability.rs` (196 lines) plus the ISAAC search-mode in `crypto.rs:73-93`. + +### 1.8 Fragment assembler TTL + outbound multi-fragment split + +Two smaller correctness gaps: + +- **Inbound:** Our `FragmentAssembler` has no TTL. If a multi-fragment server message loses its middle fragment, the partials sit forever. Memory leak in any long session that sees UDP loss. Holtburger's reassembler tracks completion per `(sequence, id)` and lives inside `process_fragment` in `send.rs`. +- **Outbound:** Our `GameMessageFragment.BuildSingleFragment` throws on body > 448 bytes. Anything that needs splitting (long /tells, big inventory queries, large appraisals) silently can't be sent. Note: **holtburger doesn't do outbound fragmentation either** (`send_message` always emits `count: 1`, `send.rs:298`) — they're betting on UDP-level fragmentation. So this isn't a holtburger crib; it's a hole in both. AC2D + Chorizite are the better references when we get there. + +--- + +## 2. Confirmations — we're doing it right + +Three places where the audit confirmed our existing approach matches the reference: + +- **Run/walk encoding via WalkForward + HoldKey.Run/None.** Holtburger sends `forward_command = 0x45000005 (WalkForward)` for **both** walk and run; the distinction is in `forward_hold_key` (Run=2 vs None=1) and `forward_speed`. ACE upgrades server-side. Test pinning this contract: `holtburger system/tests.rs:404-428`. +- **Two-step EnterWorld** (`0xF7C8 CharacterEnterWorldRequest` → wait for `0xF7DF ServerReady` → `0xF657 CharacterEnterWorld`). +- **ACK on every received packet with seq > 0.** Holtburger's `recv_packet_with_addr` queues an ack for every received packet with `sequence > 0 && flags != ACK_SEQUENCE`. Outbound `send_message` auto-piggybacks the latest server seq onto the next data packet; standalone ACKs flush only when nothing naturally goes out. (Worth double-checking that our `SendAck` is called automatically on `ProcessDatagram`, not as a separate periodic pump.) + +One thing **worth re-verifying** because it's easy to invert: ISAAC seeding direction. Holtburger uses `isaac_c2s = Isaac::new(crd.client_seed)` and `isaac_s2c = Isaac::new(crd.server_seed)` — i.e. the wire field labelled `client_seed` seeds the C2S keystream, and vice versa. Worth a 30-second check that our `WorldSession` does the same. + +--- + +## 3. Don't crib these (holtburger gaps / wrong) + +- **Outbound fragmentation:** holtburger doesn't do it. Hole in both projects. Use AC2D + Chorizite when needed. +- **Jump (0xF61B):** holtburger never sends Jump. The TUI client can't jump. `JumpActionData` is decoder-only. Use cdb retail trace + Chorizite.ACProtocol for jump format reference. +- **Initial run_rate_scalar fallback:** holtburger uses 4.5 (max-cap formula, run_skill ≥ 800); acdream uses 2.4-2.94 default. Retail formula: `(load_mod * (run_skill / (run_skill + 200) * 11) + 4) / 4`. The right pre-PlayerDescription default depends on what retail does — cdb trace will settle it. +- **AutoPos cadence:** holtburger's 1-second unconditional heartbeat is probably wrong (cdb retail trace says gated on motion). Don't copy this verbatim; investigate first. + +--- + +## 4. Recent commits worth knowing (last 237) + +| Commit | Date | Intent | Relevance | +|--------|------|--------|-----------| +| `99974cc` | 2026-04-06 | "Fix/session issues" — splits 673-line `lib.rs` into `session/{api,auth,receive,send,reliability,types}`. **Adds the missing C↔S retransmit logic.** Replaces `tokio::sleep(200ms)` with deferred control-packet queue. | Read this diff if you read only one. | +| `403bc98` | 2026-04-21 | "do not switch ports prematurely" (#158). Pending vs confirmed source-port. | Apply same pattern to `WorldSession`. | +| `336cbad` | 2026-04-?? | "fix: more movement fixes". `autonomous_wire_motion_state` no longer emits `turning` when locomotion ≠ 0. | Likely also a bug class in our outbound MoveToState. | +| `797aece` | 2026-04-06 | DISCONNECT now carries `id = client_id` instead of 0. | One-line fix on our `Dispose` path. | +| `854c1bb` | (older) | "Feat/simulation system" (#105) — added the entire 2222-LOC `client/movement/{common,system}.rs`. | Foundation everything else builds on. | + +Nothing in 237 commits changes LoginRequest payload, ConnectRequest parse, ISAAC seeding, or EnterWorld message ordering. The wire format is unchanged from what acdream targets — the deltas are internal architecture and bug fixes. + +--- + +## 5. Recommended sequencing + +**Tier 1 — Quick wins (under an hour each, high signal-to-noise):** +1. MoveToState audit fixes (1.1.a-e) — file as `#L.X.a-e`, batch into one PR +2. LoginComplete on PlayerTeleport (1.2) +3. EchoRequest → EchoResponse reply (1.3) +4. Port-switch race fix (1.4) +5. Non-blocking handshake delay (1.5) +6. Disconnect carries client_id (`797aece` finding) + +**Tier 2 — Investigation, then fix:** +7. AutoPos cadence — cdb idle trace, then converge (1.6) +8. Audit "dead outbound builders" (Phase B.4 wiring drift) — separate from holtburger but surfaced by this study + +**Tier 3 — Bigger investment:** +9. Retransmit subsystem (1.7) — port `reliability.rs` wholesale, including ISAAC search-mode (1-2 days) +10. Fragment assembler TTL (1.8 inbound) + +The Tier 1 group is a cohesive "post-A.5 network polish" pass — cheap, high-confidence, and several of them are likely candidates for the longstanding observer-not-perfect issue. + +--- + +## 6. File map for cross-reference + +| acdream | holtburger | Role | +|---------|-----------|------| +| `src/AcDream.Core.Net/WorldSession.cs:411-521` | `crates/holtburger-session/src/session/{api,auth}.rs` | Handshake driver | +| `src/AcDream.Core.Net/WorldSession.cs:556-924` | `crates/holtburger-core/src/client/runtime.rs:91-200` + `messages.rs` | Recv loop + dispatch | +| `src/AcDream.Core.Net/WorldSession.cs:1096-1156` | `crates/holtburger-session/src/session/send.rs` | Outbound transport (encode + ack piggyback) | +| `src/AcDream.Core.Net/Cryptography/IsaacRandom.cs` | `crates/holtburger-protocol/src/crypto.rs` | ISAAC (we likely lack `search`-mode) | +| `src/AcDream.Core.Net/Packets/PacketCodec.cs` | `session/{send,receive}.rs` + `optional_header.rs` | Encode/decode + optional header iteration | +| `src/AcDream.Core.Net/Packets/FragmentAssembler.cs` | `session/send.rs::process_fragment` | Inbound reassembly | +| `src/AcDream.Core.Net/Messages/MoveToState.cs` | `crates/holtburger-protocol/src/messages/movement/actions.rs:53-69` + `client/movement/common.rs:122-186` | MoveToState builder | +| `src/AcDream.Core.Net/Messages/AutonomousPosition.cs` | `messages/movement/actions.rs:175-189` + `system.rs:858-893` | AutoPos builder + cadence | +| **(missing)** | `crates/holtburger-session/src/session/reliability.rs` | **Retransmit machinery — entirely absent in acdream** | + +--- + +## Method note + +This study used four parallel general-purpose agents on the day-of pull (2026-05-10, holtburger HEAD `629695a`). All citations are file paths + line numbers in that exact tree. If holtburger moves forward, line numbers will drift; commit hashes (especially `99974cc`, `403bc98`, `336cbad`, `797aece`) are stable anchors.