acdream/docs/research/2026-05-10-holtburger-network-stack-study.md
Erik b8b9845f50 docs(post-A.5): capture holtburger network-stack study + Phase M.0 quick-wins
Holtburger reference fast-forwarded from 88b19bd to 629695a (+237 commits).
Four parallel research agents produced a parity-first-pass between
holtburger's network stack and acdream's src/AcDream.Core.Net/.

Why captured now: study surfaced six small, high-confidence "Tier 1" fixes
that can ship before the bigger M.1-M.8 layer extraction. Most likely fix
for the longstanding "remote retail observer sees us not perfect" bug
(MoveToState wire-format mismatches). Two transport gaps (no EchoResponse
reply, eager port-switch) match recent holtburger fixes (403bc98, 99974cc).
One latent bug worth a 5-min check (ISAAC search-mode for out-of-order
ENCRYPTED_CHECKSUM packets).

Captured as Phase M.0 in the roadmap so the work survives the session and
can be picked up later. Existing M.1-M.8 lift unchanged; M.1 marked as
partially started since the research note is the parity-map deliverable
in draft form.

Files:
- docs/research/2026-05-10-holtburger-network-stack-study.md (new) — full
  study with ranked port candidates, recent commits worth knowing, and
  acdream-vs-holtburger file map.
- docs/plans/2026-04-11-roadmap.md — Phase M Plan-of-record updated with
  2026-05-10 pointer; M.0 sub-lane added before M.1; M.1 status note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:52:26 +02:00

177 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Holtburger network stack — study & port candidates for acdream
**Date:** 2026-05-10
**Holtburger reference:** github.com/merklejerk/holtburger, vendored at `references/holtburger/`, fast-forwarded from `88b19bd``629695a` (237 commits, ~3 months of work).
**Method:** Four parallel research agents — three over holtburger's transport, handshake, and movement; one inventorying acdream's current `src/AcDream.Core.Net/`. Findings cross-referenced and ranked by ROI.
## TL;DR
Holtburger has shipped real, citeable fixes since our last pin that we should adopt. The biggest tactical wins are:
1. **A handful of one-line MoveToState fixes** that are likely candidates for the "remote retail observer sees acdream's player not perfect" issue (#L.X).
2. **Three small handshake/transport corrections** — LoginComplete-on-teleport, EchoResponse reply, port-switch race — each <1 hour and each measurable.
3. **A real retransmit subsystem we're missing entirely.** Our `WorldSession` parses retransmit requests, doesn't honor them, has no resend buffer, and never asks for a resend. Lost packets just vanish. Holtburger's `session/reliability.rs` is the reference-quality pattern.
Separately, the audit surfaced one painful finding about acdream itself: **roughly half of our outbound `Messages/` library is dead code** InteractRequests, InventoryActions, SocialActions, AllegianceRequests, CastSpellRequest, AppraiseRequest, and most of CharacterActions are built and unit-tested but have no `WorldSession.Send*` wrapper and no live caller. Phase B.4 (Use/UseWithTarget) per memory shipped, but the audit found no in-app caller. Either we left wiring on the table or there's an integration drift to investigate.
The remainder of this doc is organized as: ranked port candidates confirmations of what we got right traps (where holtburger is wrong or stubbed) recent commits worth knowing recommended sequencing cross-reference file map.
---
## 1. Ranked port candidates (highest ROI first)
### 1.1 Outbound MoveToState audit — concrete suspects for the "observer not perfect" bug
Five specific items where holtburger's wire format is likely tighter than ours. Each is a small change in our `Messages/MoveToState.cs` builder; together they're the most likely cause of remote retail observers reporting our player "lagging forward" or "walking when running."
| # | Suspect | Holtburger reference |
|---|---------|----------------------|
| a | **`current_hold_key` always set on non-stop MoveToState.** Holtburger's drive emit seeds `flags = CURRENT_HOLD_KEY` and writes `current_hold_key = HoldKey::Run`(2) for run, `HoldKey::None`(1) for walk. ACE's relay code may treat its absence as "unknown" and broadcast Walk to observers. | `crates/holtburger-core/src/client/movement/common.rs:151-153` |
| b | **`commands[]` array MUST be empty on held WASD.** Holtburger never puts a `MotionItem` in `commands[]` for held movement only for transient slash commands like `/dance`. If acdream is putting one in for held W (or letting movement_sequence bump per-frame), every observer's `apply_self_update_motion` re-applies the same sequence as a fresh interpolation start exactly the symptom. | `system.rs:743-766` (`execute_transient_motion_at`) |
| c | **`turn_speed` always emitted alongside `TURN_COMMAND`.** Holtburger writes 1.5 rad/s for Run, 1.0 rad/s for Walk; the `TURN_SPEED` flag is *always* set whenever `TURN_COMMAND` is. Omitting it lets ACE default to 0 "smoothly but slowly" turn observed. | `common.rs:184-186, 226-231` |
| d | **Dedup gate must include gait.** Holtburger's `should_send_motion_state_pulse` compares the full `(MotionState, MotionStyle)`. If acdream's dedup is keyed on only `(forward_command, hold_key)` it would suppress the RunWalk transition (since `forward_command = WalkForward = 0x45000005` for both), explaining the RunWalk observer bug specifically. | `system.rs:916-926` |
| e | **Don't emit `turning` field when locomotion is non-zero.** Recent fix in commit `336cbad`: `autonomous_wire_motion_state` no longer emits `turning` when locomotion 0 (avoids server-side double-correction where it interpolates turn AND locomotes). | `crates/holtburger-core/src/client/movement/common.rs` |
**Recommended action:** a side-by-side audit of [WorldSession.cs:6067-6089](src/AcDream.Core.Net/WorldSession.cs:6067) (MoveToState builder) and [Messages/MoveToState.cs](src/AcDream.Core.Net/Messages/MoveToState.cs) against holtburger `common.rs:122-186` and `system.rs:710-1000`. File whichever items don't already match as `#L.X.a-e` issues.
### 1.2 LoginComplete on every PlayerTeleport, not just first PlayerCreate
Holtburger sends `GameAction::LoginComplete` (0x00A1) **both** on first `PlayerCreate` (0xF746) AND on every `PlayerTeleport` (0xF74A) no de-dup, server tolerates multiples. acdream sends it only on first PlayerCreate. Likely explains some portal-transition glitches.
References: holtburger `messages.rs:433-467` (PlayerCreate), `messages.rs:480-487` (PlayerTeleport). acdream sends only at [WorldSession.cs:648](src/AcDream.Core.Net/WorldSession.cs:648).
**Cost:** ~5 lines.
### 1.3 EchoRequest → EchoResponse reply
We parse `EchoRequest` from the optional header but never reply. ACE pings periodically; the missing response is a likely contributor to Network Timeout drops in long sessions. Holtburger handles it inline in the recv-message dispatcher.
Reference: holtburger `crates/holtburger-session/src/session/receive.rs::finalize_ordered_server_packet` and the optional-header iterator at `crates/holtburger-session/src/optional_header.rs:59-141`.
**Cost:** ~30 lines (parse the EchoRequest payload, build EchoResponse with mirrored time, send as control packet).
### 1.4 Port-switch race fix (commit `403bc98`)
On `ConnectRequest`, our `WorldSession` eagerly sets `_connectEndpoint = port+1`. Holtburger's recent fix introduces `pending_server_source_addr`: the new port is staged but `server_source_addr` is only updated when an actual packet arrives from the new port. ACE deployments occasionally send one more packet from `port` after the activation, and our code drops them.
References: holtburger `session/auth.rs:42-47` (stage), `session/receive.rs:17-51` (confirm on first packet from new port).
**Cost:** ~20 lines, one new field on `WorldSession`.
### 1.5 Non-blocking 200 ms handshake delay
We use `Thread.Sleep(200)` between receiving ConnectRequest and sending ConnectResponse on `port+1`. Holtburger queues ConnectResponse with `ready_at = Instant::now() + 200ms` and lets the recv loop keep draining during the gap (handles any inbound TimeSync that arrives in the window).
Reference: holtburger `session/auth.rs:42-66`, queued via `pending_control_packets` flushed by the recv loop. (Their old form, deleted in `99974cc`, used `tokio::time::sleep` and matched our blocking pattern.)
**Cost:** ~40 lines (small "deferred control packet" queue + flush check).
### 1.6 AutonomousPosition cadence audit
We have **three policies** in play, and at least two are wrong:
- **acdream:** fixed 200 ms heartbeat (per `memory/project_retail_motion_outbound`)
- **holtburger:** fixed 1 s heartbeat, unconditional regardless of motion (`common.rs:22`, `system.rs:858-893`)
- **cdb retail trace (memory):** AutoPos appears gated on actual motion
Most likely retail wins (cdb is observing real client behavior). If retail truly suppresses AutoPos when stationary, our 5× over-emission triggers ACE-side over-validation and may contribute to the observer-side jitter. **Recommended:** another cdb idle trace to confirm retail's exact behavior, then converge to it.
### 1.7 Retransmit machinery (entire subsystem)
Largest delta from holtburger. We are missing:
- **A retransmit cache.** Holtburger's `MAX_CACHED_PACKETS=512`, LRU-style, drops oldest when full (`reliability.rs:32-37`).
- **Server-requested retransmits.** When the server asks for resends, holtburger re-encrypts with current ISAAC + RETRANSMISSION flag and replays from cache (`reliability.rs:135-186`).
- **Client-issued retransmit requests.** When inbound seq has gaps, holtburger sends `RequestRetransmit` for up to 115 seqs in a 256-seq window, rate-limited to once per second (`MAX_RETRANSMIT_SEQUENCE_IDS=115`, `MAX_RETRANSMIT_SEQUENCE_WINDOW=256`, `REQUEST_RETRANSMIT_INTERVAL=1s`).
- **`Iteration` field handling.** Our `PacketHeader.Iteration` is always 0; holtburger increments on retransmit.
- **`ISAAC::search` for out-of-order ENCRYPTED_CHECKSUM packets.** Out-of-order packets have ISAAC keys that have already advanced. Holtburger scans forward up to 256 keys, stashing each skipped key in `xors: HashSet<u32>` for later out-of-order packets to consume via `consume_key_value` (`crypto.rs:73-93`). **A naive port either drops the out-of-order packet or corrupts the ISAAC stream.** If our IsaacRandom doesn't have a search-and-stash mode, this is a latent bug waiting for any UDP loss event.
Our `WorldSession` class doc explicitly defers this work (`WorldSession.cs:29` "ACK pump, retransmit handling deferred"). Symptoms when it's missing: any packet loss silent state divergence, eventual desync, "purple haze" / Network Timeout drops.
**Cost:** 1-2 days. The whole pattern is in holtburger's `reliability.rs` (196 lines) plus the ISAAC search-mode in `crypto.rs:73-93`.
### 1.8 Fragment assembler TTL + outbound multi-fragment split
Two smaller correctness gaps:
- **Inbound:** Our `FragmentAssembler` has no TTL. If a multi-fragment server message loses its middle fragment, the partials sit forever. Memory leak in any long session that sees UDP loss. Holtburger's reassembler tracks completion per `(sequence, id)` and lives inside `process_fragment` in `send.rs`.
- **Outbound:** Our `GameMessageFragment.BuildSingleFragment` throws on body > 448 bytes. Anything that needs splitting (long /tells, big inventory queries, large appraisals) silently can't be sent. Note: **holtburger doesn't do outbound fragmentation either** (`send_message` always emits `count: 1`, `send.rs:298`) — they're betting on UDP-level fragmentation. So this isn't a holtburger crib; it's a hole in both. AC2D + Chorizite are the better references when we get there.
---
## 2. Confirmations — we're doing it right
Three places where the audit confirmed our existing approach matches the reference:
- **Run/walk encoding via WalkForward + HoldKey.Run/None.** Holtburger sends `forward_command = 0x45000005 (WalkForward)` for **both** walk and run; the distinction is in `forward_hold_key` (Run=2 vs None=1) and `forward_speed`. ACE upgrades server-side. Test pinning this contract: `holtburger system/tests.rs:404-428`.
- **Two-step EnterWorld** (`0xF7C8 CharacterEnterWorldRequest` → wait for `0xF7DF ServerReady``0xF657 CharacterEnterWorld`).
- **ACK on every received packet with seq > 0.** Holtburger's `recv_packet_with_addr` queues an ack for every received packet with `sequence > 0 && flags != ACK_SEQUENCE`. Outbound `send_message` auto-piggybacks the latest server seq onto the next data packet; standalone ACKs flush only when nothing naturally goes out. (Worth double-checking that our `SendAck` is called automatically on `ProcessDatagram`, not as a separate periodic pump.)
One thing **worth re-verifying** because it's easy to invert: ISAAC seeding direction. Holtburger uses `isaac_c2s = Isaac::new(crd.client_seed)` and `isaac_s2c = Isaac::new(crd.server_seed)` — i.e. the wire field labelled `client_seed` seeds the C2S keystream, and vice versa. Worth a 30-second check that our `WorldSession` does the same.
---
## 3. Don't crib these (holtburger gaps / wrong)
- **Outbound fragmentation:** holtburger doesn't do it. Hole in both projects. Use AC2D + Chorizite when needed.
- **Jump (0xF61B):** holtburger never sends Jump. The TUI client can't jump. `JumpActionData` is decoder-only. Use cdb retail trace + Chorizite.ACProtocol for jump format reference.
- **Initial run_rate_scalar fallback:** holtburger uses 4.5 (max-cap formula, run_skill ≥ 800); acdream uses 2.4-2.94 default. Retail formula: `(load_mod * (run_skill / (run_skill + 200) * 11) + 4) / 4`. The right pre-PlayerDescription default depends on what retail does — cdb trace will settle it.
- **AutoPos cadence:** holtburger's 1-second unconditional heartbeat is probably wrong (cdb retail trace says gated on motion). Don't copy this verbatim; investigate first.
---
## 4. Recent commits worth knowing (last 237)
| Commit | Date | Intent | Relevance |
|--------|------|--------|-----------|
| `99974cc` | 2026-04-06 | "Fix/session issues" — splits 673-line `lib.rs` into `session/{api,auth,receive,send,reliability,types}`. **Adds the missing C↔S retransmit logic.** Replaces `tokio::sleep(200ms)` with deferred control-packet queue. | Read this diff if you read only one. |
| `403bc98` | 2026-04-21 | "do not switch ports prematurely" (#158). Pending vs confirmed source-port. | Apply same pattern to `WorldSession`. |
| `336cbad` | 2026-04-?? | "fix: more movement fixes". `autonomous_wire_motion_state` no longer emits `turning` when locomotion ≠ 0. | Likely also a bug class in our outbound MoveToState. |
| `797aece` | 2026-04-06 | DISCONNECT now carries `id = client_id` instead of 0. | One-line fix on our `Dispose` path. |
| `854c1bb` | (older) | "Feat/simulation system" (#105) — added the entire 2222-LOC `client/movement/{common,system}.rs`. | Foundation everything else builds on. |
Nothing in 237 commits changes LoginRequest payload, ConnectRequest parse, ISAAC seeding, or EnterWorld message ordering. The wire format is unchanged from what acdream targets — the deltas are internal architecture and bug fixes.
---
## 5. Recommended sequencing
**Tier 1 — Quick wins (under an hour each, high signal-to-noise):**
1. MoveToState audit fixes (1.1.a-e) — file as `#L.X.a-e`, batch into one PR
2. LoginComplete on PlayerTeleport (1.2)
3. EchoRequest → EchoResponse reply (1.3)
4. Port-switch race fix (1.4)
5. Non-blocking handshake delay (1.5)
6. Disconnect carries client_id (`797aece` finding)
**Tier 2 — Investigation, then fix:**
7. AutoPos cadence — cdb idle trace, then converge (1.6)
8. Audit "dead outbound builders" (Phase B.4 wiring drift) — separate from holtburger but surfaced by this study
**Tier 3 — Bigger investment:**
9. Retransmit subsystem (1.7) — port `reliability.rs` wholesale, including ISAAC search-mode (1-2 days)
10. Fragment assembler TTL (1.8 inbound)
The Tier 1 group is a cohesive "post-A.5 network polish" pass — cheap, high-confidence, and several of them are likely candidates for the longstanding observer-not-perfect issue.
---
## 6. File map for cross-reference
| acdream | holtburger | Role |
|---------|-----------|------|
| `src/AcDream.Core.Net/WorldSession.cs:411-521` | `crates/holtburger-session/src/session/{api,auth}.rs` | Handshake driver |
| `src/AcDream.Core.Net/WorldSession.cs:556-924` | `crates/holtburger-core/src/client/runtime.rs:91-200` + `messages.rs` | Recv loop + dispatch |
| `src/AcDream.Core.Net/WorldSession.cs:1096-1156` | `crates/holtburger-session/src/session/send.rs` | Outbound transport (encode + ack piggyback) |
| `src/AcDream.Core.Net/Cryptography/IsaacRandom.cs` | `crates/holtburger-protocol/src/crypto.rs` | ISAAC (we likely lack `search`-mode) |
| `src/AcDream.Core.Net/Packets/PacketCodec.cs` | `session/{send,receive}.rs` + `optional_header.rs` | Encode/decode + optional header iteration |
| `src/AcDream.Core.Net/Packets/FragmentAssembler.cs` | `session/send.rs::process_fragment` | Inbound reassembly |
| `src/AcDream.Core.Net/Messages/MoveToState.cs` | `crates/holtburger-protocol/src/messages/movement/actions.rs:53-69` + `client/movement/common.rs:122-186` | MoveToState builder |
| `src/AcDream.Core.Net/Messages/AutonomousPosition.cs` | `messages/movement/actions.rs:175-189` + `system.rs:858-893` | AutoPos builder + cadence |
| **(missing)** | `crates/holtburger-session/src/session/reliability.rs` | **Retransmit machinery — entirely absent in acdream** |
---
## Method note
This study used four parallel general-purpose agents on the day-of pull (2026-05-10, holtburger HEAD `629695a`). All citations are file paths + line numbers in that exact tree. If holtburger moves forward, line numbers will drift; commit hashes (especially `99974cc`, `403bc98`, `336cbad`, `797aece`) are stable anchors.