docs(post-A.5): capture holtburger network-stack study + Phase M.0 quick-wins
Holtburger reference fast-forwarded from 88b19bd to 629695a (+237 commits). Four parallel research agents produced a parity-first-pass between holtburger's network stack and acdream's src/AcDream.Core.Net/. Why captured now: study surfaced six small, high-confidence "Tier 1" fixes that can ship before the bigger M.1-M.8 layer extraction. Most likely fix for the longstanding "remote retail observer sees us not perfect" bug (MoveToState wire-format mismatches). Two transport gaps (no EchoResponse reply, eager port-switch) match recent holtburger fixes (403bc98, 99974cc). One latent bug worth a 5-min check (ISAAC search-mode for out-of-order ENCRYPTED_CHECKSUM packets). Captured as Phase M.0 in the roadmap so the work survives the session and can be picked up later. Existing M.1-M.8 lift unchanged; M.1 marked as partially started since the research note is the parity-map deliverable in draft form. Files: - docs/research/2026-05-10-holtburger-network-stack-study.md (new) — full study with ranked port candidates, recent commits worth knowing, and acdream-vs-holtburger file map. - docs/plans/2026-04-11-roadmap.md — Phase M Plan-of-record updated with 2026-05-10 pointer; M.0 sub-lane added before M.1; M.1 status note. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
f66522cd6b
commit
b8b9845f50
2 changed files with 216 additions and 1 deletions
177
docs/research/2026-05-10-holtburger-network-stack-study.md
Normal file
177
docs/research/2026-05-10-holtburger-network-stack-study.md
Normal file
|
|
@ -0,0 +1,177 @@
|
|||
# Holtburger network stack — study & port candidates for acdream
|
||||
|
||||
**Date:** 2026-05-10
|
||||
**Holtburger reference:** github.com/merklejerk/holtburger, vendored at `references/holtburger/`, fast-forwarded from `88b19bd` → `629695a` (237 commits, ~3 months of work).
|
||||
**Method:** Four parallel research agents — three over holtburger's transport, handshake, and movement; one inventorying acdream's current `src/AcDream.Core.Net/`. Findings cross-referenced and ranked by ROI.
|
||||
|
||||
## TL;DR
|
||||
|
||||
Holtburger has shipped real, citeable fixes since our last pin that we should adopt. The biggest tactical wins are:
|
||||
|
||||
1. **A handful of one-line MoveToState fixes** that are likely candidates for the "remote retail observer sees acdream's player not perfect" issue (#L.X).
|
||||
2. **Three small handshake/transport corrections** — LoginComplete-on-teleport, EchoResponse reply, port-switch race — each <1 hour and each measurable.
|
||||
3. **A real retransmit subsystem we're missing entirely.** Our `WorldSession` parses retransmit requests, doesn't honor them, has no resend buffer, and never asks for a resend. Lost packets just vanish. Holtburger's `session/reliability.rs` is the reference-quality pattern.
|
||||
|
||||
Separately, the audit surfaced one painful finding about acdream itself: **roughly half of our outbound `Messages/` library is dead code** — InteractRequests, InventoryActions, SocialActions, AllegianceRequests, CastSpellRequest, AppraiseRequest, and most of CharacterActions are built and unit-tested but have no `WorldSession.Send*` wrapper and no live caller. Phase B.4 (Use/UseWithTarget) per memory shipped, but the audit found no in-app caller. Either we left wiring on the table or there's an integration drift to investigate.
|
||||
|
||||
The remainder of this doc is organized as: ranked port candidates → confirmations of what we got right → traps (where holtburger is wrong or stubbed) → recent commits worth knowing → recommended sequencing → cross-reference file map.
|
||||
|
||||
---
|
||||
|
||||
## 1. Ranked port candidates (highest ROI first)
|
||||
|
||||
### 1.1 Outbound MoveToState audit — concrete suspects for the "observer not perfect" bug
|
||||
|
||||
Five specific items where holtburger's wire format is likely tighter than ours. Each is a small change in our `Messages/MoveToState.cs` builder; together they're the most likely cause of remote retail observers reporting our player "lagging forward" or "walking when running."
|
||||
|
||||
| # | Suspect | Holtburger reference |
|
||||
|---|---------|----------------------|
|
||||
| a | **`current_hold_key` always set on non-stop MoveToState.** Holtburger's drive emit seeds `flags = CURRENT_HOLD_KEY` and writes `current_hold_key = HoldKey::Run`(2) for run, `HoldKey::None`(1) for walk. ACE's relay code may treat its absence as "unknown" and broadcast Walk to observers. | `crates/holtburger-core/src/client/movement/common.rs:151-153` |
|
||||
| b | **`commands[]` array MUST be empty on held WASD.** Holtburger never puts a `MotionItem` in `commands[]` for held movement — only for transient slash commands like `/dance`. If acdream is putting one in for held W (or letting movement_sequence bump per-frame), every observer's `apply_self_update_motion` re-applies the same sequence as a fresh interpolation start — exactly the symptom. | `system.rs:743-766` (`execute_transient_motion_at`) |
|
||||
| c | **`turn_speed` always emitted alongside `TURN_COMMAND`.** Holtburger writes 1.5 rad/s for Run, 1.0 rad/s for Walk; the `TURN_SPEED` flag is *always* set whenever `TURN_COMMAND` is. Omitting it lets ACE default to 0 → "smoothly but slowly" turn observed. | `common.rs:184-186, 226-231` |
|
||||
| d | **Dedup gate must include gait.** Holtburger's `should_send_motion_state_pulse` compares the full `(MotionState, MotionStyle)`. If acdream's dedup is keyed on only `(forward_command, hold_key)` it would suppress the Run→Walk transition (since `forward_command = WalkForward = 0x45000005` for both), explaining the Run↔Walk observer bug specifically. | `system.rs:916-926` |
|
||||
| e | **Don't emit `turning` field when locomotion is non-zero.** Recent fix in commit `336cbad`: `autonomous_wire_motion_state` no longer emits `turning` when locomotion ≠ 0 (avoids server-side double-correction where it interpolates turn AND locomotes). | `crates/holtburger-core/src/client/movement/common.rs` |
|
||||
|
||||
**Recommended action:** a side-by-side audit of [WorldSession.cs:6067-6089](src/AcDream.Core.Net/WorldSession.cs:6067) (MoveToState builder) and [Messages/MoveToState.cs](src/AcDream.Core.Net/Messages/MoveToState.cs) against holtburger `common.rs:122-186` and `system.rs:710-1000`. File whichever items don't already match as `#L.X.a-e` issues.
|
||||
|
||||
### 1.2 LoginComplete on every PlayerTeleport, not just first PlayerCreate
|
||||
|
||||
Holtburger sends `GameAction::LoginComplete` (0x00A1) **both** on first `PlayerCreate` (0xF746) AND on every `PlayerTeleport` (0xF74A) — no de-dup, server tolerates multiples. acdream sends it only on first PlayerCreate. Likely explains some portal-transition glitches.
|
||||
|
||||
References: holtburger `messages.rs:433-467` (PlayerCreate), `messages.rs:480-487` (PlayerTeleport). acdream sends only at [WorldSession.cs:648](src/AcDream.Core.Net/WorldSession.cs:648).
|
||||
|
||||
**Cost:** ~5 lines.
|
||||
|
||||
### 1.3 EchoRequest → EchoResponse reply
|
||||
|
||||
We parse `EchoRequest` from the optional header but never reply. ACE pings periodically; the missing response is a likely contributor to Network Timeout drops in long sessions. Holtburger handles it inline in the recv-message dispatcher.
|
||||
|
||||
Reference: holtburger `crates/holtburger-session/src/session/receive.rs::finalize_ordered_server_packet` and the optional-header iterator at `crates/holtburger-session/src/optional_header.rs:59-141`.
|
||||
|
||||
**Cost:** ~30 lines (parse the EchoRequest payload, build EchoResponse with mirrored time, send as control packet).
|
||||
|
||||
### 1.4 Port-switch race fix (commit `403bc98`)
|
||||
|
||||
On `ConnectRequest`, our `WorldSession` eagerly sets `_connectEndpoint = port+1`. Holtburger's recent fix introduces `pending_server_source_addr`: the new port is staged but `server_source_addr` is only updated when an actual packet arrives from the new port. ACE deployments occasionally send one more packet from `port` after the activation, and our code drops them.
|
||||
|
||||
References: holtburger `session/auth.rs:42-47` (stage), `session/receive.rs:17-51` (confirm on first packet from new port).
|
||||
|
||||
**Cost:** ~20 lines, one new field on `WorldSession`.
|
||||
|
||||
### 1.5 Non-blocking 200 ms handshake delay
|
||||
|
||||
We use `Thread.Sleep(200)` between receiving ConnectRequest and sending ConnectResponse on `port+1`. Holtburger queues ConnectResponse with `ready_at = Instant::now() + 200ms` and lets the recv loop keep draining during the gap (handles any inbound TimeSync that arrives in the window).
|
||||
|
||||
Reference: holtburger `session/auth.rs:42-66`, queued via `pending_control_packets` flushed by the recv loop. (Their old form, deleted in `99974cc`, used `tokio::time::sleep` and matched our blocking pattern.)
|
||||
|
||||
**Cost:** ~40 lines (small "deferred control packet" queue + flush check).
|
||||
|
||||
### 1.6 AutonomousPosition cadence audit
|
||||
|
||||
We have **three policies** in play, and at least two are wrong:
|
||||
|
||||
- **acdream:** fixed 200 ms heartbeat (per `memory/project_retail_motion_outbound`)
|
||||
- **holtburger:** fixed 1 s heartbeat, unconditional regardless of motion (`common.rs:22`, `system.rs:858-893`)
|
||||
- **cdb retail trace (memory):** AutoPos appears gated on actual motion
|
||||
|
||||
Most likely retail wins (cdb is observing real client behavior). If retail truly suppresses AutoPos when stationary, our 5× over-emission triggers ACE-side over-validation and may contribute to the observer-side jitter. **Recommended:** another cdb idle trace to confirm retail's exact behavior, then converge to it.
|
||||
|
||||
### 1.7 Retransmit machinery (entire subsystem)
|
||||
|
||||
Largest delta from holtburger. We are missing:
|
||||
|
||||
- **A retransmit cache.** Holtburger's `MAX_CACHED_PACKETS=512`, LRU-style, drops oldest when full (`reliability.rs:32-37`).
|
||||
- **Server-requested retransmits.** When the server asks for resends, holtburger re-encrypts with current ISAAC + RETRANSMISSION flag and replays from cache (`reliability.rs:135-186`).
|
||||
- **Client-issued retransmit requests.** When inbound seq has gaps, holtburger sends `RequestRetransmit` for up to 115 seqs in a 256-seq window, rate-limited to once per second (`MAX_RETRANSMIT_SEQUENCE_IDS=115`, `MAX_RETRANSMIT_SEQUENCE_WINDOW=256`, `REQUEST_RETRANSMIT_INTERVAL=1s`).
|
||||
- **`Iteration` field handling.** Our `PacketHeader.Iteration` is always 0; holtburger increments on retransmit.
|
||||
- **`ISAAC::search` for out-of-order ENCRYPTED_CHECKSUM packets.** Out-of-order packets have ISAAC keys that have already advanced. Holtburger scans forward up to 256 keys, stashing each skipped key in `xors: HashSet<u32>` for later out-of-order packets to consume via `consume_key_value` (`crypto.rs:73-93`). **A naive port either drops the out-of-order packet or corrupts the ISAAC stream.** If our IsaacRandom doesn't have a search-and-stash mode, this is a latent bug waiting for any UDP loss event.
|
||||
|
||||
Our `WorldSession` class doc explicitly defers this work (`WorldSession.cs:29` "ACK pump, retransmit handling … deferred"). Symptoms when it's missing: any packet loss → silent state divergence, eventual desync, "purple haze" / Network Timeout drops.
|
||||
|
||||
**Cost:** 1-2 days. The whole pattern is in holtburger's `reliability.rs` (196 lines) plus the ISAAC search-mode in `crypto.rs:73-93`.
|
||||
|
||||
### 1.8 Fragment assembler TTL + outbound multi-fragment split
|
||||
|
||||
Two smaller correctness gaps:
|
||||
|
||||
- **Inbound:** Our `FragmentAssembler` has no TTL. If a multi-fragment server message loses its middle fragment, the partials sit forever. Memory leak in any long session that sees UDP loss. Holtburger's reassembler tracks completion per `(sequence, id)` and lives inside `process_fragment` in `send.rs`.
|
||||
- **Outbound:** Our `GameMessageFragment.BuildSingleFragment` throws on body > 448 bytes. Anything that needs splitting (long /tells, big inventory queries, large appraisals) silently can't be sent. Note: **holtburger doesn't do outbound fragmentation either** (`send_message` always emits `count: 1`, `send.rs:298`) — they're betting on UDP-level fragmentation. So this isn't a holtburger crib; it's a hole in both. AC2D + Chorizite are the better references when we get there.
|
||||
|
||||
---
|
||||
|
||||
## 2. Confirmations — we're doing it right
|
||||
|
||||
Three places where the audit confirmed our existing approach matches the reference:
|
||||
|
||||
- **Run/walk encoding via WalkForward + HoldKey.Run/None.** Holtburger sends `forward_command = 0x45000005 (WalkForward)` for **both** walk and run; the distinction is in `forward_hold_key` (Run=2 vs None=1) and `forward_speed`. ACE upgrades server-side. Test pinning this contract: `holtburger system/tests.rs:404-428`.
|
||||
- **Two-step EnterWorld** (`0xF7C8 CharacterEnterWorldRequest` → wait for `0xF7DF ServerReady` → `0xF657 CharacterEnterWorld`).
|
||||
- **ACK on every received packet with seq > 0.** Holtburger's `recv_packet_with_addr` queues an ack for every received packet with `sequence > 0 && flags != ACK_SEQUENCE`. Outbound `send_message` auto-piggybacks the latest server seq onto the next data packet; standalone ACKs flush only when nothing naturally goes out. (Worth double-checking that our `SendAck` is called automatically on `ProcessDatagram`, not as a separate periodic pump.)
|
||||
|
||||
One thing **worth re-verifying** because it's easy to invert: ISAAC seeding direction. Holtburger uses `isaac_c2s = Isaac::new(crd.client_seed)` and `isaac_s2c = Isaac::new(crd.server_seed)` — i.e. the wire field labelled `client_seed` seeds the C2S keystream, and vice versa. Worth a 30-second check that our `WorldSession` does the same.
|
||||
|
||||
---
|
||||
|
||||
## 3. Don't crib these (holtburger gaps / wrong)
|
||||
|
||||
- **Outbound fragmentation:** holtburger doesn't do it. Hole in both projects. Use AC2D + Chorizite when needed.
|
||||
- **Jump (0xF61B):** holtburger never sends Jump. The TUI client can't jump. `JumpActionData` is decoder-only. Use cdb retail trace + Chorizite.ACProtocol for jump format reference.
|
||||
- **Initial run_rate_scalar fallback:** holtburger uses 4.5 (max-cap formula, run_skill ≥ 800); acdream uses 2.4-2.94 default. Retail formula: `(load_mod * (run_skill / (run_skill + 200) * 11) + 4) / 4`. The right pre-PlayerDescription default depends on what retail does — cdb trace will settle it.
|
||||
- **AutoPos cadence:** holtburger's 1-second unconditional heartbeat is probably wrong (cdb retail trace says gated on motion). Don't copy this verbatim; investigate first.
|
||||
|
||||
---
|
||||
|
||||
## 4. Recent commits worth knowing (last 237)
|
||||
|
||||
| Commit | Date | Intent | Relevance |
|
||||
|--------|------|--------|-----------|
|
||||
| `99974cc` | 2026-04-06 | "Fix/session issues" — splits 673-line `lib.rs` into `session/{api,auth,receive,send,reliability,types}`. **Adds the missing C↔S retransmit logic.** Replaces `tokio::sleep(200ms)` with deferred control-packet queue. | Read this diff if you read only one. |
|
||||
| `403bc98` | 2026-04-21 | "do not switch ports prematurely" (#158). Pending vs confirmed source-port. | Apply same pattern to `WorldSession`. |
|
||||
| `336cbad` | 2026-04-?? | "fix: more movement fixes". `autonomous_wire_motion_state` no longer emits `turning` when locomotion ≠ 0. | Likely also a bug class in our outbound MoveToState. |
|
||||
| `797aece` | 2026-04-06 | DISCONNECT now carries `id = client_id` instead of 0. | One-line fix on our `Dispose` path. |
|
||||
| `854c1bb` | (older) | "Feat/simulation system" (#105) — added the entire 2222-LOC `client/movement/{common,system}.rs`. | Foundation everything else builds on. |
|
||||
|
||||
Nothing in 237 commits changes LoginRequest payload, ConnectRequest parse, ISAAC seeding, or EnterWorld message ordering. The wire format is unchanged from what acdream targets — the deltas are internal architecture and bug fixes.
|
||||
|
||||
---
|
||||
|
||||
## 5. Recommended sequencing
|
||||
|
||||
**Tier 1 — Quick wins (under an hour each, high signal-to-noise):**
|
||||
1. MoveToState audit fixes (1.1.a-e) — file as `#L.X.a-e`, batch into one PR
|
||||
2. LoginComplete on PlayerTeleport (1.2)
|
||||
3. EchoRequest → EchoResponse reply (1.3)
|
||||
4. Port-switch race fix (1.4)
|
||||
5. Non-blocking handshake delay (1.5)
|
||||
6. Disconnect carries client_id (`797aece` finding)
|
||||
|
||||
**Tier 2 — Investigation, then fix:**
|
||||
7. AutoPos cadence — cdb idle trace, then converge (1.6)
|
||||
8. Audit "dead outbound builders" (Phase B.4 wiring drift) — separate from holtburger but surfaced by this study
|
||||
|
||||
**Tier 3 — Bigger investment:**
|
||||
9. Retransmit subsystem (1.7) — port `reliability.rs` wholesale, including ISAAC search-mode (1-2 days)
|
||||
10. Fragment assembler TTL (1.8 inbound)
|
||||
|
||||
The Tier 1 group is a cohesive "post-A.5 network polish" pass — cheap, high-confidence, and several of them are likely candidates for the longstanding observer-not-perfect issue.
|
||||
|
||||
---
|
||||
|
||||
## 6. File map for cross-reference
|
||||
|
||||
| acdream | holtburger | Role |
|
||||
|---------|-----------|------|
|
||||
| `src/AcDream.Core.Net/WorldSession.cs:411-521` | `crates/holtburger-session/src/session/{api,auth}.rs` | Handshake driver |
|
||||
| `src/AcDream.Core.Net/WorldSession.cs:556-924` | `crates/holtburger-core/src/client/runtime.rs:91-200` + `messages.rs` | Recv loop + dispatch |
|
||||
| `src/AcDream.Core.Net/WorldSession.cs:1096-1156` | `crates/holtburger-session/src/session/send.rs` | Outbound transport (encode + ack piggyback) |
|
||||
| `src/AcDream.Core.Net/Cryptography/IsaacRandom.cs` | `crates/holtburger-protocol/src/crypto.rs` | ISAAC (we likely lack `search`-mode) |
|
||||
| `src/AcDream.Core.Net/Packets/PacketCodec.cs` | `session/{send,receive}.rs` + `optional_header.rs` | Encode/decode + optional header iteration |
|
||||
| `src/AcDream.Core.Net/Packets/FragmentAssembler.cs` | `session/send.rs::process_fragment` | Inbound reassembly |
|
||||
| `src/AcDream.Core.Net/Messages/MoveToState.cs` | `crates/holtburger-protocol/src/messages/movement/actions.rs:53-69` + `client/movement/common.rs:122-186` | MoveToState builder |
|
||||
| `src/AcDream.Core.Net/Messages/AutonomousPosition.cs` | `messages/movement/actions.rs:175-189` + `system.rs:858-893` | AutoPos builder + cadence |
|
||||
| **(missing)** | `crates/holtburger-session/src/session/reliability.rs` | **Retransmit machinery — entirely absent in acdream** |
|
||||
|
||||
---
|
||||
|
||||
## Method note
|
||||
|
||||
This study used four parallel general-purpose agents on the day-of pull (2026-05-10, holtburger HEAD `629695a`). All citations are file paths + line numbers in that exact tree. If holtburger moves forward, line numbers will drift; commit hashes (especially `99974cc`, `403bc98`, `336cbad`, `797aece`) are stable anchors.
|
||||
Loading…
Add table
Add a link
Reference in a new issue