acdream/docs/research/2026-05-10-holtburger-network-stack-study.md
Erik b8b9845f50 docs(post-A.5): capture holtburger network-stack study + Phase M.0 quick-wins
Holtburger reference fast-forwarded from 88b19bd to 629695a (+237 commits).
Four parallel research agents produced a parity-first-pass between
holtburger's network stack and acdream's src/AcDream.Core.Net/.

Why captured now: study surfaced six small, high-confidence "Tier 1" fixes
that can ship before the bigger M.1-M.8 layer extraction. Most likely fix
for the longstanding "remote retail observer sees us not perfect" bug
(MoveToState wire-format mismatches). Two transport gaps (no EchoResponse
reply, eager port-switch) match recent holtburger fixes (403bc98, 99974cc).
One latent bug worth a 5-min check (ISAAC search-mode for out-of-order
ENCRYPTED_CHECKSUM packets).

Captured as Phase M.0 in the roadmap so the work survives the session and
can be picked up later. Existing M.1-M.8 lift unchanged; M.1 marked as
partially started since the research note is the parity-map deliverable
in draft form.

Files:
- docs/research/2026-05-10-holtburger-network-stack-study.md (new) — full
  study with ranked port candidates, recent commits worth knowing, and
  acdream-vs-holtburger file map.
- docs/plans/2026-04-11-roadmap.md — Phase M Plan-of-record updated with
  2026-05-10 pointer; M.0 sub-lane added before M.1; M.1 status note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 17:52:26 +02:00


Holtburger network stack — study & port candidates for acdream

Date: 2026-05-10

Holtburger reference: github.com/merklejerk/holtburger, vendored at references/holtburger/, fast-forwarded from 88b19bd to 629695a (237 commits, ~3 months of work).

Method: Four parallel research agents: three over holtburger's transport, handshake, and movement; one inventorying acdream's current src/AcDream.Core.Net/. Findings cross-referenced and ranked by ROI.

TL;DR

Holtburger has shipped real, citeable fixes since our last pin that we should adopt. The biggest tactical wins are:

  1. A handful of one-line MoveToState fixes that are the likeliest remedy for the "remote retail observer sees acdream's player not perfect" issue (#L.X).
  2. Three small handshake/transport corrections — LoginComplete-on-teleport, EchoResponse reply, port-switch race — each <1 hour and each measurable.
  3. A real retransmit subsystem we're missing entirely. Our WorldSession parses retransmit requests, doesn't honor them, has no resend buffer, and never asks for a resend. Lost packets just vanish. Holtburger's session/reliability.rs is the reference-quality pattern.

Separately, the audit surfaced one painful finding about acdream itself: roughly half of our outbound Messages/ library is dead code. InteractRequests, InventoryActions, SocialActions, AllegianceRequests, CastSpellRequest, AppraiseRequest, and most of CharacterActions are built and unit-tested but have no WorldSession.Send* wrapper and no live caller. Per memory, Phase B.4 (Use/UseWithTarget) shipped, yet the audit found no in-app caller for it. Either we left wiring on the table or there's an integration drift to investigate.

The remainder of this doc is organized as: ranked port candidates → confirmations of what we got right → traps (where holtburger is wrong or stubbed) → recent commits worth knowing → recommended sequencing → cross-reference file map.


1. Ranked port candidates (highest ROI first)

1.1 Outbound MoveToState audit — concrete suspects for the "observer not perfect" bug

Five specific items where holtburger's wire format is likely tighter than ours. Each is a small change in our Messages/MoveToState.cs builder; together they're the most likely cause of remote retail observers reporting our player "lagging forward" or "walking when running."

| # | Suspect | Holtburger reference |
|---|---------|----------------------|
| a | current_hold_key always set on non-stop MoveToState. Holtburger's drive emit seeds flags = CURRENT_HOLD_KEY and writes current_hold_key = HoldKey::Run(2) for run, HoldKey::None(1) for walk. ACE's relay code may treat its absence as "unknown" and broadcast Walk to observers. | crates/holtburger-core/src/client/movement/common.rs:151-153 |
| b | commands[] array MUST be empty on held WASD. Holtburger never puts a MotionItem in commands[] for held movement, only for transient slash commands like /dance. If acdream is putting one in for held W (or letting movement_sequence bump per-frame), every observer's apply_self_update_motion re-applies the same sequence as a fresh interpolation start, which is exactly the symptom. | system.rs:743-766 (execute_transient_motion_at) |
| c | turn_speed always emitted alongside TURN_COMMAND. Holtburger writes 1.5 rad/s for Run, 1.0 rad/s for Walk; the TURN_SPEED flag is always set whenever TURN_COMMAND is. Omitting it lets ACE default to 0, which observers see as a "smoothly but slowly" turn. | common.rs:184-186, 226-231 |
| d | Dedup gate must include gait. Holtburger's should_send_motion_state_pulse compares the full (MotionState, MotionStyle). If acdream's dedup is keyed on only (forward_command, hold_key) it would suppress the Run→Walk transition (since forward_command = WalkForward = 0x45000005 for both), explaining the Run↔Walk observer bug specifically. | system.rs:916-926 |
| e | Don't emit turning field when locomotion is non-zero. Recent fix in commit 336cbad: autonomous_wire_motion_state no longer emits turning when locomotion ≠ 0 (avoids server-side double-correction where it interpolates turn AND locomotes). | crates/holtburger-core/src/client/movement/common.rs |

Recommended action: a side-by-side audit of WorldSession.cs:6067-6089 (MoveToState builder) and Messages/MoveToState.cs against holtburger common.rs:122-186 and system.rs:710-1000. File whichever items don't already match as #L.X.a-e issues.
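To make suspect (d) concrete, here is a minimal sketch of a dedup gate that compares the whole wire-visible state rather than the command id alone. The types and function names are hypothetical stand-ins, not holtburger's or acdream's real definitions; only the WalkForward value and the Run = 2 / None = 1 hold-key values come from this note, and the speeds used with it are placeholders.

```rust
/// Hypothetical gait marker. Numeric values follow the note (Run = 2, None = 1).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HoldKey {
    None = 1,
    Run = 2,
}

/// Hypothetical snapshot of the fields that end up on the wire.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct WireMotion {
    pub forward_command: u32,
    pub hold_key: HoldKey,
    pub forward_speed: f32,
}

/// WalkForward is used for both walking and running, per the note.
pub const WALK_FORWARD: u32 = 0x4500_0005;

/// Too-narrow gate: compares only the command id, so a gait change is invisible.
pub fn should_send_narrow(last: &WireMotion, next: &WireMotion) -> bool {
    last.forward_command != next.forward_command
}

/// Full gate: any change in the wire-visible state triggers a send.
pub fn should_send_full(last: &WireMotion, next: &WireMotion) -> bool {
    last != next
}
```

With a running state and a walking state that share WALK_FORWARD, the narrow gate reports "nothing changed" while the full gate requests a send. Comparing the whole struct also means a newly added wire field cannot be forgotten in the gate.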

1.2 LoginComplete on every PlayerTeleport, not just first PlayerCreate

Holtburger sends GameAction::LoginComplete (0x00A1) both on first PlayerCreate (0xF746) AND on every PlayerTeleport (0xF74A) — no de-dup, server tolerates multiples. acdream sends it only on first PlayerCreate. Likely explains some portal-transition glitches.

References: holtburger messages.rs:433-467 (PlayerCreate), messages.rs:480-487 (PlayerTeleport). acdream sends only at WorldSession.cs:648.

Cost: ~5 lines.

1.3 EchoRequest → EchoResponse reply

We parse EchoRequest from the optional header but never reply. ACE pings periodically; the missing response is a likely contributor to Network Timeout drops in long sessions. Holtburger handles it inline in the recv-message dispatcher.

Reference: holtburger crates/holtburger-session/src/session/receive.rs::finalize_ordered_server_packet and the optional-header iterator at crates/holtburger-session/src/optional_header.rs:59-141.

Cost: ~30 lines (parse the EchoRequest payload, build EchoResponse with mirrored time, send as control packet).
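The parse-and-mirror step could look roughly like the sketch below. The payload layout is an assumption made for illustration (a single little-endian f32 timestamp in the request, echoed unchanged in the response) and must be checked against holtburger's optional_header.rs before porting; both function names are hypothetical.

```rust
/// Hypothetical helper: reads the timestamp from an EchoRequest payload.
/// ASSUMPTION: the payload starts with one little-endian f32.
pub fn parse_echo_request(payload: &[u8]) -> Option<f32> {
    let b = payload.get(..4)?; // None if fewer than 4 bytes arrived
    Some(f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: builds the EchoResponse payload by mirroring the time.
/// ASSUMPTION: the response carries only the mirrored timestamp.
pub fn build_echo_response(request_time: f32) -> [u8; 4] {
    request_time.to_le_bytes()
}
```

Returning Option from the parser keeps a truncated header from panicking the recv loop; the caller can simply skip the reply.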

1.4 Port-switch race fix (commit 403bc98)

On ConnectRequest, our WorldSession eagerly sets _connectEndpoint = port+1. Holtburger's recent fix introduces pending_server_source_addr: the new port is staged but server_source_addr is only updated when an actual packet arrives from the new port. ACE deployments occasionally send one more packet from the original port after the activation, and our code drops it.

References: holtburger session/auth.rs:42-47 (stage), session/receive.rs:17-51 (confirm on first packet from new port).

Cost: ~20 lines, one new field on WorldSession.
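The stage-then-confirm pattern can be sketched as a tiny state holder. This is an illustration of the idea only: the struct, its fields, and its methods are hypothetical names, not holtburger's actual API.

```rust
use std::net::SocketAddr;

/// Hypothetical holder for the confirmed and the staged server address.
pub struct ServerAddr {
    confirmed: SocketAddr,
    pending: Option<SocketAddr>,
}

impl ServerAddr {
    pub fn new(login_addr: SocketAddr) -> Self {
        Self { confirmed: login_addr, pending: None }
    }

    /// Called on ConnectRequest: remember the new port but do not switch yet.
    pub fn stage(&mut self, next: SocketAddr) {
        self.pending = Some(next);
    }

    /// Called per inbound datagram. Returns true if it should be processed.
    /// The first datagram from the staged address promotes it to confirmed.
    pub fn accept_from(&mut self, from: SocketAddr) -> bool {
        if self.pending == Some(from) {
            self.confirmed = from;
            self.pending = None;
            return true;
        }
        from == self.confirmed
    }

    pub fn confirmed(&self) -> SocketAddr {
        self.confirmed
    }
}
```

Until a packet actually arrives from the staged port, late packets from the original port are still accepted; after promotion they are rejected.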

1.5 Non-blocking 200 ms handshake delay

We use Thread.Sleep(200) between receiving ConnectRequest and sending ConnectResponse on port+1. Holtburger queues ConnectResponse with ready_at = Instant::now() + 200ms and lets the recv loop keep draining during the gap (handles any inbound TimeSync that arrives in the window).

Reference: holtburger session/auth.rs:42-66, queued via pending_control_packets flushed by the recv loop. (Their old form, deleted in 99974cc, used tokio::time::sleep and matched our blocking pattern.)

Cost: ~40 lines (small "deferred control packet" queue + flush check).
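A deferred control-packet queue of the kind described above might be sketched as follows. The names are hypothetical and the current time is passed in explicitly so the behavior is easy to test; holtburger's real pending_control_packets structure may differ.

```rust
use std::time::{Duration, Instant};

/// One queued packet plus the earliest instant it may be sent.
struct Deferred {
    ready_at: Instant,
    bytes: Vec<u8>,
}

/// Hypothetical queue, flushed by the recv loop on every iteration.
pub struct DeferredQueue {
    items: Vec<Deferred>,
}

impl DeferredQueue {
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    /// Queue `bytes` to be sent no earlier than `now + delay`.
    pub fn push(&mut self, bytes: Vec<u8>, now: Instant, delay: Duration) {
        self.items.push(Deferred { ready_at: now + delay, bytes });
    }

    /// Remove and return every packet whose deadline has passed, in queue order.
    pub fn take_ready(&mut self, now: Instant) -> Vec<Vec<u8>> {
        let mut ready = Vec::new();
        let mut i = 0;
        while i < self.items.len() {
            if self.items[i].ready_at <= now {
                ready.push(self.items.remove(i).bytes);
            } else {
                i += 1;
            }
        }
        ready
    }
}
```

The recv loop would call take_ready once per iteration and send whatever comes back, so inbound packets keep draining during the 200 ms gap instead of waiting behind a blocking sleep.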

1.6 AutonomousPosition cadence audit

We have three policies in play, and at least two are wrong:

  • acdream: fixed 200 ms heartbeat (per memory/project_retail_motion_outbound)
  • holtburger: fixed 1 s heartbeat, unconditional regardless of motion (common.rs:22, system.rs:858-893)
  • cdb retail trace (memory): AutoPos appears gated on actual motion

Retail is the most likely ground truth, since cdb observes real client behavior. If retail truly suppresses AutoPos when stationary, our 200 ms heartbeat (5× holtburger's rate) triggers ACE-side over-validation and may contribute to the observer-side jitter. Recommended: another cdb idle trace to confirm retail's exact behavior, then converge to it.

1.7 Retransmit machinery (entire subsystem)

Largest delta from holtburger. We are missing:

  • A retransmit cache. Holtburger's MAX_CACHED_PACKETS=512, LRU-style, drops oldest when full (reliability.rs:32-37).
  • Server-requested retransmits. When the server asks for resends, holtburger re-encrypts with current ISAAC + RETRANSMISSION flag and replays from cache (reliability.rs:135-186).
  • Client-issued retransmit requests. When inbound seq has gaps, holtburger sends RequestRetransmit for up to 115 seqs in a 256-seq window, rate-limited to once per second (MAX_RETRANSMIT_SEQUENCE_IDS=115, MAX_RETRANSMIT_SEQUENCE_WINDOW=256, REQUEST_RETRANSMIT_INTERVAL=1s).
  • Iteration field handling. Our PacketHeader.Iteration is always 0; holtburger increments on retransmit.
  • ISAAC::search for out-of-order ENCRYPTED_CHECKSUM packets. Out-of-order packets have ISAAC keys that have already advanced. Holtburger scans forward up to 256 keys, stashing each skipped key in xors: HashSet<u32> for later out-of-order packets to consume via consume_key_value (crypto.rs:73-93). A naive port either drops the out-of-order packet or corrupts the ISAAC stream. If our IsaacRandom doesn't have a search-and-stash mode, this is a latent bug waiting for any UDP loss event.

Our WorldSession class doc explicitly defers this work (WorldSession.cs:29 "ACK pump, retransmit handling … deferred"). Symptoms when it's missing: any packet loss → silent state divergence, eventual desync, "purple haze" / Network Timeout drops.
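As a rough illustration of the first and third bullets, the sketch below pairs a bounded resend cache with a gap scan that picks which sequences to request. The three constants are the values quoted above; the type, the function, and the exact interpretation of the 256-sequence window are assumptions, and the once-per-second rate limit is left out.

```rust
use std::collections::{BTreeSet, VecDeque};

pub const MAX_CACHED_PACKETS: usize = 512;
pub const MAX_RETRANSMIT_SEQUENCE_IDS: usize = 115;
pub const MAX_RETRANSMIT_SEQUENCE_WINDOW: u32 = 256;

/// Hypothetical resend cache: keeps the newest packets, drops the oldest.
pub struct ResendCache {
    packets: VecDeque<(u32, Vec<u8>)>,
}

impl ResendCache {
    pub fn new() -> Self {
        Self { packets: VecDeque::new() }
    }

    pub fn store(&mut self, seq: u32, bytes: Vec<u8>) {
        if self.packets.len() == MAX_CACHED_PACKETS {
            self.packets.pop_front(); // evict the oldest entry
        }
        self.packets.push_back((seq, bytes));
    }

    pub fn get(&self, seq: u32) -> Option<&[u8]> {
        self.packets.iter().find(|(s, _)| *s == seq).map(|(_, b)| b.as_slice())
    }
}

/// Sequences to request: gaps above `last_in_order`, below the highest
/// sequence seen, limited to the window and to the per-request id cap.
pub fn missing_sequences(last_in_order: u32, received_ahead: &BTreeSet<u32>) -> Vec<u32> {
    let highest = match received_ahead.iter().next_back() {
        Some(&h) => h,
        None => return Vec::new(), // nothing arrived out of order
    };
    let window_end = last_in_order.saturating_add(MAX_RETRANSMIT_SEQUENCE_WINDOW);
    let end = highest.min(window_end);
    (last_in_order.saturating_add(1)..=end)
        .filter(|seq| !received_ahead.contains(seq))
        .take(MAX_RETRANSMIT_SEQUENCE_IDS)
        .collect()
}
```

For example, with sequence 3 delivered in order and 5, 6, and 9 buffered, the scan yields 4, 7, and 8. A real port would also re-encrypt cached packets and set the RETRANSMISSION flag before replaying them, as the second bullet describes.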

Cost: 1-2 days. The whole pattern is in holtburger's reliability.rs (196 lines) plus the ISAAC search-mode in crypto.rs:73-93.
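The search-and-stash idea can be shown with a stand-in keystream. The generator below is a plain xorshift32, deliberately not ISAAC, and every name except consume_key_value and the 256-key limit is hypothetical. Restoring the stream when the search fails is a design choice of this sketch, not something confirmed from holtburger's crypto.rs.

```rust
use std::collections::HashSet;

pub const MAX_SEARCH: usize = 256;

/// Stand-in keystream (xorshift32). NOT ISAAC; used only to show the pattern.
#[derive(Clone)]
pub struct ToyStream {
    state: u32,
}

impl ToyStream {
    pub fn new(seed: u32) -> Self {
        Self { state: if seed == 0 { 1 } else { seed } }
    }

    pub fn next_key(&mut self) -> u32 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.state = x;
        x
    }
}

/// Keystream wrapper that tolerates out-of-order key consumption.
pub struct SearchingStream {
    stream: ToyStream,
    skipped: HashSet<u32>, // keys passed over while searching ahead
}

impl SearchingStream {
    pub fn new(seed: u32) -> Self {
        Self { stream: ToyStream::new(seed), skipped: HashSet::new() }
    }

    /// Returns true if `wanted` is a stashed key or one of the next
    /// MAX_SEARCH keys. On failure the stream position is left untouched.
    pub fn consume_key_value(&mut self, wanted: u32) -> bool {
        if self.skipped.remove(&wanted) {
            return true; // late packet: its key was skipped earlier
        }
        let mut probe = self.stream.clone();
        let mut passed = Vec::new();
        for _ in 0..MAX_SEARCH {
            let key = probe.next_key();
            if key == wanted {
                self.stream = probe; // commit the advance
                self.skipped.extend(passed); // stash keys we jumped over
                return true;
            }
            passed.push(key);
        }
        false
    }
}
```

Searching on a clone means a corrupt or spoofed packet cannot drag the real keystream forward. How the wanted key is derived from a packet's checksum is outside this sketch.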

1.8 Fragment assembler TTL + outbound multi-fragment split

Two smaller correctness gaps:

  • Inbound: Our FragmentAssembler has no TTL. If a multi-fragment server message loses its middle fragment, the partials sit forever. Memory leak in any long session that sees UDP loss. Holtburger's reassembler tracks completion per (sequence, id) and lives inside process_fragment in send.rs.
  • Outbound: Our GameMessageFragment.BuildSingleFragment throws on body > 448 bytes, so anything that needs splitting (long /tells, big inventory queries, large appraisals) cannot be sent at all. Note: holtburger doesn't do outbound fragmentation either (send_message always emits count: 1, send.rs:298); they're betting on UDP-level fragmentation. So this isn't a holtburger crib; it's a hole in both. AC2D + Chorizite are the better references when we get there.
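For the inbound gap, a TTL can be bolted onto a reassembler roughly as sketched below. This is an illustrative design, not a port of holtburger's process_fragment: the names are hypothetical, the key mirrors the (sequence, id) pair mentioned above, and the TTL value is left to the caller.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fragments collected so far for one message.
struct Partial {
    first_seen: Instant,
    fragments: Vec<Option<Vec<u8>>>,
}

/// Hypothetical reassembler keyed by (sequence, id), with expiry.
pub struct Reassembler {
    ttl: Duration,
    partials: HashMap<(u32, u32), Partial>,
}

impl Reassembler {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, partials: HashMap::new() }
    }

    /// Store one fragment; returns the full body once every index is present.
    pub fn insert(&mut self, key: (u32, u32), index: usize, count: usize,
                  data: Vec<u8>, now: Instant) -> Option<Vec<u8>> {
        if count == 0 || index >= count {
            return None; // malformed fragment header
        }
        let entry = self.partials.entry(key).or_insert_with(|| Partial {
            first_seen: now,
            fragments: vec![None; count],
        });
        if entry.fragments.len() != count {
            return None; // count disagrees with earlier fragments
        }
        entry.fragments[index] = Some(data);
        if entry.fragments.iter().all(Option::is_some) {
            let done = self.partials.remove(&key)?;
            return Some(done.fragments.into_iter().flatten().flatten().collect());
        }
        None
    }

    /// Drop partial messages older than the TTL; returns how many were dropped.
    pub fn evict_expired(&mut self, now: Instant) -> usize {
        let before = self.partials.len();
        let ttl = self.ttl;
        self.partials.retain(|_, p| now.duration_since(p.first_seen) < ttl);
        before - self.partials.len()
    }

    pub fn pending(&self) -> usize {
        self.partials.len()
    }
}
```

Calling evict_expired from the recv loop bounds memory even when a middle fragment never arrives. Once the retransmit subsystem exists, eviction should probably be coordinated with outstanding resend requests.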

2. Confirmations — we're doing it right

Three places where the audit confirmed our existing approach matches the reference:

  • Run/walk encoding via WalkForward + HoldKey.Run/None. Holtburger sends forward_command = 0x45000005 (WalkForward) for both walk and run; the distinction is in forward_hold_key (Run=2 vs None=1) and forward_speed. ACE upgrades server-side. Test pinning this contract: holtburger system/tests.rs:404-428.
  • Two-step EnterWorld (0xF7C8 CharacterEnterWorldRequest → wait for 0xF7DF ServerReady → 0xF657 CharacterEnterWorld).
  • ACK on every received packet with seq > 0. Holtburger's recv_packet_with_addr queues an ack for every received packet with sequence > 0 && flags != ACK_SEQUENCE. Outbound send_message auto-piggybacks the latest server seq onto the next data packet; standalone ACKs flush only when nothing naturally goes out. (Worth double-checking that our SendAck is called automatically on ProcessDatagram, not as a separate periodic pump.)

One thing worth re-verifying because it's easy to invert: ISAAC seeding direction. Holtburger uses isaac_c2s = Isaac::new(crd.client_seed) and isaac_s2c = Isaac::new(crd.server_seed) — i.e. the wire field labelled client_seed seeds the C2S keystream, and vice versa. Worth a 30-second check that our WorldSession does the same.


3. Don't crib these (holtburger gaps / wrong)

  • Outbound fragmentation: holtburger doesn't do it. Hole in both projects. Use AC2D + Chorizite when needed.
  • Jump (0xF61B): holtburger never sends Jump. The TUI client can't jump. JumpActionData is decoder-only. Use cdb retail trace + Chorizite.ACProtocol for jump format reference.
  • Initial run_rate_scalar fallback: holtburger uses 4.5 (max-cap formula, run_skill ≥ 800); acdream uses 2.4-2.94 default. Retail formula: (load_mod * (run_skill / (run_skill + 200) * 11) + 4) / 4. The right pre-PlayerDescription default depends on what retail does — cdb trace will settle it.
  • AutoPos cadence: holtburger's 1-second unconditional heartbeat is probably wrong (cdb retail trace says gated on motion). Don't copy this verbatim; investigate first.
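The run-rate formula quoted in the third bullet is easy to misread inline, so here it is as a small function. The function name is hypothetical, and treating run_skill ≥ 800 as a hard cap of 4.5 is an assumption read from the "max-cap" remark above, not something confirmed by a retail trace.

```rust
/// Run-rate scalar from the formula quoted in this note.
/// ASSUMPTION: run_skill >= 800 short-circuits to the 4.5 cap.
pub fn run_rate(load_mod: f32, run_skill: u32) -> f32 {
    if run_skill >= 800 {
        return 4.5;
    }
    let skill = run_skill as f32;
    (load_mod * (skill / (skill + 200.0) * 11.0) + 4.0) / 4.0
}
```

With load_mod = 1.0, a run_skill of 0 gives 1.0 and a run_skill of 200 gives 2.375. Which default applies before PlayerDescription arrives is still the open question the cdb trace has to answer.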

4. Recent commits worth knowing (last 237)

| Commit | Date | Intent | Relevance |
|--------|------|--------|-----------|
| 99974cc | 2026-04-06 | "Fix/session issues": splits 673-line lib.rs into session/{api,auth,receive,send,reliability,types}. Adds the missing C↔S retransmit logic. Replaces tokio::sleep(200ms) with deferred control-packet queue. | Read this diff if you read only one. |
| 403bc98 | 2026-04-21 | "do not switch ports prematurely" (#158). Pending vs confirmed source-port. | Apply same pattern to WorldSession. |
| 336cbad | 2026-04-?? | "fix: more movement fixes". autonomous_wire_motion_state no longer emits turning when locomotion ≠ 0. | Likely also a bug class in our outbound MoveToState. |
| 797aece | 2026-04-06 | DISCONNECT now carries id = client_id instead of 0. | One-line fix on our Dispose path. |
| 854c1bb | (older) | "Feat/simulation system" (#105): added the entire 2222-LOC client/movement/{common,system}.rs. | Foundation everything else builds on. |

Nothing in 237 commits changes LoginRequest payload, ConnectRequest parse, ISAAC seeding, or EnterWorld message ordering. The wire format is unchanged from what acdream targets — the deltas are internal architecture and bug fixes.


5. Recommended sequencing

Tier 1: Quick wins (under an hour each, high signal-to-noise):

  1. MoveToState audit fixes (1.1.a-e): file as #L.X.a-e, batch into one PR
  2. LoginComplete on PlayerTeleport (1.2)
  3. EchoRequest → EchoResponse reply (1.3)
  4. Port-switch race fix (1.4)
  5. Non-blocking handshake delay (1.5)
  6. Disconnect carries client_id (797aece finding)

Tier 2: Investigation, then fix:

  7. AutoPos cadence: cdb idle trace, then converge (1.6)
  8. Audit "dead outbound builders" (Phase B.4 wiring drift); separate from holtburger but surfaced by this study

Tier 3: Bigger investment:

  9. Retransmit subsystem (1.7): port reliability.rs wholesale, including ISAAC search-mode (1-2 days)
  10. Fragment assembler TTL (1.8 inbound)

The Tier 1 group is a cohesive "post-A.5 network polish" pass: cheap, high-confidence, and several of its items plausibly address the longstanding observer-not-perfect issue.


6. File map for cross-reference

| acdream | holtburger | Role |
|---------|------------|------|
| src/AcDream.Core.Net/WorldSession.cs:411-521 | crates/holtburger-session/src/session/{api,auth}.rs | Handshake driver |
| src/AcDream.Core.Net/WorldSession.cs:556-924 | crates/holtburger-core/src/client/runtime.rs:91-200 + messages.rs | Recv loop + dispatch |
| src/AcDream.Core.Net/WorldSession.cs:1096-1156 | crates/holtburger-session/src/session/send.rs | Outbound transport (encode + ack piggyback) |
| src/AcDream.Core.Net/Cryptography/IsaacRandom.cs | crates/holtburger-protocol/src/crypto.rs | ISAAC (we likely lack search-mode) |
| src/AcDream.Core.Net/Packets/PacketCodec.cs | session/{send,receive}.rs + optional_header.rs | Encode/decode + optional header iteration |
| src/AcDream.Core.Net/Packets/FragmentAssembler.cs | session/send.rs::process_fragment | Inbound reassembly |
| src/AcDream.Core.Net/Messages/MoveToState.cs | crates/holtburger-protocol/src/messages/movement/actions.rs:53-69 + client/movement/common.rs:122-186 | MoveToState builder |
| src/AcDream.Core.Net/Messages/AutonomousPosition.cs | messages/movement/actions.rs:175-189 + system.rs:858-893 | AutoPos builder + cadence |
| (missing) | crates/holtburger-session/src/session/reliability.rs | Retransmit machinery, entirely absent in acdream |

Method note

This study used four parallel general-purpose agents on the day-of pull (2026-05-10, holtburger HEAD 629695a). All citations are file paths + line numbers in that exact tree. If holtburger moves forward, line numbers will drift; commit hashes (especially 99974cc, 403bc98, 336cbad, 797aece) are stable anchors.