Merge branch 'claude/sharp-sanderson-6b47ae' — capture holtburger network study + Phase M.0

This commit is contained in:
Erik 2026-05-10 17:56:30 +02:00
commit 41fe6d0172
2 changed files with 216 additions and 1 deletions

View file

@ -438,13 +438,51 @@ implementation starts. Treat holtburger as the client-behavior oracle for this
phase; cross-check wire details against named retail, ACE, Chorizite, and AC2D
before porting.
**2026-05-10 update:** holtburger pulled to `629695a` (+237 commits since
last audit). First parity-pass written to
[`docs/research/2026-05-10-holtburger-network-stack-study.md`](../research/2026-05-10-holtburger-network-stack-study.md)
— that doc is the M.1 deliverable in draft form. Study identified six
high-ROI "Tier 1" fixes that are individually small and can ship as a
focused pre-pass before the bigger M.1-M.8 lift; tracked as **M.0** below.
Most relevant recent holtburger commits to consult: `99974cc` (session
crate split + retransmit core), `403bc98` (port-switch race), `336cbad`
(turning + locomotion fix), `797aece` (disconnect carries client_id).
**Sub-lanes:**
- **M.0 — Tier 1 quick-win polish pre-pass.** Six small, high-confidence
fixes that don't require the full M.1-M.8 layer extraction and can ship
as one focused PR (~1 day). Sourced from
[`docs/research/2026-05-10-holtburger-network-stack-study.md`](../research/2026-05-10-holtburger-network-stack-study.md)
§1 Tier 1. May ship independently of M.1-M.8.
1. **MoveToState wire-format audit** (study §1.1.a-e). Side-by-side
compare `Messages/MoveToState.cs` against holtburger
`client/movement/common.rs:122-186`. Pin: `current_hold_key` always
set, empty `commands[]` on held WASD, `turn_speed` always with
TURN_COMMAND, gait-aware dedup, no `turning` when locomotion ≠ 0.
Likely candidate for the longstanding "remote retail observer sees
us not perfect" bug.
2. **LoginComplete on every PlayerTeleport** (study §1.2). Currently
only sent on first PlayerCreate.
3. **EchoRequest → EchoResponse reply** (study §1.3). We parse and
ignore; ACE pings periodically — likely contributor to long-session
timeouts.
4. **Port-switch race fix** (study §1.4, holtburger commit `403bc98`).
Track pending vs confirmed `_connectEndpoint`.
5. **Disconnect packet carries client_id** (study §4, holtburger commit
`797aece`). Currently `id = 0`.
6. **Verify `IsaacRandom` has search-and-stash mode for out-of-order
ENCRYPTED_CHECKSUM packets** (study §1.7, holtburger
`crypto.rs:73-93`). 5-minute check; ~20 LOC port if missing —
latent bug under any UDP reorder event.
- **M.1 — Audit & parity map.** Produce a source-by-source comparison of
acdream `AcDream.Core.Net` and holtburger `holtburger-session`,
`holtburger-protocol`, and `holtburger-core` networking code. Inventory each
packet flag, optional header, session transition, control packet, fragment
path, game message, and game action. Mark each as `parity`, `partial`,
`missing`, or `intentional divergence`.
`missing`, or `intentional divergence`. **Status (2026-05-10): first pass
done at [`docs/research/2026-05-10-holtburger-network-stack-study.md`](../research/2026-05-10-holtburger-network-stack-study.md);
the formal parity table can extend that doc rather than start from
scratch.**
- **M.2 — Layer extraction.** Split the low-level stack under `WorldSession`
into testable components: `INetTransport`, `PacketCodec`,
`ReliablePacketSession`, `FragmentSession`, `GameMessageSession`, and the

View file

@ -0,0 +1,177 @@
# Holtburger network stack — study & port candidates for acdream
**Date:** 2026-05-10
**Holtburger reference:** github.com/merklejerk/holtburger, vendored at `references/holtburger/`, fast-forwarded from `88b19bd``629695a` (237 commits, ~3 months of work).
**Method:** Four parallel research agents — three over holtburger's transport, handshake, and movement; one inventorying acdream's current `src/AcDream.Core.Net/`. Findings cross-referenced and ranked by ROI.
## TL;DR
Holtburger has shipped real, citeable fixes since our last pin that we should adopt. The biggest tactical wins are:
1. **A handful of one-line MoveToState fixes** that are likely candidates for the "remote retail observer sees acdream's player not perfect" issue (#L.X).
2. **Three small handshake/transport corrections** — LoginComplete-on-teleport, EchoResponse reply, port-switch race — each <1 hour and each measurable.
3. **A real retransmit subsystem we're missing entirely.** Our `WorldSession` parses retransmit requests, doesn't honor them, has no resend buffer, and never asks for a resend. Lost packets just vanish. Holtburger's `session/reliability.rs` is the reference-quality pattern.
Separately, the audit surfaced one painful finding about acdream itself: **roughly half of our outbound `Messages/` library is dead code** — InteractRequests, InventoryActions, SocialActions, AllegianceRequests, CastSpellRequest, AppraiseRequest, and most of CharacterActions are built and unit-tested but have no `WorldSession.Send*` wrapper and no live caller. Phase B.4 (Use/UseWithTarget) per memory shipped, but the audit found no in-app caller. Either we left wiring on the table or there's an integration drift to investigate.
The remainder of this doc is organized as: ranked port candidates → confirmations of what we got right → traps (where holtburger is wrong or stubbed) → recent commits worth knowing → recommended sequencing → cross-reference file map.
---
## 1. Ranked port candidates (highest ROI first)
### 1.1 Outbound MoveToState audit — concrete suspects for the "observer not perfect" bug
Five specific items where holtburger's wire format is likely tighter than ours. Each is a small change in our `Messages/MoveToState.cs` builder; together they're the most likely cause of remote retail observers reporting our player "lagging forward" or "walking when running."
| # | Suspect | Holtburger reference |
|---|---------|----------------------|
| a | **`current_hold_key` always set on non-stop MoveToState.** Holtburger's drive emit seeds `flags = CURRENT_HOLD_KEY` and writes `current_hold_key = HoldKey::Run`(2) for run, `HoldKey::None`(1) for walk. ACE's relay code may treat its absence as "unknown" and broadcast Walk to observers. | `crates/holtburger-core/src/client/movement/common.rs:151-153` |
| b | **`commands[]` array MUST be empty on held WASD.** Holtburger never puts a `MotionItem` in `commands[]` for held movement — only for transient slash commands like `/dance`. If acdream is putting one in for held W (or letting movement_sequence bump per-frame), every observer's `apply_self_update_motion` re-applies the same sequence as a fresh interpolation start — exactly the symptom. | `system.rs:743-766` (`execute_transient_motion_at`) |
| c | **`turn_speed` always emitted alongside `TURN_COMMAND`.** Holtburger writes 1.5 rad/s for Run, 1.0 rad/s for Walk; the `TURN_SPEED` flag is *always* set whenever `TURN_COMMAND` is. Omitting it lets ACE default to 0 → "smoothly but slowly" turn observed. | `common.rs:184-186, 226-231` |
| d | **Dedup gate must include gait.** Holtburger's `should_send_motion_state_pulse` compares the full `(MotionState, MotionStyle)`. If acdream's dedup is keyed on only `(forward_command, hold_key)` it would suppress the Run→Walk transition (since `forward_command = WalkForward = 0x45000005` for both), explaining the Run↔Walk observer bug specifically. | `system.rs:916-926` |
| e | **Don't emit `turning` field when locomotion is non-zero.** Recent fix in commit `336cbad`: `autonomous_wire_motion_state` no longer emits `turning` when locomotion ≠ 0 (avoids server-side double-correction where it interpolates turn AND locomotes). | `crates/holtburger-core/src/client/movement/common.rs` |
**Recommended action:** a side-by-side audit of [WorldSession.cs:6067-6089](src/AcDream.Core.Net/WorldSession.cs:6067) (MoveToState builder) and [Messages/MoveToState.cs](src/AcDream.Core.Net/Messages/MoveToState.cs) against holtburger `common.rs:122-186` and `system.rs:710-1000`. File whichever items don't already match as `#L.X.a-e` issues.
### 1.2 LoginComplete on every PlayerTeleport, not just first PlayerCreate
Holtburger sends `GameAction::LoginComplete` (0x00A1) **both** on first `PlayerCreate` (0xF746) AND on every `PlayerTeleport` (0xF74A) — no de-dup, server tolerates multiples. acdream sends it only on first PlayerCreate. Likely explains some portal-transition glitches.
References: holtburger `messages.rs:433-467` (PlayerCreate), `messages.rs:480-487` (PlayerTeleport). acdream sends only at [WorldSession.cs:648](src/AcDream.Core.Net/WorldSession.cs:648).
**Cost:** ~5 lines.
### 1.3 EchoRequest → EchoResponse reply
We parse `EchoRequest` from the optional header but never reply. ACE pings periodically; the missing response is a likely contributor to Network Timeout drops in long sessions. Holtburger handles it inline in the recv-message dispatcher.
Reference: holtburger `crates/holtburger-session/src/session/receive.rs::finalize_ordered_server_packet` and the optional-header iterator at `crates/holtburger-session/src/optional_header.rs:59-141`.
**Cost:** ~30 lines (parse the EchoRequest payload, build EchoResponse with mirrored time, send as control packet).
### 1.4 Port-switch race fix (commit `403bc98`)
On `ConnectRequest`, our `WorldSession` eagerly sets `_connectEndpoint = port+1`. Holtburger's recent fix introduces `pending_server_source_addr`: the new port is staged but `server_source_addr` is only updated when an actual packet arrives from the new port. ACE deployments occasionally send one more packet from `port` after the activation, and our code drops them.
References: holtburger `session/auth.rs:42-47` (stage), `session/receive.rs:17-51` (confirm on first packet from new port).
**Cost:** ~20 lines, one new field on `WorldSession`.
### 1.5 Non-blocking 200 ms handshake delay
We use `Thread.Sleep(200)` between receiving ConnectRequest and sending ConnectResponse on `port+1`. Holtburger queues ConnectResponse with `ready_at = Instant::now() + 200ms` and lets the recv loop keep draining during the gap (handles any inbound TimeSync that arrives in the window).
Reference: holtburger `session/auth.rs:42-66`, queued via `pending_control_packets` flushed by the recv loop. (Their old form, deleted in `99974cc`, used `tokio::time::sleep` and matched our blocking pattern.)
**Cost:** ~40 lines (small "deferred control packet" queue + flush check).
### 1.6 AutonomousPosition cadence audit
We have **three policies** in play, and at least two are wrong:
- **acdream:** fixed 200 ms heartbeat (per `memory/project_retail_motion_outbound`)
- **holtburger:** fixed 1 s heartbeat, unconditional regardless of motion (`common.rs:22`, `system.rs:858-893`)
- **cdb retail trace (memory):** AutoPos appears gated on actual motion
Most likely retail wins (cdb is observing real client behavior). If retail truly suppresses AutoPos when stationary, our 5× over-emission triggers ACE-side over-validation and may contribute to the observer-side jitter. **Recommended:** another cdb idle trace to confirm retail's exact behavior, then converge to it.
### 1.7 Retransmit machinery (entire subsystem)
Largest delta from holtburger. We are missing:
- **A retransmit cache.** Holtburger's `MAX_CACHED_PACKETS=512`, LRU-style, drops oldest when full (`reliability.rs:32-37`).
- **Server-requested retransmits.** When the server asks for resends, holtburger re-encrypts with current ISAAC + RETRANSMISSION flag and replays from cache (`reliability.rs:135-186`).
- **Client-issued retransmit requests.** When inbound seq has gaps, holtburger sends `RequestRetransmit` for up to 115 seqs in a 256-seq window, rate-limited to once per second (`MAX_RETRANSMIT_SEQUENCE_IDS=115`, `MAX_RETRANSMIT_SEQUENCE_WINDOW=256`, `REQUEST_RETRANSMIT_INTERVAL=1s`).
- **`Iteration` field handling.** Our `PacketHeader.Iteration` is always 0; holtburger increments on retransmit.
- **`ISAAC::search` for out-of-order ENCRYPTED_CHECKSUM packets.** Out-of-order packets have ISAAC keys that have already advanced. Holtburger scans forward up to 256 keys, stashing each skipped key in `xors: HashSet<u32>` for later out-of-order packets to consume via `consume_key_value` (`crypto.rs:73-93`). **A naive port either drops the out-of-order packet or corrupts the ISAAC stream.** If our IsaacRandom doesn't have a search-and-stash mode, this is a latent bug waiting for any UDP loss event.
Our `WorldSession` class doc explicitly defers this work (`WorldSession.cs:29` "ACK pump, retransmit handling … deferred"). Symptoms when it's missing: any packet loss → silent state divergence, eventual desync, "purple haze" / Network Timeout drops.
**Cost:** 1-2 days. The whole pattern is in holtburger's `reliability.rs` (196 lines) plus the ISAAC search-mode in `crypto.rs:73-93`.
### 1.8 Fragment assembler TTL + outbound multi-fragment split
Two smaller correctness gaps:
- **Inbound:** Our `FragmentAssembler` has no TTL. If a multi-fragment server message loses its middle fragment, the partials sit forever. Memory leak in any long session that sees UDP loss. Holtburger's reassembler tracks completion per `(sequence, id)` and lives inside `process_fragment` in `send.rs`.
- **Outbound:** Our `GameMessageFragment.BuildSingleFragment` throws on body > 448 bytes. Anything that needs splitting (long /tells, big inventory queries, large appraisals) silently can't be sent. Note: **holtburger doesn't do outbound fragmentation either** (`send_message` always emits `count: 1`, `send.rs:298`) — they're betting on UDP-level fragmentation. So this isn't a holtburger crib; it's a hole in both. AC2D + Chorizite are the better references when we get there.
---
## 2. Confirmations — we're doing it right
Three places where the audit confirmed our existing approach matches the reference:
- **Run/walk encoding via WalkForward + HoldKey.Run/None.** Holtburger sends `forward_command = 0x45000005 (WalkForward)` for **both** walk and run; the distinction is in `forward_hold_key` (Run=2 vs None=1) and `forward_speed`. ACE upgrades server-side. Test pinning this contract: `holtburger system/tests.rs:404-428`.
- **Two-step EnterWorld** (`0xF7C8 CharacterEnterWorldRequest` → wait for `0xF7DF ServerReady``0xF657 CharacterEnterWorld`).
- **ACK on every received packet with seq > 0.** Holtburger's `recv_packet_with_addr` queues an ack for every received packet with `sequence > 0 && flags != ACK_SEQUENCE`. Outbound `send_message` auto-piggybacks the latest server seq onto the next data packet; standalone ACKs flush only when nothing naturally goes out. (Worth double-checking that our `SendAck` is called automatically on `ProcessDatagram`, not as a separate periodic pump.)
One thing **worth re-verifying** because it's easy to invert: ISAAC seeding direction. Holtburger uses `isaac_c2s = Isaac::new(crd.client_seed)` and `isaac_s2c = Isaac::new(crd.server_seed)` — i.e. the wire field labelled `client_seed` seeds the C2S keystream, and vice versa. Worth a 30-second check that our `WorldSession` does the same.
---
## 3. Don't crib these (holtburger gaps / wrong)
- **Outbound fragmentation:** holtburger doesn't do it. Hole in both projects. Use AC2D + Chorizite when needed.
- **Jump (0xF61B):** holtburger never sends Jump. The TUI client can't jump. `JumpActionData` is decoder-only. Use cdb retail trace + Chorizite.ACProtocol for jump format reference.
- **Initial run_rate_scalar fallback:** holtburger uses 4.5 (max-cap formula, run_skill ≥ 800); acdream uses 2.4-2.94 default. Retail formula: `(load_mod * (run_skill / (run_skill + 200) * 11) + 4) / 4`. The right pre-PlayerDescription default depends on what retail does — cdb trace will settle it.
- **AutoPos cadence:** holtburger's 1-second unconditional heartbeat is probably wrong (cdb retail trace says gated on motion). Don't copy this verbatim; investigate first.
---
## 4. Recent commits worth knowing (last 237)
| Commit | Date | Intent | Relevance |
|--------|------|--------|-----------|
| `99974cc` | 2026-04-06 | "Fix/session issues" — splits 673-line `lib.rs` into `session/{api,auth,receive,send,reliability,types}`. **Adds the missing C↔S retransmit logic.** Replaces `tokio::sleep(200ms)` with deferred control-packet queue. | Read this diff if you read only one. |
| `403bc98` | 2026-04-21 | "do not switch ports prematurely" (#158). Pending vs confirmed source-port. | Apply same pattern to `WorldSession`. |
| `336cbad` | 2026-04-?? | "fix: more movement fixes". `autonomous_wire_motion_state` no longer emits `turning` when locomotion ≠ 0. | Likely also a bug class in our outbound MoveToState. |
| `797aece` | 2026-04-06 | DISCONNECT now carries `id = client_id` instead of 0. | One-line fix on our `Dispose` path. |
| `854c1bb` | (older) | "Feat/simulation system" (#105) — added the entire 2222-LOC `client/movement/{common,system}.rs`. | Foundation everything else builds on. |
Nothing in 237 commits changes LoginRequest payload, ConnectRequest parse, ISAAC seeding, or EnterWorld message ordering. The wire format is unchanged from what acdream targets — the deltas are internal architecture and bug fixes.
---
## 5. Recommended sequencing
**Tier 1 — Quick wins (under an hour each, high signal-to-noise):**
1. MoveToState audit fixes (1.1.a-e) — file as `#L.X.a-e`, batch into one PR
2. LoginComplete on PlayerTeleport (1.2)
3. EchoRequest → EchoResponse reply (1.3)
4. Port-switch race fix (1.4)
5. Non-blocking handshake delay (1.5)
6. Disconnect carries client_id (`797aece` finding)
**Tier 2 — Investigation, then fix:**
7. AutoPos cadence — cdb idle trace, then converge (1.6)
8. Audit "dead outbound builders" (Phase B.4 wiring drift) — separate from holtburger but surfaced by this study
**Tier 3 — Bigger investment:**
9. Retransmit subsystem (1.7) — port `reliability.rs` wholesale, including ISAAC search-mode (1-2 days)
10. Fragment assembler TTL (1.8 inbound)
The Tier 1 group is a cohesive "post-A.5 network polish" pass — cheap, high-confidence, and several of them are likely candidates for the longstanding observer-not-perfect issue.
---
## 6. File map for cross-reference
| acdream | holtburger | Role |
|---------|-----------|------|
| `src/AcDream.Core.Net/WorldSession.cs:411-521` | `crates/holtburger-session/src/session/{api,auth}.rs` | Handshake driver |
| `src/AcDream.Core.Net/WorldSession.cs:556-924` | `crates/holtburger-core/src/client/runtime.rs:91-200` + `messages.rs` | Recv loop + dispatch |
| `src/AcDream.Core.Net/WorldSession.cs:1096-1156` | `crates/holtburger-session/src/session/send.rs` | Outbound transport (encode + ack piggyback) |
| `src/AcDream.Core.Net/Cryptography/IsaacRandom.cs` | `crates/holtburger-protocol/src/crypto.rs` | ISAAC (we likely lack `search`-mode) |
| `src/AcDream.Core.Net/Packets/PacketCodec.cs` | `session/{send,receive}.rs` + `optional_header.rs` | Encode/decode + optional header iteration |
| `src/AcDream.Core.Net/Packets/FragmentAssembler.cs` | `session/send.rs::process_fragment` | Inbound reassembly |
| `src/AcDream.Core.Net/Messages/MoveToState.cs` | `crates/holtburger-protocol/src/messages/movement/actions.rs:53-69` + `client/movement/common.rs:122-186` | MoveToState builder |
| `src/AcDream.Core.Net/Messages/AutonomousPosition.cs` | `messages/movement/actions.rs:175-189` + `system.rs:858-893` | AutoPos builder + cadence |
| **(missing)** | `crates/holtburger-session/src/session/reliability.rs` | **Retransmit machinery — entirely absent in acdream** |
---
## Method note
This study used four parallel general-purpose agents on the day-of pull (2026-05-10, holtburger HEAD `629695a`). All citations are file paths + line numbers in that exact tree. If holtburger moves forward, line numbers will drift; commit hashes (especially `99974cc`, `403bc98`, `336cbad`, `797aece`) are stable anchors.