MosswartOverlord

SawatoMosswartsEnjoyersClub/MosswartOverlord

Author	SHA1	Message	Date
Erik	47607d75fb	docs: add fresh-session prompt for the parallel Go backend rewrite Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-24 08:31:47 +02:00
Erik	6a0bb9fe80	feat(sidebar): restore the rickroll title-click easter egg Holiday's over — revert the Sma Grodorna frog-hop title gag back to the original /rick.mp4 fullscreen rickroll + shake/spin it replaced. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-24 08:15:21 +02:00
Erik	cc686da532	feat(midsummer): retire the theme out of season (holiday over) Flip the seasonal master switch (SEASON_ACTIVE=false in useMidsummer) so the Sma Grodorna theme is fully dormant — no rain/frogs/maypole/banner/palette, regardless of any stored toggle preference — and remove the 🐸 toggle from the sidebar. All theme code is kept; to bring it back next Midsummer, flip SEASON_ACTIVE to true and re-add <FrogToggle /> in SidebarWindowButtons. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-24 08:12:00 +02:00
Erik	0565a54ae5	fix(live): window 'online' on server receive-time, not client clock The player-count flapping was client clock skew: telemetry is stamped with the game machine's DateTime.UtcNow (WebSocket.cs), and machines' clocks drift up to ~90s apart (proven: per-char offsets span -31s..+59s with steady 6s cadence; a wrong server clock would shift all equally, so the SPREAD proves clients differ from each other; a +59s future timestamp rules out lag). /live windowed on that client timestamp, so characters whose clock sat near the 30s boundary blinked in and out. Fix: stamp each telemetry row with the server's receive-time (received_at) and window the /live 'online' query on COALESCE(received_at, timestamp) instead of the client timestamp. A coarse timestamp bound (10 min) is kept only for TimescaleDB chunk pruning. Column added idempotently in init_db_async; COALESCE falls back to the client timestamp for pre-migration rows. Verified on the live DB: query valid, 8ms, equivalent pre-population. ~free CPU (one datetime.now() per ~14 inserts/sec). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-23 23:34:35 +02:00
Erik	645feef9aa	perf(inventory): cap concurrent forwards so flush bursts can't starve telemetry Root cause of the player-count flapping: the plugin's debounced inventory flush, combined with a fleet-wide relog wave (auto-update) phase-aligning the 60s flush timers, produced a synchronized burst of inventory forwards every cycle. The burst flooded the single event loop + httpx pool (errors in _do_handle_inventory_delta even though inventory-service was idle), periodically starving telemetry ingest (cliff 116→5 rows/10s) so characters aged out of the 30s window and the count flapped. - Global asyncio.Semaphore(8) around inventory forwarding: a burst can never monopolize the loop; telemetry always gets through. - Tighten the shared httpx client (max_connections=10, keepalive=5, 5s timeout) so a stale/slow connection can't hold a slot. Pairs with the plugin-side flush-timer jitter (2–5 min, re-rolled per tick) that de-synchronizes the fleet at the source. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-23 22:18:34 +02:00
Erik	349c15d944	perf(broadcast): serialize once with stdlib json, drop jsonable_encoder from hot path Every browser broadcast ran jsonable_encoder (slow recursive encode) and then re-serialized per client via send_json — so a payload to N browsers was encoded N+1 times, on the same single event-loop core that the telemetry/inventory firehose already saturates. Now serialize ONCE with json.dumps + a datetime-aware default (_json_default mirrors jsonable_encoder for the types that actually appear: datetime, Enum, Decimal, set, bytes) and send the prebuilt string to every client via send_text. Verified the wire output parses identically to the old path. Pure backend change — no plugin, no frontend, no schema change; stdlib only so it deploys via restart with no image rebuild / dependency churn. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-23 21:40:52 +02:00
Erik	d86bc48862	feat(midsummer): rain of flowers/frogs/Swedish flags, dots become frogs, drop jingle Per request: remove the WebAudio jingle (+ its 🔊 toggle and sound state); replace the one-shot confetti with a continuous rain of 🌼🌸🐸🇸🇪🌿 over the screen (MidsummerRain, gated by the theme, reduced-motion aware, leak-free); and replace player-dot markers with frogs themselves (override the inline dot color/border) instead of a flower-crown on top. Still toggled by the 🐸 Midsommar switch. Includes rebuilt static bundle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-19 09:47:39 +02:00
Erik	7141a38c5c	build(midsummer): deploy Sma grodorna theme to static bundle Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-19 09:34:26 +02:00
Erik	1f86e7cc86	polish(midsummer): guard frog-hop against rapid re-click stacking Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-19 09:33:51 +02:00
Erik	3cd2165c15	feat(midsummer): WebAudio Sma grodorna jingle, plays once on first gesture	2026-06-19 09:30:00 +02:00
Erik	e896ef1f21	feat(midsummer): frog-hop easter egg replaces the rickroll	2026-06-19 09:29:07 +02:00
Erik	c4dd1b7ae7	feat(midsummer): glad midsommar banner + one-shot confetti	2026-06-19 09:28:34 +02:00
Erik	e7b0f11bb1	feat(midsummer): flower-crown dots, frog on selected	2026-06-19 09:27:40 +02:00
Erik	da0cc79def	feat(midsummer): dancing maypole pinned to map centre	2026-06-19 09:27:26 +02:00
Erik	2fb6fd2f3e	feat(midsummer): sidebar frog toggle + jingle toggle (sound stubbed)	2026-06-19 09:26:45 +02:00
Erik	580fd6fbc5	feat(midsummer): pond-green palette overlay for sidebar and map	2026-06-19 09:26:12 +02:00
Erik	568992d0f9	feat(midsummer): theme state provider + data-midsummer attribute	2026-06-19 09:25:54 +02:00
Erik	e803c35af9	docs(plan): Sma Grodorna midsummer theme implementation plan (+ spec: WebAudio jingle) 9-task plan with complete code for the frog/maypole theme: scoped CSS overlay, useMidsummer provider, dancing maypole, crown/frog dots, banner + confetti, frog-hop easter egg, WebAudio jingle. Spec updated to synthesize the jingle (no mp3 asset / licensing). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-19 09:22:54 +02:00
Erik	b3753d1ab0	docs(spec): Sma Grodorna midsummer theme design Full-takeover frog/maypole midsummer theme for the React frontend: scoped [data-midsummer] CSS overlay, useMidsummer hook (localStorage, default on), dancing maypole inside the map pan/zoom group, frog + flower-crown dots, Glad midsommar banner + confetti, frog-hop easter egg replacing the rickroll, play-once unmuted jingle. Manual 🐸 toggle. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-19 09:15:31 +02:00
Erik	52bf9342df	feat: SHARED_SECRET_LEGACY migration escape hatch for plugin secret rollout Accepts one legacy secret alongside the real one so existing clients keep registering while game machines migrate to websocket_secret.txt. Remove SHARED_SECRET_LEGACY from .env after the rollout. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 20:20:19 +02:00
Erik	15ae870117	docs: CLAUDE.md reflects env-based SHARED_SECRET and XFF internal-trust rule Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:08:39 +02:00
Erik	a28b61511c	security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups - SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses ALL plugin connections (constant-time compare). The old hardcoded 'your_shared_secret' in this public repo was no auth at all. Dockerfile default removed; generate_data.py reads the env var. - SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of falling back to a publicly-known signing key; agent systemd unit now requires /etc/overlord/agent.env (no '-' prefix). - AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every nginx-proxied internet request satisfied via docker-proxy — full session bypass and unauthenticated in-game command injection) with private-source AND no X-Forwarded-For, i.e. only genuinely internal callers (overlord-agent on the host, compose-network services). Invariant documented in nginx/overlord.conf: every tracker-bound location must set X-Forwarded-For. - /character-stats/test endpoints gated behind admin (they upsert real rows). - docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-reachable; active brute-force observed in dereth-db logs). - discord-rare-monitor: drop dead SHARED_SECRET constant. - scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs (telemetry/spawn hypertable data excluded), 10MB canary, umask 077, TimescaleDB restore procedure. - Remove stray mangled-path css file from repo root. Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-sequencing blockers addressed (secret staged before enforcement, exec bit set, cron uses bash). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:02:47 +02:00
Erik	c6a1af0c39	docs: rewrite CLAUDE.md from audit — drop stale 2025 fix journal The old file was ~half September-2025 changelog with claims now wrong: portal race condition (fixed via ON CONFLICT upsert), hypertable warning (telemetry_events IS a hypertable on live), pool 5-20 (actually 5-100), --no-cache rebuild for code changes (bind mounts + restart suffice), /ws/live unauthenticated (cookie-authenticated since), static-HTML frontend description (React since). Rewritten around current reality: component map, WS endpoints + auth caveats, two-database schema-as-code situation (alembic empty, manual ALTERs), route conventions, React deploy flow, operational notes. Overlord Assistant Mode section preserved verbatim (consumed at runtime by the agent service). AGENTS.md: remove nonexistent GET /history, fix /ws/live auth claim. Also delete stray mangled-path file from repo root. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 16:36:01 +02:00
Erik	a5b80fd9cd	security(nginx): remove dead Grafana service-account token from committed config The glsa_ token on the /grafana/ location was committed to a public repo. Verified dead: Grafana's service-account and api_key tables are empty (the data dir is ephemeral container storage, so the SA was wiped on a past recreate) and an arbitrary invalid bearer gets identical 200 responses — panel embeds are actually served by anonymous Viewer auth (GF_AUTH_ANONYMOUS_ENABLED=true). The header was a no-op; removing it changes no behavior and removes the credential from the config. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 16:25:20 +02:00
Erik	87e4f2ff62	fix(dashboard): table width:auto so Character column sizes to content only With width:100%, the table stretched to fill the container — and the Character column (the only one with stretchable text) absorbed all the extra space, looking much wider than the longest name needed. width:auto lets each column size to its content. Table now fits its data; container still scrolls horizontally if content ever exceeds the viewport. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-23 19:37:49 +02:00
Erik	5f43ddce93	feat(dashboard): click-to-highlight rows + character column auto-sizes Two small UX improvements to the Player Dashboard table (works in both the new-tab fullscreen page and the deprecated in-app window since they share PlayerDashboardContent): 1. Row highlight: click anywhere on a row to highlight it (blue tint + thin outline). Click again to unhighlight. Single selection — useful for tracking one character down a long sorted list. 2. Character column no longer truncates: removed maxWidth/overflow/textOverflow on the name cell. Column now sizes to the longest character name (still no wrapping; container scrolls horizontally if names are extreme). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-23 19:35:09 +02:00
Erik	5bda2b64f4	feat(dashboard): open Player Dashboard in a new tab The 👥 Dashboard button used to open the player table as a draggable in-app window, which competed for screen space with the map. It now opens in a separate browser tab as a fullscreen page so users can put the dashboard on a second monitor. How: - App.tsx branches on ?view=dashboard → renders PlayerDashboardFullPage (new file in components/) instead of the default MapLayout. - SidebarWindowButtons.tsx: 👥 Dashboard onClick now does window.open('/?view=dashboard', '_blank', 'noopener'). Label shows '↗' so users know it's an external open. - PlayerDashboardWindow.tsx refactored: extracted the sortable table body into a reusable PlayerDashboardContent component. The old window shell stays registered in WindowRenderer for backward compat — just no longer reachable from the default sidebar. - map-layout.css: new .ml-dashboard-page rules for fullscreen layout. Each tab gets its own useLiveData + WebSocket connection (server already handles multiple browser clients). The new tab inherits the session cookie from the original tab — no re-login. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-23 19:31:26 +02:00
Erik	3cf6437617	fix(auth): populate request.state.user inside loopback-bypass branch The Docker-bridge / loopback bypass in AuthMiddleware was short-circuiting the whole auth flow without ever decoding the session cookie. Result: /me and other endpoints reading request.state.user got 401 for real logged-in browsers (because nginx → docker-proxy makes them look like 172.x). Symptom: dashboard admin UI invisible even for admin users — useCurrentUser saw 401 from /me and treated everyone as anonymous. Fix: in the bypass branch, still try to decode any session cookie present and populate request.state.user. The bypass still permits anonymous internal calls (overlord-agent's MCP tools), but real authenticated browsers now get their user correctly resolved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 20:17:22 +02:00
Erik	1c1c43d28b	feat(dashboard): logout button + admin user-management window Logout: new sidebar link 'Log out (username)' that POSTs /api/logout (clears session cookie) and navigates to /login. Visible to everyone. Replaces 'no logout functionality' state where users could only get out by deleting cookies manually. Admin window: new 'Admin · Users' window (only shown when current user.is_admin) lists all users in a table with: - Add user (username + password + admin checkbox) - Reset password inline per row - Toggle admin per row - Delete user per row (blocked for self) Wraps the existing /api-admin/users CRUD endpoints in main.py. Plumbing: useCurrentUser hook fetches /me on mount; apiPatch+apiDelete helpers added to api/client.ts; new endpoint wrappers exported from api/endpoints.ts; AdminUsersWindow.tsx registered in WindowRenderer under id prefix 'adminusers'; CSS for admin table/form/buttons and the muted-red logout link. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 20:10:10 +02:00
Erik	88e9e88f46	docs(agent): brief Claude on AC rare tier classification The user kept asking 'show me great rares' and Claude kept showing Crystals/Pearls/Jewels because the rare_events table doesn't store the tier — and Claude didn't know the distinction. Now CLAUDE.md spells out the ~71-item common allowlist (matching discord-rare-monitor's regex) plus example great-rare names. Includes a sample SQL query Claude can adapt for tier filtering.	2026-04-25 23:07:57 +02:00
Erik	1196746dbe	fix(agent): SQL parser robust against sqlglot version drift The query_telemetry_db tool was crashing with AttributeError because exp.AlterTable doesn't exist in this sqlglot version (renamed to Alter). Made the deny-class list build defensively via getattr and dropped any classes that the installed sqlglot doesn't expose. Also broadened the deny list (Alter, AlterColumn, AlterDatabase, Truncate, Grant, Revoke, Copy) and made the toplevel allowlist tolerant of missing classes too. The walk() return shape is also normalized in case sqlglot versions yield (node, parent, key) tuples vs. bare nodes. Belt-and-suspenders is fine — the GRANT-SELECT-only PG role is the real write barrier; the parser is just a faster/friendlier reject path.	2026-04-25 23:07:00 +02:00
Erik	6de4bfe03e	docs: README updated for Overlord Agent — host-side service, security stack, deploy flow Adds an AI Assistant section covering: - Architecture (host-side vs Docker, dedicated overlord-agent user) - The 12 MCP tools available to the model - 10-layer security stack (cookie auth, rate limit, audit log, allowed/disallowed tools, settings.json deny, system prompt, SQL parser, RO PG role, systemd hardening, dedicated UID) - Full file inventory under agent/ - Routing via nginx /api/agent/* - Cost / quota notes (subscription auth, reactive only) Plus features-section blurb, agent env vars table (/etc/overlord/agent.env), and deploy-flow recipes (code-only restart, requirements update, unit file change, first-time install).	2026-04-25 22:50:54 +02:00
Erik	0633865598	fix(agent): block Agent + Gmail/Drive/Calendar tools, brief model not to probe Two complementary changes after observing the model probe boundaries (it tried mcp__claude_ai_Gmail__search_threads, then tried to delegate to a subagent via the Agent tool, then suggested the user edit settings.local.json to add Gmail tools): 1. claude_wrapper.py adds to --disallowed-tools: - Agent (subagent spawning — should never delegate) - WebFetch (already; settings.json re-allows acpedia.org only) - Every Gmail/Calendar/Drive connector tool name we know about 2. CLAUDE.md adds a 'Non-negotiable scope rules' section: - Be a read-only game-state QA service, nothing else - Don't attempt tools outside your role - Don't explain how to bypass restrictions - Don't suggest settings.json edits - Don't enumerate hidden tools when asked Soft (system-prompt) + hard (CLI flag) defenses combined.	2026-04-25 22:45:39 +02:00
Erik	e780f249d1	fix(agent): keep strict permissions server-side, not in repo The previous commit put .claude/settings.json IN THE REPO, which would have applied its strict deny rules to ANY Claude Code invocation from this cwd — including the human user's interactive dev sessions on their own machine. That's wrong; the production agent's lockdown should not constrain the developer. Remove the committed file and gitignore .claude/ entirely. The repo is permission-neutral now. Strict permissions for the production agent come from two server-only sources: 1. CLI flags in agent/claude_wrapper.py (--allowed-tools + --disallowed-tools, passed by the systemd-spawned subprocess only) 2. /var/lib/overlord-agent/.claude/settings.json (the agent's own HOME — separate from any user's .claude/) Also bumps claude_wrapper.py with the explicit --disallowed-tools list of meta-tools (ToolSearch, Monitor, TodoWrite, TaskOutput, Skill, cron tools, etc.) that the --allowed-tools whitelist does not block on its own. Verified empirically: with only --allowed-tools, ToolSearch was still callable; --disallowed-tools is required.	2026-04-25 22:26:02 +02:00
Erik	f894399165	feat(agent): isolate from erik — dedicated overlord-agent user The agent service was running as User=erik, which meant: - Sessions polluted erik's ~/.claude/projects/ - erik's .claude/settings.local.json (months of accumulated dev permissions for docker/git/dotnet/etc.) was loaded by the production agent, defeating the --allowed-tools whitelist - Subscription rate quota mingled between human-erik's interactive Claude Code use and the production assistant - Theoretical access to /home/erik/.ssh, .bash_history, .gitconfig Now: - User=overlord-agent (system account, no shell, /var/lib/overlord-agent home) - HOME=/var/lib/overlord-agent — claude state fully isolated from erik - /home/erik/.claude permissions tightened to 0700 (was 0755) - group=overlord-agent on the repo + /etc/overlord/agent.env (read-only) Project settings: - New strict committed .claude/settings.json: deny Bash/Read/Write/Edit/Glob/Grep/NotebookEdit/WebSearch; allow only WebFetch(domain:acpedia.org) - .claude/settings.local.json now gitignored (was leaking dev permissions to the server through the deploy)	2026-04-25 21:50:57 +02:00
Erik	49ae4369e0	fix(agent): relax SystemCallFilter — Node needs @cpu-emulation etc. The extra ~@cpu-emulation ~@obsolete ~@swap ~@raw-io negations on top of @system-service killed Claude Code (Node) with SIGSYS during startup. Keep just the truly dangerous groups blocked: ~@privileged ~@reboot ~@mount. The base @system-service preset already excludes others (no @debug, no @resources, etc. are included by default in that preset).	2026-04-25 21:31:14 +02:00
Erik	5cf052cedf	fix(agent): drop MemoryDenyWriteExecute — breaks Node.js V8 JIT Claude Code is a Node app. V8 JIT requires W^X transitions via mprotect with PROT_EXEC on JIT'd code pages. MemoryDenyWriteExecute kills the process with SIGTRAP/abort during startup (~10ms in). Without JIT we'd have to use --jitless mode, which destroys performance. The other systemd hardening (ProtectSystem, ProtectHome, InaccessiblePaths, NoNewPrivileges, capability drop, syscall filter, PrivateTmp, etc.) still gives strong filesystem and privilege isolation. The remaining shellcode-injection risk is theoretical — there is no Bash/Write/Edit tool exposed for an attacker to chain into. Also: MemoryLimit -> MemoryMax (deprecated unit form).	2026-04-25 21:29:16 +02:00
Erik	9d4c724b7f	feat(agent): security hardening — systemd lockdown, rate limit, audit log systemd unit now applies defense-in-depth: - ProtectSystem=strict + ProtectHome=read-only (rest of FS sealed) - ReadWritePaths only for ~/.claude (session JSONLs) and venv + audit log - InaccessiblePaths blocks /etc/shadow, /etc/ssh, /root, ~/.ssh, shell history - NoNewPrivileges + dropped capabilities (no setuid escalation, no caps) - PrivateTmp, PrivateDevices, ProtectKernel*, MemoryDenyWriteExecute - SystemCallFilter @system-service ~@privileged ~@debug ~@mount etc. - RestrictAddressFamilies blocks raw/packet sockets Application layer: - Per-user rate limit 60/hour (configurable via AGENT_RATE_MAX) - Per-user concurrency cap of 1 in-flight (no parallel claude burns) - JSONL audit log of every /agent/ask to /var/log/overlord-agent/audit.jsonl Logs username, message preview, result preview, timing, errors. Plus secrets migration: EnvironmentFile now prefers /etc/overlord/agent.env (root:erik 0640) over /home/erik/MosswartOverlord/.env, so even the read-only /home doesn't expose them. Falls back to old path during transition.	2026-04-25 21:25:40 +02:00
Erik	4ae18536be	feat(agent): cross-char search_items tool + bump timeouts Adds an MCP tool wrapping the inventory-service /search/items endpoint with include_all_characters=true, so questions like 'find me a bracelet with Legendary Acid Ward on any unequipped char' resolve in ONE tool call instead of looping get_inventory over 60+ chars (which timed out at 120s). - agent/tools.py: search_items_global wrapper - agent/mcp_overlord.py: register new tool with detailed schema doc - agent/claude_wrapper.py: include in --allowed-tools whitelist; bump timeout 120s -> 240s - nginx/overlord.conf: bump /api/agent/ proxy timeout 180s -> 300s - CLAUDE.md: brief Claude to USE search_items for cross-char searches	2026-04-25 21:13:26 +02:00
Erik	d3943e894c	fix(agent): SECURITY — replace bypassPermissions with dontAsk bypassPermissions ignores --allowed-tools entirely (per permission-modes.md docs). With it, the model could call Bash, Write, Edit, Read, etc. — confirmed by writing /tmp/owned.sh in a test. dontAsk is the correct production headless mode: auto-DENIES anything outside the --allowed-tools whitelist instead of prompting. Without this, our entire MCP whitelist was effectively useless.	2026-04-25 21:05:53 +02:00
Erik	6d5819d297	fix(agent): use --resume on existing sessions, --session-id only for new Claude Code rejects --session-id on a session that already exists on disk ('Session ID ... is already in use'). The first message of a conversation must use --session-id to create; every message after must use --resume. Detect by checking ~/.claude/projects/<encoded-cwd>/<uuid>.jsonl. Plus a belt-and-suspenders retry: if --session-id surprisingly fails with the 'already in use' string, automatically retry with --resume. This was the bug that caused chat windows to fail on the second message.	2026-04-25 20:51:46 +02:00
Erik	0745aefdb9	fix(auth): trust internal Docker/loopback connections in AuthMiddleware Same pattern we already use for /ws/live (host-side Discord bot bypass). Lets the new overlord-agent service call any tracker HTTP endpoint without forging a session cookie. Safe because port 8765 is bound to 127.0.0.1 in docker-compose.yml — only the host or other compose-network containers can reach it.	2026-04-25 20:47:47 +02:00
Erik	a3353e572d	fix(agent): whitelist MCP tools + bypass permissions for unattended service	2026-04-25 20:46:42 +02:00
Erik	64523c4e97	fix(agent): point .mcp.json at venv python so MCP deps resolve	2026-04-25 20:45:52 +02:00
Erik	79cf88d3f7	feat(agent): Phase 1 — chat-window AI assistant via Claude Code subprocess Adds an in-dashboard AI assistant that answers questions about live game state. Designed reactively (no background loops) — every user message in the chat window or via /api/agent/ask runs one `claude -p` invocation. Architecture: - New host-side FastAPI service (agent/) on 127.0.0.1:8767, OUTSIDE the dereth-tracker Docker container because `claude` and ~/.claude credentials live on the host. - nginx routes /api/agent/* to the host service. - The same browser session cookie the tracker issues authenticates agent requests (shared SECRET_KEY). - The agent shells out to `claude -p --session-id <uuid>` with cwd=/home/erik/MosswartOverlord. Sessions persist as JSONL on disk via Claude Code's built-in machinery. - An MCP stdio server (agent/mcp_overlord.py) exposes tools to Claude: get_live_players, get_recent_rares, query_telemetry_db (read-only, parsed by sqlglot to reject DML/DDL), get_player_state, get_inventory, get_inventory_search, get_combat_stats, get_equipment_cantrips, get_quest_status, get_server_health, suitbuilder_search. - Read-only PG role (overlord_agent_ro) is the second line of defense on the SQL tool — even a parser bypass can't mutate. Frontend: - AgentWindow.tsx — draggable chat window with localStorage-pinned session UUID, "New Chat" button, on-mount rehydration from /agent/sessions/{id}/history (parses Claude Code's JSONL). - Wired into WindowRenderer + Sidebar (🤖 Assistant button). Operational: - systemd unit (overlord-agent.service) + install.sh. - agent/README.md documents env vars, deploy flow, smoke tests. - nginx/overlord.conf gets a new /api/agent/ location with 180s timeout. - CLAUDE.md gets an "Overlord Assistant Mode" section briefing the agent on which tools to use and how to behave. NOT YET DEPLOYED — server still needs: 1. Apply agent/sql/0001_overlord_agent_ro.sql + ALTER ROLE password 2. Add AGENT_DB_DSN to /home/erik/MosswartOverlord/.env 3. bash agent/install.sh (creates venv, installs unit, starts service) 4. sudo cp /home/erik/MosswartOverlord/nginx/overlord.conf to nginx + reload Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 20:43:59 +02:00
Erik	aeddaf9925	fix(ws): per-character lock for inventory_delta to prevent FK race The previous commit moved inventory_delta handling to fire-and-forget asyncio tasks. That removed the WS-loop blockage but introduced a race: when the same character generated multiple deltas in quick succession (mana burn, ID refresh, loot bursts), the tasks ran concurrently and inventory-service's DELETE-then-INSERT path raced on the items table: asyncpg.exceptions.ForeignKeyViolationError: update or delete on table 'items' violates foreign key constraint 'item_combat_stats_item_id_fkey' The 500 errors caused inventory_delta updates to be dropped silently (likely the source of the 'items in wrong container' bug the user reported earlier — every delta returning 500 means the DB never updates). Fix: per-character asyncio.Lock — deltas for the same character serialize, deltas for different characters still run in parallel. Restores correctness without losing the non-blocking-WS-loop benefit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 00:47:59 +02:00
Erik	e512c1c296	fix(ws): non-blocking inventory_delta + better disconnect handling Two issues causing plugin WS disconnects on heavy-loot characters: 1. inventory_delta processing was awaiting an httpx POST to inventory-service inline within the WS receive loop. Each delta also created a fresh httpx.AsyncClient (no connection pool reuse). When inventory-service was slow under load, the receive loop blocked, keepalives stopped flowing, and the connection eventually dropped (especially for characters spamming deltas: Elliot was reconnecting ~every 4 min). Fix: process each delta as an asyncio.create_task() — the WS receive loop returns immediately to read the next message. Use a shared httpx.AsyncClient with connection pooling. 2. websocket.receive_text() raises RuntimeError ("Need to call accept first") instead of WebSocketDisconnect in some race conditions when the connection closes mid-await. The receive loop only caught WebSocketDisconnect, so RuntimeError propagated up as an exception traceback in logs. Fix: catch RuntimeError and log as a clean disconnect. Also: log close code/reason on WebSocketDisconnect so we can tell apart clean closes (1000/1001) from network drops (1006) etc. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 00:36:01 +02:00
Erik	f111e5063b	ops: add nginx site config to repo as source-of-truth The live host nginx config (/etc/nginx/sites-enabled/overlord) was not tracked in git, leading to drift. This commit checks in a source-of-truth copy under nginx/overlord.conf with a deploy procedure documented at the top of the file. Includes the proxy_read_timeout/proxy_send_timeout 1d settings for both WebSocket location blocks (/websocket/ and /). Without these, nginx's default 60s timeout drops idle plugin connections in a reconnect loop — the symptom users saw was "WebSocket error … State: Aborted" every ~60s on idle characters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 00:32:33 +02:00
Erik	3c634adbdc	docs: rewrite README to reflect current architecture Full rewrite covering: - React v2 frontend at /, classic v1 preserved at /classic - WebSocket message-type subscription mechanism (bot filter fix) - Death + idle alerts via Discord webhook with 5-min grace period - spawn_events now a TimescaleDB hypertable with 7-day retention - server_health_checks removed (write-only bloat) - PostgreSQL memory tuning (shared_buffers 8GB on 32GB host) - Uvicorn runs without --reload in production - deploy-frontend.sh requirement for React builds - Combat stats (Mag-Tools style), vital sharing, all WS event types - Cross-machine vital sharing via WebSocket relay - Deploy flows (quick / frontend / full rebuild) - BUILD_VERSION CalVer stamp format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 14:55:30 +02:00
Erik	f7f04d6a84	Revert floating badge, remove debug logs The floating version badge scrolled awkwardly and wasn't necessary now that the bind-mount/deploy issue is fixed. The existing ml-version inside the Sidebar is sufficient. Also removed the temporary [INV_DEBUG] console logs from useLiveData and InventoryWindow — the inventory live-update bug is confirmed fixed. Kept the per-character inventoryVersions fix and the cache-buster on the refetch URL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 19:20:24 +02:00


286 commits
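Commit 0565a54ae5 changes the /live "online" window to use the server's receive-time and fall back to the client timestamp only for rows written before the column existed. The sketch below is a minimal in-memory Python version of that predicate. The row shape and helper names are hypothetical, and the 30s window is taken from 645feef9aa. The real query does this in SQL with `COALESCE(received_at, timestamp)`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional

ONLINE_WINDOW = timedelta(seconds=30)  # assumption: the 30s window cited in the log


@dataclass
class TelemetryRow:
    # Hypothetical row shape; field names mirror the columns named in the commit.
    character: str
    timestamp: datetime              # game machine's clock, can drift by tens of seconds
    received_at: Optional[datetime]  # server clock; None for pre-migration rows


def effective_time(row: TelemetryRow) -> datetime:
    # Python equivalent of SQL COALESCE(received_at, timestamp)
    return row.received_at if row.received_at is not None else row.timestamp


def online_characters(rows: Iterable[TelemetryRow], now: datetime) -> set[str]:
    # A character counts as online if any row's effective time is inside the window.
    cutoff = now - ONLINE_WINDOW
    return {r.character for r in rows if effective_time(r) >= cutoff}
```

Consider a character whose client clock runs 40s behind but whose telemetry reached the server 2s ago. Windowing on the client timestamp drops it. Windowing on the effective time keeps it, because a single server clock judges every row.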
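Commit 645feef9aa caps concurrent inventory forwards with a global `asyncio.Semaphore(8)`, so a flush burst cannot monopolize the event loop. This is a minimal sketch of that pattern, assuming a hypothetical `BoundedForwarder` wrapper built once at startup inside the running loop and shared by every handler. `forward` stands in for the real HTTP call.

```python
import asyncio
from typing import Any, Awaitable, Callable


class BoundedForwarder:
    """Shared gate: at most `limit` forwards are in flight at any moment."""

    def __init__(self, forward: Callable[[Any], Awaitable[None]], limit: int = 8):
        self._forward = forward
        # Built inside the running loop (e.g. at app startup) and shared by all callers.
        self._slots = asyncio.Semaphore(limit)

    async def submit(self, payload: Any) -> None:
        # Extra callers wait here instead of piling more work onto the loop,
        # which leaves room for telemetry ingest to keep flowing.
        async with self._slots:
            await self._forward(payload)
```

The same commit also tightens the shared httpx client's connection pool and timeout. Those limits live on the client itself and are not modelled here.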
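Commits a28b61511c and 52bf9342df describe the plugin secret check as fail-closed, compared in constant time, and, during the rollout, also accepting one legacy secret. The sketch below uses the env var names from the log (`SHARED_SECRET`, `SHARED_SECRET_LEGACY`). The placeholder set and the helper names `load_secrets` and `secret_ok` are hypothetical. It also assumes the legacy secret is ignored whenever the primary is unset, which the log does not say.

```python
from __future__ import annotations

import hmac
import os
from typing import Mapping, Optional

# Assumption: values treated as "not configured".
PLACEHOLDERS = {"", "your_shared_secret"}


def load_secrets(env: Mapping[str, str] = os.environ) -> list[bytes]:
    primary = env.get("SHARED_SECRET", "")
    if primary in PLACEHOLDERS:
        return []  # fail closed: no usable secret means every plugin is refused
    secrets = [primary.encode()]
    legacy = env.get("SHARED_SECRET_LEGACY", "")
    if legacy not in PLACEHOLDERS:
        secrets.append(legacy.encode())  # temporary migration escape hatch
    return secrets


def secret_ok(presented: Optional[str], secrets: list[bytes]) -> bool:
    if not presented or not secrets:
        return False
    candidate = presented.encode()
    matched = False
    # Check every accepted secret without short-circuiting, each via a constant-time compare.
    for s in secrets:
        matched |= hmac.compare_digest(candidate, s)
    return matched
```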
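Commit a28b61511c replaces the old 172.x trust with a two-part rule: a caller counts as internal only if its source address is private or loopback AND the request has no X-Forwarded-For header. This works because nginx sets that header on every tracker-bound location, so an internet request that docker-proxy makes look like 172.x still carries it. A minimal sketch follows. `is_trusted_internal` is a hypothetical name, and `client_host` would be something like Starlette's `request.client.host`.

```python
import ipaddress
from typing import Mapping, Optional


def is_trusted_internal(client_host: Optional[str], headers: Mapping[str, str]) -> bool:
    if not client_host:
        return False
    try:
        ip = ipaddress.ip_address(client_host)
    except ValueError:
        return False  # not an IP literal: never trust it
    if not (ip.is_private or ip.is_loopback):
        return False
    # A proxied request always carries X-Forwarded-For (nginx invariant),
    # even though docker-proxy makes its source address look private.
    return "x-forwarded-for" not in {k.lower() for k in headers}
```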
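Commit aeddaf9925 adds a per-character `asyncio.Lock`. Deltas for one character run in order, while deltas for different characters still run in parallel. A minimal sketch follows. `PerCharacterSerializer` is a hypothetical name, and locks are created lazily on first use inside the running loop. The lock map is never pruned, which is harmless at a few dozen characters but worth noting.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable


class PerCharacterSerializer:
    """Serialize work per character; different characters proceed concurrently."""

    def __init__(self) -> None:
        # One lock per character name, created on first use.
        self._locks = defaultdict(asyncio.Lock)

    async def run(self, character: str, coro_fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        # A second delta for the same character waits here until the first finishes,
        # so a DELETE-then-INSERT downstream never overlaps for that character.
        async with self._locks[character]:
            return await coro_fn(*args)
```

Each fire-and-forget task from e512c1c296 would call `run(character, handler, payload)` rather than the handler directly.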