MosswartOverlord

SawatoMosswartsEnjoyersClub/MosswartOverlord

Author	SHA1	Message	Date
Erik	a28b61511c	security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups - SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses ALL plugin connections (constant-time compare). The old hardcoded 'your_shared_secret' in this public repo was no auth at all. Dockerfile default removed; generate_data.py reads the env var. - SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of falling back to a publicly-known signing key; agent systemd unit now requires /etc/overlord/agent.env (no '-' prefix). - AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every nginx-proxied internet request satisfied via docker-proxy — full session bypass and unauthenticated in-game command injection) with private-source AND no X-Forwarded-For, i.e. only genuinely internal callers (overlord-agent on the host, compose-network services). Invariant documented in nginx/overlord.conf: every tracker-bound location must set X-Forwarded-For. - /character-stats/test endpoints gated behind admin (they upsert real rows). - docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-reachable; active brute-force observed in dereth-db logs). - discord-rare-monitor: drop dead SHARED_SECRET constant. - scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs (telemetry/spawn hypertable data excluded), 10MB canary, umask 077, TimescaleDB restore procedure. - Remove stray mangled-path css file from repo root. Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-sequencing blockers addressed (secret staged before enforcement, exec bit set, cron uses bash). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:02:47 +02:00
Erik	1196746dbe	fix(agent): SQL parser robust against sqlglot version drift The query_telemetry_db tool was crashing with AttributeError because exp.AlterTable doesn't exist in this sqlglot version (renamed to Alter). Made the deny-class list build defensively via getattr and dropped any classes that the installed sqlglot doesn't expose. Also broadened the deny list (Alter, AlterColumn, AlterDatabase, Truncate, Grant, Revoke, Copy) and made the toplevel allowlist tolerant of missing classes too. The walk() return shape is also normalized in case sqlglot versions yield (node, parent, key) tuples vs. bare nodes. Belt-and-suspenders is fine — the GRANT-SELECT-only PG role is the real write barrier; the parser is just a faster/friendlier reject path.	2026-04-25 23:07:00 +02:00
Erik	0633865598	fix(agent): block Agent + Gmail/Drive/Calendar tools, brief model not to probe Two complementary changes after observing the model probe boundaries (it tried mcp__claude_ai_Gmail__search_threads, then tried to delegate to a subagent via the Agent tool, then suggested the user edit settings.local.json to add Gmail tools): 1. claude_wrapper.py adds to --disallowed-tools: - Agent (subagent spawning — should never delegate) - WebFetch (already; settings.json re-allows acpedia.org only) - Every Gmail/Calendar/Drive connector tool name we know about 2. CLAUDE.md adds a 'Non-negotiable scope rules' section: - Be a read-only game-state QA service, nothing else - Don't attempt tools outside your role - Don't explain how to bypass restrictions - Don't suggest settings.json edits - Don't enumerate hidden tools when asked Soft (system-prompt) + hard (CLI flag) defenses combined.	2026-04-25 22:45:39 +02:00
Erik	e780f249d1	fix(agent): keep strict permissions server-side, not in repo The previous commit put .claude/settings.json IN THE REPO, which would have applied its strict deny rules to ANY Claude Code invocation from this cwd — including the human user's interactive dev sessions on their own machine. That's wrong; the production agent's lockdown should not constrain the developer. Remove the committed file and gitignore .claude/ entirely. The repo is permission-neutral now. Strict permissions for the production agent come from two server-only sources: 1. CLI flags in agent/claude_wrapper.py (--allowed-tools + --disallowed-tools, passed by the systemd-spawned subprocess only) 2. /var/lib/overlord-agent/.claude/settings.json (the agent's own HOME — separate from any user's .claude/) Also bumps claude_wrapper.py with the explicit --disallowed-tools list of meta-tools (ToolSearch, Monitor, TodoWrite, TaskOutput, Skill, cron tools, etc.) that the --allowed-tools whitelist does not block on its own. Verified empirically: with only --allowed-tools, ToolSearch was still callable; --disallowed-tools is required.	2026-04-25 22:26:02 +02:00
Erik	f894399165	feat(agent): isolate from erik — dedicated overlord-agent user The agent service was running as User=erik, which meant: - Sessions polluted erik's ~/.claude/projects/ - erik's .claude/settings.local.json (months of accumulated dev permissions for docker/git/dotnet/etc.) was loaded by the production agent, defeating the --allowed-tools whitelist - Subscription rate quota mingled between human-erik's interactive Claude Code use and the production assistant - Theoretical access to /home/erik/.ssh, .bash_history, .gitconfig Now: - User=overlord-agent (system account, no shell, /var/lib/overlord-agent home) - HOME=/var/lib/overlord-agent — claude state fully isolated from erik - /home/erik/.claude permissions tightened to 0700 (was 0755) - group=overlord-agent on the repo + /etc/overlord/agent.env (read-only) Project settings: - New strict committed .claude/settings.json: deny Bash/Read/Write/Edit/Glob/Grep/NotebookEdit/WebSearch; allow only WebFetch(domain:acpedia.org) - .claude/settings.local.json now gitignored (was leaking dev permissions to the server through the deploy)	2026-04-25 21:50:57 +02:00
Erik	49ae4369e0	fix(agent): relax SystemCallFilter — Node needs @cpu-emulation etc. The extra ~@cpu-emulation ~@obsolete ~@swap ~@raw-io negations on top of @system-service killed Claude Code (Node) with SIGSYS during startup. Keep just the truly dangerous groups blocked: ~@privileged ~@reboot ~@mount. The base @system-service preset already excludes others (no @debug, no @resources, etc. are included by default in that preset).	2026-04-25 21:31:14 +02:00
Erik	5cf052cedf	fix(agent): drop MemoryDenyWriteExecute — breaks Node.js V8 JIT Claude Code is a Node app. V8 JIT requires W^X transitions via mprotect with PROT_EXEC on JIT'd code pages. MemoryDenyWriteExecute kills the process with SIGTRAP/abort during startup (~10ms in). Without JIT we'd have to use --jitless mode, which destroys performance. The other systemd hardening (ProtectSystem, ProtectHome, InaccessiblePaths, NoNewPrivileges, capability drop, syscall filter, PrivateTmp, etc.) still gives strong filesystem and privilege isolation. The remaining shellcode-injection risk is theoretical — there is no Bash/Write/Edit tool exposed for an attacker to chain into. Also: MemoryLimit -> MemoryMax (deprecated unit form).	2026-04-25 21:29:16 +02:00
Erik	9d4c724b7f	feat(agent): security hardening — systemd lockdown, rate limit, audit log systemd unit now applies defense-in-depth: - ProtectSystem=strict + ProtectHome=read-only (rest of FS sealed) - ReadWritePaths only for ~/.claude (session JSONLs) and venv + audit log - InaccessiblePaths blocks /etc/shadow, /etc/ssh, /root, ~/.ssh, shell history - NoNewPrivileges + dropped capabilities (no setuid escalation, no caps) - PrivateTmp, PrivateDevices, ProtectKernel*, MemoryDenyWriteExecute - SystemCallFilter @system-service ~@privileged ~@debug ~@mount etc. - RestrictAddressFamilies blocks raw/packet sockets Application layer: - Per-user rate limit 60/hour (configurable via AGENT_RATE_MAX) - Per-user concurrency cap of 1 in-flight (no parallel claude burns) - JSONL audit log of every /agent/ask to /var/log/overlord-agent/audit.jsonl Logs username, message preview, result preview, timing, errors. Plus secrets migration: EnvironmentFile now prefers /etc/overlord/agent.env (root:erik 0640) over /home/erik/MosswartOverlord/.env, so even the read-only /home doesn't expose them. Falls back to old path during transition.	2026-04-25 21:25:40 +02:00
Erik	4ae18536be	feat(agent): cross-char search_items tool + bump timeouts Adds an MCP tool wrapping the inventory-service /search/items endpoint with include_all_characters=true, so questions like 'find me a bracelet with Legendary Acid Ward on any unequipped char' resolve in ONE tool call instead of looping get_inventory over 60+ chars (which timed out at 120s). - agent/tools.py: search_items_global wrapper - agent/mcp_overlord.py: register new tool with detailed schema doc - agent/claude_wrapper.py: include in --allowed-tools whitelist; bump timeout 120s -> 240s - nginx/overlord.conf: bump /api/agent/ proxy timeout 180s -> 300s - CLAUDE.md: brief Claude to USE search_items for cross-char searches	2026-04-25 21:13:26 +02:00
Erik	d3943e894c	fix(agent): SECURITY — replace bypassPermissions with dontAsk bypassPermissions ignores --allowed-tools entirely (per permission-modes.md docs). With it, the model could call Bash, Write, Edit, Read, etc. — confirmed by writing /tmp/owned.sh in a test. dontAsk is the correct production headless mode: auto-DENIES anything outside the --allowed-tools whitelist instead of prompting. Without this, our entire MCP whitelist was effectively useless.	2026-04-25 21:05:53 +02:00
Erik	6d5819d297	fix(agent): use --resume on existing sessions, --session-id only for new Claude Code rejects --session-id on a session that already exists on disk ('Session ID ... is already in use'). The first message of a conversation must use --session-id to create; every message after must use --resume. Detect by checking ~/.claude/projects/<encoded-cwd>/<uuid>.jsonl. Plus a belt-and-suspenders retry: if --session-id surprisingly fails with the 'already in use' string, automatically retry with --resume. This was the bug that caused chat windows to fail on the second message.	2026-04-25 20:51:46 +02:00
Erik	a3353e572d	fix(agent): whitelist MCP tools + bypass permissions for unattended service	2026-04-25 20:46:42 +02:00
Erik	79cf88d3f7	feat(agent): Phase 1 — chat-window AI assistant via Claude Code subprocess Adds an in-dashboard AI assistant that answers questions about live game state. Designed reactively (no background loops) — every user message in the chat window or via /api/agent/ask runs one `claude -p` invocation. Architecture: - New host-side FastAPI service (agent/) on 127.0.0.1:8767, OUTSIDE the dereth-tracker Docker container because `claude` and ~/.claude credentials live on the host. - nginx routes /api/agent/* to the host service. - The same browser session cookie the tracker issues authenticates agent requests (shared SECRET_KEY). - The agent shells out to `claude -p --session-id <uuid>` with cwd=/home/erik/MosswartOverlord. Sessions persist as JSONL on disk via Claude Code's built-in machinery. - An MCP stdio server (agent/mcp_overlord.py) exposes tools to Claude: get_live_players, get_recent_rares, query_telemetry_db (read-only, parsed by sqlglot to reject DML/DDL), get_player_state, get_inventory, get_inventory_search, get_combat_stats, get_equipment_cantrips, get_quest_status, get_server_health, suitbuilder_search. - Read-only PG role (overlord_agent_ro) is the second line of defense on the SQL tool — even a parser bypass can't mutate. Frontend: - AgentWindow.tsx — draggable chat window with localStorage-pinned session UUID, "New Chat" button, on-mount rehydration from /agent/sessions/{id}/history (parses Claude Code's JSONL). - Wired into WindowRenderer + Sidebar (🤖 Assistant button). Operational: - systemd unit (overlord-agent.service) + install.sh. - agent/README.md documents env vars, deploy flow, smoke tests. - nginx/overlord.conf gets a new /api/agent/ location with 180s timeout. - CLAUDE.md gets an "Overlord Assistant Mode" section briefing the agent on which tools to use and how to behave. NOT YET DEPLOYED — server still needs: 1. Apply agent/sql/0001_overlord_agent_ro.sql + ALTER ROLE password 2. Add AGENT_DB_DSN to /home/erik/MosswartOverlord/.env 3. bash agent/install.sh (creates venv, installs unit, starts service) 4. sudo cp /home/erik/MosswartOverlord/nginx/overlord.conf to nginx + reload Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-25 20:43:59 +02:00
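
The session handling described in commit 6d5819d297 (create with --session-id on the first message, continue with --resume afterwards, retry on the 'already in use' error) might look roughly like the sketch below. It is not the code from agent/claude_wrapper.py. The function names and the `projects_dir` parameter are hypothetical; the caller is assumed to pass the per-project session directory that the commit describes as ~/.claude/projects/<encoded-cwd>/.

```python
from pathlib import Path


def session_args(projects_dir: Path, session_id: str) -> list[str]:
    """Pick the CLI arguments that select a conversation.

    projects_dir is a hypothetical parameter: the directory holding the
    per-session <uuid>.jsonl transcripts for the agent's working directory.
    """
    transcript = projects_dir / f"{session_id}.jsonl"
    if transcript.exists():
        # A transcript on disk means the session was already created.
        return ["--resume", session_id]
    # No transcript yet: this is the first message of the conversation.
    return ["--session-id", session_id]


def should_retry_with_resume(stderr_text: str) -> bool:
    """Fallback check: the create attempt failed because the ID is taken."""
    return "already in use" in stderr_text
```

A caller would append the returned arguments to the `claude -p` command line and, if the first attempt fails and `should_retry_with_resume` is true for its stderr, run it once more with `["--resume", session_id]`.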

13 commits
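
The fail-closed SHARED_SECRET check described in commit a28b61511c (read from the environment, refuse every connection when the value is unset or a placeholder, compare in constant time) can be illustrated with a minimal sketch. This is an assumption about the shape of such a check, not the repository's implementation. The function name `plugin_secret_ok` and the `PLACEHOLDERS` set are hypothetical; only the variable name SHARED_SECRET and the old placeholder value come from the commit message.

```python
import hmac
import os
from typing import Mapping

# Assumption: values treated as "not configured". Only 'your_shared_secret'
# is named in the commit message; the empty string covers the unset case.
PLACEHOLDERS = {"", "your_shared_secret"}


def plugin_secret_ok(presented: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True only if a real secret is configured and it matches."""
    expected = env.get("SHARED_SECRET", "")
    if expected in PLACEHOLDERS:
        # Fail closed: with no real secret configured, nobody gets in.
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Passing the environment as a parameter keeps the check easy to exercise in tests without touching the real process environment.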