- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses
ALL plugin connections (constant-time compare). The old hardcoded
'your_shared_secret' in this public repo was no auth at all. Dockerfile
default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of
falling back to a publicly-known signing key; agent systemd unit now
requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every
nginx-proxied internet request satisfied via docker-proxy — full session
bypass and unauthenticated in-game command injection) with
private-source AND no X-Forwarded-For, i.e. only genuinely internal
callers (overlord-agent on the host, compose-network services). Invariant
documented in nginx/overlord.conf: every tracker-bound location must set
X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-
reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs
(telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.
Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-
sequencing blockers addressed (secret staged before enforcement, exec bit
set, cron uses bash).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The glsa_ token on the /grafana/ location was committed to a public repo.
Verified dead: Grafana's service-account and api_key tables are empty (the
data dir is ephemeral container storage, so the SA was wiped on a past
recreate) and an arbitrary invalid bearer gets identical 200 responses —
panel embeds are actually served by anonymous Viewer auth
(GF_AUTH_ANONYMOUS_ENABLED=true). The header was a no-op; removing it
changes no behavior and removes the credential from the config.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds an MCP tool wrapping the inventory-service /search/items endpoint
with include_all_characters=true, so questions like 'find me a bracelet
with Legendary Acid Ward on any unequipped char' resolve in ONE tool call
instead of looping get_inventory over 60+ chars (which timed out at 120s).
- agent/tools.py: search_items_global wrapper
- agent/mcp_overlord.py: register new tool with detailed schema doc
- agent/claude_wrapper.py: include in --allowed-tools whitelist;
bump timeout 120s -> 240s
- nginx/overlord.conf: bump /api/agent/ proxy timeout 180s -> 300s
- CLAUDE.md: brief Claude to USE search_items for cross-char searches
Adds an in-dashboard AI assistant that answers questions about live game
state. Designed reactively (no background loops) — every user message in
the chat window or via /api/agent/ask runs one `claude -p` invocation.
Architecture:
- New host-side FastAPI service (agent/) on 127.0.0.1:8767, OUTSIDE the
dereth-tracker Docker container because `claude` and ~/.claude
credentials live on the host.
- nginx routes /api/agent/* to the host service.
- The same browser session cookie the tracker issues authenticates
agent requests (shared SECRET_KEY).
- The agent shells out to `claude -p --session-id <uuid>` with
cwd=/home/erik/MosswartOverlord. Sessions persist as JSONL on disk
via Claude Code's built-in machinery.
- An MCP stdio server (agent/mcp_overlord.py) exposes tools to Claude:
get_live_players, get_recent_rares, query_telemetry_db (read-only,
parsed by sqlglot to reject DML/DDL), get_player_state, get_inventory,
get_inventory_search, get_combat_stats, get_equipment_cantrips,
get_quest_status, get_server_health, suitbuilder_search.
- Read-only PG role (overlord_agent_ro) is the second line of defense
on the SQL tool — even a parser bypass can't mutate.
Frontend:
- AgentWindow.tsx — draggable chat window with localStorage-pinned
session UUID, "New Chat" button, on-mount rehydration from
/agent/sessions/{id}/history (parses Claude Code's JSONL).
- Wired into WindowRenderer + Sidebar (🤖 Assistant button).
Operational:
- systemd unit (overlord-agent.service) + install.sh.
- agent/README.md documents env vars, deploy flow, smoke tests.
- nginx/overlord.conf gets a new /api/agent/ location with 180s timeout.
- CLAUDE.md gets an "Overlord Assistant Mode" section briefing the
agent on which tools to use and how to behave.
NOT YET DEPLOYED — server still needs:
1. Apply agent/sql/0001_overlord_agent_ro.sql + ALTER ROLE password
2. Add AGENT_DB_DSN to /home/erik/MosswartOverlord/.env
3. bash agent/install.sh (creates venv, installs unit, starts service)
4. sudo cp /home/erik/MosswartOverlord/nginx/overlord.conf to nginx + reload
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The live host nginx config (/etc/nginx/sites-enabled/overlord) was not
tracked in git, leading to drift. This commit checks in a source-of-truth
copy under nginx/overlord.conf with a deploy procedure documented at the
top of the file.
Includes the proxy_read_timeout/proxy_send_timeout 1d settings for both
WebSocket location blocks (/websocket/ and /). Without these, nginx's
default 60s timeout drops idle plugin connections in a reconnect loop —
the symptom users saw was "WebSocket error … State: Aborted" every
~60s on idle characters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>