Consumes the Python tracker's /ws/live firehose (subscribed to rare+chat),
classifies rares common/great, posts embeds + relays allegiance chat.
- classify.go: the 74-name common-rares set, extracted verbatim from the Python
COMMON_RARES_PATTERN (not hand-transcribed). go test runs at build time; a
server-side dump-rares vs the Python regex confirms the sets are IDENTICAL.
- poster.go: a `poster` interface with a real discordgo impl (REST sends by
channel id; gold/blue embeds, location/time fields, icon attachment) and a
dry-run log impl.
- ws.go: coder/websocket client to /ws/live with subscribe, ping keepalive,
exponential-backoff reconnect; rare/chat dispatch incl. vortex-warning + the
MONITOR_CHARACTER filter.
- SAFE BY DEFAULT: dry-run unless a token AND DRY_RUN=0 are set, so it can never
double-post to production. Deployed via the compose override
(discord-rare-monitor-go), running dry-run against the same live firehose.
Validated on the server: connects, subscribes, relays a real chat in dry-run;
classifier parity 74/74 vs the Python regex.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The agent cannot sudo (password required), so nginx deploy is a user step.
go-services/nginx/go-location.conf holds the `location /go/` block + the
`upstream tracker_go` line with apply instructions. Not required for the
parallel run (the Go service is parity-verified on loopback); this is for
browser-reachable /go/ access. Live overlord.conf has drifted from the repo
copy — reconcile by hand, don't cp-overwrite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replicates main.py's AuthMiddleware so /go/ can be exposed safely:
- internal-trust: private source IP AND no X-Forwarded-For => skip auth
(loopback/compose callers; nginx adds XFF to all internet traffic).
- session cookie: byte-compatible itsdangerous URLSafeTimedSerializer verify
(HMAC-SHA1, django-concat key derivation sha1("itsdangerous"+"signer"+key),
Unix-epoch timestamp, urlsafe-b64 no pad, optional zlib payload), keyed on the
same SECRET_KEY. 30-day max-age. Public allowlist (/login,/logout,login assets,
/icons/,/health); 302->/login for html, 401 JSON otherwise.
Validated on the server: internal-trust loopback 200; external no-cookie 401;
html 302; valid cookie 200; tampered 401; /health public 200; and the SAME
Python-issued cookie authenticates BOTH services (cross-compat proof).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds the rest of the read-side endpoints to the Go tracker, all parity-checked
against the live Python service:
- DB reads: /stats/{c}, /portals, /spawns/heatmap, /server-health,
/character-stats/{c} (stats_data JSONB merged to top level),
/combat-stats[/{c}], /inventories, /inventory/{c}/search.
- 5-minute totals cache + /total-rares, /total-kills.
- Ingest-only state returned as Python's empty/default shapes (/quest-status,
/vital-sharing/peers, /equipment-cantrip-state/{c}); /issues (flat file),
/me (401 until cookie verification lands).
- Streaming reverse proxy to inventory-service (/inventory/{c},
/inventory-characters, /search/*, /sets/list, /inv/{path...} incl. the SSE
suitbuilder stream).
- compare/compare_endpoints.py: structural parity for all read endpoints +
exact-match check for /character-stats and /combat-stats on OFFLINE chars
(online chars legitimately differ — Python serves a richer live overlay that
Phase-1 Go lacks until ingest).
Verified live: 14/14 endpoints structural-match, 8/8 rich offline chars
exact-match on /character-stats.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Parallel Go reimplementation of the dereth-tracker read side, deployed
loopback-only (:8770) and reading the dereth TimescaleDB read-only. The live
Python stack is untouched (added via a compose override, not by editing the
tracked docker-compose.yml).
- Phase 0 scaffold: stdlib net/http server (Go 1.22+ method+path routing),
/health + /api-version, multi-stage distroless Docker build, and
go-services/docker-compose.go.yml override (loopback :8770).
- Phase 1: pgx v5 pool forced into read-only transactions, a 5s /live + /trails
cache loop using the exact main.py:837 SQL, and Python-isoformat timestamps
so output matches FastAPI's jsonable_encoder.
- compare/compare_live.py: parity harness vs the live Python service. Uses the
server-stamped received_at to prove same-row full-field equality and to make
the online-set diff boundary-aware.
Verified on live traffic (73 players): identical online set + 23-key schema,
identity/type parity for all, every same-row pair matches on every field, and
diff-row pairs differ only by the ~6s two-cache refresh skew.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Holiday's over — revert the Sma Grodorna frog-hop title gag back to the
original /rick.mp4 fullscreen rickroll + shake/spin it replaced.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Flip the seasonal master switch (SEASON_ACTIVE=false in useMidsummer) so the
Sma Grodorna theme is fully dormant — no rain/frogs/maypole/banner/palette,
regardless of any stored toggle preference — and remove the 🐸 toggle from the
sidebar. All theme code is kept; to bring it back next Midsummer, flip
SEASON_ACTIVE to true and re-add <FrogToggle /> in SidebarWindowButtons.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The player-count flapping was client clock skew: telemetry is stamped with the
game machine's DateTime.UtcNow (WebSocket.cs), and machines' clocks drift up to
~90s apart (proven: per-char offsets span -31s..+59s with steady 6s cadence; a
wrong server clock would shift all equally, so the SPREAD proves clients differ
from each other; a +59s future timestamp rules out lag). /live windowed on that
client timestamp, so characters whose clock sat near the 30s boundary blinked
in and out.
Fix: stamp each telemetry row with the server's receive-time (received_at) and
window the /live 'online' query on COALESCE(received_at, timestamp) instead of
the client timestamp. A coarse timestamp bound (10 min) is kept only for
TimescaleDB chunk pruning. Column added idempotently in init_db_async; COALESCE
falls back to the client timestamp for pre-migration rows. Verified on the live
DB: query valid, 8ms, equivalent pre-population. ~free CPU (one datetime.now()
per ~14 inserts/sec).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Root cause of the player-count flapping: the plugin's debounced inventory
flush, combined with a fleet-wide relog wave (auto-update) phase-aligning the
60s flush timers, produced a synchronized burst of inventory forwards every
cycle. The burst flooded the single event loop + httpx pool (errors in
_do_handle_inventory_delta even though inventory-service was idle), periodically
starving telemetry ingest (cliff 116→5 rows/10s) so characters aged out of the
30s window and the count flapped.
- Global asyncio.Semaphore(8) around inventory forwarding: a burst can never
monopolize the loop; telemetry always gets through.
- Tighten the shared httpx client (max_connections=10, keepalive=5, 5s timeout)
so a stale/slow connection can't hold a slot.
Pairs with the plugin-side flush-timer jitter (2–5 min, re-rolled per tick) that
de-synchronizes the fleet at the source.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Every browser broadcast ran jsonable_encoder (slow recursive encode) and then
re-serialized per client via send_json — so a payload to N browsers was
encoded N+1 times, on the same single event-loop core that the telemetry/
inventory firehose already saturates.
Now serialize ONCE with json.dumps + a datetime-aware default (_json_default
mirrors jsonable_encoder for the types that actually appear: datetime, Enum,
Decimal, set, bytes) and send the prebuilt string to every client via
send_text. Verified the wire output parses identically to the old path.
Pure backend change — no plugin, no frontend, no schema change; stdlib only
so it deploys via restart with no image rebuild / dependency churn.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Per request: remove the WebAudio jingle (+ its 🔊 toggle and sound state);
replace the one-shot confetti with a continuous rain of 🌼🌸🐸🇸🇪🌿 over the
screen (MidsummerRain, gated by the theme, reduced-motion aware, leak-free);
and replace player-dot markers with frogs themselves (override the inline
dot color/border) instead of a flower-crown on top. Still toggled by the
🐸 Midsommar switch. Includes rebuilt static bundle.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Accepts one legacy secret alongside the real one so existing clients keep
registering while game machines migrate to websocket_secret.txt. Remove
SHARED_SECRET_LEGACY from .env after the rollout.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses
ALL plugin connections (constant-time compare). The old hardcoded
'your_shared_secret' in this public repo was no auth at all. Dockerfile
default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of
falling back to a publicly-known signing key; agent systemd unit now
requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every
nginx-proxied internet request satisfied via docker-proxy — full session
bypass and unauthenticated in-game command injection) with
private-source AND no X-Forwarded-For, i.e. only genuinely internal
callers (overlord-agent on the host, compose-network services). Invariant
documented in nginx/overlord.conf: every tracker-bound location must set
X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-
reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs
(telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.
Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-
sequencing blockers addressed (secret staged before enforcement, exec bit
set, cron uses bash).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The glsa_ token on the /grafana/ location was committed to a public repo.
Verified dead: Grafana's service-account and api_key tables are empty (the
data dir is ephemeral container storage, so the SA was wiped on a past
recreate) and an arbitrary invalid bearer gets identical 200 responses —
panel embeds are actually served by anonymous Viewer auth
(GF_AUTH_ANONYMOUS_ENABLED=true). The header was a no-op; removing it
changes no behavior and removes the credential from the config.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
With width:100%, the table stretched to fill the container — and the
Character column (the only one with stretchable text) absorbed all the
extra space, looking much wider than the longest name needed.
width:auto lets each column size to its content. Table now fits its
data; container still scrolls horizontally if content ever exceeds the
viewport.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two small UX improvements to the Player Dashboard table (works in both
the new-tab fullscreen page and the deprecated in-app window since they
share PlayerDashboardContent):
1. Row highlight: click anywhere on a row to highlight it (blue tint +
thin outline). Click again to unhighlight. Single selection — useful
for tracking one character down a long sorted list.
2. Character column no longer truncates: removed maxWidth/overflow/
textOverflow on the name cell. Column now sizes to the longest
character name (still no wrapping; container scrolls horizontally
if names are extreme).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 👥 Dashboard button used to open the player table as a draggable
in-app window, which competed for screen space with the map. It now
opens in a separate browser tab as a fullscreen page so users can put
the dashboard on a second monitor.
How:
- App.tsx branches on ?view=dashboard → renders PlayerDashboardFullPage
(new file in components/) instead of the default MapLayout.
- SidebarWindowButtons.tsx: 👥 Dashboard onClick now does
window.open('/?view=dashboard', '_blank', 'noopener'). Label shows
'↗' so users know it's an external open.
- PlayerDashboardWindow.tsx refactored: extracted the sortable table
body into a reusable PlayerDashboardContent component. The old window
shell stays registered in WindowRenderer for backward compat — just
no longer reachable from the default sidebar.
- map-layout.css: new .ml-dashboard-page rules for fullscreen layout.
Each tab gets its own useLiveData + WebSocket connection (server
already handles multiple browser clients). The new tab inherits the
session cookie from the original tab — no re-login.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Docker-bridge / loopback bypass in AuthMiddleware was short-circuiting
the whole auth flow without ever decoding the session cookie. Result: /me
and other endpoints reading request.state.user got 401 for real logged-in
browsers (because nginx → docker-proxy makes them look like 172.x).
Symptom: dashboard admin UI invisible even for admin users — useCurrentUser
saw 401 from /me and treated everyone as anonymous.
Fix: in the bypass branch, still try to decode any session cookie present
and populate request.state.user. The bypass still permits anonymous
internal calls (overlord-agent's MCP tools), but real authenticated browsers
now get their user correctly resolved.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Logout: new sidebar link 'Log out (username)' that POSTs /api/logout
(clears session cookie) and navigates to /login. Visible to everyone.
Replaces 'no logout functionality' state where users could only get
out by deleting cookies manually.
Admin window: new 'Admin · Users' window (only shown when current
user.is_admin) lists all users in a table with:
- Add user (username + password + admin checkbox)
- Reset password inline per row
- Toggle admin per row
- Delete user per row (blocked for self)
Wraps the existing /api-admin/users CRUD endpoints in main.py.
Plumbing: useCurrentUser hook fetches /me on mount; apiPatch+apiDelete
helpers added to api/client.ts; new endpoint wrappers exported from
api/endpoints.ts; AdminUsersWindow.tsx registered in WindowRenderer
under id prefix 'adminusers'; CSS for admin table/form/buttons and
the muted-red logout link.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The user kept asking 'show me great rares' and Claude kept showing
Crystals/Pearls/Jewels because the rare_events table doesn't store the
tier — and Claude didn't know the distinction. Now CLAUDE.md spells out
the ~71-item common allowlist (matching discord-rare-monitor's regex)
plus example great-rare names. Includes a sample SQL query Claude can
adapt for tier filtering.
The query_telemetry_db tool was crashing with AttributeError because
exp.AlterTable doesn't exist in this sqlglot version (renamed to Alter).
Made the deny-class list build defensively via getattr and dropped any
classes that the installed sqlglot doesn't expose.
Also broadened the deny list (Alter, AlterColumn, AlterDatabase, Truncate,
Grant, Revoke, Copy) and made the toplevel allowlist tolerant of missing
classes too. The walk() return shape is also normalized in case sqlglot
versions yield (node, parent, key) tuples vs. bare nodes.
Belt-and-suspenders is fine — the GRANT-SELECT-only PG role is the real
write barrier; the parser is just a faster/friendlier reject path.
Two complementary changes after observing the model probe boundaries
(it tried mcp__claude_ai_Gmail__search_threads, then tried to delegate
to a subagent via the Agent tool, then suggested the user edit
settings.local.json to add Gmail tools):
1. claude_wrapper.py adds to --disallowed-tools:
- Agent (subagent spawning — should never delegate)
- WebFetch (already; settings.json re-allows acpedia.org only)
- Every Gmail/Calendar/Drive connector tool name we know about
2. CLAUDE.md adds a 'Non-negotiable scope rules' section:
- Be a read-only game-state QA service, nothing else
- Don't attempt tools outside your role
- Don't explain how to bypass restrictions
- Don't suggest settings.json edits
- Don't enumerate hidden tools when asked
Soft (system-prompt) + hard (CLI flag) defenses combined.
The previous commit put .claude/settings.json IN THE REPO, which would
have applied its strict deny rules to ANY Claude Code invocation from
this cwd — including the human user's interactive dev sessions on their
own machine. That's wrong; the production agent's lockdown should not
constrain the developer.
Remove the committed file and gitignore .claude/ entirely. The repo is
permission-neutral now.
Strict permissions for the production agent come from two server-only
sources:
1. CLI flags in agent/claude_wrapper.py (--allowed-tools +
--disallowed-tools, passed by the systemd-spawned subprocess only)
2. /var/lib/overlord-agent/.claude/settings.json (the agent's own HOME
— separate from any user's .claude/)
Also bumps claude_wrapper.py with the explicit --disallowed-tools list
of meta-tools (ToolSearch, Monitor, TodoWrite, TaskOutput, Skill, cron
tools, etc.) that the --allowed-tools whitelist does not block on its
own. Verified empirically: with only --allowed-tools, ToolSearch was
still callable; --disallowed-tools is required.
The agent service was running as User=erik, which meant:
- Sessions polluted erik's ~/.claude/projects/
- erik's .claude/settings.local.json (months of accumulated dev permissions
for docker/git/dotnet/etc.) was loaded by the production agent, defeating
the --allowed-tools whitelist
- Subscription rate quota mingled between human-erik's interactive Claude
Code use and the production assistant
- Theoretical access to /home/erik/.ssh, .bash_history, .gitconfig
Now:
- User=overlord-agent (system account, no shell, /var/lib/overlord-agent home)
- HOME=/var/lib/overlord-agent — claude state fully isolated from erik
- /home/erik/.claude permissions tightened to 0700 (was 0755)
- group=overlord-agent on the repo + /etc/overlord/agent.env (read-only)
Project settings:
- New strict committed .claude/settings.json: deny Bash/Read/Write/Edit/
Glob/Grep/NotebookEdit/WebSearch; allow only WebFetch(domain:acpedia.org)
- .claude/settings.local.json now gitignored (was leaking dev permissions
to the server through the deploy)
The extra ~@cpu-emulation ~@obsolete ~@swap ~@raw-io negations on top of
@system-service killed Claude Code (Node) with SIGSYS during startup.
Keep just the truly dangerous groups blocked: ~@privileged ~@reboot
~@mount. The base @system-service preset already excludes others (no
@debug, no @resources, etc. are included by default in that preset).
Claude Code is a Node app. V8 JIT requires W^X transitions via mprotect
with PROT_EXEC on JIT'd code pages. MemoryDenyWriteExecute kills the
process with SIGTRAP/abort during startup (~10ms in).
Without JIT we'd have to use --jitless mode, which destroys performance.
The other systemd hardening (ProtectSystem, ProtectHome,
InaccessiblePaths, NoNewPrivileges, capability drop, syscall filter,
PrivateTmp, etc.) still gives strong filesystem and privilege isolation.
The remaining shellcode-injection risk is theoretical — there is no
Bash/Write/Edit tool exposed for an attacker to chain into.
Also: MemoryLimit -> MemoryMax (deprecated unit form).
systemd unit now applies defense-in-depth:
- ProtectSystem=strict + ProtectHome=read-only (rest of FS sealed)
- ReadWritePaths only for ~/.claude (session JSONLs) and venv + audit log
- InaccessiblePaths blocks /etc/shadow, /etc/ssh, /root, ~/.ssh, shell history
- NoNewPrivileges + dropped capabilities (no setuid escalation, no caps)
- PrivateTmp, PrivateDevices, ProtectKernel*, MemoryDenyWriteExecute
- SystemCallFilter @system-service ~@privileged ~@debug ~@mount etc.
- RestrictAddressFamilies blocks raw/packet sockets
Application layer:
- Per-user rate limit 60/hour (configurable via AGENT_RATE_MAX)
- Per-user concurrency cap of 1 in-flight (no parallel claude burns)
- JSONL audit log of every /agent/ask to /var/log/overlord-agent/audit.jsonl
Logs username, message preview, result preview, timing, errors.
Plus secrets migration: EnvironmentFile now prefers /etc/overlord/agent.env
(root:erik 0640) over /home/erik/MosswartOverlord/.env, so even the
read-only /home doesn't expose them. Falls back to old path during
transition.
Adds an MCP tool wrapping the inventory-service /search/items endpoint
with include_all_characters=true, so questions like 'find me a bracelet
with Legendary Acid Ward on any unequipped char' resolve in ONE tool call
instead of looping get_inventory over 60+ chars (which timed out at 120s).
- agent/tools.py: search_items_global wrapper
- agent/mcp_overlord.py: register new tool with detailed schema doc
- agent/claude_wrapper.py: include in --allowed-tools whitelist;
bump timeout 120s -> 240s
- nginx/overlord.conf: bump /api/agent/ proxy timeout 180s -> 300s
- CLAUDE.md: brief Claude to USE search_items for cross-char searches
bypassPermissions ignores --allowed-tools entirely (per
permission-modes.md docs). With it, the model could call Bash, Write,
Edit, Read, etc. — confirmed by writing /tmp/owned.sh in a test.
dontAsk is the correct production headless mode: auto-DENIES anything
outside the --allowed-tools whitelist instead of prompting. Without
this, our entire MCP whitelist was effectively useless.
Claude Code rejects --session-id on a session that already exists on disk
('Session ID ... is already in use'). The first message of a conversation
must use --session-id to create; every message after must use --resume.
Detect by checking ~/.claude/projects/<encoded-cwd>/<uuid>.jsonl. Plus a
belt-and-suspenders retry: if --session-id surprisingly fails with the
'already in use' string, automatically retry with --resume.
This was the bug that caused chat windows to fail on the second message.
Same pattern we already use for /ws/live (host-side Discord bot bypass).
Lets the new overlord-agent service call any tracker HTTP endpoint
without forging a session cookie. Safe because port 8765 is bound to
127.0.0.1 in docker-compose.yml — only the host or other compose-network
containers can reach it.