From 9911edbfa81648aa2f318ce684f9b899c9ac785c Mon Sep 17 00:00:00 2001 From: Erik Date: Wed, 24 Jun 2026 19:46:50 +0200 Subject: [PATCH] =?UTF-8?q?docs:=20Go=20is=20production=20=E2=80=94=20rewr?= =?UTF-8?q?ite=20README,=20update=20CLAUDE.md,=20gitignore=20.env?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - README: Go-backend architecture, build/run via the compose override stack, WS/payload/auth/DB contracts, the branch layout (master = Go, python-legacy). - CLAUDE.md: Project Overview + Components reflect the Go services; a "Go services — build, deploy, gotchas" section (string coercion, typeless telemetry, the trinket dedup, rollback); Deploying + Suitbuilder point at the Go paths. The behavioral contracts (WS/auth/DB/routes) are kept — Go honors them; file refs to main.py/inventory-service mark the legacy source. - .gitignore: ignore .env / .env.bak-* (public repo; .env.example stays tracked). Co-Authored-By: Claude Opus 4.8 --- .gitignore | 6 + CLAUDE.md | 43 +++-- README.md | 537 +++++++++++++---------------------------------------- 3 files changed, 172 insertions(+), 414 deletions(-) diff --git a/.gitignore b/.gitignore index 0696fc7f..b523c6d7 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,12 @@ __pycache__ static/v2/ frontend/node_modules/ +# Secrets — the server-side env files hold SHARED_SECRET, SECRET_KEY, DB +# passwords, and the Discord token. This repo is PUBLIC — never commit them. +# .env.example stays tracked as the template. +.env +.env.bak-* + # Claude Code config — never commit. The production agent's strict # permissions live server-side at /var/lib/overlord-agent/.claude/ # (and via CLI flags in agent/claude_wrapper.py). 
The repo stays diff --git a/CLAUDE.md b/CLAUDE.md index 3fa68e55..e825bcb9 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -5,22 +5,41 @@ Cross-repo workflows (plugin coupling, deploy commands, nginx) live in the works ## Project Overview -Dereth Tracker is a real-time telemetry platform for Asheron's Call world tracking. A FastAPI WebSocket/HTTP service (`main.py`, single file ~4200 lines) ingests player data from the MosswartMassacre DECAL plugin and serves a live React dashboard, with TimescaleDB persistence, a separate inventory microservice, Grafana dashboards, a Discord rare bot, and a host-side Claude-powered assistant. +Dereth Tracker is a real-time telemetry platform for Asheron's Call world tracking. **The production backend is Go** (`go-services/`): a tracker service (`tracker-go/`) ingests player data from the MosswartMassacre DECAL plugin over `/ws/position`, serves the React dashboard + login/admin + the read API, and writes TimescaleDB; an inventory service (`inventory-go/`) handles item search, the suitbuilder solver, and inventory ingestion. Plus Grafana, a (Python) Discord rare bot, and a host-side Claude-powered assistant. + +The original Python/FastAPI implementation (`main.py` ~4200 lines, `inventory-service/`) is preserved on the **`python-legacy`** branch; the Go services were validated byte-identical against it in a parallel "strangler-fig" run, then production was cut over. ⚠ **The behavioral contracts below (WS, auth, DB, routes, suitbuilder) describe what Go honors. 
Where they cite `main.py` / `inventory-service/`, that's the legacy source that defined the contract — the live implementation is the corresponding Go handler.** ## Components | Component | Where | Runs as | |---|---|---| -| Tracker API (`main.py`) | repo root | Docker `dereth-tracker`, 127.0.0.1:8765 | -| Telemetry DB (TimescaleDB) | `db_async.py` schema | Docker `dereth-db`, port 5432 | -| Inventory service + DB | `inventory-service/` | Docker `inventory-service` (127.0.0.1:8766) + `inventory-db` (5433) | -| React frontend | `frontend/` → built into `static/` | served by tracker (FastAPI StaticFiles) | -| Classic v1 frontend | `static/classic/` | served at `/classic` | -| Legacy vanilla pages | `static/inventory.html`, `static/suitbuilder.html` | still live | +| **Tracker** (ingest + website + read API + WS) | `go-services/tracker-go/` | Docker `dereth-tracker-go`, 127.0.0.1:8770 | +| **Inventory** (search + suitbuilder + ingestion) | `go-services/inventory-go/` | Docker `inventory-go`, 127.0.0.1:8772 | +| Telemetry DB (TimescaleDB) | schema in `tracker-go/schema.go` (replica of legacy `db_async.py`) | Docker `dereth-db`, port 5432 | +| Inventory DB | schema in `inventory-go/schema.go` | Docker `inventory-db`, 5433 | +| React frontend | `frontend/` → built into `static/` | served by `tracker-go` (static file server, SPA fallback) | +| Classic v1 / legacy pages | `static/classic/`, `static/*.html` | served by `tracker-go` | | Grafana | compose service `dereth-grafana` | 127.0.0.1:3000, anonymous Viewer auth, proxied at `/grafana/` | -| Discord rare bot | `discord-rare-monitor/` | Docker, connects to `/ws/live` internally | +| Discord rare bot | `discord-rare-monitor/` (Python) | Docker, reads the Go `/ws/live` | | Overlord Agent (assistant) | `agent/` | **host-side systemd service** `overlord-agent`, 127.0.0.1:8767 | +### Go services — build, deploy, gotchas + +- **Build on the server, no host Go needed** (multi-stage distroless images). 
Go 1.25, `pgx/v5`, `coder/websocket`, `bwmarrin/discordgo`, `x/crypto/bcrypt`. Sync + build + recreate: + ```bash + tar czf - go-services | ssh erik@overlord.snakedesert.se "tar xzf - -C /home/erik/MosswartOverlord/" + ssh erik@overlord.snakedesert.se 'cd /home/erik/MosswartOverlord && \ + export BUILD_VERSION="$(date -u +%Y.%-m.%-d.%H%M)-$(git rev-parse --short HEAD)" && \ + docker compose -f docker-compose.yml -f go-services/docker-compose.go.yml build dereth-tracker-go inventory-go && \ + docker compose -f docker-compose.yml -f go-services/docker-compose.go.yml -f go-services/docker-compose.cutover.yml \ + up -d --no-deps dereth-tracker-go inventory-go' + ``` +- **`docker-compose.cutover.yml`** is what makes the Go services production: `READ_ONLY=false` (write the prod DBs), `SKIP_SCHEMA_INIT=true` (trust the existing schema, run NO DDL), `SHARED_SECRET`/`DISCORD_ACLOG_WEBHOOK` for the tracker, and the Discord bot repointed at `ws://dereth-tracker-go:8770/ws/live`. Drop it to revert to read-only parallel mode. +- **Rollback** = `docker compose ... up -d` WITHOUT the cutover override (Go → read-only) + start the Python `dereth-tracker`/`inventory-service` + revert the nginx `http://tracker_go/` lines to `http://tracker/`. +- ⚠ **Plugin sends some numeric fields as STRINGS** (`kills_per_hour`, `deaths`, `total_deaths`, `prismatic_taper_count`). Go coerces via `coerceNum` (`tracker-go/reads.go`) — pydantic did this implicitly; a plain number cast would write null/0. +- ⚠ **Telemetry must be broadcast TYPELESS** to `/ws/live` (`stripType` in `tracker-go/ingest.go`). The browser ignores typeless messages and uses the 5 s `/live` poll for player data; broadcasting telemetry WITH a type makes the UI overwrite the /live-derived counters and flap them 0↔value. +- ⚠ `inventory-go` `slot_names=Trinket` must exclude `%bracelet%` or bracelets duplicate the Wrist buckets in the suitbuilder. 
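The string-coercion and typeless-broadcast gotchas above can be sketched in Go as follows. This is a minimal illustration, not the code in `tracker-go/reads.go` or `tracker-go/ingest.go`: the signatures of `coerceNum` and `stripType` are assumptions, since only their names appear here, and the sample payload is invented.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// coerceNum is a hypothetical stand-in for the helper of the same name in
// tracker-go/reads.go; the real signature is not shown in this document.
// It accepts a decoded JSON value that may be a number or a numeric string
// and reports whether a number could be extracted.
func coerceNum(v any) (float64, bool) {
	switch t := v.(type) {
	case float64: // encoding/json decodes JSON numbers into float64 by default
		return t, true
	case json.Number: // only seen when the decoder is configured with UseNumber
		f, err := t.Float64()
		return f, err == nil
	case string: // the plugin sends some counters as strings, e.g. "123.5"
		f, err := strconv.ParseFloat(strings.TrimSpace(t), 64)
		return f, err == nil
	default:
		return 0, false
	}
}

// stripType is a hypothetical sketch of the typeless broadcast: it returns a
// copy of msg without the "type" key and leaves the original map untouched.
func stripType(msg map[string]any) map[string]any {
	out := make(map[string]any, len(msg))
	for k, v := range msg {
		if k != "type" {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Invented payload: one numeric field as a string, one as a number.
	raw := []byte(`{"type":"telemetry","kills_per_hour":"123.5","deaths":2}`)
	var msg map[string]any
	if err := json.Unmarshal(raw, &msg); err != nil {
		panic(err)
	}
	kph, _ := coerceNum(msg["kills_per_hour"])
	deaths, _ := coerceNum(msg["deaths"])
	_, hasType := stripType(msg)["type"]
	fmt.Println(kph, deaths, hasType) // prints: 123.5 2 false
}
```

A bare type assertion such as `msg["kills_per_hour"].(float64)` would fail on the string form, which is the null/0 failure the gotcha warns about.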
+ ## WebSocket endpoints - `/ws/position` — plugin ingest (telemetry, inventory, portal, rare, combat, share_*, …). Authenticated by `X-Plugin-Secret` header against the `SHARED_SECRET` env var; fails closed (refuses all plugins) when unset or left at the old placeholder. Constant-time compare. @@ -63,12 +82,14 @@ Dereth Tracker is a real-time telemetry platform for Asheron's Call world tracki ## Suitbuilder -Production equipment-optimization engine (`inventory-service/suitbuilder.py`): multi-character search, armor set constraints, cantrip overlap detection, SSE streaming. UI at `/suitbuilder.html`. Architecture doc: `docs/plans/2026-02-09-suitbuilder-architecture.md`. -Known limitations: no slot-aware spell filtering, equal spell weighting. The legacy `/optimize/*` solver in inventory-service/main.py is a near-duplicate — `suitbuilder.py` is the production path. +Production equipment-optimization engine, ported to Go in `inventory-go/suit_*.go` (constraint-satisfaction DFS: multi-character search, armor set constraints, cantrip overlap, SSE streaming) — validated byte-identical against the legacy `inventory-service/suitbuilder.py`. Live endpoint: `POST /suitbuilder/search` (the tracker proxies `/inv/suitbuilder/search`); the `/optimize/*` solver in the legacy `inventory-service/main.py` was a near-duplicate and is NOT the live path. UI at `/suitbuilder.html`. Known limitations: no slot-aware spell filtering, equal spell weighting. ## Deploying -See workspace `../CLAUDE.md` "Build & Deploy Instructions" — quick deploy (git pull + `docker compose restart dereth-tracker` for Python; nothing for static), `deploy-frontend.sh` for React, full `--no-cache` rebuild only for Dockerfile/pip/version-stamp changes. Bind mounts: `main.py`, `db_async.py`, `static/`, `alembic/` only. +- **Go backend changes** → see "Go services — build, deploy, gotchas" above (sync `go-services/`, build, recreate with the cutover override). 
`BUILD_VERSION` (CalVer `YYYY.M.D.HHMM-gitshorthash`) shows in the frontend sidebar. +- **Frontend** → `bash deploy-frontend.sh` (complete build+copy into `static/`); the tracker serves `static/` from a bind mount, no restart needed. +- **Overlord Agent** → unchanged (host-side Python systemd): `git pull && sudo systemctl restart overlord-agent`. +- `README.md` has the full build/run reference. The legacy Python deploy lives on the `python-legacy` branch. ## Operational notes diff --git a/README.md b/README.md index 0864e4d0..381cd775 100644 --- a/README.md +++ b/README.md @@ -1,424 +1,155 @@ # Mosswart Overlord (Dereth Tracker) -Real-time telemetry, inventory, and analytics platform for Asheron's Call. -FastAPI backend + React frontend + PostgreSQL (TimescaleDB) + Discord integrations, -all driven by WebSocket events from the companion [MosswartMassacre](https://github.com/SawatoMosswartsEnjoyersClub/MosswartMassacre) DECAL plugin. +Real-time telemetry, inventory, and analytics platform for Asheron's Call — +driven by a firehose of WebSocket events from the companion +[MosswartMassacre](https://github.com/SawatoMosswartsEnjoyersClub/MosswartMassacre) +DECAL plugin running on 60+ characters. + +**The production backend is written in Go** (`go-services/`). It replaced the +original Python/FastAPI implementation via a strangler-fig migration: the Go +services ran in parallel against live traffic until every endpoint was proven +byte-identical, then production was cut over. The Python implementation is +preserved on the `python-legacy` branch. 
--- -## Table of Contents -- [Overview](#overview) -- [Architecture](#architecture) -- [Features](#features) -- [Requirements](#requirements) -- [Installation](#installation) -- [Configuration](#configuration) -- [Deploying Changes](#deploying-changes) -- [WebSocket Contract](#websocket-contract) -- [HTTP API Reference](#http-api-reference) -- [Frontend](#frontend) -- [AI Assistant (Overlord Agent)](#ai-assistant-overlord-agent) -- [Database Schema](#database-schema) -- [Operations & Health](#operations--health) -- [Contributing](#contributing) - ---- - -## Overview - -Mosswart Overlord is the backend that consumes a firehose of telemetry, vitals, inventory, combat, and chat events from 60+ characters running the `MosswartMassacre` plugin. It stores selected data in TimescaleDB, runs analytics (combat stats, idle/death detection), and broadcasts live updates to connected browser clients. - -The frontend is a React + Vite app served at `/` with a live map, draggable windows (inventory, chat, combat, radar, etc.), and a server uptime sidebar. The previous vanilla JS frontend is preserved at `/classic`. 
- ## Architecture ``` - ┌─────────────────────────┐ - │ MosswartMassacre (C#) │ ← plugin per game client - └────────────┬────────────┘ - │ WebSocket /ws/position (authenticated) - ▼ -┌────────────────────────────────────────────────────────┐ -│ dereth-tracker (FastAPI, Docker) │ -│ • main.py — WS routing, analytics, broadcasts │ -│ • idle/death detection → Discord webhook │ -│ • combat stats delta/lifetime accumulation │ -│ • vital sharing relay (cross-machine) │ -└──┬──────────────────┬────────────────────┬────────────┘ - │ │ │ - │ WS /ws/live │ HTTP │ HTTP - ▼ ▼ ▼ -┌──────────┐ ┌──────────────────┐ ┌──────────────────┐ -│ Browsers │ │ inventory-svc │ │ Discord bot │ -│ (React) │ │ (FastAPI, Docker)│ │ (rare monitor) │ -└────┬─────┘ └────────┬─────────┘ └──────────────────┘ - │ ▼ - │ ┌──────────────┐ - │ │ inventory-db │ - │ └──────────────┘ - │ - │ /api/agent/* (host-side, OUTSIDE Docker) - ▼ -┌────────────────────────────────────────┐ -│ overlord-agent (FastAPI, systemd) │ ← runs as dedicated unprivileged user -│ • shells out to `claude -p ...` │ /var/lib/overlord-agent home, -│ • MCP server: live-state Q&A tools │ strict settings, no /home/erik -└────────────────────────────────────────┘ - - ┌──────────────┐ - │ dereth-db │ ← TimescaleDB (telemetry, spawns, rares, portals) - └──────────────┘ + MosswartMassacre plugin ──wss──> nginx ──> Go tracker (tracker-go) ──> dereth (TimescaleDB) + (60+ game clients) │ │ + │ ├──HTTP──> Go inventory (inventory-go) ──> inventory_db + Browsers ──https──────────────────> nginx │ + │ └──/ws/live──> Discord rare bot (relays rares + chat) + └──> Grafana (/grafana/) death/idle alerts → Discord webhook ``` -Most services run via Docker Compose. **`overlord-agent` is host-side** -(systemd) because it shells out to the `claude` CLI which depends on -host-side credentials — see [AI Assistant](#ai-assistant-overlord-agent). 
- -## Features - -### Live Data -- **Live Map** — real-time player positions, dots, trails, portals, heatmap -- **WebSocket firehose** (`/ws/live`) — broadcasts every incoming event to browsers -- **Per-client subscriptions** — clients can send `{"type":"subscribe","message_types":[...]}` to receive only specific event types (the Discord rare monitor bot uses this to filter the 82GB/day firehose down to just `rare` and `chat`) - -### Inventory -- Full inventory snapshot on login + incremental `inventory_delta` updates (add/update/remove) -- Per-character live refresh in the browser (debounced 2s) -- Advanced search with filters: material, set, armor level, spells, tinks, workmanship, etc. -- **Suitbuilder** at `/suitbuilder.html` — constraint-based armor optimization across multiple mule inventories with primary/secondary set support, cantrip overlap detection, and real-time SSE streaming - -### Combat Stats (Mag-Tools style) -- Plugin parses combat chat into session deltas -- Backend accumulates lifetime totals from per-session snapshots -- Offense/defense broken out per damage element -- Browser combat window shows monster-by-monster damage - -### Cross-Machine Vital Sharing -- WebSocket relay replaces UtilityBelt's localhost-only `VTankFellowHeals` -- Plugin broadcasts its own vitals and consumes peer vitals -- In-game `DxHud` overlay shows peer health/stamina/mana bars with direction arrows - -### AI Assistant -- 🤖 chat window in the dashboard backed by `claude -p` running headless on the server -- Read-only access to live game state via 12 MCP tools (live players, inventory cross-search, combat stats, quests, suitbuilder, read-only SQL, etc.) -- Per-browser persistent session, "New Chat" button, history rehydration on reload -- Hardened: dedicated unprivileged Linux user, systemd lockdown, strict tool whitelist, audit log, rate limit. See [AI Assistant section](#ai-assistant-overlord-agent) for the full security stack. 
- -### Discord Integration -- **Rare Monitor Bot** — posts rares (split by common/great) to configured channels -- **Death Alerts** — webhook to `#alerts` when a character's vitae goes from 0 → >0 (rate-limited to one per character per 5 min) -- **Idle Alerts** — webhook after 5 minutes of continuous idle state (caught portals, stuck nav, etc.). The grace period prevents false positives on brief idle blips. -- **Vortex Warning** — bot watches for "whirlwind of vortexes" chat and posts a warning embed - -### Portals -- Automatic discovery + 1-hour retention -- Coordinate-deduplicated (rounded to 0.1 precision) - -### Stats -- Per-character lifetime kills, deaths, rares, taper counts -- Grafana dashboards (2x2 iframe grid in the stats window) - -### Health & Monitoring -- Server uptime + latency + player count from TreeStats.net (checked every 30s) -- Only current state is kept — no historical `server_health_checks` table (removed April 2026 as write-only bloat) - -## Requirements - -- Docker & Docker Compose (recommended) -- OR: Python 3.11+, Node.js 20+, and a PostgreSQL 14+ with TimescaleDB - -## Installation - -```bash -git clone git@git.snakedesert.se:SawatoMosswartsEnjoyersClub/MosswartOverlord.git -cd MosswartOverlord -cp .env.example .env # fill in secrets (see Configuration below) -docker compose up -d -``` - -### Frontend development loop - -```bash -cd frontend -npm install -npm run dev # local Vite server -# ...edit files, hot reload... -cd .. -bash deploy-frontend.sh # builds + copies to static/ for production serving -``` - -⚠️ **`npm run build` writes to `static/_build/` but the web server serves from `static/`.** You must run `deploy-frontend.sh` to copy `_build/ → static/`. Otherwise the browser keeps loading the previous bundle. 
- -## Configuration - -All secrets go in `.env`: - -| Variable | Purpose | -|---|---| -| `POSTGRES_PASSWORD` | Telemetry DB password | -| `INVENTORY_DB_PASSWORD` | Inventory DB password | -| `SHARED_SECRET` | Plugin auth for `/ws/position` | -| `SECRET_KEY` | Session cookie signing | -| `DISCORD_RARE_BOT_TOKEN` | Bot token for rare monitor | -| `DISCORD_ACLOG_WEBHOOK` | Webhook URL for death/idle alerts | -| `GF_SECURITY_ADMIN_PASSWORD` | Grafana admin | -| `COMMON_RARE_CHANNEL_ID` | Discord channel ID for common rares | -| `GREAT_RARE_CHANNEL_ID` | Discord channel ID for great rares | -| `ACLOG_CHANNEL_ID` | Discord channel ID for the rare bot's status/vortex messages | -| `MONITOR_CHARACTER` | Which character's chat the bot monitors | - -The Overlord Agent has its own env file at `/etc/overlord/agent.env` (root:overlord-agent 0640) so it doesn't share the tracker's secrets: - -| Variable | Purpose | -|---|---| -| `SECRET_KEY` | Same value as the tracker — validates browser session cookies | -| `AGENT_DB_DSN` | Read-only connection string `postgresql://overlord_agent_ro:@127.0.0.1:5432/dereth` | -| `TRACKER_URL` | Loopback to the tracker container (default `http://127.0.0.1:8765`) | -| `AGENT_RATE_MAX` | Per-user rate limit (default 60/hour) | -| `AGENT_RATE_WINDOW_S` | Rate-limit window in seconds (default 3600) | -| `AGENT_AUDIT_LOG` | Path to audit JSONL (default `/var/log/overlord-agent/audit.jsonl`) | -| `CLAUDE_TIMEOUT_S` | Max seconds per `claude -p` invocation (default 240) | - -## Deploying Changes - -Live backend host: `overlord.snakedesert.se` (SSH user `erik`, key-based auth). - -### Quick deploy — Python / static file changes - -```bash -ssh erik@overlord.snakedesert.se \ - "cd /home/erik/MosswartOverlord && git pull --ff-only origin master" -# Python changes require a restart: -ssh erik@overlord.snakedesert.se "docker compose restart dereth-tracker" -# Static files (JS/CSS/HTML) are served from the bind-mounted static/ — no restart. 
-``` - -⚠️ Uvicorn runs **without** `--reload` in production. Do not add it back — without the `watchfiles` package it falls back to a polling reloader that busy-loops at 100% CPU and eats a whole core. - -### React frontend deploy - -```bash -cd frontend && npm run build && cd .. -bash deploy-frontend.sh -git add static/ && git commit -m "deploy frontend" && git push -ssh erik@overlord.snakedesert.se "cd /home/erik/MosswartOverlord && git pull" -# No container restart needed. -``` - -### Full rebuild — Dockerfile / pip package / version stamp changes - -```bash -ssh erik@overlord.snakedesert.se "cd /home/erik/MosswartOverlord && \ - git pull --ff-only origin master && \ - export BUILD_VERSION=\"\$(date -u +%Y.%-m.%-d.%H%M)-\$(git rev-parse --short HEAD)\" && \ - docker compose build --no-cache --build-arg BUILD_VERSION=\$BUILD_VERSION dereth-tracker && \ - docker compose up -d dereth-tracker" -``` - -`BUILD_VERSION` is displayed in the sidebar of the live frontend. Format is CalVer: `YYYY.M.D.HHMM-gitshorthash`. 
- -### Overlord Agent deploy - -Code changes to `agent/` only: -```bash -ssh erik@overlord.snakedesert.se "cd /home/erik/MosswartOverlord && \ - git pull --ff-only origin master && \ - sudo systemctl restart overlord-agent" -journalctl -u overlord-agent -f # tail logs to verify -``` - -`agent/requirements.txt` changed (new pip deps): -```bash -ssh erik@overlord.snakedesert.se "cd /home/erik/MosswartOverlord && \ - git pull --ff-only origin master && \ - agent/.venv/bin/pip install -r agent/requirements.txt && \ - sudo systemctl restart overlord-agent" -``` - -systemd unit changed: -```bash -ssh erik@overlord.snakedesert.se "cd /home/erik/MosswartOverlord && \ - git pull --ff-only origin master && \ - sudo cp agent/overlord-agent.service /etc/systemd/system/ && \ - sudo systemctl daemon-reload && sudo systemctl restart overlord-agent" -``` - -First-time install: `bash agent/install.sh` — see `agent/README.md` for the full bootstrap procedure (creating the `overlord-agent` user, copying claude auth, granting filesystem access, populating `/etc/overlord/agent.env`). - -## WebSocket Contract - -### `/ws/position` (plugin → backend) - -Authenticated via `?secret=` or `X-Plugin-Secret` header. 
Accepts JSON frames with a `type` discriminator: - -| `type` | Purpose | -|---|---| -| `telemetry` | Position, kills, session metrics (every 2s per character) | -| `vitals` | Health/stamina/mana/vitae percentages | -| `character_stats` | Full attributes/skills/allegiance (every 10 min) | -| `inventory` / `full_inventory` | Complete inventory dump on login | -| `inventory_delta` | Incremental add/update/remove of a single item | -| `equipment_cantrip_state` | Equipped spell effects | -| `portal` | Discovered portal with coordinates | -| `spawn` | Monster spawn observation | -| `chat` | In-game chat line (any channel) | -| `quest` | Quest timer / progress | -| `rare` | Rare item find notification | -| `nearby_objects` | On-demand radar data (nearby entities) | -| `combat_stats` | Session combat snapshot (Mag-Tools parser output) | -| `share_*` | Cross-machine vital/debuff sharing envelopes | -| `dungeon_map` | Dungeon floor tile data for radar overlay | - -See `EVENT_FORMATS.json` for exact per-type schemas. - -### `/ws/live` (browser → backend) - -Session-cookie authenticated (except for internal Docker network clients, which are trusted by IP). Clients can: - -- Send `{"type":"subscribe","message_types":["rare","chat"]}` to filter which events they receive. Without subscribing, all types are forwarded (browser default). -- Send `{"player_name":"Larsson","command":"/radar start"}` to route a command to that character's plugin client. -- Send `{"type":"request_dungeon_map","landblock":"..."}` to pull cached dungeon tile data. - -Backend pushes the same firehose (subject to subscription filter) to every browser client. - -## HTTP API Reference - -See `EVENT_FORMATS.json` for event schemas. 
Major HTTP endpoints: - -- `GET /live` — active players seen in the last 30s -- `GET /history?from=…&to=…` — historical telemetry snapshots -- `GET /trails` — recent player trails for the map -- `GET /spawns/heatmap?hours=N` — aggregated spawn density -- `GET /portals` — discovered portals within retention window -- `GET /inventory/{character}` — current inventory (proxied to inventory-service) -- `GET /character-stats/{character}` — full character attributes/skills -- `GET /combat-stats/{character}` — session + lifetime combat stats -- `GET /vital-sharing/peers` — currently-registered vital sharing peers -- `GET /api-version` — build version stamp -- `GET /server-health` — current Coldeve server status + player count - -## Frontend - -### React v2 (primary, at `/`) -- Map-first layout with draggable/resizable windows -- Code-split bundles: one chunk per window type, lazy-loaded on open -- Window types: Chat, Stats, Inventory, Character, Radar, CombatStats, CombatPicker, Issues, VitalSharing, QuestStatus, PlayerDashboard -- Per-character inventory version counter — an open inventory window refreshes 2s after its own character's last `inventory_delta`, ignoring unrelated traffic -- Direct DOM pan/zoom on the map (no React state per frame) -- Service worker caches a small whitelist of static assets -- Version badge in the sidebar confirms which build is loaded - -### Classic v1 (preserved at `/classic`) -The original vanilla JS frontend with element-pooling optimization is kept for fallback and reference. - -## AI Assistant (Overlord Agent) - -A draggable chat window in the dashboard (🤖 Assistant button). Powered by `claude -p` running headless on the server, with read-only access to live game state via an MCP server. - -### Architecture -- **Host-side service** (`agent/`, systemd unit `overlord-agent`) runs OUTSIDE Docker because the `claude` CLI binary lives on the host (`/home/erik/.local/bin/claude`) and depends on host-side authentication credentials. 
-- **Dedicated UNIX user** (`overlord-agent`, system account, `/var/lib/overlord-agent` home, no shell) — kernel-level isolation from the operator's `erik` account. Cannot read `/home/erik/.claude`, `~/.ssh`, `.bash_history`, `.env`, etc. -- **MCP stdio server** (`agent/mcp_overlord.py`) exposes 12 tools that wrap the tracker's HTTP endpoints + read-only DB queries. Claude only sees these tools; no `Bash`, `Read`, `Write`, etc. -- **Frontend** (`AgentWindow.tsx`) — per-browser session UUID in localStorage, "New Chat" button, on-mount rehydration from `/agent/sessions/{id}/history`. - -### MCP tools available to the assistant -`get_live_players`, `get_player_state`, `get_combat_stats`, `get_equipment_cantrips`, `get_inventory`, `get_inventory_search`, `search_items` (cross-character), `get_recent_rares`, `get_quest_status`, `get_server_health`, `query_telemetry_db` (read-only SQL via sqlglot parser + GRANT-SELECT-only PG role), `suitbuilder_search`. Plus `WebFetch(domain:acpedia.org)` for AC info lookups. - -### Security stack (defense-in-depth) -1. **Cookie auth** on `/agent/ask` (same session cookie the tracker issues) -2. **Per-user rate limit** (60 req/h default) and **concurrency cap** (1 in-flight) -3. **JSONL audit log** at `/var/log/overlord-agent/audit.jsonl` (every prompt + result) -4. **CLI flags** — `--allowed-tools` (just our 12 MCP tools), `--disallowed-tools` (Bash, Write, Read, Edit, Agent, ToolSearch, Monitor, scheduling, Gmail/Drive/Calendar, etc.), `--permission-mode dontAsk` -5. **`/var/lib/overlord-agent/.claude/settings.json`** — strict deny rules (server-side only, NOT in repo) -6. **System-prompt scope rules** in `CLAUDE.md` — instruct the model not to probe, not to suggest workarounds -7. **SQL parser** (`sqlglot`) rejects any non-SELECT statement on `query_telemetry_db` -8. **Read-only PG role** `overlord_agent_ro` (GRANT SELECT only) — even a parser bypass can't mutate -9. 
**systemd hardening** — `ProtectSystem=strict`, `ProtectHome=read-only`, `InaccessiblePaths=/etc/shadow,/root,~/.ssh,…`, `NoNewPrivileges=true`, `CapabilityBoundingSet=` (empty), `PrivateTmp=true`, `PrivateDevices=true`, `RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6`, `SystemCallFilter=@system-service ~@privileged ~@reboot ~@mount`, `MemoryMax=512M`, `TasksMax=128` -10. **Secrets out of /home** — `/etc/overlord/agent.env` (root:overlord-agent 0640) for SECRET_KEY + AGENT_DB_DSN - -### Files - -| Path | What | -|------|------| -| `agent/service.py` | FastAPI app: `/agent/health`, `/agent/sessions/new`, `/agent/ask`, `/agent/sessions/{id}/history` | -| `agent/auth.py` | Session cookie validation (mirrors `main.py:1013-1019`) | -| `agent/claude_wrapper.py` | `asyncio.create_subprocess_exec("claude", "-p", …)` with allowed/disallowed-tools | -| `agent/tools.py` | Pure tool implementations | -| `agent/mcp_overlord.py` | MCP stdio server registering tools | -| `agent/sql/0001_overlord_agent_ro.sql` | Read-only PG role | -| `agent/overlord-agent.service` | systemd unit (the hardening directives) | -| `agent/install.sh` | venv + systemd setup | -| `agent/README.md` | Operator's deeper reference | -| `.mcp.json` (repo root) | Project-level MCP config Claude Code auto-loads | -| `CLAUDE.md` "Overlord Assistant Mode" section | System-prompt briefing | - -### Routing -nginx forwards `/api/agent/*` to `127.0.0.1:8767` (the host-side service) with a 300s read/send timeout (suitbuilder runs can be slow). Other `/api/*` continues to the dereth-tracker container at `127.0.0.1:8765`. - -### Cost / quota -Subscription auth (no API key); per-call cost is informational only. Each `/agent/ask` invocation = one `claude -p` subprocess with shared session cache. Reactive only — no background polling, no scheduled tasks. 
- -## Database Schema - -### Telemetry DB (`dereth`, TimescaleDB) - -| Table | Type | Retention | Purpose | +| Component | Path | Runs as | Notes | |---|---|---|---| -| `telemetry_events` | hypertable | 30 days | Position/stats snapshots | -| `spawn_events` | hypertable | 7 days | Monster spawn observations (heatmap source) | -| `rare_events` | regular | forever | Rare find history | -| `portals` | regular | 1 hour | Discovered portals, dedup by rounded coords | -| `char_stats` | regular | forever | Per-character lifetime kill total | -| `rare_stats` | regular | forever | Per-character lifetime rare total | -| `rare_stats_sessions` | regular | forever | Per-session rare count | -| `combat_stats` | regular | forever | Lifetime combat accumulator | -| `combat_stats_sessions` | regular | forever | Per-session combat snapshots | -| `character_stats` | regular | forever | Latest full stats JSON per character | -| `server_status` | regular | forever | Current Coldeve server state (single row) | +| **Tracker** (ingest + website + read API + WS) | `go-services/tracker-go/` | Docker `dereth-tracker-go`, 127.0.0.1:8770 | serves the React frontend, login/admin, the plugin `/ws/position`, browser `/ws/live`, and the full read API; writes the `dereth` DB | +| **Inventory** (search + suitbuilder + ingestion) | `go-services/inventory-go/` | Docker `inventory-go`, 127.0.0.1:8772 | normalized item search, the suitbuilder solver (SSE), inventory ingestion; writes `inventory_db` | +| Telemetry DB | TimescaleDB | Docker `dereth-db`, 5432 | hypertables `telemetry_events`, `spawn_events` | +| Inventory DB | postgres:14 | Docker `inventory-db`, 5433 | 7-table normalized item schema | +| React frontend | `frontend/` → `static/` | served by `tracker-go` | unchanged by the migration — same paths, same API | +| Classic v1 / legacy pages | `static/classic/`, `static/*.html` | served by `tracker-go` | `/classic`, `/suitbuilder.html`, `/inventory.html` | +| Grafana | compose `dereth-grafana` | 
127.0.0.1:3000 | anonymous Viewer auth, proxied at `/grafana/` | +| Discord rare bot | `discord-rare-monitor/` (Python) | Docker, reads Go `/ws/live` | posts rares + relays allegiance chat | +| Overlord Agent (assistant) | `agent/` | host-side systemd `overlord-agent`, 127.0.0.1:8767 | shells out to `claude -p`; outside Docker by design | -### Inventory DB (`inventory_db`, PostgreSQL) +**Stack:** Go 1.25 (stdlib `net/http` with 1.22 method+path routing, `pgx/v5`, +`coder/websocket`, `bwmarrin/discordgo`, `golang.org/x/crypto/bcrypt`), distroless +multi-stage images. React 19 + Vite + TypeScript. PostgreSQL/TimescaleDB. nginx +reverse proxy (host-side). Unlike the old single-worker Python service, the Go +tracker uses `GOMAXPROCS` = all available cores, so traffic bursts parallelize +instead of bottlenecking on one core. -Normalized schema: `items`, `item_combat_stats`, `item_requirements`, `item_enhancements`, `item_ratings`, `item_spells`, `item_raw_data`. +--- -`items.container_id` stores the in-game ID of the container holding the item (0 = character body). The frontend groups items into packs by this ID. +## Build & run -## Operations & Health +Everything builds and runs in Docker — **no host Go toolchain needed** (the +multi-stage images compile from source). The production stack is the base compose +(databases, Grafana, Discord bot) plus two override files for the Go services and +the cutover wiring. 
-### PostgreSQL tuning -`dereth-db` runs with explicit memory overrides in `docker-compose.yml`: -- `shared_buffers=8GB` (was 96GB via auto-tune on a 32GB host — caused thrashing) -- `effective_cache_size=16GB` -- `work_mem=16MB`, `maintenance_work_mem=1GB` -- `max_wal_size=4GB` - -### Retention policies -- `telemetry_events`: 30-day drop, daily -- `spawn_events`: 7-day drop, daily -- `portals`: 1-hour cleanup (background task in `main.py`) -- `server_health_checks`: **removed** — was write-only, 850K rows of nothing - -### Log levels -Both `dereth-tracker` and `inventory-service` run at `LOG_LEVEL=INFO`. Do not set to `DEBUG` in production — it dumps full inventory_delta payloads for every item update (hundreds of KB/sec). - -### Host (Proxmox VM) -- 6 vCPU, 32 GiB RAM (of which ~30 GiB is normally free under current load) -- Live host: `overlord.snakedesert.se` -- Reverse proxy: Nginx on the host terminates TLS and strips the `/api/` prefix before forwarding to port 8765 - -### Debug commands ```bash -docker ps -docker logs mosswartoverlord-dereth-tracker-1 --tail 100 -docker logs mosswartoverlord-inventory-service-1 --tail 100 -docker logs mosswartoverlord-discord-rare-monitor-1 --tail 100 -docker exec dereth-db psql -U postgres -d dereth +# --- build the Go service images --- +export BUILD_VERSION="$(date -u +%Y.%-m.%-d.%H%M)-$(git rev-parse --short HEAD)" +docker compose -f docker-compose.yml -f go-services/docker-compose.go.yml \ + build dereth-tracker-go inventory-go + +# --- production: Go services in write mode, serving the site + ingest --- +docker compose -f docker-compose.yml \ + -f go-services/docker-compose.go.yml \ + -f go-services/docker-compose.cutover.yml \ + up -d --no-deps dereth-tracker-go inventory-go ``` -## Contributing +- `docker-compose.go.yml` defines the Go services (plus the isolated shadow DBs used during the parallel run). 
+- `docker-compose.cutover.yml` flips the Go services to **write mode** against the production DBs (`READ_ONLY=false`, `SKIP_SCHEMA_INIT=true` so they run no DDL and trust the existing schema) and points the Discord bot at the Go `/ws/live`. Drop this file to return the Go services to read-only parallel mode. +- `BUILD_VERSION` is shown in the frontend sidebar (CalVer: `YYYY.M.D.HHMM-gitshorthash`). +- Required env (server `.env`, **never committed**): `SHARED_SECRET`, `SECRET_KEY`, `POSTGRES_PASSWORD`, `INVENTORY_DB_PASSWORD`, `DISCORD_ACLOG_WEBHOOK`, `DISCORD_RARE_BOT_TOKEN`, the Discord channel IDs, and Grafana admin. See `.env.example`. -Contributions welcome. Please: -- Keep cross-repo protocol changes additive (new optional fields > renames/removes) -- Update both this README and `CLAUDE.md` when workflows change -- Test end-to-end: plugin → backend → browser for any new event type +### Frontend (unchanged by the migration) -For detailed architecture notes and ongoing investigations, see `CLAUDE.md` and `docs/plans/`. +The React app and the legacy static pages call the same absolute paths +(`/api/...`, `/inv/...`, `/live`, …) — the Go tracker answers them, so the +frontend ships as-is. + +```bash +cd frontend && npm run dev # local dev, port 5173, /api → :8770 +bash deploy-frontend.sh # complete build + copy into static/ (runs npm run build itself) +``` + +The tracker serves `static/` directly (bind-mounted), so static/JS/CSS changes +need no restart. ⚠️ `npm run build` writes to `static/_build/`; only +`deploy-frontend.sh` copies it into the served `static/`. + +### nginx + +The live config is host-side at `/etc/nginx/sites-enabled/overlord` (source copy +in `nginx/overlord.conf`); the `tracker_go` upstream is in +`/etc/nginx/conf.d/tracker_go.conf` (`server 127.0.0.1:8770;`). Production routes +`/`, `/api/`, `/websocket/` to the Go tracker. Every location that proxies to the +tracker **must** set `X-Forwarded-For` — it drives the internal-trust auth rule. 
+ +### Overlord Agent + +Unchanged by the migration — it's a host-side Python systemd service. Code change: +`git pull && sudo systemctl restart overlord-agent`. Its env lives separately at +`/etc/overlord/agent.env`. See `agent/` and `CLAUDE.md`. + +--- + +## WebSocket contract + +- **`/ws/position`** — plugin → backend. Telemetry, vitals, inventory, portal, rare, combat, quest, chat, share_*, … Authenticated by the `X-Plugin-Secret` header against `SHARED_SECRET` (constant-time; fails closed when unset). The tracker forwards inventory to `inventory-go`, accumulates kill/combat stats, and re-broadcasts to browsers. +- **`/ws/live`** — browser ↔ backend. Session-cookie (or internal-trust) authenticated. Accepts `subscribe`, `request_dungeon_map`, and `{player_name, command}` envelopes routed to the matching plugin socket. **Telemetry is broadcast typeless** so the browser ignores it and takes player data from the 5 s `/live` poll (matching the original design — broadcasting it typed flaps the per-player counters). +- **Internal-trust rule:** a request skips cookie auth only when its source is private/loopback **and** carries no `X-Forwarded-For`. nginx sets XFF on all internet traffic, so only host-side / compose-network callers qualify. + +### Payload note + +Payloads are snake_case JSON; keep field names and shapes stable across plugin + +backend. The plugin sends several numeric telemetry fields as **strings** +(`kills_per_hour`, `deaths`, `total_deaths`, `prismatic_taper_count`) — the backend +coerces them (`coerceNum` in `tracker-go/reads.go`). + +## Auth & users + +Session cookies are signed with `SECRET_KEY` via an itsdangerous-compatible +`URLSafeTimedSerializer` (HMAC-SHA1, 30-day expiry) — cookies interoperate with +the legacy Python service. Login at `/login` (bcrypt against the `users` table), +admin user CRUD at `/api-admin/users`, current user at `/me`. 
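The string coercion described in the payload note can be sketched roughly as follows. This is an illustration only: `toNumber` is a hypothetical stand-in, not the real `coerceNum` from `tracker-go/reads.go` (whose signature is not shown here), and the sample payload is made up to show both shapes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// toNumber is a hypothetical helper: it accepts a decoded JSON value that
// may be either a JSON number (float64 after decoding into `any`) or a
// numeric string, and reports whether a number could be extracted.
func toNumber(v any) (float64, bool) {
	switch t := v.(type) {
	case float64:
		return t, true
	case string:
		f, err := strconv.ParseFloat(strings.TrimSpace(t), 64)
		return f, err == nil
	default:
		return 0, false
	}
}

func main() {
	// Invented sample: two fields arrive as strings, one as a number.
	raw := []byte(`{"kills_per_hour": "123.5", "deaths": "2", "total_deaths": 7}`)

	var msg map[string]any
	if err := json.Unmarshal(raw, &msg); err != nil {
		panic(err)
	}
	for _, key := range []string{"kills_per_hour", "deaths", "total_deaths"} {
		n, ok := toNumber(msg[key])
		fmt.Println(key, n, ok)
	}
}
```

Decoding into `any` first and coercing per field keeps the wire format unchanged for the plugin, at the cost of a type switch on the backend.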
+## Databases + +Two separate Postgres databases, both schema-from-code: + +- **`dereth`** (TimescaleDB, `dereth-db`): hypertables `telemetry_events` + `spawn_events`, plus `char_stats`, `combat_stats(_sessions)`, `rare_*`, `portals`, `character_stats`, `users`. Persisted event types: telemetry, spawn, rare, portal, character_stats, combat_stats. Everything else (vitals, quest, cantrips, nearby_objects, dungeon_map, share_*) is memory-only. +- **`inventory_db`** (postgres:14, `inventory-db`): 7 normalized tables (`items` + combat/requirements/enhancements/ratings/spells/raw_data). + +In cutover mode the Go services reuse these production databases directly; the +shadow DBs in `docker-compose.go.yml` exist only for isolated parallel-run +validation. **Backups:** `pg_dump -Fc` of both DBs; TimescaleDB restore needs +`timescaledb_pre_restore()` / `timescaledb_post_restore()` around `pg_restore`. + +## Route conventions + +- nginx strips `/api/` before proxying, so backend routes do **not** start with `/api/`. +- Hyphenated routes (`/api-version`, `/api-admin/...`) deliberately bypass the strip (they fall through nginx's `location /`). +- The static SPA is the catch-all (`GET /`), registered after the API routes, with `index.html` fallback for client-side routing. +- `/inv/*` reverse-proxies to the inventory service; `/api/agent/*` is proxied by nginx (not the tracker) to the host-side agent. + +## Operational notes + +- Discord: the rare bot posts rares + relays allegiance chat; **death/idle alerts come from the tracker itself** via `DISCORD_ACLOG_WEBHOOK`. +- Issue board persists to the flat file `static/openissues.json` (web-served, mounted read-write). +- Logs: `docker logs dereth-tracker-go`, `docker logs inventory-go`. Read-only psql: `docker exec dereth-db psql -U postgres -d dereth`, `docker exec inventory-db psql -U inventory_user -d inventory_db`. +- **This repo is PUBLIC** on git.snakedesert.se — never commit secrets. 
`.env` is gitignored; `.env.example` is the template. + +## Branches + +- **`master`** — the Go production backend (this). +- **`python-legacy`** — the original Python/FastAPI implementation, preserved for reference and rollback. + +See [`CLAUDE.md`](CLAUDE.md) for contributor/agent guidance and deeper internals.