From c6a1af0c39231434fe5d547aed3dd1986e1e86ae Mon Sep 17 00:00:00 2001
From: Erik
Date: Wed, 10 Jun 2026 16:36:01 +0200
Subject: [PATCH] =?UTF-8?q?docs:=20rewrite=20CLAUDE.md=20from=20audit=20?=
 =?UTF-8?q?=E2=80=94=20drop=20stale=202025=20fix=20journal?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The old file was ~half September-2025 changelog with claims now wrong:
portal race condition (fixed via ON CONFLICT upsert), hypertable warning
(telemetry_events IS a hypertable on live), pool 5-20 (actually
5-100), --no-cache rebuild for code changes (bind mounts + restart
suffice), /ws/live unauthenticated (cookie-authenticated since),
static-HTML frontend description (React since).

Rewritten around current reality: component map, WS endpoints + auth
caveats, two-database schema-as-code situation (alembic empty, manual
ALTERs), route conventions, React deploy flow, operational notes.
Overlord Assistant Mode section preserved verbatim (consumed at runtime
by the agent service).

AGENTS.md: remove nonexistent GET /history, fix /ws/live auth claim.
Also delete stray mangled-path file from repo root.

Co-Authored-By: Claude Fable 5
---
 AGENTS.md |   6 +-
 CLAUDE.md | 173 ++++++++++++++++++------------------------------------
 2 files changed, 59 insertions(+), 120 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 74f1e340..46ea6fa7 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -68,9 +68,9 @@ Read shared integration rules first: `../AGENTS.md`.

 - Main health endpoint: `GET /debug`
 - Live data endpoint: `GET /live`
-- History endpoint: `GET /history`
-- Plugin WS endpoint: `/ws/position` (authenticated)
-- Browser WS endpoint: `/ws/live` (unauthenticated)
+- Trails endpoint: `GET /trails`
+- Plugin WS endpoint: `/ws/position` (authenticated via `X-Plugin-Secret`)
+- Browser WS endpoint: `/ws/live` (session-cookie authenticated; internal Docker-network clients trusted by IP)
 - Inventory service endpoint family: `/search/*`, `/inventory/*`, `/suitbuilder/*`

 ## Repo-specific architecture notes
diff --git a/CLAUDE.md b/CLAUDE.md
index 33099b0c..526c69d4 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,143 +1,82 @@
 # CLAUDE.md

 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+Cross-repo workflows (plugin coupling, deploy commands, nginx) live in the workspace-level `../CLAUDE.md` — read that too for any deploy or protocol change.

 ## Project Overview

-Dereth Tracker is a real-time telemetry service for game world tracking. It's a FastAPI-based WebSocket and HTTP API service that ingests player position/stats data via plugins and provides live map visualization through a web interface.
+Dereth Tracker is a real-time telemetry platform for Asheron's Call world tracking. A FastAPI WebSocket/HTTP service (`main.py`, single file ~4200 lines) ingests player data from the MosswartMassacre DECAL plugin and serves a live React dashboard, with TimescaleDB persistence, a separate inventory microservice, Grafana dashboards, a Discord rare bot, and a host-side Claude-powered assistant.

-## Key Components
+## Components

-### Main Service (main.py)
-- WebSocket endpoint `/ws/position` receives telemetry and inventory events
-- Routes inventory events to inventory service via HTTP
-- Handles real-time player tracking and map updates
+| Component | Where | Runs as |
+|---|---|---|
+| Tracker API (`main.py`) | repo root | Docker `dereth-tracker`, 127.0.0.1:8765 |
+| Telemetry DB (TimescaleDB) | `db_async.py` schema | Docker `dereth-db`, port 5432 |
+| Inventory service + DB | `inventory-service/` | Docker `inventory-service` (127.0.0.1:8766) + `inventory-db` (5433) |
+| React frontend | `frontend/` → built into `static/` | served by tracker (FastAPI StaticFiles) |
+| Classic v1 frontend | `static/classic/` | served at `/classic` |
+| Legacy vanilla pages | `static/inventory.html`, `static/suitbuilder.html` | still live |
+| Grafana | compose service `dereth-grafana` | 127.0.0.1:3000, anonymous Viewer auth, proxied at `/grafana/` |
+| Discord rare bot | `discord-rare-monitor/` | Docker, connects to `/ws/live` internally |
+| Overlord Agent (assistant) | `agent/` | **host-side systemd service** `overlord-agent`, 127.0.0.1:8767 |

-### Inventory Service (inventory-service/main.py)
-- Separate FastAPI service for inventory management
-- Processes inventory JSON into normalized PostgreSQL tables
-- Provides search API with advanced filtering and sorting
-- Uses comprehensive enum database for translating game IDs to readable names
+## WebSocket endpoints

-### Database Architecture
-- **Telemetry DB**: TimescaleDB for time-series player tracking data
-- **Inventory DB**: PostgreSQL with normalized schema for equipment data
-  - `items`: Core item properties
-  - `item_combat_stats`: Armor level, damage bonuses
-  - `item_enhancements`: Material, item sets, tinkering
-  - `item_spells`: Spell names and categories
-  - `item_raw_data`: Original JSON for complex queries
+- `/ws/position` — plugin ingest (telemetry, inventory, portal, rare, combat, share_*, …). Authenticated by `X-Plugin-Secret` header. ⚠ The secret is currently the hardcoded placeholder `"your_shared_secret"` at `main.py:994`; the `SHARED_SECRET` env var is NOT read (known issue — fix both repos together).
+- `/ws/live` — browser clients: session-cookie authenticated; clients from the Docker network (172.x / loopback) are trusted by IP. Accepts `subscribe`, `request_dungeon_map`, and `{player_name, command}` envelopes forwarded to the matching plugin socket.
+- ⚠ Because nginx → docker-proxy makes ALL external traffic appear as 172.x to the app, the IP-trust shortcut currently bypasses cookie auth for proxied browsers (see workspace security notes before relying on auth).

-## Memories and Known Bugs
+## Auth & users

-* Fixed: Material names now properly display (e.g., "Gold Celdon Girth" instead of "Unknown_Material_Gold Celdon Girth")
-* Fixed: Slot column shows "-" instead of "Unknown" for items without slot data
-* Fixed: All 208 items in Larsson's inventory now process successfully (was 186 with 22 SQL type errors)
-* Added: Type column in inventory search using object_classes enum for accurate item type classification
-* Note: ItemType data is inconsistent in JSON - using ObjectClass as primary source for Type column
+- Session cookies signed with `SECRET_KEY` (itsdangerous, 30-day expiry); login at `/login`, user CRUD at `/api-admin/users` (admin-only), `/me` returns the current user.
+- Users live in the `users` table (bcrypt). `seed_users()` seeds initial accounts only when the table is empty.
+- The agent service (`agent/auth.py`) verifies the same cookie with the same `SECRET_KEY` — keep them identical.

-## Recent Fixes (September 2025)
+## Database

-### Portal Coordinate Rounding Fix ✅ RESOLVED
-* **Problem**: Portal insertion failed with duplicate key errors due to coordinate rounding mismatch
-* **Root Cause**: Code used 2 decimal places (`ROUND(ns::numeric, 2)`) but database constraint used 1 decimal place
-* **Solution**: Changed all portal coordinate checks to use 1 decimal place to match DB constraint
-* **Result**: 98% reduction in duplicate key errors (from 600+/min to ~11/min)
-* **Location**: `main.py` lines ~1989, 1996, 2025, 2047
+- **Two separate Postgres databases**: telemetry (`dereth` on TimescaleDB, container `dereth-db`) and inventory (`inventory_db` on plain postgres:14, container `inventory-db`).
+- **Schema source of truth is code, not migrations**: `db_async.py` table metadata + `metadata.create_all()` + ad-hoc `IF NOT EXISTS` DDL in `init_db_async()`. Alembic is configured but `alembic/versions/` is empty — `create_all()` never ALTERs existing tables, so **adding a column to db_async.py requires a manual `ALTER TABLE` on the live DB**.
+- Hypertables: `telemetry_events` (retention via `DB_RETENTION_DAYS`, default 7 days in code) and `spawn_events` (7 days). Both confirmed hypertables on the live DB with active retention jobs.
+- ⚠ Known divergence: live `portals` unique index uses `ROUND(...,1)` (matches the `ON CONFLICT` in main.py), but `db_async.py` creates `ROUND(...,2)` on fresh databases — a fresh install breaks portal upserts until aligned.
+- Connection pool: `min_size=5, max_size=100, command_timeout=120` (`db_async.py:21`). Postgres `max_connections` is the default 100, shared with Grafana and the agent's read-only role — don't widen the pool further.
+- Persisted event types: telemetry, spawn, rare, portal, character_stats, combat_stats. Everything else (vitals, quest, cantrips, nearby_objects, dungeon_map, share_*) is memory-only.
+- Read-only agent role `overlord_agent_ro` is provisioned manually via `agent/sql/0001_overlord_agent_ro.sql` (SELECT-only).
+- There is **no backup mechanism** — durability is the two Docker volumes (`timescale-data`, `inventory-data`).
+- `db.py` is a dead legacy SQLite layer — nothing imports it. All persistence goes through `db_async.py`.

-### Character Display Issues ✅ RESOLVED
-* **Problem**: Some characters (e.g., "Crazed n Dazed") not appearing in frontend
-* **Root Cause**: Database connection pool exhaustion from portal error spam
-* **Solution**: Fixed portal errors to reduce database load
-* **Result**: Characters now display correctly after portal fix
+## Route conventions

-### Docker Container Deployment
-* **Issue**: Code changes require container rebuild with `--no-cache` flag
-* **Command**: `docker compose build --no-cache dereth-tracker`
-* **Reason**: Docker layer caching can prevent updated source code from being copied
+- nginx strips `/api/` before proxying, so backend routes must NOT start with `/api/`.
+- Routes that need to bypass the strip are hyphen-named on purpose: `/api-version`, `/api-admin/...` (they fall through nginx's `location /`).
+- The static SPA is mounted last (`app.mount('/', NoCacheStaticFiles(...), html=True)`), so unmatched paths serve `static/`.
+- `/inv/*` is a catch-all HTTP proxy to the inventory service; `/api/agent/*` is proxied by nginx (not the tracker) to the host-side agent.

-## Current Known Issues
+## Frontend

-### Minor Portal Race Conditions
-* **Status**: ~11 duplicate key errors per minute (down from 600+)
-* **Cause**: Multiple players discovering same portal simultaneously
-* **Impact**: Minimal - errors are caught and handled gracefully
-* **Handling**: Try/catch in code logs as debug messages and updates portal timestamp
-* **Potential Fix**: PostgreSQL ON CONFLICT DO UPDATE (upsert pattern) would eliminate completely
+- Source: `frontend/` (React 19 + Vite + TypeScript). Built output goes to `static/_build/`, then `deploy-frontend.sh` copies it into `static/` — **running `bash deploy-frontend.sh` alone is the complete build+deploy flow** (it runs `npm run build` itself).
+- Local dev: `cd frontend && npm run dev` (port 5173, `/api` proxied to localhost:8765).
+- The React app's WebSocket URL is `/api/ws/live` (goes through nginx `location /api/`); the classic frontend uses `/ws/live` (through `location /`).
+- Window components are routed by id prefix in `WindowRenderer.tsx`: `{prefix}-{charName}` (chat|stats|char|inv|radar|combat|combatpicker|issues|vitalsharing|queststatus|playerdash|agent|adminusers).
+- `?view=dashboard` renders the fullscreen Player Dashboard (own tab, own WS connection per tab — by design).
+- Map positions update from the 5 s `/live` HTTP poll; backend telemetry broadcasts have no `type` field so the WS telemetry branch in the frontend is inert.

-### Database Initialization Warnings
-* **TimescaleDB Hypertable**: `telemetry_events` fails to become hypertable due to primary key constraint
-* **Impact**: None - table works as regular PostgreSQL table
-* **Warning**: "cannot create a unique index without the column 'timestamp'"
+## Suitbuilder

-### Connection Pool Under Load
-* **Issue**: Database queries can timeout when connection pool is exhausted
-* **Symptom**: Characters may not appear during high error load
-* **Mitigation**: Portal error fix significantly reduced this issue
+Production equipment-optimization engine (`inventory-service/suitbuilder.py`): multi-character search, armor set constraints, cantrip overlap detection, SSE streaming. UI at `/suitbuilder.html`. Architecture doc: `docs/plans/2026-02-09-suitbuilder-architecture.md`.
+Known limitations: no slot-aware spell filtering, equal spell weighting. The legacy `/optimize/*` solver in `inventory-service/main.py` is a near-duplicate — `suitbuilder.py` is the production path.

-## Equipment Suit Builder
+## Deploying

-### Status: PRODUCTION READY
+See workspace `../CLAUDE.md` "Build & Deploy Instructions" — quick deploy (git pull + `docker compose restart dereth-tracker` for Python; nothing for static), `deploy-frontend.sh` for React, full `--no-cache` rebuild only for Dockerfile/pip/version-stamp changes. Bind mounts: `main.py`, `db_async.py`, `static/`, `alembic/` only.

-Real-time equipment optimization engine for building optimal character loadouts by searching across multiple characters' inventories (mules). Uses Mag-SuitBuilder constraint satisfaction algorithms.
+## Operational notes

-**Core Features:**
-- Multi-character inventory search across 100+ characters, 25,000+ items
-- Armor set constraints (primary 5-piece + secondary 4-piece set support)
-- Cantrip/ward spell optimization with bitmap-based overlap detection
-- Crit damage rating optimization
-- Locked slots with set/spell preservation across searches
-- Real-time SSE streaming with progressive phase updates
-- Suit summary with copy-to-clipboard functionality
-- Stable deterministic sorting for reproducible results
-
-**Access:** `/suitbuilder.html`
-
-**Architecture Details:** See `docs/plans/2026-02-09-suitbuilder-architecture.md`
-
-### Known Limitations
-- Slot-aware spell filtering not yet implemented (e.g., underclothes have limited spell pools but system treats all slots equally)
-- All spells weighted equally (no priority/importance weighting yet)
-- See architecture doc for future enhancement roadmap
-
-## Technical Notes for Development
-
-### Database Performance
-- Connection pool: 5-20 connections (configured in `db_async.py`)
-- Under heavy error load, pool exhaustion can cause 2-minute query timeouts
-- Portal error fix significantly improved database performance
-
-### Docker Development Workflow
-1. **Code Changes**: Edit source files locally
-2. **Rebuild**: `docker compose build --no-cache dereth-tracker` (required for code changes)
-3. **Deploy**: `docker compose up -d dereth-tracker`
-4. **Debug**: `docker logs mosswartoverlord-dereth-tracker-1` and `docker logs dereth-db`
-
-### Frontend Architecture
-- **Main Map**: `static/index.html` - Real-time player tracking
-- **Inventory Search**: `static/inventory.html` - Advanced item filtering
-- **Suitbuilder**: `static/suitbuilder.html` - Equipment optimization interface
-- **All static files**: Served directly by FastAPI StaticFiles
-
-### DOM Optimization Status ✅ COMPLETE (September 2025)
-* **Achievement**: 100% DOM element reuse with zero element creation after initial render
-* **Performance**: ~5ms render time for 69 players, eliminated 4,140+ elements/minute creation
-* **Implementation**: Element pooling system with player name mapping for O(1) lookup
-* **Monitoring**: Color-coded console output (✨ green = optimized, ⚡ yellow = partial, 🔥 red = poor)
-* **Status**: Production ready - achieving perfect element reuse consistently
-
-**Current Render Stats**:
-- ✅ This render: 0 dots created, 69 reused | 0 list items created, 69 reused
-- ✅ Lifetime: 69 dots created, 800+ reused | 69 list items created, 800+ reused
-
-**Remaining TODO**:
-- ❌ Fix CSS Grid layout for player sidebar (deferred per user request)
-- ❌ Extend optimization to trails and portal rendering
-- ❌ Add memory usage tracking
-
-### WebSocket Endpoints
-- `/ws/position`: Plugin telemetry, inventory, portal, rare events (authenticated)
-- `/ws/live`: Browser client commands and live updates (unauthenticated)
+- Discord: rare bot posts rares + relays allegiance chat; **death/idle alerts come from the backend** via `DISCORD_ACLOG_WEBHOOK` (`_idle_detection_loop` in `main.py`).
+- Issues board persists to a flat file `static/openissues.json` (web-served, bind-mounted).
+- Server status (Coldeve) is polled via UDP every 30 s; TreeStats player count every 5 min.
+- Debugging: `docker logs mosswartoverlord-dereth-tracker-1`, `docker logs dereth-db`. Read-only psql: `docker exec dereth-db psql -U postgres -d dereth`.
+- This repo is **public** on git.snakedesert.se — never commit secrets (a Grafana token was leaked & removed June 2026; nginx `/grafana/` works via anonymous Viewer auth, no token needed). Grafana's container state DB is ephemeral (no volume) — don't create service accounts expecting them to persist.

 ---

@@ -223,4 +162,4 @@ SELECT timestamp, character_name, name FROM rare_events
 - `combat_stats`, `combat_stats_sessions` — combat tracking
 - `server_status` — current Coldeve game-server state (single row)

-If asked about something not covered above, look in `db_async.py` for the schema or just try a query and report what you see.
\ No newline at end of file
+If asked about something not covered above, look in `db_async.py` for the schema or just try a query and report what you see.
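The `/ws/position` notes in this patch flag that the plugin secret is a hardcoded placeholder and that the `SHARED_SECRET` env var is never read. A minimal sketch of an env-driven check, assuming the header name stays `X-Plugin-Secret`; `load_shared_secret` and `plugin_secret_ok` are hypothetical helpers, not code from `main.py`, and raising on a missing variable (instead of falling back to a placeholder) is a design choice of this sketch.

```python
import hmac
import os
from typing import Mapping, Optional


def load_shared_secret(env: Mapping[str, str] = os.environ) -> str:
    """Read the plugin secret from SHARED_SECRET (hypothetical helper).

    Raises instead of falling back to a placeholder, so a missing variable
    cannot silently leave the endpoint guarded by a well-known value.
    """
    secret = env.get("SHARED_SECRET", "")
    if not secret:
        raise RuntimeError("SHARED_SECRET is not set")
    return secret


def plugin_secret_ok(header_value: Optional[str], expected: str) -> bool:
    """Check an X-Plugin-Secret header value against the configured secret."""
    if header_value is None:
        return False
    # Constant-time comparison: timing does not reveal how many leading
    # characters of a guess were correct.
    return hmac.compare_digest(header_value.encode(), expected.encode())


if __name__ == "__main__":
    secret = load_shared_secret({"SHARED_SECRET": "example-secret"})
    print(plugin_secret_ok("example-secret", secret))  # True
    print(plugin_secret_ok(None, secret))              # False
```

`hmac.compare_digest` is used instead of `==` because a plain string comparison can return early on the first mismatching character.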
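This patch documents that `create_all()` never ALTERs existing tables, so a column added to `db_async.py` needs a manual `ALTER TABLE` on the live DB. A hedged sketch of generating those statements by diffing the columns declared in code against the live column list. `missing_column_ddl`, its dict input, and the example table are hypothetical; the caller is assumed to fetch live column names (for example from `information_schema.columns`) and to review the output before running it.

```python
from typing import Dict, Iterable, List


def missing_column_ddl(
    table: str,
    expected: Dict[str, str],
    live_columns: Iterable[str],
) -> List[str]:
    """Return ALTER TABLE statements for columns declared in code but absent live.

    expected: column name -> SQL type, as declared in code (hypothetical input).
    live_columns: names currently on the table, e.g. fetched with
        SELECT column_name FROM information_schema.columns
        WHERE table_name = '<table>';
    Identifiers are interpolated unquoted, so only pass names that come
    from your own schema code, never from user input.
    """
    live = set(live_columns)
    statements: List[str] = []
    for name, sql_type in expected.items():  # dicts keep declaration order
        if name not in live:
            # IF NOT EXISTS (PostgreSQL 9.6+) makes re-running the output harmless.
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {sql_type};"
            )
    return statements


if __name__ == "__main__":
    # Hypothetical table and columns, for illustration only.
    for stmt in missing_column_ddl(
        "example_events",
        {"id": "BIGINT", "note": "TEXT", "seen_at": "TIMESTAMPTZ"},
        ["id", "note"],
    ):
        print(stmt)
```

Printing the statements for review, rather than executing them, keeps a human in the loop on a database that has no backup mechanism.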
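The route conventions in this patch rest on one nginx behaviour: `location /api/` strips the prefix before proxying, while hyphenated paths such as `/api-version` and `/api-admin/users` fall through to `location /` unchanged. The sketch below models that mapping and flags backend routes that break the rule that routes must not start with `/api/`. Both functions are hypothetical, and the mapping assumes the rewrite is exactly "drop the leading `/api`"; confirm it against the workspace nginx config.

```python
from typing import Iterable, List

# Assumed nginx behaviour: `location /api/` proxies with the "/api" prefix
# removed; every other path (including /api-version, /api-admin/...) is
# proxied unchanged through `location /`.
STRIPPED_PREFIX = "/api/"


def backend_path(public_path: str) -> str:
    """Return the path the tracker receives for a public request path."""
    if public_path.startswith(STRIPPED_PREFIX):
        # Keep the slash that follows "/api": "/api/live" -> "/live".
        return public_path[len("/api"):]
    return public_path


def routes_breaking_convention(routes: Iterable[str]) -> List[str]:
    """Backend routes starting with /api/.

    Under the stripping rule these are only reachable through a doubled
    public prefix such as /api/api/..., so they are almost certainly mistakes.
    """
    return [route for route in routes if route.startswith(STRIPPED_PREFIX)]


if __name__ == "__main__":
    print(backend_path("/api/ws/live"))      # /ws/live
    print(backend_path("/api-admin/users"))  # /api-admin/users
    print(routes_breaking_convention(["/live", "/api/example", "/api-version"]))  # ['/api/example']
```

Under this model the React app's `/api/ws/live` and the classic frontend's `/ws/live` both reach the same backend route, `/ws/live`.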