docs: rewrite CLAUDE.md from audit — drop stale 2025 fix journal

The old file was ~half September-2025 changelog with claims now wrong:
portal race condition (fixed via ON CONFLICT upsert), hypertable warning
(telemetry_events IS a hypertable on live), pool 5-20 (actually 5-100),
--no-cache rebuild for code changes (bind mounts + restart suffice),
/ws/live unauthenticated (cookie-authenticated since), static-HTML
frontend description (React since). Rewritten around current reality:
component map, WS endpoints + auth caveats, two-database schema-as-code
situation (alembic empty, manual ALTERs), route conventions, React
deploy flow, operational notes. Overlord Assistant Mode section
preserved verbatim (consumed at runtime by the agent service).

AGENTS.md: remove nonexistent GET /history, fix /ws/live auth claim.
Also delete stray mangled-path file from repo root.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Erik 2026-06-10 16:36:01 +02:00
parent a5b80fd9cd
commit c6a1af0c39
2 changed files with 59 additions and 120 deletions


@@ -68,9 +68,9 @@ Read shared integration rules first: `../AGENTS.md`.
- Main health endpoint: `GET /debug`
- Live data endpoint: `GET /live`
- History endpoint: `GET /history`
- Plugin WS endpoint: `/ws/position` (authenticated)
- Browser WS endpoint: `/ws/live` (unauthenticated)
- Trails endpoint: `GET /trails`
- Plugin WS endpoint: `/ws/position` (authenticated via X-Plugin-Secret)
- Browser WS endpoint: `/ws/live` (session-cookie authenticated; internal Docker-network clients trusted by IP)
- Inventory service endpoint family: `/search/*`, `/inventory/*`, `/suitbuilder/*`
## Repo-specific architecture notes

173
CLAUDE.md

@@ -1,143 +1,82 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Cross-repo workflows (plugin coupling, deploy commands, nginx) live in the workspace-level `../CLAUDE.md` — read that too for any deploy or protocol change.
## Project Overview
Dereth Tracker is a real-time telemetry service for game world tracking. It's a FastAPI-based WebSocket and HTTP API service that ingests player position/stats data via plugins and provides live map visualization through a web interface.
Dereth Tracker is a real-time telemetry platform for Asheron's Call world tracking. A FastAPI WebSocket/HTTP service (`main.py`, single file ~4200 lines) ingests player data from the MosswartMassacre DECAL plugin and serves a live React dashboard, with TimescaleDB persistence, a separate inventory microservice, Grafana dashboards, a Discord rare bot, and a host-side Claude-powered assistant.
## Key Components
## Components
### Main Service (main.py)
- WebSocket endpoint `/ws/position` receives telemetry and inventory events
- Routes inventory events to inventory service via HTTP
- Handles real-time player tracking and map updates
| Component | Where | Runs as |
|---|---|---|
| Tracker API (`main.py`) | repo root | Docker `dereth-tracker`, 127.0.0.1:8765 |
| Telemetry DB (TimescaleDB) | `db_async.py` schema | Docker `dereth-db`, port 5432 |
| Inventory service + DB | `inventory-service/` | Docker `inventory-service` (127.0.0.1:8766) + `inventory-db` (5433) |
| React frontend | `frontend/` → built into `static/` | served by tracker (FastAPI StaticFiles) |
| Classic v1 frontend | `static/classic/` | served at `/classic` |
| Legacy vanilla pages | `static/inventory.html`, `static/suitbuilder.html` | still live |
| Grafana | compose service `dereth-grafana` | 127.0.0.1:3000, anonymous Viewer auth, proxied at `/grafana/` |
| Discord rare bot | `discord-rare-monitor/` | Docker, connects to `/ws/live` internally |
| Overlord Agent (assistant) | `agent/` | **host-side systemd service** `overlord-agent`, 127.0.0.1:8767 |
### Inventory Service (inventory-service/main.py)
- Separate FastAPI service for inventory management
- Processes inventory JSON into normalized PostgreSQL tables
- Provides search API with advanced filtering and sorting
- Uses comprehensive enum database for translating game IDs to readable names
## WebSocket endpoints
### Database Architecture
- **Telemetry DB**: TimescaleDB for time-series player tracking data
- **Inventory DB**: PostgreSQL with normalized schema for equipment data
- `items`: Core item properties
- `item_combat_stats`: Armor level, damage bonuses
- `item_enhancements`: Material, item sets, tinkering
- `item_spells`: Spell names and categories
- `item_raw_data`: Original JSON for complex queries
- `/ws/position` — plugin ingest (telemetry, inventory, portal, rare, combat, share_*, …). Authenticated by `X-Plugin-Secret` header. ⚠ The secret is currently the hardcoded placeholder `"your_shared_secret"` at `main.py:994`; the `SHARED_SECRET` env var is NOT read (known issue — fix both repos together).
- `/ws/live` — browser clients: session-cookie authenticated; clients from the Docker network (172.x / loopback) are trusted by IP. Accepts `subscribe`, `request_dungeon_map`, and `{player_name, command}` envelopes forwarded to the matching plugin socket.
- ⚠ Because nginx → docker-proxy makes ALL external traffic appear as 172.x to the app, the IP-trust shortcut currently bypasses cookie auth for proxied browsers (see workspace security notes before relying on auth).
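The IP-trust caveat above can be illustrated with a minimal sketch. It assumes "172.x / loopback" means `172.16.0.0/12` plus `127.0.0.0/8`; the function names are hypothetical and are not taken from `main.py`.

```python
import ipaddress

# Assumption: the trusted ranges are modelled as 172.16.0.0/12 and 127.0.0.0/8.
# The real check in main.py may use different ranges or logic.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def is_trusted_peer(peer_ip: str) -> bool:
    """Return True when the socket peer address falls in a trusted range."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


def needs_cookie(peer_ip: str) -> bool:
    """Cookie auth is only enforced for peers outside the trusted ranges."""
    return not is_trusted_peer(peer_ip)


# A browser proxied through nginx -> docker-proxy arrives with a 172.x peer
# address, so it is treated like an internal client and skips the cookie check.
print(needs_cookie("172.18.0.1"))   # proxied browser as seen by the app
print(needs_cookie("203.0.113.7"))  # direct external address
```

Under these assumptions, any check keyed on the socket peer address cannot tell a proxied browser from the Discord bot, which is why the cookie path is effectively bypassed.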
## Memories and Known Bugs
## Auth & users
* Fixed: Material names now properly display (e.g., "Gold Celdon Girth" instead of "Unknown_Material_Gold Celdon Girth")
* Fixed: Slot column shows "-" instead of "Unknown" for items without slot data
* Fixed: All 208 items in Larsson's inventory now process successfully (was 186 with 22 SQL type errors)
* Added: Type column in inventory search using object_classes enum for accurate item type classification
* Note: ItemType data is inconsistent in JSON - using ObjectClass as primary source for Type column
- Session cookies signed with `SECRET_KEY` (itsdangerous, 30-day expiry); login at `/login`, user CRUD at `/api-admin/users` (admin-only), `/me` returns the current user.
- Users live in the `users` table (bcrypt). `seed_users()` seeds initial accounts only when the table is empty.
- The agent service (`agent/auth.py`) verifies the same cookie with the same `SECRET_KEY` — keep them identical.
## Recent Fixes (September 2025)
## Database
### Portal Coordinate Rounding Fix ✅ RESOLVED
* **Problem**: Portal insertion failed with duplicate key errors due to coordinate rounding mismatch
* **Root Cause**: Code used 2 decimal places (`ROUND(ns::numeric, 2)`) but database constraint used 1 decimal place
* **Solution**: Changed all portal coordinate checks to use 1 decimal place to match DB constraint
* **Result**: 98% reduction in duplicate key errors (from 600+/min to ~11/min)
* **Location**: `main.py` lines ~1989, 1996, 2025, 2047
- **Two separate Postgres databases**: telemetry (`dereth` on TimescaleDB, container `dereth-db`) and inventory (`inventory_db` on plain postgres:14, container `inventory-db`).
- **Schema source of truth is code, not migrations**: `db_async.py` table metadata + `metadata.create_all()` + ad-hoc `IF NOT EXISTS` DDL in `init_db_async()`. Alembic is configured but `alembic/versions/` is empty — `create_all()` never ALTERs existing tables, so **adding a column to db_async.py requires a manual `ALTER TABLE` on the live DB**.
- Hypertables: `telemetry_events` (retention via `DB_RETENTION_DAYS`, default 7 days in code) and `spawn_events` (7 days). Both confirmed hypertables on the live DB with active retention jobs.
- ⚠ Known divergence: live `portals` unique index uses `ROUND(...,1)` (matches the `ON CONFLICT` in main.py), but `db_async.py` creates `ROUND(...,2)` on fresh databases — a fresh install breaks portal upserts until aligned.
- Connection pool: `min_size=5, max_size=100, command_timeout=120` (`db_async.py:21`). Postgres `max_connections` is the default 100, shared with Grafana and the agent's read-only role — don't widen the pool further.
- Persisted event types: telemetry, spawn, rare, portal, character_stats, combat_stats. Everything else (vitals, quest, cantrips, nearby_objects, dungeon_map, share_*) is memory-only.
- Read-only agent role `overlord_agent_ro` is provisioned manually via `agent/sql/0001_overlord_agent_ro.sql` (SELECT-only).
- There is **no backup mechanism** — durability is the two Docker volumes (`timescale-data`, `inventory-data`).
- `db.py` is a dead legacy SQLite layer — nothing imports it. All persistence goes through `db_async.py`.
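The `portals` index divergence noted above can be reproduced in miniature. This sketch uses an in-memory SQLite database as a stand-in for Postgres, and the column names (`name`, `ns`, `ew`, `seen`) are hypothetical, not the real schema. It only shows that an `ON CONFLICT` target must match an existing unique index expression.

```python
import sqlite3


def try_upsert(index_decimals: int) -> str:
    """Create a unique index with the given rounding, then upsert with ROUND(..., 1)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, simplified table; the real portals schema lives in db_async.py.
    conn.execute("CREATE TABLE portals (name TEXT, ns REAL, ew REAL, seen INTEGER)")
    conn.execute(
        "CREATE UNIQUE INDEX portals_uq ON portals "
        f"(name, ROUND(ns, {index_decimals}), ROUND(ew, {index_decimals}))"
    )
    upsert = (
        "INSERT INTO portals (name, ns, ew, seen) VALUES (?, ?, ?, ?) "
        "ON CONFLICT (name, ROUND(ns, 1), ROUND(ew, 1)) "
        "DO UPDATE SET seen = excluded.seen"
    )
    try:
        # Two sightings of the same portal that differ only past one decimal place.
        conn.execute(upsert, ("Town Portal", 12.34, 56.78, 1))
        conn.execute(upsert, ("Town Portal", 12.31, 56.81, 2))
    except sqlite3.OperationalError as exc:
        return f"error: {exc}"
    count, seen = conn.execute("SELECT COUNT(*), MAX(seen) FROM portals").fetchone()
    return f"rows={count} seen={seen}"


print(try_upsert(1))  # index matches the ON CONFLICT target
print(try_upsert(2))  # index uses 2 decimals, target uses 1
```

With a matching index the second insert updates the existing row; with the 2-decimal index the statement is rejected because no unique index matches the conflict target. Postgres rejects a mismatched `ON CONFLICT` specification in a comparable way, which is the failure a fresh install would hit.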
### Character Display Issues ✅ RESOLVED
* **Problem**: Some characters (e.g., "Crazed n Dazed") not appearing in frontend
* **Root Cause**: Database connection pool exhaustion from portal error spam
* **Solution**: Fixed portal errors to reduce database load
* **Result**: Characters now display correctly after portal fix
## Route conventions
### Docker Container Deployment
* **Issue**: Code changes require container rebuild with `--no-cache` flag
* **Command**: `docker compose build --no-cache dereth-tracker`
* **Reason**: Docker layer caching can prevent updated source code from being copied
- nginx strips `/api/` before proxying, so backend routes must NOT start with `/api/`.
- Routes that need to bypass the strip are hyphen-named on purpose: `/api-version`, `/api-admin/...` (they fall through nginx's `location /`).
- The static SPA is mounted last (`app.mount('/', NoCacheStaticFiles(...), html=True)`), so unmatched paths serve `static/`.
- `/inv/*` is a catch-all HTTP proxy to the inventory service; `/api/agent/*` is proxied by nginx (not the tracker) to the host-side agent.
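As a rough illustration of the prefix-strip convention above, the following sketch maps an external path to the path the tracker would see. The function name is hypothetical, and it deliberately ignores `/api/agent/*`, which nginx sends to the agent instead of the tracker.

```python
def backend_path(external_path: str) -> str:
    """Approximate the path the tracker receives for a request that hit nginx.

    Assumption: nginx removes exactly the leading "/api/" segment and passes
    everything else through unchanged. /api/agent/* is out of scope here.
    """
    prefix = "/api/"
    if external_path.startswith(prefix):
        return "/" + external_path[len(prefix):]
    return external_path


print(backend_path("/api/live"))         # stripped to /live
print(backend_path("/api-version"))      # hyphenated, not stripped
print(backend_path("/api-admin/users"))  # hyphenated, not stripped
```

This is why a backend route declared as `/api/something` would never be reachable through nginx: the tracker only ever sees `/something`.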
## Current Known Issues
## Frontend
### Minor Portal Race Conditions
* **Status**: ~11 duplicate key errors per minute (down from 600+)
* **Cause**: Multiple players discovering same portal simultaneously
* **Impact**: Minimal - errors are caught and handled gracefully
* **Handling**: Try/catch in code logs as debug messages and updates portal timestamp
* **Potential Fix**: PostgreSQL ON CONFLICT DO UPDATE (upsert pattern) would eliminate completely
- Source: `frontend/` (React 19 + Vite + TypeScript). Built output goes to `static/_build/`, then `deploy-frontend.sh` copies it into `static/`. **Running `bash deploy-frontend.sh` alone is the complete build+deploy flow** (it runs `npm run build` itself).
- Local dev: `cd frontend && npm run dev` (port 5173, `/api` proxied to localhost:8765).
- The React app's WebSocket URL is `/api/ws/live` (goes through nginx `location /api/`); the classic frontend uses `/ws/live` (through `location /`).
- Window components are routed by id prefix in `WindowRenderer.tsx`: `{prefix}-{charName}` (chat|stats|char|inv|radar|combat|combatpicker|issues|vitalsharing|queststatus|playerdash|agent|adminusers).
- `?view=dashboard` renders the fullscreen Player Dashboard (own tab, own WS connection per tab — by design).
- Map positions update from the 5 s `/live` HTTP poll; backend telemetry broadcasts have no `type` field so the WS telemetry branch in the frontend is inert.
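The `{prefix}-{charName}` window id convention above can be sketched as a small parser. The real routing lives in `WindowRenderer.tsx` (TypeScript) and may differ; this Python version and its function name are illustrative only, and it assumes every id carries a character name after the first hyphen.

```python
from typing import Optional, Tuple

# Prefixes as listed in the Frontend notes above.
KNOWN_PREFIXES = {
    "chat", "stats", "char", "inv", "radar", "combat", "combatpicker",
    "issues", "vitalsharing", "queststatus", "playerdash", "agent", "adminusers",
}


def parse_window_id(window_id: str) -> Optional[Tuple[str, str]]:
    """Split a window id into (prefix, char_name), or None if it does not fit."""
    # Split on the first hyphen only, so character names may contain hyphens.
    prefix, sep, char_name = window_id.partition("-")
    if not sep or not char_name or prefix not in KNOWN_PREFIXES:
        return None
    return prefix, char_name


print(parse_window_id("inv-Crazed n Dazed"))
print(parse_window_id("bogus-Someone"))
```

Splitting on the first hyphen works because none of the listed prefixes contains a hyphen.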
### Database Initialization Warnings
* **TimescaleDB Hypertable**: `telemetry_events` fails to become hypertable due to primary key constraint
* **Impact**: None - table works as regular PostgreSQL table
* **Warning**: "cannot create a unique index without the column 'timestamp'"
## Suitbuilder
### Connection Pool Under Load
* **Issue**: Database queries can timeout when connection pool is exhausted
* **Symptom**: Characters may not appear during high error load
* **Mitigation**: Portal error fix significantly reduced this issue
Production equipment-optimization engine (`inventory-service/suitbuilder.py`): multi-character search, armor set constraints, cantrip overlap detection, SSE streaming. UI at `/suitbuilder.html`. Architecture doc: `docs/plans/2026-02-09-suitbuilder-architecture.md`.
Known limitations: no slot-aware spell filtering, equal spell weighting. The legacy `/optimize/*` solver in inventory-service/main.py is a near-duplicate — `suitbuilder.py` is the production path.
## Equipment Suit Builder
## Deploying
### Status: PRODUCTION READY
See workspace `../CLAUDE.md` "Build & Deploy Instructions" — quick deploy (git pull + `docker compose restart dereth-tracker` for Python; nothing for static), `deploy-frontend.sh` for React, full `--no-cache` rebuild only for Dockerfile/pip/version-stamp changes. Bind mounts: `main.py`, `db_async.py`, `static/`, `alembic/` only.
Real-time equipment optimization engine for building optimal character loadouts by searching across multiple characters' inventories (mules). Uses Mag-SuitBuilder constraint satisfaction algorithms.
## Operational notes
**Core Features:**
- Multi-character inventory search across 100+ characters, 25,000+ items
- Armor set constraints (primary 5-piece + secondary 4-piece set support)
- Cantrip/ward spell optimization with bitmap-based overlap detection
- Crit damage rating optimization
- Locked slots with set/spell preservation across searches
- Real-time SSE streaming with progressive phase updates
- Suit summary with copy-to-clipboard functionality
- Stable deterministic sorting for reproducible results
**Access:** `/suitbuilder.html`
**Architecture Details:** See `docs/plans/2026-02-09-suitbuilder-architecture.md`
### Known Limitations
- Slot-aware spell filtering not yet implemented (e.g., underclothes have limited spell pools but system treats all slots equally)
- All spells weighted equally (no priority/importance weighting yet)
- See architecture doc for future enhancement roadmap
## Technical Notes for Development
### Database Performance
- Connection pool: 5-20 connections (configured in `db_async.py`)
- Under heavy error load, pool exhaustion can cause 2-minute query timeouts
- Portal error fix significantly improved database performance
### Docker Development Workflow
1. **Code Changes**: Edit source files locally
2. **Rebuild**: `docker compose build --no-cache dereth-tracker` (required for code changes)
3. **Deploy**: `docker compose up -d dereth-tracker`
4. **Debug**: `docker logs mosswartoverlord-dereth-tracker-1` and `docker logs dereth-db`
### Frontend Architecture
- **Main Map**: `static/index.html` - Real-time player tracking
- **Inventory Search**: `static/inventory.html` - Advanced item filtering
- **Suitbuilder**: `static/suitbuilder.html` - Equipment optimization interface
- **All static files**: Served directly by FastAPI StaticFiles
### DOM Optimization Status ✅ COMPLETE (September 2025)
* **Achievement**: 100% DOM element reuse with zero element creation after initial render
* **Performance**: ~5ms render time for 69 players, eliminated 4,140+ elements/minute creation
* **Implementation**: Element pooling system with player name mapping for O(1) lookup
* **Monitoring**: Color-coded console output (✨ green = optimized, ⚡ yellow = partial, 🔥 red = poor)
* **Status**: Production ready - achieving perfect element reuse consistently
**Current Render Stats**:
- ✅ This render: 0 dots created, 69 reused | 0 list items created, 69 reused
- ✅ Lifetime: 69 dots created, 800+ reused | 69 list items created, 800+ reused
**Remaining TODO**:
- ❌ Fix CSS Grid layout for player sidebar (deferred per user request)
- ❌ Extend optimization to trails and portal rendering
- ❌ Add memory usage tracking
### WebSocket Endpoints
- `/ws/position`: Plugin telemetry, inventory, portal, rare events (authenticated)
- `/ws/live`: Browser client commands and live updates (unauthenticated)
- Discord: rare bot posts rares + relays allegiance chat; **death/idle alerts come from the backend** via `DISCORD_ACLOG_WEBHOOK` (`_idle_detection_loop` in main.py).
- Issues board persists to a flat file `static/openissues.json` (web-served, bind-mounted).
- Server status (Coldeve) is polled via UDP every 30 s; TreeStats player count every 5 min.
- Debugging: `docker logs mosswartoverlord-dereth-tracker-1`, `docker logs dereth-db`. Read-only psql: `docker exec dereth-db psql -U postgres -d dereth`.
- This repo is **public** on git.snakedesert.se — never commit secrets (a Grafana token was leaked & removed June 2026; nginx `/grafana/` works via anonymous Viewer auth, no token needed). Grafana's container state DB is ephemeral (no volume) — don't create service accounts expecting them to persist.
---
@@ -223,4 +162,4 @@ SELECT timestamp, character_name, name FROM rare_events
- `combat_stats`, `combat_stats_sessions` — combat tracking
- `server_status` — current Coldeve game-server state (single row)
If asked about something not covered above, look in `db_async.py` for the schema or just try a query and report what you see.
If asked about something not covered above, look in `db_async.py` for the schema or just try a query and report what you see.