Commit graph

17 commits

Author SHA1 Message Date
Erik
52bf9342df feat: SHARED_SECRET_LEGACY migration escape hatch for plugin secret rollout
Accepts one legacy secret alongside the real one so existing clients keep
registering while game machines migrate to websocket_secret.txt. Remove
SHARED_SECRET_LEGACY from .env after the rollout.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 20:20:19 +02:00
Erik
a28b61511c security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups
- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses
  ALL plugin connections (constant-time compare). The old hardcoded
  'your_shared_secret' in this public repo was no auth at all. Dockerfile
  default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of
  falling back to a publicly-known signing key; agent systemd unit now
  requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every
  nginx-proxied internet request satisfied via docker-proxy — full session
  bypass and unauthenticated in-game command injection) with
  private-source AND no X-Forwarded-For, i.e. only genuinely internal
  callers (overlord-agent on the host, compose-network services). Invariant
  documented in nginx/overlord.conf: every tracker-bound location must set
  X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-
  reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs
  (telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
  TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-
sequencing blockers addressed (secret staged before enforcement, exec bit
set, cron uses bash).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:02:47 +02:00
Erik
7dc5996820 Fix PostgreSQL shared_buffers (was 96GB on 32GB host)
timescaledb-tune had configured shared_buffers=96396MB — three times the
physical RAM of the host. The kernel was giving PG everything it could
(~30GB of shared memory), leaving <100MB free for everything else.
This caused the OS page cache to be constantly evicted, every query to
hit disk, and telemetry writes to balloon to 20+ seconds.

New settings (standard 25/50 rule for 32GB):
- shared_buffers: 96GB → 8GB
- effective_cache_size: 16GB (query planner hint)
- work_mem: 16MB per operation
- maintenance_work_mem: 1GB (for vacuum/index)
- max_wal_size: 4GB

Requires a db container restart to take effect.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:11:35 +02:00
Erik
926e912c57 Server load optimization: spawn_events retention + log spam fix
Database cleanup:
- Converted spawn_events to a TimescaleDB hypertable with 7-day retention.
  Previously a regular table growing unbounded — had reached 482M rows/66GB
  from June 2025. Manual migration copied last 7 days (12M rows) to a new
  hypertable, swapped names, and dropped the old table.
  Result: DB shrunk from 77GB → 12GB.
- Dropped server_health_checks table entirely. It was write-only (850K rows,
  134MB) — only current state in server_status is actually read. Eliminated
  the insert from monitor_server_health().

Telemetry handler cleanup:
- Removed 4 per-message INFO log lines (TELEMETRY_RECEIVED, DB_WRITE_ATTEMPT,
  DB_WRITE_SUCCESS, PROCESSING_COMPLETE). At 60+ chars × every 2s = hundreds
  of log lines/sec. Replaced with single SLOW_* warnings above 500ms/1000ms
  thresholds.
- Removed redundant pool-size introspection (try/except + hasattr) on every
  telemetry message — useless noise in the hot path.
- Removed debug cache-miss and kill-delta logs.

Log level:
- docker-compose.yml: dereth-tracker LOG_LEVEL DEBUG → INFO (was dumping
  entire inventory_delta JSON payloads for every item update).
- inventory-service LOG_LEVEL DEBUG → INFO.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:08:51 +02:00
Erik
de2cc3a0e3 Add WS message filtering, idle grace period, webhook env var
- Browser WS clients can now send {"type": "subscribe", "message_types": [...]}
  to only receive specific message types. Default is all (no change for browsers).
- Discord bot subscribes to only "rare" and "chat" — eliminates 82GB+ of
  unnecessary telemetry/vitals/inventory traffic.
- Idle detection now has a 5-minute grace period before firing Discord alerts,
  preventing false positives on brief idle states.
- Added DISCORD_ACLOG_WEBHOOK env var to docker-compose.yml for death/idle alerts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:08:55 +02:00
Erik
b09169ade2 feat: add app-level authentication with login, session cookies, and admin panel
Replace Nginx basic auth with proper user accounts:
- Session cookies via itsdangerous (30-day expiry, httponly, secure)
- Password hashing with bcrypt via passlib
- Login page with AC-themed UI
- Admin page for user management (CRUD)
- AuthMiddleware exempts plugin WS and browser WS endpoints
- Issues/comments author auto-populated from session
- Sidebar shows logged-in username, admin link, and logout
- Seed users: erik (admin), alex, lundberg
- SECRET_KEY env var for cookie signing
2026-04-10 19:45:08 +02:00
erik
4d19e29847 major fixes for inventory 2025-07-02 10:29:36 +00:00
erik
dffd295091 added portals, quest tracking, discord monitor etc etc 2025-06-23 19:26:44 +00:00
erik
72de9b0f7f fixed server status uptime for coldeve 2025-06-21 21:00:23 +00:00
erik
57a2384511 added inventory service for armor and jewelry 2025-06-12 23:05:33 +00:00
erik
10c51f6825 added inventory, updated DB 2025-06-10 19:21:21 +00:00
erik
fdf9f04bc6 added grafana and minor fix 2025-06-08 09:05:43 +00:00
erik
d9b3b403da fixed crash and autorestart 2025-05-25 19:33:48 +00:00
erik
09404da121 new comments 2025-05-24 18:33:03 +00:00
erik
b2f649a489 New version with grafana 2025-05-23 08:11:11 +00:00
erik
c20d54d037 Refactor to async TimescaleDB backend & add Alembic migrations 2025-05-18 19:07:23 +00:00
erik
0313c2a2ae Added docker file 2025-05-15 07:43:17 +00:00