security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups

- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses
  ALL plugin connections (constant-time compare). The old hardcoded
  'your_shared_secret' in this public repo was no auth at all. Dockerfile
  default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of
  falling back to a publicly-known signing key; agent systemd unit now
  requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every
  nginx-proxied internet request satisfied via docker-proxy — full session
  bypass and unauthenticated in-game command injection) with
  private-source AND no X-Forwarded-For, i.e. only genuinely internal
  callers (overlord-agent on the host, compose-network services). Invariant
  documented in nginx/overlord.conf: every tracker-bound location must set
  X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-
  reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs
  (telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
  TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-
sequencing blockers addressed (secret staged before enforcement, exec bit
set, cron uses bash).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Erik 2026-06-10 17:02:47 +02:00
parent c6a1af0c39
commit a28b61511c
12 changed files with 261 additions and 2579 deletions

View file

@ -14,6 +14,12 @@
# WebSockets are long-lived; nginx's default 60s timeout drops idle clients.
# Removing these timeouts caused all plugin connections to drop every
# ~60s when no data flowed from backend to client (April 2026 incident).
# - SECURITY INVARIANT: every location that proxies to the `tracker`
# upstream MUST set proxy_set_header X-Forwarded-For. The backend treats
# a private-source request WITHOUT that header as internal (host/compose
# callers) and skips session auth — a tracker-bound location that forgot
# the header would silently bypass login for the whole internet. This
# includes any future port-80 or alternate server block.
# - /grafana/ panel embeds rely on Grafana's anonymous Viewer auth
# (GF_AUTH_ANONYMOUS_ENABLED=true in docker-compose.yml) — no credentials
# in this file. Do NOT hardcode tokens here: this file is committed to a