MosswartOverlord/docs/backups.md
Erik a28b61511c security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups
- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses
  ALL plugin connections (constant-time compare). The old hardcoded
  'your_shared_secret' in this public repo was no auth at all. Dockerfile
  default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of
  falling back to a publicly-known signing key; agent systemd unit now
  requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every
  nginx-proxied internet request satisfied via docker-proxy — full session
  bypass and unauthenticated in-game command injection) with
  private-source AND no X-Forwarded-For, i.e. only genuinely internal
  callers (overlord-agent on the host, compose-network services). Invariant
  documented in nginx/overlord.conf: every tracker-bound location must set
  X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-
  reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs
  (telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
  TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-
sequencing blockers addressed (secret staged before enforcement, exec bit
set, cron uses bash).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:02:47 +02:00


# Database backups
Nightly logical backups of both databases, taken by
[`scripts/backup-databases.sh`](../scripts/backup-databases.sh) via a cron
job on the live host (user `erik`, who is in the `docker` group — no sudo
needed). Install with:
```
# The log directory MUST exist before the first run: cron opens the log
# redirect before the script's own mkdir executes.
mkdir -p /home/erik/backups
crontab -e   # add the line below
15 3 * * * bash /home/erik/MosswartOverlord/scripts/backup-databases.sh >> /home/erik/backups/backup.log 2>&1
```
Dumps land in `/home/erik/backups/postgres/` as `dereth-YYYYMMDD-HHMM.dump`
and `inventory-YYYYMMDD-HHMM.dump` (pg_dump custom format, compressed,
mode 0600). Retention: ~8 days of dailies (`-mtime +7`), pruned by the
script itself only after a successful run. The nightly `backup.log` will
contain pg_dump circular-FK warnings about hypertable chunks — those are
normal; the canary to watch is the printed dump sizes (a healthy dereth
dump is ~50 MB, and the script aborts if it drops below 10 MB).
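
The size canary and prune-after-success behaviour described above could look roughly like this. It is a minimal sketch, not the contents of `scripts/backup-databases.sh`: the function names `dump_is_healthy` and `prune_old_dumps` are hypothetical, and the 10 MB threshold is assumed to mean 10 * 1024 * 1024 bytes.

```bash
# Assumption: "10 MB" is read as 10 MiB; adjust to match the real script.
MIN_BYTES=$((10 * 1024 * 1024))

# Succeeds only if the file exists and is at least MIN_BYTES long.
dump_is_healthy() {
  [ -f "$1" ] || return 1
  size=$(($(wc -c < "$1")))   # arithmetic expansion strips any padding from wc
  [ "$size" -ge "$MIN_BYTES" ]
}

# Deletes *.dump files directly inside the given directory that find's
# "-mtime +7" test matches. Relies on -maxdepth and -delete, which GNU
# and BSD find both provide.
prune_old_dumps() {
  find "$1" -maxdepth 1 -type f -name '*.dump' -mtime +7 -delete
}

# Hypothetical usage: prune only once tonight's dump has passed the canary.
# dump_is_healthy "$new_dump" && prune_old_dumps /home/erik/backups/postgres
```

Chaining the prune behind the size check means a failed or truncated dump never causes older, good dumps to be deleted.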
## What is and isn't included
- **dereth** (TimescaleDB): everything EXCEPT the row data of the
`telemetry_events` and `spawn_events` hypertables (their chunk data in
`_timescaledb_internal._hyper_*` is excluded). That data is ~12 GB and
expires through retention policies within 730 days anyway. The
irreplaceable tables (`users`, `char_stats`, `rare_stats`,
`rare_stats_sessions`, `rare_events`, `combat_stats`,
`combat_stats_sessions`, `portals`, `character_stats`, `server_status`)
are fully included. Table *schemas* for the excluded hypertables are
still dumped, so a restore recreates them empty.
- **inventory_db**: full dump (items, combat stats, enhancements, spells,
requirements, ratings, raw JSON).

⚠ The `_timescaledb_internal._hyper_*` exclusion drops the chunk data of
**every** hypertable, present and future. If an irreplaceable table is ever
converted to a hypertable (or a continuous aggregate is added), revisit the
exclusion list; otherwise its data silently disappears from backups.
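
One way to notice such a change is to compare the live list of hypertables against the two that are expected to be excluded. The sketch below is an illustration under assumptions: the function name `unexpected_hypertables` is hypothetical, the live database is assumed to be named `dereth`, and the query relies on the `timescaledb_information.hypertables` view from TimescaleDB 2.x.

```bash
# Hypertables whose row data is intentionally left out of the dump.
EXPECTED="telemetry_events spawn_events"

# Reads hypertable names (one per line) on stdin and prints every name
# that is not in EXPECTED. No output means the exclusion still only
# affects the two expected tables.
unexpected_hypertables() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    case " $EXPECTED " in
      *" $name "*) ;;                 # known and intentionally excluded
      *) printf '%s\n' "$name" ;;     # new hypertable: review the backup
    esac
  done
}

# Hypothetical usage on the live host (database name is an assumption):
# docker exec dereth-db psql -U postgres -d dereth -At \
#   -c "SELECT hypertable_name FROM timescaledb_information.hypertables;" \
#   | unexpected_hypertables
```

Any name this prints is a hypertable whose chunk data the current exclusion pattern would silently drop. Continuous aggregates are not covered by this check.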
## Off-host copies (recommended, not yet automated)
The dumps live on the same disk as the databases. Sync them off-host
periodically, e.g. from another machine:
```
rsync -av erik@overlord.snakedesert.se:backups/postgres/ ./overlord-backups/
```
## Restore
### inventory_db (plain Postgres)
```bash
docker exec -i inventory-db pg_restore -U inventory_user -d inventory_db --clean --if-exists < inventory-<stamp>.dump
```
### dereth (TimescaleDB — needs pre/post restore calls)
TimescaleDB requires putting the extension into restore mode around the
`pg_restore`; without it, restoring rows into TimescaleDB's own catalog
tables fails:
```bash
# 1. Create a fresh DB (or use --clean against the existing one)
docker exec dereth-db psql -U postgres -c "CREATE DATABASE dereth_restore;"
docker exec dereth-db psql -U postgres -d dereth_restore -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
# 2. Pre-restore mode
docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_pre_restore();"
# 3. Restore the dump
docker exec -i dereth-db pg_restore -U postgres -d dereth_restore --no-owner < dereth-<stamp>.dump
# 4. Post-restore mode (re-enables background workers, validates catalog)
docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_post_restore();"
```
Notes:
- Step 3 reports one ignorable error — the dump's `CREATE EXTENSION
timescaledb` collides with the extension pre-created in step 1
("already exists", `errors ignored on restore: 1`). That is expected,
not a failed restore.
- The TimescaleDB **version** at restore time must be the **same** as at
dump time (restore first, then `ALTER EXTENSION timescaledb UPDATE` if
upgrading). Same-container restores with the image pinned in
docker-compose.yml (`timescale/timescaledb:2.19.3-pg14`) are fine.

After the restore, either point `DATABASE_URL` at the restored DB or rename
the databases. The `telemetry_events`/`spawn_events` hypertables come back
empty (by design); retention/compression policies are part of the dump and
reattach.
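
Before switching over, it can help to confirm that the irreplaceable tables actually contain rows in the restored database. The following is a sketch, assuming the tables live in the default schema on the search path; `row_count_sql` is a hypothetical helper that only generates SQL, and the table list is copied from the section on what is included.

```bash
# Tables listed above as irreplaceable and fully included in the dump.
TABLES="users char_stats rare_stats rare_stats_sessions rare_events
combat_stats combat_stats_sessions portals character_stats server_status"

# Prints one SELECT per table that reports the table name and row count.
row_count_sql() {
  for t in $TABLES; do
    printf "SELECT '%s' AS table_name, count(*) AS row_count FROM %s;\n" "$t" "$t"
  done
}

# Hypothetical usage against the restored database from the steps above:
# row_count_sql | docker exec -i dereth-db psql -U postgres -d dereth_restore -At
```

A table that errors out or reports zero rows where data is expected is worth investigating before `DATABASE_URL` is repointed.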
## Verifying a backup
```bash
pg_restore --list dereth-<stamp>.dump | head # table of contents
pg_restore --list dereth-<stamp>.dump | grep -c 'TABLE DATA'
```
A dump that suddenly shrinks dramatically (check `backup.log` sizes) is the
canary for silent failure.
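
The table of contents can also be checked for specific tables rather than just counted. This sketch is an assumption-laden illustration: `missing_table_data` is a hypothetical helper, and it assumes `pg_restore --list` entries of the form `id; oid oid TABLE DATA schema name owner`, which is the layout shown in the PostgreSQL documentation.

```bash
# Tables whose row data must be present in a dereth dump.
REQUIRED="users char_stats rare_stats rare_stats_sessions rare_events
combat_stats combat_stats_sessions portals character_stats server_status"

# Reads `pg_restore --list` output on stdin and prints every REQUIRED
# table that has no "TABLE DATA" entry. No output means nothing is missing.
missing_table_data() {
  listing=$(cat)
  for t in $REQUIRED; do
    # Field 7 is the table name when fields 4 and 5 read "TABLE DATA".
    if ! printf '%s\n' "$listing" |
        awk -v t="$t" '$4 == "TABLE" && $5 == "DATA" && $7 == t { found = 1 }
                       END { exit !found }'; then
      printf '%s\n' "$t"
    fi
  done
}

# Hypothetical usage:
# pg_restore --list dereth-<stamp>.dump | missing_table_data
```

This catches a dump that is still large enough to pass the size canary but has lost the data of one specific table.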