MosswartOverlord/docs/backups.md
Erik a28b61511c security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups
- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses
  ALL plugin connections (constant-time compare). The old hardcoded
  'your_shared_secret' in this public repo was no auth at all. Dockerfile
  default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of
  falling back to a publicly-known signing key; agent systemd unit now
  requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every
  nginx-proxied internet request satisfied via docker-proxy — full session
  bypass and unauthenticated in-game command injection) with
  private-source AND no X-Forwarded-For, i.e. only genuinely internal
  callers (overlord-agent on the host, compose-network services). Invariant
  documented in nginx/overlord.conf: every tracker-bound location must set
  X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-
  reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs
  (telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
  TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-
sequencing blockers addressed (secret staged before enforcement, exec bit
set, cron uses bash).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:02:47 +02:00


Database backups

Nightly logical backups of both databases, taken by scripts/backup-databases.sh via a cron job on the live host (user erik, who is in the docker group — no sudo needed). Install with:

mkdir -p /home/erik/backups            # MUST exist before the first run —
                                       # cron opens the log redirect before
                                       # the script's own mkdir executes
crontab -e                             # add the line below
15 3 * * * bash /home/erik/MosswartOverlord/scripts/backup-databases.sh >> /home/erik/backups/backup.log 2>&1

Dumps land in /home/erik/backups/postgres/ as dereth-YYYYMMDD-HHMM.dump and inventory-YYYYMMDD-HHMM.dump (pg_dump custom format, compressed, mode 0600). Retention is ~8 days of dailies (-mtime +7); the script prunes old dumps itself, and only after a successful run. The nightly backup.log will contain pg_dump circular-FK warnings about hypertable chunks; those are normal. The canary to watch is the printed dump sizes: a healthy dereth dump is ~50 MB, and the script aborts if it drops below 10 MB.
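The size canary and the prune-only-after-success rule can be sketched roughly as follows. This is a hypothetical illustration, not the contents of scripts/backup-databases.sh: the function names dump_is_healthy and prune_old_dumps and the use of find are assumptions. Only the 10 MB threshold and the -mtime +7 window come from this document.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the actual scripts/backup-databases.sh.

MIN_BYTES=$((10 * 1024 * 1024))   # 10 MB canary threshold quoted above

# Succeeds only if the file exists and is at least min_bytes large.
dump_is_healthy() {
    local file=$1 min_bytes=${2:-$MIN_BYTES} size
    [ -f "$file" ] || return 1
    size=$(wc -c < "$file") || return 1
    size=$((size))                # normalise any padding around the number
    [ "$size" -ge "$min_bytes" ]
}

# Deletes *.dump files in dir whose age exceeds the retention window.
# Assumes a find that supports -maxdepth and -delete (GNU or BSD).
prune_old_dumps() {
    local dir=$1
    find "$dir" -maxdepth 1 -type f -name '*.dump' -mtime +7 -delete
}
```

A caller would run dump_is_healthy on each fresh dump and call prune_old_dumps only when every check passed, so a failed night never deletes the last good backups.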

What is and isn't included

  • dereth (TimescaleDB): everything EXCEPT the row data of the telemetry_events and spawn_events hypertables (their chunk data in _timescaledb_internal._hyper_* is excluded). That data is ~12 GB and expires through retention policies within 730 days anyway. The irreplaceable tables — users, char_stats, rare_stats, rare_stats_sessions, rare_events, combat_stats, combat_stats_sessions, portals, character_stats, server_status — are fully included. Table schemas for the excluded hypertables are still dumped, so a restore recreates them empty.
  • inventory_db: full dump (items, combat stats, enhancements, spells, requirements, ratings, raw JSON).

⚠ The _timescaledb_internal._hyper_* exclusion drops the chunk data of every hypertable, present and future. If an irreplaceable table is ever converted to a hypertable (or a continuous aggregate is added), revisit the exclusion list — otherwise its data silently disappears from backups.
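One way to catch that situation early is to compare the live list of hypertables against the two that are meant to be excluded. The sketch below is an assumption-laden illustration: the function name unexpected_hypertables is hypothetical, and the commented psql call assumes the database is named dereth and is reachable as user postgres. It does not detect continuous aggregates, which would need a separate check.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.

# Hypertables whose chunk data is intentionally left out of the dump.
EXPECTED_HYPERTABLES="telemetry_events spawn_events"

# Reads hypertable names (one per line) on stdin and prints every name
# that is not in EXPECTED_HYPERTABLES. No output means nothing unexpected.
unexpected_hypertables() {
    local name
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        case " $EXPECTED_HYPERTABLES " in
            *" $name "*) ;;                    # expected, ignore
            *) printf '%s\n' "$name" ;;        # would silently lose its data
        esac
    done
}

# Example wiring (database name "dereth" is an assumption):
# docker exec dereth-db psql -U postgres -d dereth -Atc \
#   "SELECT hypertable_name FROM timescaledb_information.hypertables" \
#   | unexpected_hypertables
```

Any name printed is a hypertable whose rows are currently missing from the nightly dump.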

The dumps live on the same disk as the databases. Sync them off-host periodically, e.g. from another machine:

rsync -av erik@overlord.snakedesert.se:backups/postgres/ ./overlord-backups/

Restore

inventory_db (plain Postgres)

docker exec -i inventory-db pg_restore -U inventory_user -d inventory_db --clean --if-exists < inventory-<stamp>.dump

dereth (TimescaleDB — needs pre/post restore calls)

TimescaleDB must be put into restore mode around the pg_restore (timescaledb_pre_restore() before, timescaledb_post_restore() after); otherwise restoring its catalog rows fails:

# 1. Create a fresh DB (or use --clean against the existing one)
docker exec dereth-db psql -U postgres -c "CREATE DATABASE dereth_restore;"
docker exec dereth-db psql -U postgres -d dereth_restore -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"

# 2. Pre-restore mode
docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_pre_restore();"

# 3. Restore the dump
docker exec -i dereth-db pg_restore -U postgres -d dereth_restore --no-owner < dereth-<stamp>.dump

# 4. Post-restore mode (re-enables background workers, validates catalog)
docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_post_restore();"

Notes:

  • Step 3 reports one ignorable error — the dump's CREATE EXTENSION timescaledb collides with the extension pre-created in step 1 ("already exists", errors ignored on restore: 1). That is expected, not a failed restore.
  • The TimescaleDB version at restore time must be the same as at dump time (restore first, then ALTER EXTENSION timescaledb UPDATE if upgrading). Same-container restores with the image pinned in docker-compose.yml (timescale/timescaledb:2.19.3-pg14) are fine.

Once step 4 has completed, either point DATABASE_URL at the restored DB or rename the databases. The telemetry_events/spawn_events hypertables come back empty (by design); retention/compression policies are part of the dump and reattach.

Verifying a backup

pg_restore --list dereth-<stamp>.dump | head      # table of contents
pg_restore --list dereth-<stamp>.dump | grep -c 'TABLE DATA'

A dump that suddenly shrinks dramatically (check backup.log sizes) is the canary for silent failure.
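Beyond eyeballing sizes, the table-of-contents listing can be checked for a TABLE DATA entry per irreplaceable table. The sketch below is hypothetical (the function name missing_table_data is not from the repository) and assumes the usual pg_restore --list line layout of "id; oid oid TABLE DATA schema name owner"; adjust the field positions if the listing differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the repository.

# Reads a `pg_restore --list` table of contents on stdin and prints each
# table named in the arguments that has no TABLE DATA entry.
missing_table_data() {
    local toc table
    toc=$(cat)
    for table in "$@"; do
        printf '%s\n' "$toc" |
            awk -v t="$table" '
                $4 == "TABLE" && $5 == "DATA" && $7 == t { found = 1 }
                END { exit !found }
            ' || printf '%s\n' "$table"
    done
}

# Example wiring, using table names from the list earlier in this document:
# pg_restore --list dereth-<stamp>.dump | missing_table_data users portals rare_events
```

Empty output means every listed table has row data in the dump; any printed name deserves a look before the dump is trusted.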