security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups
- SHARED_SECRET now read from env and fail-closed: unset/placeholder
  refuses ALL plugin connections (constant-time compare). The old
  hardcoded 'your_shared_secret' in this public repo was no auth at all.
  Dockerfile default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead
  of falling back to a publicly-known signing key; agent systemd unit now
  requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which
  every nginx-proxied internet request satisfied via docker-proxy: full
  session bypass and unauthenticated in-game command injection) with
  private-source AND no X-Forwarded-For, i.e. only genuinely internal
  callers (overlord-agent on the host, compose-network services).
  Invariant documented in nginx/overlord.conf: every tracker-bound
  location must set X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real
  rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were
  internet-reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both
  DBs (telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
  TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict;
deploy-sequencing blockers addressed (secret staged before enforcement,
exec bit set, cron uses bash).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
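
For readers skimming the message, the two auth rules it describes (fail-closed shared secret with a constant-time compare, and "private source AND no X-Forwarded-For") can be sketched roughly as below. This is an illustration only, not the code in this commit: the function names `plugin_secret_ok` and `is_internal_caller`, the placeholder set, and the use of Python's `ipaddress.is_private` to define "private source" are all assumptions.

```python
import hmac
import ipaddress
from typing import Mapping, Optional

# Assumption: values treated as "not configured". The commit message only
# names the old hardcoded 'your_shared_secret'.
PLACEHOLDER_SECRETS = {"", "your_shared_secret"}


def plugin_secret_ok(configured: Optional[str], presented: str) -> bool:
    """Fail closed: an unset or placeholder secret rejects every connection."""
    if configured is None or configured in PLACEHOLDER_SECRETS:
        return False
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(configured.encode(), presented.encode())


def is_internal_caller(client_ip: str, headers: Mapping[str, str]) -> bool:
    """Trust only private-source requests that carry no X-Forwarded-For."""
    # Header names are case-insensitive, so normalise before checking.
    if "x-forwarded-for" in {name.lower() for name in headers}:
        return False  # the header is set by the proxy, so this is external
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable peer address: do not trust it
    # is_private covers 10/8, 172.16/12, 192.168/16 and loopback.
    return addr.is_private
```

The second check only holds if every proxied location really sets `X-Forwarded-For`, which is the invariant the message says is documented in nginx/overlord.conf.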
parent c6a1af0c39
commit a28b61511c
12 changed files with 261 additions and 2579 deletions
102
docs/backups.md
Normal file
@@ -0,0 +1,102 @@
# Database backups

Nightly logical backups of both databases, taken by
[`scripts/backup-databases.sh`](../scripts/backup-databases.sh) via a cron
job on the live host (user `erik`, who is in the `docker` group, so no sudo
is needed). Install with:

```
mkdir -p /home/erik/backups   # MUST exist before the first run:
                              # cron opens the log redirect before
                              # the script's own mkdir executes
crontab -e                    # add the line below
15 3 * * * bash /home/erik/MosswartOverlord/scripts/backup-databases.sh >> /home/erik/backups/backup.log 2>&1
```

Dumps land in `/home/erik/backups/postgres/` as `dereth-YYYYMMDD-HHMM.dump`
and `inventory-YYYYMMDD-HHMM.dump` (pg_dump custom format, compressed,
mode 0600). Retention: ~8 days of dailies (`-mtime +7`), pruned by the
script itself only after a successful run. The nightly `backup.log` will
contain pg_dump circular-FK warnings about hypertable chunks. Those are
normal; the canary to watch is the printed dump sizes (a healthy dereth
dump is ~50 MB, and the script aborts if it drops below 10 MB).
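
The size canary and the `-mtime +7` retention rule can be expressed as two small checks. The sketch below is a Python illustration of the logic only, not a transcription of `backup-databases.sh`; the function names and the `*.dump` glob are assumptions, while the 10 MB threshold and the 7-day cutoff come from the paragraph above.

```python
import time
from pathlib import Path
from typing import List, Optional

MIN_DUMP_BYTES = 10 * 1024 * 1024  # 10 MB canary threshold from the text
RETENTION_DAYS = 7                 # mirrors `find -mtime +7`


def dump_is_healthy(dump: Path, min_bytes: int = MIN_DUMP_BYTES) -> bool:
    """A dump passes the canary if it exists and is at least min_bytes."""
    return dump.is_file() and dump.stat().st_size >= min_bytes


def expired_dumps(backup_dir: Path, now: Optional[float] = None,
                  retention_days: int = RETENTION_DAYS) -> List[Path]:
    """Return the *.dump files that `find -mtime +N` would match."""
    now = time.time() if now is None else now
    expired = []
    for dump in sorted(backup_dir.glob("*.dump")):
        # find counts whole 24-hour periods and discards the fraction,
        # so `+7` only matches files that are at least 8 full days old.
        age_days = int((now - dump.stat().st_mtime) // 86400)
        if age_days > retention_days:
            expired.append(dump)
    return expired
```

To match the "pruned only after a successful run" behaviour, a caller would delete the result of `expired_dumps` only once `dump_is_healthy` has passed for the new dumps.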

## What is and isn't included

- **dereth** (TimescaleDB): everything EXCEPT the row data of the
  `telemetry_events` and `spawn_events` hypertables (their chunk data in
  `_timescaledb_internal._hyper_*` is excluded). That data is ~12 GB and
  expires through retention policies within 7–30 days anyway. The
  irreplaceable tables (`users`, `char_stats`, `rare_stats`,
  `rare_stats_sessions`, `rare_events`, `combat_stats`,
  `combat_stats_sessions`, `portals`, `character_stats`, `server_status`)
  are fully included. Table *schemas* for the excluded hypertables are
  still dumped, so a restore recreates them empty.
- **inventory_db**: full dump (items, combat stats, enhancements, spells,
  requirements, ratings, raw JSON).

⚠ The `_timescaledb_internal._hyper_*` exclusion drops the chunk data of
**every** hypertable, present and future. If an irreplaceable table is ever
converted to a hypertable (or a continuous aggregate is added), revisit the
exclusion list. Otherwise its data silently disappears from backups.
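
One way to catch that situation early is to compare the hypertables that actually exist against the two that are meant to be excluded. The sketch below assumes the names are fetched separately, for example with `SELECT hypertable_name FROM timescaledb_information.hypertables;`. The function name and the idea of running such a check are illustrative and not part of the backup script.

```python
from typing import Iterable, List

# The two hypertables whose chunk data is deliberately left out of the dump.
EXPECTED_HYPERTABLES = frozenset({"telemetry_events", "spawn_events"})


def unexpected_hypertables(
    found: Iterable[str],
    expected: Iterable[str] = EXPECTED_HYPERTABLES,
) -> List[str]:
    """Return hypertables whose data the blanket exclusion would also drop."""
    expected_set = set(expected)
    return sorted(name for name in set(found) if name not in expected_set)
```

A non-empty result means some table's rows are being skipped without anyone having decided that, so the exclusion list needs a second look.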

## Off-host copies (recommended, not yet automated)

The dumps live on the same disk as the databases. Sync them off-host
periodically, e.g. from another machine:

```
rsync -av erik@overlord.snakedesert.se:backups/postgres/ ./overlord-backups/
```

## Restore

### inventory_db (plain Postgres)

```bash
docker exec -i inventory-db pg_restore -U inventory_user -d inventory_db --clean --if-exists < inventory-<stamp>.dump
```

### dereth (TimescaleDB, needs pre/post restore calls)

TimescaleDB requires putting the extension into restore mode around the
`pg_restore`; otherwise restoring its catalog rows fails:

```bash
# 1. Create a fresh DB (or use --clean against the existing one)
docker exec dereth-db psql -U postgres -c "CREATE DATABASE dereth_restore;"
docker exec dereth-db psql -U postgres -d dereth_restore -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"

# 2. Pre-restore mode
docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_pre_restore();"

# 3. Restore the dump
docker exec -i dereth-db pg_restore -U postgres -d dereth_restore --no-owner < dereth-<stamp>.dump

# 4. Post-restore mode (re-enables background workers, validates catalog)
docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_post_restore();"
```

Notes:
- Step 3 reports one ignorable error: the dump's
  `CREATE EXTENSION timescaledb` collides with the extension pre-created
  in step 1 ("already exists", `errors ignored on restore: 1`). That is
  expected, not a failed restore.
- The TimescaleDB **version** at restore time must be the **same** as at
  dump time (restore first, then `ALTER EXTENSION timescaledb UPDATE` if
  upgrading). Same-container restores with the image pinned in
  docker-compose.yml (`timescale/timescaledb:2.19.3-pg14`) are fine.

Then either point `DATABASE_URL` at the restored DB or rename the databases.
The `telemetry_events`/`spawn_events` hypertables come back empty (by
design); retention/compression policies are part of the dump and reattach.
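
If the dereth restore is ever scripted, the ordering matters more than anything else: create, pre-restore, restore, post-restore. The sketch below only builds the argument lists for the commands shown above, so the sequence can be reviewed or logged before anything runs. The function name and its parameters are assumptions; the container, user and database defaults are the ones used in this document.

```python
from typing import List, Optional


def dereth_restore_commands(target_db: str = "dereth_restore",
                            container: str = "dereth-db",
                            user: str = "postgres") -> List[List[str]]:
    """Return the docker/psql/pg_restore argv lists in the required order."""

    def psql(database: Optional[str], sql: str) -> List[str]:
        cmd = ["docker", "exec", container, "psql", "-U", user]
        if database is not None:
            cmd += ["-d", database]
        return cmd + ["-c", sql]

    return [
        # 1. Fresh database with the extension pre-created.
        psql(None, f"CREATE DATABASE {target_db};"),
        psql(target_db, "CREATE EXTENSION IF NOT EXISTS timescaledb;"),
        # 2. Put TimescaleDB into restore mode.
        psql(target_db, "SELECT timescaledb_pre_restore();"),
        # 3. pg_restore reads the dump from stdin, hence `-i`.
        ["docker", "exec", "-i", container, "pg_restore",
         "-U", user, "-d", target_db, "--no-owner"],
        # 4. Leave restore mode.
        psql(target_db, "SELECT timescaledb_post_restore();"),
    ]
```

A runner would feed the dump file to step 3 on stdin and should not treat that step's exit status as fatal on its own, since the expected "already exists" error from the notes above is reported there.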

## Verifying a backup

```bash
pg_restore --list dereth-<stamp>.dump | head    # table of contents
pg_restore --list dereth-<stamp>.dump | grep -c 'TABLE DATA'
```

A dump that suddenly shrinks dramatically (check `backup.log` sizes) is the
canary for silent failure.
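
Beyond counting `TABLE DATA` entries, the listing can be checked for specific tables. The sketch below parses text captured from `pg_restore --list` and reports which required tables have no data entry. It assumes each entry ends in `TABLE DATA <schema> <name> <owner>`, which is worth confirming against a real listing; the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def table_data_entries(listing: str) -> List[Tuple[str, str]]:
    """Extract (schema, table) pairs from the TABLE DATA lines of a listing."""
    entries = []
    for line in listing.splitlines():
        if line.startswith(";"):
            continue  # header/comment lines of the table of contents
        _, marker, rest = line.partition(" TABLE DATA ")
        if not marker:
            continue  # some other entry type (TABLE, INDEX, ...)
        fields = rest.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def missing_tables(listing: str, required: Iterable[str],
                   schema: str = "public") -> List[str]:
    """Return required tables that have no TABLE DATA entry in the dump."""
    present = {name for s, name in table_data_entries(listing) if s == schema}
    return sorted(set(required) - present)
```

Feeding it the irreplaceable table names listed earlier gives a more targeted check than the raw count, because a dump can keep the same number of entries while losing the one table that matters.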