security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups

- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses ALL plugin connections (constant-time compare). The old hardcoded 'your_shared_secret' in this public repo was no auth at all. Dockerfile default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of falling back to a publicly-known signing key; agent systemd unit now requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every nginx-proxied internet request satisfied via docker-proxy: full session bypass and unauthenticated in-game command injection) with private-source AND no X-Forwarded-For, i.e. only genuinely internal callers (overlord-agent on the host, compose-network services). Invariant documented in nginx/overlord.conf: every tracker-bound location must set X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs (telemetry/spawn hypertable data excluded), 10MB canary, umask 077, TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-sequencing blockers addressed (secret staged before enforcement, exec bit set, cron uses bash).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent c6a1af0c39
commit a28b61511c
12 changed files with 261 additions and 2579 deletions
@@ -42,7 +42,7 @@ Dereth Tracker is a real-time telemetry platform for Asheron's Call world tracki
 - Connection pool: `min_size=5, max_size=100, command_timeout=120` (`db_async.py:21`). Postgres `max_connections` is the default 100, shared with Grafana and the agent's read-only role — don't widen the pool further.
 - Persisted event types: telemetry, spawn, rare, portal, character_stats, combat_stats. Everything else (vitals, quest, cantrips, nearby_objects, dungeon_map, share_*) is memory-only.
 - Read-only agent role `overlord_agent_ro` is provisioned manually via `agent/sql/0001_overlord_agent_ro.sql` (SELECT-only).
-- There is **no backup mechanism** — durability is the two Docker volumes (`timescale-data`, `inventory-data`).
+- Backups: nightly cron on the host runs `scripts/backup-databases.sh` (pg_dump both DBs to `/home/erik/backups/postgres/`, 7-day retention; telemetry/spawn hypertable data deliberately excluded). Restore procedure: `docs/backups.md` — TimescaleDB needs `timescaledb_pre_restore()/post_restore()`.
 - `db.py` is a dead legacy SQLite layer — nothing imports it. All persistence goes through `db_async.py`.

 ## Route conventions
File diff suppressed because it is too large
@@ -36,13 +36,14 @@ ARG BUILD_VERSION=dev
 ENV APP_VERSION=$BUILD_VERSION

 ## Default environment variables for application configuration
+## NOTE: no SHARED_SECRET default here on purpose — main.py fails closed
+## (refuses plugin connections) unless a real value arrives via compose/.env.
 ENV DATABASE_URL=postgresql://postgres:password@db:5432/dereth \
     DB_MAX_SIZE_MB=2048 \
     DB_RETENTION_DAYS=7 \
     DB_MAX_SQL_LENGTH=1000000000 \
     DB_MAX_SQL_VARIABLES=32766 \
-    DB_WAL_AUTOCHECKPOINT_PAGES=1000 \
-    SHARED_SECRET=your_shared_secret
+    DB_WAL_AUTOCHECKPOINT_PAGES=1000

 ## Launch the FastAPI app using Uvicorn
 CMD ["uvicorn","main:app","--host","0.0.0.0","--port","8765","--workers","1","--no-access-log","--log-level","warning"]
@@ -12,8 +12,15 @@ import os
 from fastapi import HTTPException, Request, status
 from itsdangerous import BadSignature, SignatureExpired, URLSafeTimedSerializer

-# Mirror main.py:996-998
-SECRET_KEY = os.getenv("SECRET_KEY", "change-me-in-production-please")
+# Mirror main.py — and fail closed like it does: starting with a known
+# default key would let anyone forge a valid session cookie.
+SECRET_KEY = os.getenv("SECRET_KEY", "")
+if not SECRET_KEY or SECRET_KEY == "change-me-in-production-please":
+    raise RuntimeError(
+        "SECRET_KEY env var must be set (shared with dereth-tracker; see "
+        "/etc/overlord/agent.env) — refusing to start with a forgeable "
+        "session-signing key"
+    )
 SESSION_MAX_AGE = 30 * 24 * 3600  # 30 days
 _serializer = URLSafeTimedSerializer(SECRET_KEY)

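Both fail-closed checks in this commit reject an empty value or the old placeholder, so the deployment needs freshly generated secrets. A minimal sketch of producing and pre-checking them, assuming the standard-library `secrets` module is acceptable for this purpose; the helper names `new_secret` and `is_usable_secret` are hypothetical and not part of the repository:

```python
import secrets

# Placeholder values that the diff treats as "not configured".
PLACEHOLDERS = {"", "your_shared_secret", "change-me-in-production-please"}


def new_secret(nbytes: int = 32) -> str:
    """Return a URL-safe random token drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(nbytes)


def is_usable_secret(value: str) -> bool:
    """Approximate the fail-closed checks: empty or placeholder values are rejected."""
    return value not in PLACEHOLDERS


if __name__ == "__main__":
    # Lines suitable for pasting into an env file such as /etc/overlord/agent.env.
    print(f"SECRET_KEY={new_secret()}")
    print(f"SHARED_SECRET={new_secret()}")
```

Note that the real code checks each variable against its own placeholder only; the combined set here is a simplification for illustration.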
@@ -20,8 +20,10 @@ WorkingDirectory=/home/erik/MosswartOverlord
 # HOME explicitly set so claude reads /var/lib/overlord-agent/.claude/*
 # instead of trying /home/erik/.claude/* (which is now 0700, locked out).
 Environment="HOME=/var/lib/overlord-agent"
-# Secrets file (root:overlord-agent 0640).
-EnvironmentFile=-/etc/overlord/agent.env
+# Secrets file (root:overlord-agent 0640). REQUIRED (no leading '-'):
+# a missing secrets file must abort startup, not fail open — auth.py also
+# refuses to start without SECRET_KEY.
+EnvironmentFile=/etc/overlord/agent.env
 # Run inside the venv populated by install.sh.
 ExecStart=/home/erik/MosswartOverlord/agent/.venv/bin/python -m agent.service
 Restart=on-failure
@@ -34,7 +34,6 @@ logger = logging.getLogger(__name__)
 # Configuration from environment variables
 DISCORD_TOKEN = os.getenv('DISCORD_RARE_BOT_TOKEN')
 WEBSOCKET_URL = os.getenv('DERETH_TRACKER_WS_URL', 'ws://dereth-tracker:8765/ws/live')
-SHARED_SECRET = 'your_shared_secret'
 ACLOG_CHANNEL_ID = int(os.getenv('ACLOG_CHANNEL_ID', '1349649482786275328'))
 COMMON_RARE_CHANNEL_ID = int(os.getenv('COMMON_RARE_CHANNEL_ID', '1355328792184226014'))
 GREAT_RARE_CHANNEL_ID = int(os.getenv('GREAT_RARE_CHANNEL_ID', '1353676584334131211'))
@@ -62,7 +62,11 @@ services:
     volumes:
       - timescale-data:/var/lib/postgresql/data
     ports:
-      - "5432:5432"
+      # Loopback only — Docker-published ports bypass ufw, and this host is
+      # internet-facing (active brute-force on the open port observed June
+      # 2026). In-stack consumers use the compose network; host-side tools
+      # (psql, overlord-agent) use 127.0.0.1.
+      - "127.0.0.1:5432:5432"
     restart: unless-stopped
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
@@ -104,7 +108,8 @@ services:
     volumes:
       - inventory-data:/var/lib/postgresql/data
     ports:
-      - "5433:5432"
+      # Loopback only — see db service note.
+      - "127.0.0.1:5433:5432"
     restart: unless-stopped
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U inventory_user"]
docs/backups.md (Normal file, 102 changes)
@@ -0,0 +1,102 @@
+# Database backups
+
+Nightly logical backups of both databases, taken by
+[`scripts/backup-databases.sh`](../scripts/backup-databases.sh) via a cron
+job on the live host (user `erik`, who is in the `docker` group — no sudo
+needed). Install with:
+
+```
+mkdir -p /home/erik/backups        # MUST exist before the first run —
+                                   # cron opens the log redirect before
+                                   # the script's own mkdir executes
+crontab -e                         # add the line below
+15 3 * * * bash /home/erik/MosswartOverlord/scripts/backup-databases.sh >> /home/erik/backups/backup.log 2>&1
+```
+
+Dumps land in `/home/erik/backups/postgres/` as `dereth-YYYYMMDD-HHMM.dump`
+and `inventory-YYYYMMDD-HHMM.dump` (pg_dump custom format, compressed,
+mode 0600). Retention: ~8 days of dailies (`-mtime +7`), pruned by the
+script itself only after a successful run. The nightly `backup.log` will
+contain pg_dump circular-FK warnings about hypertable chunks — those are
+normal; the canary to watch is the printed dump sizes (a healthy dereth
+dump is ~50 MB, and the script aborts if it drops below 10 MB).
+
+## What is and isn't included
+
+- **dereth** (TimescaleDB): everything EXCEPT the row data of the
+  `telemetry_events` and `spawn_events` hypertables (their chunk data in
+  `_timescaledb_internal._hyper_*` is excluded). That data is ~12 GB and
+  expires through retention policies within 7–30 days anyway. The
+  irreplaceable tables — `users`, `char_stats`, `rare_stats`,
+  `rare_stats_sessions`, `rare_events`, `combat_stats`,
+  `combat_stats_sessions`, `portals`, `character_stats`, `server_status` —
+  are fully included. Table *schemas* for the excluded hypertables are
+  still dumped, so a restore recreates them empty.
+- **inventory_db**: full dump (items, combat stats, enhancements, spells,
+  requirements, ratings, raw JSON).
+
+⚠ The `_timescaledb_internal._hyper_*` exclusion drops the chunk data of
+**every** hypertable, present and future. If an irreplaceable table is ever
+converted to a hypertable (or a continuous aggregate is added), revisit the
+exclusion list — otherwise its data silently disappears from backups.
+
+## Off-host copies (recommended, not yet automated)
+
+The dumps live on the same disk as the databases. Sync them off-host
+periodically, e.g. from another machine:
+
+```
+rsync -av erik@overlord.snakedesert.se:backups/postgres/ ./overlord-backups/
+```
+
+## Restore
+
+### inventory_db (plain Postgres)
+
+```bash
+docker exec -i inventory-db pg_restore -U inventory_user -d inventory_db --clean --if-exists < inventory-<stamp>.dump
+```
+
+### dereth (TimescaleDB — needs pre/post restore calls)
+
+TimescaleDB requires putting the extension into restore mode around the
+`pg_restore`, otherwise catalog rows fail:
+
+```bash
+# 1. Create a fresh DB (or use --clean against the existing one)
+docker exec dereth-db psql -U postgres -c "CREATE DATABASE dereth_restore;"
+docker exec dereth-db psql -U postgres -d dereth_restore -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
+
+# 2. Pre-restore mode
+docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_pre_restore();"
+
+# 3. Restore the dump
+docker exec -i dereth-db pg_restore -U postgres -d dereth_restore --no-owner < dereth-<stamp>.dump
+
+# 4. Post-restore mode (re-enables background workers, validates catalog)
+docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_post_restore();"
+```
+
+Notes:
+- Step 3 reports one ignorable error — the dump's `CREATE EXTENSION
+  timescaledb` collides with the extension pre-created in step 1
+  ("already exists", `errors ignored on restore: 1`). That is expected,
+  not a failed restore.
+- The TimescaleDB **version** at restore time must be the **same** as at
+  dump time (restore first, then `ALTER EXTENSION timescaledb UPDATE` if
+  upgrading). Same-container restores with the image pinned in
+  docker-compose.yml (`timescale/timescaledb:2.19.3-pg14`) are fine.
+
+Then either point `DATABASE_URL` at the restored DB or rename databases.
+The `telemetry_events`/`spawn_events` hypertables come back empty (by
+design); retention/compression policies are part of the dump and reattach.
+
+## Verifying a backup
+
+```bash
+pg_restore --list dereth-<stamp>.dump | head        # table of contents
+pg_restore --list dereth-<stamp>.dump | grep -c 'TABLE DATA'
+```
+
+A dump that suddenly shrinks dramatically (check `backup.log` sizes) is the
+canary for silent failure.
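The "sudden shrink" canary described in docs/backups.md is checked by eye in `backup.log`. A minimal sketch of how the same comparison could be automated, assuming dumps follow the documented `dereth-YYYYMMDD-HHMM.dump` naming; the function names and the 0.5 ratio are hypothetical choices, not values from the repository:

```python
from pathlib import Path
from typing import List


def dump_sizes(backup_dir: str, prefix: str = "dereth") -> List[int]:
    """Sizes in bytes of <prefix>-*.dump files, oldest first.

    The file names embed a sortable UTC stamp, so a lexical sort
    is also a chronological sort.
    """
    files = sorted(Path(backup_dir).glob(f"{prefix}-*.dump"))
    return [f.stat().st_size for f in files]


def newest_shrank(sizes: List[int], min_ratio: float = 0.5) -> bool:
    """True when the newest dump is smaller than min_ratio times the previous one."""
    if len(sizes) < 2 or sizes[-2] == 0:
        return False  # nothing to compare against yet
    return sizes[-1] < min_ratio * sizes[-2]


if __name__ == "__main__":
    sizes = dump_sizes("/home/erik/backups/postgres")
    print("shrink detected" if newest_shrank(sizes) else "sizes look consistent")
```

This complements the script's absolute 10 MB floor: a relative check would also flag a dump that is still above the floor but has lost most of its content.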
@@ -7,6 +7,7 @@ fabricated TelemetrySnapshot payloads at regular intervals. Useful for:
 - Demonstrating real-time map updates without a live game client
 """
 import asyncio  # Async event loop and sleep support
+import os
 import websockets  # WebSocket client for Python
 import json  # JSON serialization of payloads
 from datetime import datetime, timedelta, timezone
@@ -32,8 +33,10 @@ async def main() -> None:
     # Starting coordinates (E/W and N/S)
     ew = 0.0
     ns = 0.0
-    # WebSocket endpoint for plugin telemetry (include secret for auth)
-    uri = "ws://localhost:8000/ws/position?secret=your_shared_secret"
+    # WebSocket endpoint for plugin telemetry. The secret must match the
+    # backend's SHARED_SECRET env var (no insecure default anymore).
+    secret = os.environ["SHARED_SECRET"]
+    uri = f"ws://localhost:8000/ws/position?secret={secret}"
     # Connect to the plugin WebSocket endpoint with authentication
     # Establish WebSocket connection to the server
     async with websockets.connect(uri) as websocket:
main.py (100 changes)
@@ -8,7 +8,9 @@ endpoints for browser clients to retrieve live and historical data, trails, and

 from collections import defaultdict
 from datetime import datetime, timedelta, timezone
+import hmac
 import html as _html
+import ipaddress
 import json
 import logging
 import os
@@ -990,10 +992,25 @@ live_equipment_cantrip_states: Dict[str, dict] = {}
 live_nearby_objects: Dict[str, dict] = {}
 dungeon_map_cache: Dict[str, dict] = {}  # landblock hex string -> dungeon map data

-# Shared secret used to authenticate plugin WebSocket connections (override for production)
-SHARED_SECRET = "your_shared_secret"
-# Secret key for signing session cookies (override via SECRET_KEY env var)
-SECRET_KEY = os.getenv("SECRET_KEY", "change-me-in-production-please")
+# Shared secret used to authenticate plugin WebSocket connections.
+# MUST come from the environment — this repo is public, so a hardcoded value
+# is no auth at all. When unset (or left at the old placeholder) we fail
+# closed: every plugin connection is refused until it is configured.
+SHARED_SECRET = os.getenv("SHARED_SECRET", "")
+_SHARED_SECRET_OK = bool(SHARED_SECRET) and SHARED_SECRET != "your_shared_secret"
+if not _SHARED_SECRET_OK:
+    logger.critical(
+        "SHARED_SECRET env var is unset or still the placeholder — "
+        "refusing ALL plugin WebSocket connections until it is set in .env"
+    )
+# Secret key for signing session cookies. Fail closed: running with a
+# publicly-known default would let anyone forge admin sessions.
+SECRET_KEY = os.getenv("SECRET_KEY", "")
+if not SECRET_KEY or SECRET_KEY == "change-me-in-production-please":
+    raise RuntimeError(
+        "SECRET_KEY env var must be set to a strong random value — "
+        "session cookies are signed with it"
+    )
 SESSION_MAX_AGE = 30 * 24 * 3600  # 30 days in seconds
 _serializer = URLSafeTimedSerializer(SECRET_KEY)

@@ -1024,6 +1041,19 @@ _PUBLIC_PATHS = {"/login", "/logout"}
 _PUBLIC_PREFIXES = ("/ws/position",)  # Plugin WS uses X-Plugin-Secret


+def _is_private_addr(host: str) -> bool:
+    """True when `host` is a private/loopback address (RFC1918, 127/8, ::1).
+
+    Used by the internal-trust rule: a private TCP peer WITHOUT an
+    X-Forwarded-For header cannot have come through nginx and therefore
+    cannot originate from the internet.
+    """
+    try:
+        return ipaddress.ip_address(host).is_private
+    except ValueError:
+        return False
+
+
 class AuthMiddleware(BaseHTTPMiddleware):
     """Redirect unauthenticated requests to /login."""

@@ -1046,20 +1076,20 @@ class AuthMiddleware(BaseHTTPMiddleware):
         if path.startswith("/ws/live"):
             return await call_next(request)

-        # Trust internal connections (Docker network gateway + loopback). The
-        # tracker port (8765) is bound to 127.0.0.1 in docker-compose.yml and
-        # only the host or other compose-network containers can reach it.
-        # This lets host-side helpers (overlord-agent, discord-rare-monitor,
-        # etc.) call any endpoint without forging a session cookie.
-        #
-        # IMPORTANT: We still try to decode the session cookie if present, so
-        # that endpoints like /me which check `request.state.user` work for
-        # real authenticated browsers proxied through nginx → docker-proxy
-        # (which makes them look like they're coming from 172.x). Without
-        # this, /me returned 401 even for logged-in users, silently
-        # disabling the admin-only UI on the dashboard.
+        # Trust genuinely internal callers only. The tracker port (8765) is
+        # published on 127.0.0.1, so host-side helpers (overlord-agent) and
+        # compose-network containers reach it directly — but so does ALL
+        # external browser traffic, via nginx → docker-proxy, which makes it
+        # arrive with a 172.x source IP. Source IP alone therefore proves
+        # nothing. The distinguishing signal is X-Forwarded-For: nginx sets
+        # it on every proxied request, while direct internal calls have no
+        # proxy in front of them and lack the header. A request with a
+        # private source AND no X-Forwarded-For cannot have come through
+        # nginx, i.e. cannot originate from the internet.
         client_host = request.client.host if request.client else ""
-        if client_host.startswith("172.") or client_host in ("127.0.0.1", "::1", "localhost"):
+        if _is_private_addr(client_host) and "x-forwarded-for" not in request.headers:
+            # Still decode the cookie if present so request.state.user works
+            # for internal tools that do log in.
             token = request.cookies.get("session")
             if token:
                 user = verify_session_cookie(token)
@@ -2945,9 +2975,13 @@ async def ws_receive_snapshots(
     """
     global _plugin_connections

-    # Authenticate plugin connection using shared secret
-    key = secret or x_plugin_secret
-    if key != SHARED_SECRET:
+    # Authenticate plugin connection using shared secret (constant-time
+    # compare; refuse everything when the secret is not configured).
+    key = secret or x_plugin_secret or ""
+    # compare bytes: compare_digest(str, str) raises TypeError on non-ASCII
+    if not _SHARED_SECRET_OK or not hmac.compare_digest(
+        key.encode("utf-8", "replace"), SHARED_SECRET.encode("utf-8")
+    ):
         # Reject without completing the WebSocket handshake
         logger.warning(
             f"Plugin WebSocket authentication failed from {websocket.client}"
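The plugin check in the hunk above combines two conditions: the configured secret must be real, and the presented key must match under a constant-time comparison on bytes. A minimal sketch of that logic as a standalone function, assuming nothing beyond the standard library; `plugin_secret_ok` is a hypothetical name and the real code inlines this in `ws_receive_snapshots`:

```python
import hmac
from typing import Optional

PLACEHOLDER = "your_shared_secret"


def plugin_secret_ok(presented: Optional[str], configured: str) -> bool:
    """Fail closed: reject everything unless a real secret is configured."""
    if not configured or configured == PLACEHOLDER:
        return False
    # Encode first: compare_digest(str, str) raises TypeError for non-ASCII
    # input, while the bytes form accepts any content.
    candidate = (presented or "").encode("utf-8", "replace")
    return hmac.compare_digest(candidate, configured.encode("utf-8"))
```

With an unset or placeholder secret the function returns `False` even when the presented key is identical, which is the behavior the commit message calls fail-closed.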
@@ -3693,11 +3727,16 @@ async def ws_live_updates(websocket: WebSocket):
     Manages a set of connected browser clients; listens for incoming command messages
     and forwards them to the appropriate plugin client WebSocket.
     """
-    # Require valid session cookie for browser WebSocket.
-    # Internal Docker network connections (172.x.x.x) are trusted — this allows
-    # the Discord bot and other internal services to connect without a cookie.
+    # Require a valid session cookie for browser WebSockets. Internal
+    # services (discord-rare-monitor connects over the compose network) are
+    # identified by a private source IP WITHOUT an X-Forwarded-For header —
+    # nginx-proxied browser traffic always carries X-Forwarded-For, so an
+    # internet client can never satisfy this check (same rule as
+    # AuthMiddleware; see comment there).
     client_host = websocket.client.host if websocket.client else ""
-    is_internal = client_host.startswith("172.") or client_host in ("127.0.0.1", "::1", "localhost")
+    is_internal = (
+        _is_private_addr(client_host) and "x-forwarded-for" not in websocket.headers
+    )
     if not is_internal:
         token = websocket.cookies.get("session")
         if not token or not verify_session_cookie(token):
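The internal-trust rule used by both AuthMiddleware and `/ws/live` can be exercised in isolation. A minimal sketch, assuming headers arrive as a plain mapping (Starlette's header objects are case-insensitive, so the real code can test the lowercase name directly); `is_internal_caller` is a hypothetical wrapper, not a function in main.py:

```python
import ipaddress
from typing import Mapping


def is_private_addr(host: str) -> bool:
    """True for private or loopback addresses; False for anything unparsable."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False


def is_internal_caller(client_host: str, headers: Mapping[str, str]) -> bool:
    """Private TCP peer AND no X-Forwarded-For header."""
    # A plain dict is case-sensitive, so normalise header names here.
    has_xff = any(name.lower() == "x-forwarded-for" for name in headers)
    return is_private_addr(client_host) and not has_xff


if __name__ == "__main__":
    print(is_internal_caller("127.0.0.1", {}))                             # host-side helper
    print(is_internal_caller("172.18.0.1", {"X-Forwarded-For": "8.8.8.8"}))  # proxied browser
    print(is_internal_caller("172.32.0.1", {}))                            # public address
```

Two differences from the old `startswith("172.")` test are worth noting: an address such as `172.32.0.1` lies outside 172.16.0.0/12 and is no longer trusted, and the literal string `localhost` no longer parses as an address, so it is rejected.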
@@ -3865,15 +3904,18 @@ async def get_stats(character_name: str):


 @app.post("/character-stats/test")
-async def test_character_stats_default():
-    """Inject mock character_stats data for frontend development."""
-    return await test_character_stats("TestCharacter")
+async def test_character_stats_default(request: Request):
+    """Inject mock character_stats data for frontend development (admin only)."""
+    _require_admin(request)
+    return await test_character_stats("TestCharacter", request)


 @app.post("/character-stats/test/{name}")
-async def test_character_stats(name: str):
+async def test_character_stats(name: str, request: Request):
     """Inject mock character_stats data for a specific character name.
-    Processes through the same pipeline as real plugin data."""
+    Processes through the same pipeline as real plugin data — it OVERWRITES
+    the real character_stats row for {name}, hence admin-only."""
+    _require_admin(request)
     mock_data = {
         "type": "character_stats",
         "timestamp": datetime.utcnow().isoformat() + "Z",
@@ -14,6 +14,12 @@
 # WebSockets are long-lived; nginx's default 60s timeout drops idle clients.
 # Removing these timeouts caused all plugin connections to drop every
 # ~60s when no data flowed from backend to client (April 2026 incident).
+# - SECURITY INVARIANT: every location that proxies to the `tracker`
+# upstream MUST set proxy_set_header X-Forwarded-For. The backend treats
+# a private-source request WITHOUT that header as internal (host/compose
+# callers) and skips session auth — a tracker-bound location that forgot
+# the header would silently bypass login for the whole internet. This
+# includes any future port-80 or alternate server block.
 # - /grafana/ panel embeds rely on Grafana's anonymous Viewer auth
 # (GF_AUTH_ANONYMOUS_ENABLED=true in docker-compose.yml) — no credentials
 # in this file. Do NOT hardcode tokens here: this file is committed to a
scripts/backup-databases.sh (Executable file, 53 changes)
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+# Nightly logical backups for both MosswartOverlord databases.
+# Install as a cron job on the live host (see docs/backups.md). Note `bash`
+# in the cron line (survives a lost executable bit) and that /home/erik/backups
+# must exist BEFORE the first run (cron sets up the >> redirection before this
+# script's mkdir runs):
+# 15 3 * * * bash /home/erik/MosswartOverlord/scripts/backup-databases.sh >> /home/erik/backups/backup.log 2>&1
+#
+# What is backed up:
+# - dereth (TimescaleDB): full schema + all data EXCEPT the raw
+# telemetry_events/spawn_events hypertable chunks. Those tables hold
+# ~12 GB of data that expires via retention policies in 7-30 days
+# anyway; the irreplaceable rows (users, char_stats, rare_stats,
+# rare_events, combat_stats*, portals, character_stats, server_status)
+# are all included.
+# - inventory_db (postgres): full dump (~1 GB raw, much smaller compressed).
+#
+# Restore procedure: docs/backups.md (TimescaleDB needs pre/post restore calls).
+set -euo pipefail
+# Dumps contain the users table (bcrypt hashes) — keep them owner-only.
+umask 077
+
+BACKUP_DIR="${BACKUP_DIR:-/home/erik/backups/postgres}"
+KEEP_DAYS="${KEEP_DAYS:-7}"
+STAMP="$(date -u +%Y%m%d-%H%M)"
+mkdir -p "$BACKUP_DIR"
+
+# dereth: -Fc is compressed; exclude hypertable chunk DATA (schema kept so a
+# restore recreates the tables empty and retention/compression jobs reattach).
+docker exec dereth-db pg_dump -U postgres -Fc \
+  --exclude-table-data='public.telemetry_events' \
+  --exclude-table-data='public.spawn_events' \
+  --exclude-table-data='_timescaledb_internal._hyper_*' \
+  dereth > "$BACKUP_DIR/dereth-$STAMP.dump.tmp"
+# Canary: a healthy dereth dump is ~50 MB; a tiny one means pg_dump silently
+# produced garbage (fail the run so the old dumps are kept and cron logs it).
+if [ "$(stat -c%s "$BACKUP_DIR/dereth-$STAMP.dump.tmp")" -lt 10000000 ]; then
+  echo "$(date -u +%FT%TZ) FAIL dereth dump under 10MB — keeping old backups" >&2
+  exit 1
+fi
+mv "$BACKUP_DIR/dereth-$STAMP.dump.tmp" "$BACKUP_DIR/dereth-$STAMP.dump"
+
+docker exec inventory-db pg_dump -U inventory_user -Fc inventory_db \
+  > "$BACKUP_DIR/inventory-$STAMP.dump.tmp"
+mv "$BACKUP_DIR/inventory-$STAMP.dump.tmp" "$BACKUP_DIR/inventory-$STAMP.dump"
+
+# Retention: keep KEEP_DAYS days of dailies.
+find "$BACKUP_DIR" -name 'dereth-*.dump' -mtime +"$KEEP_DAYS" -delete
+find "$BACKUP_DIR" -name 'inventory-*.dump' -mtime +"$KEEP_DAYS" -delete
+# Clean up aborted runs older than a day.
+find "$BACKUP_DIR" -name '*.dump.tmp' -mtime +1 -delete
+
+echo "$(date -u +%FT%TZ) OK dereth=$(du -h "$BACKUP_DIR/dereth-$STAMP.dump" | cut -f1) inventory=$(du -h "$BACKUP_DIR/inventory-$STAMP.dump" | cut -f1)"
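The retention step uses `find -mtime +7`, which docs/backups.md describes as roughly 8 days of dailies. The arithmetic behind that figure can be worked through in a short sketch, assuming GNU find semantics (age is counted in whole 24-hour periods with the fraction discarded, and `+n` means strictly more than n) and assuming each nightly prune runs about a minute after the dump files are written; the function names and the 60-second slack are hypothetical:

```python
SECONDS_PER_DAY = 86_400


def find_mtime_plus(age_seconds: float, n: int) -> bool:
    """Emulate `find -mtime +n`: whole days, fraction dropped, strictly greater than n."""
    return int(age_seconds // SECONDS_PER_DAY) > n


def surviving_dump_ages(keep_days: int = 7, history_days: int = 14,
                        slack_seconds: int = 60) -> list:
    """Ages (in days) of nightly dumps that survive one prune pass."""
    survivors = []
    for days_old in range(history_days + 1):
        age = days_old * SECONDS_PER_DAY + slack_seconds
        if not find_mtime_plus(age, keep_days):
            survivors.append(days_old)
    return survivors


if __name__ == "__main__":
    kept = surviving_dump_ages()
    print(f"{len(kept)} dumps kept, ages in days: {kept}")
```

Under these assumptions a dump that is 7 days and one minute old has a whole-day age of 7, which is not greater than 7, so it survives; the dump from 8 days ago is the first to be deleted. That leaves tonight's dump plus the previous seven.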