security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups

- SHARED_SECRET is now read from the environment and fails closed: if it is unset or still the placeholder, ALL plugin connections are refused (constant-time compare). The old hardcoded 'your_shared_secret' in this public repo was no auth at all. Dockerfile default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of falling back to a publicly known signing key; the agent systemd unit now requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust with "private source AND no X-Forwarded-For", i.e. only genuinely internal callers (overlord-agent on the host, compose-network services) are trusted. Every nginx-proxied internet request satisfied the old rule via docker-proxy, which meant a full session bypass and unauthenticated in-game command injection. Invariant documented in nginx/overlord.conf: every tracker-bound location must set X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs (telemetry/spawn hypertable data excluded), 10MB canary, umask 077, TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-sequencing blockers addressed (secret staged before enforcement, exec bit set, cron uses bash).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

parent c6a1af0c39
commit a28b61511c
12 changed files with 261 additions and 2579 deletions

@@ -42,7 +42,7 @@ Dereth Tracker is a real-time telemetry platform for Asheron's Call world tracki
 - Connection pool: `min_size=5, max_size=100, command_timeout=120` (`db_async.py:21`). Postgres `max_connections` is the default 100, shared with Grafana and the agent's read-only role — don't widen the pool further.
 - Persisted event types: telemetry, spawn, rare, portal, character_stats, combat_stats. Everything else (vitals, quest, cantrips, nearby_objects, dungeon_map, share_*) is memory-only.
 - Read-only agent role `overlord_agent_ro` is provisioned manually via `agent/sql/0001_overlord_agent_ro.sql` (SELECT-only).
-- There is **no backup mechanism** — durability is the two Docker volumes (`timescale-data`, `inventory-data`).
+- Backups: nightly cron on the host runs `scripts/backup-databases.sh` (pg_dump both DBs to `/home/erik/backups/postgres/`, 7-day retention; telemetry/spawn hypertable data deliberately excluded). Restore procedure: `docs/backups.md` — TimescaleDB needs `timescaledb_pre_restore()/post_restore()`.
 - `db.py` is a dead legacy SQLite layer — nothing imports it. All persistence goes through `db_async.py`.
 
 ## Route conventions

File diff suppressed because it is too large

@@ -36,13 +36,14 @@ ARG BUILD_VERSION=dev
 ENV APP_VERSION=$BUILD_VERSION
 
 ## Default environment variables for application configuration
+## NOTE: no SHARED_SECRET default here on purpose — main.py fails closed
+## (refuses plugin connections) unless a real value arrives via compose/.env.
 ENV DATABASE_URL=postgresql://postgres:password@db:5432/dereth \
     DB_MAX_SIZE_MB=2048 \
     DB_RETENTION_DAYS=7 \
     DB_MAX_SQL_LENGTH=1000000000 \
     DB_MAX_SQL_VARIABLES=32766 \
-    DB_WAL_AUTOCHECKPOINT_PAGES=1000 \
-    SHARED_SECRET=your_shared_secret
+    DB_WAL_AUTOCHECKPOINT_PAGES=1000
 
 ## Launch the FastAPI app using Uvicorn
 CMD ["uvicorn","main:app","--host","0.0.0.0","--port","8765","--workers","1","--no-access-log","--log-level","warning"]
|
|
@@ -12,8 +12,15 @@ import os
 from fastapi import HTTPException, Request, status
 from itsdangerous import BadSignature, SignatureExpired, URLSafeTimedSerializer
 
-# Mirror main.py:996-998
-SECRET_KEY = os.getenv("SECRET_KEY", "change-me-in-production-please")
+# Mirror main.py — and fail closed like it does: starting with a known
+# default key would let anyone forge a valid session cookie.
+SECRET_KEY = os.getenv("SECRET_KEY", "")
+if not SECRET_KEY or SECRET_KEY == "change-me-in-production-please":
+    raise RuntimeError(
+        "SECRET_KEY env var must be set (shared with dereth-tracker; see "
+        "/etc/overlord/agent.env) — refusing to start with a forgeable "
+        "session-signing key"
+    )
 SESSION_MAX_AGE = 30 * 24 * 3600 # 30 days
 _serializer = URLSafeTimedSerializer(SECRET_KEY)
 
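agent/auth.py now refuses to import without a real SECRET_KEY, which mirrors main.py. The same fail-closed rule fits in one reusable helper. The sketch below is hypothetical: `require_secret` does not exist in the repo. It takes the environment as an optional mapping so it can be exercised without touching `os.environ`.

```python
import os
from typing import Mapping, Optional, Tuple


def require_secret(
    name: str,
    placeholders: Tuple[str, ...] = (),
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the value of env var `name`, refusing empty or placeholder values.

    Hypothetical helper: an unset variable, an empty string and a publicly
    known default are all treated the same way, by raising at startup.
    """
    source = os.environ if env is None else env
    value = source.get(name, "")
    if not value or value in placeholders:
        raise RuntimeError(
            f"{name} env var must be set to a real secret; refusing to start"
        )
    return value
```

For example, `require_secret("SECRET_KEY", ("change-me-in-production-please",))` reproduces the check added above. An empty string is treated like a missing variable because `os.getenv("SECRET_KEY", "")` cannot tell the two apart.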
|
|
@@ -20,8 +20,10 @@ WorkingDirectory=/home/erik/MosswartOverlord
 # HOME explicitly set so claude reads /var/lib/overlord-agent/.claude/*
 # instead of trying /home/erik/.claude/* (which is now 0700, locked out).
 Environment="HOME=/var/lib/overlord-agent"
-# Secrets file (root:overlord-agent 0640).
-EnvironmentFile=-/etc/overlord/agent.env
+# Secrets file (root:overlord-agent 0640). REQUIRED (no leading '-'):
+# a missing secrets file must abort startup, not fail open — auth.py also
+# refuses to start without SECRET_KEY.
+EnvironmentFile=/etc/overlord/agent.env
 # Run inside the venv populated by install.sh.
 ExecStart=/home/erik/MosswartOverlord/agent/.venv/bin/python -m agent.service
 Restart=on-failure
|
|
@@ -34,7 +34,6 @@ logger = logging.getLogger(__name__)
 # Configuration from environment variables
 DISCORD_TOKEN = os.getenv('DISCORD_RARE_BOT_TOKEN')
 WEBSOCKET_URL = os.getenv('DERETH_TRACKER_WS_URL', 'ws://dereth-tracker:8765/ws/live')
-SHARED_SECRET = 'your_shared_secret'
 ACLOG_CHANNEL_ID = int(os.getenv('ACLOG_CHANNEL_ID', '1349649482786275328'))
 COMMON_RARE_CHANNEL_ID = int(os.getenv('COMMON_RARE_CHANNEL_ID', '1355328792184226014'))
 GREAT_RARE_CHANNEL_ID = int(os.getenv('GREAT_RARE_CHANNEL_ID', '1353676584334131211'))
|
|
@@ -62,7 +62,11 @@ services:
     volumes:
       - timescale-data:/var/lib/postgresql/data
     ports:
-      - "5432:5432"
+      # Loopback only — Docker-published ports bypass ufw, and this host is
+      # internet-facing (active brute-force on the open port observed June
+      # 2026). In-stack consumers use the compose network; host-side tools
+      # (psql, overlord-agent) use 127.0.0.1.
+      - "127.0.0.1:5432:5432"
     restart: unless-stopped
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U postgres"]
|
|
@@ -104,7 +108,8 @@ services:
     volumes:
       - inventory-data:/var/lib/postgresql/data
     ports:
-      - "5433:5432"
+      # Loopback only — see db service note.
+      - "127.0.0.1:5433:5432"
     restart: unless-stopped
     healthcheck:
       test: ["CMD-SHELL", "pg_isready -U inventory_user"]

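The compose comment notes that Docker-published ports bypass ufw. If a later edit drops the `127.0.0.1:` prefix from a mapping, the database is silently exposed again. A minimal check that could run in CI is sketched below, under stated assumptions:

- It only understands the short-syntax string form `[HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]`.
- The host IP must be IPv4.
- `published_on_loopback` is a hypothetical helper, not part of the repo.

```python
import ipaddress


def published_on_loopback(spec: str) -> bool:
    """True if a compose short-syntax port mapping binds only to a loopback IP.

    Handles "HOST_PORT:CONTAINER_PORT" and "IPv4:HOST_PORT:CONTAINER_PORT",
    optionally suffixed with "/tcp" or "/udp". A mapping without a host IP
    (or a bare container port) is published on all host interfaces by default.
    """
    mapping = spec.split("/", 1)[0]  # drop an optional /tcp or /udp suffix
    parts = mapping.split(":")
    if len(parts) != 3:
        return False  # no host IP given, so Docker binds every interface
    try:
        return ipaddress.ip_address(parts[0]).is_loopback
    except ValueError:
        return False  # not an IP literal (e.g. a hostname): treat as unsafe
```

To use it, feed it every string under a service's `ports:` key, for example after loading the compose file with a YAML parser, and fail the build on any `False`. That would catch a regression to the old `"5432:5432"`. IPv6 host IPs and the long `ports:` syntax are deliberately out of scope.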
docs/backups.md (Normal file, 102 lines changed)
@@ -0,0 +1,102 @@
+# Database backups
+
+Nightly logical backups of both databases, taken by
+[`scripts/backup-databases.sh`](../scripts/backup-databases.sh) via a cron
+job on the live host (user `erik`, who is in the `docker` group — no sudo
+needed). Install with:
+
+```
+mkdir -p /home/erik/backups   # MUST exist before the first run —
+                              # cron opens the log redirect before
+                              # the script's own mkdir executes
+crontab -e                    # add the line below
+15 3 * * * bash /home/erik/MosswartOverlord/scripts/backup-databases.sh >> /home/erik/backups/backup.log 2>&1
+```
+
+Dumps land in `/home/erik/backups/postgres/` as `dereth-YYYYMMDD-HHMM.dump`
+and `inventory-YYYYMMDD-HHMM.dump` (pg_dump custom format, compressed,
+mode 0600). Retention: ~8 days of dailies (`-mtime +7`), pruned by the
+script itself only after a successful run. The nightly `backup.log` will
+contain pg_dump circular-FK warnings about hypertable chunks — those are
+normal; the canary to watch is the printed dump sizes (a healthy dereth
+dump is ~50 MB, and the script aborts if it drops below 10 MB).
+
+## What is and isn't included
+
+- **dereth** (TimescaleDB): everything EXCEPT the row data of the
+  `telemetry_events` and `spawn_events` hypertables (their chunk data in
+  `_timescaledb_internal._hyper_*` is excluded). That data is ~12 GB and
+  expires through retention policies within 7–30 days anyway. The
+  irreplaceable tables — `users`, `char_stats`, `rare_stats`,
+  `rare_stats_sessions`, `rare_events`, `combat_stats`,
+  `combat_stats_sessions`, `portals`, `character_stats`, `server_status` —
+  are fully included. Table *schemas* for the excluded hypertables are
+  still dumped, so a restore recreates them empty.
+- **inventory_db**: full dump (items, combat stats, enhancements, spells,
+  requirements, ratings, raw JSON).
+
+⚠ The `_timescaledb_internal._hyper_*` exclusion drops the chunk data of
+**every** hypertable, present and future. If an irreplaceable table is ever
+converted to a hypertable (or a continuous aggregate is added), revisit the
+exclusion list — otherwise its data silently disappears from backups.
+
+## Off-host copies (recommended, not yet automated)
+
+The dumps live on the same disk as the databases. Sync them off-host
+periodically, e.g. from another machine:
+
+```
+rsync -av erik@overlord.snakedesert.se:backups/postgres/ ./overlord-backups/
+```
+
+## Restore
+
+### inventory_db (plain Postgres)
+
+```bash
+docker exec -i inventory-db pg_restore -U inventory_user -d inventory_db --clean --if-exists < inventory-<stamp>.dump
+```
+
+### dereth (TimescaleDB — needs pre/post restore calls)
+
+TimescaleDB requires putting the extension into restore mode around the
+`pg_restore`, otherwise catalog rows fail:
+
+```bash
+# 1. Create a fresh DB (or use --clean against the existing one)
+docker exec dereth-db psql -U postgres -c "CREATE DATABASE dereth_restore;"
+docker exec dereth-db psql -U postgres -d dereth_restore -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
+
+# 2. Pre-restore mode
+docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_pre_restore();"
+
+# 3. Restore the dump
+docker exec -i dereth-db pg_restore -U postgres -d dereth_restore --no-owner < dereth-<stamp>.dump
+
+# 4. Post-restore mode (re-enables background workers, validates catalog)
+docker exec dereth-db psql -U postgres -d dereth_restore -c "SELECT timescaledb_post_restore();"
+```
+
+Notes:
+- Step 3 reports one ignorable error — the dump's `CREATE EXTENSION
+  timescaledb` collides with the extension pre-created in step 1
+  ("already exists", `errors ignored on restore: 1`). That is expected,
+  not a failed restore.
+- The TimescaleDB **version** at restore time must be the **same** as at
+  dump time (restore first, then `ALTER EXTENSION timescaledb UPDATE` if
+  upgrading). Same-container restores with the image pinned in
+  docker-compose.yml (`timescale/timescaledb:2.19.3-pg14`) are fine.
+
+Then either point `DATABASE_URL` at the restored DB or rename databases.
+The `telemetry_events`/`spawn_events` hypertables come back empty (by
+design); retention/compression policies are part of the dump and reattach.
+
+## Verifying a backup
+
+```bash
+pg_restore --list dereth-<stamp>.dump | head # table of contents
+pg_restore --list dereth-<stamp>.dump | grep -c 'TABLE DATA'
+```
+
+A dump that suddenly shrinks dramatically (check `backup.log` sizes) is the
+canary for silent failure.

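The retention note says `-mtime +7` keeps about eight days of dailies, not seven. That follows from how `find` counts age. A file's age is measured in whole 24-hour periods, with the fractional part discarded, and `+7` means strictly greater than 7. So a dump must be at least 8 full days old before it is deleted. The sketch below shows that arithmetic; the helper names are illustrative only.

```python
DAY = 24 * 60 * 60  # seconds in one 24-hour period


def would_prune(age_seconds: float, keep_days: int = 7) -> bool:
    """Mirror `find -mtime +keep_days`: truncate age to whole days, then compare with '>'."""
    return int(age_seconds // DAY) > keep_days


def dailies_kept(keep_days: int = 7, horizon_days: int = 60) -> int:
    """Count nightly dumps aged exactly 0, 1, 2, ... days that survive one pruning pass."""
    return sum(
        1 for age in range(horizon_days) if not would_prune(age * DAY, keep_days)
    )
```

The same rule applies to the script's `*.dump.tmp` cleanup. `-mtime +1` only removes partial files that are at least two full days old, a little longer than the script's "older than a day" comment suggests.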
@@ -7,6 +7,7 @@ fabricated TelemetrySnapshot payloads at regular intervals. Useful for:
 - Demonstrating real-time map updates without a live game client
 """
 import asyncio # Async event loop and sleep support
+import os
 import websockets # WebSocket client for Python
 import json # JSON serialization of payloads
 from datetime import datetime, timedelta, timezone
|
|
@@ -32,8 +33,10 @@ async def main() -> None:
     # Starting coordinates (E/W and N/S)
     ew = 0.0
     ns = 0.0
-    # WebSocket endpoint for plugin telemetry (include secret for auth)
-    uri = "ws://localhost:8000/ws/position?secret=your_shared_secret"
+    # WebSocket endpoint for plugin telemetry. The secret must match the
+    # backend's SHARED_SECRET env var (no insecure default anymore).
+    secret = os.environ["SHARED_SECRET"]
+    uri = f"ws://localhost:8000/ws/position?secret={secret}"
     # Connect to the plugin WebSocket endpoint with authentication
     # Establish WebSocket connection to the server
     async with websockets.connect(uri) as websocket:

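The updated generate_data.py interpolates SHARED_SECRET straight into the query string. That works for secrets made only of URL-safe characters. A value containing `&`, `+`, `=` or spaces, however, would be mangled when the server parses the query. The sketch below percent-encodes it with the standard library. `build_position_uri` is a hypothetical helper, and the default base URL simply copies the one in the hunk above.

```python
from urllib.parse import urlencode

# Same plugin endpoint that generate_data.py connects to.
PLUGIN_WS_BASE = "ws://localhost:8000/ws/position"


def build_position_uri(secret: str, base: str = PLUGIN_WS_BASE) -> str:
    """Return the plugin WebSocket URI with `secret` percent-encoded as a query parameter."""
    # urlencode() applies quote_plus(): '&' -> '%26', ' ' -> '+', '=' -> '%3D'
    return f"{base}?{urlencode({'secret': secret})}"
```

In generate_data.py this would become `uri = build_position_uri(os.environ["SHARED_SECRET"])`. A secret produced by `secrets.token_urlsafe()` uses only letters, digits, `-` and `_`, so encoding leaves it unchanged.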
main.py (100 lines changed)
@@ -8,7 +8,9 @@ endpoints for browser clients to retrieve live and historical data, trails, and
 
 from collections import defaultdict
 from datetime import datetime, timedelta, timezone
+import hmac
 import html as _html
+import ipaddress
 import json
 import logging
 import os
|
|
@@ -990,10 +992,25 @@ live_equipment_cantrip_states: Dict[str, dict] = {}
 live_nearby_objects: Dict[str, dict] = {}
 dungeon_map_cache: Dict[str, dict] = {} # landblock hex string -> dungeon map data
 
-# Shared secret used to authenticate plugin WebSocket connections (override for production)
-SHARED_SECRET = "your_shared_secret"
-# Secret key for signing session cookies (override via SECRET_KEY env var)
-SECRET_KEY = os.getenv("SECRET_KEY", "change-me-in-production-please")
+# Shared secret used to authenticate plugin WebSocket connections.
+# MUST come from the environment — this repo is public, so a hardcoded value
+# is no auth at all. When unset (or left at the old placeholder) we fail
+# closed: every plugin connection is refused until it is configured.
+SHARED_SECRET = os.getenv("SHARED_SECRET", "")
+_SHARED_SECRET_OK = bool(SHARED_SECRET) and SHARED_SECRET != "your_shared_secret"
+if not _SHARED_SECRET_OK:
+    logger.critical(
+        "SHARED_SECRET env var is unset or still the placeholder — "
+        "refusing ALL plugin WebSocket connections until it is set in .env"
+    )
+# Secret key for signing session cookies. Fail closed: running with a
+# publicly-known default would let anyone forge admin sessions.
+SECRET_KEY = os.getenv("SECRET_KEY", "")
+if not SECRET_KEY or SECRET_KEY == "change-me-in-production-please":
+    raise RuntimeError(
+        "SECRET_KEY env var must be set to a strong random value — "
+        "session cookies are signed with it"
+    )
 SESSION_MAX_AGE = 30 * 24 * 3600 # 30 days in seconds
 _serializer = URLSafeTimedSerializer(SECRET_KEY)
 
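The new checks reject only an empty value or the two old placeholders; they cannot tell a strong secret from a weak one. The standard-library `secrets` module is one way to mint values for `.env` (SHARED_SECRET, SECRET_KEY) and `/etc/overlord/agent.env`. The `generate_env_lines` helper below is hypothetical, not something the repo provides.

```python
import secrets
from typing import List


def generate_env_lines(nbytes: int = 32) -> List[str]:
    """Return fresh KEY=value lines for the two secrets the services now require.

    token_urlsafe(32) draws 32 random bytes and base64url-encodes them without
    padding (43 characters), so the value is also safe in a URL query string.
    """
    return [
        f"SHARED_SECRET={secrets.token_urlsafe(nbytes)}",
        f"SECRET_KEY={secrets.token_urlsafe(nbytes)}",
    ]


if __name__ == "__main__":
    print("\n".join(generate_env_lines()))
```

SECRET_KEY must be the same value in both env files. agent/auth.py verifies cookies that the tracker signs, and its error message says the key is shared with dereth-tracker. SHARED_SECRET must match whatever the plugins send.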
|
|
@@ -1024,6 +1041,19 @@ _PUBLIC_PATHS = {"/login", "/logout"}
 _PUBLIC_PREFIXES = ("/ws/position",) # Plugin WS uses X-Plugin-Secret
 
 
+def _is_private_addr(host: str) -> bool:
+    """True when `host` is a private/loopback address (RFC1918, 127/8, ::1).
+
+    Used by the internal-trust rule: a private TCP peer WITHOUT an
+    X-Forwarded-For header cannot have come through nginx and therefore
+    cannot originate from the internet.
+    """
+    try:
+        return ipaddress.ip_address(host).is_private
+    except ValueError:
+        return False
+
+
 class AuthMiddleware(BaseHTTPMiddleware):
     """Redirect unauthenticated requests to /login."""
 
|
|
@@ -1046,20 +1076,20 @@ class AuthMiddleware(BaseHTTPMiddleware):
         if path.startswith("/ws/live"):
             return await call_next(request)
 
-        # Trust internal connections (Docker network gateway + loopback). The
-        # tracker port (8765) is bound to 127.0.0.1 in docker-compose.yml and
-        # only the host or other compose-network containers can reach it.
-        # This lets host-side helpers (overlord-agent, discord-rare-monitor,
-        # etc.) call any endpoint without forging a session cookie.
-        #
-        # IMPORTANT: We still try to decode the session cookie if present, so
-        # that endpoints like /me which check `request.state.user` work for
-        # real authenticated browsers proxied through nginx → docker-proxy
-        # (which makes them look like they're coming from 172.x). Without
-        # this, /me returned 401 even for logged-in users, silently
-        # disabling the admin-only UI on the dashboard.
+        # Trust genuinely internal callers only. The tracker port (8765) is
+        # published on 127.0.0.1, so host-side helpers (overlord-agent) and
+        # compose-network containers reach it directly — but so does ALL
+        # external browser traffic, via nginx → docker-proxy, which makes it
+        # arrive with a 172.x source IP. Source IP alone therefore proves
+        # nothing. The distinguishing signal is X-Forwarded-For: nginx sets
+        # it on every proxied request, while direct internal calls have no
+        # proxy in front of them and lack the header. A request with a
+        # private source AND no X-Forwarded-For cannot have come through
+        # nginx, i.e. cannot originate from the internet.
         client_host = request.client.host if request.client else ""
-        if client_host.startswith("172.") or client_host in ("127.0.0.1", "::1", "localhost"):
+        if _is_private_addr(client_host) and "x-forwarded-for" not in request.headers:
+            # Still decode the cookie if present so request.state.user works
+            # for internal tools that do log in.
             token = request.cookies.get("session")
             if token:
                 user = verify_session_cookie(token)
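Stripped of the framework, the new trust rule in AuthMiddleware (and the identical one in /ws/live) is a pure predicate over the TCP peer address and the request's header names. The standalone sketch below assumes headers arrive as a plain mapping whose names may be in any case; Starlette's `request.headers` is already case-insensitive. `is_internal_caller` is a hypothetical name, while `_is_private_addr` copies the helper from the diff.

```python
import ipaddress
from typing import Mapping


def _is_private_addr(host: str) -> bool:
    # Same logic as the helper added to main.py. Hostnames such as
    # "localhost" are not IP literals, so they raise ValueError and are rejected.
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False


def is_internal_caller(client_host: str, headers: Mapping[str, str]) -> bool:
    """Private TCP peer AND no X-Forwarded-For header.

    nginx adds X-Forwarded-For on every proxied request, so internet traffic
    that reaches the tracker through nginx -> docker-proxy (a 172.x peer)
    still fails the second condition.
    """
    header_names = {name.lower() for name in headers}
    return _is_private_addr(client_host) and "x-forwarded-for" not in header_names
```

Per the `ipaddress` documentation, `is_private` follows the IANA special-purpose address registries. It therefore also accepts ranges such as link-local 169.254.0.0/16, a slightly wider net than the "RFC1918, 127/8, ::1" in the docstring.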
|
|
@@ -2945,9 +2975,13 @@ async def ws_receive_snapshots(
     """
     global _plugin_connections
 
-    # Authenticate plugin connection using shared secret
-    key = secret or x_plugin_secret
-    if key != SHARED_SECRET:
+    # Authenticate plugin connection using shared secret (constant-time
+    # compare; refuse everything when the secret is not configured).
+    key = secret or x_plugin_secret or ""
+    # compare bytes: compare_digest(str, str) raises TypeError on non-ASCII
+    if not _SHARED_SECRET_OK or not hmac.compare_digest(
+        key.encode("utf-8", "replace"), SHARED_SECRET.encode("utf-8")
+    ):
         # Reject without completing the WebSocket handshake
         logger.warning(
             f"Plugin WebSocket authentication failed from {websocket.client}"
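The plugin handshake now makes two decisions. It fails closed when the server-side secret is not configured, and otherwise compares the client's value in constant time. The self-contained sketch below covers just that decision. `plugin_secret_ok` and the `PLACEHOLDER` constant are illustrative names, not code from main.py.

```python
import hmac
from typing import Optional

PLACEHOLDER = "your_shared_secret"  # the old hardcoded default, still rejected


def plugin_secret_ok(provided: Optional[str], configured: str) -> bool:
    """Decide whether a plugin may connect.

    Fails closed when the server-side secret is empty or still the placeholder,
    then compares in constant time. Both sides are encoded to UTF-8 bytes because
    hmac.compare_digest() raises TypeError for non-ASCII str arguments; the
    "replace" error handler turns unencodable characters (lone surrogates) into '?'.
    """
    if not configured or configured == PLACEHOLDER:
        return False
    key = provided or ""
    return hmac.compare_digest(key.encode("utf-8", "replace"), configured.encode("utf-8"))
```

`hmac.compare_digest` avoids leaking how many leading characters matched. Its documentation notes that differing lengths can still be observable, which is acceptable for a fixed-length random secret.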
|
|
@@ -3693,11 +3727,16 @@ async def ws_live_updates(websocket: WebSocket):
     Manages a set of connected browser clients; listens for incoming command messages
     and forwards them to the appropriate plugin client WebSocket.
     """
-    # Require valid session cookie for browser WebSocket.
-    # Internal Docker network connections (172.x.x.x) are trusted — this allows
-    # the Discord bot and other internal services to connect without a cookie.
+    # Require a valid session cookie for browser WebSockets. Internal
+    # services (discord-rare-monitor connects over the compose network) are
+    # identified by a private source IP WITHOUT an X-Forwarded-For header —
+    # nginx-proxied browser traffic always carries X-Forwarded-For, so an
+    # internet client can never satisfy this check (same rule as
+    # AuthMiddleware; see comment there).
     client_host = websocket.client.host if websocket.client else ""
-    is_internal = client_host.startswith("172.") or client_host in ("127.0.0.1", "::1", "localhost")
+    is_internal = (
+        _is_private_addr(client_host) and "x-forwarded-for" not in websocket.headers
+    )
     if not is_internal:
         token = websocket.cookies.get("session")
         if not token or not verify_session_cookie(token):
|
|
@@ -3865,15 +3904,18 @@ async def get_stats(character_name: str):
 
 
 @app.post("/character-stats/test")
-async def test_character_stats_default():
-    """Inject mock character_stats data for frontend development."""
-    return await test_character_stats("TestCharacter")
+async def test_character_stats_default(request: Request):
+    """Inject mock character_stats data for frontend development (admin only)."""
+    _require_admin(request)
+    return await test_character_stats("TestCharacter", request)
 
 
 @app.post("/character-stats/test/{name}")
-async def test_character_stats(name: str):
+async def test_character_stats(name: str, request: Request):
     """Inject mock character_stats data for a specific character name.
-    Processes through the same pipeline as real plugin data."""
+    Processes through the same pipeline as real plugin data — it OVERWRITES
+    the real character_stats row for {name}, hence admin-only."""
+    _require_admin(request)
     mock_data = {
         "type": "character_stats",
         "timestamp": datetime.utcnow().isoformat() + "Z",
|
|
@@ -14,6 +14,12 @@
 # WebSockets are long-lived; nginx's default 60s timeout drops idle clients.
 # Removing these timeouts caused all plugin connections to drop every
 # ~60s when no data flowed from backend to client (April 2026 incident).
+# - SECURITY INVARIANT: every location that proxies to the `tracker`
+# upstream MUST set proxy_set_header X-Forwarded-For. The backend treats
+# a private-source request WITHOUT that header as internal (host/compose
+# callers) and skips session auth — a tracker-bound location that forgot
+# the header would silently bypass login for the whole internet. This
+# includes any future port-80 or alternate server block.
 # - /grafana/ panel embeds rely on Grafana's anonymous Viewer auth
 # (GF_AUTH_ANONYMOUS_ENABLED=true in docker-compose.yml) — no credentials
 # in this file. Do NOT hardcode tokens here: this file is committed to a

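As written, the SECURITY INVARIANT comment can only be enforced by review. The deliberately naive lint sketch below flags `location` blocks that proxy to the tracker without their own `proxy_set_header X-Forwarded-For`. Its assumptions:

- Tracker-bound locations use `proxy_pass http://tracker...`.
- `include` files, nested `location` blocks and braces inside quoted strings are not handled.
- `locations_missing_xff` is a hypothetical helper.

```python
import re
from typing import Iterator, List, Tuple


def _blocks(text: str, keyword: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, body) for each `keyword ... { ... }` block by counting braces."""
    for match in re.finditer(rf"\b{keyword}\b([^{{;]*)\{{", text):
        depth, i = 1, match.end()
        while depth and i < len(text):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        yield match.group(1).strip(), text[match.end():i - 1]


def locations_missing_xff(conf: str, upstream: str = "tracker") -> List[str]:
    """Return headers of `location` blocks that proxy to `upstream` without X-Forwarded-For."""
    # Strip comments so documentation text cannot satisfy (or trip) the check.
    text = re.sub(r"#[^\n]*", "", conf)
    missing = []
    for header, body in _blocks(text, "location"):
        proxies = re.search(rf"proxy_pass\s+https?://{re.escape(upstream)}\b", body)
        sets_xff = re.search(r"proxy_set_header\s+X-Forwarded-For\b", body, re.IGNORECASE)
        if proxies and not sets_xff:
            missing.append(header)
    return missing
```

Requiring the header in each location is more demanding than nginx itself requires, but it sidesteps a known trap. `proxy_set_header` directives are inherited from the enclosing level only when the current level defines none of its own. Adding any unrelated `proxy_set_header` to a location therefore silently drops a server-level X-Forwarded-For.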
scripts/backup-databases.sh (Executable file, 53 lines changed)
@@ -0,0 +1,53 @@
+#!/usr/bin/env bash
+# Nightly logical backups for both MosswartOverlord databases.
+# Install as a cron job on the live host (see docs/backups.md). Note `bash`
+# in the cron line (survives a lost executable bit) and that /home/erik/backups
+# must exist BEFORE the first run (cron sets up the >> redirection before this
+# script's mkdir runs):
+# 15 3 * * * bash /home/erik/MosswartOverlord/scripts/backup-databases.sh >> /home/erik/backups/backup.log 2>&1
+#
+# What is backed up:
+# - dereth (TimescaleDB): full schema + all data EXCEPT the raw
+# telemetry_events/spawn_events hypertable chunks. Those tables hold
+# ~12 GB of data that expires via retention policies in 7-30 days
+# anyway; the irreplaceable rows (users, char_stats, rare_stats,
+# rare_events, combat_stats*, portals, character_stats, server_status)
+# are all included.
+# - inventory_db (postgres): full dump (~1 GB raw, much smaller compressed).
+#
+# Restore procedure: docs/backups.md (TimescaleDB needs pre/post restore calls).
+set -euo pipefail
+# Dumps contain the users table (bcrypt hashes) — keep them owner-only.
+umask 077
+
+BACKUP_DIR="${BACKUP_DIR:-/home/erik/backups/postgres}"
+KEEP_DAYS="${KEEP_DAYS:-7}"
+STAMP="$(date -u +%Y%m%d-%H%M)"
+mkdir -p "$BACKUP_DIR"
+
+# dereth: -Fc is compressed; exclude hypertable chunk DATA (schema kept so a
+# restore recreates the tables empty and retention/compression jobs reattach).
+docker exec dereth-db pg_dump -U postgres -Fc \
+  --exclude-table-data='public.telemetry_events' \
+  --exclude-table-data='public.spawn_events' \
+  --exclude-table-data='_timescaledb_internal._hyper_*' \
+  dereth > "$BACKUP_DIR/dereth-$STAMP.dump.tmp"
+# Canary: a healthy dereth dump is ~50 MB; a tiny one means pg_dump silently
+# produced garbage (fail the run so the old dumps are kept and cron logs it).
+if [ "$(stat -c%s "$BACKUP_DIR/dereth-$STAMP.dump.tmp")" -lt 10000000 ]; then
+  echo "$(date -u +%FT%TZ) FAIL dereth dump under 10MB — keeping old backups" >&2
+  exit 1
+fi
+mv "$BACKUP_DIR/dereth-$STAMP.dump.tmp" "$BACKUP_DIR/dereth-$STAMP.dump"
+
+docker exec inventory-db pg_dump -U inventory_user -Fc inventory_db \
+  > "$BACKUP_DIR/inventory-$STAMP.dump.tmp"
+mv "$BACKUP_DIR/inventory-$STAMP.dump.tmp" "$BACKUP_DIR/inventory-$STAMP.dump"
+
+# Retention: keep KEEP_DAYS days of dailies.
+find "$BACKUP_DIR" -name 'dereth-*.dump' -mtime +"$KEEP_DAYS" -delete
+find "$BACKUP_DIR" -name 'inventory-*.dump' -mtime +"$KEEP_DAYS" -delete
+# Clean up aborted runs older than a day.
+find "$BACKUP_DIR" -name '*.dump.tmp' -mtime +1 -delete
+
+echo "$(date -u +%FT%TZ) OK dereth=$(du -h "$BACKUP_DIR/dereth-$STAMP.dump" | cut -f1) inventory=$(du -h "$BACKUP_DIR/inventory-$STAMP.dump" | cut -f1)"
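The script writes each dump to `*.dump.tmp`, checks it, and only then renames it to `*.dump`. The retention `find` patterns match `*.dump` only, and pruning runs only after both dumps succeed. As a result, a truncated file is never mistaken for a good backup, and a failed night never deletes old ones. The Python sketch below performs the same promote step. `promote_dump` is hypothetical, and `os.replace` is atomic only when both paths are on the same filesystem, as they are here.

```python
import os

MIN_BYTES = 10_000_000  # same 10 MB floor as the script's canary


def promote_dump(tmp_path: str, min_bytes: int = MIN_BYTES) -> str:
    """Rename `<name>.dump.tmp` to `<name>.dump` only if it passes the size canary.

    Raises RuntimeError, leaving the .tmp file and all older dumps untouched,
    when the dump is suspiciously small; this mirrors the script's `exit 1`.
    """
    if not tmp_path.endswith(".tmp"):
        raise ValueError(f"expected a .tmp path, got {tmp_path!r}")
    size = os.path.getsize(tmp_path)
    if size < min_bytes:
        raise RuntimeError(f"dump is {size} bytes (< {min_bytes}); keeping old backups")
    final_path = tmp_path[: -len(".tmp")]
    os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    return final_path
```

As written, the script applies the size canary to the dereth dump only; the inventory dump is promoted without a size check. Calling a helper like this for both dumps would be one way to extend the canary.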