MosswartOverlord/agent/overlord-agent.service
Erik a28b61511c security: enforce real plugin secret, fix proxy auth bypass, loopback DB ports, nightly backups
- SHARED_SECRET now read from env and fail-closed: unset/placeholder refuses
  ALL plugin connections (constant-time compare). The old hardcoded
  'your_shared_secret' in this public repo was no auth at all. Dockerfile
  default removed; generate_data.py reads the env var.
- SECRET_KEY fails closed at startup (main.py and agent/auth.py) instead of
  falling back to a publicly-known signing key; agent systemd unit now
  requires /etc/overlord/agent.env (no '-' prefix).
- AuthMiddleware + /ws/live: replace the 172.x source-IP trust (which every
  nginx-proxied internet request satisfied via docker-proxy — full session
  bypass and unauthenticated in-game command injection) with
  private-source AND no X-Forwarded-For, i.e. only genuinely internal
  callers (overlord-agent on the host, compose-network services). Invariant
  documented in nginx/overlord.conf: every tracker-bound location must set
  X-Forwarded-For.
- /character-stats/test endpoints gated behind admin (they upsert real rows).
- docker-compose: bind 5432/5433 to 127.0.0.1 (both DBs were internet-
  reachable; active brute-force observed in dereth-db logs).
- discord-rare-monitor: drop dead SHARED_SECRET constant.
- scripts/backup-databases.sh + docs/backups.md: nightly pg_dump of both DBs
  (telemetry/spawn hypertable data excluded), 10MB canary, umask 077,
  TimescaleDB restore procedure.
- Remove stray mangled-path css file from repo root.
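
The fail-closed SHARED_SECRET behaviour in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the function names `load_shared_secret` and `plugin_authorized` and the `PLACEHOLDERS` set are hypothetical, and only the env var name and the 'your_shared_secret' placeholder come from the commit message.

```python
import hmac
import os

# Assumption: values treated as "not configured". Only 'your_shared_secret'
# is named in the commit message; the empty string covers the unset case.
PLACEHOLDERS = {"", "your_shared_secret"}


def load_shared_secret(env=None):
    """Return the configured secret, or None if it is unset or a placeholder."""
    env = os.environ if env is None else env
    secret = env.get("SHARED_SECRET", "").strip()
    return None if secret in PLACEHOLDERS else secret


def plugin_authorized(presented, configured):
    """Fail closed: with no usable secret configured, refuse every connection."""
    if configured is None or not presented:
        return False
    # hmac.compare_digest avoids leaking, via timing, where the inputs differ.
    return hmac.compare_digest(presented.encode(), configured.encode())
```

With this shape, an unset or placeholder secret yields `None`, and `plugin_authorized` then rejects every caller instead of comparing against a publicly known value.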

Adversarially reviewed pre-deploy (3-lens workflow): ship verdict; deploy-
sequencing blockers addressed (secret staged before enforcement, exec bit
set, cron uses bash).
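
The "private-source AND no X-Forwarded-For" rule from the AuthMiddleware bullet could look something like the sketch below. The helper name `is_internal_caller` is hypothetical, and using `ipaddress` to decide what counts as "private" is an assumption; the commit message does not say how the real middleware classifies source addresses.

```python
import ipaddress


def is_internal_caller(peer_ip, headers):
    """True only for a private-source peer that sent no X-Forwarded-For."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "x-forwarded-for" in lowered:
        # Anything that came through nginx carries this header and must
        # authenticate normally, whatever its source address looks like.
        return False
    try:
        addr = ipaddress.ip_address(peer_ip)
    except ValueError:
        # Unparseable peer address: fail closed.
        return False
    # Assumption: "private" means RFC 1918-style or loopback space.
    return addr.is_private or addr.is_loopback
```

The check only holds if every tracker-bound nginx location sets X-Forwarded-For, which is the invariant the commit documents in nginx/overlord.conf.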

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:02:47 +02:00


[Unit]
Description=Overlord Agent (Claude Code shell-out service)
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
# Dedicated unprivileged user — kernel-level isolation from `erik`.
# overlord-agent has NO access to /home/erik/.claude (mode 0700),
# /home/erik/.ssh, /home/erik/.bash_history, /home/erik/.gitconfig, etc.
# Its own claude state lives at /var/lib/overlord-agent/.claude/ and its
# claude session JSONLs land there — completely separate from any
# interactive Claude Code use by the human user.
User=overlord-agent
Group=overlord-agent
# Working directory: the repo root (group-readable to overlord-agent).
# claude session JSONL paths encode this cwd, so it must stay stable
# across restarts.
WorkingDirectory=/home/erik/MosswartOverlord
# HOME explicitly set so claude reads /var/lib/overlord-agent/.claude/*
# instead of trying /home/erik/.claude/* (which is now 0700, locked out).
Environment="HOME=/var/lib/overlord-agent"
# Secrets file (root:overlord-agent 0640). REQUIRED (no leading '-'):
# a missing secrets file must abort startup, not fail open — auth.py also
# refuses to start without SECRET_KEY.
EnvironmentFile=/etc/overlord/agent.env
# Run inside the venv populated by install.sh.
ExecStart=/home/erik/MosswartOverlord/agent/.venv/bin/python -m agent.service
Restart=on-failure
RestartSec=3
StandardOutput=journal
StandardError=journal
# ─── Resource caps ─────────────────────────────────────────────────
MemoryMax=512M
CPUQuota=200%
TasksMax=128
# ─── Filesystem hardening ──────────────────────────────────────────
# ProtectSystem=strict mounts the entire file system hierarchy read-only
# for this service, except /dev, /proc and /sys. Nothing is writable
# unless it is listed explicitly below. Subprocesses inherit these
# protections.
ProtectSystem=strict
ProtectHome=read-only
# Allow writing only to the explicit paths claude / our service need.
# - ~/.claude: session JSONL files
# - .venv: Python bytecode cache (__pycache__) writes
ReadWritePaths=/var/lib/overlord-agent/.claude
ReadWritePaths=/home/erik/MosswartOverlord/agent/.venv
ReadWritePaths=/var/log/overlord-agent
# StateDirectory creates/owns /var/lib/overlord-agent automatically.
StateDirectory=overlord-agent
LogsDirectory=overlord-agent
LogsDirectoryMode=0755
PrivateTmp=true
PrivateDevices=true
ProtectClock=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectControlGroups=true
ProtectHostname=true
ProtectProc=invisible
ProcSubset=pid
# Hide sensitive host paths even if something in the python or claude
# subprocess tree tries to read them.
InaccessiblePaths=/etc/shadow
InaccessiblePaths=/etc/gshadow
InaccessiblePaths=/etc/ssh
InaccessiblePaths=/root
InaccessiblePaths=-/home/erik/.ssh
InaccessiblePaths=-/home/erik/.bash_history
InaccessiblePaths=-/home/erik/.zsh_history
# ─── Privilege & capability hardening ──────────────────────────────
NoNewPrivileges=true
CapabilityBoundingSet=
AmbientCapabilities=
LockPersonality=true
RestrictRealtime=true
RestrictSUIDSGID=true
RemoveIPC=true
# MemoryDenyWriteExecute would break Node.js: the V8 JIT needs W^X
# transitions (mprotect with PROT_EXEC on JITted code pages), which this
# setting forbids. Claude Code is a Node app, so omit it; without the JIT
# the CLI would run far slower. The other restrictions still make
# shellcode injection impractical (no Bash/Write tools, no shellcraft
# surface).
# MemoryDenyWriteExecute=true ← DO NOT enable; breaks Node V8 JIT
RestrictNamespaces=true
# ─── Network family restriction ────────────────────────────────────
# Allow only Unix, IPv4 and IPv6 sockets. This blocks AF_PACKET and
# AF_NETLINK, so a compromised process cannot open packet sockets to
# sniff traffic or forge frames. Raw IP sockets are already denied by the
# empty CapabilityBoundingSet (no CAP_NET_RAW). We don't restrict with
# IPAddressAllow because Anthropic's Cloudflare IPs shift and the
# whitelist would break claude. If you need true egress filtering, run
# nftables scoped to this service's cgroup; that is reliable in a way
# IPAddressAllow isn't.
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# ─── Syscall filter ────────────────────────────────────────────────
# Start from the standard @system-service preset, which most hardened
# systemd units use. It is an allow-list: any syscall not in it is
# denied, yet it is broad enough to host typical apps including Node.js.
# The "~" lines below then subtract @privileged, @reboot and @mount.
#
# Stacking further "~@..." negations on top killed Claude (Node) with
# SIGSYS during startup, so keep the filter to the lines below; the rest
# of the hardening covers what we need.
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged
SystemCallFilter=~@reboot
SystemCallFilter=~@mount

[Install]
WantedBy=multi-user.target