
Volume Backup & Restore Procedure #2

@jsboige

Problem

On 2026-05-14, the Hermes container was rebuilt during an upstream sync without a volume snapshot being taken first. The original Docker volume was replaced, causing total loss of bot memory: sessions, cron jobs, persona, and learned protocols.

This must never happen again. Hermes needs a formal backup & restore procedure for Docker volumes, and NanoClaw will need one too.

Requirements

Pre-Rebuild (Mandatory Before Any docker compose build / Container Recreation)

  1. Snapshot the Docker volume to a tarball on the host:

    docker run --rm \
      -v <volume-name>:/data:ro \
      -v /backup/hermes:/backup \
      alpine tar czf /backup/hermes-pre-rebuild-$(date +%Y%m%d-%H%M%S).tar.gz -C / data
  2. Verify the snapshot: the file exists, has non-zero size, and is readable (for example, tar tzf lists its contents without error)

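  3. Export state.db separately as a quick-restore point. The copy is written next to the original under /opt/data, so if that path lives on the volume being replaced it complements the tarball rather than substituting for it:

    docker exec hermes python3 -c "import shutil; shutil.copy('/opt/data/state.db', '/opt/data/state.db.pre-rebuild')"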
  4. Only then: proceed with rebuild
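The verification in step 2 can be scripted so that a rebuild is blocked when the snapshot is missing or damaged. The following is a minimal sketch, not the existing hermes-backup.sh: the function name verify_snapshot is hypothetical, and it assumes a tar on the host that can read gzip archives.

```shell
#!/bin/sh
# verify_snapshot: hypothetical helper, not part of the existing scripts.
# Returns 0 only if the archive exists, is non-empty, and can be read end to end.
verify_snapshot() {
  archive="$1"
  [ -f "$archive" ] || { echo "missing: $archive" >&2; return 1; }
  [ -s "$archive" ] || { echo "empty: $archive" >&2; return 1; }
  # Listing the members forces a full read, so truncated or corrupt files fail here.
  tar tzf "$archive" >/dev/null 2>&1 || { echo "unreadable: $archive" >&2; return 1; }
  echo "ok: $archive"
}
```

It could be called with the path printed by step 1, for example verify_snapshot /backup/hermes/hermes-pre-rebuild-YYYYMMDD-HHMMSS.tar.gz, and the rebuild started only if it returns 0.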

Post-Rebuild (Restore Procedure)

  1. Restore configs from roosync-cluster/config/ (repo copy):

    • config.yaml → fix model/provider/base_url/compression
    • .env.template → restore Telegram allowlists
  2. Run hermes-restore-config.sh (handles base_url contamination, model fix, ownership)

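  3. Restore from the snapshot if session history is needed. The command below extracts the entire snapshot (state.db included) into the volume, so ideally stop the Hermes container first and replace YYYYMMDD-HHMMSS with the timestamp of the snapshot to restore: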

    docker run --rm \
      -v <volume-name>:/data \
      -v /backup/hermes:/backup \
      alpine sh -c "cd /data && tar xzf /backup/hermes-pre-rebuild-YYYYMMDD-HHMMSS.tar.gz --strip-components=1"
  4. Verify: model config, Telegram connectivity, MCP bridges, cron jobs
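If only state.db is wanted rather than the full volume contents, tar can extract a single member from the snapshot. This is a sketch under assumptions: the helper name restore_member is hypothetical, and it assumes state.db sits at the root of the volume (inferred from the /opt/data/state.db path used in the pre-rebuild steps, not confirmed). Because the snapshot is created with -C / data, members are stored as data/<path>.

```shell
#!/bin/sh
# restore_member: hypothetical helper. Extracts one path from a snapshot
# created with `tar czf ... -C / data`, where members are stored as data/<path>.
restore_member() {
  archive="$1"   # snapshot tarball
  member="$2"    # path relative to the volume root, e.g. state.db (assumed location)
  dest="$3"      # directory to extract into
  # --strip-components=1 drops the leading "data/" so the file lands directly in $dest.
  tar xzf "$archive" -C "$dest" --strip-components=1 "data/$member"
}
```

The same tar arguments could be passed to the alpine container from step 3 in place of the full extraction. That variant is untested here against the BusyBox tar shipped in alpine.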

Automation

  • Pre-rebuild snapshot script: roosync-cluster/scripts/hermes-backup.sh
  • Post-rebuild restore script: roosync-cluster/scripts/hermes-restore-config.sh (exists, needs base_url check added)
  • Retention policy: keep last 3 snapshots, prune older ones
  • NanoClaw adaptation: same pattern for nanoclaw container volumes
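The retention policy could be implemented along these lines. This is a sketch, not the planned script: prune_snapshots is a hypothetical name, and it relies on the hermes-pre-rebuild-YYYYMMDD-HHMMSS.tar.gz naming from the snapshot step so that sorting by name is also sorting by time.

```shell
#!/bin/sh
# prune_snapshots: hypothetical helper for the "keep last 3" policy.
# Relies on timestamped names (YYYYMMDD-HHMMSS) sorting chronologically.
prune_snapshots() {
  dir="$1"
  keep="${2:-3}"
  # Newest first by name, skip the first $keep entries, delete the rest.
  ls -1 "$dir"/hermes-pre-rebuild-*.tar.gz 2>/dev/null \
    | sort -r \
    | tail -n +"$((keep + 1))" \
    | while IFS= read -r old; do
        rm -f -- "$old"
      done
}
```

Running prune_snapshots /backup/hermes 3 after each successful, verified snapshot would keep the three most recent archives. Pruning only after verification avoids deleting a good snapshot in favor of a broken one.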

Safety Checklist (Human Gate)

Before any destructive Docker operation on ANY bot container:

  • Volume snapshot taken and verified
  • Snapshot filename noted in session / dashboard
  • Config repo copy is up to date (roosync-cluster/config/)
  • .env.secrets is current
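The first checklist item can be backed by a small gate that refuses to continue unless a fresh snapshot exists. This is only a sketch: require_recent_snapshot is a hypothetical name, the 60-minute default is an assumed threshold rather than a stated requirement, and -maxdepth and -mmin are find extensions available in GNU, BSD, and BusyBox find but not in strict POSIX.

```shell
#!/bin/sh
# require_recent_snapshot: hypothetical gate, not an existing script.
# Succeeds only if a non-empty snapshot newer than max_age_min minutes exists in dir.
require_recent_snapshot() {
  dir="$1"
  max_age_min="${2:-60}"   # assumed default, adjust to taste
  # -mmin -N: modified less than N minutes ago; -size +0: not empty.
  if find "$dir" -maxdepth 1 -name 'hermes-pre-rebuild-*.tar.gz' \
       -mmin "-$max_age_min" -size +0 2>/dev/null | grep -q .; then
    return 0
  fi
  echo "no recent snapshot in $dir; refusing to continue" >&2
  return 1
}
```

A wrapper could then chain it in front of the destructive command, for example require_recent_snapshot /backup/hermes 60 && docker compose build. The gate checks freshness only, so the remaining checklist items still need a human.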
