feat: add session artifact disk cleanup (port from NanoClaw #1632) #8685

Closed
teknium1 wants to merge 1 commit into main from nanoclaw-port/session-artifact-cleanup

@teknium1

Contributor

Summary

Ports the session artifact auto-pruning concept from qwibitai/nanoclaw#1632, adapted to hermes-agent's Python architecture.

Problem

hermes-agent accumulates disk artifacts that are never automatically cleaned up:

  • Session transcript JSON files (session_*.json) — ~5,000 files, ~2 GB on a typical install
  • API request debug dumps (request_dump_*.json) — ~180 files, ~32 MB
  • Filesystem checkpoint shadow repos (~/.hermes/checkpoints/) — ~1,100 dirs, ~12 GB
  • Gateway JSONL transcript files (*.jsonl)

The existing hermes sessions prune command only deletes DB rows, leaving all disk files behind forever.

What Changed

  1. New tools/session_cleanup.py module — Core cleanup logic with safety guards:

    • Only deletes files older than configurable retention period (default: 30 days for sessions, 14 days for checkpoints)
    • Never deletes files belonging to active (non-ended) sessions
    • Never touches sessions.json state file
    • Dry-run mode for previewing what would be deleted
  2. Enhanced hermes sessions prune — Three new flags:

    • --include-files: Also clean disk artifacts alongside DB pruning
    • --files-only: Only clean disk files, skip DB pruning
    • --dry-run: Preview what would be deleted without actually deleting
  3. Enhanced hermes sessions stats — Now shows disk artifact counts and sizes, with a tip about cleanup

  4. Automated daily cleanup in gateway — The existing _session_expiry_watcher now runs disk artifact cleanup once every 24 hours, reclaiming space automatically for long-running gateway deployments

  5. 28 new tests covering all cleanup paths, safety guards, active session protection, and edge cases
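The safety guards listed under item 1 can be illustrated with a minimal sketch. This is not the code in tools/session_cleanup.py: the function name `prune_old_files`, its parameters, and the assumption that a session ID appears inside the artifact filename are all hypothetical, chosen only to show how age-based retention, active-session protection, and dry-run mode can fit together.

```python
import time
from pathlib import Path

# Glob patterns taken from the PR description; everything else here is assumed.
ARTIFACT_PATTERNS = ("session_*.json", "request_dump_*.json")
# Defensive guard: the state file must never be deleted, even if a pattern changes.
PROTECTED_NAMES = {"sessions.json"}


def prune_old_files(root, active_ids, max_age_days=30, dry_run=False, now=None):
    """Return the paths that were deleted (or would be, when dry_run is True)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for pattern in ARTIFACT_PATTERNS:
        for path in sorted(Path(root).glob(pattern)):
            if path.name in PROTECTED_NAMES:
                continue
            # Assumption: the session ID is embedded in the filename. A substring
            # match can over-protect, which is the safe direction for a cleaner.
            if any(sid in path.name for sid in active_ids):
                continue
            try:
                if path.stat().st_mtime >= cutoff:
                    continue  # still inside the retention window
                if not dry_run:
                    path.unlink()
                removed.append(path)
            except OSError:
                # A file that vanished or cannot be removed is skipped, not fatal.
                continue
    return removed
```

Calling it first with `dry_run=True` and then with `dry_run=False` mirrors the preview-then-delete flow that the `--dry-run` flag describes.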

Architectural Differences from NanoClaw

NanoClaw uses a bash script + Node.js timer. Our implementation is pure Python:

  • prune_session_files() handles session transcripts and request dumps with active-session-ID protection
  • prune_checkpoints() handles checkpoint shadow repos with age-based deletion
  • prune_all_artifacts() orchestrates both
  • Gateway integration runs in the existing _session_expiry_watcher async loop via run_in_executor
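The gateway integration pattern, a blocking cleanup call run from an async loop at most once per 24 hours, might look roughly like the sketch below. The names `watcher_loop`, `cleanup`, `ticks`, and `clock` are hypothetical and do not come from the actual _session_expiry_watcher; the loop is bounded by `ticks` only so that the example terminates, whereas a real watcher would presumably loop for the life of the gateway.

```python
import asyncio
import logging
import time

logger = logging.getLogger(__name__)

CLEANUP_INTERVAL_SECONDS = 24 * 60 * 60


async def watcher_loop(cleanup, ticks, tick_seconds=0.0, clock=time.monotonic):
    """Poll `ticks` times; run the blocking `cleanup` at most once per interval.

    Returns the number of cleanup runs that completed without raising.
    """
    loop = asyncio.get_running_loop()
    last_run = None
    runs = 0
    for _ in range(ticks):
        now = clock()
        if last_run is None or now - last_run >= CLEANUP_INTERVAL_SECONDS:
            try:
                # Disk I/O is blocking, so hand it to the default thread pool
                # instead of stalling the event loop.
                await loop.run_in_executor(None, cleanup)
                runs += 1
            except Exception:
                # Cleanup failures are logged and swallowed so the loop survives.
                logger.exception("disk artifact cleanup failed")
            last_run = now
        await asyncio.sleep(tick_seconds)
    return runs
```

Recording `last_run` even after a failure keeps a persistently failing cleanup from being retried on every tick; whether the real watcher makes the same choice is not stated in this PR.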

Test Plan

# Run the 28 new tests
python -m pytest tests/tools/test_session_cleanup.py -n0 -q

# Verify existing session tests still pass
python -m pytest tests/hermes_cli/test_sessions_delete.py -n0 -q

# Manual: hermes sessions stats (see disk artifact counts)
# Manual: hermes sessions prune --dry-run --include-files (preview)
# Manual: hermes sessions prune --include-files -y (actual cleanup)

Port from nanocoai/nanoclaw#1632: Auto-prune stale session artifacts.

hermes-agent accumulates disk artifacts that are never cleaned up:
- Session transcript JSON files (~2 GB on a typical install)
- API request debug dumps
- Filesystem checkpoint shadow repos (~12 GB)
- Gateway JSONL transcript files

The existing 'hermes sessions prune' only deletes DB rows, leaving all
disk files behind.

Changes:
- New tools/session_cleanup.py module with safe, active-session-aware
  disk artifact pruning (session files, request dumps, checkpoints)
- Enhanced 'hermes sessions prune' with --include-files, --files-only,
  and --dry-run flags for disk artifact cleanup
- Enhanced 'hermes sessions stats' to show disk artifact counts and sizes
- Automated daily cleanup in gateway's session expiry watcher
- 28 new tests covering all cleanup paths, safety guards, and edge cases

Safety:
- Never deletes files belonging to active (non-ended) sessions
- Never touches sessions.json state file
- Checkpoints use age-based deletion only (no session ID correlation)
- Dry-run mode available for preview before deletion
- All errors are caught and logged, never crash the gateway
@teknium1

Contributor Author

Superseded by #16286 (salvaged @SeeYangZhi's #6613, wired into #13861's auto_prune helper). The NanoClaw-style daily watcher loop is out of scope now that cleanup rides on the existing opt-in. The ~/.hermes/checkpoints/ cleanup portion of this PR (~12 GB, the largest offender) remains open and is worth a dedicated follow-up.
