fix(sessions): delete on-disk transcript files during prune and delete #6613

Closed
SeeYangZhi wants to merge 1 commit into NousResearch:main from SeeYangZhi:fix/prune-session-json-files

@SeeYangZhi

Contributor

Fixes #3015

Problem

delete_session() and prune_sessions() only remove SQLite records, leaving the .json/.jsonl transcript files in ~/.hermes/sessions/ on disk indefinitely. session_search reads from SQLite FTS5 (state.db), not from these files, so they are dead weight once a session has ended.

Observed: 340 files, 82MB in 3 days (~27MB/day, ~10GB/year).
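The projection above can be checked with a few lines of arithmetic. This is only a sketch: it assumes growth stays linear at the observed rate and uses decimal units (1 GB = 1000 MB), neither of which the report states explicitly.

```python
# Figures reported in the PR description.
observed_mb = 82
observed_days = 3
observed_files = 340

# Assumption: growth continues linearly at the observed rate.
mb_per_day = observed_mb / observed_days
gb_per_year = mb_per_day * 365 / 1000  # decimal GB

# Average transcript size, for a sense of scale.
avg_kb_per_file = observed_mb * 1000 / observed_files

print(f"{mb_per_day:.1f} MB/day, {gb_per_year:.1f} GB/year, "
      f"{avg_kb_per_file:.0f} KB/file on average")
```

The per-file average is not quoted in the report; it is derived here from the two observed totals.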

Changes

hermes_state.py:

  • Add _remove_session_files() static helper — cleans up {session_id}.json, .jsonl, and request_dump_{session_id}_*.json
  • delete_session(sessions_dir=) — removes transcript files for the session and its children
  • prune_sessions(sessions_dir=) — removes transcript files for all pruned sessions after the DB transaction
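For illustration, the file-removal helper described above might look roughly like the following. This is a minimal sketch, not the code from the PR: the function name `remove_session_files`, its signature, and the returned count are assumptions; only the three filename patterns and the best-effort behavior come from the description.

```python
import glob
from pathlib import Path
from typing import Optional, Union


def remove_session_files(
    sessions_dir: Optional[Union[str, Path]], session_id: str
) -> int:
    """Best-effort removal of one session's on-disk files.

    Hypothetical stand-in for the PR's _remove_session_files() helper.
    Returns the number of files actually removed.
    """
    if sessions_dir is None:
        return 0  # caller did not opt in to file cleanup

    base = Path(sessions_dir)
    # Escape glob metacharacters so the session ID is matched literally.
    safe_id = glob.escape(session_id)

    candidates = [base / f"{session_id}.json", base / f"{session_id}.jsonl"]
    candidates += sorted(base.glob(f"request_dump_{safe_id}_*.json"))

    removed = 0
    for path in candidates:
        try:
            path.unlink()
            removed += 1
        except OSError:
            # Missing file, permission error, etc.: never fail the caller.
            pass
    return removed
```

Returning a count is a choice made for this sketch so that the behavior is easy to test; the description does not say what the real helper returns.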

hermes_cli/main.py:

  • hermes sessions delete now passes sessions_dir
  • hermes sessions prune now passes sessions_dir

Design Decisions

  • Optional sessions_dir param (default None) — fully backward-compatible. Callers that don't pass it get existing behavior (SQLite-only cleanup). This avoids breaking any internal callers.
  • File cleanup is best-effort: OSError is silenced so DB operations are never blocked by filesystem issues
  • Cleanup happens outside the DB transaction for prune_sessions() — session IDs are collected during the transaction, files deleted after commit
  • Handles child sessions — file cleanup includes children deleted via FK constraint cascade
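The ordering described in these decisions (collect IDs inside the transaction, delete files only after commit, skip file cleanup when `sessions_dir` is `None`) can be sketched as below. The table layout (`sessions(id, ended_at)`), the `older_than_days` parameter, and the function name `prune_sessions_sketch` are assumptions made for the example, not the real hermes_state.py schema or signature. Request dumps and child sessions are left out for brevity.

```python
import sqlite3
import time
from pathlib import Path
from typing import Optional


def prune_sessions_sketch(
    conn: sqlite3.Connection,
    older_than_days: int,
    sessions_dir: Optional[Path] = None,
) -> int:
    """Delete old session rows, then best-effort remove their transcripts.

    Hypothetical illustration; assumes a table sessions(id, ended_at)
    where ended_at is a Unix timestamp.
    """
    cutoff = time.time() - older_than_days * 86400

    # "with conn" commits on success and rolls back on an exception.
    with conn:
        rows = conn.execute(
            "SELECT id FROM sessions WHERE ended_at < ?", (cutoff,)
        ).fetchall()
        pruned_ids = [row[0] for row in rows]
        conn.executemany(
            "DELETE FROM sessions WHERE id = ?", [(sid,) for sid in pruned_ids]
        )

    # The transaction is committed at this point. A filesystem failure
    # below can no longer roll back or block the database change.
    if sessions_dir is not None:
        for sid in pruned_ids:
            for suffix in (".json", ".jsonl"):
                try:
                    (Path(sessions_dir) / f"{sid}{suffix}").unlink()
                except OSError:
                    pass  # best-effort: ignore missing or locked files

    return len(pruned_ids)
```

Deleting files after the commit means a crash between the two steps can leave orphaned files but never a database row whose deletion was rolled back after its transcript was already gone.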

What This Does NOT Change

  • No automatic retention policy (that's a separate feature)
  • No changes to gateway session persistence (files are still written during active sessions)
  • No changes to /resume behavior (resumed sessions keep their files)

NousResearch#3015)

`delete_session()` and `prune_sessions()` only removed SQLite records,
leaving .json/.jsonl transcript files on disk forever. Over time this
causes unbounded disk growth (~27MB/day observed).

Changes:
- Add `_remove_session_files()` static helper that cleans up
  `{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json`
- `delete_session()` accepts optional `sessions_dir` param and removes
  files for the deleted session and its children
- `prune_sessions()` accepts optional `sessions_dir` param and removes
  files for all pruned sessions after the DB transaction
- Wire up CLI `hermes sessions delete` and `hermes sessions prune` to
  pass `sessions_dir`
- File cleanup is best-effort (OSError silenced) so DB operations are
  never blocked by filesystem issues
- Fully backward-compatible: `sessions_dir=None` (default) preserves
  existing behavior
@teknium1

Contributor

Merged via #16286 — your commit was cherry-picked onto current main with authorship preserved in git log. We extended it to ride on #13861's existing sessions.auto_prune opt-in so file cleanup is automatic for users who've already enabled auto-maintenance (no new knob). Thanks!

Development

Successfully merging this pull request may close these issues.

Session File Leak: JSON sessions never deleted, causing unbounded disk growth and cost explosion
