Skip to content

fix(sessions): delete transcript files during auto-prune (salvage #6613, closes #3015)#16286

Merged
teknium1 merged 2 commits into
mainfrom
hermes/hermes-5de18fdc
Apr 27, 2026
Merged

fix(sessions): delete transcript files during auto-prune (salvage #6613, closes #3015)#16286
teknium1 merged 2 commits into
mainfrom
hermes/hermes-5de18fdc

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Rides on #13861's existing sessions.auto_prune opt-in to also delete on-disk transcript files — not just SQLite rows — so disk growth from accumulated session JSONs is bounded by the same retention_days / min_interval_hours knobs users already set. Closes #3015.

Salvage of #6613 by @SeeYangZhi, cherry-picked onto current main with conflict resolution.

Why this over #8685

@SeeYangZhi's #6613 is the surgical version of the same feature: extends delete_session() / prune_sessions() with an optional sessions_dir= param and best-effort file cleanup. #13861 (already merged) built the auto-maintenance helper and per-user config block; extending that helper to take sessions_dir makes file cleanup automatic for users who already opted into auto_prune, with zero new knobs. #8685 (pre-#13861) would duplicate the schedule machinery with a parallel daily watcher loop and new CLI flags — useful but separate scope (especially ~/.hermes/checkpoints/, the actually-largest offender at ~12 GB, which should be its own follow-up PR).

Conflict resolution notes

  • Preserved main's "orphan children on prune/delete" semantics (docstring on delete_session: "Child sessions remain accessible independently"). fix(sessions): delete on-disk transcript files during prune and delete #6613 had switched to cascade-delete children, which is a design regression — dropped from the salvage.
  • Extended maybe_auto_prune_and_vacuum(sessions_dir=...) to pass through to prune_sessions(), then wired sessions_dir through from both startup call sites (cli.py uses get_hermes_home() / "sessions", gateway/run.py uses self.config.sessions_dir).

Changes

  • hermes_state.py: new _remove_session_files() static helper; delete_session() and prune_sessions() accept sessions_dir: Optional[Path] = None; maybe_auto_prune_and_vacuum() accepts sessions_dir and passes it through.
  • hermes_cli/main.py: hermes sessions delete and hermes sessions prune now pass sessions_dir=get_hermes_home() / "sessions".
  • cli.py, gateway/run.py: auto-maintenance call sites pass sessions_dir.
  • tests/test_hermes_state.py: 3 new TestAutoMaintenance tests — file deletion with sessions_dir, backward-compat without it, active-session files never touched.
  • tests/hermes_cli/test_sessions_delete.py: FakeDB.delete_session helpers accept **kwargs.

Validation

Before After
Auto-prune behavior DB rows removed, .json / .jsonl / request_dump_* files remain on disk forever All pruned-session transcripts deleted; active sessions never touched
Opt-in config sessions.auto_prune already gated by #13861 Same gate now covers file cleanup too (no new knob)
Backward compat sessions_dir=None preserves legacy DB-only behavior
Targeted tests TestAutoMaintenance: 10/10 (7 existing + 3 new); test_sessions_delete: 4/4
Broader suite tests/test_hermes_state.py + tests/hermes_cli/: 3109/3110 (1 pre-existing web_server failure unrelated)
E2E with real imports + isolated HERMES_HOME 5 transcript files for 2 pruned sessions deleted, active-session transcript survives, second call within interval skips cleanly

Closes #3015. Credit to @SeeYangZhi for the original #6613 design; @teknium1's #8685 covers the broader ~/.hermes/checkpoints/ cleanup story which is out of scope here.

SeeYangZhi and others added 2 commits April 26, 2026 18:27
#3015)

`delete_session()` and `prune_sessions()` only removed SQLite records,
leaving .json/.jsonl transcript files on disk forever. Over time this
causes unbounded disk growth (~27MB/day observed).

Changes:
- Add `_remove_session_files()` static helper that cleans up
  `{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json`
- `delete_session()` accepts optional `sessions_dir` param and removes
  files for the deleted session and its children
- `prune_sessions()` accepts optional `sessions_dir` param and removes
  files for all pruned sessions after the DB transaction
- Wire up CLI `hermes sessions delete` and `hermes sessions prune` to
  pass `sessions_dir`
- File cleanup is best-effort (OSError silenced) so DB operations are
  never blocked by filesystem issues
- Fully backward-compatible: `sessions_dir=None` (default) preserves
  existing behavior
…egression tests

- TestAutoMaintenance gains 3 tests: auto-prune deletes transcript files
  when sessions_dir is passed, preserves them when it isn't (backward-
  compat), and never touches active-session files during prune.
- FakeDB helpers in test_sessions_delete.py accept **kwargs so they
  don't break when delete_session signature gains sessions_dir.
@teknium1 teknium1 merged commit cd2aee3 into main Apr 27, 2026
11 of 12 checks passed
@teknium1 teknium1 deleted the hermes/hermes-5de18fdc branch April 27, 2026 01:31
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery labels Apr 27, 2026
teknium1 added a commit that referenced this pull request Apr 27, 2026
Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from #13861 / #16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs #3015 (session-file disk growth was fixed in #16286; this covers
the checkpoint side noted out-of-scope there).
teknium1 added a commit that referenced this pull request Apr 27, 2026
#16303)

Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from #13861 / #16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs #3015 (session-file disk growth was fixed in #16286; this covers
the checkpoint side noted out-of-scope there).
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
NousResearch#16303)

Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from NousResearch#13861 / NousResearch#16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs NousResearch#3015 (session-file disk growth was fixed in NousResearch#16286; this covers
the checkpoint side noted out-of-scope there).
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
NousResearch#16303)

Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from NousResearch#13861 / NousResearch#16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs NousResearch#3015 (session-file disk growth was fixed in NousResearch#16286; this covers
the checkpoint side noted out-of-scope there).
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
NousResearch#16303)

Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from NousResearch#13861 / NousResearch#16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs NousResearch#3015 (session-file disk growth was fixed in NousResearch#16286; this covers
the checkpoint side noted out-of-scope there).
dannyJ848 pushed a commit to dannyJ848/hermes-agent that referenced this pull request May 17, 2026
NousResearch#16303)

Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from NousResearch#13861 / NousResearch#16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs NousResearch#3015 (session-file disk growth was fixed in NousResearch#16286; this covers
the checkpoint side noted out-of-scope there).
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
NousResearch#16303)

Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from NousResearch#13861 / NousResearch#16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs NousResearch#3015 (session-file disk growth was fixed in NousResearch#16286; this covers
the checkpoint side noted out-of-scope there).
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
NousResearch#16303)

Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from NousResearch#13861 / NousResearch#16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs NousResearch#3015 (session-file disk growth was fixed in NousResearch#16286; this covers
the checkpoint side noted out-of-scope there).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Session File Leak: JSON sessions never deleted, causing unbounded disk growth and cost explosion

3 participants