fix(state): register atexit WAL checkpoint to prevent unbounded WAL growth #16510

Open
chinadbo wants to merge 1 commit into NousResearch:main from chinadbo:fix/b4-wal-checkpoint-on-exit

@chinadbo


Summary

  • SessionDB.close() existed but was never called, so SQLite WAL files grew without bound on long-running gateway processes (field reports: 100+ MB WAL files degrading read performance)
  • Fix: call atexit.register(self.close) in SessionDB.__init__() so the WAL is checkpointed whenever the interpreter exits normally (atexit handlers do not run if the process is killed by an unhandled signal or exits via os._exit())
  • Add __enter__/__exit__ context manager support to SessionDB
  • Add explicit close() call in CLI atexit cleanup path
  • Improve close() error handling: log a warning for unexpected errors instead of silently swallowing all exceptions
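
The lifecycle described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `_conn` attribute, the logger name, and the choice of `wal_checkpoint(TRUNCATE)` as the checkpoint mode are assumptions made for the example.

```python
import atexit
import logging
import sqlite3

logger = logging.getLogger(__name__)


class SessionDB:
    """Hypothetical stand-in for the real SessionDB; only the lifecycle is shown."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute("PRAGMA journal_mode=WAL")
        # Run close() at normal interpreter exit even if no caller closes explicitly.
        atexit.register(self.close)

    def close(self):
        # Swap the connection out first so a second call is a no-op (idempotent close).
        conn, self._conn = self._conn, None
        if conn is None:
            return
        try:
            # Assumed mode: TRUNCATE moves WAL frames into the main database file
            # and truncates the WAL file to zero bytes.
            conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        except sqlite3.Error as exc:
            # Log instead of silently swallowing the failure.
            logger.warning("WAL checkpoint failed on close: %s", exc)
        finally:
            conn.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress an exception raised inside the with-block
```

Because close() is idempotent in this sketch, the explicit call in a CLI cleanup path and the atexit hook can both fire without harm.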

Test plan

  • tests/test_hermes_state.py — 6 new tests in TestWALCheckpointOnClose including PRAGMA return value verification
  • 189 state tests pass, 0 failures
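
For readers who want to see what a checkpoint check can look like, here is a small sketch using plain `sqlite3` rather than the project's test suite. The function name, table name, and the `TRUNCATE` mode are assumptions for illustration. The SQLite documentation describes the pragma's result as a single row of three integers: a busy flag, the number of frames in the log, and the number of frames checkpointed.

```python
import os
import sqlite3
import tempfile


def checkpoint_and_report(db_path):
    """Write rows in WAL mode, run a TRUNCATE checkpoint, and report WAL sizes."""
    wal_path = db_path + "-wal"
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE IF NOT EXISTS messages (body TEXT)")
        conn.executemany(
            "INSERT INTO messages (body) VALUES (?)",
            [("x" * 1000,) for _ in range(500)],
        )
        conn.commit()
        wal_before = os.path.getsize(wal_path)
        # One row comes back: (busy, log_frames, checkpointed_frames).
        row = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
        wal_after = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
        return wal_before, tuple(row), wal_after
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        before, row, after = checkpoint_and_report(os.path.join(tmp, "state.db"))
        print(f"WAL before: {before} bytes, pragma row: {row}, WAL after: {after} bytes")
```

A busy flag of 0 together with a zero-length (or absent) WAL file is the condition a test of this kind would typically assert.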

🤖 Generated with Claude Code

@alt-glitch added labels on Apr 27, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder)
…rowth

SessionDB.close() was never called in practice, causing SQLite WAL files to
grow unboundedly on long-running gateway and CLI processes (field reports of
100+ MB WAL files).

- Add `atexit.register(self.close)` in SessionDB.__init__ so the WAL
  checkpoint runs on normal interpreter exit, even without an explicit close()
- Add __enter__/__exit__ so SessionDB can be used as a context manager
- Fix the silent `except Exception: pass` in close() to log a warning for
  non-ENOENT errors, aiding operator diagnostics
- Add explicit close() call in the CLI atexit cleanup path as a belt-and-
  suspenders measure alongside the atexit registration
- Add TestWALCheckpointOnClose test class (6 tests) covering: WAL file is
  empty/absent after close, idempotent close, context manager closes on
  normal and exceptional exit, atexit registration present in source, and
  PRAGMA wal_checkpoint return value verification

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>