feat: add conversation prune function for data retention #285

Merged: jalehman merged 5 commits into Martian-Engineering:main from mvanhorn:feat/209-prune-command on Apr 6, 2026

mvanhorn commented Apr 6, 2026

Contributor

Summary

Adds a pruneConversations function for bulk conversation lifecycle management. Conversations where all messages are older than a configurable threshold can be identified (dry-run) and deleted, with cascading cleanup of messages, summaries, and other dependent data via ON DELETE CASCADE.

Changes

  • src/prune.ts - Core prune logic with parseDuration (supports "90d", "3m", "1y" etc.) and pruneConversations function. Accepts --before duration, dry-run by default, --confirm to delete, optional --vacuum.
  • test/prune.test.ts - 13 tests covering duration parsing, dry-run candidate identification, confirmed deletion with cascade verification, empty conversations, VACUUM, and invalid input.
  • .changeset/warm-clouds-prune.md - Minor changeset for versioning.
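
As an illustration of the duration parsing described above, here is a minimal sketch. It is not the code from src/prune.ts: the unit meanings (d = days, m = 30-day months, y = 365-day years), the error messages, and the `cutoffFrom` helper are assumptions made for this example.

```typescript
// Hypothetical sketch; the real parseDuration in src/prune.ts may differ.
// Assumed units: d = days, m = months (30 days), y = years (365 days).
const DAY_MS = 24 * 60 * 60 * 1000;
const UNIT_DAYS = { d: 1, m: 30, y: 365 } as const;

// Converts a string such as "90d" into a number of milliseconds.
function parseDuration(input: string): number {
  const match = /^(\d+)([dmy])$/.exec(input.trim());
  if (!match) {
    throw new Error(`Invalid duration: "${input}" (expected e.g. "90d", "3m", "1y")`);
  }
  const amount = Number(match[1]);
  const unit = match[2] as "d" | "m" | "y";
  if (amount <= 0) {
    throw new Error(`Duration must be positive: "${input}"`);
  }
  return amount * UNIT_DAYS[unit] * DAY_MS;
}

// Hypothetical helper: turns a duration into an absolute cutoff timestamp.
function cutoffFrom(duration: string, now: Date = new Date()): Date {
  return new Date(now.getTime() - parseDuration(duration));
}
```

With these assumptions, `cutoffFrom("90d")` yields the instant 90 days before now, which a caller could pass to the candidate query.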

Design decisions

  • Exported as a standalone function (not wired into the /lossless CLI command) so lcm-tui and other consumers can call it programmatically.
  • Dry-run by default for safety. Callers must explicitly pass confirm: true to delete.
  • Uses the latest message created_at as the age signal. Conversations with zero messages fall back to conversations.created_at.
  • VACUUM is opt-in since it can be slow on large databases.
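
The age-signal and dry-run decisions above can be sketched in memory, without a database. The row shapes, the function name `findPruneCandidates`, and the strict "older than" comparison are assumptions for illustration; the actual function works against SQLite and may behave differently at the boundary.

```typescript
// Hypothetical in-memory model; the real prune logic queries SQLite.
interface ConversationRow { id: number; createdAt: string; }
interface MessageRow { conversationId: number; createdAt: string; }
interface PruneSketchResult { dryRun: boolean; candidateIds: number[]; }

function findPruneCandidates(
  conversations: ConversationRow[],
  messages: MessageRow[],
  cutoff: Date,
  options: { confirm?: boolean } = {},
): PruneSketchResult {
  // Track the newest message timestamp per conversation.
  const latestMessageMs = new Map<number, number>();
  for (const message of messages) {
    const ms = Date.parse(message.createdAt);
    const previous = latestMessageMs.get(message.conversationId);
    if (previous === undefined || ms > previous) {
      latestMessageMs.set(message.conversationId, ms);
    }
  }
  const cutoffMs = cutoff.getTime();
  const candidateIds = conversations
    .filter((conversation) => {
      // Conversations with zero messages fall back to their own created_at.
      const ageSignal = latestMessageMs.get(conversation.id) ?? Date.parse(conversation.createdAt);
      return ageSignal < cutoffMs; // assumed: strictly older than the cutoff
    })
    .map((conversation) => conversation.id);
  // Dry-run unless the caller explicitly passes confirm: true.
  // This sketch only reports candidates; it never deletes anything.
  return { dryRun: options.confirm !== true, candidateIds };
}
```

A single recent message is enough to keep a conversation, because only the newest timestamp is compared against the cutoff.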

Testing

All 574 tests pass (35 test files), including 13 new prune-specific tests.

Fixes #209

This contribution was developed with AI assistance (Claude Code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jalehman added 4 commits April 6, 2026 12:45
Use SQLite date math for prune candidate selection so mixed timestamp formats compare chronologically instead of lexically. Wrap confirm-mode candidate selection and deletion in one IMMEDIATE transaction to avoid deleting conversations that become fresh during the prune run.

Add a regression test covering SQLite-formatted timestamps on the cutoff boundary.

Regeneration-Prompt: |
  The prune helper added in PR 285 had two review findings to address before it is safe to use against a live LCM database. First, the candidate query compared message timestamps as raw TEXT against an ISO cutoff string. This repo stores some timestamps via SQLite datetime('now') and others via JavaScript toISOString(), so lexical comparison can prune same-day rows that are actually newer than the cutoff. Change the filter to use SQLite julianday(...) and add a regression test that seeds a SQLite-format timestamp newer than the cutoff but lexically smaller than the ISO string.

  Second, confirm-mode pruning selected candidates and then deleted them row by row outside a transaction. Tighten that by running candidate selection and deletion inside BEGIN IMMEDIATE so the prune sees one consistent snapshot and does not remove conversations that received a fresh message mid-run. Keep dry-run behavior unchanged and preserve the existing optional VACUUM behavior.
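
The timestamp finding is easy to reproduce outside the database. The sketch below compares a SQLite-style timestamp with an ISO cutoff, first as raw strings and then as parsed instants. The sample values, the `toEpochMs` helper, and the SQL fragment in the comment are assumptions for illustration, and SQLite-style values are assumed to be UTC.

```typescript
// A value as written by SQLite datetime('now'): "YYYY-MM-DD HH:MM:SS" (assumed UTC).
const sqliteStamp = "2026-04-06 12:00:00";
// A cutoff as produced by JavaScript toISOString().
const isoCutoff = "2026-04-06T10:00:00.000Z";

// Raw TEXT comparison: the space (0x20) sorts before "T" (0x54), so the
// SQLite-style value looks older even though it is two hours newer.
const lexicallyOlder = sqliteStamp < isoCutoff;

// Hypothetical helper: normalize both formats to epoch milliseconds.
function toEpochMs(stamp: string): number {
  const iso = stamp.includes("T") ? stamp : stamp.replace(" ", "T") + "Z";
  return Date.parse(iso);
}

// Chronological comparison gives the correct answer.
const chronologicallyOlder = toEpochMs(sqliteStamp) < toEpochMs(isoCutoff);

// In SQL, the equivalent fix has roughly this shape (assumed, not the PR's query):
//   WHERE julianday(created_at) < julianday(?)
console.log({ lexicallyOlder, chronologicallyOlder });
```

Here `lexicallyOlder` is `true` while `chronologicallyOlder` is `false`, which is the same-day mis-prune the commit describes.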

Delete summary lineage, context items, and FTS rows ahead of conversation deletion so prune works against the current schema's RESTRICT edges. Add a regression test that prunes a conversation containing summary_messages and context_items.

Regeneration-Prompt: |
  Running the prune helper against the live LCM database exposed a schema-level failure that the existing tests missed. Deleting a conversation directly did not work because several child tables mix CASCADE links from conversations with RESTRICT links back to messages and summaries. Reproduce that case with a test conversation that has a message, a linked summary, summary_messages lineage, and a context_items row. Then change prune so confirm-mode deletes the dependent rows in a safe order before deleting the conversation, and also clear any optional FTS rows tied to the pruned messages and summaries so search indexes do not retain orphaned entries.
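
One way to reason about a "safe order" is to treat foreign keys as edges and delete referencing tables before the tables they reference. The sketch below derives such an order. The edge list is an assumption pieced together from the table names mentioned in this thread, not the actual LCM schema, and FTS tables are left out because their names are not given here.

```typescript
// [child, parent]: rows in `child` reference rows in `parent`.
// Assumed edges for illustration; the real schema may differ.
const FK_EDGES: Array<[string, string]> = [
  ["messages", "conversations"],
  ["summaries", "conversations"],
  ["context_items", "conversations"],
  ["context_items", "messages"],
  ["context_items", "summaries"],
  ["summary_messages", "summaries"],
  ["summary_messages", "messages"],
  ["summary_parents", "summaries"],
];

// Returns tables in an order where every referencing table comes before
// the table it references, so RESTRICT constraints are never violated.
function deletionOrder(edges: Array<[string, string]>): string[] {
  const remaining = new Set<string>();
  edges.forEach(([child, parent]) => {
    remaining.add(child);
    remaining.add(parent);
  });
  const order: string[] = [];
  while (remaining.size > 0) {
    // A table is ready once no other remaining table still references it.
    const ready = Array.from(remaining)
      .filter((table) =>
        !edges.some(([child, parent]) =>
          parent === table && child !== table && remaining.has(child)))
      .sort();
    if (ready.length === 0) {
      throw new Error("Foreign-key cycle: no safe deletion order exists");
    }
    ready.forEach((table) => {
      remaining.delete(table);
      order.push(table);
    });
  }
  return order;
}

console.log(deletionOrder(FK_EDGES));
```

Under these assumed edges, the lineage and context tables come first, then messages and summaries, and conversations last.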

Chunk confirmed pruning into bounded transactions so large live databases can be cleaned incrementally without one giant write lock. Delete cross-conversation context rows that reference pruned summaries or messages, and add supporting indexes plus regression coverage for batch mode and retained-context cleanup.

Regeneration-Prompt: |
  The prune helper already handled mixed timestamp formats and dependent summary/message cleanup, but it still did not work reliably on a large live LCM database. Update it so confirm-mode pruning runs in small committed batches instead of one giant transaction. Add options to control batch size and an optional max batch count for bounded runs. Preserve dry-run behavior.

  While testing against a large live database, pruning exposed an additional FK case: retained conversations can keep context_items rows that reference summaries being pruned from another conversation. Extend the delete path to remove context_items rows by referenced candidate message_id and summary_id, not just by candidate conversation_id. Keep the existing summary_messages and summary_parents cleanup.

  Add regression tests for multi-batch pruning, bounded batch runs, and the cross-conversation context_items case. Also add the missing indexes needed for live-scale deletes on summary_messages(message_id) and summary_parents(parent_summary_id).
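
The batching behaviour described in this commit can be sketched as a loop of small `BEGIN IMMEDIATE` transactions, each of which selects and deletes its own candidates. The `SqlExecutor` interface, the callback parameters, and the option names `batchSize` and `maxBatches` are hypothetical stand-ins; the merged code may use different names and a concrete SQLite driver.

```typescript
// Minimal stand-in for a SQLite handle (hypothetical interface).
interface SqlExecutor { exec(sql: string): void; }
interface BatchOptions { batchSize: number; maxBatches?: number; }

function pruneInBatches(
  db: SqlExecutor,
  selectCandidates: (limit: number) => number[],
  deleteConversations: (ids: number[]) => void,
  { batchSize, maxBatches }: BatchOptions,
): { batches: number; deleted: number } {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  let batches = 0;
  let deleted = 0;
  while (maxBatches === undefined || batches < maxBatches) {
    let ids: number[] = [];
    // IMMEDIATE takes the write lock up front, so selection and deletion
    // see the same state and a freshly updated conversation is not removed.
    db.exec("BEGIN IMMEDIATE");
    try {
      ids = selectCandidates(batchSize);
      if (ids.length > 0) {
        deleteConversations(ids);
      }
      db.exec("COMMIT");
    } catch (err) {
      db.exec("ROLLBACK");
      throw err;
    }
    if (ids.length === 0) {
      break; // nothing left to prune
    }
    batches += 1;
    deleted += ids.length;
  }
  return { batches, deleted };
}
```

Committing after every batch releases the write lock between batches, and `maxBatches` lets an operator bound a single run and resume later.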

Follow VACUUM with wal_checkpoint(TRUNCATE) so operator-triggered prune runs reclaim disk space immediately in WAL mode instead of leaving the rewritten pages stranded in lcm.db-wal. Add a regression test that verifies the WAL is drained after a vacuumed prune.

Regeneration-Prompt: |
  The prune helper already supports an optional vacuum pass after confirmed deletion, but in WAL mode that still leaves reclaimed pages sitting in the WAL file until a checkpoint happens. Update the vacuum path so a prune with vacuum enabled also runs PRAGMA wal_checkpoint(TRUNCATE) immediately afterward. Keep the existing API shape.

  Add a focused regression test in prune.test.ts that proves the WAL is drained after a vacuumed prune, for example by checking PRAGMA wal_checkpoint(PASSIVE) returns zero log frames after the prune completes.
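
The vacuum path described above amounts to two statements issued in order. The sketch below shows them against a hypothetical `SqlExecutor` interface; the function name `vacuumAndTruncateWal` is an assumption, and only the two SQL statements come from SQLite itself.

```typescript
// Minimal stand-in for a SQLite handle (hypothetical interface).
interface SqlExecutor { exec(sql: string): void; }

// Hypothetical helper: run after the last prune batch has committed,
// because VACUUM cannot run inside an open transaction.
function vacuumAndTruncateWal(db: SqlExecutor): void {
  // Rebuild the database file so freed pages are released.
  db.exec("VACUUM");
  // In WAL mode the rewritten pages land in the -wal file first;
  // TRUNCATE checkpoints them and resets the WAL file to zero bytes.
  db.exec("PRAGMA wal_checkpoint(TRUNCATE)");
}
```

Keeping this step opt-in matches the earlier design decision that VACUUM can be slow on large databases.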

jalehman commented Apr 6, 2026

Contributor

Thank you! I made a number of changes after using this successfully (eventually) to prune my own 3.8GB lcm.db, mostly surrounding performance.

jalehman merged commit aac2668 into Martian-Engineering:main on Apr 6, 2026
1 check passed

mvanhorn commented Apr 7, 2026

Contributor Author

Nice, glad it worked on a real 3.8GB database. Would love to hear what performance changes you made - happy to incorporate anything useful upstream.

Development

Successfully merging this pull request may close these issues.

Feature request: Data retention / prune command for old conversations