feat: add conversation prune function for data retention#285
Merged
jalehman merged 5 commits intoApr 6, 2026
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use SQLite date math for prune candidate selection so mixed timestamp formats compare chronologically instead of lexically. Wrap confirm-mode candidate selection and deletion in one IMMEDIATE transaction to avoid deleting conversations that become fresh during the prune run.
Add a regression test covering SQLite-formatted timestamps on the cutoff boundary.
Regeneration-Prompt: |
The prune helper added in PR 285 had two review findings to address before it is safe to use against a live LCM database. First, the candidate query compared message timestamps as raw TEXT against an ISO cutoff string. This repo stores some timestamps via SQLite datetime('now') and others via JavaScript toISOString(), so lexical comparison can prune same-day rows that are actually newer than the cutoff. Change the filter to use SQLite julianday(...) and add a regression test that seeds a SQLite-format timestamp newer than the cutoff but lexically smaller than the ISO string.
Second, confirm-mode pruning selected candidates and then deleted them row by row outside a transaction. Tighten that by running candidate selection and deletion inside BEGIN IMMEDIATE so the prune sees one consistent snapshot and does not remove conversations that received a fresh message mid-run. Keep dry-run behavior unchanged and preserve the existing optional VACUUM behavior.
Delete summary lineage, context items, and FTS rows ahead of conversation deletion so prune works against the current schema's RESTRICT edges. Add a regression test that prunes a conversation containing summary_messages and context_items. Regeneration-Prompt: | Running the prune helper against the live LCM database exposed a schema-level failure that the existing tests missed. Deleting a conversation directly did not work because several child tables mix CASCADE links from conversations with RESTRICT links back to messages and summaries. Reproduce that case with a test conversation that has a message, a linked summary, summary_messages lineage, and a context_items row. Then change prune so confirm-mode deletes the dependent rows in a safe order before deleting the conversation, and also clear any optional FTS rows tied to the pruned messages and summaries so search indexes do not retain orphaned entries.
Chunk confirmed pruning into bounded transactions so large live databases can be cleaned incrementally without one giant write lock. Delete cross-conversation context rows that reference pruned summaries or messages, and add supporting indexes plus regression coverage for batch mode and retained-context cleanup. Regeneration-Prompt: | The prune helper already handled mixed timestamp formats and dependent summary/message cleanup, but it still did not work reliably on a large live LCM database. Update it so confirm-mode pruning runs in small committed batches instead of one giant transaction. Add options to control batch size and an optional max batch count for bounded runs. Preserve dry-run behavior. While testing against a large live database, pruning exposed an additional FK case: retained conversations can keep context_items rows that reference summaries being pruned from another conversation. Extend the delete path to remove context_items rows by referenced candidate message_id and summary_id, not just by candidate conversation_id. Keep the existing summary_messages and summary_parents cleanup. Add regression tests for multi-batch pruning, bounded batch runs, and the cross-conversation context_items case. Also add the missing indexes needed for live-scale deletes on summary_messages(message_id) and summary_parents(parent_summary_id).
Follow VACUUM with wal_checkpoint(TRUNCATE) so operator-triggered prune runs reclaim disk space immediately in WAL mode instead of leaving the rewritten pages stranded in lcm.db-wal. Add a regression test that verifies the WAL is drained after a vacuumed prune. Regeneration-Prompt: | The prune helper already supports an optional vacuum pass after confirmed deletion, but in WAL mode that still leaves reclaimed pages sitting in the WAL file until a checkpoint happens. Update the vacuum path so a prune with vacuum enabled also runs PRAGMA wal_checkpoint(TRUNCATE) immediately afterward. Keep the existing API shape. Add a focused regression test in prune.test.ts that proves the WAL is drained after a vacuumed prune, for example by checking PRAGMA wal_checkpoint(PASSIVE) returns zero log frames after the prune completes.
Contributor
|
Thank you! I made a number of changes after using this successfully (eventually) to prune my own 3.8GB lcm.db, mostly surrounding performance. |
Contributor
Author
|
Nice, glad it worked on a real 3.8GB database. Would love to hear what performance changes you made - happy to incorporate anything useful upstream. |
This was referenced Apr 7, 2026
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a
pruneConversationsfunction for bulk conversation lifecycle management. Conversations where all messages are older than a configurable threshold can be identified (dry-run) and deleted, with cascading cleanup of messages, summaries, and other dependent data via ON DELETE CASCADE.Changes
src/prune.ts- Core prune logic withparseDuration(supports "90d", "3m", "1y" etc.) andpruneConversationsfunction. Accepts--beforeduration, dry-run by default,--confirmto delete, optional--vacuum.test/prune.test.ts- 13 tests covering duration parsing, dry-run candidate identification, confirmed deletion with cascade verification, empty conversations, VACUUM, and invalid input..changeset/warm-clouds-prune.md- Minor changeset for versioning.Design decisions
/losslessCLI command) so lcm-tui and other consumers can call it programmatically.confirm: trueto delete.created_atas the age signal. Conversations with zero messages fall back toconversations.created_at.Testing
All 574 tests pass (35 test files), including 13 new prune-specific tests.
Fixes #209
This contribution was developed with AI assistance (Claude Code).