feat: add mempalace prune to detect and remove stale drawers #522
vanachterjacob wants to merge 1 commit into
Stale drawers accumulate when source files are deleted or modified after mining. This causes outdated content to surface in search results, which can inject contradictory information into agent context.

Adds three detection strategies:
- existence: finds drawers whose source file no longer exists on disk
- mtime: finds drawers whose source file has been modified since mining
- orphans: finds leftover chunks when a file shrinks after re-mining

Also patches the project miner to clean old drawers before re-mining a modified file (clean-before-remine), preventing orphaned chunks from accumulating in the first place.

Available as CLI (`mempalace prune --dry-run`) and MCP tool (`mempalace_prune`).

Addresses MemPalace#224, MemPalace#420, MemPalace#331

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
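For readers who have not opened the diff, the mtime strategy described in the commit message might look roughly like the sketch below. It is an illustration, not the code in `mempalace/pruner.py`: the `source_file` and `source_mtime` metadata keys, the `find_mtime_stale` helper, and the `epsilon` tolerance are all assumptions, and drawers are modelled as a plain dict rather than a ChromaDB collection.

```python
import os
import tempfile


def find_mtime_stale(drawers, epsilon=1e-3):
    """Return ids of drawers whose source file changed after it was mined.

    `drawers` maps a drawer id to a metadata dict. The keys `source_file`
    and `source_mtime` are hypothetical; the real schema may differ.
    """
    stale = []
    for drawer_id, meta in drawers.items():
        path = meta["source_file"]
        if not os.path.exists(path):
            continue  # missing files are the existence strategy's job
        # Small tolerance so float rounding alone does not flag a drawer.
        if os.path.getmtime(path) - meta["source_mtime"] > epsilon:
            stale.append(drawer_id)
    return stale


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.md")
    with open(path, "w") as fh:
        fh.write("v1")
    mined_at = os.path.getmtime(path)
    drawers = {"d1": {"source_file": path, "source_mtime": mined_at}}

    fresh = find_mtime_stale(drawers)

    # Simulate a later edit by moving the file's mtime forward.
    os.utime(path, (mined_at + 10, mined_at + 10))
    stale = find_mtime_stale(drawers)
```

Before the simulated edit nothing is flagged; afterwards `d1` is reported as stale.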
|
Stale drawer cleanup has been on my wishlist since before soft-archive (#336) landed — glad to see a first-class The One concern: the
Interaction with soft-archive (#336) and Synapse consolidation candidates (#451): a drawer that's been soft-archived probably shouldn't be flagged as stale by the existence check, because it was intentionally removed from active retrieval. The prune logic should check drawer status metadata ( Dry-run first: the Good foundation: once the archive/status interaction is handled, this fills a real gap. |
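One way the status check suggested above could work is sketched here. It is not taken from this PR or from #336: the `status` metadata key, the `"archived"` value, and the `find_missing_sources` helper are hypothetical names chosen for illustration.

```python
import tempfile


import os


def find_missing_sources(drawers, skip_statuses=("archived",)):
    """Existence check that leaves intentionally archived drawers alone.

    The `status` and `source_file` keys are assumptions about the
    drawer metadata, not the project's confirmed schema.
    """
    missing = []
    for drawer_id, meta in drawers.items():
        if meta.get("status") in skip_statuses:
            continue  # archived on purpose, so not stale
        if not os.path.exists(meta["source_file"]):
            missing.append(drawer_id)
    return missing


with tempfile.NamedTemporaryFile() as present:
    drawers = {
        "active-gone": {"source_file": "/nonexistent/mempalace-demo/a.md",
                        "status": "active"},
        "archived-gone": {"source_file": "/nonexistent/mempalace-demo/b.md",
                          "status": "archived"},
        "active-present": {"source_file": present.name, "status": "active"},
    }
    flagged = find_missing_sources(drawers)
```

Only the active drawer with a missing source is flagged; the archived drawer is skipped even though its file is gone.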
|
Heads up: this PR's Mechanism: modified-file re-mines previously Deleting by Coordination:
Urgency note: PR #518 (still open) fixes the float-mtime epsilon in Regression test worth adding (not currently in either PR's test plan): mine a file → |
|
@StefanKremen — this is exactly the kind of cross-PR analysis that saves reviewers time. The mechanism is right: pre-delete converts re-mines from update operations to pure inserts, which never touches hnswlib's The merge-order recommendation is important and I want to amplify it: land this PR or #523 before #518. The #518 float-mtime epsilon fix makes The regression test scenario you described is exactly right: That test would have caught this race in CI. Worth adding to whichever PR lands first. |
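To make the orphan problem and the pre-delete behaviour concrete, here is a toy model. A plain dict stands in for the vector store, so nothing in it exercises ChromaDB or hnswlib, and the `chunk`, `mine`, and `orphans` helpers are hypothetical. It only shows why re-mining a shrunken file by overwriting leaves extra chunks behind, while deleting first makes the re-mine a pure insert.

```python
def chunk(text, size=10):
    """Split text into fixed-size pieces (a stand-in for real chunking)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def mine(store, source, text, clean_first):
    """Write chunks keyed by (source, index); optionally delete old ones first."""
    if clean_first:
        for key in [k for k in store if k[0] == source]:
            del store[key]
    for index, piece in enumerate(chunk(text)):
        store[(source, index)] = piece


def orphans(store, source, text):
    """Chunks stored for `source` beyond what the current text produces."""
    expected = len(chunk(text))
    return sorted(k for k in store if k[0] == source and k[1] >= expected)


long_text = "a" * 35   # 4 chunks
short_text = "b" * 15  # 2 chunks

# Overwrite-in-place: chunks 2 and 3 from the first mine survive.
upsert_store = {}
mine(upsert_store, "notes.md", long_text, clean_first=False)
mine(upsert_store, "notes.md", short_text, clean_first=False)
upsert_orphans = orphans(upsert_store, "notes.md", short_text)

# Clean-before-remine: old chunks are removed, then only new ones are inserted.
clean_store = {}
mine(clean_store, "notes.md", long_text, clean_first=True)
mine(clean_store, "notes.md", short_text, clean_first=True)
clean_orphans = orphans(clean_store, "notes.md", short_text)
```

A regression test along these lines would need to run against the real collection to say anything about the index race discussed above; the toy only covers the chunk bookkeeping.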
|
Isn't pruning stale data against the project's stated purpose, "to retain everything"? Perhaps archive instead of pruning, so that the information could be brought back later if desired. |
|
@nanoscopic — this is a fair tension to flag, and it's worth distinguishing between the two use cases: Prune targets genuinely stale metadata, not chosen memories. When a source file is deleted from disk, the drawers it generated have no recoverable original — the text is already in ChromaDB but the file it came from no longer exists. Pruning those is less like "discarding a memory" and more like "removing a dangling reference." The content isn't gone from the palace yet — it's the source link that's broken. The The soft-archive approach you're describing is real though. There's ongoing discussion in #336 about That said, for the |
|
watching this one! thanks |
|
Hi, thanks for the contribution. This PR has merge conflicts with Could you rebase onto If this change is no longer relevant, feel free to close the PR. (This message is part of a periodic backlog pass, sent to all open PRs that match this state.) |
Summary
Stale drawers accumulate when source files are deleted or modified after mining. This causes outdated content to surface in `mempalace_search` results, which can inject contradictory information into agent context: a memory correctness risk, not just a maintenance inconvenience.

This PR adds:

- `mempalace/pruner.py`: core prune logic with three detection strategies:
  - `existence`: finds drawers whose source file no longer exists on disk
  - `mtime`: finds drawers whose source file has been modified since mining
  - `orphans`: finds leftover chunks when a file shrinks after re-mining
- CLI: `mempalace prune --strategy <all|existence|mtime|orphans> --wing <w> --dry-run`
- MCP tool: `mempalace_prune`, callable from Claude Code / Cursor
- A patch to `miner.py` to delete all old drawers for a source file before re-mining, preventing orphaned chunks from accumulating in the first place

Addresses
Usage
Via MCP:
Test plan
- `existence` strategy detects drawers with missing source files
- `mtime` strategy detects drawers with outdated modification times
- `--wing` filter scopes detection correctly
- `--dry-run` reports without deleting

🤖 Generated with Claude Code