feat: add `mempalace prune` to detect and remove stale drawers by vanachterjacob · Pull Request #522 · MemPalace/mempalace

vanachterjacob · 2026-04-10T10:50:02Z

Summary

Stale drawers accumulate when source files are deleted or modified after mining. This causes outdated content to surface in mempalace_search results, which can inject contradictory information into agent context — a memory correctness risk, not just a maintenance inconvenience.

This PR adds:

- mempalace/pruner.py: core prune logic with three detection strategies:
  - existence: finds drawers whose source file no longer exists on disk
  - mtime: finds drawers whose source file has been modified since mining
  - orphans: finds leftover chunks when a file shrinks after re-mining
- CLI command: mempalace prune --strategy <all|existence|mtime|orphans> --wing <w> --dry-run
- MCP tool: mempalace_prune, callable from Claude Code / Cursor
- Clean-before-remine: patches miner.py to delete all old drawers for a source file before re-mining, preventing orphaned chunks from accumulating in the first place
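For readers who want a concrete picture of the existence and mtime strategies, here is a minimal sketch. It is not the code in mempalace/pruner.py: the `Drawer` record, its field names, the `find_stale` function, and the 0.01 s tolerance are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass
class Drawer:
    # Hypothetical record; field names are assumptions, not MemPalace's schema.
    drawer_id: str
    source_file: str
    source_mtime: float  # modification time recorded when the file was mined


def find_stale(drawers, strategy="all", eps=0.01):
    """Return {drawer_id: reason} for drawers matched by the chosen strategy."""
    stale = {}
    for d in drawers:
        if not os.path.exists(d.source_file):
            if strategy in ("all", "existence"):
                stale[d.drawer_id] = "existence"
            continue  # a missing file has no mtime to compare
        if strategy in ("all", "mtime"):
            # Flag only when the file on disk is newer than what was mined.
            if os.path.getmtime(d.source_file) - d.source_mtime > eps:
                stale[d.drawer_id] = "mtime"
    return stale
```

A dry run would print the returned mapping and stop; a real run would pass the keys on to the store's delete call.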

Addresses

#224: Stale drawer retrieval can inject contradictory memory into live agent context; no official sync/update workflow exists
#420: Mempalace as hoarder cleanup!
#331: time-decay scoring for search results (this PR provides the cleanup foundation)

Usage

# Preview stale drawers without deleting
mempalace prune --dry-run

# Delete drawers from deleted source files only
mempalace prune --strategy existence

# Full cleanup: deleted files + modified files + orphaned chunks
mempalace prune --strategy all

# Scope to one wing
mempalace prune --wing my_project --dry-run

Via MCP:

mempalace_prune(strategy="all", dry_run=true)
mempalace_prune(strategy="existence", dry_run=false)

Test plan

Verified existence strategy detects drawers with missing source files
Verified mtime strategy detects drawers with outdated modification times
Verified --wing filter scopes detection correctly
Verified --dry-run reports without deleting
Verified actual deletion removes only stale drawers, keeps fresh ones
Verified CLI output formatting
Clean-before-remine prevents orphaned chunks on modified files

🤖 Generated with Claude Code

Stale drawers accumulate when source files are deleted or modified after mining. This causes outdated content to surface in search results, which can inject contradictory information into agent context.

Adds three detection strategies:
- existence: finds drawers whose source file no longer exists on disk
- mtime: finds drawers whose source file has been modified since mining
- orphans: finds leftover chunks when a file shrinks after re-mining

Also patches the project miner to clean old drawers before re-mining a modified file (clean-before-remine), preventing orphaned chunks from accumulating in the first place.

Available as CLI (`mempalace prune --dry-run`) and MCP tool (`mempalace_prune`).

Addresses MemPalace#224, MemPalace#420, MemPalace#331

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

web3guru888 · 2026-04-10T10:59:51Z

Stale drawer cleanup has been on my wishlist since before soft-archive (#336) landed — glad to see a first-class prune command. A few notes from running on a corpus that gets re-mined regularly:

The clean-before-remine patch to miner.py is the most important part of this PR. Orphaned chunks from shrinking files are a subtle correctness problem — they don't show up in status outputs, they just silently contribute irrelevant hits to search results. The delete-before-insert approach is the right fix.

One concern: the mtime strategy and float precision. PR #518 (merged?) just fixed the == float comparison in file_already_mined() — the same epsilon-comparison fix should apply here if you're comparing stored mtime metadata to os.path.getmtime(). The epsilon from #475/#518 (abs(diff) < 0.01) is the right baseline.
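To illustrate the float-precision concern, a small sketch of a tolerance-based comparison follows. The helper name `mtime_changed` is hypothetical; only the 0.01 threshold comes from the comment above.

```python
MTIME_EPSILON = 0.01  # tolerance quoted in the comment above


def mtime_changed(stored_mtime, current_mtime, eps=MTIME_EPSILON):
    """True when the two timestamps differ by at least eps seconds."""
    return abs(current_mtime - stored_mtime) >= eps


# A timestamp that lost precision on its way through metadata storage
# no longer compares equal with ==, but stays inside the tolerance.
on_disk = 1712745002.1234567
stored = round(on_disk, 3)
```

With these values `stored == on_disk` is false while `mtime_changed(stored, on_disk)` is also false, which is the behaviour the mtime strategy needs.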

orphans detection and chunk numbering: the orphan detection relies on chunk index continuity (finding chunk N where N-1 doesn't exist). This works but it's brittle if chunk IDs aren't generated sequentially or if a re-mine leaves gaps for another reason. Would be worth documenting the invariant this relies on, or adding a per-file chunk count stored in metadata that the orphan check can use directly.
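The per-file chunk count idea could look roughly like this. It is a sketch under assumptions: chunk indices are taken to be zero-based, and `find_orphans` and both argument shapes are invented for illustration rather than taken from the PR.

```python
def find_orphans(chunk_indices_by_file, expected_counts):
    """Return {source_file: sorted orphan chunk indices}.

    chunk_indices_by_file: {source_file: chunk indices present in the store}
    expected_counts: {source_file: chunk count recorded by the latest mine}
    """
    orphans = {}
    for source, indices in chunk_indices_by_file.items():
        expected = expected_counts.get(source)
        if expected is None:
            continue  # no recorded count, so nothing can be concluded
        # Any index at or beyond the recorded count is left over from an
        # earlier, longer version of the file.
        extra = sorted(i for i in indices if i >= expected)
        if extra:
            orphans[source] = extra
    return orphans
```

Unlike a continuity check, this does not care whether the surviving indices have gaps; it only compares against the recorded count.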

Interaction with soft-archive (#336) and Synapse consolidation candidates (#451): a drawer that's been soft-archived probably shouldn't be flagged as stale by the existence check, since it was intentionally removed from active retrieval. The prune logic should check drawer status metadata (status: "archived") and skip archived drawers in the existence strategy (mtime/orphan checks are still valid). Otherwise prune will eagerly delete what archive intentionally demoted.

Dry-run first: the --dry-run flag is essential — +1 for including it. I'd also add a --min-age-days filter so prune doesn't flag drawers that were just mined (e.g., give new drawers a 24h grace period before they're eligible for existence pruning). Helps avoid race conditions during active mining sessions.
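The two guards suggested in this comment (skip archived drawers, apply a grace period) can be combined into one eligibility check. This is only a sketch: the `mined_at` metadata key (epoch seconds) and the function name are assumptions, and `--min-age-days` is a proposed flag, not one the PR currently has.

```python
import time

SECONDS_PER_DAY = 86400


def eligible_for_existence_prune(metadata, min_age_days=1.0, now=None):
    """Hypothetical guard applied before the existence strategy flags a drawer."""
    if metadata.get("status") == "archived":
        return False  # archived on purpose, so leave it alone
    now = time.time() if now is None else now
    age_days = (now - metadata["mined_at"]) / SECONDS_PER_DAY
    return age_days >= min_age_days
```

Passing `now` explicitly keeps the check deterministic in tests.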

Good foundation — once the archive/status interaction is handled, this fills a real gap.

MemPalace-AGI dashboard

StefanKremen · 2026-04-10T11:32:00Z

Heads up: this PR's miner.py clean-before-remine change also happens to fix the hnswlib updatePoint / repairConnectionsForUpdate race documented in #521 (PR #523).

Mechanism: modified-file re-mines previously upsert'd over existing deterministic drawer IDs, pushing ChromaDB through hnswlib's thread-unsafe updatePoint → repairConnectionsForUpdate path, which reliably segfaults the mining subprocess on macOS ARM64 / Python 3.13 / chromadb 0.6.3. The unambiguous fingerprint is repairConnectionsForUpdate in the crashed thread's stack — that function is only called on the update path.

Deleting by source_file before the re-insert loop (what both PRs do) converts re-mines into pure INSERT operations, bypassing the update path entirely. Different motivation (stale-drawer correctness here vs. segfault in #521), same effective fix on the miner hunk.
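A minimal sketch of the delete-before-insert pattern follows. It is not the miner.py hunk from either PR: the ID scheme and metadata keys are assumptions, and `InMemoryCollection` is a stand-in that mimics only the `delete(where=...)` and `add(...)` calls a ChromaDB collection offers, so the example runs without ChromaDB installed.

```python
class InMemoryCollection:
    """Tiny stand-in for the delete/add subset of a collection API."""

    def __init__(self):
        self.rows = {}  # id -> (document, metadata)

    def delete(self, where):
        doomed = [i for i, (_, meta) in self.rows.items()
                  if all(meta.get(k) == v for k, v in where.items())]
        for i in doomed:
            del self.rows[i]

    def add(self, ids, documents, metadatas):
        for i, doc, meta in zip(ids, documents, metadatas):
            if i in self.rows:
                # Raising here makes any accidental update visible in this sketch.
                raise ValueError(f"duplicate id {i}")
            self.rows[i] = (doc, meta)


def remine(collection, source_file, chunks, mtime):
    # Remove every drawer for this file first, so the add below is a pure insert.
    collection.delete(where={"source_file": source_file})
    ids = [f"{source_file}#{n}" for n in range(len(chunks))]  # hypothetical ID scheme
    metas = [{"source_file": source_file, "chunk_index": n, "source_mtime": mtime}
             for n in range(len(chunks))]
    if ids:
        collection.add(ids=ids, documents=chunks, metadatas=metas)
    return ids
```

Re-mining a file that shrank from three chunks to one leaves exactly one row, and the store never sees a write to an existing ID.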

Coordination:

If this PR lands first, #523 (fix: purge stale drawers before re-mine to avoid hnswlib update-path race) becomes a subset and can be closed as superseded.
If #523 lands first, this PR will conflict in miner.py but should rebase cleanly (both delete by source_file before re-inserting chunks).

Urgency note: PR #518 (still open) fixes the float-mtime epsilon in file_already_mined(). Once that merges, modified-file re-mines will fire more reliably — which fires the hnswlib race more reliably too. Worth landing either this PR or #523 before #518.

Regression test worth adding (not currently in either PR's test plan): mine a file → os.utime() → re-mine → assert no crash and new chunks replace the old ones with the updated mtime. That's the scenario where the race fires.
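The shape of that regression test might be sketched as below. The `mine` function here is a stand-in over a plain dict, so it cannot reproduce the hnswlib crash; a real test would call the project's miner against ChromaDB. Chunk size and store layout are assumptions.

```python
import os
import tempfile


def mine(store, path, chunk_size=10):
    """Stand-in miner: drop old chunks for path, then insert fresh ones."""
    mtime = os.path.getmtime(path)
    for key in [k for k in store if k[0] == path]:
        del store[key]
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    for n, start in enumerate(range(0, len(text), chunk_size)):
        store[(path, n)] = (text[start:start + chunk_size], mtime)


def test_remine_replaces_chunks():
    store = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("a" * 35)  # four chunks of up to 10 characters
        mine(store, path)
        assert len(store) == 4

        with open(path, "w", encoding="utf-8") as fh:
            fh.write("b" * 12)  # file shrinks to two chunks
        new_mtime = os.path.getmtime(path) + 1
        os.utime(path, (new_mtime, new_mtime))  # force a visibly newer mtime
        mine(store, path)

        # No leftover chunks, and every chunk carries the updated mtime.
        assert sorted(n for _, n in store) == [0, 1]
        assert all(abs(m - new_mtime) < 0.01 for _, m in store.values())
```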

web3guru888 · 2026-04-10T11:47:23Z

@StefanKremen — this is exactly the kind of cross-PR analysis that saves reviewers time. The mechanism is right: pre-delete converts re-mines from update operations to pure inserts, which never touches hnswlib's updatePoint path. Same fix, different motivations.

The merge-order recommendation is important and I want to amplify it: land this PR or #523 before #518. The #518 float-mtime epsilon fix makes file_already_mined() return False more accurately for modified files — which is correct behavior, but it increases the frequency of re-mines on files that previously slipped through, which fires the hnswlib race more reliably on affected platforms (macOS ARM64 / Python 3.13 / chromadb 0.6.3).

The regression test scenario you described is exactly right:

mine file → os.utime(file, now+1) → re-mine → assert no crash + new chunks replace old

That test would have caught this race in CI. Worth adding to whichever PR lands first.

nanoscopic · 2026-04-10T14:34:42Z

Isn't pruning stale data against the stated purpose of the project "to retain everything"?

Instead of pruning, perhaps archiving instead, so that information could theoretically be brought back if desired later?

web3guru888 · 2026-04-10T15:47:13Z

@nanoscopic — this is a fair tension to flag, and it's worth distinguishing between the two use cases:

Prune targets genuinely stale metadata, not chosen memories. When a source file is deleted from disk, the drawers it generated have no recoverable original — the text is already in ChromaDB but the file it came from no longer exists. Pruning those is less like "discarding a memory" and more like "removing a dangling reference." The content isn't gone from the palace yet — it's the source link that's broken.

The dry-run default is key. This PR defaults to preview mode. Nothing is deleted without explicit --confirm. So the danger of accidental loss is low.

The soft-archive approach you're describing is real though. There's ongoing discussion in #336 about status: archived as a first-class metadata value for "retired but retained" content. A prune command could soft-archive by default (set status=archived) rather than hard-delete, and let users hard-delete separately. That would satisfy both retention for recall and stale content not surfacing in search — archived drawers could be excluded from default search unless include_archived=True.
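As a rough sketch of the archive-by-default idea, assuming drawers are a mapping from ID to a metadata dict (the function names and the `hard_delete` switch are hypothetical, not part of this PR):

```python
def prune(drawers, stale_ids, hard_delete=False):
    """Archive stale drawers by default; remove them only on request."""
    for drawer_id in stale_ids:
        if hard_delete:
            drawers.pop(drawer_id, None)
        elif drawer_id in drawers:
            drawers[drawer_id]["status"] = "archived"


def visible_ids(drawers, include_archived=False):
    """IDs a default search would consider."""
    return [i for i, meta in drawers.items()
            if include_archived or meta.get("status") != "archived"]
```

Archived drawers stay in the store and can be brought back by clearing the status, while default search no longer surfaces them.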

That said, for the existence strategy (source file no longer on disk at all) — I'd argue hard delete is appropriate since re-mine is impossible and the content would just accumulate forever.

wafuzio · 2026-04-10T18:14:08Z

watching this one! thanks

igorls · 2026-05-08T11:00:42Z

Hi, thanks for the contribution.

This PR has merge conflicts with develop, and the branch has not been updated in over 7 days, which puts it before our most recent release. The conflicts are likely against work that landed in that release.

Could you rebase onto develop so we can take another look?

If this change is no longer relevant, feel free to close the PR.

(This message is part of a periodic backlog pass, sent to all open PRs that match this state.)

This was referenced Apr 10, 2026:

fix: purge stale drawers before re-mine to avoid hnswlib update-path race #523 (Closed)
fix: resolve 6 community-reported bugs #518 (Open)
feat : convert to rust #530 (Closed)
Remove Baldfaced Lies Please #524 (Closed)

bensig changed the base branch from main to develop April 11, 2026 22:21

bensig requested review from bensig and milla-jovovich as code owners April 11, 2026 22:21

igorls added the area/cli, area/mcp, area/mining, and enhancement labels Apr 14, 2026

igorls added the needs-rebase label May 8, 2026

mvalentsev mentioned this pull request May 8, 2026 in feat(sync): add gitignore-aware drawer prune (#1252) #1421 (Merged)

vanachterjacob wants to merge 1 commit into MemPalace:develop from vanachterjacob:feat/prune-stale-drawers
