fix(entity_registry): clean up .tmp sidecar on write or rename failure (#1373)#1408
Merged
igorls merged 1 commit intoMay 17, 2026
Conversation
This was referenced May 7, 2026
MemPalace#1373) MemPalace#1215 made `EntityRegistry.save()` atomic via temp-file + fsync + os.replace. Crash-mid-write durability is correct: the previous registry stays intact on any failure. But if `f.write()` / `f.flush()` / `os.fsync()` / `os.replace()` raise (disk full, perms flip, broken FUSE mount, IO error), the `.tmp` sidecar was left on disk. Subsequent saves overwrite the same path so it does not grow unboundedly, but it litters the palace directory and obscures diagnostics — a user inspecting `entity_registry.json.tmp` after a crash cannot distinguish in-flight writes from stale debris. Wrap the write+chmod+replace block in try/except. On any exception, attempt `tmp_path.unlink(missing_ok=True)` before re-raising. The dir-fsync is deliberately outside the try — that is durability for a successful rename, not a write step that needs cleanup. Tests: - Augment `test_save_preserves_previous_on_serialization_failure` to also assert the .tmp sidecar is gone after a forced os.replace failure. - New `test_save_cleans_tmp_on_write_failure` forces os.fsync to raise, covering the gap between write and rename that the existing test does not exercise. Closes MemPalace#1373.
Contributor
Author
|
Rebased on |
02e0de9 to
e6f7398
Compare
3 tasks
arnoldwender
pushed a commit
to arnoldwender/mempalace
that referenced
this pull request
May 24, 2026
Bumps version 3.3.5 → 3.3.6 across pyproject.toml, version.py, plugin manifests (.claude-plugin/plugin.json, .claude-plugin/marketplace.json, .codex-plugin/plugin.json), README badge, and uv.lock. Flips CHANGELOG.md from ``[Unreleased]`` to ``[3.3.6] — 2026-05-24`` and backfills the major user-facing entries that landed without changelog entries during the cycle: Features: - MemPalace#1555 office-document mining via --mode extract + virtual line numbers - MemPalace#1584 surgical closet pointers with date+line locators (Tier 6a) - MemPalace#1558 + MemPalace#1560 within-wing hallways (entity co-occurrence graph) - MemPalace#1565 cross-wing tunnels auto-promoted from hallways - MemPalace#1578 Hebbian potentiation + Ebbinghaus decay on hallways/tunnels - MemPalace#1236 API-tool transcripts auto-route to wing_api - MemPalace#711 hooks.auto_save toggle for silent-mode sessions - MemPalace#1605 COCA content-word filter for entity detection - MemPalace#1557 case-insensitive entity matching at mine time - MemPalace#1483 multilingual embeddings (embeddinggemma-300m) by default Bug Fixes (selected, user-visible): - MemPalace#1540 silent data loss in three unchunked upsert sites - MemPalace#1538 paragraph chunker oversized chunks - MemPalace#1554 per-file chunk cap too low for transcripts - MemPalace#1562 Windows hook subprocess/ChromaDB deadlock - MemPalace#1529 create_tunnel corrupted hyphenated wing names - MemPalace#1424 save-hook truncated hyphenated project folders - MemPalace#1383 KG cache duplicated graphs for symlinked/cased paths - MemPalace#1466 silent symlink skip now logged - MemPalace#1441 macOS stock-bash 3.2 hook compatibility - MemPalace#1500 / MemPalace#1513 structured JSON-RPC errors on bad MCP input - MemPalace#1523 VACUUM + FTS5 rebuild after repair - MemPalace#1548 FTS5 validation at end of mine - plus MemPalace#1216, MemPalace#1408, MemPalace#1438, MemPalace#1439, MemPalace#1445, MemPalace#1452, MemPalace#1459, MemPalace#1461, MemPalace#1466, MemPalace#1470, MemPalace#1477, MemPalace#1485, MemPalace#1500, MemPalace#1513, MemPalace#1528, MemPalace#1532, MemPalace#1543, MemPalace#1546, MemPalace#1585 Performance: - MemPalace#1474 convo miner pre-fetches mined-set - MemPalace#1487 rebuild_index progress callback - MemPalace#1530 MCP cold-start diagnostics + opt-in warmup Lint passes (ruff 0.15.14); mempalace-mcp entry point alignment verified per RELEASING.md.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What and Why
#1215 made
EntityRegistry.save()atomic via temp-file + fsync +os.replace. Crash-mid-write durability is correct — the previous registry stays intact on any failure. But the.tmpsidecar was left on disk iff.write()/f.flush()/os.fsync()/os.replace()raised. Subsequent saves overwrite the same temp filename so this does not grow unboundedly, but it litters the palace directory and obscures diagnostics: a user inspectingentity_registry.json.tmpafter a crash cannot tell whether it is the in-flight write of an ongoing process or stale debris from a prior failure.@igorls flagged this in #1373.
Root Cause
mempalace/entity_registry.py:328-338— the write + chmod +os.replaceblock ran without an outertry/except. Thewith open(...) as f:block closes the fd on exit but does not unlink the file. Iff.write(),f.flush(), oros.fsync()raised (disk full, perms flip, broken FUSE mount, IO error), the temp file stayed. Ifos.replace()raised, same outcome.Reproduction
Change Summary
mempalace/entity_registry.py:328-348— wrap the write + chmod +os.replaceblock intry/except BaseException. On failure,tmp_path.unlink(missing_ok=True)and re-raise. The dir-fsync stays outside thetrybecause it only runs on the successful-rename path.BaseExceptionsoKeyboardInterruptmid-write also leaves a clean directory; the innerunlinkitself is wrapped against anOSErrorrace so cleanup cannot mask the original exception.tests/test_entity_registry.py— augmenttest_save_preserves_previous_on_serialization_failurewith an additionalassert leftover == []after the forcedos.replacefailure.tests/test_entity_registry.py— newtest_save_cleans_tmp_on_write_failurethat forcesos.fsyncto raise (the gap between write and rename that the existing test does not cover) and asserts both the durability invariant (previous registry intact) and the cleanup invariant (no.tmpsidecar left).Test Plan
test_entity_registry.pytests pass.develop.ruff check .passes on both modified files.ruff format --check .passes.os.fsyncandos.replaceto raise leaves no.tmplitter; previous registry content unchanged.Closes #1373.