fix: handle non-interactive init prompts gracefully#13
Closed
sheetsync wants to merge 1 commit into
Closed
Conversation
Contributor
|
This was fixed in #123 — |
gnusam
pushed a commit
to gnusam/mempalace-pgsql
that referenced
this pull request
Apr 9, 2026
Port of upstream a4149ab (by @igorls, findings MemPalace#6, MemPalace#11, MemPalace#13) adapted to the PG+pgvector backend. 1. MCP/diary path ID is now hash(content), not content[:200]+timestamp. Same content → same deterministic ID. Eliminates TOCTOU races and stops identical content from piling up as distinct duplicate rows. Matches upstream's intent (findings MemPalace#6 TOCTOU, MemPalace#13 non-deterministic IDs). 2. INSERT ... ON CONFLICT (id) DO NOTHING → ON CONFLICT (id) DO UPDATE. Re-mining a modified file previously got the new content silently dropped because the slot ID matched an existing row (upstream finding MemPalace#11 HIGH — "add ignores updates"). True upsert: content, embedding, metadata, and filed_at are all refreshed. 3. add_drawer now always returns the drawer_id (both insert and update paths). The old "return None on conflict" signal implicitly encoded the stagnation bug and is no longer meaningful. Updated tests/test_db.py::test_mine_drawer_reupsert_by_source to assert the new upsert semantics instead of the stagnation assertion it replaced, and added test_mcp_add_drawer_is_idempotent_on_same_content to cover the deterministic-content-ID path. The mtime-based file_already_mined half of bf88daa is in a follow-up commit. Co-Authored-By: Igor Lins e Silva <4753812+igorls@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jphein
referenced
this pull request
in techempower-org/mempalace
Apr 27, 2026
Three new fork-change rows for 2026-04-27 work that wasn't yet
captured in the documented inventory:
29. CLOSET_RANK_BOOSTS hoist + VecRecall ablation finding
(f558d3c → 3cb03f3) — module-level constants + 12-probe A/B
showing closet boost is mostly inert on chat-heavy corpus;
VecRecall's MemPalace#1129 critique didn't reproduce.
30. .claude-plugin/ manifest scrub (8119149 → 9f91e18) — removed
embedded API key + homelab URL from .mcp.json and hooks.json;
daemon routing restored without the literal credential, env
inherits from settings.local.json at runtime.
31. palace-daemon upstream PR push (PRs #7-#13) — seven small/medium
PRs covering most fork-ahead daemon work; pending follow-ups
and needs-generalization items tracked in daemon README.
YAML manifest gets corresponding entries for #29 (kind-filter-retired,
closet-boost-ablation) and #30 (plugin-manifest-scrub); FORK_CHANGELOG
re-rendered. Lint clean (11 fork hashes resolve, 90 PR refs match
upstream, FORK_CHANGELOG matches YAML).
mvalentsev
pushed a commit
to mvalentsev/mempalace
that referenced
this pull request
May 21, 2026
…#1555 PR MemPalace#1555 (format coverage + virtual line numbering) merged with twelve inline polish comments from Copilot + gemini-code-assist that weren't load-bearing enough to block the original ship but are real cleanups. This PR addresses them. Twelve items in scope; one item (drawer ID delimiter — Copilot MemPalace#13) is deferred to its own dedicated PR because it's a breaking schema change that requires migration design beyond the scope of a polish PR. ## Behavioral fixes (5 items, RED-tested first) 1. **FileNotFoundError vs broken symlink (Copilot MemPalace#8).** ``extract_text`` previously mapped every ``FileNotFoundError`` from ``stat()`` to ``SKIP_BROKEN_SYMLINK``. That's misleading for the common case of a regular file deleted between scan and extract. Now distinguishes: ``SKIP_BROKEN_SYMLINK`` only when ``p.is_symlink()`` is true; ``SKIP_UNREADABLE`` otherwise. 2. **``file_already_mined`` extract_mode scoping (Copilot MemPalace#11, MemPalace#12).** Both call sites in ``mine_formats`` and ``_file_chunks_locked`` now pass ``extract_mode="format"``. Previously the format miner could falsely treat drawers from project / convo miner on the same source file as "already mined" (and vice versa). Scopes idempotency to the correct drawer subset. 3. **Sentinel skip for transient missing-dep statuses (Copilot MemPalace#14).** New ``_TRANSIENT_MISSING_DEP_STATUSES`` set + ``_register_skip_sentinel_if_appropriate`` helper. Skip variants like ``SKIP_NO_MARKITDOWN`` / ``SKIP_NO_STRIPRTF`` / ``SKIP_MISSING_FORMAT_DEPS`` / ``SKIP_NETWORK_TIMEOUT`` no longer write the "already-mined" sentinel. Otherwise installing the missing extra later wouldn't trigger a re-mine. 4. **Outer ``except Exception`` in ``mine_formats`` (Gemini MemPalace#5).** The outer try around the loop previously caught only ``KeyboardInterrupt``, leaving any setup-time error (e.g., ``scan_formats`` raising) to propagate as a bare traceback. Now catches ``Exception`` defensively, logs it, prints a partial-progress summary, and lets the ``finally`` PID-cleanup run. Mirrors miner.py's belt-and-suspenders pattern. 5. **Thread user's ``chunk_size`` / ``chunk_overlap`` / ``min_chunk_size`` through to ``chunk_text`` (Gemini MemPalace#3).** ``MempalaceConfig`` was loaded only to validate readability; users who tuned their config saw no effect in format-mode mining. Now properly threaded. ## Trivial cleanups (5 items) 6. **Path expanduser in ``extract_text`` (Copilot MemPalace#7).** ``Path(path)`` → ``Path(path).expanduser()`` so CLI inputs like ``~/docs/file.pdf`` resolve correctly. 7. **Path expanduser+resolve in ``scan_formats`` (Copilot MemPalace#9).** Same fix; ``~/docs`` and relative paths now work consistently. 8. **Use resolved ``format_path`` in ``mine_formats`` (Copilot MemPalace#10).** ``scan_formats(format_dir)`` → ``scan_formats(format_path)`` so the already-resolved path is used. 9. **``render_with_line_numbers`` type annotation (Copilot MemPalace#15).** ``text: "str | None"`` reflects the documented + tested ``None`` handling. 10. **Test + docs claims (Copilot MemPalace#16, MemPalace#17, MemPalace#18).** Stale framings removed: - ``docs/format-coverage.md`` — 14 fringe cases + "see the file for the current test inventory" (no more frozen test count). - ``tests/test_line_numbers.py`` — drops "proposed for mempalace 3.3.6" + "run from the proposal directory" references. - ``tests/test_format_miner.py`` — drops "MarkItDown is mocked throughout" (live integration tests exist) + proposal-directory framing. ## Module-level hoists (enables clean test patching) - ``MempalaceConfig`` (from ``.config``) hoisted from lazy local import to module-level so tests can patch ``mempalace.format_miner.MempalaceConfig``. - ``chunk_text`` (from ``.miner``) hoisted similarly. Both follow the pattern PR MemPalace#1565 used for ``compute_hallways_for_wing``. ## Complexity refactor Extracted ``_print_mine_summary`` from ``mine_formats`` so the orchestrator stays under the project's ``max-complexity = 25`` ceiling (per ``pyproject.toml [tool.ruff.lint.mccabe]``). Behavior unchanged; pure extraction. ## Out of scope (intentionally deferred) - **Drawer ID delimiter collision (Copilot MemPalace#13)** — ``f"{source_file}{chunk_index}"`` can theoretically collide (``"/path/a1" + "23"`` == ``"/path/a" + "123"``). Fixing this is a breaking schema change to drawer IDs and requires a migration plan; will land as its own PR after design. - The four bot comments that were ALREADY addressed by amendment MemPalace#3 before the PR MemPalace#1555 merge (``_SKIP_DIRS`` dedup, ``scan_formats`` symlink skip, ``source_mtime`` tracking, hall+entities metadata) — no action needed; verified during audit. ## Tests (RED-first) Six new RED-first tests in ``tests/test_format_miner.py``: test_extract_text_nonexistent_regular_file_returns_unreadable_not_broken_symlink test_mine_formats_passes_extract_mode_format_to_file_already_mined test_mine_formats_does_not_write_sentinel_for_skip_no_markitdown test_mine_formats_does_not_write_sentinel_for_skip_missing_format_deps test_mine_formats_catches_unexpected_exception_and_prints_summary test_mine_formats_threads_chunk_size_from_user_config All six RED before this commit (failures correctly identified the bugs they're targeting), all six GREEN after. One existing test (``test_mine_formats_continues_after_per_file_error``) updated to patch the new module-level binding ``mempalace.format_miner.chunk_text`` instead of the old ``mempalace.miner.chunk_text`` source location, and to accept the ``**kwargs`` the call now passes through. Behavior unchanged. ## Verification pytest -q (full mempalace suite) → 2065 passed, 3 skipped, 0 regressions ruff check mempalace/format_miner.py mempalace/searcher.py tests/ → All checks passed! ruff format --check ... → 4 files already formatted (pinned 0.15.9) mine_formats complexity → ≤ 25 (under the project ceiling)
milla-jovovich
added a commit
that referenced
this pull request
May 30, 2026
Drawer/closet/triple IDs built by concatenating strings without a
delimiter before hashing can collide: hash((s1 + str(i1))) ==
hash((s2 + str(i2))) whenever s1+str(i1) == s2+str(i2). Concretely,
source_file="foo" + chunk_index=12 produces the same hash input as
source_file="foo1" + chunk_index=2 — both yield "foo12". Under
ChromaDB's primary-key constraint the second write silently
overwrites the first; one chunk's content is lost without surfacing
an error (the surrounding broad `except Exception:` blocks swallow
any backend complaint).
The defect class is "string + string concat fed to a hash without
a delimiter." Per the styleguide's partial-scope-key-migration rule,
every site sharing the pattern is a candidate and must be triaged —
not just the ones the original issue named.
FIX — 6 sites
- mempalace/miner.py:1253 drawer_id, add_drawer() back-compat
- mempalace/miner.py:1386 drawer_id, batched mine loop
- mempalace/miner.py:1416 drawer_ids, closet-build re-derivation
- mempalace/format_miner.py:643 drawer_id, format_miner batched loop
- mempalace/mcp_server.py:1136 drawer_id, tool_add_drawer
- mempalace/knowledge_graph.py:305 triple_id, KG triple insertion
MIGRATED FOR CONSISTENCY — 2 sites
- mempalace/convo_miner.py:87 sentinel_key — was `:`, now `|`
- mempalace/convo_miner.py:422 drawer_key — was `:`, now `|`
Pre-existing delimiter was `:`. Migrated because (a) source_file can
contain literal `:` on Windows paths (`C:\Users\…`) and URL-like
sources (`https://host:8080`), weakening `:` as a separator; (b) `|`
is reserved in Windows filenames so source_file cannot contain it,
making it strictly safer.
DELIMITER CHOICE
- `|` chosen over `:` (Windows / URL source_file edge cases)
- `|` chosen over `\x00` (NUL breaks log greppability + print()
debug + downstream string handling for marginal extra safety)
- Matches the established precedent in mempalace/diary_ingest.py
(lines 52, 76, 91, 98 — all already on `|`)
EXEMPT — audited and correct as-is
Single-input hashes (nothing to delimit):
- mempalace/miner.py:1432 closet_id (source_file only)
- mempalace/format_miner.py:559 sentinel_id (source_file only)
- mempalace/palace.py:433 lock filename (source_file only)
- mempalace/palace.py:629 palace_key (lock_key_source only)
- mempalace/diary_ingest.py:158 content_hash (text only)
- mempalace/hooks_cli.py:329 pidfile digest (joined cmd only)
- mempalace/sources/context.py:141 record digest (source_file only)
Already correctly delimited:
- mempalace/hallways.py:157 `f"{wing}::{a}::{b}"` (`::`)
- mempalace/palace_graph.py:454 `f"{a}↔{b}"` (`↔`)
- mempalace/diary_ingest.py:52,76,91,98 (`|` precedent)
Protected by composition (uniqueness guaranteed by the ID prefix,
not by the hash slice):
- mempalace/mcp_server.py:1635 entry_id is
`diary_{wing}_{strftime('%Y%m%d_%H%M%S%f')}_{sha[:12]}`.
Microsecond-resolution timestamp prefix supplies uniqueness;
the trailing hash is a content-discriminator, not the
write-time uniqueness guarantor.
NEW MODULES
- mempalace/ids.py — single source of truth for ID construction.
Five helpers (make_drawer_id_from_chunk, make_drawer_id_from_content,
make_convo_drawer_id, make_convo_sentinel_id, make_triple_id) plus
an ID_RECIPE = "v2" constant.
- mempalace/collision_scan.py — pre-mining defense. Runs immediately
before each batched ChromaDB upsert; raises CollisionError naming
the colliding (source_file, chunk_index) pairs if any proposed
drawer_id appears more than once with conflicting metadata across
the union of incoming and existing rows.
DETECTION
ChromaDB's primary-key constraint means past collisions were
silently resolved at upsert time (last-write-wins), erasing
evidence. A post-hoc audit cannot reconstruct what was lost from
palace state alone. Therefore:
- Pre-mining risk scan. Before each batched upsert, compute the
proposed drawer_ids for the incoming chunk set AND query existing
drawer_ids from the collection. If any proposed id appears more
than once in the union (incoming-vs-incoming or incoming-vs-
existing) with conflicting (source_file, chunk_index), abort the
mine with an actionable error naming the colliding pairs.
Collision is caught BEFORE it destroys data, which is the only
point at which palace state still carries the evidence.
- New metadata key: `"id_recipe": "v2"` on every drawer written
under the delimited recipe. Audits compare like-for-like;
drawers without `id_recipe` are treated as v1 legacy (undelimited
or `:`-delimited), not as collisions.
- Honest disclosure: palaces mined under any pre-v2 mempalace may
carry silent past collisions whose original content is
unrecoverable from palace state. Future library tier work will
give users a per-drawer audit + opt-in archival path.
TESTS
- tests/test_ids.py: 17 unit tests covering all 5 helpers, the
ID_RECIPE constant, the private `_delimited_sha256` helper, and
the four defect-class collision shapes (chunk_index boundary,
content boundary, extract_mode boundary, ISO datetime boundary).
RED collision tests confirm the v2 recipe breaks the defect class.
- tests/test_collision_scan.py: 10 tests covering clean batches,
idempotent re-mines, incoming-vs-incoming collisions, incoming-vs-
existing collisions, error-message quality, empty batches,
metadata without chunk_index, and ChromaDB backend errors
propagating cleanly.
- tests/test_convo_miner_unit.py: FakeCol gains a stub `get()` so
the pre-mining scan can probe an empty in-test collection.
BACKWARDS COMPATIBILITY
- Existing v1 drawers remain queryable; no rewrite of stored IDs.
- New mining passes write v2 drawers alongside v1 via PR #1628's
additive-mining model.
- No user action required; opt-in cleanup ships separately.
VERIFICATION
- macOS Python 3.12 (local): 2304 passed, 3 skipped, 0 failed,
coverage 85.44% (above 80% threshold).
- Linux Python 3.9/3.11/3.13 (OrbStack with full `pip install -e
'.[dev]'` flow): see PR body for per-version pass counts.
- `ruff check .` + `ruff format --check .`: clean.
- pylint score: ids.py 10.00/10, collision_scan.py 10.00/10; all
modified files >= 8.92 (no regression).
- bandit: clean on all new files; the one MEDIUM finding in
knowledge_graph.py is on lines 385/407 (pre-existing SQL string
construction in timeline queries), not in code touched by this PR.
- pre_push_check.sh: ALL CHECKS PASSED.
Refs: deferred from PR #1572 (Copilot finding #13). Styleguide audit
documented above: 8 fix/migration + 11 enumerated exempt sites.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Repro
Before this change, running
mempalace init <dir> < /dev/nullcrashed withEOFErrorwhen room approval prompted for input.Validation
pytest -qpython -m mempalace init <dir> < /dev/nullnow writesmempalace.yamlinstead of crashing