fix: handle non-interactive init prompts gracefully by sheetsync · Pull Request #13 · MemPalace/mempalace

sheetsync · 2026-04-07T07:13:34Z

Summary

make room approval fall back safely when stdin is unavailable
make entity confirmation fall back safely on EOF too
add regression tests for non-interactive approval flows

Repro

Before this change, running mempalace init <dir> < /dev/null crashed with EOFError when room approval prompted for input.

Validation

pytest -q
verified python -m mempalace init <dir> < /dev/null now writes mempalace.yaml instead of crashing

bensig · 2026-04-07T19:34:34Z

This was fixed in #123 — --yes now skips all interactive prompts. Closing as duplicate.

@igorls

Port of upstream a4149ab (by @igorls, findings MemPalace#6, MemPalace#11, MemPalace#13) adapted to the PG+pgvector backend. 1. MCP/diary path ID is now hash(content), not content[:200]+timestamp. Same content → same deterministic ID. Eliminates TOCTOU races and stops identical content from piling up as distinct duplicate rows. Matches upstream's intent (findings MemPalace#6 TOCTOU, MemPalace#13 non-deterministic IDs). 2. INSERT ... ON CONFLICT (id) DO NOTHING → ON CONFLICT (id) DO UPDATE. Re-mining a modified file previously got the new content silently dropped because the slot ID matched an existing row (upstream finding MemPalace#11 HIGH — "add ignores updates"). True upsert: content, embedding, metadata, and filed_at are all refreshed. 3. add_drawer now always returns the drawer_id (both insert and update paths). The old "return None on conflict" signal implicitly encoded the stagnation bug and is no longer meaningful. Updated tests/test_db.py::test_mine_drawer_reupsert_by_source to assert the new upsert semantics instead of the stagnation assertion it replaced, and added test_mcp_add_drawer_is_idempotent_on_same_content to cover the deterministic-content-ID path. The mtime-based file_already_mined half of bf88daa is in a follow-up commit. Co-Authored-By: Igor Lins e Silva <4753812+igorls@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three new fork-change rows for 2026-04-27 work that wasn't yet captured in the documented inventory: 29. CLOSET_RANK_BOOSTS hoist + VecRecall ablation finding (f558d3c → 3cb03f3) — module-level constants + 12-probe A/B showing closet boost is mostly inert on chat-heavy corpus; VecRecall's MemPalace#1129 critique didn't reproduce. 30. .claude-plugin/ manifest scrub (8119149 → 9f91e18) — removed embedded API key + homelab URL from .mcp.json and hooks.json; daemon routing restored without the literal credential, env inherits from settings.local.json at runtime. 31. palace-daemon upstream PR push (PRs #7-#13) — seven small/medium PRs covering most fork-ahead daemon work; pending follow-ups and needs-generalization items tracked in daemon README. YAML manifest gets corresponding entries for #29 (kind-filter-retired, closet-boost-ablation) and #30 (plugin-manifest-scrub); FORK_CHANGELOG re-rendered. Lint clean (11 fork hashes resolve, 90 PR refs match upstream, FORK_CHANGELOG matches YAML).

…#1555 PR MemPalace#1555 (format coverage + virtual line numbering) merged with twelve inline polish comments from Copilot + gemini-code-assist that weren't load-bearing enough to block the original ship but are real cleanups. This PR addresses them. Twelve items in scope; one item (drawer ID delimiter — Copilot MemPalace#13) is deferred to its own dedicated PR because it's a breaking schema change that requires migration design beyond the scope of a polish PR. ## Behavioral fixes (5 items, RED-tested first) 1. **FileNotFoundError vs broken symlink (Copilot MemPalace#8).** ``extract_text`` previously mapped every ``FileNotFoundError`` from ``stat()`` to ``SKIP_BROKEN_SYMLINK``. That's misleading for the common case of a regular file deleted between scan and extract. Now distinguishes: ``SKIP_BROKEN_SYMLINK`` only when ``p.is_symlink()`` is true; ``SKIP_UNREADABLE`` otherwise. 2. **``file_already_mined`` extract_mode scoping (Copilot MemPalace#11, MemPalace#12).** Both call sites in ``mine_formats`` and ``_file_chunks_locked`` now pass ``extract_mode="format"``. Previously the format miner could falsely treat drawers from project / convo miner on the same source file as "already mined" (and vice versa). Scopes idempotency to the correct drawer subset. 3. **Sentinel skip for transient missing-dep statuses (Copilot MemPalace#14).** New ``_TRANSIENT_MISSING_DEP_STATUSES`` set + ``_register_skip_sentinel_if_appropriate`` helper. Skip variants like ``SKIP_NO_MARKITDOWN`` / ``SKIP_NO_STRIPRTF`` / ``SKIP_MISSING_FORMAT_DEPS`` / ``SKIP_NETWORK_TIMEOUT`` no longer write the "already-mined" sentinel. Otherwise installing the missing extra later wouldn't trigger a re-mine. 4. **Outer ``except Exception`` in ``mine_formats`` (Gemini MemPalace#5).** The outer try around the loop previously caught only ``KeyboardInterrupt``, leaving any setup-time error (e.g., ``scan_formats`` raising) to propagate as a bare traceback. Now catches ``Exception`` defensively, logs it, prints a partial-progress summary, and lets the ``finally`` PID-cleanup run. Mirrors miner.py's belt-and-suspenders pattern. 5. **Thread user's ``chunk_size`` / ``chunk_overlap`` / ``min_chunk_size`` through to ``chunk_text`` (Gemini MemPalace#3).** ``MempalaceConfig`` was loaded only to validate readability; users who tuned their config saw no effect in format-mode mining. Now properly threaded. ## Trivial cleanups (5 items) 6. **Path expanduser in ``extract_text`` (Copilot MemPalace#7).** ``Path(path)`` → ``Path(path).expanduser()`` so CLI inputs like ``~/docs/file.pdf`` resolve correctly. 7. **Path expanduser+resolve in ``scan_formats`` (Copilot MemPalace#9).** Same fix; ``~/docs`` and relative paths now work consistently. 8. **Use resolved ``format_path`` in ``mine_formats`` (Copilot MemPalace#10).** ``scan_formats(format_dir)`` → ``scan_formats(format_path)`` so the already-resolved path is used. 9. **``render_with_line_numbers`` type annotation (Copilot MemPalace#15).** ``text: "str | None"`` reflects the documented + tested ``None`` handling. 10. **Test + docs claims (Copilot MemPalace#16, MemPalace#17, MemPalace#18).** Stale framings removed: - ``docs/format-coverage.md`` — 14 fringe cases + "see the file for the current test inventory" (no more frozen test count). - ``tests/test_line_numbers.py`` — drops "proposed for mempalace 3.3.6" + "run from the proposal directory" references. - ``tests/test_format_miner.py`` — drops "MarkItDown is mocked throughout" (live integration tests exist) + proposal-directory framing. ## Module-level hoists (enables clean test patching) - ``MempalaceConfig`` (from ``.config``) hoisted from lazy local import to module-level so tests can patch ``mempalace.format_miner.MempalaceConfig``. - ``chunk_text`` (from ``.miner``) hoisted similarly. Both follow the pattern PR MemPalace#1565 used for ``compute_hallways_for_wing``. ## Complexity refactor Extracted ``_print_mine_summary`` from ``mine_formats`` so the orchestrator stays under the project's ``max-complexity = 25`` ceiling (per ``pyproject.toml [tool.ruff.lint.mccabe]``). Behavior unchanged; pure extraction. ## Out of scope (intentionally deferred) - **Drawer ID delimiter collision (Copilot MemPalace#13)** — ``f"{source_file}{chunk_index}"`` can theoretically collide (``"/path/a1" + "23"`` == ``"/path/a" + "123"``). Fixing this is a breaking schema change to drawer IDs and requires a migration plan; will land as its own PR after design. - The four bot comments that were ALREADY addressed by amendment MemPalace#3 before the PR MemPalace#1555 merge (``_SKIP_DIRS`` dedup, ``scan_formats`` symlink skip, ``source_mtime`` tracking, hall+entities metadata) — no action needed; verified during audit. ## Tests (RED-first) Six new RED-first tests in ``tests/test_format_miner.py``: test_extract_text_nonexistent_regular_file_returns_unreadable_not_broken_symlink test_mine_formats_passes_extract_mode_format_to_file_already_mined test_mine_formats_does_not_write_sentinel_for_skip_no_markitdown test_mine_formats_does_not_write_sentinel_for_skip_missing_format_deps test_mine_formats_catches_unexpected_exception_and_prints_summary test_mine_formats_threads_chunk_size_from_user_config All six RED before this commit (failures correctly identified the bugs they're targeting), all six GREEN after. One existing test (``test_mine_formats_continues_after_per_file_error``) updated to patch the new module-level binding ``mempalace.format_miner.chunk_text`` instead of the old ``mempalace.miner.chunk_text`` source location, and to accept the ``**kwargs`` the call now passes through. Behavior unchanged. ## Verification pytest -q (full mempalace suite) → 2065 passed, 3 skipped, 0 regressions ruff check mempalace/format_miner.py mempalace/searcher.py tests/ → All checks passed! ruff format --check ... → 4 files already formatted (pinned 0.15.9) mine_formats complexity → ≤ 25 (under the project ceiling)

Drawer/closet/triple IDs built by concatenating strings without a delimiter before hashing can collide: hash((s1 + str(i1))) == hash((s2 + str(i2))) whenever s1+str(i1) == s2+str(i2). Concretely, source_file="foo" + chunk_index=12 produces the same hash input as source_file="foo1" + chunk_index=2 — both yield "foo12". Under ChromaDB's primary-key constraint the second write silently overwrites the first; one chunk's content is lost without surfacing an error (the surrounding broad `except Exception:` blocks swallow any backend complaint). The defect class is "string + string concat fed to a hash without a delimiter." Per the styleguide's partial-scope-key-migration rule, every site sharing the pattern is a candidate and must be triaged — not just the ones the original issue named. FIX — 6 sites - mempalace/miner.py:1253 drawer_id, add_drawer() back-compat - mempalace/miner.py:1386 drawer_id, batched mine loop - mempalace/miner.py:1416 drawer_ids, closet-build re-derivation - mempalace/format_miner.py:643 drawer_id, format_miner batched loop - mempalace/mcp_server.py:1136 drawer_id, tool_add_drawer - mempalace/knowledge_graph.py:305 triple_id, KG triple insertion MIGRATED FOR CONSISTENCY — 2 sites - mempalace/convo_miner.py:87 sentinel_key — was `:`, now `|` - mempalace/convo_miner.py:422 drawer_key — was `:`, now `|` Pre-existing delimiter was `:`. Migrated because (a) source_file can contain literal `:` on Windows paths (`C:\Users\…`) and URL-like sources (`https://host:8080`), weakening `:` as a separator; (b) `|` is reserved in Windows filenames so source_file cannot contain it, making it strictly safer. DELIMITER CHOICE - `|` chosen over `:` (Windows / URL source_file edge cases) - `|` chosen over `\x00` (NUL breaks log greppability + print() debug + downstream string handling for marginal extra safety) - Matches the established precedent in mempalace/diary_ingest.py (lines 52, 76, 91, 98 — all already on `|`) EXEMPT — audited and correct as-is Single-input hashes (nothing to delimit): - mempalace/miner.py:1432 closet_id (source_file only) - mempalace/format_miner.py:559 sentinel_id (source_file only) - mempalace/palace.py:433 lock filename (source_file only) - mempalace/palace.py:629 palace_key (lock_key_source only) - mempalace/diary_ingest.py:158 content_hash (text only) - mempalace/hooks_cli.py:329 pidfile digest (joined cmd only) - mempalace/sources/context.py:141 record digest (source_file only) Already correctly delimited: - mempalace/hallways.py:157 `f"{wing}::{a}::{b}"` (`::`) - mempalace/palace_graph.py:454 `f"{a}↔{b}"` (`↔`) - mempalace/diary_ingest.py:52,76,91,98 (`|` precedent) Protected by composition (uniqueness guaranteed by the ID prefix, not by the hash slice): - mempalace/mcp_server.py:1635 entry_id is `diary_{wing}_{strftime('%Y%m%d_%H%M%S%f')}_{sha[:12]}`. Microsecond-resolution timestamp prefix supplies uniqueness; the trailing hash is a content-discriminator, not the write-time uniqueness guarantor. NEW MODULES - mempalace/ids.py — single source of truth for ID construction. Five helpers (make_drawer_id_from_chunk, make_drawer_id_from_content, make_convo_drawer_id, make_convo_sentinel_id, make_triple_id) plus an ID_RECIPE = "v2" constant. - mempalace/collision_scan.py — pre-mining defense. Runs immediately before each batched ChromaDB upsert; raises CollisionError naming the colliding (source_file, chunk_index) pairs if any proposed drawer_id appears more than once with conflicting metadata across the union of incoming and existing rows. DETECTION ChromaDB's primary-key constraint means past collisions were silently resolved at upsert time (last-write-wins), erasing evidence. A post-hoc audit cannot reconstruct what was lost from palace state alone. Therefore: - Pre-mining risk scan. Before each batched upsert, compute the proposed drawer_ids for the incoming chunk set AND query existing drawer_ids from the collection. If any proposed id appears more than once in the union (incoming-vs-incoming or incoming-vs- existing) with conflicting (source_file, chunk_index), abort the mine with an actionable error naming the colliding pairs. Collision is caught BEFORE it destroys data, which is the only point at which palace state still carries the evidence. - New metadata key: `"id_recipe": "v2"` on every drawer written under the delimited recipe. Audits compare like-for-like; drawers without `id_recipe` are treated as v1 legacy (undelimited or `:`-delimited), not as collisions. - Honest disclosure: palaces mined under any pre-v2 mempalace may carry silent past collisions whose original content is unrecoverable from palace state. Future library tier work will give users a per-drawer audit + opt-in archival path. TESTS - tests/test_ids.py: 17 unit tests covering all 5 helpers, the ID_RECIPE constant, the private `_delimited_sha256` helper, and the four defect-class collision shapes (chunk_index boundary, content boundary, extract_mode boundary, ISO datetime boundary). RED collision tests confirm the v2 recipe breaks the defect class. - tests/test_collision_scan.py: 10 tests covering clean batches, idempotent re-mines, incoming-vs-incoming collisions, incoming-vs- existing collisions, error-message quality, empty batches, metadata without chunk_index, and ChromaDB backend errors propagating cleanly. - tests/test_convo_miner_unit.py: FakeCol gains a stub `get()` so the pre-mining scan can probe an empty in-test collection. BACKWARDS COMPATIBILITY - Existing v1 drawers remain queryable; no rewrite of stored IDs. - New mining passes write v2 drawers alongside v1 via PR #1628's additive-mining model. - No user action required; opt-in cleanup ships separately. VERIFICATION - macOS Python 3.12 (local): 2304 passed, 3 skipped, 0 failed, coverage 85.44% (above 80% threshold). - Linux Python 3.9/3.11/3.13 (OrbStack with full `pip install -e '.[dev]'` flow): see PR body for per-version pass counts. - `ruff check .` + `ruff format --check .`: clean. - pylint score: ids.py 10.00/10, collision_scan.py 10.00/10; all modified files >= 8.92 (no regression). - bandit: clean on all new files; the one MEDIUM finding in knowledge_graph.py is on lines 385/407 (pre-existing SQL string construction in timeline queries), not in code touched by this PR. - pre_push_check.sh: ALL CHECKS PASSED. Refs: deferred from PR #1572 (Copilot finding #13). Styleguide audit documented above: 8 fix/migration + 11 enumerated exempt sites.

fix: handle non-interactive init prompts gracefully

1f61e90

bensig closed this Apr 7, 2026

igorls mentioned this pull request Apr 7, 2026

fix: use upsert and deterministic IDs to prevent data stagnation #140

Merged

milla-jovovich mentioned this pull request May 21, 2026

fix(extract): polish PR — address bot review feedback on PR #1555 #1572

Merged

milla-jovovich mentioned this pull request May 30, 2026

fix(ids): delimit hash inputs to prevent drawer_id collisions (#80) #1666

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: handle non-interactive init prompts gracefully#13

fix: handle non-interactive init prompts gracefully#13
sheetsync wants to merge 1 commit into
MemPalace:mainfrom
sheetsync:bugfix/noninteractive-init-eof

sheetsync commented Apr 7, 2026

Uh oh!

bensig commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sheetsync commented Apr 7, 2026

Summary

Repro

Validation

Uh oh!

bensig commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants