Skip to content

fix: handle non-interactive init prompts gracefully#13

Closed
sheetsync wants to merge 1 commit into
MemPalace:mainfrom
sheetsync:bugfix/noninteractive-init-eof
Closed

fix: handle non-interactive init prompts gracefully#13
sheetsync wants to merge 1 commit into
MemPalace:mainfrom
sheetsync:bugfix/noninteractive-init-eof

Conversation

@sheetsync

Copy link
Copy Markdown
Contributor

Summary

  • make room approval fall back safely when stdin is unavailable
  • make entity confirmation fall back safely on EOF too
  • add regression tests for non-interactive approval flows

Repro

Before this change, running mempalace init <dir> < /dev/null crashed with EOFError when room approval prompted for input.

Validation

  • pytest -q
  • verified python -m mempalace init <dir> < /dev/null now writes mempalace.yaml instead of crashing

@bensig

bensig commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

This was fixed in #123--yes now skips all interactive prompts. Closing as duplicate.

@bensig bensig closed this Apr 7, 2026
gnusam pushed a commit to gnusam/mempalace-pgsql that referenced this pull request Apr 9, 2026
Port of upstream a4149ab (by @igorls, findings MemPalace#6, MemPalace#11, MemPalace#13) adapted to
the PG+pgvector backend.

1. MCP/diary path ID is now hash(content), not content[:200]+timestamp.
   Same content → same deterministic ID. Eliminates TOCTOU races and
   stops identical content from piling up as distinct duplicate rows.
   Matches upstream's intent (findings MemPalace#6 TOCTOU, MemPalace#13 non-deterministic
   IDs).

2. INSERT ... ON CONFLICT (id) DO NOTHING → ON CONFLICT (id) DO UPDATE.
   Re-mining a modified file previously got the new content silently
   dropped because the slot ID matched an existing row (upstream
   finding MemPalace#11 HIGH — "add ignores updates"). True upsert: content,
   embedding, metadata, and filed_at are all refreshed.

3. add_drawer now always returns the drawer_id (both insert and update
   paths). The old "return None on conflict" signal implicitly encoded
   the stagnation bug and is no longer meaningful.

Updated tests/test_db.py::test_mine_drawer_reupsert_by_source to assert
the new upsert semantics instead of the stagnation assertion it replaced,
and added test_mcp_add_drawer_is_idempotent_on_same_content to cover the
deterministic-content-ID path. The mtime-based file_already_mined half
of bf88daa is in a follow-up commit.

Co-Authored-By: Igor Lins e Silva <4753812+igorls@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jphein referenced this pull request in techempower-org/mempalace Apr 27, 2026
Three new fork-change rows for 2026-04-27 work that wasn't yet
captured in the documented inventory:

29. CLOSET_RANK_BOOSTS hoist + VecRecall ablation finding
    (f558d3c3cb03f3) — module-level constants + 12-probe A/B
    showing closet boost is mostly inert on chat-heavy corpus;
    VecRecall's MemPalace#1129 critique didn't reproduce.

30. .claude-plugin/ manifest scrub (81191499f91e18) — removed
    embedded API key + homelab URL from .mcp.json and hooks.json;
    daemon routing restored without the literal credential, env
    inherits from settings.local.json at runtime.

31. palace-daemon upstream PR push (PRs #7-#13) — seven small/medium
    PRs covering most fork-ahead daemon work; pending follow-ups
    and needs-generalization items tracked in daemon README.

YAML manifest gets corresponding entries for #29 (kind-filter-retired,
closet-boost-ablation) and #30 (plugin-manifest-scrub); FORK_CHANGELOG
re-rendered. Lint clean (11 fork hashes resolve, 90 PR refs match
upstream, FORK_CHANGELOG matches YAML).
mvalentsev pushed a commit to mvalentsev/mempalace that referenced this pull request May 21, 2026
…#1555

PR MemPalace#1555 (format coverage + virtual line numbering) merged with twelve
inline polish comments from Copilot + gemini-code-assist that weren't
load-bearing enough to block the original ship but are real cleanups.
This PR addresses them.

Twelve items in scope; one item (drawer ID delimiter — Copilot MemPalace#13) is
deferred to its own dedicated PR because it's a breaking schema change
that requires migration design beyond the scope of a polish PR.

## Behavioral fixes (5 items, RED-tested first)

1. **FileNotFoundError vs broken symlink (Copilot MemPalace#8).** ``extract_text``
   previously mapped every ``FileNotFoundError`` from ``stat()`` to
   ``SKIP_BROKEN_SYMLINK``. That's misleading for the common case of a
   regular file deleted between scan and extract. Now distinguishes:
   ``SKIP_BROKEN_SYMLINK`` only when ``p.is_symlink()`` is true;
   ``SKIP_UNREADABLE`` otherwise.

2. **``file_already_mined`` extract_mode scoping (Copilot MemPalace#11, MemPalace#12).**
   Both call sites in ``mine_formats`` and ``_file_chunks_locked`` now
   pass ``extract_mode="format"``. Previously the format miner could
   falsely treat drawers from project / convo miner on the same source
   file as "already mined" (and vice versa). Scopes idempotency to the
   correct drawer subset.

3. **Sentinel skip for transient missing-dep statuses (Copilot MemPalace#14).**
   New ``_TRANSIENT_MISSING_DEP_STATUSES`` set + ``_register_skip_sentinel_if_appropriate``
   helper. Skip variants like ``SKIP_NO_MARKITDOWN`` /
   ``SKIP_NO_STRIPRTF`` / ``SKIP_MISSING_FORMAT_DEPS`` /
   ``SKIP_NETWORK_TIMEOUT`` no longer write the "already-mined" sentinel.
   Otherwise installing the missing extra later wouldn't trigger a re-mine.

4. **Outer ``except Exception`` in ``mine_formats`` (Gemini MemPalace#5).** The
   outer try around the loop previously caught only ``KeyboardInterrupt``,
   leaving any setup-time error (e.g., ``scan_formats`` raising) to
   propagate as a bare traceback. Now catches ``Exception`` defensively,
   logs it, prints a partial-progress summary, and lets the ``finally``
   PID-cleanup run. Mirrors miner.py's belt-and-suspenders pattern.

5. **Thread user's ``chunk_size`` / ``chunk_overlap`` / ``min_chunk_size``
   through to ``chunk_text`` (Gemini MemPalace#3).** ``MempalaceConfig`` was loaded
   only to validate readability; users who tuned their config saw no
   effect in format-mode mining. Now properly threaded.

## Trivial cleanups (5 items)

6. **Path expanduser in ``extract_text`` (Copilot MemPalace#7).** ``Path(path)`` →
   ``Path(path).expanduser()`` so CLI inputs like ``~/docs/file.pdf``
   resolve correctly.

7. **Path expanduser+resolve in ``scan_formats`` (Copilot MemPalace#9).** Same
   fix; ``~/docs`` and relative paths now work consistently.

8. **Use resolved ``format_path`` in ``mine_formats`` (Copilot MemPalace#10).**
   ``scan_formats(format_dir)`` → ``scan_formats(format_path)`` so the
   already-resolved path is used.

9. **``render_with_line_numbers`` type annotation (Copilot MemPalace#15).**
   ``text: "str | None"`` reflects the documented + tested ``None``
   handling.

10. **Test + docs claims (Copilot MemPalace#16, MemPalace#17, MemPalace#18).** Stale framings
    removed:
    - ``docs/format-coverage.md`` — 14 fringe cases + "see the file for
      the current test inventory" (no more frozen test count).
    - ``tests/test_line_numbers.py`` — drops "proposed for mempalace
      3.3.6" + "run from the proposal directory" references.
    - ``tests/test_format_miner.py`` — drops "MarkItDown is mocked
      throughout" (live integration tests exist) + proposal-directory
      framing.

## Module-level hoists (enables clean test patching)

- ``MempalaceConfig`` (from ``.config``) hoisted from lazy local import
  to module-level so tests can patch ``mempalace.format_miner.MempalaceConfig``.
- ``chunk_text`` (from ``.miner``) hoisted similarly.

Both follow the pattern PR MemPalace#1565 used for ``compute_hallways_for_wing``.

## Complexity refactor

Extracted ``_print_mine_summary`` from ``mine_formats`` so the orchestrator
stays under the project's ``max-complexity = 25`` ceiling (per
``pyproject.toml [tool.ruff.lint.mccabe]``). Behavior unchanged; pure
extraction.

## Out of scope (intentionally deferred)

- **Drawer ID delimiter collision (Copilot MemPalace#13)** — ``f"{source_file}{chunk_index}"``
  can theoretically collide (``"/path/a1" + "23"`` == ``"/path/a" + "123"``).
  Fixing this is a breaking schema change to drawer IDs and requires a
  migration plan; will land as its own PR after design.

- The four bot comments that were ALREADY addressed by amendment MemPalace#3
  before the PR MemPalace#1555 merge (``_SKIP_DIRS`` dedup, ``scan_formats``
  symlink skip, ``source_mtime`` tracking, hall+entities metadata) —
  no action needed; verified during audit.

## Tests (RED-first)

Six new RED-first tests in ``tests/test_format_miner.py``:

  test_extract_text_nonexistent_regular_file_returns_unreadable_not_broken_symlink
  test_mine_formats_passes_extract_mode_format_to_file_already_mined
  test_mine_formats_does_not_write_sentinel_for_skip_no_markitdown
  test_mine_formats_does_not_write_sentinel_for_skip_missing_format_deps
  test_mine_formats_catches_unexpected_exception_and_prints_summary
  test_mine_formats_threads_chunk_size_from_user_config

All six RED before this commit (failures correctly identified the bugs
they're targeting), all six GREEN after.

One existing test (``test_mine_formats_continues_after_per_file_error``)
updated to patch the new module-level binding
``mempalace.format_miner.chunk_text`` instead of the old
``mempalace.miner.chunk_text`` source location, and to accept the
``**kwargs`` the call now passes through. Behavior unchanged.

## Verification

  pytest -q (full mempalace suite)
    → 2065 passed, 3 skipped, 0 regressions
  ruff check mempalace/format_miner.py mempalace/searcher.py tests/
    → All checks passed!
  ruff format --check ...
    → 4 files already formatted (pinned 0.15.9)
  mine_formats complexity
    → ≤ 25 (under the project ceiling)
milla-jovovich added a commit that referenced this pull request May 30, 2026
Drawer/closet/triple IDs built by concatenating strings without a
delimiter before hashing can collide: hash((s1 + str(i1))) ==
hash((s2 + str(i2))) whenever s1+str(i1) == s2+str(i2). Concretely,
source_file="foo" + chunk_index=12 produces the same hash input as
source_file="foo1" + chunk_index=2 — both yield "foo12". Under
ChromaDB's primary-key constraint the second write silently
overwrites the first; one chunk's content is lost without surfacing
an error (the surrounding broad `except Exception:` blocks swallow
any backend complaint).

The defect class is "string + string concat fed to a hash without
a delimiter." Per the styleguide's partial-scope-key-migration rule,
every site sharing the pattern is a candidate and must be triaged —
not just the ones the original issue named.

FIX — 6 sites
- mempalace/miner.py:1253        drawer_id, add_drawer() back-compat
- mempalace/miner.py:1386        drawer_id, batched mine loop
- mempalace/miner.py:1416        drawer_ids, closet-build re-derivation
- mempalace/format_miner.py:643  drawer_id, format_miner batched loop
- mempalace/mcp_server.py:1136   drawer_id, tool_add_drawer
- mempalace/knowledge_graph.py:305  triple_id, KG triple insertion

MIGRATED FOR CONSISTENCY — 2 sites
- mempalace/convo_miner.py:87    sentinel_key — was `:`, now `|`
- mempalace/convo_miner.py:422   drawer_key   — was `:`, now `|`

Pre-existing delimiter was `:`. Migrated because (a) source_file can
contain literal `:` on Windows paths (`C:\Users\…`) and URL-like
sources (`https://host:8080`), weakening `:` as a separator; (b) `|`
is reserved in Windows filenames so source_file cannot contain it,
making it strictly safer.

DELIMITER CHOICE
- `|` chosen over `:` (Windows / URL source_file edge cases)
- `|` chosen over `\x00` (NUL breaks log greppability + print()
  debug + downstream string handling for marginal extra safety)
- Matches the established precedent in mempalace/diary_ingest.py
  (lines 52, 76, 91, 98 — all already on `|`)

EXEMPT — audited and correct as-is
  Single-input hashes (nothing to delimit):
    - mempalace/miner.py:1432         closet_id (source_file only)
    - mempalace/format_miner.py:559   sentinel_id (source_file only)
    - mempalace/palace.py:433         lock filename (source_file only)
    - mempalace/palace.py:629         palace_key (lock_key_source only)
    - mempalace/diary_ingest.py:158   content_hash (text only)
    - mempalace/hooks_cli.py:329      pidfile digest (joined cmd only)
    - mempalace/sources/context.py:141  record digest (source_file only)

  Already correctly delimited:
    - mempalace/hallways.py:157     `f"{wing}::{a}::{b}"`  (`::`)
    - mempalace/palace_graph.py:454 `f"{a}↔{b}"`           (`↔`)
    - mempalace/diary_ingest.py:52,76,91,98                (`|` precedent)

  Protected by composition (uniqueness guaranteed by the ID prefix,
  not by the hash slice):
    - mempalace/mcp_server.py:1635  entry_id is
      `diary_{wing}_{strftime('%Y%m%d_%H%M%S%f')}_{sha[:12]}`.
      Microsecond-resolution timestamp prefix supplies uniqueness;
      the trailing hash is a content-discriminator, not the
      write-time uniqueness guarantor.

NEW MODULES
- mempalace/ids.py — single source of truth for ID construction.
  Five helpers (make_drawer_id_from_chunk, make_drawer_id_from_content,
  make_convo_drawer_id, make_convo_sentinel_id, make_triple_id) plus
  an ID_RECIPE = "v2" constant.
- mempalace/collision_scan.py — pre-mining defense. Runs immediately
  before each batched ChromaDB upsert; raises CollisionError naming
  the colliding (source_file, chunk_index) pairs if any proposed
  drawer_id appears more than once with conflicting metadata across
  the union of incoming and existing rows.

DETECTION
ChromaDB's primary-key constraint means past collisions were
silently resolved at upsert time (last-write-wins), erasing
evidence. A post-hoc audit cannot reconstruct what was lost from
palace state alone. Therefore:

- Pre-mining risk scan. Before each batched upsert, compute the
  proposed drawer_ids for the incoming chunk set AND query existing
  drawer_ids from the collection. If any proposed id appears more
  than once in the union (incoming-vs-incoming or incoming-vs-
  existing) with conflicting (source_file, chunk_index), abort the
  mine with an actionable error naming the colliding pairs.
  Collision is caught BEFORE it destroys data, which is the only
  point at which palace state still carries the evidence.

- New metadata key: `"id_recipe": "v2"` on every drawer written
  under the delimited recipe. Audits compare like-for-like;
  drawers without `id_recipe` are treated as v1 legacy (undelimited
  or `:`-delimited), not as collisions.

- Honest disclosure: palaces mined under any pre-v2 mempalace may
  carry silent past collisions whose original content is
  unrecoverable from palace state. Future library tier work will
  give users a per-drawer audit + opt-in archival path.

TESTS
- tests/test_ids.py: 17 unit tests covering all 5 helpers, the
  ID_RECIPE constant, the private `_delimited_sha256` helper, and
  the four defect-class collision shapes (chunk_index boundary,
  content boundary, extract_mode boundary, ISO datetime boundary).
  RED collision tests confirm the v2 recipe breaks the defect class.
- tests/test_collision_scan.py: 10 tests covering clean batches,
  idempotent re-mines, incoming-vs-incoming collisions, incoming-vs-
  existing collisions, error-message quality, empty batches,
  metadata without chunk_index, and ChromaDB backend errors
  propagating cleanly.
- tests/test_convo_miner_unit.py: FakeCol gains a stub `get()` so
  the pre-mining scan can probe an empty in-test collection.

BACKWARDS COMPATIBILITY
- Existing v1 drawers remain queryable; no rewrite of stored IDs.
- New mining passes write v2 drawers alongside v1 via PR #1628's
  additive-mining model.
- No user action required; opt-in cleanup ships separately.

VERIFICATION
- macOS Python 3.12 (local): 2304 passed, 3 skipped, 0 failed,
  coverage 85.44% (above 80% threshold).
- Linux Python 3.9/3.11/3.13 (OrbStack with full `pip install -e
  '.[dev]'` flow): see PR body for per-version pass counts.
- `ruff check .` + `ruff format --check .`: clean.
- pylint score: ids.py 10.00/10, collision_scan.py 10.00/10; all
  modified files >= 8.92 (no regression).
- bandit: clean on all new files; the one MEDIUM finding in
  knowledge_graph.py is on lines 385/407 (pre-existing SQL string
  construction in timeline queries), not in code touched by this PR.
- pre_push_check.sh: ALL CHECKS PASSED.

Refs: deferred from PR #1572 (Copilot finding #13). Styleguide audit
documented above: 8 fix/migration + 11 enumerated exempt sites.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants