Skip to content

fix(mine): validate FTS5 at end of mine (#1537)#1548

Merged
igorls merged 1 commit into
MemPalace:developfrom
mvalentsev:fix/1537-validate-fts5-after-mine
May 24, 2026
Merged

fix(mine): validate FTS5 at end of mine (#1537)#1548
igorls merged 1 commit into
MemPalace:developfrom
mvalentsev:fix/1537-validate-fts5-after-mine

Conversation

@mvalentsev

@mvalentsev mvalentsev commented May 18, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Fixes #1537. mempalace mine currently prints Done. and exits 0 even when the resulting chroma.sqlite3 left FTS5 in a malformed state. Automation, hooks, and CI then proceed against a palace whose wing-filtered searches will fail with Error finding id. The sibling repair --yes path already runs PRAGMA quick_check and refuses on the same failure (closed via #1216 / #1523); this PR closes the same gap on the mine path.

Change shape

  • mempalace/palace.py: new MineValidationError(RuntimeError) plus _validate_palace_fts5_after_mine(palace_path) helper. Helper composes two existing repair-side primitives: _close_chroma_handles(palace_path, backend=_DEFAULT_BACKEND) to release the live writer's handles (Windows mmap guard) and sqlite_integrity_errors(palace_path) to run PRAGMA quick_check on a fresh read-only connection. Raises MineValidationError(palace_path, errors) on any non-ok row.
  • mempalace/miner.py: validator call after the topic-tunnels block, before the Done. banner, inside if not dry_run:. A new explicit except MineValidationError: raise clause sits before the existing except Exception as exc: partial-progress handler. Without it the operator would see the misleading "Mine aborted by exception. files_processed: N. last_file: ." banner before the recovery banner, since the loop completed cleanly and validation ran post-loop.
  • mempalace/convo_miner.py: symmetric call site. mine_convos has no outer try/except (pre-existing), so the exception bubbles directly to the CLI handler.
  • mempalace/format_miner.py: symmetric call site in mine_formats (the --mode extract entry point), placed in the post-loop else: branch after _compute_topic_tunnels_for_wing. Closes a gap that opened on develop after this PR was rebased (see next section).
  • mempalace/cli.py: cmd_mine adds an except MineValidationError arm that prints print_sqlite_integrity_abort (the exact recovery banner cmd_repair already prints), appends a mine-specific note to stderr, and exits 1. The arm wraps all three mode branches (project / convos / extract).

Post-#1555 extract-mode coverage

When this PR was opened on 2026-05-18, _mine_impl was the single mine entry point and the validator there covered every mine path. Merge #1555 (3.3.6 release, 2026-05-21) introduced mempalace/format_miner.py::mine_formats as a third mine entry point, wired into cli.py as elif args.mode == "extract":. Without an additional validator call, mempalace mine <dir> --mode extract would have exited 0 on a corrupted palace exactly as the original reporter scenario describes, just from a different operator-facing command. The rebase in this branch resolves the develop conflict in _mine_impl (preserving the upstream hallways + entity-tunnels additions while keeping the validator call at the end) and extends the same _validate_palace_fts5_after_mine helper into mine_formats's else: branch so all three modes share the same end-of-mine integrity contract.

Why reuse repair-side primitives

The same PRAGMA quick_check failure can be reached from two paths: a corrupted-by-repair sqlite (#1517, fixed in #1216 / #1523) and a corrupted-by-mine sqlite (this issue). Operators should see the same authoritative recovery banner regardless of which command surfaced it. Reusing print_sqlite_integrity_abort keeps the wording singular and the recovery steps coherent.

Threaded backend=_DEFAULT_BACKEND

_close_chroma_handles without a backend argument falls back to a transient ChromaBackend() instance whose _clients dict is empty, so it never closes the live writer's PersistentClient. The helper passes palace._DEFAULT_BACKEND so the cached client for the just-mined palace actually gets closed and WAL flushes before the read-only sqlite3 re-open. Matches the backend=backend pattern cmd_repair already uses (mempalace/repair.py:838, :851; mempalace/cli.py:972).

Wording note

The mine-specific stderr note explicitly hedges attribution: "the corruption may pre-date the mine itself." quick_check runs once at end-of-mine, so it cannot distinguish corruption produced by this mine from pre-existing corruption surfaced by it. mempalace repair --yes rebuilds the FTS5 virtual table either way.

How to test

uv sync --extra dev
uv run pytest tests/test_miner_fts5_validation.py -v
uv run pytest tests/ --ignore=tests/benchmarks -q
uvx --from 'ruff==0.15.9' ruff check .
uvx --from 'ruff==0.15.9' ruff format --check .

End-to-end against the actual mempalace binary:

# Happy path: clean mine still exits 0 with Done banner.
rm -rf /tmp/mp1537 && mkdir -p /tmp/mp1537/src
echo "lorem ipsum " > /tmp/mp1537/src/a.md
PALACE=/tmp/mp1537/palace
cd /tmp/mp1537
mempalace --palace "$PALACE" init . --yes
# (mine ran via init or directly)
mempalace --palace "$PALACE" mine .

# Reporter-shape failure: soft FTS5-only corruption.
python3 -c "
import sqlite3
conn = sqlite3.connect('$PALACE/chroma.sqlite3')
rows = conn.execute('SELECT id, block FROM embedding_fulltext_search_data').fetchall()
target = next((r for r in rows if r[0] > 10), rows[0])
conn.execute('UPDATE embedding_fulltext_search_data SET block=? WHERE id=?',
             (b'\\xde\\xad\\xbe\\xef' * (len(target[1]) // 4), target[0]))
conn.commit()
"
mempalace --palace "$PALACE" mine .   # should exit 1 with recovery banner
echo "exit=$?"                          # → 1

The corrupted second run prints ABORT: SQLite-layer corruption detected before repair rebuild plus quick_check output: - malformed inverted index for FTS5 table main.embedding_fulltext_search, verbatim the operator-facing message reporter cited. The --mode extract and --mode convos paths surface the same banner via the shared cmd_mine handler.

Test coverage

17 new cases in tests/test_miner_fts5_validation.py:

  1. Helper returns silently on a clean palace.
  2. Helper raises MineValidationError on a page-mangled sqlite (corruption recipe mirrors tests/test_repair.py).
  3. Helper raises on FTS5-segment-only corruption: the reporter's exact failure mode (chromadb opens cleanly, only the inverted index is malformed). Skips cleanly with pytest.skip if chromadb renames the shadow table in a future version.
  4. cmd_mine project mode surfaces MineValidationError as exit 1 + recovery banner.
  5. cmd_mine convos mode symmetric.
  6. Dry-run skips the validator in project miner (no writes, nothing to check).
  7. _close_chroma_handles runs before sqlite_integrity_errors (Windows guard order).
  8. Full chain through real _mine_impl (no monkeypatch): mine, corrupt FTS5, re-mine, assert validator was the explicit raise-source via spy.
  9. _mine_impl does NOT print the "Mine aborted by exception" partial-progress banner on MineValidationError (the new explicit except MineValidationError: raise clause bypasses the generic handler).
  10. MineValidationError constructor rejects empty errors list and blank palace_path (defense-in-depth against future mis-construction).
  11. MineValidationError.errors is a tuple (immutable forensic snapshot; handlers cannot mutate).
  12. Helper is silent on missing chroma.sqlite3 (palace dir without sqlite file).
  13. mine_convos(dry_run=True) skips the validator (mirrors project-miner guarantee).
  14. cmd_mine --mode extract surfaces MineValidationError as exit 1 + recovery banner (post-feat(3.3.6): virtual line numbering + format coverage via --mode extract #1555 third entry point).
  15. mine_formats(dry_run=True) skips the validator (matches the other two miners).
  16. mine_formats on KeyboardInterrupt skips the validator: the per-file except Exception does not catch BaseException, so the interrupt propagates past the else: branch where validation sits.
  17. Full chain through real mine_formats (no monkeypatch on the validator itself): build palace, corrupt FTS5, invoke mine_formats, assert the helper raised MineValidationError with the corrupt path in errors.

Out of scope

A codebase-wide audit found additional write paths that print success and exit 0 without validating FTS5. They are deliberately not touched here: each is a separate user-visible surface, and the reporter scope is mempalace mine specifically. A follow-up issue will track adopting the same _validate_palace_fts5_after_mine helper across these sites.

CLI surface:

  • cmd_sweep (sweeper.sweep / sweep_directory)
  • cmd_sync --apply (sync_palace; deletes can leave FTS5 shadow tables stale)
  • cmd_compress (comp_col.upsert of compressed closets)
  • dedup_palace (deletes drawers)
  • closet_llm.regenerate_closets
  • diary_ingest (drawers_col.upsert + upsert_closet_lines)
  • migrate.migrate (col.add batch into temp palace, then swap; no post-swap quick_check)
  • repair.rebuild_from_sqlite (multi-collection upsert; preflight covers the legacy rebuild_index path but not this sub-mode)

Library surface:

  • sources/context.py:PalaceContext.upsert_drawer (shared helper used by source adapters)

MCP surface:

  • tool_add_drawer, tool_diary_write, tool_delete_drawer (mcp_server.py:1171). Different surface: JSON-RPC responses rather than CLI exit codes.

The recovery path this PR's stderr note directs operators to (mempalace repair --yesrebuild_index) is itself safe: it preflights via sqlite_integrity_errors and post-#1523 rebuilds the FTS5 virtual table.

Three pre-existing structural asymmetries that this PR observes but does not address:

  • mine_convos does not wrap its body in mine_palace_lock the way mine() does. The validator therefore runs outside the palace lock in convos mode, which means a concurrent writer (e.g. MCP server) could in principle slip in between the last upsert and the validator. The new stderr note hedges attribution rather than falsely blaming the mine.
  • mine_formats (introduced by feat(3.3.6): virtual line numbering + format coverage via --mode extract #1555) also lacks mine_palace_lock, with the same concurrent-writer caveat as mine_convos. The validator wired in by this PR runs in the same lock-free zone. Locking parity across the three miners is a separate refactor.
  • mine_convos and mine_formats have no outer try/except wrapping their Done. banners, so project mode prints a partial-progress summary on MineValidationError while convos and extract modes print only the recovery banner. Recovering symmetry is a separate refactor.

Checklist

  • Tests pass (uv run pytest tests/ --ignore=tests/benchmarks): 2076 passed, 3 skipped
  • No hardcoded paths
  • Linter passes (uvx --from 'ruff==0.15.9' ruff check . + ruff format --check)

Closes #1537.

Co-authored-by: Caleb Wells 15988028+calebcwells@users.noreply.github.com

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces post-mine FTS5/SQLite integrity validation to detect database corruption early. It adds a new MineValidationError exception, a validation helper that runs PRAGMA quick_check, and integrates these checks into the mining processes for both projects and conversations. The CLI is updated to catch these validation errors and display a consistent recovery banner. Additionally, a new test suite is included to verify the validation logic and error handling. I have no feedback to provide as there are no review comments.

@mvalentsev mvalentsev marked this pull request as ready for review May 18, 2026 23:11
@mvalentsev mvalentsev force-pushed the fix/1537-validate-fts5-after-mine branch 6 times, most recently from d3adba8 to fcef78b Compare May 23, 2026 06:22
Wires _validate_palace_fts5_after_mine into all three mine entry
points so corrupted-FTS5 palaces cannot silently exit 0 from any
of them:

- _mine_impl (mempalace/miner.py) — project file miner
- mine_convos (mempalace/convo_miner.py) — conversation exports
- mine_formats (mempalace/format_miner.py) — binary office documents
  via --mode extract, introduced on develop by MemPalace#1555 (3.3.6 release)
  between this PR's open date and its rebase

cmd_mine surfaces MineValidationError as exit 1 + the same
print_sqlite_integrity_abort banner cmd_repair already prints,
appended with a mine-specific stderr note that hedges attribution
(quick_check cannot tell pre-existing corruption from corruption
this mine produced). 17 tests in tests/test_miner_fts5_validation.py
cover the helper, the three call sites, dry-run / KeyboardInterrupt
skip semantics, and the MineValidationError constructor invariants.

Closes MemPalace#1537.

Co-authored-by: Caleb Wells <15988028+calebcwells@users.noreply.github.com>
@mvalentsev mvalentsev force-pushed the fix/1537-validate-fts5-after-mine branch from fcef78b to 0ecf0e5 Compare May 24, 2026 13:27
@igorls igorls merged commit ce1ee26 into MemPalace:develop May 24, 2026
6 checks passed
arnoldwender pushed a commit to arnoldwender/mempalace that referenced this pull request May 24, 2026
Bumps version 3.3.5 → 3.3.6 across pyproject.toml, version.py, plugin
manifests (.claude-plugin/plugin.json, .claude-plugin/marketplace.json,
.codex-plugin/plugin.json), README badge, and uv.lock. Flips CHANGELOG.md
from ``[Unreleased]`` to ``[3.3.6] — 2026-05-24`` and backfills the
major user-facing entries that landed without changelog entries during
the cycle:

Features:
- MemPalace#1555 office-document mining via --mode extract + virtual line numbers
- MemPalace#1584 surgical closet pointers with date+line locators (Tier 6a)
- MemPalace#1558 + MemPalace#1560 within-wing hallways (entity co-occurrence graph)
- MemPalace#1565 cross-wing tunnels auto-promoted from hallways
- MemPalace#1578 Hebbian potentiation + Ebbinghaus decay on hallways/tunnels
- MemPalace#1236 API-tool transcripts auto-route to wing_api
- MemPalace#711 hooks.auto_save toggle for silent-mode sessions
- MemPalace#1605 COCA content-word filter for entity detection
- MemPalace#1557 case-insensitive entity matching at mine time
- MemPalace#1483 multilingual embeddings (embeddinggemma-300m) by default

Bug Fixes (selected, user-visible):
- MemPalace#1540 silent data loss in three unchunked upsert sites
- MemPalace#1538 paragraph chunker oversized chunks
- MemPalace#1554 per-file chunk cap too low for transcripts
- MemPalace#1562 Windows hook subprocess/ChromaDB deadlock
- MemPalace#1529 create_tunnel corrupted hyphenated wing names
- MemPalace#1424 save-hook truncated hyphenated project folders
- MemPalace#1383 KG cache duplicated graphs for symlinked/cased paths
- MemPalace#1466 silent symlink skip now logged
- MemPalace#1441 macOS stock-bash 3.2 hook compatibility
- MemPalace#1500 / MemPalace#1513 structured JSON-RPC errors on bad MCP input
- MemPalace#1523 VACUUM + FTS5 rebuild after repair
- MemPalace#1548 FTS5 validation at end of mine
- plus MemPalace#1216, MemPalace#1408, MemPalace#1438, MemPalace#1439, MemPalace#1445, MemPalace#1452, MemPalace#1459, MemPalace#1461, MemPalace#1466,
  MemPalace#1470, MemPalace#1477, MemPalace#1485, MemPalace#1500, MemPalace#1513, MemPalace#1528, MemPalace#1532, MemPalace#1543, MemPalace#1546, MemPalace#1585

Performance:
- MemPalace#1474 convo miner pre-fetches mined-set
- MemPalace#1487 rebuild_index progress callback
- MemPalace#1530 MCP cold-start diagnostics + opt-in warmup

Lint passes (ruff 0.15.14); mempalace-mcp entry point alignment
verified per RELEASING.md.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

mempalace mine reports 'Done ✓' with full drawer counts when the mine produced an FTS5-corrupt palace

2 participants