
fix(miner): harden Windows mine against ONNX bad_alloc + silent partial exits (#1296)#1402

Merged
igorls merged 1 commit into develop from fix/1296-windows-mine-resilience on May 7, 2026

@igorls igorls commented May 7, 2026

Three small changes that together address the failure modes in #1296.

Summary

  1. pnpm-lock.yaml and yarn.lock added to SKIP_FILENAMES — mirrors the existing package-lock.json rule. A 24K-line pnpm-lock.yaml produced ~1124 chunks in one batch and tripped onnxruntime bad_alloc on Windows in the original report.
  2. MAX_CHUNKS_PER_FILE = 500 cap — any file producing more than the cap is skipped with a clear log line. Catches the broader class (generated CSV/JSON, build artifacts) that a named-file skip list will never fully cover. Conservative budget: 500 chunks × 800 chars ≈ 400 KB of source, so legitimate hand-written content still mines.
  3. _mine_impl now prints a summary on any exception, not just KeyboardInterrupt. Without this, an arbitrary exception (ONNX bad_alloc, chromadb HNSW error, OS fault) propagated with no summary: the operator saw only the last progress line and assumed the mine succeeded (#1296, Failure 2). The new path mirrors the KeyboardInterrupt summary, adds the exception class and message, then re-raises so the original traceback surfaces and the exit code is non-zero.
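
The first two changes can be sketched as follows. This is a minimal illustration, not the code from the PR: `SKIP_FILENAMES` and `MAX_CHUNKS_PER_FILE` are named in the description, but the fixed-size `chunk_text` stand-in, the `should_mine` helper, and the exact contents of the skip set are assumptions made for the example.

```python
from __future__ import annotations

from pathlib import Path

# Constant names come from the PR description; the set contents beyond the
# three lockfiles named there are unknown, so only those three appear here.
SKIP_FILENAMES = {"package-lock.json", "pnpm-lock.yaml", "yarn.lock"}
MAX_CHUNKS_PER_FILE = 500
CHUNK_SIZE = 800  # chars per chunk, taken from the PR's budget estimate


def chunk_text(content: str, size: int = CHUNK_SIZE) -> list[str]:
    """Hypothetical stand-in for the real chunker: fixed-size slices."""
    return [content[i : i + size] for i in range(0, len(content), size)]


def should_mine(filename: str, content: str) -> tuple[bool, str]:
    """Hypothetical helper: return (ok, reason) for a candidate file."""
    if Path(filename).name in SKIP_FILENAMES:
        return False, "filename in SKIP_FILENAMES"
    n_chunks = len(chunk_text(content))
    if n_chunks > MAX_CHUNKS_PER_FILE:
        return False, f"produced {n_chunks} chunks (cap {MAX_CHUNKS_PER_FILE})"
    return True, "ok"


# A named lockfile is skipped before any chunking happens.
print(should_mine("web/pnpm-lock.yaml", "x"))
# A generated file the skip list does not know about is caught by the cap.
print(should_mine("data.csv", "x" * (CHUNK_SIZE * 1124)))
# Exactly 500 chunks (about 400 KB) is still within budget.
print(should_mine("notes.md", "x" * (CHUNK_SIZE * MAX_CHUNKS_PER_FILE)))
```

The name check runs first because it is cheap; the cap is the fallback for generated files that no skip list will anticipate.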

Tests

Three additions in tests/test_miner.py:

  • test_skip_filenames_includes_lockfiles: pins the constant
  • test_process_file_skips_when_chunks_exceed_max: process_file returns (0, room) with no upserts when chunks exceed the cap
  • test_mine_arbitrary_exception_prints_summary_and_reraises: a RuntimeError mid-mine surfaces the summary banner with files_processed / drawers_filed / last_file / exception class, then re-raises

Local: 1596 tests pass, ruff lint + format clean against the 0.4.x CI pin.
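
As a rough illustration of the "no upserts" assertion, the chunk-cap test can be approximated with a fake collection that records calls. Everything here is an assumption made for the sketch: the simplified `process_file` signature, the `FakeCollection` class, and `room` being a plain string label. The real tests in tests/test_miner.py use pytest fixtures such as `tmp_path` and `monkeypatch`.

```python
MAX_CHUNKS_PER_FILE = 500


class FakeCollection:
    """Hypothetical test double: records upsert calls instead of storing them."""

    def __init__(self):
        self.upserts = []

    def upsert(self, documents):
        self.upserts.append(list(documents))


def process_file(chunks, room, collection):
    """Simplified, hypothetical signature; returns (drawers_filed, room)."""
    if len(chunks) > MAX_CHUNKS_PER_FILE:
        # Over the cap: file nothing and leave the collection untouched.
        return 0, room
    collection.upsert(chunks)
    return len(chunks), room


def test_skips_when_chunks_exceed_max():
    collection = FakeCollection()
    chunks = ["chunk"] * (MAX_CHUNKS_PER_FILE + 1)
    assert process_file(chunks, "general", collection) == (0, "general")
    assert collection.upserts == []


def test_mines_when_chunks_equal_max():
    collection = FakeCollection()
    chunks = ["chunk"] * MAX_CHUNKS_PER_FILE
    assert process_file(chunks, "general", collection) == (500, "general")
    assert len(collection.upserts) == 1
```

The second test pins the boundary: a file with exactly the cap is still mined, since the check is strictly "more than".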

Out of scope

Closes #1296

…al exits

Three small changes that together address the failure modes in #1296:

1. Add pnpm-lock.yaml and yarn.lock to SKIP_FILENAMES, mirroring the
   existing package-lock.json rule. A 24K-line pnpm-lock.yaml produced
   ~1124 chunks in one batch and tripped onnxruntime bad_alloc on
   Windows; pnpm/yarn lockfiles are no more useful to mine than npm's.

2. Skip any file that produces more than MAX_CHUNKS_PER_FILE (500)
   chunks, with a clear log line. Catches the broader class — generated
   CSV/JSON, build artifacts, etc. — that the named-file SKIP list will
   never fully cover. The cap is conservative (500 chunks * 800 chars ≈
   400 KB of source) so legitimate hand-written content still mines.

3. Print a partial-progress summary on any exception in _mine_impl, not
   just KeyboardInterrupt, then re-raise. Without this, an arbitrary
   exception (ONNX bad_alloc, chromadb HNSW error, OS fault) propagates
   silently — the operator sees only the last progress line and assumes
   the mine succeeded. The new path mirrors the KeyboardInterrupt
   summary (files_processed, drawers_filed, last_file) plus the
   exception type and message, then re-raises so the original traceback
   surfaces and the exit code is non-zero.

Tests cover: SKIP_FILENAMES contents, the chunk-cap path returning
(0, room) with no upserts, and the new mine-aborted summary surfacing
both the partial counters and the exception class.
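
The third change follows a common "summarize, then re-raise" pattern. The sketch below is an assumption-laden illustration, not `_mine_impl` itself: the `mine` driver, its `process_file` callback, and the banner wording are hypothetical, while the four summary fields match the ones quoted in this PR. The separate KeyboardInterrupt path is omitted; `except Exception` does not catch it, because KeyboardInterrupt derives from BaseException.

```python
def mine(files, process_file):
    """Hypothetical driver: process files, summarize progress on any failure."""
    files_processed = 0
    total_drawers = 0
    last_file = None
    try:
        for path in files:
            last_file = path  # recorded before processing, so it names the culprit
            total_drawers += process_file(path)
            files_processed += 1
    except Exception as exc:
        print("Mine aborted")  # banner wording is an assumption
        print(f"  files_processed: {files_processed}/{len(files)}")
        print(f"  drawers_filed: {total_drawers}")
        print(f"  last_file: {last_file or '<none>'}")
        print(f"  error: {type(exc).__name__}: {exc}")
        raise  # bare raise keeps the original traceback and non-zero exit
    return files_processed, total_drawers


def flaky(path):
    """Hypothetical per-file worker that fails on the second file."""
    if path == "b.md":
        raise RuntimeError("bad allocation")
    return 2


try:
    mine(["a.md", "b.md", "c.md"], flaky)
except RuntimeError as exc:
    print(f"re-raised to caller: {exc}")
```

Using a bare `raise` rather than `raise exc` or `sys.exit(1)` leaves the traceback untouched, which is what lets the operator see where the failure originated.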
Copilot AI review requested due to automatic review settings May 7, 2026 11:57
@igorls igorls added this to the v3.3.5 milestone May 7, 2026

Copilot AI left a comment

Pull request overview

This PR hardens the project mining pipeline against Windows-specific failure modes reported in #1296 by reducing pathological per-file ingestion size and ensuring partial-progress summaries are printed when mining aborts unexpectedly.

Changes:

  • Extend SKIP_FILENAMES to exclude additional JS lockfiles (pnpm-lock.yaml, yarn.lock).
  • Add MAX_CHUNKS_PER_FILE to skip files that generate an excessive number of chunks.
  • Print an abort summary for non-KeyboardInterrupt exceptions (then re-raise), and add tests covering the new behaviors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| mempalace/miner.py | Adds lockfile skips, a per-file chunk cap, and an exception abort summary during mining. |
| tests/test_miner.py | Adds tests to pin the new skip list entries, chunk-cap behavior, and exception-summary behavior. |

Comment thread mempalace/miner.py
Comment on lines 835 to +839
    chunks = chunk_text(content, source_file)

    if len(chunks) > MAX_CHUNKS_PER_FILE:
        print(
            f" ! [skip] {filepath.name[:50]:50} produced {len(chunks)} chunks "
Comment thread mempalace/miner.py
Comment on lines +1194 to +1197
print(f" files_processed: {files_processed}/{len(files)}")
print(f" drawers_filed: {total_drawers}")
print(f" last_file: {last_file or '<none>'}")
print(f" error: {type(exc).__name__}: {exc}")
Comment thread tests/test_miner.py
Comment on lines +713 to +718
def test_process_file_skips_when_chunks_exceed_max(tmp_path, monkeypatch):
"""A file producing more than MAX_CHUNKS_PER_FILE chunks must be skipped
with a clear message and zero upserts. Generated artifacts (CSVs, lock
files not in SKIP_FILENAMES) hit this — the cap is what prevents ONNX
bad_alloc on Windows when the embedder is asked to swallow thousands of
chunks in one batch (#1296)."""
@igorls igorls merged commit 52c70c9 into develop May 7, 2026
10 checks passed
@igorls igorls deleted the fix/1296-windows-mine-resilience branch May 7, 2026 12:36

Development

Successfully merging this pull request may close these issues.

mine fails on Windows: HNSW index corruption + ONNX bad allocation on large files
