Comparing changes

…t (4 bugs) (#11)

* fix: threading crash, duplicate symbols, logging, and embedding insert

Four bugs found while indexing openclaw/openclaw (17,212 source files, 945 doc files) on an RTX 5060 Ti. The repo is a large TypeScript/Swift/Kotlin monorepo (~17k files across 60+ extensions). All bugs surface only at scale and were invisible in small test cases.

---

Bug 1: cross-thread SQLite access crashes ~30% of file parses

_parse_file_for_indexing ran inside ThreadPoolExecutor workers and called db.execute() on the shared main-thread connection. This caused:

    sqlite3.InterfaceError: bad parameter or other API misuse

on roughly 30% of files, even though the connection was opened with check_same_thread=False. Python's sqlite3 binding is not safe for concurrent access without explicit locking.

Fix: pre-fetch all existing file records into a dict[path → mtime] in the main thread before launching the pool. Workers receive the dict and do a dict.get() lookup instead of a DB query. No DB access in any worker thread.

---

Bug 2: duplicate symbols from tree-sitter AST crash DB write

tree-sitter can produce multiple symbols with the same (name, kind, line_start) for a single file. The plain INSERT INTO symbols raised:

    sqlite3.IntegrityError: UNIQUE constraint failed: symbols.file_id, symbols.name, symbols.kind, symbols.line_start

This killed the entire DB write phase after all parsing and GPU embedding had already completed, wasting the entire indexing run.

Fix: INSERT OR IGNORE INTO symbols. Use cursor.rowcount == 1 to detect whether the insert actually happened. cursor.lastrowid is NOT reliable here: after a no-op INSERT OR IGNORE it retains the rowid from the previous successful insert on the same connection, not 0.

---

Bug 3: embedding insert crashes on sqlite-vec virtual table

After the Bug 2 fix, a duplicate symbol falls through to a SELECT that returns the existing symbol_id. That ID already has an entry in symbol_embeddings (a sqlite-vec virtual table). Attempting to insert another embedding for it raised:

    sqlite3.OperationalError: UNIQUE constraint failed on symbol_embeddings primary key

INSERT OR IGNORE does not work on sqlite-vec virtual tables: the conflict-resolution clause is rejected at the SQL level (OperationalError instead of the usual IntegrityError).

Fix: guard embedding_pairs.append() with `if is_new`, so only freshly inserted symbols get embeddings queued. Existing symbols already have one.

---

Bug 4: logger.exception() reports all errors as "NoneType: None"

Exceptions from worker threads are stored as return values:

    return (fpath, None, e)

Then in the main thread:

    logger.exception("Failed to index %s", fpath)

logger.exception() reads sys.exc_info(), the current thread's exception context, which is (None, None, None) since the exception occurred in a different thread. Every failure logged as "NoneType: None" with no traceback, making Bug 1 completely invisible.

Fix:

    logger.error("Failed to index %s", fpath, exc_info=error)

---

Tested against openclaw/openclaw:

    Before: ~30% of files silently skipped; DB write crash on first run
    After:  17,212/17,212 code files indexed, 111,000 symbols, 750 MB DB

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: rename last_file_indexed to last_code_indexed and return ISO string

get_index_stats was returning `last_file_indexed` (a raw float Unix timestamp) in the freshness dict, but api_types.py defines the field as `last_code_indexed: str | None`. This caused a Pydantic validation error in MCP clients that validate tool output against the schema.

Two changes in get_index_stats():
- Rename key from `last_file_indexed` to `last_code_indexed`
- Convert float timestamp to ISO-8601 string via datetime.fromtimestamp().isoformat()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
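The Bug 2 and Bug 3 fixes described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `symbols` schema, the `insert_symbol` helper, and the placeholder embedding vectors are assumptions made for the example; only `INSERT OR IGNORE`, the `cursor.rowcount == 1` check, `is_new`, and `embedding_pairs` come from the message itself.

```python
import sqlite3

# Hypothetical schema: only the UNIQUE column set is taken from the error message.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols ("
    " id INTEGER PRIMARY KEY,"
    " file_id INTEGER, name TEXT, kind TEXT, line_start INTEGER,"
    " UNIQUE (file_id, name, kind, line_start))"
)


def insert_symbol(conn, file_id, name, kind, line_start):
    """Insert a symbol; return (symbol_id, is_new). Hypothetical helper."""
    key = (file_id, name, kind, line_start)
    cur = conn.execute(
        "INSERT OR IGNORE INTO symbols (file_id, name, kind, line_start)"
        " VALUES (?, ?, ?, ?)",
        key,
    )
    # rowcount is 1 only when a row was actually written; lastrowid is
    # trusted only in that case.
    if cur.rowcount == 1:
        return cur.lastrowid, True
    # Duplicate: look up the id of the row that already exists.
    row = conn.execute(
        "SELECT id FROM symbols"
        " WHERE file_id = ? AND name = ? AND kind = ? AND line_start = ?",
        key,
    ).fetchone()
    return row[0], False


embedding_pairs = []
results = []
# The same symbol twice, as tree-sitter may emit it for one file.
for sym in [(1, "render", "function", 10), (1, "render", "function", 10)]:
    symbol_id, is_new = insert_symbol(conn, *sym)
    results.append((symbol_id, is_new))
    if is_new:  # existing symbols already have an embedding queued or stored
        embedding_pairs.append((symbol_id, [0.0, 0.0]))  # placeholder vector
```

With this shape, the duplicate resolves to the existing id but never reaches the embedding queue, so the virtual table sees at most one insert per symbol.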
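The Bug 4 fix can be reproduced in isolation with a short sketch. The `parse_file` worker, the logger name, and the in-memory log stream are assumptions made for the example; the `(fpath, None, e)` return shape and the `logger.error(..., exc_info=error)` call are taken from the commit message.

```python
import io
import logging
from concurrent.futures import ThreadPoolExecutor


def parse_file(fpath):
    """Hypothetical worker: returns the exception instead of raising it."""
    try:
        raise ValueError(f"cannot parse {fpath}")
    except Exception as e:
        return (fpath, None, e)


# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
logger = logging.getLogger("indexer_sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(parse_file, ["a.ts", "b.swift"]))

for fpath, parsed, error in results:
    if error is not None:
        # No exception is active in the main thread here, so pass the stored
        # exception object explicitly; its traceback travels with it.
        logger.error("Failed to index %s", fpath, exc_info=error)

log_output = stream.getvalue()
```

Passing the exception instance through `exc_info` lets the logging module format the worker's own traceback, which is what makes failures such as Bug 1 visible in the main thread's log.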

PR #11 left an unsorted import block in db.py (`from datetime import datetime` placed among the plain `import` statements), which fails `ruff check` (I001) and broke CI on main. Move it into the sorted from-import group.

Bump version 1.0.32 -> 1.0.33 in pyproject.toml, server.json (x2), and uv.lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Commits on May 20, 2026
