Comparing changes
base repository: kapillamba4/code-memory
base: v1.0.32
head repository: kapillamba4/code-memory
compare: v1.0.33
- 2 commits
- 5 files changed
- 3 contributors
Commits on May 20, 2026
fix: threading crash, duplicate symbols, logging, and embedding insert (4 bugs) (#11)

* fix: threading crash, duplicate symbols, logging, and embedding insert

Four bugs found while indexing openclaw/openclaw (17,212 source files, 945 doc files) on an RTX 5060 Ti. The repo is a large TypeScript/Swift/Kotlin monorepo (~17k files across 60+ extensions). All bugs surface only at scale and were invisible in small test cases.

---

Bug 1: cross-thread SQLite access crashes ~30% of file parses

_parse_file_for_indexing ran inside ThreadPoolExecutor workers and called db.execute() on the shared main-thread connection. This caused:

    sqlite3.InterfaceError: bad parameter or other API misuse

on roughly 30% of files, even though the connection was opened with check_same_thread=False. Python's sqlite3 binding is not safe for concurrent access without explicit locking.

Fix: pre-fetch all existing file records into a dict[path → mtime] in the main thread before launching the pool. Workers receive the dict and do a dict.get() lookup instead of a DB query. No DB access in any worker thread.

---

Bug 2: duplicate symbols from tree-sitter AST crash DB write

tree-sitter can produce multiple symbols with the same (name, kind, line_start) for a single file. The plain INSERT INTO symbols raised:

    sqlite3.IntegrityError: UNIQUE constraint failed: symbols.file_id, symbols.name, symbols.kind, symbols.line_start

This killed the entire DB write phase after all parsing and GPU embedding had already completed, wasting the entire indexing run.

Fix: INSERT OR IGNORE INTO symbols. Use cursor.rowcount == 1 to detect whether the insert actually happened. cursor.lastrowid is NOT reliable here: after a no-op INSERT OR IGNORE it retains the rowid from the previous successful insert on the same connection, not 0.

---

Bug 3: embedding insert crashes on sqlite-vec virtual table

After the Bug 2 fix, a duplicate symbol falls through to a SELECT that returns the existing symbol_id. That ID already has an entry in symbol_embeddings (a sqlite-vec virtual table). Attempting to insert another embedding for it raised:

    sqlite3.OperationalError: UNIQUE constraint failed on symbol_embeddings primary key

INSERT OR IGNORE does not work on sqlite-vec virtual tables: the conflict-resolution clause is rejected at the SQL level (OperationalError instead of the usual IntegrityError).

Fix: guard embedding_pairs.append() with `if is_new`, so only freshly inserted symbols get embeddings queued. Existing symbols already have one.

---

Bug 4: logger.exception() reports all errors as "NoneType: None"

Exceptions from worker threads are stored as return values:

    return (fpath, None, e)

Then in the main thread:

    logger.exception("Failed to index %s", fpath)

logger.exception() reads sys.exc_info(), the exception currently being handled in the calling thread. Here that is (None, None, None): the exception was raised and caught in a worker thread, and the main thread is not inside an except block when it logs. Every failure was logged as "NoneType: None" with no traceback, making Bug 1 completely invisible.

Fix: logger.error("Failed to index %s", fpath, exc_info=error)

---

Tested against openclaw/openclaw:

  Before: ~30% of files silently skipped; DB write crash on first run
  After: 17,212/17,212 code files indexed, 111,000 symbols, 750 MB DB

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: rename last_file_indexed to last_code_indexed and return ISO string

get_index_stats was returning `last_file_indexed` (a raw float Unix timestamp) in the freshness dict, but api_types.py defines the field as `last_code_indexed: str | None`. This caused a Pydantic validation error in MCP clients that validate tool output against the schema.

Two changes in get_index_stats():
- Rename key from `last_file_indexed` to `last_code_indexed`
- Convert float timestamp to ISO-8601 string via datetime.fromtimestamp().isoformat()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
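The Bug 2 and Bug 3 fixes described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `symbols` schema is guessed from the column names in the error message, and the `id` column, the `insert_symbol` helper, and the sample data are hypothetical.

```python
import sqlite3


def insert_symbol(conn, file_id, name, kind, line_start):
    """Hypothetical helper: insert a symbol and return (symbol_id, is_new)."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO symbols (file_id, name, kind, line_start) "
        "VALUES (?, ?, ?, ?)",
        (file_id, name, kind, line_start),
    )
    # rowcount is 1 only when a row was actually written; lastrowid is not
    # consulted unless the insert happened.
    if cur.rowcount == 1:
        return cur.lastrowid, True
    # Duplicate: look up the row that already exists.
    row = conn.execute(
        "SELECT id FROM symbols "
        "WHERE file_id = ? AND name = ? AND kind = ? AND line_start = ?",
        (file_id, name, kind, line_start),
    ).fetchone()
    return row[0], False


conn = sqlite3.connect(":memory:")
# Assumed schema, reconstructed from the UNIQUE constraint in the error text.
conn.execute(
    """
    CREATE TABLE symbols (
        id INTEGER PRIMARY KEY,
        file_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        kind TEXT NOT NULL,
        line_start INTEGER NOT NULL,
        UNIQUE (file_id, name, kind, line_start)
    )
    """
)

# The second entry duplicates the first, as tree-sitter reportedly can produce.
parsed = [
    (1, "render", "function", 10),
    (1, "render", "function", 10),
    (1, "Widget", "class", 42),
]

results = []
embedding_pairs = []
for sym in parsed:
    symbol_id, is_new = insert_symbol(conn, *sym)
    results.append((symbol_id, is_new))
    if is_new:
        # Only freshly inserted symbols get an embedding queued (Bug 3 guard).
        embedding_pairs.append((symbol_id, sym[1]))
```

With this guard the duplicate never reaches the embedding table, so the sketch does not depend on how a sqlite-vec virtual table handles conflict clauses.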
Commit: a9a4be0
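The Bug 1 and Bug 4 fixes from the first commit (#11) fit together, and a rough sketch may make the pattern easier to follow. Everything named here is an assumption for illustration: the `files` table, the `parse_file` worker, and the sample paths are hypothetical, and the real `_parse_file_for_indexing` is not reproduced.

```python
import io
import logging
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def parse_file(fpath, mtime, known_mtimes):
    """Hypothetical worker: no DB access, errors returned as values."""
    try:
        # Plain dict lookup replaces the per-file DB query.
        if known_mtimes.get(fpath) == mtime:
            return (fpath, "unchanged", None)
        if fpath.endswith(".bad"):
            raise ValueError(f"cannot parse {fpath}")
        return (fpath, "parsed", None)
    except Exception as e:
        return (fpath, None, e)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, mtime REAL)")
conn.execute("INSERT INTO files VALUES ('a.ts', 100.0)")

# Main thread only: pre-fetch path -> mtime before the pool starts.
known_mtimes = {
    path: mtime for path, mtime in conn.execute("SELECT path, mtime FROM files")
}

# Capture log output in memory so the result can be inspected.
log_buffer = io.StringIO()
logger = logging.getLogger("indexer_sketch")
logger.addHandler(logging.StreamHandler(log_buffer))
logger.setLevel(logging.ERROR)
logger.propagate = False

candidates = [("a.ts", 100.0), ("b.ts", 200.0), ("c.bad", 300.0)]
with ThreadPoolExecutor(max_workers=4) as pool:
    outcomes = list(
        pool.map(lambda c: parse_file(c[0], c[1], known_mtimes), candidates)
    )

statuses = {fpath: status for fpath, status, _ in outcomes}
for fpath, _, error in outcomes:
    if error is not None:
        # Pass the stored exception explicitly; the main thread has no
        # active exception for logger.exception() to pick up.
        logger.error("Failed to index %s", fpath, exc_info=error)

log_text = log_buffer.getvalue()
```

Passing the exception instance through `exc_info` keeps the worker's traceback attached to the log record, which is what makes failures like Bug 1 visible.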
Fix import-ordering lint error and bump version to 1.0.33

PR #11 left an unsorted import block in db.py (`from datetime import datetime` placed among the plain `import` statements), which fails `ruff check` (I001) and broke CI on main. Move it into the sorted from-import group.

Bump version 1.0.32 -> 1.0.33 in pyproject.toml, server.json (x2), and uv.lock.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit: 5a8db16
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff v1.0.32...v1.0.33