
Comparing changes

base repository: kapillamba4/code-memory
base: v1.0.32
head repository: kapillamba4/code-memory
compare: v1.0.33
  • 2 commits
  • 5 files changed
  • 3 contributors

Commits on May 20, 2026

  1. fix: threading crash, duplicate symbols, logging, and embedding insert (4 bugs) (#11)
    
    * fix: threading crash, duplicate symbols, logging, and embedding insert
    
    Four bugs found while indexing openclaw/openclaw (17,212 source files,
    945 doc files) on an RTX 5060 Ti. The repo is a large TypeScript/Swift/
    Kotlin monorepo (~17k files across 60+ extensions). All bugs surface only
    at scale and were invisible in small test cases.
    
    ---
    
    Bug 1: cross-thread SQLite access crashes ~30% of file parses
    
    _parse_file_for_indexing ran inside ThreadPoolExecutor workers and called
    db.execute() on the shared main-thread connection. This caused:
    
      sqlite3.InterfaceError: bad parameter or other API misuse
    
    on roughly 30% of files, even though the connection was opened with
    check_same_thread=False. That flag only disables the thread-ownership
    check; it does not serialize access, so a shared connection is not safe
    for concurrent use without explicit locking.
    
    Fix: pre-fetch all existing file records into a dict[path → mtime] in the
    main thread before launching the pool. Workers receive the dict and do a
    dict.get() lookup instead of a DB query. No DB access in any worker thread.
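
A minimal sketch of the pre-fetch pattern described above, not the project's actual code. The `files(path, mtime)` table, the helper names, and the worker count are assumptions made for illustration.

```python
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def load_known_mtimes(db: sqlite3.Connection) -> dict[str, float]:
    """Main thread only: snapshot path -> mtime for every indexed file."""
    # Hypothetical schema: files(path TEXT, mtime REAL).
    return dict(db.execute("SELECT path, mtime FROM files"))


def needs_reindex(path: str, known: dict[str, float]) -> bool:
    """Worker-safe: reads a plain dict and never touches the connection."""
    return known.get(path) != os.path.getmtime(path)


def find_stale(db: sqlite3.Connection, paths: list[str]) -> list[str]:
    # The only DB access happens here, before the pool starts.
    known = load_known_mtimes(db)
    with ThreadPoolExecutor(max_workers=8) as pool:
        flags = list(pool.map(lambda p: needs_reindex(p, known), paths))
    return [p for p, stale in zip(paths, flags) if stale]
```

Because the workers only read the dict and nothing mutates it while the pool runs, no lock is needed around the lookup.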
    
    ---
    
    Bug 2: duplicate symbols from tree-sitter AST crash DB write
    
    tree-sitter can produce multiple symbols with the same (name, kind,
    line_start) for a single file. The plain INSERT INTO symbols raised:
    
      sqlite3.IntegrityError: UNIQUE constraint failed:
        symbols.file_id, symbols.name, symbols.kind, symbols.line_start
    
    This killed the entire DB write phase after all parsing and GPU embedding
    had already completed — wasting the entire indexing run.
    
    Fix: INSERT OR IGNORE INTO symbols. Use cursor.rowcount == 1 to detect
    whether the insert actually happened. cursor.lastrowid is NOT reliable
    here — after a no-op INSERT OR IGNORE it retains the rowid from the
    previous successful insert on the same connection, not 0.
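
The rowcount check can be sketched as follows. The column list is inferred from the constraint named in the error above; the `id` column and the function name `insert_symbol` are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema, reconstructed from the UNIQUE constraint in the error.
SCHEMA = """
CREATE TABLE symbols (
    id INTEGER PRIMARY KEY,
    file_id INTEGER, name TEXT, kind TEXT, line_start INTEGER,
    UNIQUE (file_id, name, kind, line_start)
)
"""


def insert_symbol(db, file_id, name, kind, line_start):
    """Return (symbol_id, is_new) for one symbol row."""
    key = (file_id, name, kind, line_start)
    cur = db.execute(
        "INSERT OR IGNORE INTO symbols (file_id, name, kind, line_start) "
        "VALUES (?, ?, ?, ?)",
        key,
    )
    if cur.rowcount == 1:
        # A row was actually written, so lastrowid is trustworthy here.
        return cur.lastrowid, True
    # The insert was ignored: look up the existing row instead of
    # reading lastrowid, which may still hold an earlier insert's rowid.
    row = db.execute(
        "SELECT id FROM symbols "
        "WHERE file_id = ? AND name = ? AND kind = ? AND line_start = ?",
        key,
    ).fetchone()
    return row[0], False
```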
    
    ---
    
    Bug 3: embedding insert crashes on sqlite-vec virtual table
    
    After the Bug 2 fix, a duplicate symbol falls through to a SELECT that
    returns the existing symbol_id. That ID already has an entry in
    symbol_embeddings (a sqlite-vec virtual table). Attempting to insert
    another embedding for it raised:
    
      sqlite3.OperationalError: UNIQUE constraint failed on
        symbol_embeddings primary key
    
    INSERT OR IGNORE does not work on sqlite-vec virtual tables: the
    conflict-resolution clause is not honored, and the conflict surfaces as
    an OperationalError instead of the usual IntegrityError.
    
    Fix: guard embedding_pairs.append() with `if is_new` — only freshly
    inserted symbols get embeddings queued. Existing symbols already have one.
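
A small sketch of the `if is_new` guard, under the assumption that each symbol insert yields a `(symbol_id, is_new)` pair and that vectors arrive in the same order. The function name `queue_embeddings` is hypothetical.

```python
def queue_embeddings(results, vectors):
    """Pair each freshly inserted symbol with its embedding vector.

    results: list of (symbol_id, is_new) tuples, one per parsed symbol.
    vectors: embedding vectors in the same order as results.
    """
    embedding_pairs = []
    for (symbol_id, is_new), vector in zip(results, vectors):
        if is_new:
            # Only new symbols need a row in the embeddings table;
            # duplicates resolve to an ID that already has one.
            embedding_pairs.append((symbol_id, vector))
    return embedding_pairs
```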
    
    ---
    
    Bug 4: logger.exception() reports all errors as "NoneType: None"
    
    Exceptions from worker threads are stored as return values:
      return (fpath, None, e)
    
    Then in the main thread:
      logger.exception("Failed to index %s", fpath)
    
    logger.exception() reads sys.exc_info(), the exception currently being
    handled in the calling thread. In the main thread that is
    (None, None, None): the exception was raised and caught in a worker, and
    the call is made outside any except block. Every failure was logged as
    "NoneType: None" with no traceback, making Bug 1 completely invisible.
    
    Fix: logger.error("Failed to index %s", fpath, exc_info=error)
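
The difference can be reproduced in isolation. In this sketch `parse` and `log_failure` are hypothetical stand-ins for the worker function and the main-thread logging site.

```python
import io
import logging


def parse(fpath):
    """Worker-style function: the exception is returned, not raised."""
    try:
        raise ValueError(f"cannot parse {fpath}")
    except ValueError as e:
        return (fpath, None, e)


def log_failure(logger, fpath, error):
    # This runs outside any except block, so sys.exc_info() is empty.
    # Passing the exception instance lets logging format its traceback.
    logger.error("Failed to index %s", fpath, exc_info=error)
```

The `exc_info` argument accepts an exception instance as well as `True` or an exc_info tuple, and the traceback is taken from the instance's `__traceback__` attribute.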
    
    ---
    
    Tested against openclaw/openclaw:
      Before: ~30% of files silently skipped; DB write crash on first run
      After:  17,212/17,212 code files indexed, 111,000 symbols, 750 MB DB
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    
    * fix: rename last_file_indexed to last_code_indexed and return ISO string
    
    get_index_stats was returning `last_file_indexed` (a raw float Unix
    timestamp) in the freshness dict, but api_types.py defines the field
    as `last_code_indexed: str | None`. This caused a Pydantic validation
    error in MCP clients that validate tool output against the schema.
    
    Two changes in get_index_stats():
    - Rename key from `last_file_indexed` to `last_code_indexed`
    - Convert float timestamp to ISO-8601 string via datetime.fromtimestamp().isoformat()
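
The conversion might look like the sketch below; `to_iso` is a hypothetical helper, not the code in get_index_stats(). Note that datetime.fromtimestamp() without a tz argument returns naive local time.

```python
from datetime import datetime
from typing import Optional


def to_iso(ts: Optional[float]) -> Optional[str]:
    """Convert a Unix timestamp to an ISO-8601 string, passing None through."""
    if ts is None:
        return None
    return datetime.fromtimestamp(ts).isoformat()
```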
    
    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
    
    ---------
    
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    jimdawdy-hub and claude authored May 20, 2026
    a9a4be0
  2. Fix import-ordering lint error and bump version to 1.0.33

    PR #11 left an unsorted import block in db.py (`from datetime import
    datetime` placed among the plain `import` statements), which fails
    `ruff check` (I001) and broke CI on main. Move it into the sorted
    from-import group.
    
    Bump version 1.0.32 -> 1.0.33 in pyproject.toml, server.json (x2),
    and uv.lock.
    
    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    kapillamba4 and claude committed May 20, 2026
    5a8db16