perf: snapshot fast-load eagerly builds the symbol index — 33% of load time and ~43MB heap that plain search never uses #564

@justrach

Problem

loadSnapshotFast Pass C calls rebuildSymbolIndexFor for every restored file. Measured on openclaw (13,654 files, ReleaseFast, warm cache) with the new CODEDB_LOAD_PROFILE sub-phase attribution:

  • symidx = 18–20ms of a ~60ms load (the single largest insert cost)
  • Pass C heap growth +62.5MB; one-shot codedb search = 244MB max RSS / 132.7MB physical footprint

Plain content search never reads symbol_index — every one-shot CLI search pays the inserts and heap for nothing. The codebase already has the lazy pattern for exactly this (word index: word_index_complete + rebuild-on-first-use, #539).

Failing Test

test "issue-564: snapshot fast-load defers the symbol index until first symbol use" (src/test_snapshot.zig): after loadSnapshot, symbol_index.count() must be 0, and the first findAllSymbols must build it on demand and answer correctly. Fails on release/0.2.5825 (eager build → count > 0).

Fix

Defer the build: call markSymbolIndexIncomplete() before Pass C, so rebuildSymbolIndexFor no-ops while the index is incomplete. ensureSymbolIndex() (mirroring the lazy word-index rebuild) runs at every reader entry: findSymbol, findAllSymbols, searchSymbols, renderSymbols, resolveCallees, findCallPath, buildCallCentrality. ensureCallGraph gets an incomplete-guard so an empty graph is never built and cached.
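
The deferral described above can be sketched as a small self-contained model. This is only an illustration, not the actual codedb code: the `Store` struct, the `symbol_index_complete` flag (named by analogy with `word_index_complete`), and the one-symbol-per-file simplification are all assumptions; only the function names come from the issue text.

```zig
const std = @import("std");

// Hypothetical stand-in for the real store. Each "file" is modelled as a
// single symbol name, which is a simplification made for this sketch.
const Store = struct {
    files: []const []const u8,
    symbol_index: std.StringHashMap(u32),
    symbol_index_complete: bool = true, // assumed flag name

    fn init(allocator: std.mem.Allocator, files: []const []const u8) Store {
        return .{
            .files = files,
            .symbol_index = std.StringHashMap(u32).init(allocator),
        };
    }

    fn deinit(self: *Store) void {
        self.symbol_index.deinit();
    }

    // Called before the restore pass so per-file inserts are skipped.
    fn markSymbolIndexIncomplete(self: *Store) void {
        self.symbol_index_complete = false;
    }

    // No-op while the index is marked incomplete (the deferred state).
    fn rebuildSymbolIndexFor(self: *Store, file_id: u32) !void {
        if (!self.symbol_index_complete) return;
        try self.symbol_index.put(self.files[file_id], file_id);
    }

    // Builds the whole index on first use; later calls return immediately.
    fn ensureSymbolIndex(self: *Store) !void {
        if (self.symbol_index_complete) return;
        self.symbol_index_complete = true; // re-enable inserts for the replay
        errdefer {
            // On failure, drop the partial index and stay lazy.
            self.symbol_index.clearRetainingCapacity();
            self.symbol_index_complete = false;
        }
        for (self.files, 0..) |_, i| try self.rebuildSymbolIndexFor(@intCast(i));
    }

    // Reader entry point: every reader calls ensureSymbolIndex() first.
    fn findSymbol(self: *Store, name: []const u8) !?u32 {
        try self.ensureSymbolIndex();
        return self.symbol_index.get(name);
    }
};

test "sketch: index stays empty after load and is built on first lookup" {
    const names = [_][]const u8{ "GatewayClient", "Router" };
    var store = Store.init(std.testing.allocator, &names);
    defer store.deinit();

    store.markSymbolIndexIncomplete();
    // Models the restore pass: each per-file rebuild is skipped.
    for (names, 0..) |_, i| try store.rebuildSymbolIndexFor(@intCast(i));
    try std.testing.expectEqual(@as(u32, 0), store.symbol_index.count());

    // First reader call builds the index on demand.
    try std.testing.expectEqual(@as(?u32, 0), try store.findSymbol("GatewayClient"));
    try std.testing.expectEqual(@as(u32, 2), store.symbol_index.count());
}
```

In this model a caller that never touches a reader entry point never pays for the inserts, which is the property the failing test checks. The flag is set before the replay so that rebuildSymbolIndexFor does real work again, and the errdefer restores the lazy state if an insert fails partway.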

Measured after (same machine/protocol)

  • load ~60ms → ~40ms; symidx 18ms → 0.2ms
  • Pass C heap +62.5MB → +20.5MB
  • one-shot search: 244MB → 200MB max RSS; footprint 132.7MB → 89.2MB (−33%)
  • codedb symbol GatewayClient correct on first (lazy) use; 537b call-edge test green

Labels: enhancement (New feature or request)
