Snapshot load performance (~3×) + parallel freshness + #518 fixes #524

Merged
justrach merged 14 commits into release/0.2.5824 from feat/code-graph
Jun 2, 2026
@justrach justrach commented Jun 2, 2026

Summary

feat/code-graph → release/0.2.5824, 13 commits. The most recent three are from this session; the rest is the snapshot load-performance arc and the code-graph snapshot persistence.

Snapshot load performance (~380ms → ~125ms, ~3× on a 39k-file repo; ~338MB lower RSS)

The full load-path arc, measured end-to-end on an openclaw-sized index:

  • Borrowed restored outline strings + pre-sized load maps (~34%) — d5a8ad1
  • statFile freshness instead of open+stat+close (~36%) — 1a3e2c5
  • Single mmap of the content section instead of ~4 preads/file (~23%) — 5143cb2
  • Borrowed content from the mmap'd section (~8%) — 0c7c523
  • Zero-copy ContentCache over the retained mmap (~17%, ~237MB lower RSS) — 9c00f8f
  • Stored content hashes, skipping re-hash at load (~14%, ~100MB lower RSS) — df941eb
  • Parallel freshness scan (this session) — fan the per-file statFile check across workers. statFile is kernel/VFS-bound, not CPU-bound: a worker sweep on a 16k-file tree measured a U-curve bottoming at ~4 workers regardless of core count, so the count is capped at FRESHNESS_MAX_WORKERS=4. ~2.3× freshness, ~43% faster 16k load. Workers are pure read-only stat; the insert pass reads changed content sequentially so peak memory stays at one file. — 109775c, 42e67f6

Code-graph snapshot persistence

  • Persist call-graph centrality in the snapshot, skipping the ~960ms first-query rebuild — 8e6a950
  • Graph-resolved callees in codedb_context (edge-aware) — b5389f6

#518 correctness fixes (this session) — 210966e

  • Non-ASCII identifiers captured in outlines — extractIdent/extractRubyMethodName accepted only ASCII, so def 한(): was dropped from codedb_outline/codedb_symbol.
  • Python class labeled .class_def (was .struct_def; every other language already used .class_def).
  • codedb_find subsequence floor — query and path must share an in-order LCS of ≥60% of the query's chars, so find no longer returns confident hits for queries that match no filename.

Testing

zig build test: 699/699 tests passing (23/23 build steps).

Related issues

🤖 Generated with Claude Code

justrach and others added 13 commits June 2, 2026 02:23
…faster load)

The snapshot restore path duped every import / symbol name / detail out of the
outline_state section into its own allocation (~170k for codedb itself, millions
for a dense monorepo), then freed each at deinit. Replace those per-string copies
with slices into the retained section buffer:

  - readSectionStringBorrowed: returns a slice aliasing the section, no alloc/copy
  - FileOutline.borrows_strings: deinit skips freeing borrowed import/symbol strings
  - Explorer.outline_section_bufs: adopts the section buffer, frees it once at deinit
    (after outlines). Allocated from the Explorer's allocator, not the per-load one,
    so arena-backed Explorers reclaim it without an explicit deinit.
  - path stays individually owned (it is the map key) so every key-ownership path
    is unchanged — that is what keeps the change low-risk.

Also pre-size the load's hashmaps (outline_states, outlines, dep_graph.forward/
reverse) to expected_file_count, avoiding the 0->N rehash storms.

Measured A/B on openclaw/openclaw (~39k files, two binaries loading the same
snapshot, interleaved warm loads): 314.9/324.5ms median -> 208.0/213.5ms,
~34% faster, reproducible with zero overlap between arms. Plus far fewer live
allocations (lower steady-state RSS, faster deinit).

Adds a round-trip + clean-deinit regression test. 674/674 tests pass, no leaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
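
The borrowed-string restore described in the commit above can be illustrated with a rough sketch. It is written in Python rather than the project's Zig, and the u16 length prefix and the name `read_string_borrowed` are assumptions made for the example, not the snapshot's real layout or API. The point is only that each string is a slice aliasing one retained buffer, so no per-string copy is made.

```python
import struct

def read_string_borrowed(section: memoryview, offset: int):
    """Return (slice, next_offset). The slice aliases `section`; nothing is copied."""
    (length,) = struct.unpack_from("<H", section, offset)  # assumed u16 length prefix
    start = offset + 2
    return section[start:start + length], start + length

# One retained buffer holding two length-prefixed names (hypothetical data).
buf = bytearray()
for name in (b"std", b"Explorer"):
    buf += struct.pack("<H", len(name)) + name
section = memoryview(buf)

first, off = read_string_borrowed(section, 0)
second, _ = read_string_borrowed(section, off)
```

Both slices stay valid only while `buf` is alive, which mirrors why the Explorer adopts the section buffer and frees it once, after the outlines.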
…rebuild)

call_centrality was built lazily on the first ranked search (codedb_context /
codedb_search), costing ~960ms on a 39k-file repo before the first query could
return. Persist it in the snapshot so a load restores it instead:

  - SectionId.call_centrality (=8): a new section of path->f32 pairs. Absent or
    empty => loader falls back to the lazy build, so older snapshots still load.
  - writeSnapshot serializes explorer.call_centrality if built (read-only, under
    the shared lock it already holds).
  - The index/scan path calls Explorer.buildCallCentrality before persisting (a
    public, lock-acquiring wrapper around ensureCallCentrality), gated by the
    same CODEDB_NO_CENTRALITY env var as the search path, so the persisted
    snapshot actually contains it.
  - loadSnapshotFast.restoreCallCentrality rebuilds the map after the outlines
    are restored, keying each entry off the stable outlines key (getEntry) — same
    borrowed-key lifetime as ensureCallCentrality, so deinit stays unchanged.
    Files no longer present are skipped.

Measured on openclaw/openclaw (~39k files): centrality build ~960ms (cold index
4612ms -> 5572ms, non-overlapping), now paid once at index; restore adds ~3.2ms
to load (385.9ms -> 389.1ms). Net: +3ms per load to remove ~960ms from the first
ranked query.

Adds a round-trip + exact-value persistence test. 675/675 tests pass, no leaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
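
A minimal sketch of the persist-and-restore idea from the centrality commit above, in Python for brevity. The byte layout (u32 count, then u16 path length, path bytes, f32 score) is an assumption for illustration; the commit only states that the section holds path->f32 pairs. The fallback and the skip of missing files follow the commit text.

```python
import struct

def encode_centrality(scores: dict) -> bytes:
    """Serialize path -> score pairs (assumed layout, see lead-in)."""
    out = bytearray(struct.pack("<I", len(scores)))
    for path, score in scores.items():
        raw = path.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw + struct.pack("<f", score)
    return bytes(out)

def restore_centrality(section, known_paths):
    """Return the restored map, or None when the section is absent or empty
    so the caller can fall back to the lazy build."""
    if not section:
        return None
    (count,) = struct.unpack_from("<I", section, 0)
    off, restored = 4, {}
    for _ in range(count):
        (plen,) = struct.unpack_from("<H", section, off)
        off += 2
        path = section[off:off + plen].decode("utf-8")
        off += plen
        (score,) = struct.unpack_from("<f", section, off)
        off += 4
        if path in known_paths:  # files no longer present are skipped
            restored[path] = score
    return restored
```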
codedb_context already surfaced a symbol's callers (text + scope). Add the
dependency side: a "Calls" section listing each key symbol's resolved callees,
walked through the same call-graph extraction used for centrality.

Explorer.resolveCallees(path, line_start, line_end, ...): slices the function
body, runs codegraph.extractCallees, and resolves each callee name to a
definition. Because resolution is name-based (no type info), it is deliberately
HIGH-PRECISION / lower-recall — a callee is shown only when it resolves to
exactly one non-test function/method, and ubiquitous std/container/builtin
method names (init, get, append, lock, next, ...) are filtered out, since a
`name(` call site almost never means the rare user free function of that name.
Ambiguous or method-on-receiver calls are simply omitted rather than guessed
(asserting a false edge in an LLM-facing context block is worse than silence).

handleContext renders it for the ≤3 inlined symbols, paired with the Callers
section, so an agent sees both who calls a symbol and what it calls without a
follow-up codedb_outline/read.

Verified end-to-end via the MCP server on codedb itself, e.g. searchContentRanked
-> {lockShared, unlockShared, posixGetenv, ensureCallCentrality, normalizeChar,
splitIdentifier} — all correct, no false edges. Adds a resolveCallees unit test.
676/676 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
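
The high-precision resolution rule from the callee commit above, sketched in Python. The data shapes (`defs_by_name` as a name-to-definitions mapping, dict records with `kind`, `is_test`, `path`) are hypothetical, and the ubiquitous-name set is only the subset the commit message lists.

```python
UBIQUITOUS = {"init", "get", "append", "lock", "next"}  # subset named in the commit text

def resolve_callees(callee_names, defs_by_name, self_name):
    """Keep a callee only if it resolves to exactly one non-test function/method."""
    resolved = []
    for name in callee_names:
        if name == self_name or name in UBIQUITOUS:
            continue
        candidates = [
            d for d in defs_by_name.get(name, ())
            if d["kind"] in ("function", "method") and not d["is_test"]
        ]
        if len(candidates) == 1:  # ambiguous or unresolved names are omitted, not guessed
            resolved.append((name, candidates[0]["path"]))
    return resolved
```

Dropping every ambiguous name trades recall for precision, which matches the stated goal of never asserting a false edge.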
… (~36% faster load)

loadSnapshotFast (and loadSnapshotValidated) checked each restored file's mtime
against the snapshot via openFile + stat + close — 3 syscalls per file, and the
open()/close() pair is the expensive part (fd allocation, etc.). Only the mtime
is needed, so use a single statFile (no handle). The file is still opened
(readFileAlloc) only when it's actually newer than the snapshot — the rare case.

Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm
loads): 354.5ms -> 227.4ms median, ~36% faster, fully reproducible with zero
overlap between arms. statFile is also markedly more resilient to machine load
(far fewer syscalls), so the win holds up under contention.

Semantically identical: statFile follows symlinks like openFile+stat did, and
returns the same Stat.mtime. No test changes needed; 676/676 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
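
For readers less familiar with the syscall difference in the freshness commit above, here is a Python analogue (not the project's Zig code). `os.stat` plays the role of `statFile`, and the open/fstat/close triple plays the role of the old path; the helper names are made up for the example.

```python
import os

def mtime_via_open(path):
    fd = os.open(path, os.O_RDONLY)      # syscall 1: allocates a file descriptor
    try:
        return os.fstat(fd).st_mtime_ns  # syscall 2
    finally:
        os.close(fd)                     # syscall 3

def mtime_via_stat(path):
    return os.stat(path).st_mtime_ns     # one syscall, follows symlinks by default

def is_stale(path, snapshot_mtime_ns):
    """True when the file on disk is newer than the snapshot's recorded mtime."""
    return mtime_via_stat(path) > snapshot_mtime_ns
```

Both helpers return the same mtime, so a freshness check built on the single-call version behaves the same while skipping the descriptor churn.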
…file (~23% faster load)

loadSnapshotFast parsed each restored file's record (path_len, path, content_len,
content) with a separate readPositionalAll — ~4 syscalls per file, ~156k on a
39k-file repo. Read the whole content section once and parse records from memory
instead: prefer a file-backed mmap (demand-paged, reclaimable — no heap spike),
fall back to a single heap bulk-read if mmap is unavailable. path/content stay
owned allocations (memcpy'd out of the section), so the insert/free logic and
the mmap lifetime (munmap after the loop) are unchanged and simple.

Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm
loads): 229.2ms -> 177.3ms median, ~23% faster, zero overlap between arms. Stacks
with the statFile freshness fix (this session): the openclaw load went 354 -> 229
-> 177ms across the two syscall optimizations.

676/676 tests pass (snapshot round-trip tests exercise this path under the
DebugAllocator — no leaks, no double-free).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
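
The single-mapping read from the commit above might look roughly like this in Python. The record field widths (u32 little-endian lengths) are an assumption; the commit names the fields (path_len, path, content_len, content) but not their sizes. As in that commit, path and content are still copied out of the section here.

```python
import mmap
import struct

def write_records(path, files):
    """Write (name, content) pairs as length-prefixed records (assumed layout)."""
    with open(path, "wb") as f:
        for name, content in files:
            raw = name.encode("utf-8")
            f.write(struct.pack("<I", len(raw)) + raw)
            f.write(struct.pack("<I", len(content)) + content)

def parse_records(buf):
    """Parse every record from one in-memory view; no per-record read syscalls."""
    off, records = 0, []
    while off < len(buf):
        (plen,) = struct.unpack_from("<I", buf, off)
        off += 4
        path = bytes(buf[off:off + plen]).decode("utf-8")
        off += plen
        (clen,) = struct.unpack_from("<I", buf, off)
        off += 4
        records.append((path, bytes(buf[off:off + clen])))  # owned copies
        off += clen
    return records

def load_section(path):
    with open(path, "rb") as f:
        try:
            # Preferred: file-backed, demand-paged mapping of the whole section.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                return parse_records(m)
        except (ValueError, OSError):
            return parse_records(f.read())  # fallback: one bulk heap read
```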
…nt copy (~8% faster load)

With the content section now mmap'd, the per-file `content` was still memcpy'd
into a throwaway allocation and then duped again by contents.put — two copies.
Pass the mmap slice directly: contents.put / indexFile* still dupe their owned
copy into the cache, but the intermediate alloc + memcpy + free per file is gone
(a full pass over all content + ~39k alloc/free pairs). The slice only ever
borrows the section, which is mapped for the whole loop and unmapped after, and
no consumer retains it, so lifetime is unchanged.

Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm
loads): 169.8ms -> 155.9ms median, ~8% faster, zero overlap. Across this session
the openclaw load went 354 -> 229 (statFile) -> 177 (mmap read) -> 156ms.

676/676 tests pass; the snapshot round-trip tests exercise this under the
DebugAllocator (no leak, no double-free, no use-after-free).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t mmap (~17% faster load, ~237MB lower RSS)

On load, every restored file's content was duped into the ContentCache (an owned
heap copy — ~268MB on a 39k-file repo) even though the snapshot is already mmap'd.
Let the cache borrow those bytes directly:

  - ContentCache gains a per-slot `value_owned` flag. `putBorrowed` stores a value
    that aliases external memory (value_owned=false); it is never freed by the
    cache (evict/remove/clear/deinit skip it). Keys stay cache-owned. put-update
    frees the old value only if it was owned, so owned<->borrowed transitions are
    safe (a later re-index of a file replaces its borrowed entry with an owned one).
  - The Explorer adopts the content-section mmap (content_section_maps) and
    munmaps it at deinit — after contents.deinit, which leaves borrowed values
    untouched. The mmap stays valid after the fd is closed (POSIX) and across a
    tmp+rename snapshot rewrite (the old inode stays mapped).
  - loadSnapshotFast uses putBorrowed when the content came from the adopted mmap;
    the heap-bulk-read fallback still dupes (owned). insertRestoredFile takes a
    borrow flag.

Net: the per-file content copy disappears entirely from the load (only the path
keys are still duped), and the cache no longer holds ~268MB of owned content —
the bytes live in the file-backed, reclaimable mmap.

Measured on openclaw/openclaw (~39k files, same snapshot, interleaved):
  load   166.2ms -> 138.3ms  (16.8% faster)
  peak RSS  ~795MB -> ~558MB  (~237MB / 30% lower)

Adds ContentCache borrow tests (zero-copy aliasing, no-free-on-borrowed, mixed
owned/borrowed transitions + eviction). 694/694 tests pass, no leaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
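
The owned/borrowed bookkeeping from the ContentCache commit above can be modelled with a toy Python class. Python is garbage-collected, so "freeing" is only simulated by appending to a `freed` list; the class and method names mirror the commit text but the implementation is an illustration, not the project's cache.

```python
class ContentCache:
    """Toy cache: each slot is (value, value_owned). `freed` records simulated frees."""

    def __init__(self):
        self.slots = {}
        self.freed = []

    def _release(self, key):
        _value, owned = self.slots[key]
        if owned:  # borrowed values are never freed by the cache
            self.freed.append(key)

    def put(self, key, value):
        if key in self.slots:
            self._release(key)  # frees the old value only if it was owned
        self.slots[key] = (bytes(value), True)  # cache-owned copy

    def put_borrowed(self, key, view):
        if key in self.slots:
            self._release(key)
        self.slots[key] = (view, False)  # aliases external memory

    def remove(self, key):
        self._release(key)
        del self.slots[key]
```

A borrowed entry that is later replaced by `put` becomes owned, which is the re-index transition the commit calls out as safe.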
…ad (~14% faster load, ~100MB lower RSS)

A load profiler (new, gated by CODEDB_LOAD_PROFILE — near-zero cost off) showed
the remaining load cost on openclaw (~136ms) split outline ~28ms / freshness
~35ms / per-file insert ~76ms. ~20ms of the insert work was
store.recordSnapshot's Wyhash over every file's full content — which also faults
in the entire mmap'd content section, partly undoing the zero-copy RSS win.

Compute those hashes once at write time (when content is already being read) and
store them in a new CONTENT_HASHES section (u64 per content record, in content
order). The loader reads the precomputed hash for the restored/outline-only
branches instead of re-hashing; the changed-file branch still hashes fresh disk
content. Byte-identical to what it recorded before — just sourced from the
snapshot, so no content pages are faulted at load. Absent section (older
snapshots) => recompute, fully backward compatible.

Measured A/B on openclaw/openclaw (~39k files, same content_hashes-bearing
snapshot; old binary ignores the section and recomputes):
  load     145.5ms -> 125.3ms  (~14% faster)
  peak RSS  ~558MB -> ~457MB   (~100MB lower — content stays demand-paged)

Adds an order-alignment test (recorded Store hash == Wyhash(content) per file).
695/695 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
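
A small Python sketch of the stored-hash idea from the commit above. The project hashes with Wyhash; since that is not in the Python standard library, a 64-bit BLAKE2b digest stands in here, and the function names are hypothetical. What the sketch keeps from the commit is the shape: one u64 per content record, in content order, with a recompute fallback when the section is absent.

```python
import hashlib
import struct

def content_hash(content: bytes) -> int:
    # Stand-in 64-bit hash (the project uses Wyhash, not BLAKE2b).
    return int.from_bytes(hashlib.blake2b(content, digest_size=8).digest(), "little")

def write_hash_section(contents):
    """Write time: hash each record once, in content order."""
    return b"".join(struct.pack("<Q", content_hash(c)) for c in contents)

def load_hashes(hash_section, contents):
    """Load time: read stored hashes; recompute only for older snapshots."""
    if hash_section is None:
        return [content_hash(c) for c in contents]
    return [h for (h,) in struct.iter_unpack("<Q", hash_section)]
```

Reading the stored values never touches `contents`, which is the property that keeps the mapped content pages from being faulted in at load.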
Graph-aware ranking (call-graph centrality), persisted centrality in the
snapshot (skips the first-query rebuild), and edge-aware codedb_context
(graph-resolved callees).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documents the load-path speedups + RSS reduction (statFile freshness, mmap
content read, borrow content, zero-copy ContentCache, stored content hashes,
gated CODEDB_LOAD_PROFILE): openclaw load ~380ms -> ~125ms (~3x), peak RSS
~795MB -> ~457MB (~338MB lower).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… faster 16k load)

Restructure loadSnapshotFast into three passes: parse the CONTENT section into
borrowed-slice records, fan the per-file statFile freshness check across workers,
then run the existing sequential 3-way insert (changed/restored/outline-only). The
parse pass also drops the old per-record path_buf alloc+memcpy+free (~1 pair/file)
— indexFile*/recordSnapshot dupe the path and the restored branch reuses the
outline_states key, so paths can borrow the mmap'd content section directly. All
Explorer/Store mutation stays single-threaded.

statFile is kernel/VFS-throughput-bound, not CPU-bound: a worker sweep on a
16k-file tree (M3 Ultra, 20 P-cores) measured freshness 14.1ms@1 -> 5.7ms@4 ->
9.5ms@8 -> 13.3ms@12 — a U-curve bottoming at ~4 regardless of core count. So the
worker count is capped at FRESHNESS_MAX_WORKERS=4 (CODEDB_LOAD_WORKERS overrides),
gated above FRESHNESS_PARALLEL_THRESHOLD=256 files, with single-thread and
spawn-failure fallbacks. Final auto vs sequential on 16k files: freshness
13.2->5.7ms (2.3x), whole load 17.8->10.2ms (~43%).

Test: "snapshot: parallel freshness load re-indexes changed files, restores the
rest" (288-file fixture forces the multi-worker path). zig build test 696/696.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… Python class kind, find score floor)

Three correctness findings from the #518 audit:

- extractIdent / extractRubyMethodName accepted only ASCII identifier chars, so
  non-ASCII identifiers (e.g. Korean "def 한():") were dropped entirely —
  codedb_outline / codedb_symbol returned nothing for them. Now accept any
  non-ASCII UTF-8 byte (>= 0x80), capturing identifiers in non-Latin scripts.

- Python `class` was labeled .struct_def while every other language uses
  .class_def; Python now matches (Ruby `class` stays .struct_def, unchanged).

- codedb_find returned confident hits for queries that match no filename: the
  local Smith-Waterman alignment can clear the length-scaled score floor on a few
  incidental matches. Add a subsequence floor — query and path must share an
  in-order LCS of at least 60% of the query's chars — computed only for candidates
  already past the score floor, so non-matching files pay nothing.

Tests: issue-518 non-ASCII identifier, Python class_def, and fuzzy subsequence
floor (with regression guards for typo-tolerant matches like mpc/mian/authmid).
zig build test 699/699.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
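
Two of the #518 fixes above lend themselves to a short sketch, given here in Python rather than Zig. `extract_ident` shows the widened byte test (ASCII identifier characters plus any byte >= 0x80), and `passes_subsequence_floor` shows the 60% in-order LCS rule. The function names and the example paths used with them are hypothetical.

```python
def is_ident_byte(b: int) -> bool:
    # Any non-ASCII UTF-8 byte, underscore, or an ASCII letter/digit.
    return b >= 0x80 or b == 0x5F or chr(b).isalnum()

def extract_ident(src: bytes) -> bytes:
    """Return the leading identifier bytes of `src`."""
    end = 0
    while end < len(src) and is_ident_byte(src[end]):
        end += 1
    return src[:end]

def lcs_len(a: str, b: str) -> int:
    """Length of the longest common (in-order) subsequence, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == other else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def passes_subsequence_floor(query: str, path: str) -> bool:
    # Integer form of: lcs / len(query) >= 0.60
    return lcs_len(query, path) * 100 >= 60 * len(query)
```

With these definitions a transposed query such as "mian" still clears the floor against a path containing "main" (3 of 4 characters in order), while a query sharing only one character with the path does not.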
…sert pass

Refines the parallel freshness scan (109775c). The workers now only statFile and
flag stale files — no content read, no allocation — so they are pure read-only and
trivially thread-safe. The insert pass (Pass C) reads each stale file's fresh
content sequentially with the load allocator, exactly as the original pre-parallel
code did, so peak memory is one file's content instead of holding every changed
file's content from the parallel pass — and the page-allocator hop is gone.

The parallelized stat (the measured win) is unchanged, and the 0-changed common
case is identical in cost. Sanity A/B on codedb's own 664-file index: freshness
0.70ms@1 -> 0.30ms@auto. zig build test 699/699.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
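
The refined design above (stat-only workers, then a sequential read of stale files) can be sketched in Python with a thread pool. The constants and the `CODEDB_LOAD_WORKERS` override come from the earlier parallel-freshness commit; the function names, the `(path, snapshot_mtime_ns)` entry shape, and treating an unreadable file as stale are assumptions for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor

FRESHNESS_MAX_WORKERS = 4
FRESHNESS_PARALLEL_THRESHOLD = 256

def _is_stale(entry):
    path, snapshot_mtime_ns = entry
    try:
        return os.stat(path).st_mtime_ns > snapshot_mtime_ns
    except OSError:
        return True  # assumption: missing or unreadable files count as stale

def find_stale(entries, workers=None):
    """Freshness pass: workers only stat and flag; no content is read here."""
    if workers is None:
        workers = int(os.environ.get("CODEDB_LOAD_WORKERS", FRESHNESS_MAX_WORKERS))
    if len(entries) <= FRESHNESS_PARALLEL_THRESHOLD or workers <= 1:
        return [_is_stale(e) for e in entries]  # single-thread fallback
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_is_stale, entries))  # results keep input order

def read_changed(entries, stale_flags):
    """Insert pass: read stale files one at a time, so one file's content is live."""
    for (path, _mtime), stale in zip(entries, stale_flags):
        if stale:
            with open(path, "rb") as f:
                yield path, f.read()
```

Because the workers return only booleans, the parallel and sequential paths produce identical flags, and all mutation can stay on the calling thread.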

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42e67f6ffe

Comment thread install/install.sh
# Auto-install mcpsync if missing (same trust domain as codedb). Never fail the
# codedb install if this is unavailable.
if [ -z "$mcpsync_bin" ]; then
curl -fsSL https://mcpsync.codegraff.com | bash >/dev/null 2>&1 || true

P1: Don't pipe the mcpsync installer into bash

When mcpsync is absent, the installer now executes whatever https://mcpsync.codegraff.com returns with bash and suppresses all output, without any version pin, checksum/signature verification, or user consent. The repo's AGENTS.md explicitly calls out installer scripts that execute untrusted code or skip verification; in this scenario a compromised DNS/TLS/CDN endpoint turns a normal codedb install into arbitrary code execution. Prefer a pinned artifact with checksum/signature verification, or skip registration and print manual instructions.
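
One way to picture the checksum step the review asks for, sketched in Python (the installer itself is a shell script). The helper name is made up, and where the pinned digest would come from (for example a value committed next to the installer) is an assumption, not something this PR defines.

```python
import hashlib
import hmac

def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True only if `data` hashes to the pinned SHA-256 digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())
```

An installer would download the artifact to memory or a temp file, call a check like this against the pinned digest, and run the artifact only on success; on failure it would skip registration and print manual instructions.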

Comment thread src/mcp.zig
_ = cache.diag.appendIfFresh(alloc, out, path, result.new_hash);
const lang = explore_mod.detectLanguage(path);
if (cache.linter.shouldTry(lang) and cache.diag.tryBeginWork(path, result.new_hash)) {
spawnLintWorker(cache, path, result.new_hash, lang);

P2: Run linter jobs from the project root

In the MCP startup modes covered by the repo guidance/e2e tests, the server can be spawned from / and learn the actual project root through the roots handshake; this call passes the repo-relative path (for example src/foo.py) to the detached linter, and runCapture is invoked without a cwd, so ruff/biome/etc. look under the process cwd rather than the project root. With linters enabled, the first edit in that context reports the file as missing and marks the language unavailable for the whole session, disabling the new diagnostics feature until reconnect. Pass an absolute path or set the worker cwd to cache.default_path.
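
The suggested fix can be illustrated in Python with `subprocess` (the real worker is Zig; `run_linter` and its parameters are hypothetical). The sketch applies both remedies from the comment: it resolves the repo-relative path against the project root and also sets the child's working directory to that root.

```python
import os
import subprocess

def run_linter(argv_prefix, rel_path, project_root):
    """Run a linter command on a repo-relative path, independent of the process cwd."""
    abs_path = os.path.join(project_root, rel_path)
    return subprocess.run(
        argv_prefix + [abs_path],
        cwd=project_root,        # tools that look for config files start here
        capture_output=True,
        text=True,
    )
```

With this shape, a server spawned from / still hands the linter a path that exists, so the first edit does not mark the language as unavailable.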

@justrach justrach changed the base branch from main to release/0.2.5824 June 2, 2026 15:55
@justrach justrach changed the title Code-graph layer, ~3× faster snapshot load, and #518 correctness fixes Snapshot load performance (~3×) + parallel freshness + #518 fixes Jun 2, 2026
…nch)

resolveCallees called findAllSymbols once per callee (~18x per codedb_context),
and findAllSymbols runs an O(all-symbols) safety scan over every outline plus a
full result-list allocation per name. On the bench corpus (22 files but ~thousands
of symbols) that scan is the codedb_context regression the #524 bench flagged.

Resolve directly off symbol_index instead: O(defs-of-that-name), zero allocation
beyond the one path dupe per shown callee, same high-precision unique-match
semantics (exactly one non-test function/method, skip self, skip ubiquitous
names). The symbol index is rebuilt on every commit, so it is authoritative for
this best-effort feature.

zig build test 699/699. The effect is below the noise floor in a local microbench
on an M3 Ultra (thousands of string compares are sub-noise there) but dominant on
the slower CI runner, so the CI bench is the real measure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 2, 2026

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
|---|---|---|---|---|---|
| codedb_bundle | 91184 | 90916 | -0.29% | -268 | OK |
| codedb_changes | 11887 | 11204 | -5.75% | -683 | OK |
| codedb_context | 1174704 | 1178512 | +0.32% | +3808 | OK |
| codedb_deps | 247 | 259 | +4.86% | +12 | OK |
| codedb_edit | 61277 | 49337 | -19.49% | -11940 | OK |
| codedb_find | 10824 | 10805 | -0.18% | -19 | OK |
| codedb_hot | 26857 | 29314 | +9.15% | +2457 | OK |
| codedb_outline | 31909 | 29333 | -8.07% | -2576 | OK |
| codedb_read | 15190 | 16034 | +5.56% | +844 | OK |
| codedb_search | 21994 | 23641 | +7.49% | +1647 | OK |
| codedb_snapshot | 121090 | 67285 | -44.43% | -53805 | OK |
| codedb_status | 10453 | 9891 | -5.38% | -562 | OK |
| codedb_symbol | 18025 | 18649 | +3.46% | +624 | OK |
| codedb_tree | 42701 | 34597 | -18.98% | -8104 | OK |
| codedb_word | 12589 | 11714 | -6.95% | -875 | OK |
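
The two-threshold rule stated in the report can be read as the following Python sketch. It is an interpretation, not the CI script: the "FAIL" label is assumed, and treating only slowdowns as regressions is inferred from the report itself, where codedb_snapshot at -44.43% and -53,805 ns is still marked OK.

```python
PCT_THRESHOLD = 10.0        # percent
ABS_THRESHOLD_NS = 50_000   # nanoseconds

def classify(base_ns: int, head_ns: int) -> str:
    delta = head_ns - base_ns
    pct = 100.0 * delta / base_ns
    if pct <= PCT_THRESHOLD:
        return "OK"  # includes every speedup
    # Percentage exceeded: fail only when the absolute slowdown is also large.
    return "FAIL" if delta > ABS_THRESHOLD_NS else "NOISE"
```

Under this reading a tiny tool such as codedb_deps could swing by more than 10% and still be reported as NOISE, because a few dozen nanoseconds is far below the absolute threshold.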

@justrach justrach merged commit 18be8d1 into release/0.2.5824 Jun 2, 2026
1 check passed