Snapshot load performance (~3×) + parallel freshness + #518 fixes #524
…faster load)
The snapshot restore path duped every import / symbol name / detail out of the
outline_state section into its own allocation (~170k for codedb itself, millions
for a dense monorepo), then freed each at deinit. Replace those per-string copies
with slices into the retained section buffer:
- readSectionStringBorrowed: returns a slice aliasing the section, no alloc/copy
- FileOutline.borrows_strings: deinit skips freeing borrowed import/symbol strings
- Explorer.outline_section_bufs: adopts the section buffer, frees it once at deinit
(after outlines). Allocated from the Explorer's allocator, not the per-load one,
so arena-backed Explorers reclaim it without an explicit deinit.
- path stays individually owned (it is the map key) so every key-ownership path
is unchanged — that is what keeps the change low-risk.
Also pre-size the load's hashmaps (outline_states, outlines, dep_graph.forward/
reverse) to expected_file_count, avoiding the 0->N rehash storms.
Measured A/B on openclaw/openclaw (~39k files, two binaries loading the same
snapshot, interleaved warm loads): 314.9/324.5ms median -> 208.0/213.5ms,
~34% faster, reproducible with zero overlap between arms. Plus far fewer live
allocations (lower steady-state RSS, faster deinit).
Adds a round-trip + clean-deinit regression test. 674/674 tests pass, no leaks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
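The borrowed-string idea in this commit can be sketched in a few lines. This is an illustrative Python sketch, not the Zig implementation: the length-prefixed layout (u16 little-endian length, then the bytes) and the helper name `read_section_string_borrowed` are assumptions made for the example, and `memoryview` slices stand in for Zig slices that alias the retained section buffer.

```python
import struct


def read_section_string_borrowed(section: memoryview, offset: int):
    """Return (view, next_offset). The view aliases `section`; nothing is copied.

    Assumed layout for this sketch: u16 little-endian length, then the bytes.
    """
    (length,) = struct.unpack_from("<H", section, offset)
    start = offset + 2
    return section[start:start + length], start + length


# Build a tiny section holding two symbol names.
names = [b"loadSnapshotFast", b"restoreCallCentrality"]
buf = bytearray()
for name in names:
    buf += struct.pack("<H", len(name)) + name

section = memoryview(buf)  # the retained buffer, kept alive by its owner
borrowed = []
offset = 0
while offset < len(section):
    view, offset = read_section_string_borrowed(section, offset)
    borrowed.append(view)

decoded = [bytes(v).decode() for v in borrowed]
# Every borrowed view points back at the one retained buffer.
aliases_buffer = all(v.obj is buf for v in borrowed)
```

The owner of `buf` has to outlive every view, which mirrors why the Explorer adopts the section buffer and frees it only after the outlines.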
…rebuild)
call_centrality was built lazily on the first ranked search (codedb_context /
codedb_search), costing ~960ms on a 39k-file repo before the first query could
return. Persist it in the snapshot so a load restores it instead:
- SectionId.call_centrality (=8): a new section of path->f32 pairs. Absent or
empty => loader falls back to the lazy build, so older snapshots still load.
- writeSnapshot serializes explorer.call_centrality if built (read-only, under
the shared lock it already holds).
- The index/scan path calls Explorer.buildCallCentrality before persisting (a
public, lock-acquiring wrapper around ensureCallCentrality), gated by the
same CODEDB_NO_CENTRALITY env var as the search path, so the persisted
snapshot actually contains it.
- loadSnapshotFast.restoreCallCentrality rebuilds the map after the outlines
are restored, keying each entry off the stable outlines key (getEntry) — same
borrowed-key lifetime as ensureCallCentrality, so deinit stays unchanged.
Files no longer present are skipped.
Measured on openclaw/openclaw (~39k files): centrality build ~960ms (cold index
4612ms -> 5572ms, non-overlapping), now paid once at index; restore adds ~3.2ms
to load (385.9ms -> 389.1ms). Net: +3ms per load to remove ~960ms from the first
ranked query.
Adds a round-trip + exact-value persistence test. 675/675 tests pass, no leaks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
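A minimal sketch of the persist-and-restore round trip described above, in Python rather than Zig. The on-disk layout (u32 count, then per entry a u16 path length, the path bytes, and an f32) and the function names are assumptions for illustration; the commit only states that the section holds path->f32 pairs, that an absent or empty section falls back to the lazy build, and that files no longer present are skipped.

```python
import struct


def write_centrality(scores: dict) -> bytes:
    """Serialize path -> f32 pairs (layout is an assumption of this sketch)."""
    out = bytearray(struct.pack("<I", len(scores)))
    for path, score in scores.items():
        raw = path.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw + struct.pack("<f", score)
    return bytes(out)


def restore_centrality(section: bytes, outlines: dict):
    """Rebuild the map, or return None so the caller falls back to a lazy build."""
    if not section:
        return None
    (count,) = struct.unpack_from("<I", section, 0)
    offset = 4
    restored = {}
    for _ in range(count):
        (path_len,) = struct.unpack_from("<H", section, offset)
        offset += 2
        path = section[offset:offset + path_len].decode("utf-8")
        offset += path_len
        (score,) = struct.unpack_from("<f", section, offset)
        offset += 4
        if path in outlines:  # files no longer present are skipped
            restored[path] = score
    return restored


# Values chosen to be exactly representable as f32.
scores = {"src/a.zig": 0.5, "src/gone.zig": 1.5, "src/b.zig": 0.25}
outlines = {"src/a.zig": object(), "src/b.zig": object()}
restored = restore_centrality(write_centrality(scores), outlines)
fallback = restore_centrality(b"", outlines)
```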
codedb_context already surfaced a symbol's callers (text + scope). Add the
dependency side: a "Calls" section listing each key symbol's resolved callees,
walked through the same call-graph extraction used for centrality.
Explorer.resolveCallees(path, line_start, line_end, ...): slices the function
body, runs codegraph.extractCallees, and resolves each callee name to a
definition. Because resolution is name-based (no type info), it is deliberately
HIGH-PRECISION / lower-recall — a callee is shown only when it resolves to
exactly one non-test function/method, and ubiquitous std/container/builtin
method names (init, get, append, lock, next, ...) are filtered out, since a
`name(` call site almost never means the rare user free function of that name.
Ambiguous or method-on-receiver calls are simply omitted rather than guessed
(asserting a false edge in an LLM-facing context block is worse than silence).
handleContext renders it for the ≤3 inlined symbols, paired with the Callers
section, so an agent sees both who calls a symbol and what it calls without a
follow-up codedb_outline/read.
Verified end-to-end via the MCP server on codedb itself, e.g. searchContentRanked
-> {lockShared, unlockShared, posixGetenv, ensureCallCentrality, normalizeChar,
splitIdentifier} — all correct, no false edges. Adds a resolveCallees unit test.
676/676 tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
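The high-precision resolution rule can be sketched as follows. This is a Python illustration under stated assumptions, not the Zig code: the `Definition` record, the file paths in the sample data, and the `UBIQUITOUS` set (only the five names listed in the commit message) are hypothetical. Grouping definitions by name also mirrors a per-name symbol index lookup.

```python
from dataclasses import dataclass

# Sample of the ubiquitous method names the commit message lists.
UBIQUITOUS = {"init", "get", "append", "lock", "next"}


@dataclass(frozen=True)
class Definition:
    name: str
    path: str
    kind: str  # "function", "method", "struct", ...
    is_test: bool = False


def resolve_callees(callee_names, definitions, self_name):
    """Keep a callee only if it resolves to exactly one non-test function/method."""
    by_name = {}
    for d in definitions:
        by_name.setdefault(d.name, []).append(d)
    resolved = []
    for name in callee_names:
        if name == self_name or name in UBIQUITOUS:
            continue
        candidates = [
            d for d in by_name.get(name, [])
            if d.kind in ("function", "method") and not d.is_test
        ]
        if len(candidates) == 1:  # ambiguous or unresolved names are omitted
            resolved.append(candidates[0])
    return resolved


defs = [
    Definition("searchContentRanked", "src/explore.zig", "function"),
    Definition("ensureCallCentrality", "src/explore.zig", "function"),
    Definition("normalizeChar", "src/index.zig", "function"),
    Definition("parse", "src/a.zig", "function"),
    Definition("parse", "src/b.zig", "function"),  # ambiguous: two definitions
    Definition("helper", "src/t.zig", "function", is_test=True),
    Definition("init", "src/x.zig", "function"),
]
calls = ["ensureCallCentrality", "parse", "init", "helper",
         "normalizeChar", "searchContentRanked"]
shown = [d.name for d in resolve_callees(calls, defs, "searchContentRanked")]
```

Dropping `parse` rather than picking one of its two definitions is the point of the design: a missing edge costs less than a wrong one in an LLM-facing context block.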
… (~36% faster load)
loadSnapshotFast (and loadSnapshotValidated) checked each restored file's mtime
against the snapshot via openFile + stat + close: 3 syscalls per file, and the
open()/close() pair is the expensive part (fd allocation, etc.). Only the mtime
is needed, so use a single statFile (no handle). The file is still opened
(readFileAlloc) only when it's actually newer than the snapshot, the rare case.
Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm
loads): 354.5ms -> 227.4ms median, ~36% faster, fully reproducible with zero
overlap between arms. statFile is also markedly more resilient to machine load
(far fewer syscalls), so the win holds up under contention.
Semantically identical: statFile follows symlinks like openFile+stat did, and
returns the same Stat.mtime. No test changes needed; 676/676 pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
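The difference between the two freshness checks can be shown with the Python standard library, as a sketch of the idea rather than the Zig code. `os.open` + `os.fstat` + `os.close` corresponds to the old three-syscall path and a single `os.stat` (which follows symlinks) to the new one; the helper names are made up for the example.

```python
import os
import tempfile


def mtime_via_open(path):
    """Old shape: open + fstat + close, three syscalls per file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.fstat(fd).st_mtime_ns
    finally:
        os.close(fd)


def mtime_via_stat(path):
    """New shape: one stat call, no file handle. Follows symlinks."""
    return os.stat(path).st_mtime_ns


def is_stale(path, snapshot_mtime_ns):
    """A file needs re-reading only when it is newer than the snapshot."""
    return mtime_via_stat(path) > snapshot_mtime_ns


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"const x = 1;\n")
    path = tmp.name

same_mtime = mtime_via_open(path) == mtime_via_stat(path)
snapshot_mtime = mtime_via_stat(path)
unchanged_is_stale = is_stale(path, snapshot_mtime)
older_snapshot_is_stale = is_stale(path, snapshot_mtime - 1)
os.unlink(path)
```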
…file (~23% faster load)
loadSnapshotFast parsed each restored file's record (path_len, path,
content_len, content) with a separate readPositionalAll: ~4 syscalls per file,
~156k on a 39k-file repo. Read the whole content section once and parse records
from memory instead: prefer a file-backed mmap (demand-paged, reclaimable, no
heap spike), fall back to a single heap bulk-read if mmap is unavailable.
path/content stay owned allocations (memcpy'd out of the section), so the
insert/free logic and the mmap lifetime (munmap after the loop) are unchanged
and simple.
Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm
loads): 229.2ms -> 177.3ms median, ~23% faster, zero overlap between arms.
Stacks with the statFile freshness fix (this session): the openclaw load went
354 -> 229 -> 177ms across the two syscall optimizations.
676/676 tests pass (snapshot round-trip tests exercise this path under the
DebugAllocator: no leaks, no double-free).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
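A sketch of the "map once, parse from memory" approach, written in Python with the standard `mmap` module. The field widths (u16 path length, u32 content length, little-endian) and the function names are assumptions for the example; the commit only names the four record fields. As in this commit, path and content are copied out of the mapping, so it can be closed right after the loop.

```python
import mmap
import os
import struct
import tempfile


def encode_records(files):
    """Write (path_len, path, content_len, content) records back to back."""
    out = bytearray()
    for path, content in files:
        raw = path.encode("utf-8")
        out += struct.pack("<H", len(raw)) + raw
        out += struct.pack("<I", len(content)) + content
    return bytes(out)


def parse_records(buf):
    """Parse every record from an in-memory buffer (bytes or mmap)."""
    records, offset = [], 0
    while offset < len(buf):
        (path_len,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        path = bytes(buf[offset:offset + path_len]).decode("utf-8")
        offset += path_len
        (content_len,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        content = bytes(buf[offset:offset + content_len])  # owned copy
        offset += content_len
        records.append((path, content))
    return records


def load_section(path):
    with open(path, "rb") as f:
        try:
            mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except (OSError, ValueError):
            return parse_records(f.read())  # fallback: one bulk read
        try:
            return parse_records(mapped)
        finally:
            mapped.close()  # unmap after the loop


files = [("src/main.zig", b"pub fn main() void {}\n"), ("README.md", b"# codedb\n")]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(encode_records(files))
    section_path = tmp.name
loaded = load_section(section_path)
os.unlink(section_path)
```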
…nt copy (~8% faster load)
With the content section now mmap'd, the per-file `content` was still memcpy'd
into a throwaway allocation and then duped again by contents.put: two copies.
Pass the mmap slice directly: contents.put / indexFile* still dupe their owned
copy into the cache, but the intermediate alloc + memcpy + free per file is
gone (a full pass over all content + ~39k alloc/free pairs). The slice only
ever borrows the section, which is mapped for the whole loop and unmapped
after, and no consumer retains it, so lifetime is unchanged.
Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm
loads): 169.8ms -> 155.9ms median, ~8% faster, zero overlap. Across this
session the openclaw load went 354 -> 229 (statFile) -> 177 (mmap read) ->
156ms.
676/676 tests pass; the snapshot round-trip tests exercise this under the
DebugAllocator (no leak, no double-free, no use-after-free).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t mmap (~17% faster load, ~237MB lower RSS)
On load, every restored file's content was duped into the ContentCache (an owned
heap copy — ~268MB on a 39k-file repo) even though the snapshot is already mmap'd.
Let the cache borrow those bytes directly:
- ContentCache gains a per-slot `value_owned` flag. `putBorrowed` stores a value
that aliases external memory (value_owned=false); it is never freed by the
cache (evict/remove/clear/deinit skip it). Keys stay cache-owned. put-update
frees the old value only if it was owned, so owned<->borrowed transitions are
safe (a later re-index of a file replaces its borrowed entry with an owned one).
- The Explorer adopts the content-section mmap (content_section_maps) and
munmaps it at deinit — after contents.deinit, which leaves borrowed values
untouched. The mmap stays valid after the fd is closed (POSIX) and across a
tmp+rename snapshot rewrite (the old inode stays mapped).
- loadSnapshotFast uses putBorrowed when the content came from the adopted mmap;
the heap-bulk-read fallback still dupes (owned). insertRestoredFile takes a
borrow flag.
Net: the per-file content copy disappears entirely from the load (only the path
keys are still duped), and the cache no longer holds ~268MB of owned content —
the bytes live in the file-backed, reclaimable mmap.
Measured on openclaw/openclaw (~39k files, same snapshot, interleaved):
load 166.2ms -> 138.3ms (16.8% faster)
peak RSS ~795MB -> ~558MB (~237MB / 30% lower)
Adds ContentCache borrow tests (zero-copy aliasing, no-free-on-borrowed, mixed
owned/borrowed transitions + eviction). 694/694 tests pass, no leaks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
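The owned/borrowed bookkeeping can be modelled in a few lines. Python has no manual frees, so this sketch records a simulated free in a `freed` list whenever the cache would release an owned value; the class shape and method names are assumptions that loosely follow the `value_owned` / `putBorrowed` description above, not the real ContentCache.

```python
class ContentCache:
    """Toy cache: each slot remembers whether the cache owns its value."""

    def __init__(self):
        self._slots = {}   # key -> (value, value_owned)
        self.freed = []    # keys whose owned value was (notionally) freed

    def _release(self, key):
        _value, owned = self._slots.pop(key)
        if owned:          # borrowed values are never freed by the cache
            self.freed.append(key)

    def put(self, key, value):
        if key in self._slots:
            self._release(key)
        self._slots[key] = (bytes(value), True)   # cache-owned copy

    def put_borrowed(self, key, view):
        if key in self._slots:
            self._release(key)
        self._slots[key] = (view, False)          # aliases external memory

    def get(self, key):
        return self._slots[key][0]

    def clear(self):
        for key in list(self._slots):
            self._release(key)


backing = bytearray(b"hello world")   # stands in for the adopted mmap
view = memoryview(backing)[:5]

cache = ContentCache()
cache.put_borrowed("a.zig", view)
aliases_backing = cache.get("a.zig").obj is backing
cache.put("a.zig", b"new")            # borrowed -> owned: nothing to free
freed_after_transition = list(cache.freed)
cache.put("a.zig", b"newer")          # owned -> owned: old value freed
cache.put_borrowed("b.zig", view)
cache.clear()                         # frees a.zig only; b.zig is borrowed
```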
…ad (~14% faster load, ~100MB lower RSS)
A load profiler (new, gated by CODEDB_LOAD_PROFILE, near-zero cost off) showed
the remaining load cost on openclaw (~136ms) split outline ~28ms / freshness
~35ms / per-file insert ~76ms. ~20ms of the insert work was
store.recordSnapshot's Wyhash over every file's full content, which also faults
in the entire mmap'd content section, partly undoing the zero-copy RSS win.
Compute those hashes once at write time (when content is already being read)
and store them in a new CONTENT_HASHES section (u64 per content record, in
content order). The loader reads the precomputed hash for the
restored/outline-only branches instead of re-hashing; the changed-file branch
still hashes fresh disk content. Byte-identical to what it recorded before,
just sourced from the snapshot, so no content pages are faulted at load. Absent
section (older snapshots) => recompute, fully backward compatible.
Measured A/B on openclaw/openclaw (~39k files, same content_hashes-bearing
snapshot; old binary ignores the section and recomputes):
load 145.5ms -> 125.3ms (~14% faster)
peak RSS ~558MB -> ~457MB (~100MB lower, content stays demand-paged)
Adds an order-alignment test (recorded Store hash == Wyhash(content) per file).
695/695 tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
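A sketch of the precomputed-hash section and its fallback, in Python. Wyhash is not in the Python standard library, so an 8-byte BLAKE2b digest is swapped in purely as a stand-in 64-bit hash; the function names and the length check that triggers a recompute are also assumptions of this sketch.

```python
import hashlib
import struct


def content_hash(content: bytes) -> int:
    """Stand-in 64-bit hash (the real code uses Wyhash)."""
    digest = hashlib.blake2b(content, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def write_hash_section(contents):
    """One u64 per content record, in content order."""
    return b"".join(struct.pack("<Q", content_hash(c)) for c in contents)


def load_hashes(contents, hash_section):
    """Use stored hashes when present; otherwise recompute (older snapshots)."""
    # Assumption of this sketch: a section of the wrong length is also
    # treated as absent.
    if hash_section is None or len(hash_section) != 8 * len(contents):
        return [content_hash(c) for c in contents]
    return [h for (h,) in struct.iter_unpack("<Q", hash_section)]


contents = [b"alpha", b"beta", b""]
section = write_hash_section(contents)
from_section = load_hashes(contents, section)   # no content bytes touched
recomputed = load_hashes(contents, None)        # backward-compatible path
```

Both paths yield the same values in the same order, which is the property the order-alignment test in this commit checks.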
Graph-aware ranking (call-graph centrality), persisted centrality in the
snapshot (skips the first-query rebuild), and edge-aware codedb_context
(graph-resolved callees).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documents the load-path speedups + RSS reduction (statFile freshness, mmap
content read, borrow content, zero-copy ContentCache, stored content hashes,
gated CODEDB_LOAD_PROFILE): openclaw load ~380ms -> ~125ms (~3x), peak RSS
~795MB -> ~457MB (~338MB lower).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… faster 16k load)
Restructure loadSnapshotFast into three passes: parse the CONTENT section into
borrowed-slice records, fan the per-file statFile freshness check across
workers, then run the existing sequential 3-way insert
(changed/restored/outline-only). The parse pass also drops the old per-record
path_buf alloc+memcpy+free (~1 pair/file): indexFile*/recordSnapshot dupe the
path and the restored branch reuses the outline_states key, so paths can borrow
the mmap'd content section directly. All Explorer/Store mutation stays
single-threaded.
statFile is kernel/VFS-throughput-bound, not CPU-bound: a worker sweep on a
16k-file tree (M3 Ultra, 20 P-cores) measured freshness 14.1ms@1 -> 5.7ms@4 ->
9.5ms@8 -> 13.3ms@12, a U-curve bottoming at ~4 regardless of core count. So
the worker count is capped at FRESHNESS_MAX_WORKERS=4 (CODEDB_LOAD_WORKERS
overrides), gated above FRESHNESS_PARALLEL_THRESHOLD=256 files, with
single-thread and spawn-failure fallbacks.
Final auto vs sequential on 16k files: freshness 13.2->5.7ms (2.3x), whole load
17.8->10.2ms (~43%).
Test: "snapshot: parallel freshness load re-indexes changed files, restores the
rest" (288-file fixture forces the multi-worker path). zig build test 696/696.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
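The capped, gated fan-out can be sketched with a thread pool. This is a Python illustration, not the Zig code: the two constants and the `CODEDB_LOAD_WORKERS` variable come from the commit message, while the function names, the injectable `stat_mtime` callable, and the choice to apply the override only above the threshold are assumptions of the sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor

FRESHNESS_MAX_WORKERS = 4
FRESHNESS_PARALLEL_THRESHOLD = 256


def pick_worker_count(file_count, env):
    """Single-threaded at or below the threshold, otherwise a small fixed cap."""
    if file_count <= FRESHNESS_PARALLEL_THRESHOLD:
        return 1
    override = env.get("CODEDB_LOAD_WORKERS", "")
    if override.isdigit() and int(override) > 0:
        return int(override)
    return FRESHNESS_MAX_WORKERS


def find_stale(paths, snapshot_mtimes, workers,
               stat_mtime=lambda p: os.stat(p).st_mtime_ns):
    """Return one stale flag per path. Workers only stat; they mutate nothing."""
    def check(i):
        return stat_mtime(paths[i]) > snapshot_mtimes[i]

    indices = range(len(paths))
    if workers <= 1:
        return [check(i) for i in indices]
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(check, indices))  # results keep input order
    except RuntimeError:
        # Spawn-failure fallback: redo the scan sequentially.
        return [check(i) for i in indices]


# Fake mtimes keep the demo deterministic; two of 300 files are newer.
paths = [f"f{i}" for i in range(300)]
snapshot = [100] * len(paths)
current = {p: 100 for p in paths}
current["f7"] = 101
current["f299"] = 250

workers = pick_worker_count(len(paths), env={})
flags = find_stale(paths, snapshot, workers, stat_mtime=current.__getitem__)
stale_indices = [i for i, stale in enumerate(flags) if stale]
```

The sequential insert pass would then read fresh content only for the flagged indices, which keeps all mutation on one thread.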
… Python class kind, find score floor)
Three correctness findings from the #518 audit:
- extractIdent / extractRubyMethodName accepted only ASCII identifier chars, so
non-ASCII identifiers (e.g. Korean "def 한():") were dropped entirely:
codedb_outline / codedb_symbol returned nothing for them. Now accept any
non-ASCII UTF-8 byte (>= 0x80), capturing identifiers in non-Latin scripts.
- Python `class` was labeled .struct_def while every other language uses
.class_def; Python now matches (Ruby `class` stays .struct_def, unchanged).
- codedb_find returned confident hits for queries that match no filename: the
local Smith-Waterman alignment can clear the length-scaled score floor on a few
incidental matches. Add a subsequence floor (query and path must share an
in-order LCS of at least 60% of the query's chars), computed only for
candidates already past the score floor, so non-matching files pay nothing.
Tests: issue-518 non-ASCII identifier, Python class_def, and fuzzy subsequence
floor (with regression guards for typo-tolerant matches like mpc/mian/authmid).
zig build test 699/699.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
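The subsequence floor is a longest-common-subsequence check against a 60% threshold. Below is a minimal Python sketch of that rule; the function names and the sample paths are made up for the example, and the real code applies the check only to candidates that already cleared the Smith-Waterman score floor.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest in-order common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def passes_subsequence_floor(query: str, path: str, ratio: float = 0.6) -> bool:
    """Query and path must share an LCS of at least `ratio` of the query."""
    if not query:
        return False
    return lcs_length(query, path) >= ratio * len(query)


# A transposition typo keeps 3 of 4 query characters in order and passes.
typo_ok = passes_subsequence_floor("mian", "src/main.zig")
# A query sharing a single incidental character with the path is rejected.
noise_ok = passes_subsequence_floor("xyzzy", "src/main.zig")
```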
…sert pass
Refines the parallel freshness scan (109775c). The workers now only statFile
and flag stale files (no content read, no allocation), so they are pure
read-only and trivially thread-safe. The insert pass (Pass C) reads each stale
file's fresh content sequentially with the load allocator, exactly as the
original pre-parallel code did, so peak memory is one file's content instead of
holding every changed file's content from the parallel pass, and the
page-allocator hop is gone.
The parallelized stat (the measured win) is unchanged, and the 0-changed common
case is identical in cost. Sanity A/B on codedb's own 664-file index: freshness
0.70ms@1 -> 0.30ms@auto. zig build test 699/699.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42e67f6ffe
# Auto-install mcpsync if missing (same trust domain as codedb). Never fail the
# codedb install if this is unavailable.
if [ -z "$mcpsync_bin" ]; then
    curl -fsSL https://mcpsync.codegraff.com | bash >/dev/null 2>&1 || true
Don't pipe the mcpsync installer into bash
When mcpsync is absent, the installer now executes whatever https://mcpsync.codegraff.com returns with bash and suppresses all output, without any version pin, checksum/signature verification, or user consent. The repo's AGENTS.md explicitly calls out installer scripts that execute untrusted code or skip verification; in this scenario a compromised DNS/TLS/CDN endpoint turns a normal codedb install into arbitrary code execution. Prefer a pinned artifact with checksum/signature verification, or skip registration and print manual instructions.
_ = cache.diag.appendIfFresh(alloc, out, path, result.new_hash);
const lang = explore_mod.detectLanguage(path);
if (cache.linter.shouldTry(lang) and cache.diag.tryBeginWork(path, result.new_hash)) {
    spawnLintWorker(cache, path, result.new_hash, lang);
Run linter jobs from the project root
In the MCP startup modes covered by the repo guidance/e2e tests, the server can be spawned from / and learn the actual project root through the roots handshake; this call passes the repo-relative path (for example src/foo.py) to the detached linter, and runCapture is invoked without a cwd, so ruff/biome/etc. look under the process cwd rather than the project root. With linters enabled, the first edit in that context reports the file as missing and marks the language unavailable for the whole session, disabling the new diagnostics feature until reconnect. Pass an absolute path or set the worker cwd to cache.default_path.
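The failure mode described in this comment is easy to reproduce in miniature. The Python sketch below uses a hypothetical `run_capture` helper (a stand-in for the Zig `runCapture`, not its real signature) to show that a repo-relative path resolves only when the child process runs with the project root as its working directory.

```python
import os
import subprocess
import sys
import tempfile


def run_capture(argv, project_root):
    """Run a tool with its cwd set to the project root and capture its output."""
    return subprocess.run(argv, cwd=project_root, capture_output=True,
                          text=True, check=False)


# The child checks whether the repo-relative path exists from its own cwd.
probe = [sys.executable, "-c", "import os; print(os.path.exists('src/foo.py'))"]

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "src"))
    with open(os.path.join(root, "src", "foo.py"), "w") as f:
        f.write("x = 1\n")
    seen_from_root = run_capture(probe, root).stdout.strip()
    with tempfile.TemporaryDirectory() as elsewhere:
        seen_from_elsewhere = run_capture(probe, elsewhere).stdout.strip()
```

Passing an absolute path to the linter instead would avoid the dependency on cwd altogether, which is the other fix the comment suggests.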
…nch)
resolveCallees called findAllSymbols once per callee (~18x per codedb_context),
and findAllSymbols runs an O(all-symbols) safety scan over every outline plus a
full result-list allocation per name. On the bench corpus (22 files but
~thousands of symbols) that scan is the codedb_context regression the #524
bench flagged.
Resolve directly off symbol_index instead: O(defs-of-that-name), zero
allocation beyond the one path dupe per shown callee, same high-precision
unique-match semantics (exactly one non-test function/method, skip self, skip
ubiquitous names). The symbol index is rebuilt on every commit, so it is
authoritative for this best-effort feature. zig build test 699/699.
The effect is below the noise floor in a local microbench on an M3 Ultra
(thousands of string compares are sub-noise there) but dominant on the slower
CI runner, so the CI bench is the real measure.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
Summary
feat/code-graph → release/0.2.5824, 13 commits. The most recent three are from this session; the rest is the snapshot load-performance arc and the code-graph snapshot persistence.
Snapshot load performance (~380ms → ~125ms, ~3× on a 39k-file repo; ~338MB lower RSS)
The full load-path arc, measured end-to-end on an openclaw-sized index:
- …: d5a8ad1
- `statFile` freshness instead of open+stat+close (~36%): 1a3e2c5
- …: 5143cb2
- …: 0c7c523
- … `ContentCache` over the retained mmap (~17%, ~237MB lower RSS): 9c00f8f
- …: df941eb
- … `statFile` check across workers. `statFile` is kernel/VFS-bound, not CPU-bound: a worker sweep on a 16k-file tree measured a U-curve bottoming at ~4 workers regardless of core count, so the count is capped at `FRESHNESS_MAX_WORKERS=4`. ~2.3× freshness, ~43% faster 16k load. Workers are pure read-only stat; the insert pass reads changed content sequentially so peak memory stays at one file: 109775c, 42e67f6
Code-graph snapshot persistence
- …: 8e6a950
- … `codedb_context` (edge-aware): b5389f6
#518 correctness fixes (this session): 210966e
- `extractIdent` / `extractRubyMethodName` accepted only ASCII, so `def 한():` was dropped from `codedb_outline` / `codedb_symbol`.
- Python `class` labeled `.class_def` (was `.struct_def`; every other language already used `.class_def`).
- `codedb_find` subsequence floor: query and path must share an in-order LCS of ≥60% of the query's chars, so `find` no longer returns confident hits for queries that match no filename.
Testing
`zig build test`: 699/699 tests passing (23/23 build steps).
Related issues
- … `codedb_find` false hits, Python class kind): addressed here.
- … `codedb`): fixed earlier on this line (infallible `main` + synchronous fast path).
- … `codedb_remote` 530 / code 1033): client emits an actionable hint for the Cloudflare-origin outage; the 530 itself is server-side.
🤖 Generated with Claude Code