Snapshot load performance (~3×) + parallel freshness + #518 fixes by justrach · Pull Request #524 · justrach/codedb

justrach · 2026-06-02T15:35:06Z

Summary

feat/code-graph → release/0.2.5824 — 13 commits. The most recent three are from this session; the rest is the snapshot load-performance arc and the code-graph snapshot persistence.

Snapshot load performance (~380ms → ~125ms, ~3× on a 39k-file repo; ~338MB lower RSS)

The full load-path arc, measured end-to-end on an openclaw-sized index:

Borrowed restored outline strings + pre-sized load maps (~34%) — d5a8ad1
statFile freshness instead of open+stat+close (~36%) — 1a3e2c5
Single mmap of the content section instead of ~4 preads/file (~23%) — 5143cb2
Borrowed content from the mmap'd section (~8%) — 0c7c523
Zero-copy ContentCache over the retained mmap (~17%, ~237MB lower RSS) — 9c00f8f
Stored content hashes, skipping re-hash at load (~14%, ~100MB lower RSS) — df941eb
Parallel freshness scan (this session) — fan the per-file statFile check across workers. statFile is kernel/VFS-bound, not CPU-bound: a worker sweep on a 16k-file tree measured a U-curve bottoming at ~4 workers regardless of core count, so the count is capped at FRESHNESS_MAX_WORKERS=4. ~2.3× freshness, ~43% faster 16k load. Workers are pure read-only stat; the insert pass reads changed content sequentially so peak memory stays at one file. — 109775c, 42e67f6

Code-graph snapshot persistence

Persist call-graph centrality in the snapshot, skipping the ~960ms first-query rebuild — 8e6a950
Graph-resolved callees in codedb_context (edge-aware) — b5389f6

#518 correctness fixes (this session) — `210966e`

Non-ASCII identifiers captured in outlines — extractIdent/extractRubyMethodName accepted only ASCII, so def 한(): was dropped from codedb_outline/codedb_symbol.
Python class labeled .class_def (was .struct_def), matching the other languages that use .class_def; Ruby class stays .struct_def, unchanged.
codedb_find subsequence floor — query and path must share an in-order LCS of ≥60% of the query's chars, so find no longer returns confident hits for queries that match no filename.
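The non-ASCII identifier fix can be illustrated with a small sketch. It is not the project's extractIdent; `is_ident_byte` and `extract_ident` are hypothetical names, and the only rule taken from the PR is that any UTF-8 byte >= 0x80 now counts as an identifier byte.

```python
def is_ident_byte(b: int) -> bool:
    """ASCII letter, digit, underscore, or any non-ASCII UTF-8 byte (>= 0x80)."""
    return (
        b == ord("_")
        or ord("0") <= b <= ord("9")
        or ord("a") <= b <= ord("z")
        or ord("A") <= b <= ord("Z")
        or b >= 0x80  # lead and continuation bytes of multi-byte code points
    )


def extract_ident(src: bytes, start: int = 0) -> bytes:
    """Return the run of identifier bytes beginning at `start` (may be empty)."""
    end = start
    while end < len(src) and is_ident_byte(src[end]):
        end += 1
    return src[start:end]


line = "def 한():".encode("utf-8")
name = extract_ident(line, len(b"def "))
print(name.decode("utf-8"))
```

An ASCII-only check stops at the first byte of 한 and returns an empty name, which is how such definitions went missing from the outline. The sketch does not reject identifiers that start with a digit.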

Testing

zig build test — 699/699 tests passing (23/23 build steps).

Related issues

#518 (codedb 0.2.5823: a few correctness/UX findings: non-ASCII outline, codedb_find false hits, kind labels, search cap, snapshot staleness): the non-ASCII outline, codedb_find false hits, and Python class kind findings are addressed here.
#504 (MacOS intel x64 segmentation error; Intel x64 segfault on bare codedb): fixed earlier on this line (infallible main + synchronous fast path).
#508 (codedb_remote returns api.wiki.codes HTTP 530 / code 1033 for public repos): the client emits an actionable hint for the Cloudflare-origin outage; the 530 itself is server-side.

🤖 Generated with Claude Code

…faster load)

The snapshot restore path duped every import / symbol name / detail out of the outline_state section into its own allocation (~170k for codedb itself, millions for a dense monorepo), then freed each at deinit. Replace those per-string copies with slices into the retained section buffer:

- readSectionStringBorrowed: returns a slice aliasing the section, no alloc/copy
- FileOutline.borrows_strings: deinit skips freeing borrowed import/symbol strings
- Explorer.outline_section_bufs: adopts the section buffer, frees it once at deinit (after outlines). Allocated from the Explorer's allocator, not the per-load one, so arena-backed Explorers reclaim it without an explicit deinit.
- path stays individually owned (it is the map key) so every key-ownership path is unchanged; that is what keeps the change low-risk.

Also pre-size the load's hashmaps (outline_states, outlines, dep_graph.forward/reverse) to expected_file_count, avoiding the 0->N rehash storms.

Measured A/B on openclaw/openclaw (~39k files, two binaries loading the same snapshot, interleaved warm loads): 314.9/324.5ms median -> 208.0/213.5ms, ~34% faster, reproducible with zero overlap between arms. Plus far fewer live allocations (lower steady-state RSS, faster deinit).

Adds a round-trip + clean-deinit regression test. 674/674 tests pass, no leaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
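A rough Python analogue of the borrowed-string idea, using `memoryview` slices in place of Zig slices. The layout (a little-endian u32 length before each string) and the function name `read_section_string_borrowed` are assumptions made for illustration; the PR does not spell out the outline_state encoding.

```python
import struct


def read_section_string_borrowed(section: memoryview, offset: int):
    """Return (view, next_offset); the view aliases `section`, nothing is copied."""
    (length,) = struct.unpack_from("<I", section, offset)
    start = offset + 4
    end = start + length
    if end > len(section):
        raise ValueError("string runs past the end of the section")
    return section[start:end], end


# Build a tiny section holding three length-prefixed strings (assumed layout).
buf = bytearray()
for s in (b"std", b"Explorer", b"fn init"):
    buf += struct.pack("<I", len(s)) + s
section = memoryview(buf)

first, next_offset = read_section_string_borrowed(section, 0)
second, _ = read_section_string_borrowed(section, next_offset)
print(bytes(first), bytes(second))
```

Because each view aliases the section buffer, the buffer has to outlive every string handed out, which mirrors the Explorer adopting the section buffer and freeing it once after the outlines.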

…rebuild)

call_centrality was built lazily on the first ranked search (codedb_context / codedb_search), costing ~960ms on a 39k-file repo before the first query could return. Persist it in the snapshot so a load restores it instead:

- SectionId.call_centrality (=8): a new section of path->f32 pairs. Absent or empty => loader falls back to the lazy build, so older snapshots still load.
- writeSnapshot serializes explorer.call_centrality if built (read-only, under the shared lock it already holds).
- The index/scan path calls Explorer.buildCallCentrality before persisting (a public, lock-acquiring wrapper around ensureCallCentrality), gated by the same CODEDB_NO_CENTRALITY env var as the search path, so the persisted snapshot actually contains it.
- loadSnapshotFast.restoreCallCentrality rebuilds the map after the outlines are restored, keying each entry off the stable outlines key (getEntry), with the same borrowed-key lifetime as ensureCallCentrality, so deinit stays unchanged. Files no longer present are skipped.

Measured on openclaw/openclaw (~39k files): centrality build ~960ms (cold index 4612ms -> 5572ms, non-overlapping), now paid once at index; restore adds ~3.2ms to load (385.9ms -> 389.1ms). Net: +3ms per load to remove ~960ms from the first ranked query.

Adds a round-trip + exact-value persistence test. 675/675 tests pass, no leaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
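The persistence scheme above can be sketched as a round trip. The byte layout here (u32 count, then u32 path length, path bytes, and an f32 score per entry) is an assumption; the PR only states that the section holds path->f32 pairs, that an absent or empty section triggers the lazy build, and that files no longer present are skipped.

```python
from __future__ import annotations

import struct


def encode_centrality(scores: dict[str, float]) -> bytes:
    """Serialize path -> score pairs (assumed layout, not codedb's)."""
    out = bytearray(struct.pack("<I", len(scores)))
    for path, score in scores.items():
        raw = path.encode("utf-8")
        out += struct.pack("<I", len(raw)) + raw + struct.pack("<f", score)
    return bytes(out)


def restore_centrality(section: bytes | None, known_paths: set[str]) -> dict[str, float] | None:
    """Return the restored map, or None when the caller should build lazily."""
    if not section:
        return None  # section absent: older snapshot
    (count,) = struct.unpack_from("<I", section, 0)
    if count == 0:
        return None  # section empty: fall back to the lazy build
    offset = 4
    restored: dict[str, float] = {}
    for _ in range(count):
        (n,) = struct.unpack_from("<I", section, offset)
        offset += 4
        path = section[offset:offset + n].decode("utf-8")
        offset += n
        (score,) = struct.unpack_from("<f", section, offset)
        offset += 4
        if path in known_paths:  # files no longer present are skipped
            restored[path] = score
    return restored


blob = encode_centrality({"src/a.zig": 0.5, "src/gone.zig": 0.25})
print(restore_centrality(blob, {"src/a.zig"}))
```

Returning `None` rather than an empty map keeps "nothing persisted" distinguishable from "persisted, but every file is gone", so the caller can decide whether to run the lazy build.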

codedb_context already surfaced a symbol's callers (text + scope). Add the dependency side: a "Calls" section listing each key symbol's resolved callees, walked through the same call-graph extraction used for centrality.

Explorer.resolveCallees(path, line_start, line_end, ...): slices the function body, runs codegraph.extractCallees, and resolves each callee name to a definition. Because resolution is name-based (no type info), it is deliberately HIGH-PRECISION / lower-recall: a callee is shown only when it resolves to exactly one non-test function/method, and ubiquitous std/container/builtin method names (init, get, append, lock, next, ...) are filtered out, since a `name(` call site almost never means the rare user free function of that name. Ambiguous or method-on-receiver calls are simply omitted rather than guessed (asserting a false edge in an LLM-facing context block is worse than silence).

handleContext renders it for the ≤3 inlined symbols, paired with the Callers section, so an agent sees both who calls a symbol and what it calls without a follow-up codedb_outline/read.

Verified end-to-end via the MCP server on codedb itself, e.g. searchContentRanked -> {lockShared, unlockShared, posixGetenv, ensureCallCentrality, normalizeChar, splitIdentifier}: all correct, no false edges.

Adds a resolveCallees unit test. 676/676 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
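A minimal sketch of the high-precision resolution rule, assuming a name-keyed symbol index. `Definition`, `resolve_callees`, and the short `UBIQUITOUS` set are hypothetical stand-ins; the real filter list is longer, and the sketch reads "exactly one non-test function/method" as "exactly one candidate left after filtering", which is an interpretation. Method-on-receiver detection is not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Subset of the ubiquitous names quoted in the commit message.
UBIQUITOUS = frozenset({"init", "get", "append", "lock", "next"})


@dataclass(frozen=True)
class Definition:
    name: str
    path: str
    kind: str  # e.g. "function", "method", "struct"
    is_test: bool = False


def resolve_callees(callee_names, symbol_index, self_name):
    """Keep a callee only when it resolves to exactly one non-test function/method."""
    resolved = []
    seen = set()
    for name in callee_names:
        if name == self_name or name in UBIQUITOUS or name in seen:
            continue
        seen.add(name)
        candidates = [
            d for d in symbol_index.get(name, [])
            if d.kind in ("function", "method") and not d.is_test
        ]
        if len(candidates) == 1:  # ambiguous or unknown names are omitted, not guessed
            resolved.append(candidates[0])
    return resolved


index = {
    "normalizeChar": [Definition("normalizeChar", "src/text.zig", "function")],
    "parse": [
        Definition("parse", "src/a.zig", "function"),
        Definition("parse", "src/b.zig", "function"),
    ],
    "get": [Definition("get", "src/cache.zig", "method")],
}
calls = ["normalizeChar", "parse", "get", "missing"]
print([d.name for d in resolve_callees(calls, index, "searchContentRanked")])
```

Looking names up in an index keyed by symbol name keeps the cost proportional to the definitions of that name rather than to every symbol in the repository.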

… (~36% faster load)

loadSnapshotFast (and loadSnapshotValidated) checked each restored file's mtime against the snapshot via openFile + stat + close: 3 syscalls per file, and the open()/close() pair is the expensive part (fd allocation, etc.). Only the mtime is needed, so use a single statFile (no handle). The file is still opened (readFileAlloc) only when it's actually newer than the snapshot, which is the rare case.

Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm loads): 354.5ms -> 227.4ms median, ~36% faster, fully reproducible with zero overlap between arms. statFile is also markedly more resilient to machine load (far fewer syscalls), so the win holds up under contention.

Semantically identical: statFile follows symlinks like openFile+stat did, and returns the same Stat.mtime. No test changes needed; 676/676 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
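The two freshness checks can be compared side by side in Python, where `os.open` + `os.fstat` + `os.close` plays the role of openFile + stat + close and `os.stat` plays the role of statFile. The function names and the nanosecond mtime comparison are illustrative assumptions.

```python
import os
import tempfile


def is_newer_open_stat_close(path: str, snapshot_mtime_ns: int) -> bool:
    """Old shape: three syscalls per file (open, fstat, close)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.fstat(fd).st_mtime_ns > snapshot_mtime_ns
    finally:
        os.close(fd)


def is_newer_stat_only(path: str, snapshot_mtime_ns: int) -> bool:
    """New shape: one path-based stat, no file descriptor allocated."""
    return os.stat(path).st_mtime_ns > snapshot_mtime_ns


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    os.utime(path, ns=(2_000_000_000, 2_000_000_000))  # atime, mtime in ns
    print(is_newer_stat_only(path, 1_000_000_000),
          is_newer_open_stat_close(path, 1_000_000_000))
```

Both calls follow symlinks and report the same mtime, so the answer is the same; only the syscall count per file differs.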

…file (~23% faster load)

loadSnapshotFast parsed each restored file's record (path_len, path, content_len, content) with a separate readPositionalAll: ~4 syscalls per file, ~156k on a 39k-file repo. Read the whole content section once and parse records from memory instead: prefer a file-backed mmap (demand-paged, reclaimable, no heap spike), fall back to a single heap bulk-read if mmap is unavailable.

path/content stay owned allocations (memcpy'd out of the section), so the insert/free logic and the mmap lifetime (munmap after the loop) are unchanged and simple.

Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm loads): 229.2ms -> 177.3ms median, ~23% faster, zero overlap between arms. Stacks with the statFile freshness fix (this session): the openclaw load went 354 -> 229 -> 177ms across the two syscall optimizations.

676/676 tests pass (snapshot round-trip tests exercise this path under the DebugAllocator: no leaks, no double-free).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
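A sketch of the "map once, parse from memory" shape described above. The record fields (path_len, path, content_len, content) come from the commit message, but the field widths (little-endian u32) and the helper names are assumptions. As in this commit, path and content are copied out of the section, so nothing refers to the mapping after it is closed.

```python
import mmap
import os
import struct
import tempfile


def encode_records(files) -> bytes:
    """Write (path, content) pairs in the assumed record layout."""
    out = bytearray()
    for path, content in files:
        out += struct.pack("<I", len(path)) + path
        out += struct.pack("<I", len(content)) + content
    return bytes(out)


def parse_records(section):
    """Parse every record from one in-memory section (an mmap or bytes)."""
    records = []
    offset, end = 0, len(section)
    while offset < end:
        (path_len,) = struct.unpack_from("<I", section, offset)
        offset += 4
        path = bytes(section[offset:offset + path_len])  # owned copy
        offset += path_len
        (content_len,) = struct.unpack_from("<I", section, offset)
        offset += 4
        content = bytes(section[offset:offset + content_len])  # owned copy
        offset += content_len
        records.append((path, content))
    return records


def load_records(snapshot_path: str):
    """Prefer a read-only mmap; fall back to one bulk read if mapping fails."""
    with open(snapshot_path, "rb") as f:
        try:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return parse_records(mapped)
        except (ValueError, OSError):  # e.g. empty file or mmap unavailable
            f.seek(0)
            return parse_records(f.read())


files = [(b"src/main.zig", b"pub fn main() void {}\n"), (b"README.md", b"# demo\n")]
with tempfile.TemporaryDirectory() as root:
    snapshot = os.path.join(root, "content.section")
    with open(snapshot, "wb") as f:
        f.write(encode_records(files))
    print(load_records(snapshot) == files)
```

The sketch does not validate truncated records; a real loader would bounds-check each length before slicing.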

…nt copy (~8% faster load)

With the content section now mmap'd, the per-file `content` was still memcpy'd into a throwaway allocation and then duped again by contents.put: two copies. Pass the mmap slice directly: contents.put / indexFile* still dupe their owned copy into the cache, but the intermediate alloc + memcpy + free per file is gone (a full pass over all content + ~39k alloc/free pairs). The slice only ever borrows the section, which is mapped for the whole loop and unmapped after, and no consumer retains it, so lifetime is unchanged.

Measured A/B on openclaw/openclaw (~39k files, same snapshot, interleaved warm loads): 169.8ms -> 155.9ms median, ~8% faster, zero overlap. Across this session the openclaw load went 354 -> 229 (statFile) -> 177 (mmap read) -> 156ms.

676/676 tests pass; the snapshot round-trip tests exercise this under the DebugAllocator (no leak, no double-free, no use-after-free).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…t mmap (~17% faster load, ~237MB lower RSS)

On load, every restored file's content was duped into the ContentCache (an owned heap copy, ~268MB on a 39k-file repo) even though the snapshot is already mmap'd. Let the cache borrow those bytes directly:

- ContentCache gains a per-slot `value_owned` flag. `putBorrowed` stores a value that aliases external memory (value_owned=false); it is never freed by the cache (evict/remove/clear/deinit skip it). Keys stay cache-owned. put-update frees the old value only if it was owned, so owned<->borrowed transitions are safe (a later re-index of a file replaces its borrowed entry with an owned one).
- The Explorer adopts the content-section mmap (content_section_maps) and munmaps it at deinit, after contents.deinit, which leaves borrowed values untouched. The mmap stays valid after the fd is closed (POSIX) and across a tmp+rename snapshot rewrite (the old inode stays mapped).
- loadSnapshotFast uses putBorrowed when the content came from the adopted mmap; the heap-bulk-read fallback still dupes (owned). insertRestoredFile takes a borrow flag.

Net: the per-file content copy disappears entirely from the load (only the path keys are still duped), and the cache no longer holds ~268MB of owned content: the bytes live in the file-backed, reclaimable mmap.

Measured on openclaw/openclaw (~39k files, same snapshot, interleaved):
  load 166.2ms -> 138.3ms (16.8% faster)
  peak RSS ~795MB -> ~558MB (~237MB / 30% lower)

Adds ContentCache borrow tests (zero-copy aliasing, no-free-on-borrowed, mixed owned/borrowed transitions + eviction). 694/694 tests pass, no leaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ad (~14% faster load, ~100MB lower RSS)

A load profiler (new, gated by CODEDB_LOAD_PROFILE, near-zero cost when off) showed the remaining load cost on openclaw (~136ms) split outline ~28ms / freshness ~35ms / per-file insert ~76ms. ~20ms of the insert work was store.recordSnapshot's Wyhash over every file's full content, which also faults in the entire mmap'd content section, partly undoing the zero-copy RSS win.

Compute those hashes once at write time (when content is already being read) and store them in a new CONTENT_HASHES section (u64 per content record, in content order). The loader reads the precomputed hash for the restored/outline-only branches instead of re-hashing; the changed-file branch still hashes fresh disk content. Byte-identical to what it recorded before, just sourced from the snapshot, so no content pages are faulted at load. Absent section (older snapshots) => recompute, fully backward compatible.

Measured A/B on openclaw/openclaw (~39k files, same content_hashes-bearing snapshot; old binary ignores the section and recomputes):
  load 145.5ms -> 125.3ms (~14% faster)
  peak RSS ~558MB -> ~457MB (~100MB lower; content stays demand-paged)

Adds an order-alignment test (recorded Store hash == Wyhash(content) per file). 695/695 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
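The "hash at write time, read at load time" scheme might look like the sketch below. Python's standard library has no Wyhash, so an 8-byte BLAKE2b digest stands in for it; that substitution, the little-endian u64 encoding, and the function names are all assumptions. The fallback when the section is absent follows the commit message; the extra fallback on a length mismatch is an added assumption.

```python
import hashlib
import struct


def content_hash(content: bytes) -> int:
    """Stand-in 64-bit hash (BLAKE2b-64, NOT Wyhash)."""
    return int.from_bytes(hashlib.blake2b(content, digest_size=8).digest(), "little")


def write_hash_section(contents) -> bytes:
    """One u64 per content record, in content order."""
    return b"".join(struct.pack("<Q", content_hash(c)) for c in contents)


def hashes_for_load(contents, hash_section):
    """Use stored hashes when they line up with the records; otherwise recompute."""
    if hash_section is not None and len(hash_section) == 8 * len(contents):
        # No content bytes are touched on this path.
        return [h for (h,) in struct.iter_unpack("<Q", hash_section)]
    return [content_hash(c) for c in contents]  # older snapshot: rehash everything


contents = [b"pub fn main() void {}\n", b"# demo\n"]
section = write_hash_section(contents)
print(hashes_for_load(contents, section) == hashes_for_load(contents, None))
```

The stored path never reads the content, which is the property that keeps the mmap'd content pages from being faulted in during load.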

Graph-aware ranking (call-graph centrality), persisted centrality in the snapshot (skips the first-query rebuild), and edge-aware codedb_context (graph-resolved callees). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Documents the load-path speedups + RSS reduction (statFile freshness, mmap content read, borrow content, zero-copy ContentCache, stored content hashes, gated CODEDB_LOAD_PROFILE): openclaw load ~380ms -> ~125ms (~3x), peak RSS ~795MB -> ~457MB (~338MB lower). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… faster 16k load)

Restructure loadSnapshotFast into three passes: parse the CONTENT section into borrowed-slice records, fan the per-file statFile freshness check across workers, then run the existing sequential 3-way insert (changed/restored/outline-only). The parse pass also drops the old per-record path_buf alloc+memcpy+free (~1 pair/file): indexFile*/recordSnapshot dupe the path and the restored branch reuses the outline_states key, so paths can borrow the mmap'd content section directly. All Explorer/Store mutation stays single-threaded.

statFile is kernel/VFS-throughput-bound, not CPU-bound: a worker sweep on a 16k-file tree (M3 Ultra, 20 P-cores) measured freshness 14.1ms@1 -> 5.7ms@4 -> 9.5ms@8 -> 13.3ms@12, a U-curve bottoming at ~4 regardless of core count. So the worker count is capped at FRESHNESS_MAX_WORKERS=4 (CODEDB_LOAD_WORKERS overrides), gated above FRESHNESS_PARALLEL_THRESHOLD=256 files, with single-thread and spawn-failure fallbacks.

Final auto vs sequential on 16k files: freshness 13.2->5.7ms (2.3x), whole load 17.8->10.2ms (~43%).

Test: "snapshot: parallel freshness load re-indexes changed files, restores the rest" (288-file fixture forces the multi-worker path). zig build test 696/696.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
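The capped fan-out can be sketched with a thread pool. The constants and the CODEDB_LOAD_WORKERS variable are taken from the commit message; the override parsing, treating a failed stat as stale, and all function names are assumptions made for the sketch.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

FRESHNESS_MAX_WORKERS = 4
FRESHNESS_PARALLEL_THRESHOLD = 256


def is_stale(path, snapshot_mtime_ns):
    """Read-only check: stat the path and compare mtimes; no content is read."""
    try:
        return os.stat(path).st_mtime_ns > snapshot_mtime_ns
    except OSError:
        return True  # assumption: let the sequential pass deal with missing files


def worker_count(n_files, env=os.environ):
    """Sequential at or below the threshold, otherwise capped (override assumed)."""
    if n_files <= FRESHNESS_PARALLEL_THRESHOLD:
        return 1
    override = env.get("CODEDB_LOAD_WORKERS", "")
    if override.isdigit() and int(override) > 0:
        return int(override)
    return min(FRESHNESS_MAX_WORKERS, os.cpu_count() or 1)


def scan_freshness(paths, snapshot_mtime_ns, workers=None):
    """Return one stale flag per path, in input order."""
    workers = workers or worker_count(len(paths))
    if workers > 1:
        try:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(lambda p: is_stale(p, snapshot_mtime_ns), paths))
        except RuntimeError:
            pass  # could not start threads: fall through to the sequential scan
    return [is_stale(p, snapshot_mtime_ns) for p in paths]


def demo(n_files=300, changed=(3, 150, 299)):
    """Create a temp tree, mark a few files newer than the snapshot, scan it."""
    snapshot_ns = 2_000_000_000
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for i in range(n_files):
            path = os.path.join(root, f"f{i}.txt")
            with open(path, "wb") as f:
                f.write(b"x")
            mtime = 3_000_000_000 if i in changed else 1_000_000_000
            os.utime(path, ns=(mtime, mtime))
            paths.append(path)
        flags = scan_freshness(paths, snapshot_ns, workers=FRESHNESS_MAX_WORKERS)
        return [i for i, stale in enumerate(flags) if stale]


print(demo())
```

The workers only produce flags; any re-read of changed files would happen afterwards in a single-threaded pass, in line with the commit's note that all Explorer/Store mutation stays single-threaded.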

… Python class kind, find score floor)

Three correctness findings from the #518 audit:

- extractIdent / extractRubyMethodName accepted only ASCII identifier chars, so non-ASCII identifiers (e.g. Korean "def 한():") were dropped entirely: codedb_outline / codedb_symbol returned nothing for them. Now accept any non-ASCII UTF-8 byte (>= 0x80), capturing identifiers in non-Latin scripts.
- Python `class` was labeled .struct_def while every other language uses .class_def; Python now matches (Ruby `class` stays .struct_def, unchanged).
- codedb_find returned confident hits for queries that match no filename: the local Smith-Waterman alignment can clear the length-scaled score floor on a few incidental matches. Add a subsequence floor (query and path must share an in-order LCS of at least 60% of the query's chars), computed only for candidates already past the score floor, so non-matching files pay nothing.

Tests: issue-518 non-ASCII identifier, Python class_def, and fuzzy subsequence floor (with regression guards for typo-tolerant matches like mpc/mian/authmid). zig build test 699/699.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
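The subsequence floor can be sketched with a standard longest-common-subsequence table. Case folding, the integer form of the 60% comparison, and the example paths are assumptions; the queries mpc, mian, and authmid are the regression guards named in the commit message.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest in-order common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def passes_subsequence_floor(query: str, path: str, floor_percent: int = 60) -> bool:
    """True when the LCS covers at least floor_percent of the query's chars."""
    if not query:
        return False
    q, p = query.lower(), path.lower()  # case folding is an assumption
    return lcs_length(q, p) * 100 >= floor_percent * len(q)


for query, path in [
    ("mian", "src/main.zig"),            # transposition typo, should still match
    ("mpc", "src/mcp.zig"),
    ("authmid", "auth_middleware.zig"),
    ("zzqxj", "src/main.zig"),           # shares almost nothing with the path
]:
    print(query, path, passes_subsequence_floor(query, path))
```

The table costs O(len(query) × len(path)) per candidate, which is why it makes sense to run it only on candidates that already cleared the alignment score floor.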

…sert pass

Refines the parallel freshness scan (109775c). The workers now only statFile and flag stale files (no content read, no allocation), so they are pure read-only and trivially thread-safe. The insert pass (Pass C) reads each stale file's fresh content sequentially with the load allocator, exactly as the original pre-parallel code did, so peak memory is one file's content instead of holding every changed file's content from the parallel pass, and the page-allocator hop is gone.

The parallelized stat (the measured win) is unchanged, and the 0-changed common case is identical in cost. Sanity A/B on codedb's own 664-file index: freshness 0.70ms@1 -> 0.30ms@auto. zig build test 699/699.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42e67f6ffe


chatgpt-codex-connector · 2026-06-02T15:37:53Z

+  # Auto-install mcpsync if missing (same trust domain as codedb). Never fail the
+  # codedb install if this is unavailable.
+  if [ -z "$mcpsync_bin" ]; then
+    curl -fsSL https://mcpsync.codegraff.com | bash >/dev/null 2>&1 || true


Don't pipe the mcpsync installer into bash

When mcpsync is absent, the installer now executes whatever https://mcpsync.codegraff.com returns with bash and suppresses all output, without any version pin, checksum/signature verification, or user consent. The repo's AGENTS.md explicitly calls out installer scripts that execute untrusted code or skip verification; in this scenario a compromised DNS/TLS/CDN endpoint turns a normal codedb install into arbitrary code execution. Prefer a pinned artifact with checksum/signature verification, or skip registration and print manual instructions.
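The checksum half of that recommendation can be sketched as follows. This is a generic illustration, not the project's installer: the payload and the pinned digest are made up, and a real installer would ship the expected digest with the release rather than compute it next to the download.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Compare the payload's SHA-256 against a pinned hex digest."""
    actual = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Hypothetical artifact and pin, for illustration only.
artifact = b"#!/bin/sh\necho installed\n"
pinned_digest = hashlib.sha256(artifact).hexdigest()

if verify_sha256(artifact, pinned_digest):
    print("digest matches; safe to hand to the next install step")
else:
    print("digest mismatch; skip registration and print manual instructions")
```

Executing the artifact only after the digest matches means a tampered response from the endpoint produces a visible failure instead of silent code execution.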


chatgpt-codex-connector · 2026-06-02T15:37:53Z

+        _ = cache.diag.appendIfFresh(alloc, out, path, result.new_hash);
+        const lang = explore_mod.detectLanguage(path);
+        if (cache.linter.shouldTry(lang) and cache.diag.tryBeginWork(path, result.new_hash)) {
+            spawnLintWorker(cache, path, result.new_hash, lang);


Run linter jobs from the project root

In the MCP startup modes covered by the repo guidance/e2e tests, the server can be spawned from / and learn the actual project root through the roots handshake; this call passes the repo-relative path (for example src/foo.py) to the detached linter, and runCapture is invoked without a cwd, so ruff/biome/etc. look under the process cwd rather than the project root. With linters enabled, the first edit in that context reports the file as missing and marks the language unavailable for the whole session, disabling the new diagnostics feature until reconnect. Pass an absolute path or set the worker cwd to cache.default_path.
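Both suggested fixes can be shown with a generic subprocess sketch. `run_capture` and `lint_target` are hypothetical Python stand-ins, not the project's Zig functions, and the child process is a Python one-liner standing in for a linter.

```python
import os
import subprocess
import sys
import tempfile


def run_capture(argv, project_root):
    """Run a tool with its working directory pinned to the project root."""
    return subprocess.run(argv, cwd=project_root, capture_output=True, text=True, check=False)


def lint_target(project_root, relative_path):
    """Alternative fix: hand the tool an absolute path instead of a relative one."""
    return os.path.join(os.path.abspath(project_root), relative_path)


def demo():
    """A child that resolves 'src/foo.py' relative to its own working directory."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))
        with open(os.path.join(root, "src", "foo.py"), "w") as f:
            f.write("x = 1\n")
        probe = [sys.executable, "-c", "import os; print(os.path.exists('src/foo.py'))"]
        result = run_capture(probe, root)
        return result.stdout.strip(), os.path.exists(lint_target(root, "src/foo.py"))


print(demo())
```

Either approach makes the repo-relative path resolve no matter which directory the server process was started from.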


…nch)

resolveCallees called findAllSymbols once per callee (~18x per codedb_context), and findAllSymbols runs an O(all-symbols) safety scan over every outline plus a full result-list allocation per name. On the bench corpus (22 files but ~thousands of symbols) that scan is the codedb_context regression the #524 bench flagged.

Resolve directly off symbol_index instead: O(defs-of-that-name), zero allocation beyond the one path dupe per shown callee, same high-precision unique-match semantics (exactly one non-test function/method, skip self, skip ubiquitous names). The symbol index is rebuilt on every commit, so it is authoritative for this best-effort feature. zig build test 699/699.

The effect is below the noise floor in a local microbench on an M3 Ultra (thousands of string compares are sub-noise there) but dominant on the slower CI runner, so the CI bench is the real measure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-02T16:19:35Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

| Tool | Base (ns) | Head (ns) | Delta | Abs Delta (ns) | Status |
| --- | ---: | ---: | ---: | ---: | --- |
| `codedb_bundle` | 91184 | 90916 | -0.29% | -268 | OK |
| `codedb_changes` | 11887 | 11204 | -5.75% | -683 | OK |
| `codedb_context` | 1174704 | 1178512 | +0.32% | +3808 | OK |
| `codedb_deps` | 247 | 259 | +4.86% | +12 | OK |
| `codedb_edit` | 61277 | 49337 | -19.49% | -11940 | OK |
| `codedb_find` | 10824 | 10805 | -0.18% | -19 | OK |
| `codedb_hot` | 26857 | 29314 | +9.15% | +2457 | OK |
| `codedb_outline` | 31909 | 29333 | -8.07% | -2576 | OK |
| `codedb_read` | 15190 | 16034 | +5.56% | +844 | OK |
| `codedb_search` | 21994 | 23641 | +7.49% | +1647 | OK |
| `codedb_snapshot` | 121090 | 67285 | -44.43% | -53805 | OK |
| `codedb_status` | 10453 | 9891 | -5.38% | -562 | OK |
| `codedb_symbol` | 18025 | 18649 | +3.46% | +624 | OK |
| `codedb_tree` | 42701 | 34597 | -18.98% | -8104 | OK |
| `codedb_word` | 12589 | 11714 | -6.95% | -875 | OK |
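One way to read the status column is as a two-threshold rule applied only to slowdowns. The sketch below is an inference from the report's wording and its rows, not the CI script itself; the "FAIL" label and the strict comparisons are assumptions.

```python
PCT_THRESHOLD = 10.0      # percent
ABS_THRESHOLD_NS = 50_000  # nanoseconds


def classify(base_ns: int, head_ns: int) -> str:
    """OK, NOISE (percent exceeded, absolute delta small), or FAIL (both exceeded)."""
    delta = head_ns - base_ns
    pct = 100.0 * delta / base_ns
    if delta <= 0 or pct <= PCT_THRESHOLD:
        return "OK"  # improvements and small slowdowns pass
    if delta <= ABS_THRESHOLD_NS:
        return "NOISE"
    return "FAIL"


print(classify(121090, 67285))        # codedb_snapshot row: large improvement
print(classify(26857, 29314))         # codedb_hot row: +9.15%, under the percent gate
print(classify(1_000, 1_200))         # hypothetical: +20% but only +200 ns
print(classify(1_000_000, 1_200_000))  # hypothetical: +20% and +200,000 ns
```

Requiring both thresholds keeps very fast tools such as `codedb_deps`, where a dozen nanoseconds is several percent, from failing CI on jitter.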

justrach and others added 13 commits June 2, 2026 02:23

chatgpt-codex-connector Bot reviewed Jun 2, 2026


justrach changed the base branch from main to release/0.2.5824 June 2, 2026 15:55

justrach changed the title ~~Code-graph layer, ~3× faster snapshot load, and #518 correctness fixes~~ Snapshot load performance (~3×) + parallel freshness + #518 fixes Jun 2, 2026

justrach merged commit 18be8d1 into release/0.2.5824 Jun 2, 2026
1 check passed

justrach mentioned this pull request Jun 3, 2026

release: 0.2.5824 → main (perf overhaul + warm-CLI daemon + faster find + graph ranking + Windsurf/Devin) #527

Merged

justrach merged 14 commits into release/0.2.5824 from feat/code-graph
