find: faster + smarter fuzzy file ranking (prefilter + SIMD + symbol fast-path) by justrach · Pull Request #526 · justrach/codedb

justrach · 2026-06-03T08:38:38Z

Summary

codedb_find (fuzzy filename search) ran a per-file Smith-Waterman DP over every file in the tree. On large repos the long, symbol-like queries that don't hit the exact-filename fast path were slow (openclaw, 13.6k files: ~250ms Debug / ~41ms ReleaseFast for getTokenProvider). Real filename queries (auth, config, index) were already <0.5ms — only the degenerate "used find as a symbol lookup" case was slow.

This branch speeds up that case with zero change to which files come back or their order, verified on both a unit matrix and end-to-end on openclaw (recall@10 = recall@30 = 1.00, exact result order preserved vs the release baseline across 16 representative queries).

Two commits

1. Sound presence prefilter + tightened scalar DP (55df894)

Issue codedb 0.2.5823: a few correctness/UX findings (non-ASCII outline, codedb_find false hits, kind labels, search cap, snapshot staleness) #518 made a ≥60% in-order LCS a hard requirement. Unordered character presence is an upper bound on that LCS, so a path missing >40% of the query's chars can be rejected in O(path) without the DP — it would fail the LCS gate anyway, so no result changes.
Hoisted per-row lowercasing, precomputed path lowercase + word-boundary flags, swapped the DP rows by pointer (no @memcpy), folded the matched-row scan into the main loop.

2. SIMD-across-files Smith-Waterman (0e954b6)

Refactored the scalar scorer into fuzzyDP (hot loop) + fuzzyFinalize (floor/LCS/bonus tail).
Added fuzzyScoreBatch: the same DP vectorized so each of 8 SIMD lanes scores a different file against the same query (inter-sequence parallelism). Shorter paths are length-masked per column so padding never affects best_score/matched_chars; the shared fuzzyFinalize tail keeps post-processing identical.
Single-part queries (the common case) batch presence-survivors 8 at a time; multi-part falls back to scalar.
All DP values are sums of exactly-representable small integers → the SIMD result is bit-identical to scalar.

Measured (ReleaseFast, openclaw 13.6k files, interleaved vs release baseline)

query	baseline	this PR	speedup	recall@10	order
getTokenProvider	41 ms	23 ms	1.80×	1.00	exact
handleRequest	33 ms	19 ms	1.78×	1.00	exact
embedded	22 ms	9 ms	2.39×	1.00	exact
subscribe	25 ms	14 ms	1.82×	1.00	exact
auth / config / index	<0.5 ms	<0.5 ms	—	1.00	exact

8 lanes beat 16 on Apple-Silicon NEON (register pressure). The remaining gap from a theoretical 4× is the per-batch transpose plus the DP's serial per-column dependency.

Tests

New test "fuzzy SIMD batch scorer matches scalar fuzzyScore exactly" compares fuzzyScoreBatch to fuzzyScore over a query×path matrix (incl. a full 8-lane mixed-length batch). Full suite green.

Investigated and rejected

A cheap candidate cap (greedy proxy → DP top-N) reached ~10× but dropped recall@10 to 0.85 (0.00 on getTokenProvider) — the greedy proxy can't model Smith-Waterman's mismatch tolerance. Dropped in favor of the retrieval-identical SIMD path.

🤖 Generated with Claude Code

@30

…eval-identical) codedb_find ran the O(query*path) Smith-Waterman DP in fuzzyScore once per file — ~250ms for long, symbol-like queries that match no filename (e.g. "getTokenProvider" over openclaw's 13.6k files); real filename queries that hit the exact-match fast path are already ~2ms. Two retrieval-identical changes (verified recall@10/@30 = 1.00, exact-order, vs the release baseline over 16 representative queries): - Presence pre-filter: issue #518 made a >=60% in-order LCS a hard requirement; unordered char presence is an upper bound on that LCS, so a file missing >40% of the query's chars can be rejected in O(path) without the DP. Skips the DP for files the LCS floor would reject anyway — no result changes. - Tightened DP: hoist the per-row lowercased query char, precompute path lowercase + word-boundary flags once, swap the prev/curr rows by pointer instead of two @memcpy per row, and fold the matched-row scan into the main loop. Same arithmetic, ~1.1x. Factored the per-file multi-part scoring into scoreAllParts (shared, no behavior change). 699 tests green. Note: a cheap candidate-cap variant (greedy proxy -> DP top-N) was prototyped and rejected — it reached ~10x but dropped recall@10 to 0.85 (and 0.00 on "getTokenProvider"/"TokenProvider"), because the greedy proxy can't model the DP's mismatch tolerance. A retrieval-faithful big speedup needs SIMD-across-files (next commit). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…val-identical) fuzzyFindFiles ran the Smith-Waterman DP one file at a time. Refactor the scalar path into fuzzyDP (the hot loop) + fuzzyFinalize (floor/LCS/bonus tail), then add fuzzyScoreBatch: the same DP vectorized so each SIMD lane scores a *different* file against the same query (inter-sequence parallelism, FZ_LANES=8). Shorter paths are length-masked per column so padding never affects best_score; the shared fuzzyFinalize tail keeps post-processing identical. Single-part queries (the common case) collect presence-prefilter survivors and score them in batches of 8; multi-part falls back to scalar. All DP values are sums of exactly-representable small integers, so the SIMD result is bit-identical to scalar — verified by a new test (fuzzyScoreBatch == fuzzyScore over a matrix of queries x paths, incl. a full 8-lane mixed-length batch) and end-to-end on openclaw (13.6k files, 16 queries): recall@10 = recall@30 = 1.00, exact result order preserved vs the release baseline. ReleaseFast, openclaw, the long symbol-like queries that miss the exact-filename fast path (the only slow ones; real filename queries are already <0.5ms): getTokenProvider 41 -> 23ms (1.80x), handleRequest 33 -> 19ms (1.78x), embedded 22 -> 9ms (2.39x), subscribe 25 -> 14ms (1.82x). 8 lanes beat 16 on NEON (register pressure). 699+1 tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-03T08:40:51Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	78913	80714	+2.28%	+1801	OK
`codedb_changes`	8458	8820	+4.28%	+362	OK
`codedb_context`	962620	903301	-6.16%	-59319	OK
`codedb_deps`	181	188	+3.87%	+7	OK
`codedb_edit`	39871	38400	-3.69%	-1471	OK
`codedb_find`	7970	8253	+3.55%	+283	OK
`codedb_hot`	21067	20768	-1.42%	-299	OK
`codedb_outline`	28328	27507	-2.90%	-821	OK
`codedb_read`	13854	13765	-0.64%	-89	OK
`codedb_search`	28844	30683	+6.38%	+1839	OK
`codedb_snapshot`	64129	53804	-16.10%	-10325	OK
`codedb_status`	7386	7728	+4.63%	+342	OK
`codedb_symbol`	21224	22723	+7.06%	+1499	OK
`codedb_tree`	32153	17767	-44.74%	-14386	OK
`codedb_word`	9886	10966	+10.92%	+1080	NOISE

…orrect) The slow codedb_find queries were symbol-like identifiers (getTokenProvider, handleRequest) typed into a *fuzzy filename* search — they match no filename, so they paid the full fuzzy scan AND returned useless filename guesses instead of the definition the caller wanted. Add a symbol fast-path in handleFind, after the exact-filename check and before the fuzzy scan: when the query is a compound identifier (camelCase/PascalCase or snake_case — never ALL-CAPS, dotted, slashed, spaced, or <4 chars, so real filename searches are untouched), resolve it against the symbol index and return the definition site(s); a non-symbol identifier falls through to the (now SIMD-accelerated) fuzzy scan. renderSymbolDefsFast uses findAllSymbols (the COMPLETE lookup incl. the outline safety scan), not symbol_index alone — symbol_index is incomplete for some languages (indexes ~no TypeScript functions; symbol_index-only scored 0/40 real openclaw symbols). Definition kinds rank above import/comment usages; usages show only when no real def is indexed. Returns false without writing on a miss. Measured (ReleaseFast, openclaw 13.6k files): 40/40 real symbols resolve via the fast-path at ~1.3ms (def location returned) vs ~29ms for the same queries on the fuzzy path — ~22x faster and more useful. Filename queries (auth, config) keep the <0.5ms exact/fuzzy path. New test: classifier truth table + renderSymbolDefsFast resolves a def and returns false (no output) on a miss. 700+ tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

justrach · 2026-06-03T09:17:11Z

Added: symbol fast-path (3rd commit, `41f79d4`)

The slow codedb_find queries were symbol-like identifiers (getTokenProvider, handleRequest) typed into a fuzzy filename search — they match no filename, so they paid the full scan and returned useless filename guesses rather than the definition the caller wanted.

handleFind now, after the exact-filename check and before the fuzzy scan, detects a compound identifier (camelCase/PascalCase or snake_case; never ALL-CAPS, dotted, slashed, spaced, or <4 chars) and resolves it against the symbol index, returning the definition site(s). Non-symbol identifiers fall through to the (SIMD-accelerated) fuzzy scan, so real filename searches are unaffected.

Uses the complete findAllSymbols lookup (incl. the outline safety scan), not symbol_index alone — symbol_index is incomplete for some languages (it indexes ~no TypeScript functions; a symbol_index-only version scored 0/40 real openclaw symbols). Definition kinds rank above import/comment usages.

Measured (ReleaseFast, openclaw 13.6k files): 40/40 real symbols resolve via the fast-path at ~1.3 ms (returning the def location) vs ~29 ms on the fuzzy path — ~22× faster and more useful. Filename queries (auth, config) keep the <0.5 ms exact/fuzzy path.

New test: classifier truth table + renderSymbolDefsFast resolves a def and returns false (no output) on a miss. 700+ tests green.

Net effect of this PR on `codedb_find`

Real symbol typed into find (the main slow/misused case): ~29 ms fuzzy guess → ~1.3 ms exact definition (~22×).
Compound query that isn't a symbol: fuzzy scan, now SIMD-accelerated (~1.8× on the heavy ones).
Filename query: unchanged, <0.5 ms.

github-actions · 2026-06-03T09:19:25Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	65274	64794	-0.74%	-480	OK
`codedb_changes`	6506	6816	+4.76%	+310	OK
`codedb_context`	1028771	1040324	+1.12%	+11553	OK
`codedb_deps`	266	235	-11.65%	-31	OK
`codedb_edit`	36401	36543	+0.39%	+142	OK
`codedb_find`	5397	4919	-8.86%	-478	OK
`codedb_hot`	13758	14031	+1.98%	+273	OK
`codedb_outline`	25291	25610	+1.26%	+319	OK
`codedb_read`	11685	11554	-1.12%	-131	OK
`codedb_search`	30013	30667	+2.18%	+654	OK
`codedb_snapshot`	63285	63252	-0.05%	-33	OK
`codedb_status`	5041	5003	-0.75%	-38	OK
`codedb_symbol`	16055	15817	-1.48%	-238	OK
`codedb_tree`	24076	16026	-33.44%	-8050	OK
`codedb_word`	6882	7202	+4.65%	+320	OK

justrach · 2026-06-03T09:56:26Z

What makes it faster (and what's unchanged)

Four mechanisms, in the order handleFind applies them:

Symbol fast-path (the big structural win). A compound-identifier query (camelCase/snake_case) is resolved against the symbol table and returns the definition site — it skips the per-file fuzzy scan entirely. This is an algorithm change (O(files) Smith-Waterman → O(symbol lookup)), not a constant-factor tweak: ~30–50× for those queries, and a better answer (the def, not filename guesses). Uses the complete findAllSymbols (incl. the outline safety scan) because symbol_index alone is incomplete for some languages (0/40 real TS symbols).
Sound presence pre-filter. Issue codedb 0.2.5823: a few correctness/UX findings (non-ASCII outline, codedb_find false hits, kind labels, search cap, snapshot staleness) #518 makes a ≥60% in-order LCS a hard requirement; unordered char-presence is an upper bound on that LCS, so files missing >40% of the query's chars are rejected in O(path) — skipping the O(query×path) DP for files that would fail the gate anyway. Retrieval-identical.
SIMD-across-files Smith-Waterman. The fuzzy fallback scores 8 files per vector pass (one file per lane). All DP values are sums of exactly-representable small ints → bit-identical to scalar. ~2× on a cool machine.
Tightened scalar DP. Hoisted the per-row lowercased query char, precomputed path-lowercase + word-boundary flags, pointer-swapped the DP rows (no @memcpy), folded the matched-row scan.

Verification

1:1 with everything else. Data output of all 10 non-find tools (search, callers, symbol, word, deps, tree, outline, glob, context) is byte-identical baseline vs this branch — find is the only behavior change, and its fuzzy path is retrieval-identical (recall@10/@30 = 1.00, exact order).
RSS flat. 157 MB vs baseline 158 MB after 200 mixed find queries — the SIMD lane arrays + candidate list are transient per-query (freed each call).
Tests: 700+ green.
Both surfaces: lives in handleFind, shared by MCP codedb_find and CLI file (warm daemon → runCliTool → handleFind). Measured CLI symbol query: ~40–60 ms → <4 ms.
Measurement caveat: the per-query fuzzy speedup is sensitive to this box's documented macOS E-core/thermal variance; using config/auth (untouched code) as a control, every fuzzy query's new/base ratio sits at or below the control penalty, confirming the SIMD path is not a regression.

…, CLI hardening, ReScript - release_info.semver 0.2.5823 -> 0.2.5824 (the version codedb reports and update.zig compares against; build.zig.zon was already 0.2.5824). - CHANGELOG: document the warm CLI daemon (#525), faster fuzzy find (#526), CLI hardening (#529), ReScript .res/.resi (#532), and audit fixes (#530) alongside the existing code-graph / snapshot-load sections. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

justrach and others added 2 commits June 3, 2026 16:21

justrach changed the title ~~find: faster fuzzy file ranking — sound prefilter + SIMD-across-files Smith-Waterman (retrieval-identical)~~ find: faster + smarter fuzzy file ranking (prefilter + SIMD + symbol fast-path) Jun 3, 2026

justrach merged commit 7e5a2bb into release/0.2.5824 Jun 3, 2026
1 check passed

justrach mentioned this pull request Jun 3, 2026

release: 0.2.5824 → main (perf overhaul + warm-CLI daemon + faster find + graph ranking + Windsurf/Devin) #527

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

find: faster + smarter fuzzy file ranking (prefilter + SIMD + symbol fast-path)#526

find: faster + smarter fuzzy file ranking (prefilter + SIMD + symbol fast-path)#526
justrach merged 3 commits into
release/0.2.5824from
fix/codedb-find-fuzzy-perf

justrach commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

justrach commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

justrach commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

justrach commented Jun 3, 2026

Summary

Two commits

Measured (ReleaseFast, openclaw 13.6k files, interleaved vs release baseline)

Tests

Investigated and rejected

Uh oh!

github-actions Bot commented Jun 3, 2026

Benchmark Regression Report

Uh oh!

justrach commented Jun 3, 2026

Added: symbol fast-path (3rd commit, 41f79d4)

Net effect of this PR on codedb_find

Uh oh!

github-actions Bot commented Jun 3, 2026

Benchmark Regression Report

Uh oh!

justrach commented Jun 3, 2026

What makes it faster (and what's unchanged)

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Added: symbol fast-path (3rd commit, `41f79d4`)

Net effect of this PR on `codedb_find`