release/0.2.5793: fuzzy-find exact basename ranking + version sync #364
Merged
The previous v0.2.5792 release shipped with src/release_info.zig at "0.2.579" while build.zig.zon was at "0.2.5792" — so binaries built from that release self-reported as 0.2.579 even though they were built from the 0.2.5792 source tree. Fixing both to 0.2.5793 here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reproducer from #363: querying 'cli.rs' against a workspace with several crates returns unrelated lib.rs files ahead of the actual crates/forge_main/src/cli.rs. The exact basename match should be ranked first, not fifth.

Cause (fixed in the next commit): fuzzyScore in src/explore.zig:4948 gives 'special entry points' (lib.rs, main.go, index.ts) a +5% multiplicative bonus regardless of whether the query asked for them. Combined with path-length normalization, which favors shorter paths, that pushes the actual cli.rs match below the lib.rs decoys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Querying 'cli.rs' against a multi-crate workspace returned four unrelated lib.rs files ahead of the actual cli.rs. Two compounding factors:

1. fuzzyScore gives lib.rs / main.go / index.ts a +5% special-entry-point bonus, applied regardless of whether the query asked for those names.
2. Path-length normalization (best_score / sqrt(path.len)) rewards shorter paths. lib.rs files with shorter parent paths beat the actual cli.rs match on length even when the cli.rs alignment was strictly better at the basename level.

Fix: when the query case-insensitively equals the filename, apply a 4x multiplier so the exact match dominates length normalization and the special-entry-point bonus on competing files. This mirrors fzf-style "exact match always wins" behavior.

Companion test in src/tests.zig from the previous commit (issue-363b: fuzzyFindFiles ranks exact basename match above unrelated lib.rs) now passes; all 424 tests green.

Closes #363 (item b: fuzzy filename ranking).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
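The interaction of the three factors above can be sketched in a few lines of Python. This is a toy model, not the code in src/explore.zig: `subsequence_hits` is a hypothetical stand-in for the real alignment score, and the example paths other than crates/forge_main/src/cli.rs are invented for illustration. Only the +5% bonus, the sqrt(path length) normalization, and the 4x exact-basename multiplier come from the commit message.

```python
import math
import posixpath

# Entry-point names that receive the +5% bonus (list taken from the commit message).
SPECIAL_ENTRY_POINTS = {"lib.rs", "main.go", "index.ts"}


def subsequence_hits(query: str, path: str) -> int:
    """Toy alignment score (assumption): query chars matched in order within path."""
    q = query.lower()
    hits = 0
    for ch in path.lower():
        if hits < len(q) and ch == q[hits]:
            hits += 1
    return hits


def fuzzy_score(query: str, path: str, exact_basename_boost: bool = True) -> float:
    # Length normalization: shorter paths score higher for the same alignment.
    score = subsequence_hits(query, path) / math.sqrt(len(path))
    basename = posixpath.basename(path)
    if basename in SPECIAL_ENTRY_POINTS:
        score *= 1.05  # applied regardless of what the query asked for
    if exact_basename_boost and basename.lower() == query.lower():
        score *= 4.0  # the fix: an exact basename match dominates the other factors
    return score


def rank(query: str, paths: list[str], exact_basename_boost: bool = True) -> list[str]:
    return sorted(
        paths,
        key=lambda p: fuzzy_score(query, p, exact_basename_boost),
        reverse=True,
    )


paths = [
    "crates/forge_app/src/lib.rs",
    "crates/forge_api/src/lib.rs",
    "crates/forge_ci/src/lib.rs",
    "crates/forge_fs/src/lib.rs",
    "crates/forge_main/src/cli.rs",
]
before = rank("cli.rs", paths, exact_basename_boost=False)
after = rank("cli.rs", paths, exact_basename_boost=True)
```

In this toy setup every path matches all six query characters as a subsequence, so without the boost the shorter, bonus-carrying lib.rs paths sort ahead of cli.rs; with the boost the exact basename match comes first.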
Document the version sync fix, #363b basename ranking fix, and honestly flag #363a as not-reproducible-in-isolation. Note that #357 (received keys diagnostic, shipped in 0.2.5792) already addresses #363c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmark Regression Report
Thresholds: 10.00% and 50,000 ns absolute delta
Sonnet 4.6 sub-agent driving the live MCP reproduced the bug
against this codedb repo: searchContent('searchContent') misses
src/explore.zig because doc-file hits saturate Tier 0's
result_list quota before the source-file hits at the end of the
word-index posting list are reached.
Test plants 4 doc files (4 mentions each = 16 hits), then 1
source file (2 hits, total 18). Picks max_results=10 so the
gate `word_hits.len <= max_results * 2` (18 <= 20) holds and
Tier 0 is the active code path. With doc files indexed first,
the source file's hits land at the end of the posting list and
get dropped when result_list fills.
This test fails on main; the fix in the next commit caps Tier 0
hits per file to ensure diversity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User reported that codedb_search query='pub struct ForgeApp' returned only architecture.md, missing the canonical Rust source file at crates/forge_app/src/app.rs:47 even though the phrase is a contiguous substring there. A Sonnet 4.6 agent driving the live MCP narrowed the bug down to single-token queries against repos where doc files mention the identifier many times.

Tier 0 of searchContent (explore.zig:1511) iterates word-index hits in posting-list (insertion) order and appends each (path, line) hit to result_list. When the gate `word_hits.len <= max_results * 2` holds, Tier 0 runs; once result_list reaches max_results it returns. Doc files indexed earlier than source files dominate the posting list, fill the quota, and source-file hits at the end never make it in.

Fix: in Tier 0, track hits per path and cap them at max(1, max_results / 5). For the default max_results=50 each file contributes at most 10 hits, leaving room for at least 5 distinct files in the result set. For small max_results the cap floors at 1 so diversity is preserved aggressively.

This is a minimal change: Tier 0 still runs first, the gate is unchanged, and the cap doesn't affect Tier 1+ paths. The display layer in mcp.zig already caps per-file output at 5 so user-visible behavior for high-density single-file matches is unchanged.

Companion failing test in src/tests.zig from the previous commit (issue-363a: searchContent surfaces source-file matches even when doc files dominate the word index) now passes; all 425 tests green.

Closes #363 (item a: literal search recall).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
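A minimal Python sketch of the per-file cap, assuming the posting list is simply a sequence of (path, line) pairs in insertion order. The function name `tier0_collect` and the file names are hypothetical; the cap formula max(1, max_results / 5) and the fixture shape (4 doc files with 4 hits each, then 1 source file with 2 hits, max_results=10) follow the commit messages.

```python
from collections import defaultdict


def tier0_collect(word_hits, max_results, cap_per_file=True):
    """Collect hits in posting-list order, optionally capping hits per path."""
    per_file_cap = max(1, max_results // 5)
    taken = defaultdict(int)  # path -> hits already accepted
    results = []
    for path, line in word_hits:
        if len(results) >= max_results:
            break  # quota reached, later hits are dropped
        if cap_per_file and taken[path] >= per_file_cap:
            continue  # this file has contributed its share; keep scanning
        taken[path] += 1
        results.append((path, line))
    return results


# Fixture mirroring the failing test: doc files are indexed first.
word_hits = [(f"docs/doc{i}.md", n) for i in range(4) for n in range(1, 5)]
word_hits += [("src/explore.zig", 10), ("src/explore.zig", 20)]

uncapped = tier0_collect(word_hits, max_results=10, cap_per_file=False)
capped = tier0_collect(word_hits, max_results=10, cap_per_file=True)
uncapped_paths = {path for path, _ in uncapped}
capped_paths = {path for path, _ in capped}
```

With 18 hits and max_results=10 the gate (18 <= 20) holds. Uncapped, the first ten hits all come from doc files; with a cap of 2 per file, each of the four doc files contributes two hits and the source file's two hits fill the remaining slots.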
Promote the release notes from "two fixes + #363a not reproduced" to "all of #363 resolved" now that the Tier 0 quota saturation fix landed. Restore the missing 0.2.5792 section header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fallback, and query received-keys diagnostic

Three reliability improvements landed in this release per the rewritten #356 scope (Agent Context Planner framing dropped):

- issue-356-1: codedb_query pipeline currently bails on first error, discarding successful prior-step output. Test asserts step 0 output survives + a "--- partial ---" tail names the failing step.
- issue-356-2: codedb_outline returns bare 'file not indexed' when the path isn't indexed. Test asserts a 'did you mean' hint with the closest fuzzy match (src/main.zig for 'src/man.zig').
- issue-356-3: codedb_query step missing-arg errors should surface 'received keys: [...]' mirroring the #357 bundle diagnostic.

All three fail on main; fixes in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
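The 'did you mean' behavior that issue-356-2 asserts can be approximated as follows. This is only a sketch: Python's `difflib.get_close_matches` stands in for codedb's own fuzzy matcher, `outline_error` is a hypothetical name, and the exact output layout (header line, indentation) is an assumption. The error prefix and the limit of 3 suggestions are taken from the fix commit's description.

```python
import difflib


def outline_error(path, indexed_paths, max_suggestions=3):
    """Build a 'file not indexed' error, adding close indexed paths when any exist."""
    message = f"error: file not indexed: {path}"
    # difflib is a stand-in here; codedb uses its own fuzzy scoring.
    suggestions = difflib.get_close_matches(
        path, indexed_paths, n=max_suggestions, cutoff=0.6
    )
    if not suggestions:
        return message  # nothing close enough: keep the bare error
    lines = [message, "did you mean:"]
    lines += [f"  {candidate}" for candidate in suggestions]
    return "\n".join(lines)


indexed = ["src/main.zig", "src/mcp.zig", "src/explore.zig", "build.zig"]
hint = outline_error("src/man.zig", indexed)
bare = outline_error("zzz", indexed)
```

For the mistyped 'src/man.zig', the closest candidate is src/main.zig, so it is listed first under the header; a path with no close candidate falls back to the bare error.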
Three reliability improvements landing as the rewritten phase 1 of #356 (Agent Context Planner framing dropped: codedb stays a tool, agent stays in charge of composition):

1. codedb_query: partial results when a step fails. Pipeline previously bailed on the first error and discarded successful prior-step output. New behavior: the prior-step output is preserved and a structured "--- partial ---" tail names the failing step + reason. Callers can recover from a single bad step instead of starting over.
2. codedb_outline: fuzzy path fallback. A non-indexed path used to return a bare 'error: file not indexed: <path>'. Now appends up to 3 fuzzy-matched indexed paths under a 'did you mean:' header, so an agent that mistypes a path can self-correct without a separate codedb_find round-trip.
3. codedb_query: received-keys diagnostic on missing-arg errors. Mirrors the #357 codedb_bundle diagnostic. When a step fails with e.g. 'search needs query' but the step actually has a 'q' key instead, callers see 'received keys: [op, q]' so they can tell whether codedb dropped the field or the client sent it under the wrong name.

Implementation:

- New helper appendFuzzyPathSuggestions in mcp.zig, used in handleOutline's "file not indexed" path.
- New helper finishQueryWithFailure in mcp.zig, which emits both the received-keys diagnostic and the partial-results tail. Wired into the missing-arg error sites for op-detection, find, search, word, and symbol pipeline ops.
- Other error paths in handleQuery (transient OOM, "needs prior step" for non-leading ops) keep the previous behavior; they don't surface received-keys but they will get partial-results coverage in a follow-up.

Tests: src/tests.zig adds three issue-356 cases (one per item) which fail on main and pass with this commit. All 428 tests green.

Closes parts of #356 (phase 1 reliability scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
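Items 1 and 3 can be illustrated together with a small Python model of a pipeline runner and a client-side parser. Everything here is a sketch under assumptions: `run_pipeline`, `parse_partial_tail`, and the handler functions are hypothetical names, the ordering of the diagnostic lines is a guess, and keys are sorted only to make the output deterministic. The tail shape (`--- partial ---`, `failed_at: N`, `reason: <text>`) and the `received keys: [op, q]` example follow the PR text.

```python
def find(step):
    if "query" not in step:
        raise ValueError("find needs query")
    return f"found: {step['query']}"


def search(step):
    if "query" not in step:
        raise ValueError("search needs query")
    return f"results for: {step['query']}"


HANDLERS = {"find": find, "search": search}


def run_pipeline(steps):
    """Run steps in order; on a missing-arg failure keep prior output and add a tail."""
    out = []
    for index, step in enumerate(steps):
        try:
            out.append(HANDLERS[step["op"]](step))
        except ValueError as err:
            out.append(f"error: {err}")
            out.append(f"received keys: [{', '.join(sorted(step))}]")
            out.append("--- partial ---")
            out.append(f"failed_at: {index}")
            out.append(f"reason: {err}")
            break
    return "\n".join(out)


def parse_partial_tail(text):
    """Return None for a clean run, else the prior output plus failure fields."""
    marker = "--- partial ---\n"
    if marker not in text:
        return None
    prior, tail = text.split(marker, 1)
    fields = dict(line.split(": ", 1) for line in tail.splitlines() if ": " in line)
    return {
        "prior_output": prior,
        "failed_at": int(fields["failed_at"]),
        "reason": fields["reason"],
    }


# Step 1 sends the query under the wrong key name ('q' instead of 'query').
response = run_pipeline([
    {"op": "find", "query": "cli.rs"},
    {"op": "search", "q": "ForgeApp"},
])
parsed = parse_partial_tail(response)
```

A client that sees the tail can keep the output of step 0, read `failed_at` to locate the bad step, and use the received-keys line to notice that `q` was sent where `query` was expected.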
Summary
All of #363 + phase 1 of #356 — search recall, ranking, query reliability, and ergonomics. Closes #363; closes phase 1 scope of #356 (issue stays open for future phases).
- a6741ef (test) + caaf507 (fix)
- 82c7598 (test) + 3b0650b (fix)
- codedb_query partial results: 3206638 (test) + 615e89b (fix)
- codedb_outline fuzzy path fallback: 3206638 (test) + 615e89b (fix)
- codedb_query received-keys diagnostic: 3206638 (test) + 615e89b (fix)

Plus stale release_info.zig version sync (e33a90b).

How #363a was found
Sonnet 4.6 sub-agent driving the live MCP reproduced an analogous bug against this codedb repo:
codedb_search query="searchContent" regex=false returned doc files but missed src/explore.zig itself. Root cause: Tier 0 of searchContent (explore.zig:1511) iterates word-index hits in posting-list order and saturates result_list with hits from heavily-mentioning files before source files indexed later get reached. Cap-per-file in Tier 0 (max(1, max_results / 5)) restores diversity.

How #363b was found
Direct code review.
fuzzyScore at explore.zig:4948 gives lib.rs / main.go / index.ts a +5% special-entry-point bonus regardless of query. Combined with length normalization, this pushed the actual cli.rs exact-basename match below unrelated lib.rs files. The failing test reproduces this with the user's exact path layout. Fix: 4× multiplier on exact basename match.

#356 phase 1 scope (Agent Context Planner framing dropped)
The issue body was rewritten today to drop the "Agent Context Planner" framing — codedb stays a tool, agents stay in charge of composition. No automatic step composition, no natural-language intent parsing. Just three small reliability improvements:
1. Partial results in codedb_query. Pipeline previously bailed on the first error and discarded prior-step output. Now prior-step output is preserved and a structured --- partial --- tail names the failing step + reason.
2. Fuzzy path fallback in codedb_outline. The bare error: file not indexed is now followed by did you mean: with up to 3 fuzzy matches.
3. Received-keys diagnostic on codedb_query missing-arg errors. Mirrors the bundle diagnostic from #357 (MCP bundle should preserve nested tool arguments for codedb_outline). Wired through op-detection, find, search, word, and symbol error paths. Other error sites (transient failures, "needs prior step") keep previous behavior; incremental coverage in follow-up.

Test plan
- zig build test: 428/428 pass
- … /Users/blackfloofie/bin/codedb for live verification
- … (mcp.zig:1095 already caps per-file output at 5)
- … (--- partial ---\nfailed_at: N\nreason: <text>) is parseable by agent clients

Commits
🤖 Generated with Claude Code