Problem
The post-pass rerank in Explorer.searchContent (src/explore.zig:1700-1712) is a single-signal scorer:
    for (result_list.items) |*r| {
        r.score = countOccurrences(r.line_text, query);
    }
    std.sort.block(SearchResult, result_list.items, {}, struct {
        pub fn lessThan(_: void, a: SearchResult, b: SearchResult) bool {
            if (a.score != b.score) return a.score > b.score;
            const ord = std.mem.order(u8, a.path, b.path);
            if (ord != .eq) return ord == .lt;
            return a.line_num < b.line_num;
        }
    }.lessThan);
It counts occurrences inside one line and tiebreaks by (path asc, line_num asc). That ignores three signals an experienced reader would weight heavily:
- Basename match. When the query is widgetX and one of the candidate files is src/widgetX.zig, the developer is almost certainly asking about that file. Today the alphabetical tiebreaker promotes src/unrelated.zig over src/widgetX.zig when both have one occurrence.
- Path prior. Hits in examples/, tests/, vendor/, node_modules/ are usually less relevant than hits in src/, lib/. Today examples/... outranks src/... simply because e < s.
- Symbol-definition lines. A line that defines a symbol named after the query is the canonical hit. A passing comment mention with the same per-line count should rank below it. Today both score 1, so the alphabetical tiebreaker decides.
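The first two signals come down to plain path checks. A minimal Zig sketch, assuming '/'-separated relative paths like the ones in the examples above. The bodies of basenameStem and hasSegment are illustrative guesses, not the code in src/explore.zig, and std.mem.splitScalar assumes a Zig release that provides it (0.11 or later):

```zig
const std = @import("std");

/// Hypothetical helper: file name without directory or extension,
/// e.g. "src/widgetX.zig" -> "widgetX".
fn basenameStem(path: []const u8) []const u8 {
    const start = if (std.mem.lastIndexOfScalar(u8, path, '/')) |i| i + 1 else 0;
    const name = path[start..];
    const dot = std.mem.lastIndexOfScalar(u8, name, '.') orelse name.len;
    // Keep dotfiles such as ".gitignore" whole instead of returning "".
    return if (dot == 0) name else name[0..dot];
}

/// Hypothetical helper: true when any '/'-separated component of `path`
/// equals `segment` exactly (so "examples_util.zig" does not count).
fn hasSegment(path: []const u8, segment: []const u8) bool {
    var it = std.mem.splitScalar(u8, path, '/');
    while (it.next()) |part| {
        if (std.mem.eql(u8, part, segment)) return true;
    }
    return false;
}

test "path signals on the paths used in this issue" {
    try std.testing.expectEqualStrings("widgetX", basenameStem("src/widgetX.zig"));
    try std.testing.expect(hasSegment("examples/sample.zig", "examples"));
    try std.testing.expect(!hasSegment("src/sample.zig", "examples"));
}
```

Matching whole components rather than substrings keeps a file like src/examples_util.zig from being penalized by accident.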
Failing Tests
Three tests on branch issue-429-failing-test (commit c4c9056). Each demonstrates one signal in isolation. All three fail today:
    test "issue-429-a: searchContent rerank boosts files whose basename matches the query" {
        // src/unrelated.zig and src/widgetX.zig, both with one hit. Expected:
        // src/widgetX.zig (basename match) ranks first. Today it doesn't.
        try testing.expectEqualStrings("src/widgetX.zig", results[0].path);
    }
    test "issue-429-b: searchContent rerank penalizes test/vendor/examples paths" {
        // examples/sample.zig and src/sample.zig, both with one hit. Expected:
        // src/sample.zig ranks first.
        try testing.expectEqualStrings("src/sample.zig", results[0].path);
    }
    test "issue-429-c: searchContent rerank boosts lines that are symbol definitions" {
        // aaa.zig (comment mention) and zzz_def.zig (`pub fn fooSym() void {}`).
        // Expected: zzz_def.zig ranks first (symbol-def boost).
        try testing.expectEqualStrings("zzz_def.zig", results[0].path);
    }
Expected
searchContent's rerank composes (at minimum):
- Per-line occurrence count (existing)
- Basename-match boost
- Path-prior penalty for examples/, tests/, vendor/, node_modules/
- Symbol-definition boost (when the line is a symbol definition for the query, looked up via outline)
with weights so the existing per-line frequency signal still wins on its own when no other signal applies.
Fix
Replace the single-pass countOccurrences-only score with a composed scorer in searchContent:
    fn scoreResult(r: SearchResult, query: []const u8, outline_lookup: ...) f32 {
        var score: f32 = @floatFromInt(countOccurrences(r.line_text, query));
        if (basenameStem(r.path) matches query) score += 10.0;
        if (hasSegment(r.path, "tests") or hasSegment(r.path, "test")) score *= 0.6;
        if (hasSegment(r.path, "examples")) score *= 0.6;
        if (hasSegment(r.path, "vendor") or hasSegment(r.path, "node_modules")) score *= 0.4;
        if (lineDefinesSymbolNamed(r.path, r.line_num, query)) score += 5.0;
        return score;
    }
Constants are tuneable; the goal is for each signal alone to flip the order on these failing tests.
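As a sanity check on the constants above, the arithmetic for a one-occurrence tie can be sketched in isolation. The Signals struct and composedScore below are hypothetical stand-ins that take each signal as an already-evaluated flag; they are not part of the codebase, and the weights are simply copied from the proposal:

```zig
const std = @import("std");

/// Illustrative only: one search hit with each signal pre-evaluated.
const Signals = struct {
    occurrences: u32,
    basename_match: bool = false,
    in_tests: bool = false,
    in_examples: bool = false,
    in_vendor: bool = false,
    defines_symbol: bool = false,
};

/// Applies the proposed weights in the same order as the scorer above.
fn composedScore(s: Signals) f32 {
    var score: f32 = @floatFromInt(s.occurrences);
    if (s.basename_match) score += 10.0;
    if (s.in_tests) score *= 0.6;
    if (s.in_examples) score *= 0.6;
    if (s.in_vendor) score *= 0.4;
    if (s.defines_symbol) score += 5.0;
    return score;
}

test "each signal alone reorders a one-occurrence tie" {
    const plain = composedScore(.{ .occurrences = 1 });
    // Basename match: 1 + 10 beats 1.
    try std.testing.expect(composedScore(.{ .occurrences = 1, .basename_match = true }) > plain);
    // examples/ path: 1 * 0.6 loses to 1.
    try std.testing.expect(composedScore(.{ .occurrences = 1, .in_examples = true }) < plain);
    // Symbol definition: 1 + 5 beats 1.
    try std.testing.expect(composedScore(.{ .occurrences = 1, .defines_symbol = true }) > plain);
}
```

One consequence of this ordering worth deciding on deliberately: the path penalties are applied after the basename boost and so scale it down, while the symbol-definition boost is added afterwards and is not scaled.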
Related
Companion to #427 (Tier 1 file-length sort). Together they cover both candidate selection and post-rank ordering.