Problem
handleCallers (src/mcp.zig:1339) calls explorer.searchContentWithScope(name, ...), which scans every indexed file regardless of language. Documentation files (.md, .rst, design docs) that mention a symbol name in prose appear in the output as if they were code call sites.
The eval reproduced this for codedb_callers(name="isIndexableRoot"): 4 of the 19 returned hits were markdown lines from docs/rfc-346-mcp-root-resolution.md describing the symbol in design-doc text.
This is orthogonal to the substring-match bug (#425): even a perfect whole-word identifier match in a markdown file is still not a call site.
Failing Test
test "issue-426: codedb_callers excludes non-code files (markdown, docs)" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    var explorer = Explorer.init(arena.allocator());
    var store = Store.init(testing.allocator);
    defer store.deinit();
    var agents = AgentRegistry.init(testing.allocator);
    defer agents.deinit();
    _ = try agents.register("__filesystem__");
    var bench_ctx = mcp_mod.BenchContext.init(testing.allocator, ".");
    defer bench_ctx.deinit();

    // Index a definition, one real call site, and a markdown file that
    // only mentions the symbol in prose.
    try explorer.indexFile("def.zig", "pub fn fooBar() void {}\n");
    try explorer.indexFile("a.zig", "pub fn callerA() void {\n fooBar();\n}\n");
    try explorer.indexFile(
        "docs/notes.md",
        "# Notes\n\nThe fooBar helper is documented here for posterity.\n",
    );

    const args_json =
        \\{"name":"fooBar"}
    ;
    const parsed = try std.json.parseFromSlice(std.json.Value, testing.allocator, args_json, .{});
    defer parsed.deinit();
    var out: std.ArrayList(u8) = .empty;
    defer out.deinit(testing.allocator);
    bench_ctx.runDispatch(io, testing.allocator, .codedb_callers, &parsed.value.object, &out, &store, &explorer, &agents);

    // The real call site is reported; the markdown mention is not.
    try testing.expect(std.mem.indexOf(u8, out.items, "a.zig:2") != null);
    try testing.expect(std.mem.indexOf(u8, out.items, "docs/notes.md") == null);
    try testing.expect(std.mem.indexOf(u8, out.items, "1 call sites for 'fooBar'") != null);
}
Failing test on branch issue-426-failing-test (commit 7d6c8bc).
$ zig build test 2>&1 | rg "issue-426"
error: 'tests.test.issue-426: codedb_callers excludes non-code files (markdown, docs)' failed
/Users/.../src/tests.zig:10494: try testing.expect(std.mem.indexOf(u8, out.items, "docs/notes.md") == null);
Expected
codedb_callers returns hits only from code files (Zig, C, Python, TypeScript, etc.), that is, files for which there is a parser that would produce real call expressions. Markdown, plain text, and other prose formats are excluded.
Fix
In handleCallers (src/mcp.zig:1352-1382), use explore_mod.detectLanguage(r.path) and skip results whose detected language is Language.unknown, Language.markdown, or Language.text. (The exact set depends on which Language variants are considered "code"; it is easiest to invert the check and keep only languages that have an outline parser.)
Effort: small. One predicate isCodeLanguage(lang) bool reused inside the existing for-loop.
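A minimal sketch of the predicate and the loop filter, as a standalone file. Everything here except the name isCodeLanguage and the three excluded variants is an assumption made for illustration: the Language enum below is a hypothetical stand-in for the real one, detectLanguage is a simplified extension-based substitute for explore_mod.detectLanguage, and countCodeHits only imitates the shape of the existing for-loop in handleCallers.

```zig
const std = @import("std");

// Hypothetical stand-in for the project's Language enum. Only unknown,
// markdown, and text are named in this issue; the code variants are assumed.
const Language = enum { zig, c, python, typescript, markdown, text, unknown };

// Hypothetical substitute for explore_mod.detectLanguage: classifies a path
// by file extension. The real function's signature may differ.
fn detectLanguage(path: []const u8) Language {
    if (std.mem.endsWith(u8, path, ".zig")) return .zig;
    if (std.mem.endsWith(u8, path, ".c")) return .c;
    if (std.mem.endsWith(u8, path, ".py")) return .python;
    if (std.mem.endsWith(u8, path, ".ts")) return .typescript;
    if (std.mem.endsWith(u8, path, ".md")) return .markdown;
    if (std.mem.endsWith(u8, path, ".txt")) return .text;
    return .unknown;
}

// Exhaustive switch with no `else` branch: adding a Language variant is a
// compile error here until it is classified as code or prose.
fn isCodeLanguage(lang: Language) bool {
    return switch (lang) {
        .zig, .c, .python, .typescript => true,
        .markdown, .text, .unknown => false,
    };
}

// Counts the paths that survive the filter. The `continue` mirrors the
// skip that would sit at the top of the existing result loop.
fn countCodeHits(paths: []const []const u8) usize {
    var n: usize = 0;
    for (paths) |path| {
        if (!isCodeLanguage(detectLanguage(path))) continue;
        n += 1;
    }
    return n;
}

test "prose files are dropped from the result set" {
    const hits = [_][]const u8{ "a.zig", "b.py", "docs/notes.md", "README.txt" };
    try std.testing.expectEqual(@as(usize, 2), countCodeHits(&hits));
}
```

Writing the predicate as an allowlist with an exhaustive switch matches the "invert" suggestion above: a newly added prose or data format cannot silently start producing call sites.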
Eval context
Found by an automated codedb evaluation against codedb 0.2.5805. Filed alongside #425 (substring leakage) and #427 (Tier 1 sort starves the canonical definition file).