
mcp: codedb_callers includes non-code files (markdown, docs) as call sites #426

@justrach

Problem

handleCallers (src/mcp.zig:1339) calls explorer.searchContentWithScope(name, ...), which scans every indexed file regardless of language. As a result, documentation files (.md, .rst, design docs) that mention a symbol name in prose appear in the output as if they were code call sites.

The eval reproduced this for codedb_callers(name="isIndexableRoot") — 4 of the 19 returned hits were markdown lines from docs/rfc-346-mcp-root-resolution.md describing the symbol in design-doc text.

This is orthogonal to the substring-match bug (#425): even a perfect whole-word identifier match in a markdown file is still not a call site.
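
To illustrate why the two bugs are independent, here is a minimal sketch of a whole-word identifier matcher. The names isIdentChar and containsWholeWord are hypothetical and are not taken from codedb; the point is only that a correct lexical match still accepts the markdown sentence used in the failing test below.

```zig
const std = @import("std");

// Hypothetical helper: characters that can be part of an identifier.
fn isIdentChar(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

// Hypothetical helper: true if `name` occurs in `line` bounded on both
// sides by a non-identifier character or by the edge of the line.
fn containsWholeWord(line: []const u8, name: []const u8) bool {
    if (name.len == 0) return false;
    var start: usize = 0;
    while (std.mem.indexOfPos(u8, line, start, name)) |pos| {
        const end = pos + name.len;
        const left_ok = pos == 0 or !isIdentChar(line[pos - 1]);
        const right_ok = end == line.len or !isIdentChar(line[end]);
        if (left_ok and right_ok) return true;
        start = pos + 1; // keep scanning past a substring-only match
    }
    return false;
}

test "whole-word matching does not distinguish code from prose" {
    // A real call site matches.
    try std.testing.expect(containsWholeWord("    fooBar();", "fooBar"));
    // A substring of a longer identifier is rejected (the #425 concern).
    try std.testing.expect(!containsWholeWord("fooBarBaz();", "fooBar"));
    // A prose mention still matches, so file type must be checked separately.
    try std.testing.expect(containsWholeWord("The fooBar helper is documented here for posterity.", "fooBar"));
}
```

Tightening the matcher addresses #425 only; excluding the third case requires knowing what kind of file the line came from.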

Failing Test

test "issue-426: codedb_callers excludes non-code files (markdown, docs)" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    var explorer = Explorer.init(arena.allocator());
    var store = Store.init(testing.allocator);
    defer store.deinit();
    var agents = AgentRegistry.init(testing.allocator);
    defer agents.deinit();
    _ = try agents.register("__filesystem__");
    var bench_ctx = mcp_mod.BenchContext.init(testing.allocator, ".");
    defer bench_ctx.deinit();

    try explorer.indexFile("def.zig", "pub fn fooBar() void {}\n");
    try explorer.indexFile("a.zig", "pub fn callerA() void {\n    fooBar();\n}\n");
    try explorer.indexFile(
        "docs/notes.md",
        "# Notes\n\nThe fooBar helper is documented here for posterity.\n",
    );

    const args_json = "{\"name\":\"fooBar\"}";
    const parsed = try std.json.parseFromSlice(std.json.Value, testing.allocator, args_json, .{});
    defer parsed.deinit();
    var out: std.ArrayList(u8) = .empty;
    defer out.deinit(testing.allocator);
    bench_ctx.runDispatch(io, testing.allocator, .codedb_callers, &parsed.value.object, &out, &store, &explorer, &agents);

    try testing.expect(std.mem.indexOf(u8, out.items, "a.zig:2") != null);
    try testing.expect(std.mem.indexOf(u8, out.items, "docs/notes.md") == null);
    try testing.expect(std.mem.indexOf(u8, out.items, "1 call sites for 'fooBar'") != null);
}

Failing test on branch issue-426-failing-test (commit 7d6c8bc).

$ zig build test 2>&1 | rg "issue-426"
error: 'tests.test.issue-426: codedb_callers excludes non-code files (markdown, docs)' failed
       /Users/.../src/tests.zig:10494: try testing.expect(std.mem.indexOf(u8, out.items, "docs/notes.md") == null);

Expected

codedb_callers returns hits only from code files (Zig, C, Python, TypeScript, etc.) — files for which there is a parser that would produce real call expressions. Markdown, plain text, and other prose formats are excluded.

Fix

In handleCallers (src/mcp.zig:1352-1382), call explore_mod.detectLanguage(r.path) on each result and skip those whose detected language is Language.unknown, Language.markdown, or Language.text. The exact allowlist depends on which Language variants count as "code"; the easiest approach is to invert the check and keep only languages that have an outline parser.

Effort: small. One predicate isCodeLanguage(lang) bool reused inside the existing for-loop.
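
A minimal sketch of what such a predicate could look like follows. It makes several assumptions: the Language enum below is a hypothetical stand-in whose variant list may differ from codedb's, detectLanguage here is a simplified extension-based substitute for explore_mod.detectLanguage, and the set of languages treated as having an outline parser is illustrative only.

```zig
const std = @import("std");

// Hypothetical stand-in for codedb's Language enum; real variants may differ.
const Language = enum { zig, c, python, typescript, markdown, text, unknown };

// Simplified substitute for explore_mod.detectLanguage (assumption):
// classifies a path by its file extension only.
fn detectLanguage(path: []const u8) Language {
    if (std.mem.endsWith(u8, path, ".zig")) return .zig;
    if (std.mem.endsWith(u8, path, ".c")) return .c;
    if (std.mem.endsWith(u8, path, ".py")) return .python;
    if (std.mem.endsWith(u8, path, ".ts")) return .typescript;
    if (std.mem.endsWith(u8, path, ".md")) return .markdown;
    if (std.mem.endsWith(u8, path, ".txt")) return .text;
    return .unknown;
}

// Allowlist form: only languages assumed to have an outline parser count
// as code. The switch has no `else` branch, so adding a new Language
// variant forces a compile-time decision here.
fn isCodeLanguage(lang: Language) bool {
    return switch (lang) {
        .zig, .c, .python, .typescript => true,
        .markdown, .text, .unknown => false,
    };
}

test "prose and unknown files are not treated as call sites" {
    try std.testing.expect(isCodeLanguage(detectLanguage("a.zig")));
    try std.testing.expect(!isCodeLanguage(detectLanguage("docs/notes.md")));
    try std.testing.expect(!isCodeLanguage(detectLanguage("README")));
}
```

Inside the existing result loop, the check would presumably reduce to something like `if (!isCodeLanguage(explore_mod.detectLanguage(r.path))) continue;`, though the actual loop shape in handleCallers is not shown in this issue.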

Eval context

Found by an automated codedb evaluation against codedb 0.2.5805. Filed alongside #425 (substring leakage) and #427 (Tier 1 sort starves the canonical definition file).

Metadata

Labels: bug (Something isn't working), priority:p2 (Medium priority)
