
mcp: codedb_callers conflates substring matches with real call sites #425

@justrach


Problem

handleCallers (src/mcp.zig:1339) finds call sites by running explorer.searchContentWithScope(name, ...), which is a substring full-text search. It then de-duplicates the results by filtering out the canonical definition line of name (matching on path == d.path and line_num == d.symbol.line_start).

That filter only removes the one definition site of the searched name. It does not remove lines that mention a different identifier whose name contains the search term as a substring.

Concrete reproduction: codedb_callers(name="fooBar") returns lines that mention fooBarExtended — both its definition site and any references — as if they were call sites of fooBar.

The eval surfaced this for searchInContent (its callers included hits inside searchInContentWithScope) and for isIndexableRoot (its callers included matches against the name itself in design docs).
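A minimal Zig model of the filter described above, using the fooBar reproduction. Hit, Def and keptByCurrentFilter are hypothetical stand-ins for the records and check inside handleCallers (the real types in src/mcp.zig may differ). The model shows that a filter keyed only on the definition's path and line cannot reject other.zig:1: that line lives in a different file and defines a different identifier that merely contains the search term.

```zig
const std = @import("std");

// Hypothetical stand-ins for a full-text search hit and the resolved
// definition of the searched name; not the actual codedb types.
const Hit = struct { path: []const u8, line_num: u32, line_text: []const u8 };
const Def = struct { path: []const u8, line_start: u32 };

/// Models the current filter: a hit is kept unless it is exactly the
/// canonical definition line of the searched name.
fn keptByCurrentFilter(hit: Hit, def: Def) bool {
    return !(std.mem.eql(u8, hit.path, def.path) and hit.line_num == def.line_start);
}

test "current filter keeps substring hits in unrelated identifiers" {
    const def = Def{ .path = "def.zig", .line_start = 1 };
    const hits = [_]Hit{
        .{ .path = "def.zig", .line_num = 1, .line_text = "pub fn fooBar() void {}" },
        .{ .path = "other.zig", .line_num = 1, .line_text = "pub fn fooBarExtended() void {}" },
        .{ .path = "a.zig", .line_num = 2, .line_text = "    fooBar();" },
    };
    var kept: usize = 0;
    for (hits) |hit| {
        // Every line contains "fooBar" as a substring, so a substring
        // search returns all three.
        try std.testing.expect(std.mem.indexOf(u8, hit.line_text, "fooBar") != null);
        if (keptByCurrentFilter(hit, def)) kept += 1;
    }
    // other.zig survives alongside the real call site in a.zig.
    try std.testing.expectEqual(@as(usize, 2), kept);
}
```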

Failing Test

test "issue-425: codedb_callers excludes substring matches in unrelated identifiers" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    var explorer = Explorer.init(arena.allocator());
    var store = Store.init(testing.allocator);
    defer store.deinit();
    var agents = AgentRegistry.init(testing.allocator);
    defer agents.deinit();
    _ = try agents.register("__filesystem__");
    var bench_ctx = mcp_mod.BenchContext.init(testing.allocator, ".");
    defer bench_ctx.deinit();

    try explorer.indexFile("def.zig", "pub fn fooBar() void {}\n");
    try explorer.indexFile("other.zig", "pub fn fooBarExtended() void {}\n");
    try explorer.indexFile("a.zig", "pub fn callerA() void {\n    fooBar();\n}\n");

    const args_json =
        \\{"name":"fooBar"}
    ;
    const parsed = try std.json.parseFromSlice(std.json.Value, testing.allocator, args_json, .{});
    defer parsed.deinit();
    var out: std.ArrayList(u8) = .empty;
    defer out.deinit(testing.allocator);
    bench_ctx.runDispatch(io, testing.allocator, .codedb_callers, &parsed.value.object, &out, &store, &explorer, &agents);

    try testing.expect(std.mem.indexOf(u8, out.items, "a.zig:2") != null);
    try testing.expect(std.mem.indexOf(u8, out.items, "other.zig") == null);
    try testing.expect(std.mem.indexOf(u8, out.items, "fooBarExtended") == null);
    try testing.expect(std.mem.indexOf(u8, out.items, "1 call sites for 'fooBar'") != null);
}

The failing test lives on branch issue-425-failing-test (commit 656d713).

$ zig build test 2>&1 | rg "issue-425"
error: 'tests.test.issue-425: codedb_callers excludes substring matches in unrelated identifiers' failed
       /Users/.../src/tests.zig:10492: try testing.expect(std.mem.indexOf(u8, out.items, "other.zig") == null);

Expected

codedb_callers(name="fooBar") returns only lines where fooBar appears as a whole-word identifier, not as a substring of a longer identifier. The header count reflects the real number of call sites (for the test above, "1 call sites for 'fooBar'").

Fix

In handleCallers (src/mcp.zig:1352-1382), gate each emitted result on a whole-word check against r.line_text: for a substring match of name, require the byte immediately before it (if any) and the byte immediately after it (if any) to be non-identifier characters, i.e. not in [A-Za-z0-9_]. Check every occurrence on the line, not just the first; if none of them is a whole-word match, skip the line.

Effort: small. One helper (hasWholeWordMatch(line, name) bool) reused inside the existing for-loop.
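A minimal sketch of the proposed helper, assuming ASCII identifiers as described above. The name hasWholeWordMatch comes from this issue; isIdentByte is a hypothetical helper introduced here. The loop scans every occurrence of name, so a line that mentions fooBarExtended before a real fooBar() call is still kept.

```zig
const std = @import("std");

/// True if `c` can appear inside an identifier, i.e. is in [A-Za-z0-9_].
fn isIdentByte(c: u8) bool {
    return std.ascii.isAlphanumeric(c) or c == '_';
}

/// Returns true if `name` occurs in `line` with a non-identifier byte
/// (or the start/end of the line) on both sides. Every occurrence is
/// tried, not only the first one.
fn hasWholeWordMatch(line: []const u8, name: []const u8) bool {
    if (name.len == 0) return false;
    var start: usize = 0;
    while (std.mem.indexOfPos(u8, line, start, name)) |idx| {
        const end = idx + name.len;
        const left_ok = idx == 0 or !isIdentByte(line[idx - 1]);
        const right_ok = end == line.len or !isIdentByte(line[end]);
        if (left_ok and right_ok) return true;
        // Resume one byte later to find the next (possibly overlapping) match.
        start = idx + 1;
    }
    return false;
}

test "whole-word filter on the issue-425 reproduction" {
    // Definition and call site of fooBar: whole-word matches.
    try std.testing.expect(hasWholeWordMatch("pub fn fooBar() void {}", "fooBar"));
    try std.testing.expect(hasWholeWordMatch("    fooBar();", "fooBar"));
    // Definition of the longer identifier: rejected.
    try std.testing.expect(!hasWholeWordMatch("pub fn fooBarExtended() void {}", "fooBar"));
}
```

Inside handleCallers, such a check would presumably sit next to the existing definition-line filter (skipping r when hasWholeWordMatch(r.line_text, name) is false), with the header count computed from the filtered results. Note that a whole-word occurrence inside a comment or string literal would still be counted.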

Eval context

Found by an automated codedb evaluation against codedb 0.2.5805. Filed alongside #426 (non-code files leaking into callers) and #427 (Tier 1 sort starves the canonical definition file).

Metadata

Assignees: none
Labels: bug (Something isn't working), priority:p2 (Medium priority)
Projects: none
Milestone: none
Relationships: none
Development: no branches or pull requests