index: mmap trigram index — removeFile silently no-ops; overlay promotion never masks stale base entries #593

@justrach

Problem

AnyTrigramIndex.removeFile has an empty body ({}) in pure-mmap mode (src/index.zig:2518), and after promotion to mmap_overlay the index keeps querying the base for paths the overlay has since superseded:

  • a file deleted while the index is zero-copy stays containsFile == true forever, its trigrams keep producing candidates, and fileCount stays inflated
  • a file edited after an mmap load feeds candidates from both its old content (mmap base) and its new content (overlay) — containsFile ORs the two (src/index.zig:2494) and candidates/candidatesRegex merge both sides with no masking

Ghost candidates are re-verified against real content downstream, so the direct damage is wasted I/O per search plus wrong containment and count answers. However, containsFile also gates the tier-3 supplemental scan (src/explore.zig:3531, :5294), so a stale true can silently drop a re-indexed file out of the scan tiers that were supposed to cover it.

Failing Test

test_index.zig — fails on current release tip (first assert: containsFile stays true after removeFile):

test "issue-590: mmap trigram index — removeFile takes effect and re-index masks stale base entries" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    var explorer = Explorer.init(testing.allocator, Explorer.DEFAULT_CONTENT_CACHE_CAPACITY);
    defer explorer.deinit();
    try explorer.indexFile("src/auth.zig", "pub fn handleAuth(req: *Request) !void { validate(req); }");
    try explorer.indexFile("src/gate.zig", "pub fn checkGate(ctx: *Context) !bool { return ctx.authenticated; }");
    try explorer.indexFile("src/util.zig", "pub fn formatStr(buf: []u8, args: anytype) !void {}");

    var tmp_dir = testing.tmpDir(.{});
    defer tmp_dir.cleanup();
    var path_buf: [std.fs.max_path_bytes]u8 = undefined;
    const tmp_path_len = try tmp_dir.dir.realPathFile(io, ".", &path_buf);
    const tmp_path = path_buf[0..tmp_path_len];
    try explorer.trigram_index.writeToDisk(io, tmp_path, null);

    const mmap_idx = MmapTrigramIndex.initFromDisk(io, tmp_path, testing.allocator) orelse
        return error.MmapInitFailed;
    var any_idx = AnyTrigramIndex{ .mmap = mmap_idx };
    defer any_idx.deinit();

    // A delete while zero-copy must take effect, not silently no-op.
    any_idx.removeFile("src/gate.zig");
    try testing.expect(!any_idx.containsFile("src/gate.zig"));
    if (any_idx.candidates("checkGate", allocator)) |cands| {
        for (cands) |p| try testing.expect(!std.mem.eql(u8, p, "src/gate.zig"));
    }

    // Re-indexing must mask the base's stale trigrams for that path.
    try any_idx.indexFile("src/auth.zig", "pub fn renamedAuth() void {}");
    if (any_idx.candidates("handleAuth", allocator)) |cands| {
        for (cands) |p| try testing.expect(!std.mem.eql(u8, p, "src/auth.zig"));
    }
    const fresh = any_idx.candidates("renamedAuth", allocator) orelse return error.NoCandidates;
    var found = false;
    for (fresh) |p| {
        if (std.mem.eql(u8, p, "src/auth.zig")) found = true;
    }
    try testing.expect(found);

    // File accounting follows: 3 on disk, one removed.
    try testing.expectEqual(@as(u32, 2), any_idx.fileCount());
}

Expected

Removal and re-index behave identically across heap, mmap, and overlay modes: base entries for superseded/removed paths stop answering.

Fix

Add a masked path set (owned keys) to MmapOverlay:

  • indexFile and removeFile mask the path; removeFile on .mmap promotes to an overlay first, because a remove is a write
  • containsFile and the candidates/candidatesRegex merges filter base hits through the mask
  • fileCount subtracts a maintained masked-in-base counter
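
A minimal sketch of the masking idea, for illustration only. It is not the code in src/index.zig: `MaskedPaths`, `filterBaseHits`, and the free-standing `containsFile`/`fileCount` helpers are hypothetical names, and the base and overlay are reduced to booleans and counts so the bookkeeping can be checked in isolation. It assumes Zig 0.14 or later (for `.empty`).

```zig
const std = @import("std");

/// Hypothetical stand-in for the masked path set proposed for MmapOverlay.
const MaskedPaths = struct {
    allocator: std.mem.Allocator,
    /// Owned copies of every path whose base entries must stop answering.
    paths: std.StringHashMapUnmanaged(void) = .empty,
    /// Number of masked paths that are present in the mmap base.
    masked_in_base: u32 = 0,

    fn deinit(self: *MaskedPaths) void {
        var it = self.paths.keyIterator();
        while (it.next()) |key| self.allocator.free(key.*);
        self.paths.deinit(self.allocator);
    }

    /// Called by both indexFile and removeFile. Idempotent per path.
    fn mask(self: *MaskedPaths, path: []const u8, in_base: bool) !void {
        if (self.paths.contains(path)) return;
        const owned = try self.allocator.dupe(u8, path);
        errdefer self.allocator.free(owned);
        try self.paths.put(self.allocator, owned, {});
        if (in_base) self.masked_in_base += 1;
    }

    fn isMasked(self: *const MaskedPaths, path: []const u8) bool {
        return self.paths.contains(path);
    }
};

/// Copies the unmasked base hits into `out` and returns the kept prefix.
fn filterBaseHits(
    masked: *const MaskedPaths,
    base_hits: []const []const u8,
    out: [][]const u8,
) [][]const u8 {
    std.debug.assert(out.len >= base_hits.len);
    var n: usize = 0;
    for (base_hits) |p| {
        if (masked.isMasked(p)) continue;
        out[n] = p;
        n += 1;
    }
    return out[0..n];
}

/// The base only answers for paths that are not masked; the overlay always answers.
fn containsFile(masked: *const MaskedPaths, path: []const u8, in_base: bool, in_overlay: bool) bool {
    return (in_base and !masked.isMasked(path)) or in_overlay;
}

/// Base files minus masked base files, plus whatever the overlay holds.
fn fileCount(base_count: u32, masked: *const MaskedPaths, overlay_count: u32) u32 {
    return base_count - masked.masked_in_base + overlay_count;
}

test "masked base entries stop answering" {
    var masked = MaskedPaths{ .allocator = std.testing.allocator };
    defer masked.deinit();
    try masked.mask("src/gate.zig", true); // removed
    try masked.mask("src/auth.zig", true); // re-indexed into the overlay
    try masked.mask("src/auth.zig", true); // second call must not double-count

    const base_hits = [_][]const u8{ "src/auth.zig", "src/gate.zig", "src/util.zig" };
    var buf: [base_hits.len][]const u8 = undefined;
    const kept = filterBaseHits(&masked, &base_hits, &buf);
    try std.testing.expectEqual(@as(usize, 1), kept.len);
    try std.testing.expectEqualStrings("src/util.zig", kept[0]);

    // Same accounting as the failing test: 3 in base, 2 masked, 1 in overlay.
    try std.testing.expectEqual(@as(u32, 2), fileCount(3, &masked, 1));
}
```

Keeping the counter next to the set means fileCount does not have to walk the mask on every call, and owning the keys avoids dangling slices when the caller's path buffer is reused. Whether the real overlay should ever unmask a path is left open here.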

    Labels

    bug (Something isn't working), priority:p2 (Medium priority)