Skip to content

index: AnyTrigramIndex.writeToDisk silently no-ops for mmap_overlay — overlay edits never persist #600

@justrach

Description

@justrach

Problem

AnyTrigramIndex.writeToDisk (src/index.zig:2566) dispatches:

.heap => |*h| try h.writeToDisk(io, dir_path, git_head),
.mmap => {},
.mmap_overlay => {},

.mmap => {} is correct — a pure mmap view has no in-memory changes; the disk files already are the index. But .mmap_overlay => {} returns success while writing nothing. The overlay exists precisely because the index took writes after a cold mmap load (indexFile promotes mmap → mmap_overlay, removeFile masks base entries). Every call site in main.zig (shutdown persist, checkpoint persist at src/main.zig:1372/1398/2453) believes the state was saved.

Net effect: a long-lived daemon that started from a cold mmap load and then took any edit never persists those edits. Every cold start re-reads the stale base — new files missing from the trigram index, deleted files resurrected — until something forces a full re-index.

Failing Test

src/test_index.zig, fails on current release branch at the fresh.zig containment check:

test "issue-600: mmap_overlay writeToDisk persists overlay edits" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    const allocator = arena.allocator();

    var tmp_dir = testing.tmpDir(.{});
    defer tmp_dir.cleanup();
    var path_buf: [std.fs.max_path_bytes]u8 = undefined;
    const tmp_path_len = try tmp_dir.dir.realPathFile(io, ".", &path_buf);
    const tmp_path = path_buf[0..tmp_path_len];

    // Seed the on-disk index with two files.
    {
        var seed = TrigramIndex.init(testing.allocator);
        defer seed.deinit();
        try seed.indexFile("keep.zig", "const keeper_token_alpha = 1;");
        try seed.indexFile("gone.zig", "const goner_token_beta = 2;");
        try seed.writeToDisk(io, tmp_path, null);
    }

    // Cold-load as mmap, then take edits: one new file, one removal.
    var any = AnyTrigramIndex{ .mmap = MmapTrigramIndex.initFromDisk(io, tmp_path, testing.allocator) orelse
        return error.MmapInitFailed };
    defer any.deinit();
    try any.indexFile("fresh.zig", "const fresh_token_gamma = 3;");
    any.removeFile("gone.zig");
    try testing.expect(any == .mmap_overlay);

    // writeToDisk reports success, so a cold start must see the edits.
    try any.writeToDisk(io, tmp_path, null);

    var reloaded = TrigramIndex.readFromDisk(io, tmp_path, testing.allocator) orelse
        return error.ReadBackFailed;
    defer reloaded.deinit();

    try testing.expect(reloaded.file_trigrams.contains("keep.zig"));
    try testing.expect(reloaded.file_trigrams.contains("fresh.zig"));
    try testing.expect(!reloaded.file_trigrams.contains("gone.zig"));

    const fresh = reloaded.candidates("fresh_token_gamma", allocator) orelse
        return error.NoCandidates;
    try testing.expectEqual(@as(usize, 1), fresh.len);
    try testing.expectEqualStrings("fresh.zig", fresh[0]);

    const keep = reloaded.candidates("keeper_token_alpha", allocator) orelse
        return error.NoCandidates;
    try testing.expectEqual(@as(usize, 1), keep.len);

    const gone = reloaded.candidates("goner_token_beta", allocator);
    if (gone) |g| {
        try testing.expectEqual(@as(usize, 0), g.len);
    }
}

Expected

writeToDisk on an mmap_overlay persists the merged logical state — base postings minus masked paths, plus overlay postings — so a cold start reproduces exactly what queries against the live overlay see.

Fix

Materialize a heap TrigramIndex from the merged state and reuse the existing atomic serializer:

  1. Create doc ids for unmasked base files (base file-table order), then overlay files (owns_paths = true, mirroring readFromDisk).
  2. Walk base lookup entries/postings, dropping postings whose path is in masked.
  3. Walk overlay index postings and merge them in (overlay paths are always masked in base, so no id collision).
  4. merged.writeToDisk(...) — already tmp+rename atomic, so overwriting the directory the live base is mmapped from is safe (the old inode stays alive under the mapping).

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingpriority:p2Medium priority

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions