Problem
WordIndex.readFromDisk and WordIndex.mmapFromDisk both set skip_file_words = true (src/index.zig:790, :819), and promoteIfBorrowed keeps it that way. With file_words empty, removeFile hits self.file_words.fetchRemove(path) orelse return (src/index.zig:108) and becomes a silent no-op for every disk-loaded index, which is exactly the mode the warm daemon and CLI fast paths run in.
Consequences, all on the post-load write path (indexFile calls removeFile first, src/index.zig:186):
- Re-indexing a file appends new postings while every stale one survives: terms deleted from the file keep matching it (at stale line numbers), and duplicate (doc, line) postings inflate BM25 term frequency.
- Deleting a file leaves all of its postings live, producing ghost hits with a valid-looking path.
- Postings grow without bound: in a long-running watch/daemon session every file save grows the index, and the memory (RSS) is never reclaimed. #400 (index: add BM25 ranking for content search) fixed the total_tokens/doc_lengths counters in this mode, but not the postings themselves.
- In pure zero-copy mmap mode, removeFile is doubly a no-op: path_to_id is empty too, and unlike indexFile it never promotes the mmap to heap.
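To make the first consequence concrete, here is a toy model. It is hypothetical and deliberately not the real WordIndex, whose layout this issue does not show: one posting list stands in for the whole index, and a has_file_words flag stands in for whether file_words was populated. removeDoc returns early when the flag is off, mirroring the fetchRemove(path) orelse return exit, so every re-index of unchanged content adds another (doc, line) posting and the raw term frequency that BM25 would consume keeps climbing.

```zig
const std = @import("std");

// Hypothetical toy model, NOT the real WordIndex: a single posting list
// plus a flag standing in for "file_words was populated".
const Posting = struct { doc_id: u32, line: u32 };

const ToyIndex = struct {
    postings: [16]Posting = undefined, // capacity 16 is arbitrary for the sketch
    len: usize = 0,
    // Stands in for a non-empty file_words map; a disk load leaves it false.
    has_file_words: bool = true,

    fn add(self: *ToyIndex, p: Posting) void {
        self.postings[self.len] = p;
        self.len += 1;
    }

    // Same shape as `fetchRemove(path) orelse return`: with no per-file
    // word list there is nothing to look up, so it silently does nothing.
    fn removeDoc(self: *ToyIndex, doc_id: u32) void {
        if (!self.has_file_words) return;
        var kept: usize = 0;
        for (self.postings[0..self.len]) |p| {
            if (p.doc_id != doc_id) {
                self.postings[kept] = p;
                kept += 1;
            }
        }
        self.len = kept;
    }

    // Like indexFile: remove the old postings first, then append new ones.
    fn reindex(self: *ToyIndex, doc_id: u32, line: u32) void {
        self.removeDoc(doc_id);
        self.add(.{ .doc_id = doc_id, .line = line });
    }

    // Raw term frequency of this single term in one doc.
    fn tf(self: *const ToyIndex, doc_id: u32) usize {
        var n: usize = 0;
        for (self.postings[0..self.len]) |p| {
            if (p.doc_id == doc_id) n += 1;
        }
        return n;
    }
};

test "toy model: a skipped removal inflates term frequency" {
    var scratch = ToyIndex{};
    scratch.add(.{ .doc_id = 0, .line = 1 });
    scratch.reindex(0, 1); // same content saved again
    try std.testing.expectEqual(@as(usize, 1), scratch.tf(0));

    var loaded = ToyIndex{ .has_file_words = false }; // like a disk load
    loaded.add(.{ .doc_id = 0, .line = 1 });
    loaded.reindex(0, 1);
    loaded.reindex(0, 1);
    try std.testing.expectEqual(@as(usize, 3), loaded.tf(0));
}
```

Two saves of unchanged content leave tf at 3 instead of 1 in the "loaded" model, the same (doc, line) duplication described in the first bullet.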
Failing Test
test_index.zig — fails on current release tip:
```zig
// Excerpt from test_index.zig: std, testing, WordIndex and io are assumed
// to come from file-level declarations.
test "issue-582: disk-loaded word index — re-index and removeFile must drop stale postings" {
    const alloc = testing.allocator;
    var wi = WordIndex.init(alloc);
    defer wi.deinit();
    try wi.indexFile("src/a.zig", "pub fn alphaToken() void {}\n");
    try wi.indexFile("src/b.zig", "pub fn betaToken() void {}\n");

    var tmp = testing.tmpDir(.{});
    defer tmp.cleanup();
    var path_buf: [std.fs.max_path_bytes]u8 = undefined;
    const dir_path_len = try tmp.dir.realPathFile(io, ".", &path_buf);
    const dir_path = path_buf[0..dir_path_len];
    try wi.writeToDisk(io, dir_path, null);

    // Heap fast-load: re-indexing a file must drop its old postings.
    var loaded = WordIndex.readFromDisk(io, dir_path, alloc).?;
    defer loaded.deinit();
    try loaded.indexFile("src/a.zig", "pub fn gammaToken() void {}\n");
    const stale = try loaded.searchDeduped("alphaToken", alloc);
    defer alloc.free(stale);
    try testing.expectEqual(@as(usize, 0), stale.len);
    const fresh = try loaded.searchDeduped("gammaToken", alloc);
    defer alloc.free(fresh);
    try testing.expectEqual(@as(usize, 1), fresh.len);

    // Deleting a file must drop its postings outright.
    loaded.removeFile("src/b.zig");
    const ghost = try loaded.searchDeduped("betaToken", alloc);
    defer alloc.free(ghost);
    try testing.expectEqual(@as(usize, 0), ghost.len);

    // Zero-copy mmap load: removeFile is a write, so it must promote, not no-op.
    var mloaded = WordIndex.mmapFromDisk(io, dir_path, alloc).?;
    defer mloaded.deinit();
    mloaded.removeFile("src/a.zig");
    const mghost = try mloaded.searchDeduped("alphaToken", alloc);
    defer alloc.free(mghost);
    try testing.expectEqual(@as(usize, 0), mghost.len);
}
```
Expected
After a disk fast-load, indexFile of changed content replaces a file's postings, and removeFile drops them — same observable behavior as a scratch-built index.
Fix
Give removeFile a slow path for the no-file_words case: when path_to_id knows the path, sweep index for the doc_id (pruning buckets that end up empty), fix up doc_lengths/total_tokens, and blank and free the id_to_path slot (which owns the path string in skip mode). In mmap mode, promote first (a remove is a write), but only if the path is actually tracked. The sweep is O(index), but it runs at most once per file edit after a load, and the fast path is untouched.
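A minimal sketch of that slow path, under heavy simplifying assumptions: SkipModeIndex, Bucket, removeFileSlow, addDoc, addPosting and the fixed capacities are all hypothetical, buckets are keyed by a term id instead of a term string, and "promotion" only flips a flag where the real code would copy mmap-backed data to the heap. It follows the order of operations described above: look the path up and bail out if it is untracked (so nothing is promoted), promote if still borrowed, compact every bucket in place and prune the ones left empty, subtract the doc's length from total_tokens, then free and blank the id_to_path slot.

```zig
const std = @import("std");

// Hypothetical, simplified stand-in for a skip_file_words index: fixed
// capacities, buckets keyed by term id, and no file_words map at all.
const max_docs = 8;
const max_terms = 8;
const max_postings = 16;

const Posting = struct { doc_id: u32, line: u32 };
const Bucket = struct { postings: [max_postings]Posting = undefined, len: usize = 0 };

const SkipModeIndex = struct {
    allocator: std.mem.Allocator,
    buckets: [max_terms]?Bucket = [_]?Bucket{null} ** max_terms,
    // In skip mode this slot owns the path string.
    id_to_path: [max_docs]?[]u8 = [_]?[]u8{null} ** max_docs,
    doc_lengths: [max_docs]u32 = [_]u32{0} ** max_docs,
    total_tokens: u64 = 0,
    // Stands in for "still borrowing the mmap"; promotions counts heap copies.
    borrowed: bool = false,
    promotions: usize = 0,

    fn addDoc(self: *SkipModeIndex, path: []const u8, token_count: u32) !u32 {
        for (&self.id_to_path, 0..) |*slot, i| {
            if (slot.* == null) {
                slot.* = try self.allocator.dupe(u8, path);
                self.doc_lengths[i] = token_count;
                self.total_tokens += token_count;
                return @as(u32, @intCast(i));
            }
        }
        return error.OutOfSlots;
    }

    fn addPosting(self: *SkipModeIndex, term_id: usize, p: Posting) void {
        if (self.buckets[term_id] == null) self.buckets[term_id] = Bucket{};
        const b = &self.buckets[term_id].?;
        b.postings[b.len] = p;
        b.len += 1;
    }

    fn pathToId(self: *const SkipModeIndex, path: []const u8) ?u32 {
        for (self.id_to_path, 0..) |slot, i| {
            if (slot) |owned| {
                if (std.mem.eql(u8, owned, path)) return @as(u32, @intCast(i));
            }
        }
        return null;
    }

    // Slow path for when there is no per-file word list to consult.
    fn removeFileSlow(self: *SkipModeIndex, path: []const u8) void {
        // Untracked path: return before promoting anything.
        const doc_id = self.pathToId(path) orelse return;
        if (self.borrowed) { // a remove is a write: promote first
            self.borrowed = false; // real code would copy the mmap to heap here
            self.promotions += 1;
        }
        // O(index) sweep: compact each bucket in place, dropping doc_id.
        for (&self.buckets) |*maybe_bucket| {
            if (maybe_bucket.*) |*b| {
                var kept: usize = 0;
                for (b.postings[0..b.len]) |p| {
                    if (p.doc_id != doc_id) {
                        b.postings[kept] = p;
                        kept += 1;
                    }
                }
                b.len = kept;
                if (kept == 0) maybe_bucket.* = null; // prune empty bucket
            }
        }
        self.total_tokens -= self.doc_lengths[doc_id];
        self.doc_lengths[doc_id] = 0;
        self.allocator.free(self.id_to_path[doc_id].?); // free the owned path
        self.id_to_path[doc_id] = null; // blank the slot
    }

    fn deinit(self: *SkipModeIndex) void {
        for (self.id_to_path) |slot| {
            if (slot) |owned| self.allocator.free(owned);
        }
    }
};

test "slow-path removeFile drops the doc and frees its path" {
    var idx = SkipModeIndex{ .allocator = std.testing.allocator, .borrowed = true };
    defer idx.deinit();
    const a = try idx.addDoc("src/a.zig", 5);
    const b = try idx.addDoc("src/b.zig", 7);
    idx.addPosting(0, .{ .doc_id = a, .line = 1 }); // term 0: only in a
    idx.addPosting(1, .{ .doc_id = a, .line = 1 }); // term 1: in a and b
    idx.addPosting(1, .{ .doc_id = b, .line = 3 });
    idx.removeFileSlow("src/missing.zig"); // untracked: no promotion
    try std.testing.expectEqual(@as(usize, 0), idx.promotions);
    idx.removeFileSlow("src/a.zig");
    try std.testing.expectEqual(@as(usize, 1), idx.promotions);
    try std.testing.expect(idx.buckets[0] == null); // pruned
    try std.testing.expectEqual(@as(usize, 1), idx.buckets[1].?.len);
    try std.testing.expectEqual(@as(u64, 7), idx.total_tokens);
}
```

Checking path_to_id before promoting keeps a remove of an unknown path from forcing a full heap copy of the mapping; only a tracked path pays for the promotion and the sweep.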