Skip to content

perf: parallelize WordIndex build (cold index 1.49×) + leaner ranked search#520

Merged
justrach merged 1 commit into
release/0.2.5824from
feat/faster-indexing-search
Jun 1, 2026
Merged

perf: parallelize WordIndex build (cold index 1.49×) + leaner ranked search#520
justrach merged 1 commit into
release/0.2.5824from
feat/faster-indexing-search

Conversation

@justrach

@justrach justrach commented Jun 1, 2026

Copy link
Copy Markdown
Owner

Summary

Research-backed cold-index and search-path optimizations (from a 13-agent profile+literature workflow). Headline: the cold-index commit phase was dominated by the serial WordIndex.indexFile; this parallelizes it.

#1 — Parallel WordIndex shards (Lucene DocumentsWriterPerThread)

Each parse worker builds a private WordIndex shard (reusing indexFile, backed by its arena); after join, the new WordIndex.mergeShard folds them in worker order. Global doc-ids land in chunk ranges that exactly reproduce the serial assignment, so postings stay doc-id-ascending and BM25 results are byte-identical.

Gated on skip_file_words — shards omit the per-file word set, so they only fit the bulk cold-index build (which never needs incremental removeFile); main.zig always sets it. Threading model unchanged (parse parallel, commit serial → self.allocator stays single-threaded).

Controlled interleaved A/B (same binary, shards on/off, ~16k-file repo, 6 iters, medians):

cold index (real) commit phase user (CPU work)
before (serial) 3.96 s 1517 ms 5.96 s
after (shards) 2.66 s 206 ms 6.20 s
1.49× 7.4× ~flat

user-time staying flat confirms it's the same total work, parallelized rather than reduced.

Search-path wins (behavior-preserving)

  • searchContentRanked: lazy top-k max-heap (std.PriorityQueue) instead of full-sorting every matched doc — identical ordering + skip-and-continue semantics, O(C + (k+skips)·logC) (bm25s top-k).
  • TrigramIndex.candidates: hoist per-pair packTrigram + getPtr hash probes out of the per-candidate bloom loop (bit-identical result set).
  • searchContentRanked: U32Context (std.hash.int) for the three u32-keyed aggregation maps instead of Wyhash-over-4-bytes.

Tests

zig build test671/671. Adds a workers=1-vs-workers=4 parity test asserting identical word hits and BM25 ranked results (locks down the shard doc-id/merge correctness).

Note on measurement

Absolute cold-index wall-time is noisy on Apple Silicon (efficiency-core placement under load — same binary measured 2.5s vs 37s), so the A/B above is interleaved on the same binary to cancel that out; the user-time invariant is the reliable cross-check.

…search

Cold-index commit was dominated by the serial WordIndex.indexFile. Build it
in parallel per-worker shards (reusing indexFile, backed by each worker's
arena), then fold them in worker order via the new WordIndex.mergeShard:
global doc_ids land in chunk ranges that reproduce the serial assignment, so
postings stay doc_id-ascending and BM25 results are byte-identical (the
Lucene DocumentsWriterPerThread pattern). Gated on skip_file_words — shards
omit the per-file word set, so they only fit the bulk cold-index build that
never needs incremental removeFile; main.zig always sets it. Threading model
is unchanged (parse parallel, commit serial), so self.allocator stays
single-threaded.

Controlled interleaved A/B on a ~16k-file repo (same binary, shards on/off):
cold index 3.96s -> 2.66s (1.49x); commit phase 1517ms -> 206ms (7.4x);
user-time flat -> same total CPU work, parallelized rather than reduced.

Research-backed, behavior-preserving search-path wins:
- searchContentRanked: lazy top-k max-heap (std.PriorityQueue) instead of
  full-sorting every matched doc — identical ordering and skip-and-continue
  semantics, O(C + (k+skips)*logC) (bm25s top-k retrieval).
- TrigramIndex.candidates: hoist per-pair packTrigram + getPtr hash probes
  out of the per-candidate bloom loop (bit-identical result set).
- searchContentRanked: U32Context (std.hash.int) for the three u32-keyed
  aggregation maps instead of Wyhash over 4 key bytes.

Adds a workers=1-vs-4 parity test asserting identical word hits AND BM25
ranked results. 671/671 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 90899 91421 +0.57% +522 OK
codedb_changes 11870 11651 -1.84% -219 OK
codedb_context 963740 829397 -13.94% -134343 OK
codedb_deps 268 239 -10.82% -29 OK
codedb_edit 49681 50622 +1.89% +941 OK
codedb_find 10061 9860 -2.00% -201 OK
codedb_hot 27217 26494 -2.66% -723 OK
codedb_outline 31797 29129 -8.39% -2668 OK
codedb_read 17774 15837 -10.90% -1937 OK
codedb_search 24244 22775 -6.06% -1469 OK
codedb_snapshot 67672 67507 -0.24% -165 OK
codedb_status 10099 10196 +0.96% +97 OK
codedb_symbol 22223 18420 -17.11% -3803 OK
codedb_tree 20179 33050 +63.78% +12871 NOISE
codedb_word 14642 13518 -7.68% -1124 OK

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 460f48d979

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/watcher.zig
.outline = outline_copy,
.skip_trigram = file.skip_trigram,
}) catch continue;
if (word_shard) |ws| ws.indexFile(file.path, content_copy) catch {};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate shard indexing failures

When the parallel shard path is enabled, a file is appended to the worker results and later committed, but any failure from ws.indexFile is swallowed here. Under allocation pressure this can persist a word index that silently omits files even though their outlines/contents/trigrams were committed, so codedb_word and multi-word ranked search miss them; the serial path would return the indexing error instead of publishing an incomplete index.

Useful? React with 👍 / 👎.

@justrach justrach merged commit 71f3d51 into release/0.2.5824 Jun 1, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant