perf: parallelize WordIndex build (cold index 1.49×) + leaner ranked search by justrach · Pull Request #520 · justrach/codedb

justrach · 2026-06-01T14:23:00Z

Summary

Research-backed cold-index and search-path optimizations (from a 13-agent profile+literature workflow). Headline: the cold-index commit phase was dominated by the serial WordIndex.indexFile; this parallelizes it.

#1 — Parallel WordIndex shards (Lucene DocumentsWriterPerThread)

Each parse worker builds a private WordIndex shard (reusing indexFile, backed by its arena); after join, the new WordIndex.mergeShard folds them in worker order. Global doc-ids land in chunk ranges that exactly reproduce the serial assignment, so postings stay doc-id-ascending and BM25 results are byte-identical.

Gated on skip_file_words — shards omit the per-file word set, so they only fit the bulk cold-index build (which never needs incremental removeFile); main.zig always sets it. Threading model unchanged (parse parallel, commit serial → self.allocator stays single-threaded).

Controlled interleaved A/B (same binary, shards on/off, ~16k-file repo, 6 iters, medians):

	cold index (real)	commit phase	user (CPU work)
before (serial)	3.96 s	1517 ms	5.96 s
after (shards)	2.66 s	206 ms	6.20 s
	1.49×	7.4×	~flat

user-time staying flat confirms it's the same total work, parallelized rather than reduced.

Search-path wins (behavior-preserving)

searchContentRanked: lazy top-k max-heap (std.PriorityQueue) instead of full-sorting every matched doc — identical ordering + skip-and-continue semantics, O(C + (k+skips)·logC) (bm25s top-k).
TrigramIndex.candidates: hoist per-pair packTrigram + getPtr hash probes out of the per-candidate bloom loop (bit-identical result set).
searchContentRanked: U32Context (std.hash.int) for the three u32-keyed aggregation maps instead of Wyhash-over-4-bytes.

Tests

zig build test → 671/671. Adds a workers=1-vs-workers=4 parity test asserting identical word hits and BM25 ranked results (locks down the shard doc-id/merge correctness).

Note on measurement

Absolute cold-index wall-time is noisy on Apple Silicon (efficiency-core placement under load — same binary measured 2.5s vs 37s), so the A/B above is interleaved on the same binary to cancel that out; the user-time invariant is the reliable cross-check.

…search Cold-index commit was dominated by the serial WordIndex.indexFile. Build it in parallel per-worker shards (reusing indexFile, backed by each worker's arena), then fold them in worker order via the new WordIndex.mergeShard: global doc_ids land in chunk ranges that reproduce the serial assignment, so postings stay doc_id-ascending and BM25 results are byte-identical (the Lucene DocumentsWriterPerThread pattern). Gated on skip_file_words — shards omit the per-file word set, so they only fit the bulk cold-index build that never needs incremental removeFile; main.zig always sets it. Threading model is unchanged (parse parallel, commit serial), so self.allocator stays single-threaded. Controlled interleaved A/B on a ~16k-file repo (same binary, shards on/off): cold index 3.96s -> 2.66s (1.49x); commit phase 1517ms -> 206ms (7.4x); user-time flat -> same total CPU work, parallelized rather than reduced. Research-backed, behavior-preserving search-path wins: - searchContentRanked: lazy top-k max-heap (std.PriorityQueue) instead of full-sorting every matched doc — identical ordering and skip-and-continue semantics, O(C + (k+skips)*logC) (bm25s top-k retrieval). - TrigramIndex.candidates: hoist per-pair packTrigram + getPtr hash probes out of the per-candidate bloom loop (bit-identical result set). - searchContentRanked: U32Context (std.hash.int) for the three u32-keyed aggregation maps instead of Wyhash over 4 key bytes. Adds a workers=1-vs-4 parity test asserting identical word hits AND BM25 ranked results. 671/671 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T14:25:43Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	90899	91421	+0.57%	+522	OK
`codedb_changes`	11870	11651	-1.84%	-219	OK
`codedb_context`	963740	829397	-13.94%	-134343	OK
`codedb_deps`	268	239	-10.82%	-29	OK
`codedb_edit`	49681	50622	+1.89%	+941	OK
`codedb_find`	10061	9860	-2.00%	-201	OK
`codedb_hot`	27217	26494	-2.66%	-723	OK
`codedb_outline`	31797	29129	-8.39%	-2668	OK
`codedb_read`	17774	15837	-10.90%	-1937	OK
`codedb_search`	24244	22775	-6.06%	-1469	OK
`codedb_snapshot`	67672	67507	-0.24%	-165	OK
`codedb_status`	10099	10196	+0.96%	+97	OK
`codedb_symbol`	22223	18420	-17.11%	-3803	OK
`codedb_tree`	20179	33050	+63.78%	+12871	NOISE
`codedb_word`	14642	13518	-7.68%	-1124	OK

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 460f48d979

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T14:30:21Z

                .outline = outline_copy,
                .skip_trigram = file.skip_trigram,
            }) catch continue;
+            if (word_shard) |ws| ws.indexFile(file.path, content_copy) catch {};


Propagate shard indexing failures

When the parallel shard path is enabled, a file is appended to the worker results and later committed, but any failure from ws.indexFile is swallowed here. Under allocation pressure this can persist a word index that silently omits files even though their outlines/contents/trigrams were committed, so codedb_word and multi-word ranked search miss them; the serial path would return the indexing error instead of publishing an incomplete index.

Useful? React with 👍 / 👎.

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

justrach merged commit 71f3d51 into release/0.2.5824 Jun 1, 2026
1 check passed

This was referenced Jun 2, 2026

Snapshot load performance (~3×) + parallel freshness + #518 fixes #524

Merged

release: 0.2.5824 → main (perf overhaul + warm-CLI daemon + faster find + graph ranking + Windsurf/Devin) #527

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: parallelize WordIndex build (cold index 1.49×) + leaner ranked search#520

perf: parallelize WordIndex build (cold index 1.49×) + leaner ranked search#520
justrach merged 1 commit into
release/0.2.5824from
feat/faster-indexing-search

justrach commented Jun 1, 2026

Uh oh!

github-actions Bot commented Jun 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

justrach commented Jun 1, 2026

Summary

#1 — Parallel WordIndex shards (Lucene DocumentsWriterPerThread)

Search-path wins (behavior-preserving)

Tests

Note on measurement

Uh oh!

github-actions Bot commented Jun 1, 2026

Benchmark Regression Report

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant