Simplify DiskBBQ dynamic visit ratio to linear#142784
Conversation
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
benwtrent
left a comment
There was a problem hiding this comment.
visitratio goes from 0-100.
This calculation just won't work.
One other thing to consider is moving this "dynamic visit percentage" based on num_candidates to be "per-segment" instead of globally calculated. This may help as well.
|
Based on the possibility of very large segments, the capped linear formula won't work, so we changed it to a sublinear option and also added a more modern dataset. New formulas:
Property Comparison
Benchmarking: main vs α=1.5
GIST-1M (N=1,000,000)
|
| numCands | Main recall/visited/ms | α=1.5 recall/visited/ms | vs main visited |
|---|---|---|---|
| 15 | 0.94 / 3,806 / 0.24 | 0.93 / 3,382 / 0.20 | 0.89× |
| 50 | 0.94 / 3,806 / 0.23 | 0.93 / 3,382 / 0.22 | 0.89× |
| 100 | 0.94 / 7,410 / 0.37 | 0.94 / 5,815 / 0.31 | 0.78× |
| 200 | 0.96 / 14,650 / 0.63 | 0.95 / 10,168 / 0.47 | 0.69× |
| 500 | 0.97 / 36,246 / 1.47 | 0.96 / 21,225 / 0.88 | 0.59× |
| 1000 | 0.97 / 72,263 / 2.89 | 0.97 / 36,563 / 1.49 | 0.51× |
| 5000 | 0.97 / 360,347 / 14.13 | 0.97 / 122,424 / 4.87 | 0.34× |
| 10000 | 0.97 / 720,354 / 27.49 | 0.97 / 198,647 / 7.87 | 0.28× |
Wiki-Cohere (N=934,024)
| numCands | Main recall/visited/ms | α=1.5 recall/visited/ms | vs main visited |
|---|---|---|---|
| 15 | 0.60 / 3,882 / 0.19 | 0.56 / 3,400 / 0.19 | 0.88× |
| 50 | 0.60 / 3,882 / 0.20 | 0.56 / 3,400 / 0.20 | 0.88× |
| 100 | 0.72 / 7,456 / 0.30 | 0.67 / 5,869 / 0.29 | 0.79× |
| 200 | 0.81 / 14,540 / 0.52 | 0.76 / 10,136 / 0.40 | 0.70× |
| 500 | 0.89 / 35,974 / 1.17 | 0.83 / 20,964 / 0.75 | 0.58× |
| 1000 | 0.92 / 71,580 / 2.37 | 0.89 / 36,099 / 1.25 | 0.50× |
| 5000 | 0.96 / 356,804 / 11.15 | 0.94 / 120,135 / 4.03 | 0.34× |
| 10000 | 0.96 / 713,239 / 21.83 | 0.95 / 194,276 / 6.44 | 0.27× |
|
From the reported numbers, it seems like we are exploring less, but the reduced exploration translates into recall drops by a similar multiplicative factor, at least at the lower end.
|
@ah89 I need to see the actual visit percentages per segment and segment count, etc. Visiting only 1% at maximum with 100M vectors seems like a horrible side effect.
@benwtrent The implementation is per-segment.

GIST-1M — HNSW (31 segments, ~33.3k vectors each)
GIST-1M — IVF (29 segments, ~33.3k vectors each)
Wiki-Cohere — HNSW (24 segments, ~39.5k vectors each)
Wiki-Cohere — IVF (22 segments, ~42.5k vectors each)
|
|
Gist-1M (1M, 1 segment, k=10, MIP)
Wiki-Cohere (934k, 1 segment, k=10, MIP)
|
```java
if (numCands > 0) {
    // Two-signal model: smooth log-based visit ratio from num_candidates/k ratio and k.
    // Produces visitRatio in [V_MIN, V_MAX] which, after the SOAR 2× multiplier,
    // gives actual visit percentage in [~1%, ~10%].
    // The nc/k ratio (capped at R_MAX=100) drives 85% of the signal via log scaling,
    // while k itself (capped at K_MAX=10000) contributes the remaining 15%.
    final float V_MIN = 0.005f;
    final float V_MAX = 0.05f;
    final double R_MAX = 100.0;
    final double K_MAX = 10_000.0;
    double r = (double) numCands / Math.max(k, 1);
    double x = Math.log1p(r) / Math.log1p(R_MAX);
    double y = Math.log1p(k) / Math.log1p(K_MAX);
    x = Math.max(0.0, Math.min(1.0, x));
    y = Math.max(0.0, Math.min(1.0, y));
    double z = 0.85 * x + 0.15 * y;
    visitRatio = (float) (V_MIN + (V_MAX - V_MIN) * z);
} else {
    // Fallback when called without IVFKnnSearchStrategy (e.g. checkIndex).
    // Use log-based heuristic for reasonable default behavior.
    final float V_MIN = 0.005f;
    final float V_MAX = 0.05f;
    double r = 1.0; // assume nc == k
    double x = Math.log1p(r) / Math.log1p(100.0);
    double y = Math.log1p(k) / Math.log1p(10_000.0);
    x = Math.max(0.0, Math.min(1.0, x));
    y = Math.max(0.0, Math.min(1.0, y));
    double z = 0.85 * x + 0.15 * y;
    visitRatio = (float) (V_MIN + (V_MAX - V_MIN) * z);
}
```
lots of unnecessary duplicates, the only thing that changes is the r?
Also Math.log1p(R_MAX) and its partner should just be private static final values. Same for all the other constants (why are they local instances???)
Additionally, do we need to divide by 2 to account for the 2× application downstream?
Fixed — constants are now private static final, log1p values are pre-computed, and the duplicate log/clamp is extracted into a logScale helper.
Regarding the ÷2 question, I don't think we actually need to halve it. If I understand correctly, the 2× downstream acts more as a SOAR overlap budget (since the same vector might appear in two posting lists), rather than a true multiplier on the unique vectors visited. Let me know if I'm looking at this the wrong way!
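For context, a minimal self-contained sketch of the deduplicated computation described above (class name, `logScale`, and the single-method structure are illustrative assumptions, not the actual patch):

```java
// Illustrative sketch of the deduplicated Two-Signal visit-ratio computation.
// Constants match the earlier snippet in this thread; names are assumptions.
public final class VisitRatioSketch {
    private static final float V_MIN = 0.005f;
    private static final float V_MAX = 0.05f;
    private static final double LOG1P_R_MAX = Math.log1p(100.0);
    private static final double LOG1P_K_MAX = Math.log1p(10_000.0);
    private static final double RATIO_WEIGHT = 0.85;
    private static final double K_WEIGHT = 0.15;

    // log1p-normalized value, clamped to [0, 1].
    private static double logScale(double value, double log1pMax) {
        return Math.max(0.0, Math.min(1.0, Math.log1p(value) / log1pMax));
    }

    // numCands <= 0 falls back to r = 1 (i.e. assume numCands == k), so the
    // two duplicated branches of the original collapse into one.
    static float computeDynamicVisitRatio(int numCands, int k) {
        double r = numCands > 0 ? (double) numCands / Math.max(k, 1) : 1.0;
        double z = RATIO_WEIGHT * logScale(r, LOG1P_R_MAX) + K_WEIGHT * logScale(k, LOG1P_K_MAX);
        return (float) (V_MIN + (V_MAX - V_MIN) * z);
    }

    public static void main(String[] args) {
        System.out.println(computeDynamicVisitRatio(100, 10)); // r = 10
        System.out.println(computeDynamicVisitRatio(0, 10));   // fallback path, r = 1
    }
}
```

Only `r` differs between the two original branches, so folding the fallback into the `r` computation removes the duplication while keeping the constants `private static final` with pre-computed `log1p` denominators.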
|
Please disregard the previous results, as they followed the legacy path: Wiki-Cohere — 934,024 vectors, 768-dim, 1 segment, k=10, MIP
Gist-1M — 1,000,000 vectors, 960-dim, 1 segment, k=10, MIP
DBpedia-Arctic — 4,635,922 vectors, 768-dim, 1 segment, k=10, MIP
|
Updated default
|
|
Full analysis on 4 tables: Wiki-Cohere 768d (934K docs, 1 segment, IVF BBQ-4bit)
GIST-1M 960d (1.0M docs, 1 segment, IVF BBQ-4bit)
DBpedia-arctic 768d (4.6M docs, 1 segment, IVF BBQ-4bit)
MSMARCO-v2 1024d (72M docs, 1 segment, IVF BBQ-4bit)
|
|
FWIW: My opinion is that for the dynamic visit ratio, we should compute the value for each segment separately, taking into account only the size of that segment, instead of computing one value for all segments and then dividing the work evenly.
Yes — the Two-Signal formula computes a visit ratio (not an absolute count), and each segment independently applies that ratio to its own size. Check IVFVectorsReader.java ~line 327: `long maxVectorVisited = (long) (2.0 * visitRatio * numVectors);` where `numVectors` is that segment's own vector count.
|
Oh. I thought that if we are moving to computing this ratio on a per-segment basis, there is not really a point in computing it in the query; we should instead do it in the codec (I hate that we had this code in two places). I see you have not changed the implementation here: are you planning to do something with this?
|
I see that maxVectorVisited is dependent on the size of the segment, but the visit ratio is the same for each segment. My experience is that the visit ratio can be smaller as the segment gets bigger while still hitting a specific recall.
|
Yes, every segment gets the same visit ratio, regardless of whether it contains 100K or 70M vectors. That makes sense given the uniform treatment of segments when no prior information is provided. If, however, you mean that larger segments should have a smaller ratio because of IVF clustering, I can add a segment-size bias.
|
I think this linear transformation we have in this PR is a good starting point. @ah89 please do adjust the other places where we are using the old formula. What do y'all think of having a small and limited bias that adjusts the final percentage according to segment size? It would have an asymptotic limit (to prevent bad scenarios like a 0.00001% visit), and start near zero.
|
I think a segment-size bias makes sense — large segments have better-populated IVF clusters so they need a smaller visit fraction, and tiny segments (<10K) with poor clustering could benefit from visiting more. That said, our Two-Signal formula already produces very small ratios (0.3%–4%), so the effect of an additive bias near 0 for small segments may be minimal. It also adds a tuning dimension that's hard to calibrate without benchmarks across many segment sizes. I'm leaning towards adding this as a follow-up once we have more data, but I'll leave it to @benwtrent @iverase to decide. |
|
I am fine leaving it as a follow up. My suggestion is to index a fixed number of documents and force merge into one segment. Then compute how much visit percentage is needed to achieve a target recall. Try the same thing for different number of documents so we can get a nice curve segment-size / visit percentage. If we do it for a few datasets we can get a good idea on how visit percentage should change with the segment size? |
benwtrent
left a comment
Please verify IVFVectorsReader#search. It is possible that this format is called without the IVF strategy and thus your logic won't work in that scenario.
Good catch — verified and fixed. When called without the IVF strategy, numCands is 0 and the two-signal computation had no ratio signal. Fixed by splitting the dynamic fallback: when numCands is 0, we fall back to assuming numCands == k. Do you have a better idea?
```java
    segmentInfo.append(segVectors);
}
segmentInfo.append("]");
logger.info("Segment layout: {}", segmentInfo);
```
do we really want this to be logged via info all the time?
This seems like a debug option.
Good point — changed to debug. This is only useful when troubleshooting, not needed in normal runs.
```java
logger.debug(
    "IVF search segment: numVectors=[{}], visitRatio=[{}], maxVectorVisited=[{}], numCands=[{}], k=[{}]",
    numVectors,
    visitRatio,
    maxVectorVisited,
    numCands,
    k
);
```
Let's remove all this logging, it isn't necessary. If anything it's "trace", and I don't think it provides useful information.
```java
}
float percentFiltered = Math.max(0f, Math.min(1f, approximateCost / numVectors));
int numCands = 0;
int k = knnCollector.k();
```
If we know k but don't know numCands, we know for a fact that numCands >= k. So let's just lean into that: do the new logic with numCands == k.
```java
// Two-Signal Model constants for dynamic visit ratio
private static final float V_MIN = 0.003f;
private static final float V_MAX = 0.04f;
private static final double LOG1P_R_MAX = Math.log1p(10.0);
private static final double LOG1P_K_MAX = Math.log1p(10_000.0);
private static final double RATIO_WEIGHT = 0.85;
private static final double K_WEIGHT = 0.15;
```
you might want to have this logic shareable so that the format can use it AND the query. Possibly in the IVFFormat file as a static method.
Done — the logic now lives only in IVFVectorsReader.computeDynamicVisitRatio(). The query side (AbstractIVFKnnVectorQuery) just passes visitRatio=0.0f through to the codec and doesn't compute anything itself, so there's nothing to share.
benwtrent
left a comment
Looks good! Please follow up with some experiments as Ignacio suggests and augment with some additional bias towards segment size.
@iverase @benwtrent Here are the segment-size vs visit-percentage curves. Setup: same dataset indexed at different sizes (10K, 50K, 100K, 250K, 500K, full), force-merged to 1 segment, IVF BBQ-4bit, k=10, nc=1000, sweeping visit% from 0.1% to 20%.

GIST-1M (960d, euclidean) — recall at each visit percentage
Minimum actual visit% for target recall

Wiki-Cohere (768d, MIP) — recall at each visit percentage
Minimum actual visit% for target recall
Key observations & Possibilities
The data clearly shows our current Two-Signal model (fixed ratio regardless of segment size) works well for segments ≥100K, but under-visits small segments. This supports @benwtrent's segment-size bias idea. Given that the data is now available and the shape of the curve is clear, do you think we should add the segment-size bias in this PR, or keep it as a follow-up? |
|
10K vectors went from 0.77 → 0.94 recall (+0.17), 100K went from 0.86 → 0.98 (+0.12). The bias is applied per-segment, which means it fires for ALL sizes here.

New results with bias:
|
|
@ah89 this shows me that in a realistic scenario, adding the bias the way you did is expensive with no realistic benefit. The bias that Ignacio was focusing on was REDUCING the visit percentage as segments get larger, not significantly increasing the visit percentage on smaller segments. I think the answer there is simply because the clusters are static and larger. We should not optimize for tiny segments, that just isn't realistic. Things will be merged and we will have a diversity of segment sizes. |
@benwtrent This change is addressing your earlier point on small segments (<10k or so). The bias improves recall, but it increases visit % significantly and adds ~1–2ms latency while applying to all segment sizes (N_REF=500k). So it's still a tradeoff with limited benefit in the scenarios we care about, and it's different from Ignacio's focus on larger segments. My point on his suggestion is the same as before: beyond uniformity, this starts to feel like an ad hoc, data-dependent solution.
Replace the log10(N)^2 * clampedNumCands / N formula in AbstractIVFKnnVectorQuery.rewrite with a simple linear clampedNumCands / N mapping. This makes num_candidates directly represent the number of vectors to traverse, giving users a predictable, size-independent tuning knob. Also: - Extract computeDynamicVisitRatio as a testable static method - Add numCands to equals/hashCode since it now directly drives visit ratio when providedVisitRatio == 0 - Rename maxVectorVisited to maxExpectedTraversed in IVFVectorsReader for clarity on pre-filter vs post-filter semantics Closes elastic#142617
… improved visit ratio calculation
Introduces a new two-signal log-based heuristic for controlling IVF visit ratios, aiming for smoother scaling and more predictable recall across varying parameter ranges. Enhances logging to capture segment layouts and vector distributions, and tracks actual visit percentages for better observability of search behavior. Updates reporting to include segment counts and visit metrics for more comprehensive benchmarking.
Switches from a custom log-based model to a more principled two-signal approach for estimating the visit ratio, improving consistency and maintainability. Centralizes the two-signal logic and constants, reducing duplication and potential inconsistencies. Enhances the robustness of vector search parameterization, especially when candidate values are not explicitly provided.
Updates the logic for determining the approximate cost of accepted documents, correctly handling the case when all documents are accepted to avoid misleading cost estimations. Improves accuracy of candidate filtering and prevents potential downstream errors.
Relocates the Two-Signal Model visit ratio calculation from the query layer (AbstractIVFKnnVectorQuery) into the codec layer (IVFVectorsReader) where it can leverage per-segment context. Removes verbose debug logging from IVF search and downgrades segment layout logging from info to debug level to reduce log noise. Initializes numCands to k instead of 0 so the dynamic ratio computation has a meaningful default when no search strategy overrides it.
Replace the original log10(N)^2 * k / N visit-ratio formula with a Two-Signal model that derives the visit ratio from both the num_candidates/k ratio (85% weight) and k magnitude (15% weight), bounded to [0.3%, 4.0%]. Add a segment-size power-law cap (0.045 * (1M/N)^0.35) that limits visit ratio for large segments (~10% at 100K, ~4.5% at 1M, ~2% at 10M) while leaving small segments governed by the two-signal model alone. The final visit ratio per segment is min(two_signal, cap).
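The power-law cap from the commit message above (`0.045 * (1M/N)^0.35`) can be sketched as follows; the class and method names are assumptions, not the actual patch:

```java
// Sketch of the segment-size power-law cap described in the commit message:
// cap(N) = 0.045 * (1_000_000 / N)^0.35. Names are assumptions.
public final class SegmentCapSketch {
    static double computeSegmentSizeCap(long numVectors) {
        return 0.045 * Math.pow(1_000_000.0 / Math.max(numVectors, 1), 0.35);
    }

    public static void main(String[] args) {
        // ~10% at 100K, 4.5% at 1M, ~2% at 10M, matching the stated examples.
        System.out.printf("100K -> %.4f%n", computeSegmentSizeCap(100_000));
        System.out.printf("1M   -> %.4f%n", computeSegmentSizeCap(1_000_000));
        System.out.printf("10M  -> %.4f%n", computeSegmentSizeCap(10_000_000));
    }
}
```

With the final per-segment ratio taken as min(two_signal, cap), small segments stay governed by the two-signal bounds while large segments are throttled by the cap.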
Add segment-size-aware visit ratio cap to IVF search

Motivation (as per @iverase suggestion): the Two-Signal model computes a dynamic visit ratio from the query parameters nc and k only, independent of segment size.

Experiment design: we ran controlled benchmarks across four datasets to measure recall as a function of visit% and segment size:

- GIST & Wiki-Cohere sweep: force-merged to 6 segment sizes (10K, 50K, 100K, 250K, 500K, full) × 11 visit percentages (0.1%–20%), k=10, nc=100, IVF 4-bit BBQ.
- DBpedia segment sweep: 7 segment counts (1, 2, 4, 6, 8, 10, 12 segments → ~515K to 4.6M docs/segment) × 10 visit percentages (0.1%–20%).
- DBpedia ceiling + oversampling test: tested visit% up to 70% to find the recall ceiling (0.90–0.92 without oversampling), then tested oversample=5x and 10x (5x sufficient, 10x identical).

Formula: fitted a power-law curve to the empirical data, where N = number of vectors in the segment and targetRecall = 0.9 (default). Example cap values (targetRecall=0.9):

Per-segment application: the cap is computed independently for each segment at search time:

`visitRatio = Math.min(computeDynamicVisitRatio(numCands, k), computeSegmentSizeCap(numVectors));`

This means in a multi-segment index, each segment gets a different final visit ratio based on its size. A 100K-doc segment gets cap≈10% (two-signal dominates since its max is 4%), while a 5M-doc segment gets cap≈2.6% (cap dominates). The two-signal value is the same across all segments (it depends only on the query parameters nc and k), but the cap varies per segment.

Comparison: origin/main vs branch (Two-Signal + Cap). All runs: IVF BBQ-4bit, k=10.

Wiki-Cohere (934K docs, 2 segments, ~467K docs/seg)
GIST-1M (1M docs, 3 segments, ~333K docs/seg)
DBpedia-Arctic (4.6M docs, 9 segments, ~515K docs/seg)
Key Observations
|
tteofili
left a comment
nice looking numbers! LGTM
…rics
* upstream/main: (21 commits)
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
[DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
Use checked exceptions in entitlement constructor rules (elastic#145234)
ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
Make Settings.Builder.remove() fluent (elastic#145294)
Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
Fix flaky SecurityFeatureResetTests (elastic#145063)
[DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
Skip certain metric assertions on Windows (elastic#144933)
[ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
[Transform] Track Linked Projects (elastic#144399)
Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
Clean up TickerScheduleEngineTests (elastic#145303)
[CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
...
* Simplify DiskBBQ dynamic visit ratio to linear Replace the log10(N)^2 * clampedNumCands / N formula in AbstractIVFKnnVectorQuery.rewrite with a simple linear clampedNumCands / N mapping. This makes num_candidates directly represent the number of vectors to traverse, giving users a predictable, size-independent tuning knob. Also: - Extract computeDynamicVisitRatio as a testable static method - Add numCands to equals/hashCode since it now directly drives visit ratio when providedVisitRatio == 0 - Rename maxVectorVisited to maxExpectedTraversed in IVFVectorsReader for clarity on pre-filter vs post-filter semantics Closes elastic#142617 * Enhance IVFKnnSearchStrategy to include numCands and k parameters for improved visit ratio calculation * [CI] Auto commit changes from spotless * Improves IVF search logging and visit ratio calc Introduces a new two-signal log-based heuristic for controlling IVF visit ratios, aiming for smoother scaling and more predictable recall across varying parameter ranges. Enhances logging to capture segment layouts and vector distributions, and tracks actual visit percentages for better observability of search behavior. Updates reporting to include segment counts and visit metrics for more comprehensive benchmarking. * Refines dynamic visit ratio calc for IVF search Switches from a custom log-based model to a more principled two-signal approach for estimating the visit ratio, improving consistency and maintainability. Centralizes the two-signal logic and constants, reducing duplication and potential inconsistencies. Enhances the robustness of vector search parameterization, especially when candidate values are not explicitly provided. * Retune Two-Signal visit ratio formula * Fixes approximate cost calculation for doc filter Updates the logic for determining the approximate cost of accepted documents, correctly handling the case when all documents are accepted to avoid misleading cost estimations. 
Improves accuracy of candidate filtering and prevents potential downstream errors. * Move dynamic visit ratio computation into codec layer Relocates the Two-Signal Model visit ratio calculation from the query layer (AbstractIVFKnnVectorQuery) into the codec layer (IVFVectorsReader) where it can leverage per-segment context. Removes verbose debug logging from IVF search and downgrades segment layout logging from info to debug level to reduce log noise. Initializes numCands to k instead of 0 so the dynamic ratio computation has a meaningful default when no search strategy overrides it. * Replace dynamic visit ratio with Two-Signal model and cap Replace the original log10(N)^2 * k / N visit-ratio formula with a Two-Signal model that derives the visit ratio from both the num_candidates/k ratio (85% weight) and k magnitude (15% weight), bounded to [0.3%, 4.0%]. Add a segment-size power-law cap (0.045 * (1M/N)^0.35) that limits visit ratio for large segments (~10% at 100K, ~4.5% at 1M, ~2% at 10M) while leaving small segments governed by the two-signal model alone. The final visit ratio per segment is min(two_signal, cap). --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Solution Design
Replace the `log10(N)² × min(10_000, max(numCands, 5*k)) / N` formula in `AbstractIVFKnnVectorQuery.rewrite` with a simple linear `min(10_000, max(numCands, 5*k)) / N` mapping — removing only the `log10(N)²` multiplier. The floor (`max(numCands, 5*k)`) and ceiling (`min(..., 10_000)`) are unchanged. This makes `num_candidates` directly represent the number of vectors to traverse, giving users a predictable, size-independent tuning knob.

Also:

- Extract `computeDynamicVisitRatio` as a testable static method
- Add `numCands` to equals/hashCode since it now directly drives the visit ratio when `providedVisitRatio == 0`
- Rename `maxVectorVisited` to `maxExpectedTraversed` in `IVFVectorsReader` for clarity on pre-filter vs post-filter semantics

Benchmarking: `main` (log₁₀²) vs `fix` (linear)

Here are the comparison results on GIST-1M (1M vectors, 960 dims, euclidean, IVF 1-bit BBQ, force-merged to 1 segment):
Key observations
The old formula massively over-visits. At `num_candidates=100` on 1M vectors, `main` visits 7,595 vectors (0.76%), while `fix` visits 672 (0.07%). The old `log₁₀(N)² ≈ 36×` multiplier inflates the visit budget by ~11× at this scale.

Recall difference is modest. Going from 7,595 visited → 672 only drops recall from 0.34 → 0.13. The old formula burned 10× more vectors for only +0.21 recall. At higher num_candidates the gap narrows: at 10K, it's 0.41 vs 0.38.

The linear formula is dramatically faster. At `num_candidates=10000`: 0.44ms vs 13.39ms — a 30× speedup — because it visits 20K vectors instead of 720K.

Diminishing returns on `main`. The old formula already tops out at recall ~0.41 around `num_candidates=500` and can't improve further even at 10K (visiting 72% of all vectors). This suggests GIST-1M is a hard dataset where recall is bounded by quantization quality, not visit budget.

The fix makes `num_candidates` a direct, predictable knob. With the linear formula, `num_candidates` directly controls how many vectors are traversed. Users who need higher recall can increase it, understanding the cost linearly.

Closes #142617
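For reference, the linear mapping in this solution design can be sketched as a standalone method (class and method names are assumptions; the clamp constants are the ones stated above):

```java
// Sketch of the final linear mapping: min(10_000, max(numCands, 5*k)) / N,
// per the solution design above. Names are assumptions, not the actual patch.
public final class LinearVisitRatioSketch {
    static float computeDynamicVisitRatio(long numVectors, int numCands, int k) {
        // Floor at 5*k, ceiling at 10_000, then divide by segment size N.
        long clamped = Math.min(10_000L, Math.max((long) numCands, 5L * k));
        return (float) ((double) clamped / Math.max(numVectors, 1));
    }

    public static void main(String[] args) {
        // num_candidates=100 on 1M vectors -> budget of 100 vectors (0.01%).
        System.out.println(computeDynamicVisitRatio(1_000_000, 100, 10));
    }
}
```

Because the budget is linear in `num_candidates` (within the floor and ceiling), doubling it doubles the traversal cost, which is the predictability the description claims.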