Skip to content

Simplify DiskBBQ dynamic visit ratio to linear#142784

Merged
ah89 merged 13 commits intoelastic:mainfrom
ah89:fix/diskbbq-linear-visit-ratio
Mar 31, 2026
Merged

Simplify DiskBBQ dynamic visit ratio to linear#142784
ah89 merged 13 commits intoelastic:mainfrom
ah89:fix/diskbbq-linear-visit-ratio

Conversation

@ah89
Copy link
Copy Markdown
Contributor

@ah89 ah89 commented Feb 20, 2026

Solution Design

Replace the log10(N)² × min(10_000, max(numCands, 5*k)) / N formula in AbstractIVFKnnVectorQuery.rewrite with a simple linear min(10_000, max(numCands, 5*k)) / N mapping — removing only the log10(N)² multiplier. The floor (max(numCands, 5*k)) and ceiling (min(..., 10_000)) are unchanged. This makes num_candidates directly represent the number of vectors to traverse, giving users a predictable, size-independent tuning knob.

Also:

  • Extract computeDynamicVisitRatio as a testable static method
  • Add numCands to equals/hashCode since it now directly drives visit ratio when providedVisitRatio == 0
  • Rename maxVectorVisited to maxExpectedTraversed in IVFVectorsReader for clarity on pre-filter vs post-filter semantics

Benchmarking: main (log₁₀²) vs fix (linear)

Here are the comparison results on GIST-1M (1M vectors, 960 dims, euclidean, IVF 1-bit BBQ, force-merged to 1 segment):

num_candidates main recall fix recall main visited fix visited main latency(ms) fix latency(ms)
15 0.29 0.13 4,018 653 0.11 0.04
50 0.29 0.13 4,018 653 0.12 0.04
100 0.34 0.13 7,595 672 0.18 0.04
200 0.37 0.14 14,756 703 0.31 0.04
500 0.40 0.17 36,390 1,366 0.69 0.06
1000 0.40 0.25 72,405 2,396 1.38 0.08
5000 0.41 0.35 360,329 10,395 6.62 0.27
10000 0.41 0.38 720,384 20,340 13.39 0.44

Key observations

  1. The old formula massively over-visits. At num_candidates=100 on 1M vectors, main visits 7,595 vectors (0.76%), while fix visits 672 (0.07%). The old log₁₀(N)² ≈ 36× multiplier inflates the visit budget by ~11× at this scale.

  2. Recall difference is modest. Going from 7,595 visited → 672 only drops recall from 0.34 → 0.13. The old formula burned 10× more vectors for only +0.21 recall. At higher num_candidates the gap narrows: at 10K, it's 0.41 vs 0.38.

  3. The linear formula is dramatically faster. At num_candidates=10000: 0.44ms vs 13.39ms — a 30× speedup — because it visits 20K vectors instead of 720K.

  4. Diminishing returns on main. The old formula already tops out at recall ~0.41 around num_candidates=500 and can't improve further even at 10K (visiting 72% of all vectors). This suggests GIST-1M is a hard dataset where recall is bounded by quantization quality, not visit budget.

  5. The fix makes num_candidates a direct, predictable knob. With the linear formula, num_candidates directly controls how many vectors are traversed. Users who need higher recall can increase it, understanding the cost linearly.

Closes #142617

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Feb 20, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copy link
Copy Markdown
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visitratio goes from 0-100.

This calculation just won't work.

One other thing to consider is moving this "dynamic visit percentage" based on num_candidates to be "per-segment" instead of globally calculated. This may help as well.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Feb 24, 2026

Based on the problem of the possibility of large segments the capped linear formula won't work so we chnage it to a sublinear option and also added more modern dataset.

New Formulas

Main Proposed (α = 1.5)
Formula $\text{visitRatio} = \frac{\lfloor \log_{10}(N)^2 \times \text{numCands} \rceil}{N}$ $\text{visitRatio} = \min\Bigl(1,\frac{x \cdot \ln\bigl(1 + \tfrac{N}{x}\bigr)^{1.5}}{N}\Bigr)$
Where $N$ = total vectors in shard $x = \min(10\text{K},\max(\text{numCands},5k))$, $N$ = vectors in segment
Actual visited $2 \times \text{visitRatio} \times N$ (2× for SOAR) Same

Property Comparison

Property Main Proposed (α = 1.5)
Scaling in numCands Linear — 2× numCands = 2× visited Sublinear — 2× numCands < 2× visited (dampened by shrinking $\ln$ term)
Scaling in N $\log_{10}(N)^2$ grows slowly; ratio $\propto \frac{\log^2 N}{N} \to 0$ $\ln(1+N/x)^{1.5}$ grows slowly; ratio $\propto \frac{(\ln N)^{1.5}}{N} \to 0$ faster
Scope Shard-level (same ratio for every segment) Per-segment (adapts to each segment's size)
At numCands=10K, N=1M visitRatio=0.36 → 720K visited (72%) visitRatio=0.099 → 199K visited (19.9%)
At numCands=10K, N=100M visitRatio=0.0064 → 1.28M visited (1.28%) visitRatio=0.0028 → 559K visited (0.56%)
At numCands=10K, N=1B visitRatio=0.00081 → 1.62M visited (0.16%) visitRatio=0.00039 → 782K visited (0.078%)
At numCands=100, N=1M visitRatio=0.0036 → 7.2K visited visitRatio=0.0028 → 5.6K visited
At numCands=100, N=100M visitRatio=0.000064 → 12.8K visited visitRatio=0.000051 → 10.3K visited
At numCands=100, N=1B visitRatio=0.0000081 → 16.2K visited visitRatio=0.0000065 → 12.9K visited
numCands meaningful? Always (linear means every bump helps) Always (monotonically increasing; diminishing returns at high numCands)
Efficiency at high numCands (N=1M) Bad — visits 72% of dataset at 10K Good — visits 20% of dataset at 10K
Low numCands behavior Floor at $\log_{10}(N)^2 \times 50 / N$ Floor at $50 \cdot \ln(1+N/50)^{1.5}/N$ (slightly less than main → slightly less recall)

Benchmarking: main (log₁₀²) vs fix (sublinear)

GIST-1M (N=1,000,000)

numCands Main recall/visited/ms α=1.5 recall/visited/ms vs main visited
15 0.94 / 3,806 / 0.24 0.93 / 3,382 / 0.20 0.89×
50 0.94 / 3,806 / 0.23 0.93 / 3,382 / 0.22 0.89×
100 0.94 / 7,410 / 0.37 0.94 / 5,815 / 0.31 0.78×
200 0.96 / 14,650 / 0.63 0.95 / 10,168 / 0.47 0.69×
500 0.97 / 36,246 / 1.47 0.96 / 21,225 / 0.88 0.59×
1000 0.97 / 72,263 / 2.89 0.97 / 36,563 / 1.49 0.51×
5000 0.97 / 360,347 / 14.13 0.97 / 122,424 / 4.87 0.34×
10000 0.97 / 720,354 / 27.49 0.97 / 198,647 / 7.87 0.28×

Wiki-Cohere (N=934,024)

numCands Main recall/visited/ms α=1.5 recall/visited/ms vs main visited
15 0.60 / 3,882 / 0.19 0.56 / 3,400 / 0.19 0.88×
50 0.60 / 3,882 / 0.20 0.56 / 3,400 / 0.20 0.88×
100 0.72 / 7,456 / 0.30 0.67 / 5,869 / 0.29 0.79×
200 0.81 / 14,540 / 0.52 0.76 / 10,136 / 0.40 0.70×
500 0.89 / 35,974 / 1.17 0.83 / 20,964 / 0.75 0.58×
1000 0.92 / 71,580 / 2.37 0.89 / 36,099 / 1.25 0.50×
5000 0.96 / 356,804 / 11.15 0.94 / 120,135 / 4.03 0.34×
10000 0.96 / 713,239 / 21.83 0.95 / 194,276 / 6.44 0.27×

@ah89 ah89 requested a review from benwtrent February 24, 2026 17:44
@tteofili
Copy link
Copy Markdown
Contributor

tteofili commented Feb 27, 2026

from the reported numbers, it seems like we are exploring less, but it seems this exploration translates with similar multiplicative factor to recall drops, at least for lower visit_percentage. since the allocation logic hasn't changed, this is not surprising. the numbers for higher visit_percentage look good instead.
apart from that, moving num_candidates / visit_percentage to apply on a per segment basis makes a lot of sense.

@benwtrent
Copy link
Copy Markdown
Member

@ah89 I need to see the actual visit percentages per segment and segment count, etc.

visiting only 1% maximally with 100M vectors seems like a horrible side effect.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 4, 2026

@ah89 I need to see the actual visit percentages per segment and segment count, etc.

visiting only 1% maximally with 100M vectors seems like a horrible side effect.

@benwtrent The implementation is per-segment.

GIST-1M — HNSW (31 segments, ~33.3k vectors each)

nc recall visited latency(ms)
15 1.00 7,664 1.30
50 1.00 10,088 1.88
100 1.00 11,950 2.30
200 1.00 13,887 2.89
500 1.00 17,028 3.83
1000 1.00 19,919 5.08
5000 1.00 26,905 8.32
10000 1.00 29,394 10.13

GIST-1M — IVF (29 segments, ~33.3k vectors each)

visit% actual_visit% recall visited latency(ms)
0.1 0.71 0.78 7,058 0.92
0.5 1.61 0.92 16,059 1.09
1.0 2.63 0.96 26,250 1.43
2.0 4.74 0.97 47,422 2.20
3.0 6.83 0.98 68,339 2.90
5.0 10.80 0.99 108,000 4.22
7.0 14.83 0.99 148,321 5.57
10.0 20.81 0.99 208,110 7.62

Wiki-Cohere — HNSW (24 segments, ~39.5k vectors each)

nc recall visited latency(ms)
15 0.97 9,533 1.80
50 0.99 13,752 2.60
100 1.00 17,934 3.02
200 1.00 23,770 3.89
500 1.00 35,416 5.95
1000 1.00 49,144 8.43
5000 1.00 112,703 21.52
10000 1.00 164,703 33.11

Wiki-Cohere — IVF (22 segments, ~42.5k vectors each)

visit% actual_visit% recall visited latency(ms)
0.1 1.01 0.68 9,410 0.83
0.5 1.73 0.77 16,176 1.05
1.0 2.72 0.83 25,398 1.32
2.0 4.73 0.90 44,208 1.88
3.0 6.74 0.92 62,912 2.39
5.0 10.76 0.94 100,503 3.43
7.0 14.77 0.95 137,931 4.53
10.0 20.77 0.96 194,027 6.04

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 6, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🏷️ Required labels (at least one) (2)
  • Team:Delivery
  • Team:Search - Inference

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 095db6e7-da0c-403b-a179-4652c3f2c6b4

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 6, 2026

Gist-1M (1M, 1 segment, k=10, MIP)

nc Main rec/vis/ms Two-Signal rec/vis/ms HNSW recall actual visit%
15 0.84 / 1,584 / 0.18 0.95 / 28,987 / 1.21 1.00 2.90%
50 0.84 / 1,584 / 0.19 0.96 / 43,448 / 1.74 1.00 4.35%
100 0.84 / 1,584 / 0.17 0.96 / 53,532 / 2.12 1.00 5.35%
200 0.84 / 1,584 / 0.13 0.96 / 64,229 / 2.55 1.00 6.42%
500 0.84 / 1,584 / 0.13 0.96 / 78,982 / 3.23 1.00 7.90%
1000 0.84 / 1,584 / 0.13 0.97 / 90,296 / 3.60 1.00 9.03%
5000 0.84 / 1,584 / 0.13 0.97 / 90,296 / 3.62 9.03%
10000 0.84 / 1,584 / 0.13 0.97 / 90,296 / 3.54 9.03%

Wiki-Cohere (934k, 1 segment, k=10, MIP)

nc Main rec/vis/ms Two-Signal rec/vis/ms HNSW recall actual visit%
15 0.41 / 1,733 / 0.14 0.87 / 27,133 / 0.96 0.88 2.91%
50 0.41 / 1,733 / 0.15 0.91 / 40,696 / 1.41 0.95 4.36%
100 0.41 / 1,733 / 0.14 0.92 / 50,040 / 1.70 0.97 5.36%
200 0.41 / 1,733 / 0.13 0.93 / 60,096 / 2.08 0.99 6.43%
500 0.41 / 1,733 / 0.13 0.94 / 73,842 / 2.63 1.00 7.91%
1000 0.41 / 1,733 / 0.14 0.94 / 84,412 / 2.96 1.00 9.04%

Comment on lines 300 to 330
if (numCands > 0) {
// Two-signal model: smooth log-based visit ratio from num_candidates/k ratio and k.
// Produces visitRatio in [V_MIN, V_MAX] which, after the SOAR 2× multiplier,
// gives actual visit percentage in [~1%, ~10%].
// The nc/k ratio (capped at R_MAX=100) drives 85% of the signal via log scaling,
// while k itself (capped at K_MAX=10000) contributes the remaining 15%.
final float V_MIN = 0.005f;
final float V_MAX = 0.05f;
final double R_MAX = 100.0;
final double K_MAX = 10_000.0;
double r = (double) numCands / Math.max(k, 1);
double x = Math.log1p(r) / Math.log1p(R_MAX);
double y = Math.log1p(k) / Math.log1p(K_MAX);
x = Math.max(0.0, Math.min(1.0, x));
y = Math.max(0.0, Math.min(1.0, y));
double z = 0.85 * x + 0.15 * y;
visitRatio = (float) (V_MIN + (V_MAX - V_MIN) * z);
} else {
// Fallback when called without IVFKnnSearchStrategy (e.g. checkIndex).
// Use log-based heuristic for reasonable default behavior.
final float V_MIN = 0.005f;
final float V_MAX = 0.05f;
double r = 1.0; // assume nc == k
double x = Math.log1p(r) / Math.log1p(100.0);
double y = Math.log1p(k) / Math.log1p(10_000.0);
x = Math.max(0.0, Math.min(1.0, x));
y = Math.max(0.0, Math.min(1.0, y));
double z = 0.85 * x + 0.15 * y;
visitRatio = (float) (V_MIN + (V_MAX - V_MIN) * z);
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lots of unnecessary duplicates, the only thing that changes is the r?

Also Math.log1p(R_MAX) and its partner should just be private static final values. Same for all the other constants (why are they local instances???)

Additionally, do we need to divide by 2 to account for the 2x application down stream?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — constants are now private static final, log1p values are pre-computed, and the duplicate log/clamp is extracted into a logScale helper.

Regarding the ÷2 question, I don't think we actually need to halve it. If I understand correctly, the 2× downstream acts more as a SOAR overlap budget (since the same vector might appear in two posting lists), rather than a true multiplier on the unique vectors visited. Let me know if I'm looking at this the wrong way!

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 6, 2026

Please disregard the previous results, as they followed the legacy path:

Wiki-Cohere — 934,024 vectors, 768-dim, 1 segment, k=10, MIP

nc Main Recall Main Visited Main Latency Two-Signal Recall Two-Signal Visited Two-Signal Latency HNSW Recall HNSW Visited HNSW Latency
15 0.60 3,836 0.22 ms 0.87 27,133 0.95 ms 0.88 493 0.10 ms
50 0.60 3,836 0.20 ms 0.91 40,696 1.37 ms 0.95 996 0.20 ms
100 0.72 7,452 0.31 ms 0.92 50,040 1.67 ms 0.97 1,701 0.32 ms
200 0.83 14,556 0.55 ms 0.93 60,096 1.98 ms 0.99 3,099 0.59 ms
500 0.89 35,967 1.29 ms 0.94 73,842 2.70 ms 1.00 6,810 1.33 ms
1000 0.94 71,645 2.44 ms 0.94 84,412 2.79 ms 1.00 12,222 2.45 ms
5000 0.97 356,795 12.43 ms 0.94 84,412 2.77 ms
10000 0.97 713,230 23.09 ms 0.94 84,412 2.78 ms

Gist-1M — 1,000,000 vectors, 960-dim, 1 segment, k=10, MIP

nc Main Recall Main Visited Main Latency Two-Signal Recall Two-Signal Visited Two-Signal Latency HNSW Recall HNSW Visited HNSW Latency
15 0.87 3,803 0.20 ms 0.95 28,987 1.15 ms 1.00 413 0.04 ms
50 0.87 3,803 0.25 ms 0.96 43,448 1.69 ms 1.00 805 0.07 ms
100 0.89 7,451 0.39 ms 0.96 53,532 2.23 ms 1.00 1,204 0.12 ms
200 0.91 14,648 0.67 ms 0.96 64,229 2.56 ms 1.00 1,714 0.19 ms
500 0.96 36,281 1.48 ms 0.96 78,982 3.08 ms 1.00 2,522 0.32 ms
1000 0.96 72,290 2.79 ms 0.97 90,296 3.49 ms 1.00 3,088 0.44 ms
5000 0.97 360,347 13.56 ms 0.97 90,296 3.47 ms
10000 0.97 720,396 26.93 ms 0.97 90,296 3.49 ms

DBpedia-Arctic — 4,635,922 vectors, 768-dim, 1 segment, k=10, MIP

nc Main Recall Main Visited Main Latency Two-Signal Recall Two-Signal Visited Two-Signal Latency HNSW Recall HNSW Visited HNSW Latency
15 0.32 4,776 0.30 ms 0.75 133,446 4.19 ms 0.51 497 0.15 ms
50 0.32 4,776 0.30 ms 0.79 200,683 6.25 ms 0.64 946 0.30 ms
100 0.42 9,234 0.43 ms 0.80 247,291 7.71 ms 0.70 1,471 0.48 ms
200 0.53 18,147 0.70 ms 0.82 296,980 9.14 ms 0.74 2,517 0.83 ms
500 0.63 44,793 1.48 ms 0.83 365,184 11.07 ms 0.78 5,297 1.82 ms
1000 0.71 89,230 2.83 ms 0.84 417,682 12.73 ms 0.79 9,438 3.30 ms
5000 0.85 444,771 13.56 ms 0.84 417,682 12.86 ms 0.82 37,245 13.13 ms
10000 0.87 889,121 27.18 ms 0.84 417,682 12.71 ms 0.83 67,313 24.56 ms

@ah89 ah89 enabled auto-merge (squash) March 9, 2026 21:09
@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 10, 2026

Updated default visit_percentage calculation for bbq_disk IVF search

Formulae

Previous formula (in IVFVectorsReader, segment-level fallback):

$$\text{visitRatio} = \frac{\text{round}(\log_{10}(N)^2 \times k)}{N}$$

where $N$ is the number of vectors in the segment. This formula is segment-size dependent and completely ignores num_candidates — it produces a flat visit budget regardless of how many candidates the user requests. As $N$ grows, the visit ratio shrinks toward zero, leading to severe recall degradation on large segments.

New formula — Two-Signal Model (in AbstractIVFKnnVectorQuery, query-level):

$$r = \frac{\text{num candidates}}{k}$$

$$z = 0.85 \cdot \text{logScale}(r, \ln(1 + 100)) + 0.15 \cdot \text{logScale}(k, \ln(1 + 10{,}000))$$

$$\text{visitRatio} = V_{\min} + (V_{\max} - V_{\min}) \cdot z$$

where:

  • $\text{logScale}(x, m) = \text{clamp}\left(\frac{\ln(1 + x)}{m}, 0, 1\right)$
  • $V_{\min} = 0.005$ (0.5%)
  • $V_{\max} = 0.05$ (5%)

The Two-Signal Model uses two signals to determine how much of each segment to visit:

  1. The num_candidates / k ratio (weight: 85%) — captures the user's intent for recall quality. Higher ratio = user wants more thorough search.
  2. The absolute value of k (weight: 15%) — larger k queries benefit from slightly more exploration to fill the larger result set reliably.

Both signals are log-scaled to produce diminishing returns (doubling num_candidates doesn't double the visit budget), and the result is linearly mapped into the $[V_{\min}, V_{\max}]$ range.

Reasoning

The key properties of the Two-Signal Model:

  1. Segment-size independent (N-independent): The visit percentage depends only on num_candidates and k, not on how many vectors are in the segment. The same query parameters produce the same recall regardless of segment size. The old formula's visit ratio collapsed toward zero on large segments — e.g., for $N = 20M$, the old formula gives ~0.01% visit while the new formula gives a consistent ~3-5%.

  2. num_candidates becomes a meaningful tuning knob: The old formula ignored num_candidates entirely for IVF. The new formula makes num_candidates directly control the recall/latency tradeoff, analogous to how it works for HNSW.

  3. Bounded and well-calibrated: The visit ratio is bounded between 0.5% and 5%. This range was calibrated against multi-segment benchmarks on Wiki-Cohere (768d, ~934K vectors) and GIST-1M (960d, 1M vectors), targeting IVF recall within ~5 points of HNSW recall at equivalent num_candidates values. The IVF 4-bit quantization recall ceiling is ~0.96 (reached at approximately 7-10% actual visit with 2x SOAR multiplier), and visiting beyond that wastes cycles without improving recall.

  4. Log-scaled diminishing returns: The log scaling matches the empirical observation from benchmarks: recall improvement from additional visiting follow a concave curve — the first few percent of visiting produce the most recall improvement, with rapidly diminishing gains after ~5-7% visits.

Recommendation for users

For production workloads, users should set visit_percentage directly rather than relying on the default calculation derived from num_candidates. The visit_percentage parameter provides direct, predictable control over the recall/latency tradeoff:

  • Lower visit_percentage (e.g., 1-3%) → faster queries, lower recall
  • Higher visit_percentage (e.g., 5-10%) → slower queries, higher recall
  • The sweet spot for most workloads is between 3% and 10%
  • IVF recall with 4-bit quantization plateuas at ~0.96 regardless of visit budget — visiting beyond ~10% provides no further recall benefit

Using visit_percentage removes the indirection through num_candidates and gives an explicit, segment-size-independent knob for search quality. The default_visit_percentage field mapping option can also be set at index time to establish a per-index default.

Impact on existing users

This changes adjusts the default visit_percentage calculation for bbq_disk knn searches when visit_percentage is not explicitly set. The new Two-Signal formula visits a significantly higher fraction of vectors per segment compared to the previous default (which was often far too low, especially on large segments). This will increase search latency for many users but substantially improves recall, which was previously unacceptably low in many configurations. Users who were relying on the previous (faster but lower-recall) defaults should expect increased query times. To restore previous latency characteristics, set visit_percentage explicitly to a lower value. We strongly recommend setting visit_percentage directly to control the recall/latency tradeoff for your specific workload.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 16, 2026

Full analysis on 4 tables:


Wiki-Cohere 768d (934K docs, 1 segment, IVF BBQ-4bit)

nc Main VR Main Rcl Main Vis Main Lat 2Sig VR 2Sig Rcl 2Sig Vis 2Sig Lat
15 0.00165 0.58 3,375 0.19 0.00976 0.85 18,553 0.62
50 0.00165 0.58 3,375 0.19 0.02555 0.92 48,058 1.54
100 0.00296 0.67 5,813 0.42 0.03464 0.93 65,072 2.13
200 0.00526 0.78 10,124 0.40 0.03589 0.94 67,346 2.14
500 0.01107 0.86 20,966 0.70 0.03589 0.94 67,346 2.14
1000 0.01915 0.89 36,118 1.14 0.03589 0.94 67,346 2.15
5000 0.06413 0.95 120,107 3.76 0.03589 0.94 67,346 2.23
10000 0.10383 0.96 194,316 6.32 0.03589 0.94 67,346 2.15

GIST-1M 960d (1.0M docs, 1 segment, IVF BBQ-4bit)

nc Main VR Main Rcl Main Vis Main Lat 2Sig VR 2Sig Rcl 2Sig Vis 2Sig Lat
15 0.00156 0.86 3,327 0.18 0.00976 0.94 19,787 0.76
50 0.00156 0.86 3,327 0.18 0.02555 0.96 51,418 1.96
100 0.00280 0.88 5,848 0.28 0.03464 0.96 69,554 2.68
200 0.00497 0.90 10,220 0.42 0.03589 0.96 72,111 2.76
500 0.01048 0.94 21,219 0.82 0.03589 0.96 72,111 2.76
1000 0.01816 0.96 36,600 1.39 0.03589 0.96 72,111 2.79
5000 0.06106 0.97 122,409 4.65 0.03589 0.96 72,111 2.77
10000 0.09915 0.97 198,591 7.61 0.03589 0.96 72,111 2.79

DBpedia-arctic 768d (4.6M docs, 1 segment, IVF BBQ-4bit)

nc Main VR Main Rcl Main Vis Main Lat 2Sig VR 2Sig Rcl 2Sig Vis 2Sig Lat
15 0.00042 0.30 4,216 0.22 0.00976 0.71 90,879 2.74
50 0.00042 0.30 4,216 0.22 0.02555 0.80 237,335 7.12
100 0.00076 0.37 7,356 0.31 0.03464 0.83 321,589 9.75
200 0.00137 0.47 13,105 0.53 0.03589 0.83 333,167 10.04
500 0.00298 0.58 27,965 0.88 0.03589 0.83 333,167 10.10
1000 0.00529 0.64 49,386 1.50 0.03589 0.83 333,167 10.04
5000 0.01927 0.78 178,996 5.76 0.03589 0.83 333,167 10.10
10000 0.03283 0.82 304,758 9.62 0.03589 0.83 333,167 10.10

MSMARCO-v2 1024d (72M docs, 1 segment, IVF BBQ-4bit)

nc Main Recall Main Visited Main Lat(ms) 2Sig Recall 2Sig Visited 2Sig Lat(ms)
15 0.28 5,619 11.5 0.50 1,406,179 956.6
50 0.28 5,619 11.5 0.52 3,680,113 670.6
100 0.34 10,322 9.4 0.52 4,989,230 884.2
200 0.38 18,591 10.7 0.52 5,169,241 914.4
500 0.42 41,214 15.0 0.52 5,169,241 907.9
1000 0.45 75,220 20.8 0.52 5,169,241 914.0
5000 0.48 296,689 59.5 0.52 5,169,241 918.3
10000 0.49 529,705 100.0 0.52 5,169,241 909.8

@iverase
Copy link
Copy Markdown
Contributor

iverase commented Mar 17, 2026

FWIW: My opinion is that for dynamic visit ratio, we should compute the value for each segment separately, taking into account only the size of the segment instead of a value for all segments and the divide the work evenly.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 18, 2026

FWIW: My opinion is that for dynamic visit ratio, we should compute the value for each segment separately, taking into account only the size of the segment instead of a value for all segments and the divide the work evenly.

Yes — the Two-Signal formula computes a visit ratio (not an absolute count), and each segment independently applies that ratio to its own size via IVFKnnSearchStrategy.

Check IVFVectorsReader.java ~line 327:

long maxVectorVisited = (long) (2.0 * visitRatio * numVectors);

where numVectors = values.size() (line ~300) is per-segment, not the total index size.

@ah89 ah89 requested a review from benwtrent March 18, 2026 15:50
@iverase
Copy link
Copy Markdown
Contributor

iverase commented Mar 18, 2026

Oh. I thought that if we are moving to compute this ratio in a per segment basis, there is not really point to compute it in the query and instead doing it in the codec (I hate we had this code in two places). I see. you have not changed the implementation here:

https://github.com/ah89/elasticsearch/blob/3a3173511d50d6631c6cb7c11b0f4980fd1bb2c8/server/src/main/java/org/elasticsearch/index/codec/vectors/diskbbq/IVFVectorsReader.java#L323

Are you planing to do something with this?

@iverase
Copy link
Copy Markdown
Contributor

iverase commented Mar 18, 2026

I see that maxVectorVisited is dependant on the size of the segment but the visit ratio is the same for each segment.

My experience is that the visit ratio can be smaller as the segment gets bigger to hit a specific recall.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 19, 2026

Yes, every segment gets the same visit ratio, regardless of whether it contains 100K or 70M vectors. That makes sense given the uniform treatment of segments when no prior information is provided.

If, however, you mean that larger segments should have a smaller ratio because of IVF clustering, I can add a SEGMENT_SIZE_REFERENCE so the visit ratio changes based on the number of vectors in each segment. Since IVF groups vectors into buckets for faster search, it tends to work better as the amount of data grows.

@benwtrent
Copy link
Copy Markdown
Member

@ah89 @iverase

I think this linear transformation we have in this PR is a good starting point. @ah89 please do adjust the other places were we are using the old formulae.

What do y'all think of having a small and limited bias that adjusts the final percentage according to segment size? It would have an asymptotic limit (to prevent bad scenarios like 0.00001% visit), and start near 0 when number of vectors is <10k?

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 19, 2026

I think a segment-size bias makes sense — large segments have better-populated IVF clusters so they need a smaller visit fraction, and tiny segments (<10K) with poor clustering could benefit from visiting more. That said, our Two-Signal formula already produces very small ratios (0.3%–4%), so the effect of an additive bias near 0 for small segments may be minimal. It also adds a tuning dimension that's hard to calibrate without benchmarks across many segment sizes.

I'm leaning towards adding this as a follow-up once we have more data, but I'll leave it to @benwtrent @iverase to decide.

@iverase
Copy link
Copy Markdown
Contributor

iverase commented Mar 19, 2026

I am fine leaving it as a follow up.

My suggestion is to index a fixed number of documents and force merge into one segment. Then compute how much visit percentage is needed to achieve a target recall.

Try the same thing for different number of documents so we can get a nice curve segment-size / visit percentage.

If we do it for a few datasets we can get a good idea on how visit percentage should change with the segment size?

Copy link
Copy Markdown
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify IVFVectorsReader#search. It is possible that this format is called without the IVF strategy and thus your logic won't work in that scenario.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 20, 2026

@benwtrent

Please verify IVFVectorsReader#search. It is possible that this format is called without the IVF strategy and thus your logic won't work in that scenario.

Good catch — verified and fixed. When called without IVFKnnSearchStrategy (e.g., CheckIndex, tests), numCands=0 and the Two-Signal model produces a fixed visit ratio (~0.004) regardless of segment size, which visits nearly nothing on small segments.

Fixed by splitting the dynamic fallback: when numCands > 0 (IVF strategy with dynamic ratio), use Two-Signal; when numCands == 0 (no strategy), fall back to the original segment-size-aware formula (log10(N)² * k / N).

Do you have a better idea?

segmentInfo.append(segVectors);
}
segmentInfo.append("]");
logger.info("Segment layout: {}", segmentInfo);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really want this to be logged via info all the time?

This seems like a debug option.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — changed to debug. This is only useful when troubleshooting, not needed in normal runs.

Comment on lines +330 to +337
logger.debug(
"IVF search segment: numVectors=[{}], visitRatio=[{}], maxVectorVisited=[{}], numCands=[{}], k=[{}]",
numVectors,
visitRatio,
maxVectorVisited,
numCands,
k
);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove all this logging, it isn't necessary. if anything its "trace" and I don't think it provides useful information?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

}
float percentFiltered = Math.max(0f, Math.min(1f, approximateCost / numVectors));
int numCands = 0;
int k = knnCollector.k();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we know k, but don't know numcands we know for a fact that numCands >= k. So, let's just lean into that. Do the new logic with k==numCands.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines +51 to +57
// Two-Signal Model constants for dynamic visit ratio
private static final float V_MIN = 0.003f;
private static final float V_MAX = 0.04f;
private static final double LOG1P_R_MAX = Math.log1p(10.0);
private static final double LOG1P_K_MAX = Math.log1p(10_000.0);
private static final double RATIO_WEIGHT = 0.85;
private static final double K_WEIGHT = 0.15;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might want to have this logic shareable so that the format can use it AND the query. Possibly in the IVFFormat file as a static method.

Copy link
Copy Markdown
Contributor Author

@ah89 ah89 Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — the logic now lives only in IVFVectorsReader.computeDynamicVisitRatio(). The query side (AbstractIVFKnnVectorQuery) just passes visitRatio=0.0f through to the codec and doesn't compute anything itself, so there's nothing to share.

Copy link
Copy Markdown
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Please follow up with some experiments as Ignacio suggests and augment with some additional bias towards segment size.

@ah89 ah89 disabled auto-merge March 23, 2026 13:41
@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 23, 2026

I am fine leaving it as a follow up.

My suggestion is to index a fixed number of documents and force merge into one segment. Then compute how much visit percentage is needed to achieve a target recall.

Try the same thing for different number of documents so we can get a nice curve segment-size / visit percentage.

If we do it for a few datasets we can get a good idea on how visit percentage should change with the segment size?

@iverase @benwtrent Here are the segment-size vs visit-percentage curves.

Setup: Same dataset indexed at different sizes (10K, 50K, 100K, 250K, 500K, full), force-merged to 1 segment, IVF BBQ-4bit, k=10, nc=1000. Sweep visit% from 0.1% to 20%.


GIST-1M (960d, euclidean)

Recall at each visit percentage

visit% N=10K N=50K N=100K N=250K N=500K N=1M
0.1 0.60 0.40 0.31 0.28 0.31 0.38
0.25 0.60 0.41 0.32 0.43 0.52 0.58
0.5 0.60 0.42 0.44 0.59 0.68 0.72
0.75 0.60 0.53 0.48 0.69 0.76 0.79
1.0 0.61 0.59 0.55 0.75 0.81 0.82
1.5 0.61 0.67 0.67 0.81 0.85 0.86
2.0 0.62 0.71 0.74 0.84 0.88 0.89
3.0 0.70 0.79 0.83 0.88 0.91 0.91
4.0 0.79 0.84 0.87 0.89 0.91 0.91
5.0 0.80 0.86 0.90 0.90 0.92 0.91
7.0 0.86 0.88 0.92 0.91 0.93 0.92
10.0 0.89 0.90 0.93 0.91 0.93 0.92
15.0 0.91 0.91 0.94 0.92 0.93 0.92
20.0 0.93 0.92 0.94 0.92 0.93 0.92

Minimum actual visit% for target recall

Target Recall N=10K N=50K N=100K N=250K N=500K N=1M
0.80 13.16% 6.96% 5.65% 2.97% 1.97% 1.70%
0.85 16.79% 9.62% 7.32% 4.63% 3.07% 2.79%
0.90 28.04% 20.59% 10.31% 10.14% 5.41% 5.04%

Wiki-Cohere (768d, MIP)

Recall at each visit percentage

visit% N=10K N=50K N=100K N=250K N=500K N=934K
0.1 0.48 0.47 0.24 0.28 0.40 0.48
0.25 0.48 0.51 0.30 0.41 0.57 0.65
0.5 0.51 0.55 0.39 0.54 0.70 0.76
0.75 0.55 0.61 0.49 0.64 0.77 0.83
1.0 0.55 0.66 0.56 0.69 0.81 0.85
1.5 0.61 0.70 0.66 0.77 0.87 0.88
2.0 0.61 0.75 0.72 0.82 0.89 0.90
3.0 0.66 0.82 0.78 0.86 0.92 0.93
4.0 0.74 0.85 0.84 0.88 0.92 0.94
5.0 0.76 0.87 0.86 0.89 0.93 0.94
7.0 0.81 0.90 0.90 0.91 0.93 0.95
10.0 0.85 0.91 0.92 0.93 0.94 0.96
15.0 0.89 0.92 0.93 0.94 0.94 0.97
20.0 0.90 0.92 0.93 0.94 0.94 0.97

Minimum actual visit% for target recall

Target Recall N=10K N=50K N=100K N=250K N=500K N=934K
0.80 16.37% 6.07% 6.99% 3.72% 1.93% 1.32%
0.85 23.40% 8.71% 9.32% 5.61% 2.73% 2.03%
0.90 43.30% 14.62% 14.32% 12.15% 4.73% 4.03%

Key observations & Possibilities

  1. Larger segments need a smaller visit% for the same recall. IVF clustering becomes more effective with more data. To reach recall=0.90:

    • GIST: N=10K → 28% actual visit vs N=1M → 5%
    • Wiki-Cohere: N=10K → 43% vs N=934K → 4%
  2. The relationship is roughly inverse-logarithmic. Going from 10K→100K (10×) cuts the required visit% by ~3×. Going from 100K→1M (10×) cuts it by another ~2×.

  3. Small segments (<50K) need disproportionately more visiting because IVF clusters are poorly populated and quantization noise has more impact.

  4. Note on actual vs configured visit%: SOAR causes actual visit% to be ~2× the configured value. The "minimum visit%" table uses actual visit%, which is the true cost. Very small segments also get a floor from minimum cluster visits.

The data clearly shows our current Two-Signal model (fixed ratio regardless of segment size) works well for segments ≥100K, but under-visits small segments. This supports @benwtrent's segment-size bias idea. Given that the data is now available and the shape of the curve is clear, do you think we should add the segment-size bias in this PR, or keep it as a follow-up?

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 24, 2026

10K vectors went from 0.77 → 0.94 recall (+0.17), 100K went from 0.86 → 0.98 (+0.12). The bias is applied per-segment, which means it fires for ALL sizes here. The threshold of for small segments is N_REF=500k.

New : with Bias
Old: without Bias

num_docs segs recall old→new Δrecall visit% old→new latency old→new lat ratio
10K 1 0.77 → 0.94 +0.17 10% → 30% 0.10 → 0.16ms 1.6x
50K 1 0.91 → 0.98 +0.07 7.4% → 25% 0.18 → 0.47ms 2.6x
100K 1 0.86 → 0.98 +0.12 7.2% → 23% 0.28 → 0.86ms 3.1x
250K 1 0.96 → 0.97 +0.01 7.0% → 17% 0.66 → 1.68ms 2.5x
500K 2 (~250K/seg) 0.93 → 0.93 0.00 7.0% → 11% 1.30 → 2.08ms 1.6x
1M 3 (~333K/seg) 0.96 → 0.96 0.00 7.0% → 10.7% 2.70 → 4.24ms 1.6x

@benwtrent
Copy link
Copy Markdown
Member

@ah89 this shows me that in a realistic scenario, adding the bias the way you did is expensive with no realistic benefit.

The bias that Ignacio was focusing on was REDUCING the visit percentage as segments get larger, not significantly increasing the visit percentage on smaller segments. I think the answer there is simply because the clusters are static and larger. We should not optimize for tiny segments, that just isn't realistic. Things will be merged and we will have a diversity of segment sizes.

@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 24, 2026

@ah89 @iverase

I think this linear transformation we have in this PR is a good starting point. @ah89 please do adjust the other places were we are using the old formulae.

What do y'all think of having a small and limited bias that adjusts the final percentage according to segment size? It would have an asymptotic limit (to prevent bad scenarios like 0.00001% visit), and start near 0 when number of vectors is <10k?

@benwtrent This change is addressing your earlier point on small segments (<10k or so).

The bias improves recall, but it increases visit % significantly and adds ~1–2ms latency while applying to all segment sizes (N_REF=500k).

So it’s still a tradeoff for limited benefit in the scenarios we care about, and it’s different from Ignacio’s focus on larger segments. My point on his suggestion is the same as mentioned before, beyond uniformity, this starts to feel ad hoc data dependent solution.

ah89 and others added 9 commits March 27, 2026 11:08
Replace the log10(N)^2 * clampedNumCands / N formula in
AbstractIVFKnnVectorQuery.rewrite with a simple linear
clampedNumCands / N mapping. This makes num_candidates
directly represent the number of vectors to traverse,
giving users a predictable, size-independent tuning
knob.

Also:
- Extract computeDynamicVisitRatio as a testable static
  method
- Add numCands to equals/hashCode since it now directly
  drives visit ratio when providedVisitRatio == 0
- Rename maxVectorVisited to maxExpectedTraversed in
  IVFVectorsReader for clarity on pre-filter vs
  post-filter semantics

Closes elastic#142617
Introduces a new two-signal log-based heuristic for controlling IVF
visit ratios, aiming for smoother scaling and more predictable recall
across varying parameter ranges. Enhances logging to capture segment
layouts and vector distributions, and tracks actual visit percentages
for better observability of search behavior. Updates reporting to
include segment counts and visit metrics for more comprehensive
benchmarking.
Switches from a custom log-based model to a more principled two-signal
approach for estimating the visit ratio, improving consistency and
maintainability.

Centralizes the two-signal logic and constants, reducing duplication
and potential inconsistencies. Enhances the robustness of vector search
parameterization, especially when candidate values are not explicitly
provided.
Updates the logic for determining the approximate cost of accepted
documents, correctly handling the case when all documents are accepted
to avoid misleading cost estimations. Improves accuracy of candidate
filtering and prevents potential downstream errors.
Relocates the Two-Signal Model visit ratio calculation from the query
layer (AbstractIVFKnnVectorQuery) into the codec layer (IVFVectorsReader)
where it can leverage per-segment context.

Removes verbose debug logging from IVF search and downgrades segment
layout logging from info to debug level to reduce log noise.

Initializes numCands to k instead of 0 so the dynamic ratio computation
has a meaningful default when no search strategy overrides it.
Replace the original log10(N)^2 * k / N visit-ratio formula with a
Two-Signal model that derives the visit ratio from both the
num_candidates/k ratio (85% weight) and k magnitude (15% weight),
bounded to [0.3%, 4.0%]. Add a segment-size power-law cap
(0.045 * (1M/N)^0.35) that limits visit ratio for large segments
(~10% at 100K, ~4.5% at 1M, ~2% at 10M) while leaving small
segments governed by the two-signal model alone.

The final visit ratio per segment is min(two_signal, cap).
@ah89 ah89 force-pushed the fix/diskbbq-linear-visit-ratio branch from 24d0254 to 26891cb Compare March 27, 2026 19:21
@ah89
Copy link
Copy Markdown
Contributor Author

ah89 commented Mar 27, 2026

Add segment-size-aware visit ratio cap to IVF search

Motivation (as per @iverase suggestion)

The Two-Signal model computes a dynamic visit ratio from num_candidates/k and k, producing values in [0.3%, 4%]. This works well for typical segment sizes (~100K–500K vectors), but for large segments (1M+ vectors), IVF clusters are better-formed and 4% visit is unnecessarily high — wasting latency without improving recall. Conversely, for very small segments the existing range is already appropriate.

Experiment Design

We ran controlled benchmarks across four datasets to measure recall as a function of visit% and segment size:

Dataset Total Docs Dimensions Similarity
GIST-1M 1,000,000 960 MIP
Wiki-Cohere 934,024 768 MIP
DBpedia-Arctic 4,635,922 768 MIP
MSMarco (user-provided) ~130,000,000 768 MIP

GIST & Wiki-Cohere sweep: Force-merged to 6 segment sizes (10K, 50K, 100K, 250K, 500K, full) × 11 visit percentages (0.1%–20%), k=10, nc=100, IVF 4-bit BBQ.

DBpedia segment sweep: 7 segment counts (1, 2, 4, 6, 8, 10, 12 segments → ~515K to 4.6M docs/segment) × 10 visit percentages (0.1%–20%).

DBpedia ceiling + oversampling test: Tested visit% up to 70% to find the recall ceiling (0.90–0.92 without oversampling), then tested oversample=5x and 10x (5x sufficient, 10x identical).

Formula

Fitted a power-law curve to the empirical data:

cap(N) = 0.045 × (1,000,000 / N)^0.35 × (0.1 / (1 - targetRecall))

Where N = number of vectors in the segment, targetRecall = 0.9 (default).

Example cap values (targetRecall=0.9):

Segment Size Cap
100K 10.07%
333K 6.61%
500K 5.68%
1M 4.50%
5M 2.56%
10M 2.01%

Per-Segment Application

The cap is computed independently for each segment at search time. In IVFVectorsReader.search(), each segment knows its own numVectors = values.size() and computes:

visitRatio = Math.min(computeDynamicVisitRatio(numCands, k), computeSegmentSizeCap(numVectors));

This means in a multi-segment index, each segment gets a different final visit ratio based on its size. A 100K-doc segment gets cap≈10% (two-signal dominates since max is 4%), while a 5M-doc segment gets cap≈2.6% (cap dominates). The two-signal value is the same across all segments (it depends only on query parameters nc and k), but the cap varies per segment.

Comparison: origin/main vs Branch (Two-Signal + Cap)

All runs: IVF BBQ-4bit, k=10, visit_percentage=0.0 (dynamic), no force-merge.

Wiki-Cohere (934K docs, 2 segments, ~467K docs/seg)

nc Main Rcl Main Lat(ms) Main Visited Branch Rcl Branch Lat(ms) Branch Visited Δ Rcl Δ Lat
15 0.49 0.20 4,148 0.79 0.63 18,833 +0.30 +0.43
50 0.49 0.20 4,148 0.90 1.52 48,351 +0.41 +1.32
100 0.64 0.31 7,731 0.92 2.04 65,458 +0.28 +1.73
200 0.75 0.53 14,835 0.92 2.11 67,663 +0.17 +1.58
500 0.88 1.17 36,251 0.92 2.15 67,663 +0.04 +0.98
1000 0.93 2.27 71,973 0.92 2.15 67,663 -0.01 -0.12
5000 0.96 11.14 357,072 0.92 2.14 67,663 -0.04 -9.00
10000 0.96 21.74 713,594 0.92 2.16 67,663 -0.04 -19.58

GIST-1M (1M docs, 3 segments, ~333K docs/seg)

nc Main Rcl Main Lat(ms) Main Visited Branch Rcl Branch Lat(ms) Branch Visited Δ Rcl Δ Lat
15 0.89 0.24 4,119 0.94 0.82 20,244 +0.05 +0.58
50 0.89 0.25 4,119 0.96 1.95 51,853 +0.07 +1.70
100 0.92 0.37 7,760 0.96 2.61 70,097 +0.04 +2.24
200 0.94 0.64 15,092 0.96 2.70 72,678 +0.02 +2.06
500 0.96 1.38 36,765 0.96 2.79 72,678 = +1.41
1000 0.96 2.68 72,875 0.96 2.74 72,678 = +0.06
5000 0.96 13.07 360,916 0.96 2.69 72,678 = -10.38
10000 0.96 25.66 721,012 0.96 2.72 72,678 = -22.94

DBpedia-Arctic (4.6M docs, 9 segments, ~515K docs/seg)

nc Main Rcl Main Lat(ms) Main Visited Branch Rcl Branch Lat(ms) Branch Visited Δ Rcl Δ Lat
15 0.13 0.44 7,511 0.50 2.83 93,740 +0.37 +2.39
50 0.13 0.44 7,511 0.68 7.00 240,237 +0.55 +6.56
100 0.17 0.58 12,182 0.74 9.44 324,638 +0.57 +8.86
200 0.27 0.84 21,068 0.74 9.78 336,148 +0.47 +8.94
500 0.39 1.72 47,844 0.74 9.77 336,148 +0.35 +8.05
1000 0.50 2.90 92,264 0.74 9.79 336,148 +0.24 +6.89
5000 0.78 13.24 447,764 0.74 9.80 336,148 -0.04 -3.44
10000 0.86 26.39 892,046 0.74 9.80 336,148 -0.12 -16.59

Key Observations

  1. Low nc (15–100): Branch visits significantly more → much higher recall (Wiki +0.30, DBpedia +0.57) at modest latency increase
  2. Mid nc (~1000): Both approaches converge to similar recall/latency
  3. High nc (5000–10000): origin/main visits explode (360K–900K vectors) while branch caps at a bounded count, saving 10–23ms per query on GIST/Wiki and 3–17ms on DBpedia with only 0.04–0.12 recall trade-off
  4. The Two-Signal model makes num_candidates a meaningful tuning knob — origin/main's log10(N)² × k / N formula ignores it entirely

Copy link
Copy Markdown
Contributor

@tteofili tteofili left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice looking numbers! LGTM

@ah89 ah89 enabled auto-merge (squash) March 30, 2026 17:00
@ah89 ah89 merged commit cf99520 into elastic:main Mar 31, 2026
36 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Mar 31, 2026
…rics

* upstream/main: (21 commits)
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.topSnippetsFunction} elastic#145353
  Mute org.elasticsearch.xpack.esql.qa.mixed.MixedClusterEsqlSpecIT test {csv-spec:external-basic.scoreFunction} elastic#145352
  [DiskBBQ] Fix bug in NeighborQueue#popRawAndAddRaw (elastic#145324)
  Fix dense_vector default index options when using BFLOAT16 (elastic#145202)
  Use checked exceptions in entitlement constructor rules (elastic#145234)
  ESQL: DS: datasource file plugins should not return TEXT types (elastic#145334)
  Plumb DLM error store through to DlmFrozenTransition classes (elastic#145243)
  Make Settings.Builder.remove() fluent (elastic#145294)
  Add FLS tests for METRICS_INFO and TS_INFO (elastic#145211)
  Fix flaky SecurityFeatureResetTests (elastic#145063)
  [DOCS] Fix conflict markers in ESQL processing command list (elastic#145338)
  Skip certain metric assertions on Windows (elastic#144933)
  [ES|QL] Add schema reconciliation for multi-file external sources (elastic#145220)
  Simplify DiskBBQ dynamic visit ratio to linear (elastic#142784)
  ESQL: Disallow unmapped_fields=load with partial non-KEYWORD (elastic#144109)
  [Transform] Track Linked Projects (elastic#144399)
  Fix bulk scoring to process last batch instead of falling through to scalar tail (elastic#145316)
  Clean up TickerScheduleEngineTests (elastic#145303)
  [CI] ShardBulkInferenceActionFilterIT testRestart - Ensuring that secrets-inference index is available after full restart and unmuting test (elastic#145317)
  Add CRUD doc to the DistributedArchitectureGuide (elastic#144710)
  ...
ncordon pushed a commit to ncordon/elasticsearch that referenced this pull request Apr 1, 2026
* Simplify DiskBBQ dynamic visit ratio to linear

Replace the log10(N)^2 * clampedNumCands / N formula in
AbstractIVFKnnVectorQuery.rewrite with a simple linear
clampedNumCands / N mapping. This makes num_candidates
directly represent the number of vectors to traverse,
giving users a predictable, size-independent tuning
knob.

Also:
- Extract computeDynamicVisitRatio as a testable static
  method
- Add numCands to equals/hashCode since it now directly
  drives visit ratio when providedVisitRatio == 0
- Rename maxVectorVisited to maxExpectedTraversed in
  IVFVectorsReader for clarity on pre-filter vs
  post-filter semantics

Closes elastic#142617

* Enhance IVFKnnSearchStrategy to include numCands and k parameters for improved visit ratio calculation

* [CI] Auto commit changes from spotless

* Improves IVF search logging and visit ratio calc

Introduces a new two-signal log-based heuristic for controlling IVF
visit ratios, aiming for smoother scaling and more predictable recall
across varying parameter ranges. Enhances logging to capture segment
layouts and vector distributions, and tracks actual visit percentages
for better observability of search behavior. Updates reporting to
include segment counts and visit metrics for more comprehensive
benchmarking.

* Refines dynamic visit ratio calc for IVF search

Switches from a custom log-based model to a more principled two-signal
approach for estimating the visit ratio, improving consistency and
maintainability.

Centralizes the two-signal logic and constants, reducing duplication
and potential inconsistencies. Enhances the robustness of vector search
parameterization, especially when candidate values are not explicitly
provided.

* Retune Two-Signal visit ratio formula

* Fixes approximate cost calculation for doc filter

Updates the logic for determining the approximate cost of accepted
documents, correctly handling the case when all documents are accepted
to avoid misleading cost estimations. Improves accuracy of candidate
filtering and prevents potential downstream errors.

* Move dynamic visit ratio computation into codec layer

Relocates the Two-Signal Model visit ratio calculation from the query
layer (AbstractIVFKnnVectorQuery) into the codec layer (IVFVectorsReader)
where it can leverage per-segment context.

Removes verbose debug logging from IVF search and downgrades segment
layout logging from info to debug level to reduce log noise.

Initializes numCands to k instead of 0 so the dynamic ratio computation
has a meaningful default when no search strategy overrides it.

* Replace dynamic visit ratio with Two-Signal model and cap

Replace the original log10(N)^2 * k / N visit-ratio formula with a
Two-Signal model that derives the visit ratio from both the
num_candidates/k ratio (85% weight) and k magnitude (15% weight),
bounded to [0.3%, 4.0%]. Add a segment-size power-law cap
(0.045 * (1M/N)^0.35) that limits visit ratio for large segments
(~10% at 100K, ~4.5% at 1M, ~2% at 10M) while leaving small
segments governed by the two-signal model alone.

The final visit ratio per segment is min(two_signal, cap).

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
mromaios pushed a commit to mromaios/elasticsearch that referenced this pull request Apr 9, 2026
* Simplify DiskBBQ dynamic visit ratio to linear

Replace the log10(N)^2 * clampedNumCands / N formula in
AbstractIVFKnnVectorQuery.rewrite with a simple linear
clampedNumCands / N mapping. This makes num_candidates
directly represent the number of vectors to traverse,
giving users a predictable, size-independent tuning
knob.

Also:
- Extract computeDynamicVisitRatio as a testable static
  method
- Add numCands to equals/hashCode since it now directly
  drives visit ratio when providedVisitRatio == 0
- Rename maxVectorVisited to maxExpectedTraversed in
  IVFVectorsReader for clarity on pre-filter vs
  post-filter semantics

Closes elastic#142617

* Enhance IVFKnnSearchStrategy to include numCands and k parameters for improved visit ratio calculation

* [CI] Auto commit changes from spotless

* Improves IVF search logging and visit ratio calc

Introduces a new two-signal log-based heuristic for controlling IVF
visit ratios, aiming for smoother scaling and more predictable recall
across varying parameter ranges. Enhances logging to capture segment
layouts and vector distributions, and tracks actual visit percentages
for better observability of search behavior. Updates reporting to
include segment counts and visit metrics for more comprehensive
benchmarking.

* Refines dynamic visit ratio calc for IVF search

Switches from a custom log-based model to a more principled two-signal
approach for estimating the visit ratio, improving consistency and
maintainability.

Centralizes the two-signal logic and constants, reducing duplication
and potential inconsistencies. Enhances the robustness of vector search
parameterization, especially when candidate values are not explicitly
provided.

* Retune Two-Signal visit ratio formula

* Fixes approximate cost calculation for doc filter

Updates the logic for determining the approximate cost of accepted
documents, correctly handling the case when all documents are accepted
to avoid misleading cost estimations. Improves accuracy of candidate
filtering and prevents potential downstream errors.

* Move dynamic visit ratio computation into codec layer

Relocates the Two-Signal Model visit ratio calculation from the query
layer (AbstractIVFKnnVectorQuery) into the codec layer (IVFVectorsReader)
where it can leverage per-segment context.

Removes verbose debug logging from IVF search and downgrades segment
layout logging from info to debug level to reduce log noise.

Initializes numCands to k instead of 0 so the dynamic ratio computation
has a meaningful default when no search strategy overrides it.

* Replace dynamic visit ratio with Two-Signal model and cap

Replace the original log10(N)^2 * k / N visit-ratio formula with a
Two-Signal model that derives the visit ratio from both the
num_candidates/k ratio (85% weight) and k magnitude (15% weight),
bounded to [0.3%, 4.0%]. Add a segment-size power-law cap
(0.045 * (1M/N)^0.35) that limits visit ratio for large segments
(~10% at 100K, ~4.5% at 1M, ~2% at 10M) while leaving small
segments governed by the two-signal model alone.

The final visit ratio per segment is min(two_signal, cap).

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DiskBBQ: num_candidates dynamic visit percentage is woefully low on medium+ sized datasets

5 participants