Improve vector search speed by using FixedBitSet #12789
benwtrent wants to merge 2 commits into apache:main
Conversation
I can believe that FixedBitSet is faster in some cases, but it's surprising to me that the memory usage of SparseFixedBitSet can go up to 2x that of FixedBitSet; this makes me wonder if … Separately, I wonder if you know the number of nodes that get visited (i.e. the number of bits that end up being set) in your benchmark? Is it a force-merged index, or do you have multiple segments?
@jpountz I re-ran my tests and double-checked my numbers, and I have some corrections: I accidentally double-counted sparse sizes, so the previous numbers are 2x too big.

GLOVE-100-100_000: …

Cohere-768-400_000: …

EDIT: @jpountz to confirm the sparse fixed bitset memory usage, I rewrote `ramBytesUsed` to be exact (instead of summing up the estimation). Obviously, this ran slightly slower, but from what I found, it didn't reduce the memory estimation. I still got …
Thanks, the numbers make more sense to me now. Intuitively, … Is it possible to estimate the order of the number of nodes that a nn search needs to visit, so that we could use it as a threshold?
@jpountz searching scales logarithmically, but we do have to explore more if there are any pre-filtered nodes. We can run some experiments to determine the appropriate threshold. I imagine it will be something along the lines of …
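A minimal sketch of such a visited-count threshold, in plain Java rather than Lucene code (the 16-bytes-per-visited-node sparse cost and the `preferSparse` name are assumptions for illustration, not Lucene's actual accounting):

```java
// Hypothetical heuristic: pick the bitset implementation by comparing the
// fixed, up-front cost of a dense bitmap against a pessimistic estimate of
// the sparse structure's growth as the search sets bits.
public class VisitedBitSetChoice {
    // A dense bitmap of maxDoc bits always needs about maxDoc/8 bytes.
    static long fixedBytes(int maxDoc) {
        long words = ((maxDoc - 1L) >> 6) + 1; // 64 bits per long
        return words * Long.BYTES;
    }

    // Pessimistic sparse estimate: every visited node lands in its own
    // 64-bit word, costing the word plus some per-block bookkeeping
    // (~16 bytes is an assumed, illustrative constant).
    static boolean preferSparse(int maxDoc, long expectedVisited) {
        return expectedVisited * 16 < fixedBytes(maxDoc);
    }

    public static void main(String[] args) {
        // With ~1,000 visited candidates, dense wins on a 100k-doc segment
        // but sparse wins on a 10M-doc segment.
        System.out.println(preferSparse(100_000, 1_000));    // false
        System.out.println(preferSparse(10_000_000, 1_000)); // true
    }
}
```

The crossover is what the threshold discussion above is after: since the dense bitmap's cost is fixed by segment size while HNSW's visited count grows only logarithmically (modulo filtering), larger segments favor the sparse set.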
++ This feels similar to …
While doing some performance testing and digging into flamegraphs, I noticed that for smaller vectors (96-dim float32) we were losing a fair bit of time within the `SparseFixedBitSet#getAndSet` method. I am assuming we are using `SparseFixedBitSet` to reduce memory usage?

I ran some tests with topK=100 and fanOut=1000. To check memory usage, in a separate run I printed out `bitSet.ramBytesUsed()` after every search.

I tested using `FixedBitSet` instead with GLOVE and saw almost a 10% improvement in search speed:
Vs. baseline
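To illustrate why the dense variant is cheaper in the `getAndSet` hotspot mentioned above, here is a toy model in plain Java (not Lucene's actual implementation): the dense path is one array index, one mask, and one read-modify-write, whereas `SparseFixedBitSet` must first locate or allocate the 4096-bit block for the doc and compute a popcount-based rank to find the word inside it.

```java
// Toy dense getAndSet: mark a doc as visited and report whether it
// had already been visited, using a flat long[] bitmap.
public class DenseGetAndSet {
    static boolean getAndSet(long[] bits, int index) {
        int wordNum = index >> 6;        // which 64-bit word holds this bit
        long mask = 1L << index;         // Java shifts use only the low 6 bits
        boolean previouslySet = (bits[wordNum] & mask) != 0;
        bits[wordNum] |= mask;
        return previouslySet;
    }

    public static void main(String[] args) {
        long[] bits = new long[(100_000 >> 6) + 1]; // ~12.5 KB for 100k docs
        System.out.println(getAndSet(bits, 42));    // false: first visit
        System.out.println(getAndSet(bits, 42));    // true: already visited
    }
}
```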
The total `ramBytesUsed` (allocated and then GC'd) for 1000 searches over GLOVE was `21288656` bytes. With `FixedBitSet`, every search allocates only `12544` bytes, which pans out to `12544000` bytes total (actually less than sparse).

To confirm this was still true for larger vectors and a larger graph, I tested against 400k Cohere vectors (same params). There is a bit more noise in the measurements, so I averaged the latency over 4 runs:
candidate: `6.115`, with a min of `5.96`
baseline: `6.23`, with a min of `6.15`

Total memory usage for sparse: `103982464` vs. total memory usage for fixed: `50040000`.

Do we know the goal of using a `SparseFixedBitSet`, and under what conditions it would actually perform better than a regular `FixedBitSet`? I will happily test some more.
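The fixed-bitset numbers above can be back-of-enveloped from `maxDoc` alone, since a dense bitmap allocates one long per 64 docs regardless of how many bits get set. A small sanity-check sketch (the few dozen bytes of per-object header overhead implied by the reported totals are an assumption, not measured here):

```java
// Per-search footprint of a dense bitmap as a pure function of segment size.
public class FixedBitSetFootprint {
    static long backingArrayBytes(int maxDoc) {
        long words = ((maxDoc - 1L) >> 6) + 1; // one long per 64 docs
        return words * Long.BYTES;
    }

    public static void main(String[] args) {
        // GLOVE segment, 100k docs: ~12.5 KB per search, matching the
        // reported 12544 bytes/search up to object-header overhead.
        System.out.println(backingArrayBytes(100_000)); // 12504
        // Cohere segment, 400k docs: ~50 KB per search; 1000 searches give
        // ~50 MB, matching the reported 50040000-byte total.
        System.out.println(backingArrayBytes(400_000)); // 50000
    }
}
```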