Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors #13985
ChrisHegarty merged 11 commits into apache:main
Conversation
lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java
@shatejas I'm curious how much this actually helps, and I know that you said that benchmark results would be posted. I do like that we can update the ReadAdvice on an index input 👍 What is unclear is how much the sequential advice here helps over something like a
Hi @ChrisHegarty, preliminary results showed approximately a 40 min (~13%) reduction in total force merge time for the 10m dataset. You should be able to find details here: opensearch-project/k-NN#2134 (comment).
I did try preload, which does
Thanks @shatejas. For clarity, the bottleneck being fixed here is the reading of all vector data from the to-be-merged segments when copying that data to the new segment - all non-deleted vectors are accessed sequentially, just once, as they are copied. We can then switch back to random access when building the graph.
Generally, I think that the direction in this PR is good. I wanna help get it moved forward. I'll do some local testing and perf runs to verify the impact. I can also commit some tests and improvements directly to the PR branch, @shatejas?
Feel free to commit changes; I added you as a collaborator on the shatejas/lucene repo. Do let me know if you need any other access. I will be working on the benchmarks within the next few days.
The change is needed to reduce the force merge time. Lucene99FlatVectorsReader is opened with IOContext.RANDOM, which optimizes searches by madvise-ing the mapping as RANDOM. For merges we need sequential access and the ability to preload pages to shorten the merge time. The change updates the ReadAdvice to SEQUENTIAL before the merge starts and reverts it to ReadAdvice.RANDOM at the end of the merge for Lucene99FlatVectorsReader.
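The lifecycle described above can be sketched roughly as follows. This is a hedged illustration, not Lucene's actual code: `ReadAdvice`, `IndexInputStub`, and `FlatVectorsReaderSketch` are minimal stand-ins invented here (the real `updateReadAdvice` lives on `IndexInput` and throws `IOException`); only the state transitions mirror the PR's behavior.

```java
// Minimal stand-ins illustrating the advice-flipping lifecycle; these are NOT
// Lucene's real classes, just a sketch of the state transitions described above.
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

class IndexInputStub {
    private ReadAdvice advice;

    IndexInputStub(ReadAdvice initial) { this.advice = initial; }

    // In Lucene's mmap directory this would re-madvise the mapped region.
    void updateReadAdvice(ReadAdvice newAdvice) { this.advice = newAdvice; }

    ReadAdvice currentAdvice() { return advice; }
}

class FlatVectorsReaderSketch {
    // Opened with IOContext.RANDOM so searches get random-access madvise.
    final IndexInputStub vectorData = new IndexInputStub(ReadAdvice.RANDOM);

    // Merge copies all live vectors once, in order: switch to SEQUENTIAL.
    void startMerge() { vectorData.updateReadAdvice(ReadAdvice.SEQUENTIAL); }

    // Merge done: graph building and searches need random access again.
    void finishMerge() { vectorData.updateReadAdvice(ReadAdvice.RANDOM); }
}
```

The point of the sketch is that the advice flip is scoped to the merge: anything that reads `vectorData` between `startMerge()` and `finishMerge()` sees the sequential hint.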
I added some asserts to the asserting wrappers to ensure that finishMerge is always called. I think that the code is in good shape now, pending some benchmark runs. @shatejas no need to force-push; when merged, the commits will be collapsed into a single one.
Benchmarks

Setup 1 - OpenSearch cluster
Ran with opensearch benchmarks. Total data nodes - 3. Dataset used: cohere-10m. Baseline - OS 2.18 and lucene 9.12.
Why was this tested with lucene 9.12?
Run 1: sequence of operations: delete-index -> create-index -> add documents -> force-merge -> search. Results
Run 2: Search performed on already indexed data from the above run

Setup 2 - Used lucene-utils knnPerfTest.py
Baseline - Lucene main
Baseline
Candidate
There is a small effect on search latencies; it's hard to say if it's due to the change or just fluctuation between runs. I couldn't think of a reason that this change would affect search latencies. @jpountz @ChrisHegarty thoughts?
ChrisHegarty left a comment
Restoring sequential advice is the right thing to do here, and has clear perf benefits. The code LGTM. Lemme know if you need anything further to get this merged, and backported to 10.x.
Thanks @ChrisHegarty! I do need help with merging this. I have updated
Let me know if I need anything else to merge this.
The
I took a look at the failure. Here is what I found, |
I broke the tests again by adding more asserts, as I'm a little uncomfortable with the
Took a look. The mismatch between
Considering the test has a commit() call and then a search, I think it's running into the scenario
Oh yeah, I get it - the asserts were too strong, so I backed them off a bit. High level, I wanna ensure that we're asserting the right execution model, since the backing memory holding the vector data is shared across merge and search at the same time. I think that this is fine: search may access the memory while it is still advised sequential, but when the merge finishes it'll switch back.
… merging vectors (#13985)
The commit updates the ReadAdvice to SEQUENTIAL before the merge starts and reverts it to ReadAdvice.RANDOM at the end of the merge for Lucene99FlatVectorsReader.
Thanks a lot @ChrisHegarty for adding tight tests and merging this!
Thanks @ChrisHegarty for taking care of this. If I have any additional comments about the API I will open a follow-up PR.
The impact on search times is surprising, as you said. Can you clarify one thing about the benchmark setup: does it perform searches concurrently with indexing (and merging) on the same box, or are they completely separate?
…ce while merging vectors (apache#13985)" This reverts commit 46204f6.
Searches were performed after the completion of indexing and force merge.
… merging vectors (apache#13985)
The commit updates the ReadAdvice to SEQUENTIAL before the merge starts and reverts it to ReadAdvice.RANDOM at the end of the merge for Lucene99FlatVectorsReader.
```java
public void finishMerge() throws IOException {
  // This makes sure that the access pattern hint is reverted back since the
  // HNSW implementation needs it
  this.vectorData.updateReadAdvice(ReadAdvice.RANDOM);
```
This is too opinionated. If we want to have this swapping behavior, I think we need to find a way to cache the original ReadAdvice and restore it, since we don't know what it was - that decision is made by the Directory and its hints and iocontexts and sysprops and so on. But I don't think we currently have IOContext.getReadAdvice()?

I think this makes sense. We should flip back to the read advice which was originally present.

> that decision is made by the Directory and its hints and iocontexts and sysprops and so on. But I don't think we currently have IOContext.getReadAdvice()?

At the time of the change the initial advice was RANDOM, so this is reverting back to RANDOM. But I agree, we can simply cache the original value and revert to it.
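The cache-and-restore idea discussed here could look something like the following. This is only a sketch of the proposal, not an existing Lucene API: `AdviceKind` and `AdviceCachingInput` are hypothetical stand-in names, and the real solution would need the Directory-chosen advice to be observable (e.g. something like the `IOContext.getReadAdvice()` mentioned above).

```java
// Hypothetical sketch: cache whatever advice the Directory chose at open time
// and restore it after the merge, instead of hard-coding RANDOM. These are
// stand-in types, not Lucene classes.
enum AdviceKind { NORMAL, RANDOM, SEQUENTIAL }

class AdviceCachingInput {
    private AdviceKind advice;
    private AdviceKind savedAdvice; // whatever was in force before the merge

    AdviceCachingInput(AdviceKind initial) {
        this.advice = initial;
        this.savedAdvice = initial;
    }

    void updateReadAdvice(AdviceKind next) { this.advice = next; }

    // Remember the current advice, then flip to SEQUENTIAL for the copy phase.
    void beginMerge() {
        savedAdvice = advice;
        updateReadAdvice(AdviceKind.SEQUENTIAL);
    }

    // Restore the original advice rather than assuming it was RANDOM.
    void finishMerge() { updateReadAdvice(savedAdvice); }

    AdviceKind currentAdvice() { return advice; }
}
```

With this shape, an input opened with NORMAL (or any other) advice comes back to NORMAL after the merge, addressing the "too opinionated" concern.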
I'm really not sure about this change any more -- looking back at the (merging) performance improvement, it was not very large, and somewhat offset by the mysterious search slowdowns. At the same time, to get this working we need to add lots of stuff: reference counting to ensure proper closing, access to memory mapping hints. Are we still convinced it is worth it?
We can switch to an approach where the madvise is not changed at all. The approach leverages prefetch; the read pattern still changes, but the reads are not as aggressive as with sequential advice. A prefetch call after each read (inside the merge) will make sure that merges don't slow down. At the same time, this minimizes the search perf impact, considering prefetch is not aggressive.
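The prefetch-based alternative could be sketched as below. Recent Lucene does expose `IndexInput.prefetch(offset, length)`, but `PrefetchingInput` and `copyVectors` here are illustrative stand-ins so the sketch is self-contained; the fixed per-vector size is also an assumption for illustration.

```java
// Sketch of the prefetch alternative: keep the original madvise and hint the
// upcoming pages instead of flipping the whole mapping to SEQUENTIAL.
// PrefetchingInput is a stand-in, not Lucene's IndexInput.
class PrefetchingInput {
    long prefetchedUpTo = 0;

    // In Lucene this maps roughly to madvise(MADV_WILLNEED) on the relevant pages.
    void prefetch(long offset, long length) {
        prefetchedUpTo = Math.max(prefetchedUpTo, offset + length);
    }
}

class VectorMergeSketch {
    // Copy vectors one by one, prefetching each vector's pages just before
    // reading it, so merges stay fast without a global SEQUENTIAL hint.
    static long copyVectors(PrefetchingInput in, int vectorCount, int bytesPerVector) {
        long copied = 0;
        for (int i = 0; i < vectorCount; i++) {
            long offset = (long) i * bytesPerVector;
            in.prefetch(offset, bytesPerVector); // hint the pages we are about to read
            copied += bytesPerVector;            // ...then read/copy the vector here
        }
        return copied;
    }
}
```

Because the hint is issued per read rather than per mapping, concurrent searches on the same segment keep their original access-pattern advice.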
Perhaps if we have a case where there is no random access (from Lucene) at all and we are only using Lucene to store the vector data - any search indexing is being done by a native plugin (I think this is what you are targeting?) - then we don't really want to be switching back and forth between access modes, but rather we'd want to set SEQUENTIAL and leave it that way, ideally? In that case, what if the reader could recognize that it has no associated HNSW graph (somehow) and provide some kind of hint to the directory, which it could then use to decide to use sequential? Also, I wonder if SEQUENTIAL is really that much better than NORMAL? If not, then we could simply revert this, given that we are returning to NORMAL as the default.
I'll open an issue for backporting #14702 to 10.x to fix this.
In certain scenarios, random access is critical for optimal performance. Our initial hypothesis, which we are currently validating through benchmarks, is that random access lookups are significantly more efficient than sequential lookups even for flat vector data under high memory pressure, especially when doing an exact search on filtered documents. This is because random access avoids data prefetching, thereby reducing memory swapping. To provide users with greater control, it's best to allow them to configure the initial IOContext value based on their specific workloads, rather than keeping it a fixed constant. This approach will offer flexibility while maintaining sensible defaults. Furthermore, we can mitigate the impact on search performance during merges by implementing dedicated prefetch functionality for vector merges. This eliminates the need to switch between access methods, ensuring minimal impact on ongoing searches.
The change is needed to reduce the force merge time. Lucene99FlatVectorsReader is opened with IOContext.RANDOM, which optimizes searches by madvise-ing the mapping as RANDOM. For merges we need sequential access and the ability to preload pages to shorten the merge time.
The change updates the ReadAdvice to SEQUENTIAL before the merge starts and reverts it to ReadAdvice.RANDOM at the end of the merge for Lucene99FlatVectorsReader.
Description
Benchmarking results coming up. Opening as a draft to get initial feedback.
Related issues
#13920