
Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors #13985

Merged
ChrisHegarty merged 11 commits into apache:main from shatejas:force-merge-fix on Nov 22, 2024

Conversation

@shatejas
Contributor

@shatejas shatejas commented Nov 9, 2024

The change is needed to reduce force-merge time. Lucene99FlatVectorsReader is opened with IOContext.RANDOM, which optimizes searches by advising the OS (madvise) with RANDOM. Merges instead need sequential access and the ability to preload pages to shorten the merge time.

The change switches to ReadAdvice.SEQUENTIAL before the merge starts and reverts to ReadAdvice.RANDOM at the end of the merge for Lucene99FlatVectorsReader.
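The swap-and-revert lifecycle described above can be sketched roughly as follows. This is a simplified, hypothetical model: the enum and class names are stand-ins, not the actual Lucene API.

```java
// Hypothetical sketch of the advice-swapping lifecycle; not the real
// Lucene99FlatVectorsReader implementation.
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

class FlatVectorsReaderSketch {
    // Opened with RANDOM advice, which suits HNSW graph search.
    private ReadAdvice advice = ReadAdvice.RANDOM;

    ReadAdvice advice() { return advice; }

    // When a merge begins: switch to SEQUENTIAL so the OS can read
    // ahead while vectors are copied to the new segment.
    FlatVectorsReaderSketch getMergeInstance() {
        advice = ReadAdvice.SEQUENTIAL;
        return this;
    }

    // When the merge finishes: revert so subsequent graph building
    // and searches see RANDOM advice again.
    void finishMerge() {
        advice = ReadAdvice.RANDOM;
    }
}
```

The key property is that the swap is scoped to the merge: any reader obtained outside `getMergeInstance()`/`finishMerge()` keeps the search-friendly advice.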

Description

Benchmarking results are coming up. Opening as a draft to get initial feedback.

Related issues

#13920

@shatejas
Contributor Author

shatejas commented Nov 9, 2024

cc: @uschindler @jpountz @navneet1v

@shatejas shatejas force-pushed the force-merge-fix branch 2 times, most recently from ffb35a1 to 515edb3 on November 9, 2024 09:59
@ChrisHegarty
Contributor

@shatejas I'm curious how much this actually helps, and I know that you said that benchmark results would be posted.

I do like that we can update the ReadAdvice on an index input 👍 What is unclear is how much the sequential advice here helps over something like a load (similar to MemorySegment::load) that touches every page.

@uschindler
Contributor

Hi,
I am currently on travel, so I can't review this. Will look into it possibly later this week. Greetings from Costa Rica!

@shatejas
Contributor Author

I'm curious how much this actually helps, and I know that you said that benchmark results would be posted.

@ChrisHegarty Preliminary results showed approximately a 40-minute (~13%) reduction in total force-merge time for the 10M dataset. You should be able to find details here opensearch-project/k-NN#2134 (comment).
Though the setup does not use Lucene HNSW, it does use Lucene99FlatVectorsReader.

What is unclear is how much the sequential advise here helps over something like a load (similar to MemorySegment::load), that touches every page.

I did try preload, which does MemorySegment::load, and there was an improvement in force-merge time. The advantage I see with updateReadAdvice is being able to delay the loading of vectors until needed.

@ChrisHegarty
Contributor

Thanks @shatejas

For clarity, the bottleneck being fixed here is the reading of all vector data from the to-be-merged segments when copying that data to the new segment - all non-deleted vectors are accessed sequentially, just once, as they are copied. We can then switch back to random access when building the graph.

@ChrisHegarty
Contributor

Generally, I think that the direction in this PR is good. I wanna help get it moved forward. I'll do some local testing and perf runs to verify the impact. I can also commit some tests and improvements directly to the PR branch. @shatejas ?

@shatejas
Contributor Author

Generally, I think that the direction in this PR is good. I wanna help get it moved forward. I'll do some local testing and perf runs to verify the impact. I can also commit some tests and improvements directly to the PR branch. @shatejas ?

Feel free to commit changes, added you as a collaborator to the shatejas/lucene repo. Do let me know if you need any other access

I will be working on the benchmarks within next few days.

reading IndexInput

The change is needed to be able to reduce the force merge time.
Lucene99FlatVectorsReader is opened with IOContext.RANDOM, this optimizes
searches with madvise as RANDOM. For merges we need sequential access and
ability to preload pages to be able to shorten the merge time.

The change updates the ReadAdvice.SEQUENTIAL before the merge starts and reverts it
to ReadAdvice.RANDOM at the end of the merge for
Lucene99FlatVectorsReader.
@ChrisHegarty ChrisHegarty marked this pull request as ready for review November 15, 2024 12:36
@ChrisHegarty ChrisHegarty changed the title from "Introduces IndexInput#updateReadAdvice to change the readadvice while …" to "Introduces IndexInput#updateReadAdvice to change the ReadAdvice while merging vectors" Nov 15, 2024
@ChrisHegarty
Contributor

I added some asserts to the asserting wrappers to ensure that finishMerge is always called. I think the code is in good shape now, pending some benchmark runs.

@shatejas no need to force-push. If merged, we collapse the commits into a single one.

@shatejas
Contributor Author

Benchmarks

Setup 1 - Opensearch cluster

Ran with opensearch benchmarks

Total data nodes - 3
Total shards - 6 (2 per node), no replicas
Memory - 128gb
vCPU - 16

Dataset used: cohere-10m

Baseline - OS 2.18 and lucene 9.12
Candidate - OS 2.16 and lucene 9.12 with readAdvice changes

Why was this tested with Lucene 9.12?
OpenSearch does not use Lucene >9.12 in any of its versions, and upgrading to Lucene 10 requires significant changes. For the candidate, the required commits were cherry-picked.

Run 1: sequence of operations: delete-index -> create-index -> add documents -> force-merge -> search

Results

|           | Force-merge (ms)  | Force-merge (hrs) | Search p50 | Search p90 | Search p99 |
|-----------|-------------------|-------------------|------------|------------|------------|
| Baseline  | 15795889.88313920 | 4 hrs 23 mins     | 9.6        | 10.8       | 14.7       |
| Candidate | 15204143.95724240 | 4 hrs 13 mins     | 10.7       | 12.0       | 15.0       |

Run 2: Search performed on already indexed data from above run

|           | Search p50 | Search p90 | Search p99 |
|-----------|------------|------------|------------|
| Baseline  | 9.7        | 10.6       | 12.1       |
| Candidate | 10.4       | 11.3       | 12.5       |

Setup 2: Used luceneutil knnPerfTest.py

Baseline - Lucene main
Candidate - Lucene main with current commit

Baseline

| recall | latency (ms) | nDoc  | topK | fanout | maxConn | beamWidth | quantized | index s | index docs/s | force merge s | num segments | index size (MB) |
|--------|--------------|-------|------|--------|---------|-----------|-----------|---------|--------------|---------------|--------------|-----------------|
| 0.644  | 0.428        | 50000 | 10   | 64     | 64      | 250       | no        | 18.97   | 2635.18      | 1.89          | 1            | 20.62           |

Candidate

| recall | latency (ms) | nDoc  | topK | fanout | maxConn | beamWidth | quantized | index s | index docs/s | force merge s | num segments | index size (MB) |
|--------|--------------|-------|------|--------|---------|-----------|-----------|---------|--------------|---------------|--------------|-----------------|
| 0.644  | 0.436        | 50000 | 10   | 64     | 64      | 250       | no        | 20.20   | 2474.76      | 1.77          | 1            | 20.62           |

There is a small effect on search latencies; it's hard to say if it's due to the change or just fluctuation between runs. I couldn't think of a reason the change would affect search latencies.

@jpountz @ChrisHegarty thoughts?

Contributor

@ChrisHegarty ChrisHegarty left a comment


Restoring sequential advice is the right thing to do here, and has clear perf benefits. The code LGTM. Let me know if you need anything further to get this merged, and backported to 10.x.

@shatejas
Contributor Author

Thanks @ChrisHegarty!

I do need help with merging this. I have updated CHANGES.txt. Do let me know the process for backporting it - if it needs cherry-picking and raising a new PR on the 10.x branch, I can do it.

Let me know if I need anything else to merge this

@ChrisHegarty
Contributor

ChrisHegarty commented Nov 20, 2024

The org.apache.lucene.index.TestConcurrentMergeScheduler.testNoWaitClose test hits a new assert that I added - sorry. I need to look to see if it is a test issue or more of a design issue with finishMerge.

./gradlew test --tests TestConcurrentMergeScheduler.testNoWaitClose -Dtests.file.encoding=UTF-8 -Dtests.iters=10

@shatejas
Contributor Author

The org.apache.lucene.index.TestConcurrentMergeScheduler.testNoWaitClose test hits a new assert that I added - sorry. I need to look to see if it is a test issue or more of a design issue with finishMerge.

./gradlew test --tests TestConcurrentMergeScheduler.testNoWaitClose -Dtests.file.encoding=UTF-8 -Dtests.iters=10

I took a look at the failure. Here is what I found: mergeInstanceCount is not decremented if the merge is aborted between getMergeInstance and the start of the merge (ref). In that scenario finishMerge is never called, the readers are closed in the finally block, and the assert mergeInstanceCount == 0 in AssertingKnnVectorsReader fails.
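The abort path described above can be sketched in miniature. This is an illustrative model, not the real AssertingKnnVectorsReader: the counter increments in getMergeInstance() but only decrements in finishMerge(), so an abort between the two leaves the count dangling.

```java
// Hypothetical sketch of the counting leak; names are illustrative.
class AssertingReaderSketch {
    int mergeInstanceCount = 0;

    void getMergeInstance() { mergeInstanceCount++; }

    void finishMerge() { mergeInstanceCount--; }

    // Simulated merge: if the merge is aborted before the merge body
    // runs, finishMerge() is never called (the readers are closed in a
    // finally block elsewhere), so the close-time assert fails.
    void merge(boolean aborted) {
        getMergeInstance();
        if (aborted) {
            return; // abort path: no finishMerge()
        }
        finishMerge();
    }

    // The invariant the asserting wrapper checks at close time.
    boolean closeAssertHolds() { return mergeInstanceCount == 0; }
}
```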

@ChrisHegarty
Contributor

I broke the tests again by adding more asserts, as I'm a little uncomfortable with the finishMergeCount <= 0. Now I broke something else. I'll get to fixing this, or back out the extra asserts. For reference this reproduces:

./gradlew :lucene:core:test --tests "org.apache.lucene.codecs.perfield.TestPerFieldKnnVectorsFormat.testByteVectorScorerIteration" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -XX:+UseParallelGC -XX:ActiveProcessorCount=1" -Ptests.seed=C47EA6F578AC0A04 -Ptests.iters=1 -Ptests.gui=false -Ptests.file.encoding=UTF-8 -Ptests.vectorsize=128 -Ptests.forceintegervectors=true -Dtests.iters=10

@shatejas
Contributor Author

shatejas commented Nov 21, 2024

I broke the tests again, by adding more asserts, as I'm a little uncomfortable with the finishMergeCount <= 0. Now I broke something else. I'll get to fixing this, or backout the extra asserts. For reference this reproduces:

Took a look. The mismatch between mergeInstanceCount and mergeInstance is because mergeInstanceCount is updated in the parent while mergeInstance is only set to true during getMergeInstance(). If a search happens during a merge, there is a good chance it runs into a scenario where mergeInstanceCount == 1 but mergeInstance is false.

Considering the test has a commit() call followed by a search, I think it's running into that scenario.

@ChrisHegarty
Contributor

..
Took a look. The mismatch between mergeInstanceCount and mergeInstance is because mergeInstanceCount is updated in the parent while mergeInstance is only set to true during getMergeInstance(). If a search happens during a merge, there is a good chance it runs into a scenario where mergeInstanceCount == 1 but mergeInstance is false.

Considering the test has a commit() call followed by a search, I think it's running into that scenario.

Oh yeah. I get it, and the asserts were too strong. I backed them off a bit. At a high level, I wanna ensure that we're asserting the right execution model, since the backing memory holding the vector data is shared across merge and search at the same time. I think this is fine: search may access the memory while the advice is still SEQUENTIAL, but when the merge finishes it'll switch back.

@ChrisHegarty ChrisHegarty merged commit 46204f6 into apache:main Nov 22, 2024
ChrisHegarty pushed a commit that referenced this pull request Nov 22, 2024
… merging vectors (#13985)

The commit updates the ReadAdvice.SEQUENTIAL before the merge starts and reverts it to ReadAdvice.RANDOM at the end of the merge for Lucene99FlatVectorsReader.
@shatejas
Contributor Author

Thanks a lot @ChrisHegarty for adding tight tests and merging this!

@uschindler
Contributor

Thanks @ChrisHegarty for taking care. If I have any additional comments about API I will open another followup PR.
Still in 🇨🇷, flying back this evening.

@msokolov
Contributor

msokolov commented Dec 1, 2024

The impact on search times is surprising, as you said. Can you clarify one thing about the benchmark setup: does it perform searches concurrently with indexing (and merging) on the same box, or are they completely separate?

jimczi added a commit to jimczi/lucene that referenced this pull request Dec 17, 2024
@shatejas
Contributor Author

The impact on search times is surprising, as you said. Can you clarify one thing about the benchmark setup: does it perform searches concurrently with indexing (and merging) on the same box, or are they completely separate?

Searches were run after the completion of indexing and force-merge.

benchaplin pushed a commit to benchaplin/lucene that referenced this pull request Dec 31, 2024
… merging vectors (apache#13985)

The commit updates the ReadAdvice.SEQUENTIAL before the merge starts and reverts it to ReadAdvice.RANDOM at the end of the merge for Lucene99FlatVectorsReader.
public void finishMerge() throws IOException {
  // This makes sure that the access pattern hint is reverted back, since the
  // HNSW implementation needs it
  this.vectorData.updateReadAdvice(ReadAdvice.RANDOM);
}
Contributor


This is too opinionated. If we want to have this swapping behavior I think we need to find a way to cache the original ReadAdvice and restore it, since we don't know what it was - that decision is made by the Directory and its hints and iocontexts and sysprops and so on. But I don't think we currently have IOContext.getReadAdvice()?

Contributor


I think this makes sense. We should flip back to the read advice that was originally present.

Contributor Author


Directory and its hints and iocontexts and sysprops and so on. But I don't think we currently have IOContext.getReadAdvice()?

At the time of the change the initial advice was RANDOM, so this reverts back to RANDOM. But I agree, we can simply cache the original value and revert to it.
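The cache-and-restore idea can be sketched as follows, under the assumption that the reader can capture whatever advice the Directory chose at open time. All names here are hypothetical, since (as noted above) there is currently no IOContext.getReadAdvice().

```java
// Hypothetical sketch: remember the open-time advice instead of
// hard-coding RANDOM on revert.
enum Advice { NORMAL, RANDOM, SEQUENTIAL }

class AdviceCachingReader {
    private final Advice originalAdvice; // cached at open time
    private Advice current;

    AdviceCachingReader(Advice openAdvice) {
        this.originalAdvice = openAdvice;
        this.current = openAdvice;
    }

    void startMerge() { current = Advice.SEQUENTIAL; }

    // Revert to whatever the Directory originally chose, not a constant.
    void finishMerge() { current = originalAdvice; }

    Advice current() { return current; }
}
```

This way the Directory keeps sole ownership of the advice decision; the merge only borrows SEQUENTIAL temporarily.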

@msokolov
Contributor

msokolov commented Aug 8, 2025

I'm really not sure about this change any more -- looking back at the (merging) performance improvement, it was not very large, and somewhat offset by the mysterious search slowdowns. At the same time, to get this working we needed to add a lot of machinery: reference counting to ensure proper closing, and access to memory-mapping hints. Are we still convinced it is worth it?

@shatejas
Contributor Author

it was not very large, and somewhat offset by the mysterious search slowdowns

We can switch to an approach where the madvise is not changed at all. The approach leverages prefetch: the read pattern still changes, but the reads are not as aggressive as with SEQUENTIAL advice. A prefetch call after each read (inside the merge) will make sure merges don't slow down, while minimizing the search perf impact, since prefetch is not aggressive.

#14076 (comment)
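The prefetch-per-read alternative could look roughly like this sketch. The prefetch method here is a stand-in for an asynchronous OS page hint (loosely modeled on IndexInput#prefetch), and all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep the original advice and instead hint the
// next vector after each read during the merge copy.
class PrefetchingMergeReader {
    final List<Long> prefetched = new ArrayList<>(); // offsets hinted so far
    private final long vectorByteSize;

    PrefetchingMergeReader(long vectorByteSize) {
        this.vectorByteSize = vectorByteSize;
    }

    // Stand-in for an asynchronous page hint (e.g. madvise WILLNEED).
    void prefetch(long offset, long length) {
        prefetched.add(offset);
    }

    // Read vector i for the merge copy, then hint the following vector
    // so its pages are likely resident by the time they are needed.
    void readForMerge(int i, int numVectors) {
        long offset = i * vectorByteSize;
        // ... copy bytes at [offset, offset + vectorByteSize) ...
        if (i + 1 < numVectors) {
            prefetch(offset + vectorByteSize, vectorByteSize);
        }
    }
}
```

Compared with switching the advice, this keeps the mapping's access hint stable for concurrent searches while still hiding read latency inside the merge loop.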

@msokolov
Contributor

Perhaps if we have a case where there is no random access (from Lucene) at all and we are only using Lucene to store the vector data - any search indexing is being done by a native plugin (I think this is what you are targeting?) - then we don't really want to be switching back and forth between access modes; rather we'd want to set SEQUENTIAL and leave it that way, ideally. In that case, what if the reader could recognize that it has no associated HNSW graph (somehow) and provide some kind of hint to the directory, which it could then use to decide on sequential access? Also, I wonder if SEQUENTIAL is really that much better than NORMAL. If not, then we could simply revert this, given that we are returning to NORMAL as the default.

@msokolov
Contributor

I'll open an issue for backporting #14702 to 10.x to fix this.

@shatejas
Contributor Author

shatejas commented Aug 19, 2025

Perhaps if we have a case where there is no random access (from Lucene) at all and we are only using Lucene to store the vector data - any search indexing is being done by a native plugin (I think this is what you are targeting?) - then we don't really want to be switching back and forth between access modes

In certain scenarios, random access is critical for optimal performance. Our initial hypothesis, which we are currently validating through benchmarks, is that random access lookups are significantly more efficient than sequential lookups even for flat vector data under high memory pressure, especially when doing an exact search on filtered documents. This is because random access avoids data prefetching, thereby reducing memory swapping.

To provide users with greater control, it's best to allow them to configure the initial IOContext value based on their specific workloads, rather than keeping it a fixed constant. This approach offers flexibility while maintaining sensible defaults.

Furthermore, we can mitigate the impact on search performance during merges by implementing a dedicated prefetch functionality for vector merges. This eliminates the need to switch between access methods, ensuring a minimal impact on ongoing searches.
