Skip to content

Update the IOContext on IndexInput rather than the ReadAdvice#14702

Merged
thecoop merged 3 commits intoapache:mainfrom
thecoop:indexinput-update-readadvice
Jun 24, 2025
Merged

Update the IOContext on IndexInput rather than the ReadAdvice#14702
thecoop merged 3 commits intoapache:mainfrom
thecoop:indexinput-update-readadvice

Conversation

@thecoop
Copy link
Contributor

@thecoop thecoop commented May 22, 2025

Update the IOContext on IndexInput implementations rather than the ReadAdvice. This change means that ReadAdvice is now only used by MMapDirectory and friends. Note that this is a refactoring only, it does not change behaviour.

@github-actions
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

* <p>The default implementation returns {@code this}
*/
public KnnVectorsReader getMergeInstance() {
public KnnVectorsReader getMergeInstance() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat tangential to this change:

I believe that getMergeInstance() initially didn't throw because the mental model was that it was akin to clone() but specialized for merging, and clone() doesn't throw an IOException. But we now want to run operations that may throw an IOException in getMergeInstance() impls. I think we need to decide if it's best to make all XXXFormat#getMergeInstance() throw an IOException (only a couple of them seem modified in this PR) or if impls should rethrow as an UncheckedIOException. (I don't thave an opinion on this question at this point.)

Another problem is that getMergeInstance() is no longer an independent clone (via other PRs, not yours), as the read advice is updated for everyone, not just for the merge instance. So it feels like this would be better done by a setter, to indicate that this affects the current instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think throwing an IOException is ok here - the top-level MergeState constructor already throws IOException, so no core semantics are being modified here. And it seems reasonable for a io-related method to throw an IOException. The alternative is to throw an UncheckedIOException, which may break the caller of MergeState. I can modify the other getMergeInstance methods to declare throws IOException for consistency here.

For the setter idea, that's a tricky one. The other getMergeInstance methods do create an independent instance, and there are also some implementations of KnnVectorsReader.getMergeInstance which do create a new instance (albeit to wrap other getMergeInstance objects). But we can't do that for Lucene99FlatVectorsReader, due to only having one underlying memory segment that we have to share between the IndexInput objects we use.

Ideally, we would clone - but we can't for this specific implementation. My hunch is that this is ok as-is, given we have AssertingKnnVectorsReader ensuring the calls happen as expected so that nothing breaks (I've tightened up the assertions for this). Any other changes in this area and we would need to re-evaluate though.

@github-actions
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions
Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 7, 2025
@thecoop thecoop requested a review from jpountz June 9, 2025 08:32
@github-actions github-actions bot removed the Stale label Jun 10, 2025
@github-actions
Copy link
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 24, 2025
Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thecoop
Copy link
Contributor Author

thecoop commented Jun 24, 2025

Adding throws IOException to the other merge readers leads to some deeper refactoring needed, so I've left it for now. The contractual behaviour of the KnnVectorsReader is enforced by the Asserting implementation; any further changes to this area and we'll need to take another look at it.

@thecoop thecoop merged commit 9936b20 into apache:main Jun 24, 2025
7 checks passed
@thecoop thecoop deleted the indexinput-update-readadvice branch June 24, 2025 15:35
@dweiss
Copy link
Contributor

dweiss commented Jun 25, 2025

I think this change caused multiple builds to fail?

dweiss added a commit that referenced this pull request Jun 25, 2025
@dweiss
Copy link
Contributor

dweiss commented Jun 25, 2025

Hey, @thecoop - I allowed myself to revert this patch so that it doesn't flood the builds mailing list with failures. Hope you don't mind. I can reapply if it's not that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants