Skip to content

Add BitSet#or opto and fixed some bugs#1

Open
mdmarshmallow wants to merge 1 commit intozacharymorn:LUCENE-11915-PeekNextNonMatchingDocfrom
mdmarshmallow:zach-doc-id-peek
Open

Add BitSet#or opto and fixed some bugs#1
mdmarshmallow wants to merge 1 commit intozacharymorn:LUCENE-11915-PeekNextNonMatchingDocfrom
mdmarshmallow:zach-doc-id-peek

Conversation

@mdmarshmallow
Copy link
Copy Markdown

Description

Please provide a short description of the changes you're making with this pull request.

Solution

Please provide a short description of the approach taken to implement your solution.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

checkUnpositioned(iter);
for (int doc = iter.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = iter.nextDoc()) {
set(doc);
int nextPossibleNonMatchingDocID = iter.peekNextNonMatchingDocID();
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API would no longer returns NO_MORE_DOCS when the underlying iterator has exhausted. Please see the discussion here apache#12194 (comment)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, from the the discussion then, it seems like I can just remove the if statement.

}

for (int i = startIndex; i < endIndex; i++) {
set(i);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I feel this might not be optimal actually. Could we utilize SparseFixedBitSet's specialized data structure to set a range of docIDs at once, similar to how FixedBitSet utilized its special data structure?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured there would be a better way, but this was just a quick override so I could run a perf test. I'll see if I can make this better.

@@ -104,7 +107,19 @@ protected final void checkUnpositioned(DocIdSetIterator iter) {
public void or(DocIdSetIterator iter) throws IOException {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also wondering if the implementation here will be utilized effectively, since it has been overridden in subclasses?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the overrides and there were only 2. The first was FixedBitSet which just calls super.or in the case of a non-bitset DocIdSetIterator. The other override is in SparseFixedBitSet which shouldn't be really benefitting from this optimization anyways (I'm assuming there won't be long runs of docs in sparse bitsets).

@mdmarshmallow
Copy link
Copy Markdown
Author

Thanks for taking a look at this! Looking at Adrien's comment in the main PR, it seems that this should be a separate PR made after your work is merged? I'll continue to work on this but maybe I should close this pull request then?

@zacharymorn
Copy link
Copy Markdown
Owner

Thanks for taking a look at this! Looking at Adrien's comment in the main PR, it seems that this should be a separate PR made after your work is merged? I'll continue to work on this but maybe I should close this pull request then?

I think as long as you keep dependence to my changes to the minimum (namely, probably just the new API DocIdSetIterator#peekNextNonMatchingDocID), you should be able to work on this change in parallel, and later merge conflict around this API should be easy to resolve? It's up to you if you want to wait a bit though.

@mdmarshmallow
Copy link
Copy Markdown
Author

Hey, sorry for the delayed response, but I was able to also test this optimization on some internal (Amazon Search) benchmarks and also saw no difference with or without this optimization. I think given that I saw no changes with this in the luceneutil benchmarks as well (link to test), I'm not sure this BitSet change makes sense? I think I can close this but maybe I'm overlooking some obvious flaw?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants