Add BitSet#or opto and fixed some bugs by mdmarshmallow · Pull Request #1 · zacharymorn/lucene

mdmarshmallow · 2023-03-14T19:41:10Z

Description

Please provide a short description of the changes you're making with this pull request.

Solution

Please provide a short description of the approach taken to implement your solution.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.

zacharymorn · 2023-03-18T05:35:20Z

    checkUnpositioned(iter);
    for (int doc = iter.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = iter.nextDoc()) {
-      set(doc);
+      int nextPossibleNonMatchingDocID = iter.peekNextNonMatchingDocID();


This API no longer returns NO_MORE_DOCS when the underlying iterator is exhausted. Please see the discussion here apache#12194 (comment)

Ah ok, from the discussion, then, it seems like I can just remove the if statement.
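
For readers following this thread, here is a minimal, self-contained sketch of the run-aware `or` loop being discussed. It is not the code from this PR: `RunIterator` and `RangesIterator` are hypothetical stand-ins for DocIdSetIterator, `java.util.BitSet` replaces Lucene's BitSet, and the sketch assumes that `peekNextNonMatchingDocID()` returns the exclusive end of the run of matching docs that starts at the current doc. In line with the comment above, the loop never compares the peeked value against NO_MORE_DOCS; exhaustion is only detected through `advance`.

```java
import java.util.BitSet;

public class RunAwareOr {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical stand-in for DocIdSetIterator plus the proposed peek method. */
  interface RunIterator {
    int nextDoc();
    int advance(int target);
    /** Assumed: every doc in [current doc, returned value) matches. */
    int peekNextNonMatchingDocID();
  }

  /** Toy iterator over sorted, disjoint [start, end) ranges of doc IDs. */
  static final class RangesIterator implements RunIterator {
    private final int[][] ranges;
    private int idx = 0;
    private int doc = -1;

    RangesIterator(int[][] ranges) {
      this.ranges = ranges;
    }

    @Override
    public int nextDoc() {
      return advance(doc + 1);
    }

    @Override
    public int advance(int target) {
      // Skip ranges that end at or before the target.
      while (idx < ranges.length && ranges[idx][1] <= target) {
        idx++;
      }
      if (idx == ranges.length) {
        return doc = NO_MORE_DOCS;
      }
      return doc = Math.max(target, ranges[idx][0]);
    }

    @Override
    public int peekNextNonMatchingDocID() {
      // Only meaningful while positioned on a matching doc.
      return ranges[idx][1];
    }
  }

  /** Sets whole runs at once instead of calling set(doc) per document. */
  static void or(BitSet bits, RunIterator iter) {
    int doc = iter.nextDoc();
    while (doc != NO_MORE_DOCS) {
      int end = iter.peekNextNonMatchingDocID(); // exclusive, assumed > doc
      bits.set(doc, end);
      doc = iter.advance(end);
    }
  }

  public static void main(String[] args) {
    BitSet bits = new BitSet();
    or(bits, new RangesIterator(new int[][] {{2, 5}, {10, 12}}));
    System.out.println(bits); // prints {2, 3, 4, 10, 11}
  }
}
```

With two runs in the input, the loop body executes twice rather than five times, which is the saving the optimization is after when runs are long.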

zacharymorn · 2023-03-18T05:39:17Z

+    }
+
+    for (int i = startIndex; i < endIndex; i++) {
+      set(i);


Hmmm I feel this might not be optimal actually. Could we utilize SparseFixedBitSet's specialized data structure to set a range of docIDs at once, similar to how FixedBitSet utilized its special data structure?

Yeah, I figured there would be a better way, but this was just a quick override so I could run a perf test. I'll see if I can make this better.
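
As an illustration of what "setting a range at once" means for a dense, word-based bit set, here is a small sketch of the mask-based technique over a plain `long[]`. It follows the general approach dense bit sets such as FixedBitSet use, but `WordRangeSet`, `setRange`, and `get` are hypothetical names for this example only. A sparse structure like SparseFixedBitSet would additionally need to allocate or grow its blocks before applying the masks, which this sketch does not attempt.

```java
import java.util.Arrays;

public class WordRangeSet {
  /** Sets bits [startIndex, endIndex) in a dense array of 64-bit words. */
  static void setRange(long[] words, int startIndex, int endIndex) {
    if (startIndex >= endIndex) {
      return; // empty range
    }
    int startWord = startIndex >> 6;
    int endWord = (endIndex - 1) >> 6;

    // Java masks long shift counts to their low 6 bits, so these shifts
    // operate on the bit position within the word.
    long startMask = -1L << startIndex; // bits from startIndex to the top of its word
    long endMask = -1L >>> -endIndex;   // bits from the bottom of the word up to endIndex - 1

    if (startWord == endWord) {
      words[startWord] |= (startMask & endMask);
      return;
    }
    words[startWord] |= startMask;
    Arrays.fill(words, startWord + 1, endWord, -1L); // whole words in between
    words[endWord] |= endMask;
  }

  static boolean get(long[] words, int index) {
    return (words[index >> 6] & (1L << index)) != 0;
  }

  public static void main(String[] args) {
    long[] words = new long[2];
    setRange(words, 60, 70);
    // Bits 60..63 land in word 0 and bits 64..69 in word 1.
    System.out.println(Long.bitCount(words[0]) + " " + Long.bitCount(words[1])); // prints 4 6
  }
}
```

The cost is proportional to the number of words touched rather than the number of documents in the range, which is why a per-document `set(i)` loop is the slower option for long runs.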

zacharymorn · 2023-03-18T05:40:32Z

@@ -104,7 +107,19 @@ protected final void checkUnpositioned(DocIdSetIterator iter) {
  public void or(DocIdSetIterator iter) throws IOException {


I'm also wondering if the implementation here will be utilized effectively, since it has been overridden in subclasses?

I checked the overrides and there were only 2. The first is FixedBitSet, which just calls super.or in the case of a non-bitset DocIdSetIterator. The other override is in SparseFixedBitSet, which shouldn't really benefit from this optimization anyway (I'm assuming there won't be long runs of docs in sparse bitsets).
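
The dispatch pattern described in this reply can be sketched as follows. All names here (`OrDispatchSketch`, `BaseSet`, `FixedSet`, `BackedIterator`) are hypothetical and only mimic the shape of the relationship: the subclass takes a fast path when the iterator exposes its backing bits and otherwise falls through to the base-class `or`, so an optimization placed in the base class is still exercised for every other iterator type.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class OrDispatchSketch {
  /** Hypothetical base class holding the generic per-iterator logic. */
  static class BaseSet {
    final boolean[] bits;
    int genericCalls = 0;

    BaseSet(int size) {
      bits = new boolean[size];
    }

    void or(Iterator<Integer> docs) {
      genericCalls++; // a base-class optimization would live here
      while (docs.hasNext()) {
        bits[docs.next()] = true;
      }
    }
  }

  /** Hypothetical iterator that exposes the array it iterates over. */
  static class BackedIterator implements Iterator<Integer> {
    final boolean[] source;
    private int pos = 0;

    BackedIterator(boolean[] source) {
      this.source = source;
    }

    @Override
    public boolean hasNext() {
      while (pos < source.length && !source[pos]) {
        pos++;
      }
      return pos < source.length;
    }

    @Override
    public Integer next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      return pos++;
    }
  }

  /** Hypothetical subclass: bulk merge when possible, else defer to the base class. */
  static class FixedSet extends BaseSet {
    int fastCalls = 0;

    FixedSet(int size) {
      super(size);
    }

    @Override
    void or(Iterator<Integer> docs) {
      if (docs instanceof BackedIterator) {
        fastCalls++;
        boolean[] other = ((BackedIterator) docs).source;
        for (int i = 0; i < other.length && i < bits.length; i++) {
          bits[i] |= other[i];
        }
      } else {
        super.or(docs); // generic iterators still reach the base implementation
      }
    }
  }

  public static void main(String[] args) {
    FixedSet set = new FixedSet(8);
    set.or(List.of(1, 2).iterator());
    set.or(new BackedIterator(new boolean[] {false, false, false, true}));
    System.out.println(set.genericCalls + " " + set.fastCalls); // prints 1 1
  }
}
```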

mdmarshmallow · 2023-03-20T19:10:54Z

Thanks for taking a look at this! Looking at Adrien's comment in the main PR, it seems that this should be a separate PR made after your work is merged? I'll continue to work on this but maybe I should close this pull request then?

zacharymorn · 2023-03-21T04:02:13Z

> Thanks for taking a look at this! Looking at Adrien's comment in the main PR, it seems that this should be a separate PR made after your work is merged? I'll continue to work on this but maybe I should close this pull request then?

I think as long as you keep the dependence on my changes to a minimum (namely, probably just the new API DocIdSetIterator#peekNextNonMatchingDocID), you should be able to work on this change in parallel, and any later merge conflicts around this API should be easy to resolve. It's up to you if you want to wait a bit though.

mdmarshmallow · 2023-04-10T18:30:52Z

Hey, sorry for the delayed response. I was also able to test this optimization on some internal (Amazon Search) benchmarks and saw no difference with or without it. Given that I saw no change in the luceneutil benchmarks either (link to test), I'm not sure this BitSet change makes sense. I think I can close this, but maybe I'm overlooking some obvious flaw?

Add BitSet#or opto and fixed some bugs

1bfa9ba

mdmarshmallow mentioned this pull request Mar 14, 2023

[GITHUB-11915] Make Lucene smarter about long runs of matches via new API on DISI apache/lucene#12194

Open

zacharymorn reviewed Mar 18, 2023

mdmarshmallow wants to merge 1 commit into zacharymorn:LUCENE-11915-PeekNextNonMatchingDoc from mdmarshmallow:zach-doc-id-peek
