Correct TermOrdValComparator competitive iterator intoBitSet implementation#14523
Correct TermOrdValComparator competitive iterator intoBitSet implementation#14523benwtrent merged 10 commits intoapache:mainfrom
Conversation
…re similar to other impls
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DenseConjunctionBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DocIdSetIterator.java
Outdated
Show resolved
Hide resolved
| @@ -434,6 +434,9 @@ public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws IOExcept | |||
| if (competitiveIterator.docID() < doc) { | |||
There was a problem hiding this comment.
This part is strange. All other intoBitSet impls that I can find call the competitiveIterator to advance to at least offset, not to some internal doc. I suppose NumericComparator is unique?
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/DisjunctionDISIApproximation.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
Show resolved
Hide resolved
| LeafCollector leafCollector = collector.getLeafCollector(context); | ||
| BulkScorer bulkScorer = weight.bulkScorer(context); | ||
| // We need to split on this specific doc ID so that the current doc of the competitive | ||
| // iterator |
There was a problem hiding this comment.
Nit: maybe format this single-line word better?
…tation (#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: #14517
…tation (#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: #14517
| if (disjunction == null) { | ||
| if (docsWithField != null) { | ||
| // we need to be absolutely sure that the iterator is at least at offset | ||
| if (docsWithField.docID() < offset) { |
There was a problem hiding this comment.
The contract from DocIdSetIterator#intoBitSet requires starting from the current doc rather than the first doc on or after offset, so it would be more correct to advance to doc here. In practice, it wouldn't make a difference today since all call sites first advance iterators to offset before calling intoBitSet, but it may not always be the case.
There was a problem hiding this comment.
The contract from DocIdSetIterator#intoBitSet requires starting from the current doc rather than the first doc on or after offset, so it would be more correct to advance to doc here.
So things might got broken when current doc is after the first doc like
CompetitiveIterator competitiveIterator = CompetitiveIterator.of(1, 2, 3, 4);
competitiveIterator.advance(2); // advance current doc to 2
competitiveIterator.update(0, 2000); // update to doc value
final int offset = 1;
if (competitiveIterator.docID() < offset) {
competitiveIterator.advance(offset);
}
competitiveIterator.intoBitSet(upTo, bitset, offset); // we should start from 2 instead of 1.
? Nice catch!
…tation (apache#14523) Similar to @ChrisHegarty 's change for the count fix, this will move up the max check to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases. Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet. closes: apache#14517
Similar to @ChrisHegarty 's change for the count fix, this will move up the
maxcheck to before the into bit set. It seems like we could be calling intobitset erroneously on some edge cases.Additionally, TermOrdValComparator's competitive iterator will ensure that the docsWithField iterator (used when not all docs have a given field), is at least the baseline of the scoring window when doing intoBitSet.
closes: #14517