Implement #docIDRunEnd() on PostingsEnum.#14390
Conversation
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta between doc IDs and between doc counts on the various skip levels.
gf2121
left a comment
There was a problem hiding this comment.
Nice! I left some minor comments.
|
|
||
| @Override | ||
| public int docIDRunEnd() throws IOException { | ||
| // Note: this assumes that BLOCK_SIZE == 128, this bit of the code would need to be changed if |
There was a problem hiding this comment.
Nit: Can we assert this assumption here? As i tried to change BLOCK_SIZE before, i personally prefer an explicit AssertionError rather than comments :)
There was a problem hiding this comment.
I wanted to do this but this gives a compile error that fails the build, I'm unsure how to do it otherwise
1. ERROR in lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java (at line 1066)
assert BLOCK_SIZE == 2 * Long.SIZE;
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Comparing identical expressions
There was a problem hiding this comment.
This trick seems work:
boolean level0IsDense =
encoding == DeltaEncoding.UNARY
&& docBitSet.getBits()[0] == -1L
&& docBitSet.getBits()[1] == -1L;
if (level0IsDense) {
assert Long.bitCount(docBitSet.getBits()[0]) + Long.bitCount(docBitSet.getBits()[1])
== BLOCK_SIZE : "We are assuming BLOCK_SIZE == 128 here.";
...
}
There was a problem hiding this comment.
Thanks, I ended up using another trick that looked a bit more natural by extracting the block size to a variable.
| private static void checkDocIDRuns(DocIdSetIterator iterator) throws IOException { | ||
| int prevDoc = -1; | ||
| int runEnd = 0; | ||
| for (int doc = iterator.nextDoc(); |
There was a problem hiding this comment.
Do we also need to test docIDRunEnd works properly against operations other than nextDoc? E.g. advance, intoBitset, advanceShallow.
There was a problem hiding this comment.
I wondered about it. We already have tests that test nextDoc vs. advance or nextDoc vs. intoBitSet. And "sane" impls of docIDRun should be free of side effects. So I thought that it would be enough this way.
There was a problem hiding this comment.
Thank you for explanation, that sounds good enough today.
| // Note: this assumes that BLOCK_SIZE == 128, this bit of the code would need to be changed if | ||
| // the block size was changed. | ||
| // Hack to avoid compiler warning that both sides of the equal sign are identical. | ||
| long blockSize = BLOCK_SIZE; |
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta between doc IDs and between doc counts on the various skip levels.
This implements `BlockPostingsEnum#docIDRunEnd()` by comparing the delta between doc IDs and between doc counts on the various skip levels.
This implements
BlockPostingsEnum#docIDRunEnd()by comparing the delta between doc IDs and between doc counts on the various skip levels.