Lazy initialize checkpoint tracker bit sets#27179
Conversation
This local checkpoint tracker uses collections of bit sets to track which sequence numbers are complete, eventually removing these bit sets when the local checkpoint advances. However, these bit sets were eagerly allocated so that if a sequence number far ahead of the checkpoint was marked as completed, all bit sets between the "last" bit set and the bit set needed to track the marked sequence number were allocated. If this sequence number was too far ahead, the memory requirements could be excessive. This commit opts for a different strategy for holding on to these bit sets and enables them to be lazily allocated.
9581e2d to
21299b8
Compare
| long bitArrayKey = getBitArrayKey(checkpoint); | ||
| FixedBitSet current = processedSeqNo.get(bitArrayKey); | ||
| if (current == null) { | ||
| // the bit set corresponding to the checkpoint has already been removed, set ourselves up for the next bit set |
There was a problem hiding this comment.
when can this happen? it seems we only clean bit sets here?
There was a problem hiding this comment.
When we clean the set (e.g., the checkpoint is equal to size - 1) but the checkpoint can not advance to size. In this case, when size is marked as completed, checkpoint will still be equal to size - 1 but we have already cleaned the set.
| FixedBitSet current = processedSeqNo.getFirst(); | ||
| long bitArrayKey = getBitArrayKey(checkpoint); | ||
| FixedBitSet current = processedSeqNo.get(bitArrayKey); | ||
| if (current == null) { |
There was a problem hiding this comment.
since this is a FixedBitSet I wonder why renamed everything to use bitArray? I don't mind. Just curious.
There was a problem hiding this comment.
Because I did not like the inconsistency that in some places it's referred to as a "bit array", and in some it's referred to as a "bit set". I wanted to make them all "bit set" but the problem is the setting index.seq_no.checkpoint.bit_arrays_size (currently never released to the world) refers to "bit array". If, and only if, you would be okay with changing the name of this setting to index.seq_no.checkpoint.bit_sets_size (in 6.0) then I would be okay with normalizing everything to "bit set".
There was a problem hiding this comment.
I'm fine with renaming the setting. I think I added it just be able to control things from tests and as an extra escape hatch if the default size proved wrong. We might want to just remove it. We can solve the test part differently and I'm less concerned now about the default size (I'm also thinking that 1024 is maybe too small)..
| * current bit set, we can clean it. | ||
| */ | ||
| if (checkpoint == (1 + bitArrayKey) * bitArraysSize - 1) { | ||
| assert current != null; |
There was a problem hiding this comment.
can we maybe cache (1 + bitArrayKey) * bitArraysSize - 1 to something like lastSeqNoInArray ? I think it will be easier to read.
| * Previously this would allocate the entire chain of bit sets to the one for the sequence number being marked; for very large | ||
| * sequence numbers this could lead to excessive memory usage resulting in out of memory errors. | ||
| */ | ||
| tracker.markSeqNoAsCompleted(randomNonNegativeLong()); |
There was a problem hiding this comment.
lol. can we assert we allocated just one array?
| assertThat(tracker.getCheckpoint(), equalTo((long) localCheckpoint)); | ||
| assertThat(tracker.getMaxSeqNo(), equalTo((long) maxSeqNo)); | ||
| assertThat(tracker.processedSeqNo, empty()); | ||
| assertThat(tracker.processedSeqNo, new BaseMatcher<LongObjectHashMap<FixedBitSet>>() { |
There was a problem hiding this comment.
I despise assertTrue. Note that this gives a nicer error message if the assertion fails:
java.lang.AssertionError:
Expected: empty
but: was <[512=>org.apache.lucene.util.FixedBitSet@98761236]>
Expected :empty
Actual :<[512=>org.apache.lucene.util.FixedBitSet@98761236]>
as opposed to
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([F9E3DE31F5D979AE:C4C54170CE328934]:0)
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at org.elasticsearch.index.seqno.LocalCheckpointTrackerTests.testResetCheckpoint(LocalCheckpointTrackerTests.java:276)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
|
Thanks @bleskes, I've responded to your feedback. |
| } | ||
|
|
||
| private FixedBitSet getBitArrayForSeqNo(final long seqNo) { | ||
| assert Thread.holdsLock(this); |
There was a problem hiding this comment.
maybe use indexOf(...) and then indexInsert(...) and indexGet(...) respectively to avoid determining what the slot is for a key several times?
final int slot = processedSeqNo.indexOf(bitArrayKey);
if (processedSeqNo.indexExists(slot) == false) {
processedSeqNo.indexInsert(slot, bitArrayKey, new FixedBitSet(bitArraysSize));
}
return processedSeqNo.indexGet(slot);There was a problem hiding this comment.
Actually the indexGet at the end is not correct because the returned index is negative if the slot is not actually used. Yet, ~slot also does not work because the table could have been resized after the insert. I pushed 97bb3ca.
There was a problem hiding this comment.
ah, I didn't realize that. Maybe we should do this to make optimal use of the slot?
final int slot = processedSeqNo.indexOf(bitArrayKey);
if (processedSeqNo.indexExists(slot) == false) {
FixedBitSet bitSet = new FixedBitSet(bitArraysSize));
processedSeqNo.indexInsert(slot, bitArrayKey, bitSet);
return bitSet;
} else {
return processedSeqNo.indexGet(slot);
}There was a problem hiding this comment.
When you see a chance to avoid == false, you seize it. 😉
I pushed f4f0dae.
* master: Remove checkpoint tracker bit sets setting Fix stable BWC branch detection logic Fix logic detecting unreleased versions Enhances exists queries to reduce need for `_field_names` (elastic#26930) Added new terms_set query Set request body to required to reflect the code base (elastic#27188) Update Docker docs for 6.0.0-rc2 (elastic#27166) Add version 6.0.0 Docs: restore now fails if it encounters incompatible settings (elastic#26933) Convert index blocks to cluster block exceptions (elastic#27050) [DOCS] Link remote info API in Cross Cluster Search docs page Fix Laplace scorer to multiply by alpha (and not add) (elastic#27125) [DOCS] Clarify migrate guide and search request validation Raise IllegalArgumentException if query validation failed (elastic#26811) prevent duplicate fields when mixing parent and root nested includes (elastic#27072) TopHitsAggregator must propagate calls to `setScorer`. (elastic#27138)
This local checkpoint tracker uses collections of bit sets to track which sequence numbers are complete, eventually removing these bit sets when the local checkpoint advances. However, these bit sets were eagerly allocated so that if a sequence number far ahead of the checkpoint was marked as completed, all bit sets between the "last" bit set and the bit set needed to track the marked sequence number were allocated. If this sequence number was too far ahead, the memory requirements could be excessive. This commit opts for a different strategy for holding on to these bit sets and enables them to be lazily allocated. Relates #27179
This local checkpoint tracker uses collections of bit sets to track which sequence numbers are complete, eventually removing these bit sets when the local checkpoint advances. However, these bit sets were eagerly allocated so that if a sequence number far ahead of the checkpoint was marked as completed, all bit sets between the "last" bit set and the bit set needed to track the marked sequence number were allocated. If this sequence number was too far ahead, the memory requirements could be excessive. This commit opts for a different strategy for holding on to these bit sets and enables them to be lazily allocated.
Relates #10708