Move GlobalCheckpointTracker and remove SequenceNumbersService#27837
Move GlobalCheckpointTracker and remove SequenceNumbersService#27837ywelsch merged 14 commits intoelastic:masterfrom
Conversation
|
I like this very much. I think we should do it but also take it a step further and total isolate the GlobalCheckpointTracker from the engine. Something like this, wdyt? |
| BiFunction<Long, Long, LocalCheckpointTracker> localCheckpointTrackerSupplier) throws IOException { | ||
| final long maxSeqNo; | ||
| final long localCheckpoint; | ||
| if (openMode == EngineConfig.OpenMode.CREATE_INDEX_AND_TRANSLOG) { |
There was a problem hiding this comment.
paranoia - can you use a switch statement with a fallback logic and an exception on default?
|
|
||
| final EngineConfig config = newEngineConfig(openMode); | ||
|
|
||
| if (openMode == EngineConfig.OpenMode.OPEN_INDEX_AND_TRANSLOG) { |
There was a problem hiding this comment.
can you add a comment why it's important to do it before we open the engine?
| * @throws IOException if an I/O exception occurred reading the latest Lucene commit point from disk | ||
| */ | ||
| public SeqNoStats loadSeqNoStats(final long globalCheckpoint) throws IOException { | ||
| public Tuple<Long, Long> loadSeqNoStats() throws IOException { |
There was a problem hiding this comment.
maybe call this load seqNoInfo? This way people won't be confused about SeqNoStats?
| * With no readers and no current, a call to {@link #getMinFileGeneration()} would not work. | ||
| */ | ||
| private TranslogWriter createWriter(long fileGeneration, long initialMinTranslogGen) throws IOException { | ||
| private TranslogWriter createWriter(long fileGeneration, long initialMinTranslogGen, long initialGlobalCheckpoint) throws IOException { |
| final LongSupplier writerGlobalCheckpointSupplier; | ||
| if (Assertions.ENABLED) { | ||
| writerGlobalCheckpointSupplier = () -> { | ||
| long gcp = globalCheckpointSupplier.getAsLong(); |
| engine = new InternalEngine(config(defaultSettings, store, translogPath, newMergePolicy(), null, null), seqNoServiceSupplier); | ||
| final AtomicLong globalCheckpoint = new AtomicLong(SequenceNumbers.UNASSIGNED_SEQ_NO); | ||
| final LongSupplier globalCheckpointSupplier = () -> globalCheckpoint.get(); | ||
| engine = new InternalEngine(config(defaultSettings, store, translogPath, newMergePolicy(), null, null, globalCheckpointSupplier)); |
| Long.parseLong(initialEngine.commitStats().getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY)), | ||
| equalTo(localCheckpoint)); | ||
| initialEngine.getTranslog().sync(); // to guarantee the global checkpoint is written to the translog checkpoint | ||
| assertThat( |
There was a problem hiding this comment.
I found it tricky to keep these checks in the test with the other changes around Engine not exposing GlobalCheckpointTracker. I also did not like that this test was checking a mixture between local checkpoint and global checkpoint properties (where the global checkpoint properties are mostly trivial, and covered by GlobalCheckpointTrackerTests). I've found a way to reintroduce these checks though (see ea23a0d)
| config.getExternalRefreshListener(), Collections.emptyList(), config.getIndexSort(), config.getTranslogRecoveryRunner(), | ||
| config.getCircuitBreakerService()); | ||
| config.getCircuitBreakerService(), config.getGlobalCheckpointSupplier()); | ||
| //new GlobalCheckpointTracker(config.getShardId(), config.getAllocationId(), |
| return indexShard.getEngine(); | ||
| } | ||
|
|
||
| public static GlobalCheckpointTracker getGlobalCheckpointTracker(IndexShard indexShard) { |
|
backport pending waiting on backport of #27826 |
The PR #27837 unintentionally changed to an in memory global checkpoint.
This commit moves GlobalCheckpointTracker from the engine to IndexShard, where it better fits logically: Tracking the global checkpoint based on the local checkpoints of all shards in the replication group is not a property of the engine, but rather a property fulfilled by the current primary shard. The LocalCheckpointTracker on the other hand is driven by the contents of the local translog. By moving GlobalCheckpointTracker to IndexShard, it makes little sense to keep the SequenceNumbersService class around - it would only wrap the LocalCheckpointTracker. This commit therefore removes the class and replaces occurrences of SequenceNumbersService in the engine directly by LocalCheckpointTracker.
|
Backported to 6.x in 1576a2b |
* es/6.x: (43 commits) ingest: upgraded ingest geoip's geoip2's dependencies. [TEST] logging for update by query test #27820 Use full profile on JDK 10 builds Require Gradle 4.3 Add unreleased v6.1.2 version TEST: reduce blob size #testExecuteMultipartUpload Check index under the store metadata lock (#27768) Upgrade to Lucene 7.2.0. (#27910) Fixed test to be up to date with the new database files. Use `_refresh` to shrink the version map on inactivity (#27918) Make KeyedLock reentrant (#27920) Fixes DocStats to not report index size < -1 (#27863) Disable TestZenDiscovery in cloud providers integrations test ingest: Upgraded the geolite2 databases. [Issue-27716]: CONTRIBUTING.md IntelliJ configurations settings are confusing. (#27717) [Test] Fix IndicesClientDocumentationIT (#27899) Move uid lock into LiveVersionMap (#27905) Mute testRetentionPolicyChangeDuringRecovery Increase Gradle heap space to 1536m Move GlobalCheckpointTracker and remove SequenceNumbersService (#27837) ...
Today we did not set the global checkpoint when opening an engine from an existing store. If we are forced to close an engine before advancing the global checkpoint, we also have to close translog which in turn sync a new checkpoint with an unassigned global checkpoint. This is not caught until the global checkpoint assertion was introduced in PR elastic#27837. This commit tightens the syncNeeded conditions. Relates elastic#27970
This PR moves GlobalCheckpointTracker from the engine to IndexShard, where it better fits logically: Tracking the global checkpoint based on the local checkpoints of all shards in the replication group is not a property of the engine, but rather a property fulfilled by the current primary shard. The LocalCheckpointTracker on the other hand is driven by the contents of the local translog. By moving GlobalCheckpointTracker to IndexShard, it makes little sense to keep the SequenceNumbersService class around - it would only wrap the LocalCheckpointTracker. This PR therefore removes the class and replaces occurrences of SequenceNumbersService in the engine directly by LocalCheckpointTracker.