Fix Snapshot Out of Order Finalization Repo Corruption#75362
original-brownbear merged 14 commits into elastic:master from original-brownbear:repro-multiple-out-of-ordersnapshot-finalization
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
```diff
   */
- private static ClusterState stateWithoutSnapshot(ClusterState state, Snapshot snapshot) {
+ private static ClusterState stateWithoutSuccessfulSnapshot(ClusterState state, Snapshot snapshot) {
+     // TODO: updating snapshots here leaks their outdated generation files, we should add logic to clean those up and enhance
```
The logic in this method would definitely benefit from a state tracking object like we have for the shard status update executor. In the interest of time and keeping it simple I went with this solution for now. Ideally, I'd like to resolve this TODO and make the whole logic simpler to follow in 7.15 by finally refactoring SnapshotsInProgress into a form that is more appropriate for the logic around concurrent snapshots, rather than adding yet another round of elaborate logic to work around its shortcomings.
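For context, here is a minimal sketch of what such a "state tracking object" might look like: a small mutable helper that accumulates changes while iterating and only produces a new state if something actually changed. All names and the toy map standing in for cluster state are invented for this sketch; this is not the actual shard status update executor.

```java
import java.util.HashMap;
import java.util.Map;

public class StateTrackerSketch {

    static final class SnapshotsUpdater {
        private final Map<String, String> current;
        private Map<String, String> updated; // lazily created on first real change

        SnapshotsUpdater(Map<String, String> current) {
            this.current = current;
        }

        void removeSnapshot(String name) {
            if (current.containsKey(name)) {
                if (updated == null) {
                    updated = new HashMap<>(current); // copy-on-first-write
                }
                updated.remove(name);
            }
        }

        // Returns the original map unchanged if nothing was modified,
        // so callers can cheaply detect a no-op update by identity.
        Map<String, String> build() {
            return updated == null ? current : updated;
        }
    }

    public static void main(String[] args) {
        Map<String, String> snapshots = Map.of("snap-1", "STARTED", "snap-2", "SUCCESS");

        SnapshotsUpdater updater = new SnapshotsUpdater(snapshots);
        updater.removeSnapshot("snap-2");
        System.out.println(updater.build().containsKey("snap-2"));

        SnapshotsUpdater noop = new SnapshotsUpdater(snapshots);
        noop.removeSnapshot("absent");
        System.out.println(noop.build() == snapshots);
    }
}
```

The identity check on `build()` is the useful property: cluster state update executors can skip publishing when nothing changed.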
Jenkins run elasticsearch-ci/part-1 (unrelated + known)

Jenkins run elasticsearch-ci/part-2 (unrelated + known)
```diff
  final String bestGeneration = generations.getOrDefault(indexName, Collections.emptyMap()).get(shardId);
- assert bestGeneration == null || activeGeneration == null || activeGeneration.equals(bestGeneration);
+ if ((bestGeneration == null || activeGeneration == null || activeGeneration.equals(bestGeneration)) == false) {
+     throw new AssertionFailedException("gnarf");
```
Changing this assertion failure into a runtime exception is, I think, cheating ;)
sorry needed a breakpoint there :D
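For illustration, here is the debugging pattern from the exchange above as a standalone sketch (the class and method names are invented, this is not the actual Elasticsearch code): a plain `assert` only fires when assertions are enabled with `-ea` and offers no statement to break on, whereas an explicit check plus `throw` gives the debugger a concrete line.

```java
public class BreakpointableAssert {

    static void checkGenerations(String bestGeneration, String activeGeneration) {
        // Original style: a plain assertion (needs -ea to fire, no failure-path line to break on).
        // assert bestGeneration == null || activeGeneration == null || activeGeneration.equals(bestGeneration);

        // Debug-friendly rewrite: explicit condition with a throw you can set a breakpoint on.
        if ((bestGeneration == null || activeGeneration == null || activeGeneration.equals(bestGeneration)) == false) {
            throw new AssertionError("generation mismatch: " + bestGeneration + " vs " + activeGeneration); // breakpoint here
        }
    }

    public static void main(String[] args) {
        checkGenerations(null, "gen-1");    // passes: no best generation recorded yet
        checkGenerations("gen-1", "gen-1"); // passes: generations agree
        boolean threw = false;
        try {
            checkGenerations("gen-1", "gen-2"); // mismatch: throws
        } catch (AssertionError e) {
            threw = true;
        }
        System.out.println("mismatch detected: " + threw);
    }
}
```

The reviewer's point stands: the explicit throw changes production behaviour, so this rewrite belongs in a debugging session, not in the merged code.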
```java
        );
    }

    public void testOutOfOrderCloneFinalization() throws Exception {
```
I think the clone - clone case is missing?
Yeah, sort of: that's incredibly hard to reproduce because it all runs on the same master node (we can't easily block just one shard selectively with the current infra) and I couldn't find a quick way of adding the infrastructure for that test. I can try to find time for it later today, but no guarantees I will be able to.
fcofdez left a comment
LGTM, but I agree that we should simplify this logic in the near future, it's becoming quite complex to follow. 👍
DaveCTurner left a comment
I left one question and one tiny nit, LGTM otherwise.
```java
ImmutableOpenMap.Builder<RepositoryShardId, ShardSnapshotStatus> updatedShardAssignments = null;
for (ObjectObjectCursor<RepositoryShardId, ShardSnapshotStatus> finishedShardEntry : removedEntry.clones()) {
    final ShardSnapshotStatus shardState = finishedShardEntry.value;
    if (shardState.state() == ShardState.SUCCESS) {
```
tiny nit: we're inconsistent about this vs `if (shardState.state() != ShardState.SUCCESS) { continue; }` across the 4 branches
Ah, I did this on purpose to avoid indenting so deeply when there's more complicated logic in a branch ... maybe that's just confusing though, I can change it if you want :)
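The two loop styles being discussed, side by side as a minimal sketch (the enum and counters are invented stand-ins, not the real snapshot types):

```java
import java.util.List;

public class LoopStyles {

    enum ShardState { SUCCESS, FAILED, MISSING }

    // Style 1: nest the interesting work under the positive condition.
    static int countNested(List<ShardState> states) {
        int successes = 0;
        for (ShardState state : states) {
            if (state == ShardState.SUCCESS) {
                successes++; // indentation grows with the logic inside the branch
            }
        }
        return successes;
    }

    // Style 2: skip uninteresting entries early, keeping the main logic flat.
    static int countEarlyContinue(List<ShardState> states) {
        int successes = 0;
        for (ShardState state : states) {
            if (state != ShardState.SUCCESS) {
                continue;
            }
            successes++; // main logic stays at a single indentation level
        }
        return successes;
    }

    public static void main(String[] args) {
        List<ShardState> states = List.of(
            ShardState.SUCCESS, ShardState.FAILED, ShardState.SUCCESS, ShardState.MISSING);
        System.out.println(countNested(states) + " " + countEarlyContinue(states));
    }
}
```

Both compute the same result; the early-continue form pays off when the success branch contains many nested conditions of its own.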
```java
final RepositoryShardId repoShardId = finishedShardEntry.key;
final IndexMetadata indexMeta = state.metadata().index(repoShardId.indexName());
if (indexMeta == null) {
    // The index name that finished cloning does not exist in the cluster state so it isn't relevant
```
I'm confused by this. If we deleted this index and then created another one with the same name then we'd be updating the entry with the wrong index UUID. I'm not sure this matters, but it certainly seems like it puts the entry in a strange state. Can we not use the shard ID from the actual entry?
Actually in this case `org.elasticsearch.snapshots.SnapshotsService#maybeAddUpdatedAssignment` will just not find the shard entry in the snapshot's map that still contains the old UUID, because the `ShardId` won't be equal, so that should be fine I think.
ugh ok even weirder :) Looking forward to this all getting cleaned up soon...
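A toy demonstration of why the lookup misses in the deleted-and-recreated-index case, assuming a simplified `ShardId` whose equality includes the index UUID (the real Elasticsearch `ShardId`/`Index` classes are more involved; everything here is invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class ShardIdUuidDemo {

    // Record equality covers all components, including the index UUID.
    record ShardId(String indexName, String indexUuid, int id) { }

    public static void main(String[] args) {
        Map<ShardId, String> snapshotShards = new HashMap<>();
        // Entry recorded while the original index (uuid-1) still existed.
        snapshotShards.put(new ShardId("logs", "uuid-1", 0), "IN_PROGRESS");

        // The index was deleted and recreated under the same name: new UUID.
        ShardId recreated = new ShardId("logs", "uuid-2", 0);

        // The lookup misses because equality includes the UUID, so the stale
        // entry is simply left untouched rather than updated with wrong data.
        System.out.println(snapshotShards.containsKey(recreated));
    }
}
```

This is the "should be fine" case from the exchange above: resolving a shard by index *name* can yield a key with a new UUID, and that key no longer matches the stale entry in the snapshot's map.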
Thanks David + Francisco!
* Fix up shard generations in `SnapshotsInProgress` during snapshot finalization (don't do it earlier because it's a really heavy computation and we have a ton of places where it would have to run).
* Adjust the finalization queue so it can work with snapshot entries that change after they've been enqueued for finalization.
* There is still one remaining bug left after this (see the TODO about leaking generations) that I don't feel confident fixing for `7.13.4` due to the complexity of a fix and how minor the blob leak is (plus it's cleaned up just fine during snapshot deletes).

Closes #75336
NOTE: this could probably be DRYed up a lot against other shard state machine logic, but I wanted to isolate this change as much as I could for easy backporting, as well as to minimise risk, since it's not a trivial change at all. By only running the generation fixing after finalizing, and with the newly added tests, I feel confident in this fix though.
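A highly simplified sketch of the queue fix-up idea from the description above, with an invented data model (this is not the actual `SnapshotsService` logic): entries still waiting for finalization get their shard generations patched when an earlier snapshot finishes out of order, instead of being finalized with stale generations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class FinalizationQueueSketch {

    static final class Entry {
        final String snapshot;
        final Map<String, String> shardGenerations = new HashMap<>();
        Entry(String snapshot) { this.snapshot = snapshot; }
    }

    public static void main(String[] args) {
        Deque<Entry> finalizationQueue = new ArrayDeque<>();

        Entry first = new Entry("snap-1");
        first.shardGenerations.put("shard-0", "gen-A");
        Entry second = new Entry("snap-2");
        second.shardGenerations.put("shard-0", "gen-A"); // stale: recorded before snap-1 wrote a new generation

        finalizationQueue.add(first);
        finalizationQueue.add(second);

        // snap-1 finalizes and produces a new generation for shard-0 ...
        finalizationQueue.poll();
        String newGeneration = "gen-B";

        // ... so the remaining queued entries are fixed up at finalization
        // time rather than carrying the stale generation into the repository.
        for (Entry queued : finalizationQueue) {
            queued.shardGenerations.put("shard-0", newGeneration);
        }

        System.out.println(finalizationQueue.peek().shardGenerations.get("shard-0"));
    }
}
```

Doing the patch at finalization time matches the first bullet of the description: computing correct generations eagerly on every state change would be far more expensive.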