[Test] Fix ConcurrentSnapshotsIT testDeletesAreBatched#144331
[Test] Fix ConcurrentSnapshotsIT testDeletesAreBatched#144331elasticsearchmachine merged 6 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-distributed (Team:Distributed) |
DaveCTurner
left a comment
There was a problem hiding this comment.
++ this does look to be racy - suggested a simpler/faster fix
| assertBusy(() -> { | ||
| SnapshotDeletionsInProgress snapshotDeletionsInProgress = SnapshotDeletionsInProgress.get( |
There was a problem hiding this comment.
I think this would be sufficient and would avoid the busy-wait:
ClusterServiceUtils.addTemporaryStateListener(cs -> SnapshotDeletionsInProgress.get(cs).getEntries().getFirst().snapshots().size() == numSnapshots);
There was a problem hiding this comment.
(I mean, safeAwait on the listener that addTemporaryStateListener returns)
There was a problem hiding this comment.
Slightly subtle: you have to add the listener first, and handle the empty case too:
diff --git a/server/src/internalClusterTest/java/org/elasticsearch/snapshots/ConcurrentSnapshotsIT.java b/server/src/internalClusterTest/java/org/elasticsearch/snapshots/ConcurrentSnapshotsIT.java
index d28b29e6d5e0..e1d9c0966189 100644
--- a/server/src/internalClusterTest/java/org/elasticsearch/snapshots/ConcurrentSnapshotsIT.java
+++ b/server/src/internalClusterTest/java/org/elasticsearch/snapshots/ConcurrentSnapshotsIT.java
@@ -213,6 +213,12 @@ public class ConcurrentSnapshotsIT extends AbstractSnapshotIntegTestCase {
dataNode
);
+ final var deletesEnqueuedListener = ClusterServiceUtils.addTemporaryStateListener(cs -> {
+ final var entries = SnapshotDeletionsInProgress.get(cs).getEntries();
+ assertThat(entries, anyOf(hasSize(0), hasSize(1)));
+ return entries.isEmpty() == false && entries.getFirst().snapshots().size() == numSnapshots;
+ });
+
final Collection<ListenableFuture<AcknowledgedResponse>> deleteFutures = new ArrayList<>();
while (snapshotNames.isEmpty() == false) {
final Collection<String> toDelete = randomSubsetOf(snapshotNames);
@@ -224,6 +230,7 @@ public class ConcurrentSnapshotsIT extends AbstractSnapshotIntegTestCase {
clusterAdmin().prepareDeleteSnapshot(TEST_REQUEST_TIMEOUT, repoName, toDelete.toArray(Strings.EMPTY_ARRAY)).execute(future);
deleteFutures.add(future);
}
+ safeAwait(deletesEnqueuedListener);
assertThat(createSlowFuture.isDone(), is(false));
There was a problem hiding this comment.
Thanks, yeah good point! TIL org.elasticsearch.test.ClusterServiceUtils#addTemporaryStateListener
Pushed 6875de8
I think the test fails because some cluster state tasks that delete snapshots might actually be scheduled after the snapshot that holds up all deletions executes (i.e. after it is unblocked). This creates multiple (2) snapshot deletion batches. This fix is ensuring that snapshot deletions are all batched up before unblocking the snapshot that's holding them up. Fixes elastic#144034
I think the test fails because some cluster state tasks that delete snapshots might actually be scheduled after the snapshot that holds up all deletions executes (i.e. after it is unblocked). This creates multiple (2) snapshot deletion batches.
This fix is ensuring that snapshot deletions are all batched up before unblocking the snapshot that's holding them up.
Fixes #144034