Fix a flaky test for #20107#20237
Fix a flaky test for #20107#20237liuguoqingfz wants to merge 1 commit intoopensearch-project:mainfrom
Conversation
WalkthroughA single test file is modified to stabilize flaky behavior in Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…e shows an entry in SnapshotDeletionsInProgress for repoName change the formatting Signed-off-by: Joe Liu <guoqing4@illinois.edu>
23be59a to
fcdd76d
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
server/src/internalClusterTest/java/org/opensearch/snapshots/ConcurrentSnapshotsIT.java (2)
205-218: Good fix for the race condition, but consider using existing imports.The stronger ordering check correctly addresses the flaky test by ensuring
SnapshotDeletionsInProgresscontains the repository before attempting the update. However, the fully qualified names are unnecessary sinceClusterState,SnapshotDeletionsInProgress, andMatchersare already imported at the top of the file (lines 44-45, 93).// Stronger ordering: wait until cluster state shows the repository is in-use due to deletion assertBusy(() -> { - final org.opensearch.cluster.ClusterState state = client().admin().cluster().prepareState().get().getState(); - final org.opensearch.cluster.SnapshotDeletionsInProgress deletions = state.custom( - org.opensearch.cluster.SnapshotDeletionsInProgress.TYPE - ); + final ClusterState state = client().admin().cluster().prepareState().get().getState(); + final SnapshotDeletionsInProgress deletions = state.custom(SnapshotDeletionsInProgress.TYPE); - assertThat("SnapshotDeletionsInProgress must be present once delete starts", deletions, org.hamcrest.Matchers.notNullValue()); + assertThat("SnapshotDeletionsInProgress must be present once delete starts", deletions, notNullValue()); assertThat( - deletions.getEntries().stream().map(org.opensearch.cluster.SnapshotDeletionsInProgress.Entry::repository).toList(), + deletions.getEntries().stream().map(SnapshotDeletionsInProgress.Entry::repository).toList(), hasItem(equalTo(repoName)) ); });You'll need to add the
notNullValuestatic import:import static org.hamcrest.Matchers.notNullValue;
224-228: Correct handling of wrapped exceptions.The approach of catching a generic
Exceptionand unwrapping the root cause is appropriate since the transport layer can wrap the real exception. TheinstanceOfmatcher is already imported at line 93, so the FQN can be simplified.// Transport can wrap the real exception; assert on the unwrapped root cause final Exception ex = assertThrows(Exception.class, () -> updateRepository(repoName, "mock", newSettings)); final Throwable cause = org.opensearch.ExceptionsHelper.unwrapCause(ex); - assertThat(cause, org.hamcrest.Matchers.instanceOf(IllegalStateException.class)); + assertThat(cause, instanceOf(IllegalStateException.class)); assertEquals("trying to modify or unregister repository that is currently used", cause.getMessage());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/src/internalClusterTest/java/org/opensearch/snapshots/ConcurrentSnapshotsIT.java(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: gradle-check
- GitHub Check: Analyze (java)
- GitHub Check: assemble (21, ubuntu-24.04-arm)
- GitHub Check: assemble (25, ubuntu-latest)
- GitHub Check: assemble (25, ubuntu-24.04-arm)
- GitHub Check: assemble (21, ubuntu-latest)
- GitHub Check: assemble (21, windows-latest)
- GitHub Check: assemble (25, windows-latest)
- GitHub Check: detect-breaking-change
|
❌ Gradle check result for fcdd76d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for fcdd76d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for fcdd76d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for fcdd76d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Can you merge the PR, the gradle check keep failing because some other tests failing |
|
❌ Gradle check result for fcdd76d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
This PR is stalled because it has been open for 30 days with no activity. |
sandeshkr419
left a comment
There was a problem hiding this comment.
Looks good, a minor nitpick - @liuguoqingfz Can you please address that.
Post that once CI is green, I'll merge it.
|
|
||
| // Stronger ordering: wait until cluster state shows the repository is in-use due to deletion | ||
| assertBusy(() -> { | ||
| final org.opensearch.cluster.ClusterState state = client().admin().cluster().prepareState().get().getState(); |
There was a problem hiding this comment.
nit: can you please import the classes ClusterState, Matchers, ExceptionsHelper instead of full qualified name. Although a test, but that should make it more readable.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #20237 +/- ##
============================================
+ Coverage 73.20% 73.25% +0.04%
- Complexity 71766 71790 +24
============================================
Files 5795 5795
Lines 328302 328303 +1
Branches 47283 47283
============================================
+ Hits 240345 240488 +143
+ Misses 68628 68561 -67
+ Partials 19329 19254 -75 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Description
Fix a flaky test for a race: seeing the task "cluster:admin/snapshot/delete" in the task list does not guarantee the delete has progressed far enough to mark the repository “in use” in cluster state. If you call updateRepository(...) before SnapshotDeletionsInProgress is populated, the update can legitimately succeed, and then nothing is thrown.
Related Issues
Resolves #20107
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.