[Snapshot Interop][Bug Fix] Ensure lock file gets deleted if snapshot …#12016
[Snapshot Interop][Bug Fix] Ensure lock file gets deleted if snapshot …#12016harishbhakuni wants to merge 1 commit intoopensearch-project:mainfrom
Conversation
Compatibility status:Checks if related components are compatible with change 197ccfd Incompatible componentsIncompatible components: [https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git] Skipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/k-nn.git] |
|
❌ Gradle check result for b8ec898: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Test org.opensearch.search.sort.FieldSortIT.testSimpleSorts {p0={"search.concurrent_segment_search.enabled":"true"}} is failing which is flaky and being tracked as part of #11875 . |
|
This PR is stalled because it has been open for 30 days with no activity. |
197ccfd to
4cdd33e
Compare
|
❌ Gradle check result for 4cdd33e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
4cdd33e to
203e404
Compare
|
❌ Gradle check result for 203e404: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
203e404 to
184132e
Compare
|
❌ Gradle check result for 184132e: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
184132e to
1ebbc06
Compare
|
❌ Gradle check result for 1ebbc06: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…hot shard md upload fails during snapshot creation. Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
1ebbc06 to
57799ac
Compare
|
❌ Gradle check result for 57799ac: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
"that was causing an issue where if shard md upload to snapshot repository fails, it will not release the lock file from S3." - How is that happening ? The above block is doing exactly that |
not really, the problem was that caller method |
| + snapshot.getSnapshotId() | ||
| + " is " | ||
| + (endTime - startTime) | ||
| GatedCloseable<IndexCommit> finalWrappedSnapshot = wrappedSnapshot; |
There was a problem hiding this comment.
we should throw IndexShardSnapshotFailedException along with listener.onFailure(e);
| return new String[0]; | ||
| } | ||
| // filtering lock files from lock directory contents. | ||
| // this is a good to have check, there is no known prod scenarios where this can happen |
There was a problem hiding this comment.
nit : lets not use prod word . Also a user can create extra files in the lock folder as well
|
@harishbhakuni Is this still an issue? Can we update or close this PR? |
...shard md upload fails during snapshot creation.
Description
While working on another change, realized that in the following code, everything below snapshotRemoteStoreIndexShard call was not getting executed:
OpenSearch/server/src/main/java/org/opensearch/snapshots/SnapshotShardsService.java
Lines 427 to 462 in 829215c
that was causing an issue where if shard md upload to snapshot repository fails, it will not release the lock file from S3.
as part of this PR, making sure those lines of code get executed even if shard md upload fails and adding IT to cover that scenario.
Also, Adding a log line to print the time taken in flush operation during snapshot creation and another minor change to filter lock files based on lock suffix while fetching lock files from lock directory.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.