Bugfix: Relax updatedAt validation for WLM workload group creation#20486
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
c31dbd5 to
6a0ce64
Compare
|
❌ Gradle check result for 6a0ce64: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
jainankitk
left a comment
There was a problem hiding this comment.
Thanks @hyanggeun for addressing this issue. I was initially thinking of adding some buffer to currentSeconds which should handle any clock skew issue across nodes. But I do feel that removing the currentSeconds logic altogether should also be fine. @kaushalmahi12 - What do you think?
|
❌ Gradle check result for 6a0ce64: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
I think it should be fine to remove this check since it will be very rare for user to supply this value since it is generated on the node which recieves the create/update request. |
kaushalmahi12
left a comment
There was a problem hiding this comment.
it would be good to also add UT around it.
|
@kaushalmahi12 I don't want to block this PR or anything, but I think we need a more standard way of tracking changes to config indices like the security index, ISM, alerting and WLM. There is technically a solution with the security auditlog, but we've been discussing with @nagarajg17 something separate for the audit log that would be geared towards operators and cluster admins who specifically want to keep track of config-type changes. Essentially, a more abstracted feature than opensearch-project/security#5407 and opensearch-project/security#5093 (comment) that any plugin can leverage out of the box. |
@hyanggeun - While you are at it, can you also add changelog entry and resolve precommit failures? |
|
@jainankitk @kaushalmahi12 |
7c47160 to
cf8385c
Compare
|
❌ Gradle check result for cf8385c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
server/src/main/java/org/opensearch/cluster/metadata/WorkloadGroup.java
Outdated
Show resolved
Hide resolved
7036aee to
0515628
Compare
|
❗ AI-powered Code-Diff-Analyzer found issues on commit 0515628.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
❌ Gradle check result for 0515628: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
0515628 to
211afa5
Compare
|
❗ AI-powered Code-Diff-Analyzer found issues on commit 211afa5.
The table above displays the top 10 most important findings. Pull Requests Author(s): Please update your Pull Request according to the report above. Repository Maintainer(s): You can Thanks. |
|
❌ Gradle check result for 211afa5: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 1936752: null Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
1936752 to
2f5a02f
Compare
|
❕ Gradle check result for 2f5a02f: UNSTABLE Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #20486 +/- ##
============================================
- Coverage 73.37% 73.25% -0.12%
+ Complexity 72197 72124 -73
============================================
Files 5798 5798
Lines 329816 329816
Branches 47538 47538
============================================
- Hits 242001 241612 -389
- Misses 68458 68884 +426
+ Partials 19357 19320 -37 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…pensearch-project#20486) Signed-off-by: hyanggeun <songsogu@gmail.com>
…pensearch-project#20486) Signed-off-by: hyanggeun <songsogu@gmail.com>
Description
Summary
This change fixes intermittent failures when creating WLM workload groups on data nodes. The current validation rejects updatedAtInMillis if it is slightly ahead of the cluster-manager time, which can happen with minor clock skew. I believe a strict updatedAt upper-bound check is unnecessary for creation, so this patch only verifies that the value is a valid epoch (>= 0).
Changes
Remove the “not in the future” check from WorkloadGroup.isValid(...)
Keep only the lower-bound epoch validation
Why
Requests routed to data nodes can fail with WorkloadGroup.updatedAtInMillis is not a valid epoch due to small time differences across nodes. Since updatedAt is not critical for creation integrity, strict validation doesn’t seem necessary.
Related Issues
Fixes #20485
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.