Skip to content

Bugfix: Relax updatedAt validation for WLM workload group creation#20486

Merged
jainankitk merged 5 commits intoopensearch-project:mainfrom
hyanggeun:bugfix/fix-to-check-only-exists-when-create-wlm
Feb 11, 2026
Merged

Bugfix: Relax updatedAt validation for WLM workload group creation#20486
jainankitk merged 5 commits intoopensearch-project:mainfrom
hyanggeun:bugfix/fix-to-check-only-exists-when-create-wlm

Conversation

@hyanggeun
Copy link
Copy Markdown
Contributor

Description

Summary

This change fixes intermittent failures when creating WLM workload groups on data nodes. The current validation rejects updatedAtInMillis if it is slightly ahead of the cluster-manager time, which can happen with minor clock skew. I believe a strict updatedAt upper-bound check is unnecessary for creation, so this patch only verifies that the value is a valid epoch (>= 0).

Changes

Remove the “not in the future” check from WorkloadGroup.isValid(...)
Keep only the lower-bound epoch validation

Why

Requests routed to data nodes can fail with WorkloadGroup.updatedAtInMillis is not a valid epoch due to small time differences across nodes. Since updatedAt is not critical for creation integrity, strict validation doesn’t seem necessary.

Related Issues

Fixes #20485

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@hyanggeun hyanggeun requested a review from a team as a code owner January 27, 2026 02:52
@github-actions github-actions bot added bug Something isn't working Plugins labels Jan 27, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 27, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The isValid method in WorkloadGroup.java is modified to remove the upper bound validation that constrains updatedAt to not exceed the current time. The method now only validates that updatedAt is at or after the epoch start, eliminating validation failures caused by minor clock skew between nodes.

Changes

Cohort / File(s) Summary
WorkloadGroup validation logic
server/src/main/java/org/opensearch/cluster/metadata/WorkloadGroup.java
Removed upper bound check that validated updatedAt <= Instant.now(). The method now only enforces the lower bound requirement that updatedAt >= minValidTimestamp (epoch start).

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Bugfix: Relax updatedAt validation for WLM workload group creation' accurately summarizes the main change: removing the strict future-time validation check while retaining epoch validation.
Description check ✅ Passed The description provides clear details about the bug, the fix, and rationale. It follows the template structure with a summary, changes, and 'why' explanation, plus a link to the related issue #20485.
Linked Issues check ✅ Passed The PR directly addresses the objectives of issue #20485 by relaxing updatedAt validation to tolerate minor clock skew, enabling workload group creation requests to succeed when routed to data nodes.
Out of Scope Changes check ✅ Passed The change is focused and in-scope: only the updatedAt validation logic in WorkloadGroup.isValid() is modified to remove the future-time check, with no extraneous modifications.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hyanggeun hyanggeun force-pushed the bugfix/fix-to-check-only-exists-when-create-wlm branch from c31dbd5 to 6a0ce64 Compare January 27, 2026 02:53
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 6a0ce64: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Copy Markdown
Contributor

@jainankitk jainankitk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hyanggeun for addressing this issue. I was initially thinking of adding some buffer to currentSeconds which should handle any clock skew issue across nodes. But I do feel that removing the currentSeconds logic altogether should also be fine. @kaushalmahi12 - What do you think?

@jainankitk jainankitk added backport 3.5 Backport to 3.5 branch skip-changelog labels Jan 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 6a0ce64: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kaushalmahi12
Copy link
Copy Markdown
Contributor

I think it should be fine to remove this check since it will be very rare for user to supply this value since it is generated on the node which recieves the create/update request.

Copy link
Copy Markdown
Contributor

@kaushalmahi12 kaushalmahi12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to also add UT around it.

@cwperks
Copy link
Copy Markdown
Member

cwperks commented Feb 2, 2026

@kaushalmahi12 I don't want to block this PR or anything, but I think we need a more standard way of tracking changes to config indices like the security index, ISM, alerting and WLM. There is technically a solution with the security auditlog, but we've been discussing with @nagarajg17 something separate for the audit log that would be geared towards operators and cluster admins who specifically want to keep track of config-type changes.

Essentially, a more abstracted feature than opensearch-project/security#5407 and opensearch-project/security#5093 (comment) that any plugin can leverage out of the box.

@jainankitk
Copy link
Copy Markdown
Contributor

it would be good to also add UT around it.

@hyanggeun - While you are at it, can you also add changelog entry and resolve precommit failures?

@hyanggeun
Copy link
Copy Markdown
Contributor Author

@jainankitk @kaushalmahi12
Thanks for the feedback. I'll add the necessary unit tests, update the changelog entry, and resolve the pre-commit issues.

@hyanggeun hyanggeun force-pushed the bugfix/fix-to-check-only-exists-when-create-wlm branch 3 times, most recently from 7c47160 to cf8385c Compare February 3, 2026 00:57
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 3, 2026

❌ Gradle check result for cf8385c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@hyanggeun hyanggeun force-pushed the bugfix/fix-to-check-only-exists-when-create-wlm branch 2 times, most recently from 7036aee to 0515628 Compare February 6, 2026 00:36
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 6, 2026

❗ AI-powered Code-Diff-Analyzer found issues on commit 0515628.

PathLineSeverityDescription
server/src/main/java/org/opensearch/cluster/metadata/WorkloadGroup.java72lowTimestamp validation weakened: removed upper bound check allowing arbitrary future timestamps. Original validation prevented future timestamps; new validation only rejects negative values. While likely unintentional (addressing clock skew), could enable timestamp manipulation if used in security-sensitive contexts.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 6, 2026

❌ Gradle check result for 0515628: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@hyanggeun hyanggeun force-pushed the bugfix/fix-to-check-only-exists-when-create-wlm branch from 0515628 to 211afa5 Compare February 6, 2026 01:03
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 6, 2026

❗ AI-powered Code-Diff-Analyzer found issues on commit 211afa5.

PathLineSeverityDescription
server/src/main/java/org/opensearch/cluster/metadata/WorkloadGroup.java72mediumTimestamp validation weakened: removed upper bound check that prevented future timestamps. Previously validated updatedAt <= Instant.now(), now only checks updatedAt > 0. This allows arbitrary future timestamps which could manipulate time-based ordering, bypass time-based controls, or enable logic bombs. While the PR claims to fix clock skew issues, the implementation is overly permissive and doesn't limit how far in the future timestamps can be set.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 6, 2026

❌ Gradle check result for 211afa5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@hyanggeun hyanggeun closed this Feb 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 1936752: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
Signed-off-by: hyanggeun <songsogu@gmail.com>
@hyanggeun hyanggeun force-pushed the bugfix/fix-to-check-only-exists-when-create-wlm branch from 1936752 to 2f5a02f Compare February 11, 2026 02:08
@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for 2f5a02f: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.25%. Comparing base (e59c30c) to head (2f5a02f).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20486      +/-   ##
============================================
- Coverage     73.37%   73.25%   -0.12%     
+ Complexity    72197    72124      -73     
============================================
  Files          5798     5798              
  Lines        329816   329816              
  Branches      47538    47538              
============================================
- Hits         242001   241612     -389     
- Misses        68458    68884     +426     
+ Partials      19357    19320      -37     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jainankitk jainankitk merged commit a31743e into opensearch-project:main Feb 11, 2026
34 checks passed
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working Plugins skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Workload group(Query Group) creation fails when request hits data node due to updated_at validation against CM clock

6 participants