fix: use TriggerError when all ScaledJob triggers fail #7205
wozniakjan merged 4 commits into kedacore:main
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
/run-e2e internals
Pull Request Overview
This PR fixes an inconsistency in error handling for ScaledJob resources to align with ScaledObject behavior. Previously, ScaledJob would always report PartialTriggerError with status Unknown when any triggers failed, even when all triggers failed. This PR correctly distinguishes between partial and total trigger failures by checking the isActive flag within the error handling logic.
Key Changes:
- Added nested condition check in `scale_jobs.go` to differentiate between partial and total trigger failures
- Added comprehensive E2E tests covering all three condition scenarios
- Updated CHANGELOG.md to document the fix
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/scaling/executor/scale_jobs.go | Adds nested isActive check to set TriggerError (status False) when all triggers fail, and PartialTriggerError (status Unknown) when only some fail |
| tests/internals/scaled_job_conditions/scaled_job_conditions_test.go | Adds comprehensive E2E tests covering ReadyCondition True, False, and Unknown scenarios with corresponding ActiveCondition states |
| CHANGELOG.md | Documents the bug fix and corrects formatting inconsistency for Dynamodb Scaler entry |
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com> Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>
* fix: Correct parse error ActiveMQ (#7245)
  Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: metricUnavailableValue parameter not working in Datadog scaler (#7241)
  * fix: metricUnavailableValue parameter not working in Datadog scaler
    The UseFiller flag was not being set correctly when metricUnavailableValue was configured. This fix distinguishes between 'not configured' and 'explicitly set to 0' by checking TriggerMetadata directly.
    Changes:
    - Set UseFiller in validateAPIMetadata() when metricUnavailableValue exists
    - Set UseFiller in validateClusterAgentMetadata() when metricUnavailableValue exists
    - Remove UseFiller logic from Validate() (responsibility moved to validate functions)
    - Update tests to verify UseFiller behavior with various values including 0
    This allows users to explicitly set metricUnavailableValue to 0 and have it work as a fallback value, while still erroring when not configured.
    Fixes #7238
    Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>
  * test: cover both API and ClusterAgent modes in UseFiller test
    Updated TestDatadogMetadataValidateUseFiller to test both validateAPIMetadata() and validateClusterAgentMetadata() code paths. This ensures that the UseFiller flag is correctly set in both integration modes.
    Test cases now cover:
    - API mode: 5 test cases (not configured, 0, positive, negative, decimal)
    - Cluster Agent mode: 5 test cases (same variations)
    Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>
  * refactor: use pointer type for FillValue to avoid TriggerMetadata access
    Changed FillValue from float64 to *float64 to distinguish between 'not configured' (nil) and 'explicitly set to any value including 0'. This addresses reviewer feedback about avoiding direct TriggerMetadata access and improves type safety and refactoring resistance.
    Changes:
    - FillValue type changed from float64 to *float64 with optional tag
    - validateAPIMetadata checks nil instead of TriggerMetadata map
    - validateClusterAgentMetadata checks nil instead of TriggerMetadata map
    - Dereference FillValue when returning fallback value (2 locations)
    - Update tests to handle pointer type with proper nil checks
    Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>
  ---------
  Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* Fix ScaledObject pause behavior when HPA doesn't exist (#7233)
  When a ScaledObject has the paused annotation set before the HPA is created, the controller would fall through and create the HPA, ignoring the pause annotation.
  The fix writes the paused status to etcd immediately before stopping the scale loop or deleting the HPA. This prevents race conditions where concurrent reconciles triggered by HPA deletion would not see the paused status and perform redundant operations.
  The key insight is to establish the paused state in etcd BEFORE any operations that trigger new reconciles, ensuring subsequent reconciles see the paused status and exit early.
  This solution follows the approach suggested by @rickbrouwer.
  Fixes #7231
  Signed-off-by: nusmql <nusmql@gmail.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: use TriggerError when all ScaledJob triggers fail (#7205)
  Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* Fix transfer-hpa-ownership panic when hpa name not provided (#7260)
  * chore: renormalize line endings
    Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
  * fix: nil pointer when transfer-hpa-ownership is true but hpa name not specified (#7254)
    Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
  * update changelog
    Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
  * revert vendor changes
    Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
  ---------
  Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: restore HPA behavior when paused-scale-in/out annotation is deleted (#7291)
  When paused-scale-in or paused-scale-out annotation is deleted (not set to "false") and the corresponding selectPolicy (scaleDown.selectPolicy or scaleUp.selectPolicy) is not explicitly set in the ScaledObject spec, the HPA's SelectPolicy remains stuck at "Disabled" instead of being restored. This occurs even if other behavior fields like policies or stabilizationWindowSeconds are defined - only an explicit selectPolicy value triggers the update.
  Root cause: DeepDerivative treats nil as "unset" and considers it a subset of any value, so DeepDerivative(nil, Disabled) returns true, preventing the HPA update.
  Fix: Add explicit DeepEqual check for Behavior field, following the existing pattern used for Metrics length check.
  test: add e2e test for paused-scale-in annotation removal
  Signed-off-by: Dima Shevchuk <dshedimon@gmail.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* refactor: remove unused scaledObjectMetricSpecs variable (#7292)
  * refactor: remove unused scaledObjectMetricSpecs variable
    Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
  * update CHANGELOG.md
    Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
  ---------
  Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: handle requestScaleLoop error in ScaledObject controller (#7273)
  * fix: handle requestScaleLoop error in ScaledObject controller
    Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
  * chore: update CHANGELOG for PR #7273
    Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
  ---------
  Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
  Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
  Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* bump actions and go version (#7295)
  * bump actions and go version
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * bump deps
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * update pkgs
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * update tools
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * .
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * fix test
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * fix lint
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * update setup-go to use go.mod version
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * add nolint to exclude pulsar issues
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * fix devenv
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * fix codeql
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * fix splunk test
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * include job in links
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * update to ubuntu-slim some runners
    Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  * Update apis/keda/v1alpha1/scaledobject_webhook_test.go
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
    Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
  * Update .github/workflows/scorecards.yml
    Co-authored-by: Jan Wozniak <wozniak.jan@gmail.com>
    Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
  ---------
  Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
  Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Co-authored-by: Jan Wozniak <wozniak.jan@gmail.com>
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* update changelog
  Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

---------
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>
Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>
Signed-off-by: nusmql <nusmql@gmail.com>
Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
Signed-off-by: Dima Shevchuk <dshedimon@gmail.com>
Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Co-authored-by: Rick Brouwer <rickbrouwer@gmail.com>
Co-authored-by: Matchan <fenethtool@gmail.com>
Co-authored-by: nusmql <nusmql@gmail.com>
Co-authored-by: James Williams <jamesleighwilliams@gmail.com>
Co-authored-by: Dima Shevchuk <dshedimon@gmail.com>
Co-authored-by: Kai Udo <76635578+u-kai@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jan Wozniak <wozniak.jan@gmail.com>
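The Datadog refactor in the squashed commit above (#7241) changes FillValue from float64 to *float64 so that "not configured" (nil) can be told apart from "explicitly set to 0". A minimal Go sketch of that idea follows. Only the names FillValue and UseFiller come from the commit message; the `metadata` struct and the `validate` and `fallback` methods are hypothetical and do not reproduce the scaler's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// metadata is a hypothetical, trimmed-down config struct. Only the field
// names FillValue and UseFiller appear in the commit message.
type metadata struct {
	FillValue *float64 // nil means metricUnavailableValue was not configured
	UseFiller bool
}

// validate enables the filler whenever a value was supplied, including 0.
func (m *metadata) validate() {
	m.UseFiller = m.FillValue != nil
}

// fallback returns the configured fill value, or an error when none was set.
func (m *metadata) fallback() (float64, error) {
	if !m.UseFiller {
		return 0, errors.New("metric unavailable and no fill value configured")
	}
	return *m.FillValue, nil
}

func main() {
	zero := 0.0
	configured := &metadata{FillValue: &zero} // explicitly set to 0
	configured.validate()

	unset := &metadata{} // not configured at all
	unset.validate()

	fmt.Println(configured.UseFiller, unset.UseFiller)
}
```

With a plain float64 both cases would hold the zero value and be indistinguishable, which is the situation the commit message describes.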
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com> Signed-off-by: Dmitriy Altuhov <altuhovd@gmail.com>
This PR fixes an inconsistency in error handling between ScaledJob and ScaledObject.

When a ScaledJob has trigger errors, it always sets the ReadyCondition to status `Unknown` with reason `PartialTriggerError` and the message `Some triggers defined in ScaledJob are not working correctly`. This happens even when all triggers fail (e.g., a ScaledJob with only 1 trigger that is misconfigured).

In contrast, ScaledObject correctly distinguishes between `PartialTriggerError` and `TriggerError`.

This PR therefore aligns ScaledJob's error handling with ScaledObject by adding a nested check within the `isError` block to evaluate `isActive`. It sets `TriggerError` with status `False` when all triggers fail (`isError=true` and `isActive=false`) and keeps `PartialTriggerError` with status `Unknown` when only some triggers fail (`isError=true` and `isActive=true`).
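The nested check described in this PR can be sketched in Go as below. This is an illustration under assumptions, not the code in `scale_jobs.go`: the `condition` type, the `readyCondition` function, and the `"Ready"` reason used for the healthy case are hypothetical. Only the status and reason values for the two error cases come from the description.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for the Ready condition
// on a ScaledJob; it is not KEDA's actual type.
type condition struct {
	Status string // "True", "False", or "Unknown"
	Reason string
}

// readyCondition nests the isActive check inside the isError branch,
// following the decision described in the PR.
func readyCondition(isError, isActive bool) condition {
	if isError {
		if isActive {
			// At least one trigger still works: partial failure.
			return condition{Status: "Unknown", Reason: "PartialTriggerError"}
		}
		// No trigger works: total failure.
		return condition{Status: "False", Reason: "TriggerError"}
	}
	// Placeholder reason for the healthy case (an assumption of this sketch).
	return condition{Status: "True", Reason: "Ready"}
}

func main() {
	fmt.Println(readyCondition(true, true))
	fmt.Println(readyCondition(true, false))
	fmt.Println(readyCondition(false, true))
}
```

A ScaledJob with a single misconfigured trigger corresponds to `readyCondition(true, false)`, which yields `TriggerError` with status `False` where it previously reported `PartialTriggerError`.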