fix: use TriggerError when all ScaledJob triggers fail#7205

Merged
wozniakjan merged 4 commits into kedacore:main from rickbrouwer:jobs
Nov 21, 2025
@rickbrouwer rickbrouwer commented Oct 28, 2025

This PR fixes an inconsistency in error handling between ScaledJob and ScaledObject.

When a ScaledJob has trigger errors, it always sets the ReadyCondition to status Unknown with reason PartialTriggerError and the message "Some triggers defined in ScaledJob are not working correctly".

This happens even when all triggers fail (e.g., a ScaledJob with only 1 trigger that is misconfigured).

In contrast, ScaledObject correctly distinguishes between PartialTriggerError and TriggerError.

This PR aligns ScaledJob's error handling with ScaledObject's by adding a nested check on isActive within the isError block. It sets TriggerError with status False when all triggers fail (isError=true and isActive=false) and keeps PartialTriggerError with status Unknown when only some triggers fail (isError=true and isActive=true).
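
As a rough illustration of the nested check described above, here is a minimal, self-contained Go sketch. It is not the code from scale_jobs.go: the `readyCondition` type, the `readyConditionFor` function, and the "Ready" reason used for the healthy case are hypothetical names chosen for this example. Only the two error reasons and their statuses come from the description.

```go
package main

import "fmt"

// readyCondition is a hypothetical, simplified stand-in for the Ready
// condition recorded on a ScaledJob. It is not KEDA's actual type.
type readyCondition struct {
	Status string // "True", "False" or "Unknown"
	Reason string
}

// readyConditionFor maps the two flags named in the PR description to a
// Ready condition. The isActive check is nested inside the isError branch.
func readyConditionFor(isError, isActive bool) readyCondition {
	if isError {
		if isActive {
			// At least one trigger still works: only some triggers failed.
			return readyCondition{Status: "Unknown", Reason: "PartialTriggerError"}
		}
		// No trigger is active: every trigger failed.
		return readyCondition{Status: "False", Reason: "TriggerError"}
	}
	// "Ready" is a placeholder reason for this sketch, not taken from the PR.
	return readyCondition{Status: "True", Reason: "Ready"}
}

func main() {
	cases := []struct{ isError, isActive bool }{
		{isError: true, isActive: false},
		{isError: true, isActive: true},
		{isError: false, isActive: true},
	}
	for _, c := range cases {
		cond := readyConditionFor(c.isError, c.isActive)
		fmt.Printf("isError=%t isActive=%t -> status=%s reason=%s\n",
			c.isError, c.isActive, cond.Status, cond.Reason)
	}
}
```

With a single misconfigured trigger, isError is true and isActive is false, so the sketch reports TriggerError rather than PartialTriggerError.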

@rickbrouwer rickbrouwer requested a review from a team as a code owner October 28, 2025 14:31
@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>

rickbrouwer commented Oct 28, 2025

/run-e2e internals
Update: You can check the progress here

Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
@keda-automation keda-automation requested review from a team October 31, 2025 10:56

rickbrouwer commented Oct 31, 2025

/run-e2e internals
Update: You can check the progress here

@rickbrouwer rickbrouwer added the ok-to-merge This PR can be merged label Nov 5, 2025
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>

snyk-io bot commented Nov 18, 2025

Snyk checks have passed. No issues have been found so far.

Scanner: Open Source Security. Critical: 0, High: 0, Medium: 0, Low: 0, Total: 0 issues.


rickbrouwer commented Nov 18, 2025

/run-e2e internals
Update: You can check the progress here

Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>

rickbrouwer commented Nov 19, 2025

/run-e2e internals
Update: You can check the progress here

@wozniakjan wozniakjan merged commit 691e380 into kedacore:main Nov 21, 2025
28 checks passed
@rickbrouwer rickbrouwer deleted the jobs branch November 21, 2025 11:33

Copilot AI left a comment

Pull Request Overview

This PR fixes an inconsistency in error handling for ScaledJob resources to align with ScaledObject behavior. Previously, ScaledJob would always report PartialTriggerError with status Unknown when any triggers failed, even when all triggers failed. This PR correctly distinguishes between partial and total trigger failures by checking the isActive flag within the error handling logic.

Key Changes:

  • Added nested condition check in scale_jobs.go to differentiate between partial and total trigger failures
  • Added comprehensive E2E tests covering all three condition scenarios
  • Updated CHANGELOG.md to document the fix

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File and description:

  • pkg/scaling/executor/scale_jobs.go: Adds nested isActive check to set TriggerError (status False) when all triggers fail, and PartialTriggerError (status Unknown) when only some fail
  • tests/internals/scaled_job_conditions/scaled_job_conditions_test.go: Adds comprehensive E2E tests covering ReadyCondition True, False, and Unknown scenarios with corresponding ActiveCondition states
  • CHANGELOG.md: Documents the bug fix and corrects formatting inconsistency for Dynamodb Scaler entry

@JorTurFer JorTurFer mentioned this pull request Dec 7, 2025
31 tasks
JorTurFer pushed a commit to JorTurFer/keda that referenced this pull request Dec 8, 2025
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
JorTurFer pushed a commit to JorTurFer/keda that referenced this pull request Dec 8, 2025
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
JorTurFer pushed a commit to JorTurFer/keda that referenced this pull request Dec 8, 2025
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>
JorTurFer added a commit that referenced this pull request Dec 8, 2025
* fix: Correct parse error ActiveMQ (#7245)

Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: metricUnavailableValue parameter not working in Datadog scaler (#7241)

* fix: metricUnavailableValue parameter not working in Datadog scaler

The UseFiller flag was not being set correctly when metricUnavailableValue
was configured. This fix distinguishes between 'not configured' and
'explicitly set to 0' by checking TriggerMetadata directly.

Changes:
- Set UseFiller in validateAPIMetadata() when metricUnavailableValue exists
- Set UseFiller in validateClusterAgentMetadata() when metricUnavailableValue exists
- Remove UseFiller logic from Validate() (responsibility moved to validate functions)
- Update tests to verify UseFiller behavior with various values including 0

This allows users to explicitly set metricUnavailableValue to 0 and have
it work as a fallback value, while still erroring when not configured.

Fixes #7238

Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>

* test: cover both API and ClusterAgent modes in UseFiller test

Updated TestDatadogMetadataValidateUseFiller to test both validateAPIMetadata()
and validateClusterAgentMetadata() code paths. This ensures that the UseFiller
flag is correctly set in both integration modes.

Test cases now cover:
- API mode: 5 test cases (not configured, 0, positive, negative, decimal)
- Cluster Agent mode: 5 test cases (same variations)

Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>

* refactor: use pointer type for FillValue to avoid TriggerMetadata access

Changed FillValue from float64 to *float64 to distinguish between
'not configured' (nil) and 'explicitly set to any value including 0'.

This addresses reviewer feedback about avoiding direct TriggerMetadata
access and improves type safety and refactoring resistance.

Changes:
- FillValue type changed from float64 to *float64 with optional tag
- validateAPIMetadata checks nil instead of TriggerMetadata map
- validateClusterAgentMetadata checks nil instead of TriggerMetadata map
- Dereference FillValue when returning fallback value (2 locations)
- Update tests to handle pointer type with proper nil checks
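
The nil-versus-zero distinction described above can be sketched in a few lines of Go. This is an illustration only, not the Datadog scaler's code: the `metadata` struct, `newMetadata`, and `fallback` are hypothetical names, and only the idea of a `*float64` FillValue and a UseFiller flag comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// metadata is a hypothetical, trimmed-down struct for this sketch.
// A nil FillValue means "not configured"; a non-nil pointer means
// "explicitly set", even when the value it points to is 0.
type metadata struct {
	FillValue *float64
	UseFiller bool
}

// newMetadata derives UseFiller from whether a fill value was provided.
func newMetadata(fillValue *float64) metadata {
	return metadata{FillValue: fillValue, UseFiller: fillValue != nil}
}

// fallback returns the configured fill value, or an error when none is set.
func fallback(m metadata) (float64, error) {
	if !m.UseFiller {
		return 0, errors.New("metric unavailable and no fill value configured")
	}
	return *m.FillValue, nil
}

func main() {
	zero := 0.0
	v, err := fallback(newMetadata(&zero))
	fmt.Println("explicit zero:", v, err)

	_, err = fallback(newMetadata(nil))
	fmt.Println("not configured:", err)
}
```

A plain float64 field could not tell these two cases apart, because its zero value is also 0.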

Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>

---------

Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* Fix ScaledObject pause behavior when HPA doesn't exist (#7233)

When a ScaledObject has the paused annotation set before the HPA is
created, the controller would fall through and create the HPA, ignoring
the pause annotation.

The fix writes the paused status to etcd immediately before stopping
the scale loop or deleting the HPA. This prevents race conditions where
concurrent reconciles triggered by HPA deletion would not see the paused
status and perform redundant operations.

The key insight is to establish the paused state in etcd BEFORE any
operations that trigger new reconciles, ensuring subsequent reconciles
see the paused status and exit early.

This solution follows the approach suggested by @rickbrouwer.

Fixes #7231

Signed-off-by: nusmql <nusmql@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: use TriggerError when all ScaledJob triggers fail (#7205)

Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* Fix transfer-hpa-ownership panic when hpa name not provided (#7260)

* chore: renormalize line endings

Signed-off-by: James Williams <jamesleighwilliams@gmail.com>

* fix: nil pointer when transfer-hpa-ownership is true but hpa name not specified (#7254)

Signed-off-by: James Williams <jamesleighwilliams@gmail.com>

* update changelog

Signed-off-by: James Williams <jamesleighwilliams@gmail.com>

* revert vendor changes

Signed-off-by: James Williams <jamesleighwilliams@gmail.com>

---------

Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: restore HPA behavior when paused-scale-in/out annotation is deleted (#7291)

When paused-scale-in or paused-scale-out annotation is deleted (not set
to "false") and the corresponding selectPolicy (scaleDown.selectPolicy
or scaleUp.selectPolicy) is not explicitly set in the ScaledObject spec,
the HPA's SelectPolicy remains stuck at "Disabled" instead of being
restored.

This occurs even if other behavior fields like policies or
stabilizationWindowSeconds are defined - only an explicit selectPolicy
value triggers the update.

Root cause: DeepDerivative treats nil as "unset" and considers it a
subset of any value, so DeepDerivative(nil, Disabled) returns true,
preventing the HPA update.

Fix: Add explicit DeepEqual check for Behavior field, following the
existing pattern used for Metrics length check.
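
To make the root cause concrete, here is a small Go sketch of the two comparison rules for a single optional field. The functions `derivativeEqual` and `strictEqual` are hand-written imitations for illustration; they are not the apimachinery helpers and assume only the behavior stated above (an unset expected value matches anything under the derivative rule).

```go
package main

import "fmt"

// derivativeEqual imitates the "derivative" rule for one optional string:
// a nil (unset) expected value is treated as matching any actual value.
func derivativeEqual(expected, actual *string) bool {
	if expected == nil {
		return true
	}
	return actual != nil && *expected == *actual
}

// strictEqual imitates a deep-equality rule: nil only equals nil.
func strictEqual(expected, actual *string) bool {
	if expected == nil || actual == nil {
		return expected == actual
	}
	return *expected == *actual
}

func main() {
	disabled := "Disabled"
	var desired *string // selectPolicy not set in the spec
	current := &disabled

	// The derivative rule sees no difference, so no update would be issued.
	fmt.Println("derivative rule says equal:", derivativeEqual(desired, current))
	// The strict rule detects the difference, so the stale value gets reset.
	fmt.Println("strict rule says equal:", strictEqual(desired, current))
}
```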

test: add e2e test for paused-scale-in annotation removal

Signed-off-by: Dima Shevchuk <dshedimon@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* refactor: remove unused scaledObjectMetricSpecs variable (#7292)

* refactor: remove unused scaledObjectMetricSpecs variable

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>

* update CHANGELOG.md

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>

---------

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* fix: handle requestScaleLoop error in ScaledObject controller (#7273)

* fix: handle requestScaleLoop error in ScaledObject controller

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>

* chore: update CHANGELOG for PR #7273

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>

---------

Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Co-authored-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* bump actions and go version (#7295)

* bump actions and go version

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* bump deps

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* update pkgs

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* update tools

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* .

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* fix test

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* fix lint

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* update setup-go to use go.mod version

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* add nolint to exclude pulsar issues

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* fix devenv

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* fix codeql

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* fix splunk test

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* include job in links

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* update to ubuntu-slim some runners

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>

* Update apis/keda/v1alpha1/scaledobject_webhook_test.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>

* Update .github/workflows/scorecards.yml

Co-authored-by: Jan Wozniak <wozniak.jan@gmail.com>
Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>

---------

Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jan Wozniak <wozniak.jan@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

* update changelog

Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>

---------

Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
Signed-off-by: Jorge Turrado <jorge.turrado@mail.schwarz>
Signed-off-by: Hiroki Matsui <fenethtool@gmail.com>
Signed-off-by: nusmql <nusmql@gmail.com>
Signed-off-by: James Williams <jamesleighwilliams@gmail.com>
Signed-off-by: Dima Shevchuk <dshedimon@gmail.com>
Signed-off-by: u-kai <76635578+u-kai@users.noreply.github.com>
Signed-off-by: Jorge Turrado Ferrero <Jorge_turrado@hotmail.es>
Signed-off-by: Jorge Turrado <jorge_turrado@hotmail.es>
Co-authored-by: Rick Brouwer <rickbrouwer@gmail.com>
Co-authored-by: Matchan <fenethtool@gmail.com>
Co-authored-by: nusmql <nusmql@gmail.com>
Co-authored-by: James Williams <jamesleighwilliams@gmail.com>
Co-authored-by: Dima Shevchuk <dshedimon@gmail.com>
Co-authored-by: Kai Udo <76635578+u-kai@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jan Wozniak <wozniak.jan@gmail.com>
alt-dima pushed a commit to alt-dima/keda that referenced this pull request Dec 13, 2025
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
Signed-off-by: Dmitriy Altuhov <altuhovd@gmail.com>
tangobango5 pushed a commit to tangobango5/keda that referenced this pull request Dec 22, 2025
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>
tangobango5 pushed a commit to tangobango5/keda that referenced this pull request Feb 13, 2026
Signed-off-by: Rick Brouwer <rickbrouwer@gmail.com>