Skip to content

bugfix(kstatus): do not wait forever on failed resources#31730

Merged
scottrigby merged 1 commit intohelm:mainfrom
matheuscscp:fix-kstatus-failed
Jan 19, 2026
Merged

bugfix(kstatus): do not wait forever on failed resources#31730
scottrigby merged 1 commit intohelm:mainfrom
matheuscscp:fix-kstatus-failed

Conversation

@matheuscscp
Copy link
Contributor

What this PR does / why we need it:

Fixes: #31729

Special notes for your reviewer:

In kstatus Failed is a terminal state. There's no point in making Helm wait on it forever, as it will never transition to Current without intervention.

If applicable:

  • this PR contains user facing changes (the docs needed label should be applied if so)
  • this PR contains unit tests
  • this PR has been tested for backwards compatibility

@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 14, 2026
@matheuscscp
Copy link
Contributor Author

I have run the entire test suite of Flux's helm-controller with this fix and it passed 👍

https://github.com/fluxcd/helm-controller/actions/runs/21015460312/job/60419627508

Copy link
Contributor

@AustinAbro321 AustinAbro321 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement. Thanks

@scottrigby scottrigby added bug Categorizes issue or PR as related to a bug. dont-backport labels Jan 15, 2026
Copy link
Member

@gjenkins8 gjenkins8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, nothing major

Signed-off-by: Matheus Pimenta <matheuscscp@gmail.com>
@matheuscscp
Copy link
Contributor Author

@gjenkins8 All fixed, pls take another look 🙏

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an issue where Helm would wait indefinitely for resources in a Failed state. Since Failed is a terminal state in kstatus that won't transition to Current without manual intervention, the wait logic has been updated to exit early when resources fail.

Changes:

  • Modified statusObserver to exclude failed resources from blocking the wait when waiting for resources to become current
  • Refactored error handling in wait functions to always report non-ready resources rather than only on context timeout
  • Updated error message format to include namespace and be more consistent across different wait functions
  • Added comprehensive test coverage for failed resource scenarios

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
pkg/kube/statuswait.go Added logic to treat Failed status as terminal and exclude from waiting; refactored error reporting to be more comprehensive
pkg/kube/statuswait_test.go Added jobFailedManifest constant; created TestStatusWaitWithFailedResources with multiple test cases; updated all tests to use string-based error matching instead of error comparison; fixed trailing whitespace
pkg/kube/client_test.go Added createManifest helper function; refactored test setup to use fake dynamic clients; modernized map type from interface{} to any

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@TerryHowe TerryHowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

@banjoh banjoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@scottrigby scottrigby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good catch. I'm a little surprised no one noticed this before now, since it's been here since v4.0.0.

@scottrigby scottrigby added this to the 4.1.1 milestone Jan 19, 2026
@scottrigby scottrigby merged commit 6e38e30 into helm:main Jan 19, 2026
11 checks passed
@matheuscscp matheuscscp deleted the fix-kstatus-failed branch January 19, 2026 21:28
@Noksa
Copy link

Noksa commented Jan 19, 2026

@scottrigby can we have it in upcoming 4.1.0?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Categorizes issue or PR as related to a bug. dont-backport size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression from Helm 3: Hooks wait for the full timeout in case of failed resources

9 participants