
kubelet: add failure threshold info to probe failure events #135879

Open
sonikaarora wants to merge 1 commit into kubernetes:master from sonikaarora:fix-probe-failure-event-115823


@sonikaarora
Contributor

@sonikaarora sonikaarora commented Dec 22, 2025

What this PR does / why we need it:

This PR addresses issue #115823 by enhancing existing probe failure events with
threshold context. When a probe fails, the event message now includes:

  • The current failure count
  • The threshold before action is taken
  • A clear indication if the failure is being ignored

Example event messages:

  • "Liveness probe failed (1/3, will be ignored): ..." ← No action taken yet
  • "Liveness probe failed (3/3): ..." ← Action will be taken (container restart)
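The message format above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `formatProbeFailure` and its parameters are hypothetical names, and the sketch assumes that a failure is "ignored" whenever the consecutive failure count is still below the configured FailureThreshold.

```go
package main

import "fmt"

// formatProbeFailure is a hypothetical helper (not taken from the PR diff).
// It builds the event message from the probe type, the current consecutive
// failure count, the configured failure threshold, and the probe output.
// Assumption: a failure is ignored while failures < threshold.
func formatProbeFailure(probeType string, failures, threshold int, output string) string {
	if failures < threshold {
		return fmt.Sprintf("%s probe failed (%d/%d, will be ignored): %s",
			probeType, failures, threshold, output)
	}
	return fmt.Sprintf("%s probe failed (%d/%d): %s",
		probeType, failures, threshold, output)
}

func main() {
	// The probe output string here is only sample text.
	fmt.Println(formatProbeFailure("Liveness", 1, 3, "connection refused"))
	fmt.Println(formatProbeFailure("Liveness", 3, 3, "connection refused"))
}
```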

Why this matters: Before this change, users consuming probe failure events had no
way to know if a failure would result in action or be ignored due to FailureThreshold.
This could lead to confusion about whether containers were actually being restarted.

Approach: This PR modifies the existing event message format rather than creating
new event types, making it a minimal, backward-compatible change that directly addresses
the issue's request to "give an indication in container events."

Fixes #115823

What type of PR is this?

/kind bug
/kind cleanup

Which issue(s) this PR is related to:

Fixes #115823

Special notes for your reviewer:

  • Backward compatible: The original probe() method still works unchanged
  • Only the event message format is modified; no behavioral changes
  • Added unit tests for the new functionality

Does this PR introduce a user-facing change?

Yes - probe failure events now include threshold information to indicate if the failure is being ignored.

Added failure count and threshold to probe failure events (e.g., "Liveness probe failed (1/3, will be ignored)") to help users understand if a probe failure will be acted upon or ignored due to FailureThreshold.
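For tooling that consumes these events, the new message could be parsed roughly as below. This is a sketch under assumptions: the regular expression is inferred only from the two example messages in this description, and `parseProbeFailure` and `probeFailure` are hypothetical names, not part of the kubelet or client-go. Messages in the older format without a count (for example "Liveness probe failed: ...") do not match and are reported as unparsed.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// probeFailureRe matches the message shape shown in the examples above.
// Assumption: the format is "<Type> probe failed (<n>/<m>[, will be ignored]): <output>".
// (?s) lets the trailing output span multiple lines.
var probeFailureRe = regexp.MustCompile(
	`(?s)^(\w+) probe failed \((\d+)/(\d+)(, will be ignored)?\): (.*)$`)

// probeFailure is a hypothetical struct holding the parsed fields.
type probeFailure struct {
	ProbeType string
	Failures  int
	Threshold int
	Ignored   bool
	Output    string
}

// parseProbeFailure returns the parsed fields and true on success,
// or a zero value and false if the message does not match.
func parseProbeFailure(msg string) (probeFailure, bool) {
	m := probeFailureRe.FindStringSubmatch(msg)
	if m == nil {
		return probeFailure{}, false
	}
	failures, err1 := strconv.Atoi(m[2])
	threshold, err2 := strconv.Atoi(m[3])
	if err1 != nil || err2 != nil {
		return probeFailure{}, false
	}
	return probeFailure{
		ProbeType: m[1],
		Failures:  failures,
		Threshold: threshold,
		Ignored:   m[4] != "",
		Output:    m[5],
	}, true
}

func main() {
	pf, ok := parseProbeFailure("Liveness probe failed (1/3, will be ignored): timeout")
	fmt.Printf("%+v %v\n", pf, ok)
}
```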

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 22, 2025
@linux-foundation-easycla

linux-foundation-easycla bot commented Dec 22, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: sonikaarora / name: Sonika Arora (1c7426d)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Dec 22, 2025
@k8s-ci-robot
Contributor

Welcome @sonikaarora!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 22, 2025
@k8s-ci-robot
Contributor

Hi @sonikaarora. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Dec 22, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Dec 22, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sonikaarora
Once this PR has been reviewed and has the lgtm label, please assign random-liu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sonikaarora sonikaarora force-pushed the fix-probe-failure-event-115823 branch from 3de4109 to cc05909 Compare December 22, 2025 01:32
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 22, 2025
@sonikaarora
Contributor Author

/sig node
/kind cleanup
/area kubelet

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Dec 22, 2025
@sonikaarora
Contributor Author

/retitle kubelet: add failure threshold info to probe failure events

@k8s-ci-robot
Contributor

@sonikaarora: Re-titling can only be requested by trusted users, like repository collaborators.

Details

In response to this:

/retitle kubelet: add failure threshold info to probe failure events


@sonikaarora sonikaarora changed the title kubelet: add failure threshold info to probe failure events kubelet: add failure threshold info to probe failure events Dec 22, 2025
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 22, 2025
@sonikaarora sonikaarora force-pushed the fix-probe-failure-event-115823 branch 2 times, most recently from a1cff97 to a761084 Compare December 22, 2025 01:59
When a probe fails, the event message now includes the failure count
and threshold (e.g., 'Liveness probe failed (1/3, will be ignored): ...').
This helps users understand whether a failure will trigger action or
be ignored due to FailureThreshold not being reached yet.
@sonikaarora
Contributor Author

/cc @Random-Liu

@sonikaarora
Contributor Author

cc @ardaguclu @Random-Liu Could you please help review this PR?

@sonikaarora
Contributor Author

@ardaguclu Could you please help review this PR?

@sonikaarora
Contributor Author

Hi @ardaguclu,

Hope you're doing well! I wanted to follow up on this PR that addresses #115823.

This is a focused change that enhances probe failure events with threshold information, helping users understand whether failures will trigger action. The approach is minimal (modifies existing event messages only) and maintains full backward compatibility.

I understand you're busy, but I'd really appreciate your feedback when you have a chance. Happy to make any adjustments based on your review!

Thank you! 🙏


Labels

area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Give an indication in container events for probe failure as to whether the failure was ignored due to FailureThreshold

2 participants