Check Node serving certificate in Node readiness check#73047
Conversation
|
/assign @liggitt |
There was a problem hiding this comment.
this only affects readiness if the kubelet has serving cert rotation enabled, right? are there other readiness checks that can go bad after startup like this?
There was a problem hiding this comment.
this only affects readiness if the kubelet has serving cert rotation enabled, right?
Yes, if static serving cert is used this check will never trigger. Ideally we could also watch static cert's expiration. Is it useful for this PR?
are there other readiness checks that can go bad after startup like this?
IIUC, yes. runtimeErrorsFunc and networkErrorsFunc rely on periodic health checks of CRI and CNI. If those start failing, node status flips.
There was a problem hiding this comment.
Just a doubt that how we are going to handle if k1.serverCertificateManager = nil.? Thanks in advance
There was a problem hiding this comment.
When k1.serverCertificateManager == nil, serving certificate rotation is not enabled.
Node status will not be tied to serving certificate status. So (assuming all other checks here pass) Node will come up Ready.
|
/test pull-kubernetes-local-e2e-containerized |
|
Ping @liggitt |
|
/retest I think this makes sense, but had a few questions:
also, the help around the rotate server certificates flag and config options needs to be updated |
This only triggers when Node serving certificate bootstrap is enabled. Without successful bootstrap, Node can't serve exec/logs calls from master. This PR will mark Node NotReady until certificate is provisioned or when it expires.
They shouldn't. In fact, static pods should run even if kubelet fails to bootstrap client credentials.
Yes, after
Done |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: awly If they are not already assigned, you can assign the PR to them by writing The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@awly: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
thinking through this some more... all the other readiness checks (to my knowledge) are oriented around the ability of a node to run a workload (e.g. can the network be set up for it, is the cri ready, are storage drivers ready, etc). This is adding a condition solely oriented around whether calls from the apiserver could be made to the pod, once it was running. Is the ability of the node to accept exec/attach/log calls an inherent part of running a pod? If this is the first of this category of readiness check to be made on a node, would like an ack from @kubernetes/sig-node-pr-reviews that this is coherent with intended use of node readiness |
|
This came from conformance tests ( kubernetes/test/e2e/kubectl/kubectl.go Line 367 in 9f673c8 This seems to indicate that working exec/logs/attach are currently required to mark a cluster "conformant". Whether or not this is a true requirement or just overzealous tests - I don't know. |
|
Ping @liggitt @kubernetes/sig-node-pr-reviews |
|
Just because a node is "conformant" doesn't mean we want the node to be marked not ready because the cert is being rotated. I'm kind of uncomfortable making nodes not ready to run workloads just because some workloads that need logs can't run. Others potentially could run without logs. I'm not opposed to a condition being exposed, but I think it's a bigger step to make it part of ready. Changing node readiness has a lot of implications as Jordan noted - while this may feel correct, have we thought how it would impact people who have already deployed rotation in production? Do we have any user feedback on it? |
|
@awly: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
Abandoning this PR. If this gets problematic enough, sounds like some retries in the test would be sufficient. |
This only triggers when Node serving certificate bootstrap is enabled.
Without successful bootstrap, Node can't serve exec/logs calls from
master.
This PR will mark Node NotReady until certificate is provisioned or when
it expires.
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?: