Use node addresses from informer in kubelet certificate manager#130362
Conversation
lentzi90
left a comment
There was a problem hiding this comment.
This is such a nice improvement! Thanks!
LGTM from my side
37e7c91 to
2b18dca
Compare
| // wait for the cloud provider to initialize the node addresses | ||
| for _, taint := range node.Spec.Taints { | ||
| if taint.Key == cloudproviderapi.TaintExternalCloudProvider { | ||
| return nil |
There was a problem hiding this comment.
So kubelet will no longer request a cert until the taint is removed? I think that makes sense, given the external cloud provider "owns" the address list, but this will lengthen the period when kubelet is ready/accepting pods without having any serving cert. Can you add a log line here for visibility?
Somewhat related to the discussion in: #73047
There was a problem hiding this comment.
yeah, related to this #125813
The node addresses are updated just after the taint is removed when the node is initialized by the cloud controller manager
Bear in mind that the CSR requires to have at least one IP , that mean there should be one IP in node.Status.Addresses
kubernetes/pkg/kubelet/certificate/kubelet.go
Lines 50 to 51 in 8cca6d9
But I'm not sure of what path we want to follow:
- no external cloud provider, everything is the same
- external cloud provider
- no --node-ip specified, effectively the behavior should be the same, as there will be no IPs on node.status.addresses
- node-ip specified, this is the tricky part per kubelet: cloud-provider external addresses #121028
- ip matches cloud-provider addresses, only one CSR is sent and we'll regress on node startup
- ip partially matches, two CSR are sent, but it may regress or not
- ip does not match, two CSR are sent but only the last one will be the one matching the actual addresses.
We have been so many times back and forth with node.status.address that now I think we should remove this check, and keep the existing behavior, if someone is specifies the --node-ip with the flag, then it may want to send the CSR with that IP ...
2b18dca to
269cf8b
Compare
|
Pushed last version based on discussion with @cartermckinnon in #130362 (comment), this keeps the same behavior, just removes the dependency on the node status loops by using the local informer. Based on the last year experience better the devil you know than the devil you don't |
| return nil, err | ||
| } | ||
| kl.lastStatusReportTime = kl.clock.Now() | ||
| kl.setLastObservedNodeAddresses(updatedNode.Status.Addresses) |
There was a problem hiding this comment.
@cartermckinnon this will also improve the latency on cloud providers, since it will not have to wait for the patch of the kubelet,
|
/triage accepted |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
5 similar comments
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
|
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
|
/lgtm cancel To stop CI spam. |
|
/retest Code freeze is over, still want to get this one in @aojea ? |
|
Might need a rebase to fix the tests? |
The kubelet certificate manager was using a closure to get the node addresses, but this closure depended on a static field that was only updated during the node status update. This created a twisted dependency between the node.status reconcile loops and the certificate manager. This commit fixes this issue by using the node informer to get the node addresses directly. This ensures that the kubelet always requests a certificate with the latest node addresses.
174a9c8 to
1214dc2
Compare
it shouldn't but it will retrigger the jobs, @cartermckinnon can you lgtm again? |
|
/retest
the window unit test are panicing and look very unstable https://prow.k8s.io/job-history/gs/kubernetes-ci-logs/pr-logs/directory/pull-kubernetes-unit-windows-master , they should not be informing in current state since create confusion |
|
@aojea: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
someone can lgtm again, the windows job does not seem related and is also not very stable |
|
/skip |
|
/lgtm |
|
LGTM label has been added. DetailsGit tree hash: 32408a9c0c81fb45bf0a57f10d517c88e77a6521 |
The kubelet certificate manager was using a closure to get the node addresses, but this closure depended on a static field that was only updated during the node status update. This created a twisted dependency between the node.status reconcile loops and the certificate manager.
This commit fixes this issue by using the node informer to get the node addresses directly. This ensures that the kubelet always requests a certificate with the latest node addresses.
/kind bug
/kind cleanup
Fixes #130001
/sig node
/sig network