kubelet periodic node status update should not be aligned #124202
Closed
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
sig/scalability: Categorizes an issue or PR as relevant to SIG Scalability.
What happened?
The default `nodeStatusUpdateFrequency` is 10s with 4% jitter. The default `nodeStatusReportFrequency` is 5m. The node status is updated either when there is a change or when more than `nodeStatusReportFrequency` has passed since the previous update.
kubernetes/pkg/kubelet/kubelet_node_status.go
Lines 580 to 581 in 55b83c9
The jitter on `nodeStatusUpdateFrequency` does not effectively spread the periodic node status updates evenly over the 5-minute window. The node status update requests from thousands of nodes are still sent at almost the same time after a couple of days, which can cause the CPU usage to spike every 5 minutes.
The following is the API server CPU utilization on 96-CPU instances. The cluster has 5k nodes that were added in a short period of time, and it is idle, with no application workload. Utilization swings between 14% and 64% solely because of these node status update requests.

What did you expect to happen?
I'd expect the node status update requests from kubelets to be spread evenly, so that the apiserver does not have one minute in every 5-minute window in which it must handle the requests from most nodes.
How can we reproduce it (as minimally and precisely as possible)?
Bring up a cluster and add nodes in a short period of time.
Anything else we need to know?
Ref: https://github.com/kubernetes/kubernetes/pull/105272/files
Kubernetes version
Cutting edge 1.30
Cloud provider
EKS
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)