
kubelet periodic node status update should not be aligned #124202

@mengqiy


What happened?

The default nodeStatusUpdateFrequency is 10s with 4% jitter.
The default nodeStatusReportFrequency is 5m.

The node status is updated either when there is a change or when at least nodeStatusReportFrequency has elapsed since the previous update:

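```go
// changed is true if the freshly computed node status differs from originalNode.
node, changed := kl.updateNode(ctx, originalNode)
// Patch if something changed, or if nodeStatusReportFrequency has elapsed since the last report.
shouldPatchNodeStatus := changed || kl.clock.Since(kl.lastStatusReportTime) >= kl.nodeStatusReportFrequency
```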

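The jitter on nodeStatusUpdateFrequency does not effectively spread the periodic node status updates over the 5-minute window. It only perturbs each 10s sync tick by at most 0.4s (4%). Meanwhile, the periodic report fires on the first tick after nodeStatusReportFrequency has elapsed, so nodes whose report timers start at about the same time stay close together.
After a couple of days, the node status update requests from thousands of nodes are still sent at almost the same time. This causes apiserver CPU usage to spike every 5 minutes.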

The screenshot below shows kube-apiserver CPU utilization on 96-CPU instances. The cluster has 5k nodes that were added within a short period of time, and it is idle, with no application workload. Utilization swings between 14% and 64% purely because of these node status update requests.
[Screenshot 2024-04-05 at 12:59:16 PM: apiserver CPU utilization]
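For scale, here is a back-of-envelope request rate for the 5k-node cluster. It assumes each idle node sends one periodic status update per 5-minute nodeStatusReportFrequency. Spread evenly, that is about 16.7 requests/s. If all of them landed within the same minute instead, the burst rate would be five times higher.

$$
\frac{5000\ \text{requests}}{300\ \text{s}} \approx 16.7\ \text{requests/s}
\qquad \text{vs.} \qquad
\frac{5000\ \text{requests}}{60\ \text{s}} \approx 83.3\ \text{requests/s}
$$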

What did you expect to happen?

I'd expect node status update requests from kubelets to be spread evenly over time. That way, the apiserver would not have to handle requests from most nodes within a single minute every 5 minutes.

How can we reproduce it (as minimally and precisely as possible)?

Bring up a cluster and add a large number of nodes (5k in the case above) within a short period of time.

Anything else we need to know?

Ref: https://github.com/kubernetes/kubernetes/pull/105272/files


Kubernetes version

Cutting edge 1.30

Cloud provider

EKS


Metadata

- Assignees: No one assigned
- Labels: kind/feature, needs-triage, sig/node, sig/scalability
- Type: No type
- Projects: No projects
- Milestone: No milestone
- Relationships: None yet
- Development: No branches or pull requests