add randomness to nodeStatusReportFrequency for kubelet#128394

Merged
k8s-ci-robot merged 1 commit into
kubernetes:masterfrom
mengqiy:spreadkubeletlaod
Nov 6, 2024

Conversation

@mengqiy

@mengqiy mengqiy commented Oct 28, 2024

Member

Adds a one-time random offset in the range [-50%, +50%] to nodeStatusReportFrequency after the initial node status update.
This helps spread the node status update load from kubelets evenly over time.

What type of PR is this?

/kind feature

What this PR does / why we need it:

Node status update traffic from kubelets can become almost synchronized in some scenarios, causing high CPU spikes (e.g. #124202).

Which issue(s) this PR fixes:

Fixes #124202

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Original release note (removed since this was rolled back):

Add a one-time random duration of up to 50% of kubelet's nodeStatusReportFrequency to help spread the node status update load evenly over time.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 28, 2024
@k8s-ci-robot

Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Oct 28, 2024
@mengqiy mengqiy marked this pull request as draft October 28, 2024 19:29
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 28, 2024
@mengqiy mengqiy changed the title add randomness to nodeStatusReportFrequency [DO-NOT-MERGE] add randomness to nodeStatusReportFrequency Oct 28, 2024
@mengqiy mengqiy marked this pull request as ready for review October 28, 2024 19:30
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 28, 2024
@mengqiy mengqiy changed the title [DO-NOT-MERGE] add randomness to nodeStatusReportFrequency add randomness to nodeStatusReportFrequency Oct 28, 2024
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 28, 2024
@dims

dims commented Oct 28, 2024

Member

/assign @mrunalp @SergeyKanzhelev
/milestone v1.32

@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Oct 28, 2024
@mengqiy mengqiy changed the title add randomness to nodeStatusReportFrequency add randomness to nodeStatusReportFrequency for kubelet Oct 28, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 5, 2024
@SergeyKanzhelev

Member

unit test failure is new and seems to be introduced by this PR

@SergeyKanzhelev

Member

/remove-approve

to prevent accidental merge

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2024
@aojea

aojea commented Nov 6, 2024

Member

> unit test failure is new and seems to be introduced by this PR

can be related? I think something related to sleep hooks merged ... the code here does not seem to interact with the pod lifecycle, or does it?

@SergeyKanzhelev

Member

> > unit test failure is new and seems to be introduced by this PR
>
> can be related? I think something related to sleep hooks merged ... the code here does not seem to interact with the pod lifecycle, or does it?

yes, my mistake

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Nov 6, 2024
@mengqiy

mengqiy commented Nov 6, 2024

Member Author

@aojea @SergeyKanzhelev Updated and PTAL

@aojea

aojea commented Nov 6, 2024

Member

/lgtm
/hold cancel

Thank you very much

@SergeyKanzhelev for final approval

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 6, 2024
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 6, 2024
@k8s-ci-robot

Contributor

LGTM label has been added.

Details

Git tree hash: ffe10bf30545a14465fcf5e18d864016a1779ad5

@aojea

aojea commented Nov 6, 2024

Member

> unit test failure is new and seems to be introduced by this PR

@SergeyKanzhelev on an additional note, it seems something got merged that is impacting this test: https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-e2e-kind-alpha-beta-features/1853869116201373696

@SergeyKanzhelev

Member

/approve

@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mengqiy, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 6, 2024
@k8s-ci-robot k8s-ci-robot merged commit 198ec57 into kubernetes:master Nov 6, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Nov 6, 2024
@mengqiy mengqiy deleted the spreadkubeletlaod branch November 6, 2024 20:19
@liggitt

liggitt commented Nov 6, 2024

Member

this made TestUpdateNodeStatusWithLease very flaky

15s: 68 runs so far, 41 failures (60.29%), 10 active

rollback in #128629

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Nov 6, 2024
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

kubelet periodic node status update should not be aligned

7 participants