
kubelet: improve CRI stats for resource metrics and testing#135604

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from dims:better-support-for-cri-stats on Dec 18, 2025

@dims

@dims dims commented Dec 5, 2025

Member

Properly support the resource metrics endpoint when PodAndContainerStatsFromCRI is enabled, and fix the related e2e tests.

Stats Provider:

  • add container-level CPU and memory stats to ListPodCPUAndMemoryStats so the resource metrics endpoint has complete data
  • add aggregatePodSwapStats to compute pod-level swap from container stats (CRI doesn't provide pod-level swap directly)
  • add missing memory stats fields: AvailableBytes, PageFaults, and MajorPageFaults
  • add platform-specific implementations for Linux and Windows
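The first and third Stats Provider items boil down to copying per-container CPU and memory values out of the CRI response and attaching them to the right pod. The sketch below illustrates that idea under stated assumptions; it is not the PR's code. `CRIContainerStats`, `ContainerStats`, `PodStats`, `groupByPod`, and the `PodSandboxID` field carried directly on each container sample are hypothetical, flattened stand-ins for the real `runtimeapi` and `statsapi` types. The CRI wraps optional numbers in a `UInt64Value` message; the local `UInt64Value` struct mirrors that so a missing field stays nil instead of silently becoming zero.

```go
package main

import (
	"fmt"
	"sort"
)

// UInt64Value mirrors the CRI optional-value wrapper (hypothetical local copy).
// A nil pointer means the runtime did not report the value.
type UInt64Value struct{ Value uint64 }

// CRIContainerStats is a hypothetical, flattened view of one container's CRI stats.
type CRIContainerStats struct {
	Name              string
	PodSandboxID      string // assumption: sandbox ID already resolved per container
	CPUUsageNanoCores *UInt64Value
	WorkingSetBytes   *UInt64Value
	AvailableBytes    *UInt64Value
	PageFaults        *UInt64Value
	MajorPageFaults   *UInt64Value
}

// MemoryStats, ContainerStats, and PodStats are hypothetical summary-side types.
type MemoryStats struct {
	WorkingSetBytes *uint64
	AvailableBytes  *uint64
	PageFaults      *uint64
	MajorPageFaults *uint64
}

type ContainerStats struct {
	Name           string
	UsageNanoCores *uint64
	Memory         MemoryStats
}

type PodStats struct {
	SandboxID  string
	Containers []ContainerStats
}

// value unwraps an optional CRI value, keeping "not reported" as nil.
func value(v *UInt64Value) *uint64 {
	if v == nil {
		return nil
	}
	x := v.Value
	return &x
}

// groupByPod converts each container sample and appends it to its pod,
// preserving input order within a pod.
func groupByPod(stats []CRIContainerStats) []PodStats {
	byPod := map[string]*PodStats{}
	for _, s := range stats {
		p, ok := byPod[s.PodSandboxID]
		if !ok {
			p = &PodStats{SandboxID: s.PodSandboxID}
			byPod[s.PodSandboxID] = p
		}
		p.Containers = append(p.Containers, ContainerStats{
			Name:           s.Name,
			UsageNanoCores: value(s.CPUUsageNanoCores),
			Memory: MemoryStats{
				WorkingSetBytes: value(s.WorkingSetBytes),
				AvailableBytes:  value(s.AvailableBytes),
				PageFaults:      value(s.PageFaults),
				MajorPageFaults: value(s.MajorPageFaults),
			},
		})
	}
	out := make([]PodStats, 0, len(byPod))
	for _, p := range byPod {
		out = append(out, *p)
	}
	// Map iteration order is random; sort by sandbox ID for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].SandboxID < out[j].SandboxID })
	return out
}

func main() {
	pods := groupByPod([]CRIContainerStats{
		{Name: "app", PodSandboxID: "pod-a", CPUUsageNanoCores: &UInt64Value{1000}, WorkingSetBytes: &UInt64Value{4096}, PageFaults: &UInt64Value{7}},
		{Name: "sidecar", PodSandboxID: "pod-a", WorkingSetBytes: &UInt64Value{1024}},
		{Name: "web", PodSandboxID: "pod-b", AvailableBytes: &UInt64Value{2048}},
	})
	for _, p := range pods {
		fmt.Println(p.SandboxID, len(p.Containers))
	}
}
```

Running it prints `pod-a 2` followed by `pod-b 1`. Keeping the fields as pointers lets a consumer such as the resource metrics endpoint skip a metric that the runtime did not report rather than exporting a misleading zero.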

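Because the CRI reports swap only per container, a pod-level swap figure has to be derived from the container values. The sketch below assumes the pod value is simply the sum of its containers' swap usage; `ContainerSwap` and `sumContainerSwap` are hypothetical names that illustrate the idea behind `aggregatePodSwapStats`, not its actual implementation.

```go
package main

import "fmt"

// ContainerSwap is a hypothetical per-container swap sample.
// A nil SwapUsageBytes means the runtime did not report swap for it.
type ContainerSwap struct {
	Name           string
	SwapUsageBytes *uint64
}

// sumContainerSwap approximates pod-level swap usage as the sum of the
// reported container values. It returns nil when no container reported
// swap, so "unknown" is not mistaken for "zero".
func sumContainerSwap(containers []ContainerSwap) *uint64 {
	var total uint64
	found := false
	for _, c := range containers {
		if c.SwapUsageBytes == nil {
			continue
		}
		total += *c.SwapUsageBytes
		found = true
	}
	if !found {
		return nil
	}
	return &total
}

// u64 returns a pointer to v, for building samples inline.
func u64(v uint64) *uint64 { return &v }

func main() {
	pod := []ContainerSwap{
		{Name: "app", SwapUsageBytes: u64(4096)},
		{Name: "sidecar", SwapUsageBytes: u64(1024)},
		{Name: "no-swap-reported"},
	}
	if total := sumContainerSwap(pod); total != nil {
		fmt.Println("pod swap bytes:", *total) // pod swap bytes: 5120
	}
}
```

Returning nil rather than 0 when nothing was reported is a design choice of this sketch: it keeps a pod with no swap data distinguishable from a pod that genuinely uses no swap.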
Tests:

  • skip cAdvisor metrics test when PodAndContainerStatsFromCRI is enabled (cAdvisor metrics aren't available in that mode)
  • fix expected metrics in ResourceMetricsAPI test
  • node_swap_usage_bytes is only available with cAdvisor (still to be verified)
  • add dumpResourceMetricsForPods helper to log the actual metric values when a test fails, which makes debugging easier

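A helper like `dumpResourceMetricsForPods` mostly needs to pick the samples for the pods under test out of the Prometheus text returned by the resource metrics endpoint. A minimal sketch, assuming the payload has already been fetched as a string and that the relevant samples carry a `pod` label; `filterMetricsForPods` is a hypothetical name, the sample payload is made up, and `fmt.Println` stands in for the e2e framework's logger.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// filterMetricsForPods returns the sample lines of a Prometheus text-format
// payload that carry a pod="<name>" label for one of the given pods.
// Comment lines (# HELP / # TYPE) and blank lines are skipped. This is
// substring matching on label boundaries, not a full exposition-format parser.
func filterMetricsForPods(payload string, pods []string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(payload))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		for _, p := range pods {
			want := `pod="` + p + `"`
			// Require the label to start right after '{' or ',' so that a
			// label such as xpod="web" does not match.
			if strings.Contains(line, "{"+want) || strings.Contains(line, ","+want) {
				out = append(out, line)
				break
			}
		}
	}
	// Scanner errors (e.g. a line longer than 64 KiB) are ignored in this sketch.
	return out
}

func main() {
	// Illustrative payload; names follow the resource metrics style, values are made up.
	payload := `# HELP pod_memory_working_set_bytes illustrative help text
# TYPE pod_memory_working_set_bytes gauge
pod_memory_working_set_bytes{namespace="default",pod="web"} 1.2e+07
pod_memory_working_set_bytes{namespace="default",pod="db"} 3.4e+07
`
	for _, line := range filterMetricsForPods(payload, []string{"web"}) {
		fmt.Println(line)
	}
}
```

Logging only the matching lines keeps a failure message short while still showing the values the test actually saw.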
What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the release-note-none label Dec 5, 2025
@k8s-ci-robot

Contributor

Please note that we're already in Test Freeze for the release-1.35 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.35.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Fri Dec 5 03:34:55 UTC 2025.

@k8s-ci-robot k8s-ci-robot added the size/L, kind/cleanup, do-not-merge/needs-sig, needs-triage, and needs-priority labels Dec 5, 2025
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes label Dec 5, 2025
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved, area/kubelet, area/test, sig/node, and sig/testing labels Dec 5, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig label Dec 5, 2025
@dims

dims commented Dec 5, 2025

Member Author

xref: containerd/containerd#12629

@dims

dims commented Dec 5, 2025

Member Author

@dims dims changed the title from "kubelet: improve CRI stats for resource metrics and testing" to "[WIP] kubelet: improve CRI stats for resource metrics and testing" Dec 5, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Dec 5, 2025
@dims dims force-pushed the better-support-for-cri-stats branch from 42f4108 to e68d29f December 5, 2025 12:33
@bart0sh bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Dec 5, 2025
@dims dims changed the title from "[WIP] kubelet: improve CRI stats for resource metrics and testing" to "kubelet: improve CRI stats for resource metrics and testing" Dec 6, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Dec 6, 2025
@dims

dims commented Dec 9, 2025

Member Author

/assign @SergeyKanzhelev @mrunalp

Comment thread pkg/kubelet/stats/cri_stats_provider.go
@dims

dims commented Dec 17, 2025

Member Author

/assign @haircommander @mrunalp

Properly support the resource metrics endpoint when `PodAndContainerStatsFromCRI` is enabled, and fix the related e2e tests.

Stats Provider:
- add container-level CPU and memory stats to `ListPodCPUAndMemoryStats` so the resource metrics endpoint has complete data
- add `aggregatePodSwapStats` to compute pod-level swap from container stats (CRI doesn't provide pod-level swap directly)
- add missing memory stats fields: `AvailableBytes`, `PageFaults`, and `MajorPageFaults`
- add platform-specific implementations for Linux and Windows

Tests:
- skip cAdvisor metrics test when `PodAndContainerStatsFromCRI` is enabled (cAdvisor metrics aren't available in that mode)
- fix expected metrics in `ResourceMetricsAPI` test
- `node_swap_usage_bytes` is only available with cAdvisor (still to be verified)
- add `dumpResourceMetricsForPods` helper to log the actual metric values when a test fails, which makes debugging easier

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
@dims dims force-pushed the better-support-for-cri-stats branch from e68d29f to 914ddf4 December 17, 2025 15:52
@haircommander

Contributor

Eventually we could consider extending the CRI so that the CRI implementation aggregates swap at the pod level, if that's a value we want to rely on.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 17, 2025
@k8s-ci-robot

Contributor

LGTM label has been added.

Git tree hash: 4b2915099db1e3c7b24a07bab557b86c9d35937a

@haircommander

Contributor

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added the triage/accepted and priority/important-soon labels and removed the needs-triage and needs-priority labels Dec 17, 2025
@haircommander haircommander moved this from Triage to PRs - Needs Approver in SIG Node CI/Test Board Dec 17, 2025
@haircommander haircommander moved this from PRs - Needs Approver to Archive-it in SIG Node CI/Test Board Dec 17, 2025
@haircommander haircommander moved this from Work in progress to Needs Approver in SIG Node: code and documentation PRs Dec 17, 2025
@k8s-triage-robot


Retesting failed PR that otherwise appears ready for merge.

Please help us fix flaky tests by following our Flaky Tests Guide.

Prevent this bot from retesting with /lgtm cancel or /hold.
For this robot's configuration, see here.

/retest-required

@k8s-ci-robot k8s-ci-robot merged commit 246fa57 into kubernetes:master Dec 18, 2025
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.36 milestone Dec 18, 2025
@github-project-automation github-project-automation Bot moved this from Needs Approver to Done in SIG Node: code and documentation PRs Dec 18, 2025
@github-project-automation github-project-automation Bot moved this from Archive-it to Done in SIG Node CI/Test Board Dec 18, 2025

Labels

  • approved (indicates a PR has been approved by an approver from all required OWNERS files)
  • area/kubelet
  • area/test
  • cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
  • kind/cleanup (categorizes issue or PR as related to cleaning up code, process, or technical debt)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged)
  • priority/important-soon (must be staffed and worked on either currently, or very soon, ideally in time for the next release)
  • release-note-none (denotes a PR that doesn't merit a release note)
  • sig/node (categorizes an issue or PR as relevant to SIG Node)
  • sig/testing (categorizes an issue or PR as relevant to SIG Testing)
  • size/L (denotes a PR that changes 100-499 lines, ignoring generated files)
  • triage/accepted (indicates an issue or PR is ready to be actively worked on)


7 participants