kubelet: improve CRI stats for resource metrics and testing by dims · Pull Request #135604 · kubernetes/kubernetes

dims · 2025-12-05T04:00:09Z

Properly support the resource metrics endpoint when PodAndContainerStatsFromCRI is enabled, and fix the related e2e tests.

Stats Provider:

- add container-level CPU and memory stats to ListPodCPUAndMemoryStats so the resource metrics endpoint has complete data
- add aggregatePodSwapStats to compute pod-level swap from container stats (CRI doesn't provide pod-level swap directly)
- add missing memory stats fields: AvailableBytes, PageFaults, and MajorPageFaults
- add platform-specific implementations for Linux and Windows
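The pod-level swap aggregation described above can be illustrated with a minimal sketch. This is not the code from the PR: the `SwapStats` and `ContainerStats` types below are simplified stand-ins for the kubelet stats API types, and `aggregatePodSwap` and `u64` are hypothetical names. The sketch assumes that pod swap usage is the sum of the per-container values and that a pod with no swap data reports nothing rather than zero.

```go
package main

import "fmt"

// SwapStats is a simplified, hypothetical stand-in for the kubelet's
// swap stats type; only the field needed for this sketch is included.
type SwapStats struct {
	SwapUsageBytes *uint64
}

// ContainerStats is a simplified, hypothetical stand-in for
// per-container stats as returned by a CRI-backed provider.
type ContainerStats struct {
	Name string
	Swap *SwapStats
}

// aggregatePodSwap sums swap usage across all containers of a pod.
// It returns nil when no container reports swap usage, so callers can
// tell "no data" apart from "zero bytes used".
func aggregatePodSwap(containers []ContainerStats) *SwapStats {
	var total uint64
	found := false
	for _, c := range containers {
		if c.Swap == nil || c.Swap.SwapUsageBytes == nil {
			continue // this container has no swap data
		}
		total += *c.Swap.SwapUsageBytes
		found = true
	}
	if !found {
		return nil
	}
	return &SwapStats{SwapUsageBytes: &total}
}

// u64 returns a pointer to v; a small helper for building test data.
func u64(v uint64) *uint64 { return &v }

func main() {
	pod := []ContainerStats{
		{Name: "app", Swap: &SwapStats{SwapUsageBytes: u64(4096)}},
		{Name: "sidecar"}, // no swap data reported
		{Name: "init", Swap: &SwapStats{SwapUsageBytes: u64(1024)}},
	}
	if s := aggregatePodSwap(pod); s != nil {
		fmt.Println("pod swap usage bytes:", *s.SwapUsageBytes)
	}
}
```

Returning nil instead of a zero value is a design choice in this sketch; the actual `aggregatePodSwapStats` may handle missing data differently.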

Tests:

- skip cAdvisor metrics test when PodAndContainerStatsFromCRI is enabled (cAdvisor metrics aren't available in that mode)
- fix expected metrics in ResourceMetricsAPI test
- node_swap_usage_bytes is only available with cAdvisor (need to verify!)
- add dumpResourceMetricsForPods helper to log actual metric values when tests fail, making debugging easier
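As a rough illustration of the debugging helper mentioned in the last item, the sketch below formats the samples that belong to a set of pods so they can be logged when an assertion fails. It is not the helper from the PR: the `Sample` type and the `formatMetricsForPods` function are hypothetical, and the real e2e code works with the test framework's own metric types. The sketch assumes each sample carries `pod` and `container` labels.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a hypothetical, simplified metric sample: a label set and a value.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// formatMetricsForPods returns one line per sample whose "pod" label
// matches one of the given pod names. Metric names are sorted so the
// output is stable across runs.
func formatMetricsForPods(metrics map[string][]Sample, pods ...string) string {
	want := make(map[string]bool, len(pods))
	for _, p := range pods {
		want[p] = true
	}

	names := make([]string, 0, len(metrics))
	for name := range metrics {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		for _, s := range metrics[name] {
			if !want[s.Labels["pod"]] {
				continue // sample belongs to a pod we are not interested in
			}
			fmt.Fprintf(&b, "%s{pod=%q,container=%q} %g\n",
				name, s.Labels["pod"], s.Labels["container"], s.Value)
		}
	}
	return b.String()
}

func main() {
	metrics := map[string][]Sample{
		"container_memory_working_set_bytes": {
			{Labels: map[string]string{"pod": "stats-busybox-0", "container": "busybox"}, Value: 1024},
			{Labels: map[string]string{"pod": "other", "container": "c"}, Value: 2048},
		},
		"container_cpu_usage_seconds_total": {
			{Labels: map[string]string{"pod": "stats-busybox-0", "container": "busybox"}, Value: 0.5},
		},
	}
	// In a real test this string would go to the test logger on failure.
	fmt.Print(formatMetricsForPods(metrics, "stats-busybox-0"))
}
```

Logging the observed values next to the failed expectation makes it easier to see whether a metric is missing entirely or merely outside the expected range.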

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2025-12-05T04:00:12Z

Please note that we're already in Test Freeze for the release-1.35 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.35.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Fri Dec 5 03:34:55 UTC 2025.

k8s-ci-robot · 2025-12-05T04:00:30Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dims

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~pkg/kubelet/stats/OWNERS~~ [dims]
~~test/e2e_node/OWNERS~~ [dims]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

dims · 2025-12-05T04:00:58Z

xref: containerd/containerd#12629

dims · 2025-12-05T04:01:26Z

(some testing done in containerd/containerd#12620 using branch https://github.com/dims/kubernetes/tree/add-logs-to-kubelet-metrics)

dims · 2025-12-09T22:51:06Z

/assign @SergeyKanzhelev @mrunalp

dims · 2025-12-17T14:42:06Z

/assign @haircommander @mrunalp

properly support the resource metrics endpoint when `PodAndContainerStatsFromCRI` is enabled and fix the related e2e tests.

Stats Provider:
- add container-level CPU and memory stats to `ListPodCPUAndMemoryStats` so the resource metrics endpoint has complete data
- add `aggregatePodSwapStats` to compute pod-level swap from container stats (CRI doesn't provide pod-level swap directly)
- add missing memory stats fields: `AvailableBytes`, `PageFaults`, and `MajorPageFaults`
- add platform-specific implementations for Linux and Windows

Tests:
- skip cAdvisor metrics test when `PodAndContainerStatsFromCRI` is enabled (cAdvisor metrics aren't available in that mode)
- fix expected metrics in `ResourceMetricsAPI` test
- `node_swap_usage_bytes` is only available with cAdvisor (need to verify!)
- Add `dumpResourceMetricsForPods` helper to log actual metric values when tests fail, making debugging easier

Signed-off-by: Davanum Srinivas <davanum@gmail.com>

haircommander · 2025-12-17T16:17:28Z

Eventually we could consider extending CRI so that the CRI implementation aggregates swap for the pod, if that's a value we want to rely on.

/lgtm

k8s-ci-robot · 2025-12-17T16:17:35Z

LGTM label has been added.


Git tree hash: 4b2915099db1e3c7b24a07bab557b86c9d35937a

haircommander · 2025-12-17T18:27:46Z

/triage accepted
/priority important-soon

k8s-triage-robot · 2025-12-17T20:50:04Z

Retesting failed PR that otherwise appears ready for merge.

Please help us fix flaky tests by following our Flaky Tests Guide.

Prevent this bot from retesting with /lgtm cancel or /hold.
For this robot's configuration, see here.

/retest-required

k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Dec 5, 2025

k8s-ci-robot requested review from Random-Liu and mrunalp December 5, 2025 04:00

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 5, 2025

k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Dec 5, 2025

github-project-automation Bot added this to SIG Node: code and documentation PRs and SIG Node CI/Test Board Dec 5, 2025

k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Dec 5, 2025

github-project-automation Bot moved this to Triage in SIG Node CI/Test Board Dec 5, 2025

github-project-automation Bot moved this to Triage in SIG Node: code and documentation PRs Dec 5, 2025

dims mentioned this pull request Dec 5, 2025

cri: Add background stats collector to calculate UsageNanoCores containerd/containerd#12629

Merged

dims changed the title ~~kubelet: improve CRI stats for resource metrics and testing~~ [WIP] kubelet: improve CRI stats for resource metrics and testing Dec 5, 2025

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 5, 2025

dims force-pushed the better-support-for-cri-stats branch from 42f4108 to e68d29f Compare December 5, 2025 12:33

bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Dec 5, 2025

dims changed the title ~~[WIP] kubelet: improve CRI stats for resource metrics and testing~~ kubelet: improve CRI stats for resource metrics and testing Dec 6, 2025

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 6, 2025

k8s-ci-robot assigned mrunalp and SergeyKanzhelev Dec 9, 2025

haircommander reviewed Dec 10, 2025


Comment thread pkg/kubelet/stats/cri_stats_provider.go

k8s-ci-robot assigned haircommander Dec 17, 2025

dims force-pushed the better-support-for-cri-stats branch from e68d29f to 914ddf4 Compare December 17, 2025 15:52

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 17, 2025

haircommander moved this from Triage to PRs - Needs Approver in SIG Node CI/Test Board Dec 17, 2025

haircommander moved this from PRs - Needs Approver to Archive-it in SIG Node CI/Test Board Dec 17, 2025

haircommander moved this from Work in progress to Needs Approver in SIG Node: code and documentation PRs Dec 17, 2025

k8s-ci-robot merged commit 246fa57 into kubernetes:master Dec 18, 2025
15 checks passed

k8s-ci-robot added this to the v1.36 milestone Dec 18, 2025

github-project-automation Bot moved this from Needs Approver to Done in SIG Node: code and documentation PRs Dec 18, 2025

github-project-automation Bot moved this from Archive-it to Done in SIG Node CI/Test Board Dec 18, 2025

dims mentioned this pull request Dec 18, 2025

Ensure ListMetricDescriptors gets tested with latest k/k containerd/containerd#12704

Merged

k8s-ci-robot merged 1 commit into kubernetes:master from dims:better-support-for-cri-stats


7 participants
