fix: refactor ListPodSandboxMetrics by fionera · Pull Request #12594 · containerd/containerd

fionera · 2025-11-29T07:08:21Z

This PR improves #10691. As far as we can tell, this code never actually worked with e.g. Kubelet, and the original PR was merged without unit tests or even manual tests with a Prometheus server.

21:48:31.065239 cri_metrics.go:113] "Error getting CRI prometheus metric" err="inconsistent label cardinality: expected 5 label values but got 4 in []string{\"foo\", \"foo\", \"default\", \"a7a6fbfcfd90a6b5dc08e30e3e45c19aad2e5dd541cc1356c2883009a4b9b04e\"}" descriptor="Desc{fqName: \"container_fs_inodes_free\", help: \"[INTERNAL] Number of available Inodes\", constLabels: {}, variableLabels: {id,name,device}}"

Some extra scrutiny probably won't hurt, this PR only fixes the most critical issues.

fionera · 2025-11-30T20:14:54Z

After some fighting with golangci-lint, this finally passes 😄

akhilerm · 2025-12-01T03:40:00Z

/cc @mikebrow @dims

mikebrow

looking good.. lots of refactoring... hard to tell the difference/identify critical fixes from just refactoring preferences.

Realize the human readable help text was carry over.. but figured I'd read through them while this refactoring is occurring and provide some suggested improvements.

"merged without unit tests or even manual tests with a Prometheus server"
The goal was to get results from WIP node e2e test cases and normalize the results for both containerd/crio. Noting that this PR does not add test cases... did you do a manual prometheus server run?

Cheers and thanks for the contributions.

fionera · 2025-12-01T18:22:21Z

The goal was to get results from WIP node e2e test cases and normalize the results for both containerd/crio. Noting that this PR does not add test cases... did you do a manual prometheus server run?

Ah good to know. I just saw the previous commit and was fairly confused because when using with the CRI-Metrics FeatureGate, you only get errors in kubelet. If there are e2e tests planned, then ignore the "rant" 😄 Yes I actually deployed this commit and validated the metrics are correct.

mikebrow

LGTM

The original implementation provided a lot of unfilled or wrong filled metrics. This tries to do better by only setting things I am fairly certain are correct. Signed-off-by: Tim Windelschmidt <tim@monogon.tech> Co-authored-by: Mike Brown <brownwm@us.ibm.com>

fionera · 2025-12-03T09:09:22Z

As I understand it, that failure is unrelated

akhilerm · 2025-12-04T03:23:26Z

+
 		if s.Blkio != nil {
 			// Process blkio device usage stats
 			for _, entry := range s.Blkio.IoServiceBytesRecursive {


Can we combine this loop and https://github.com/containerd/containerd/pull/12594/files#diff-3cce73d8ebad8a6376426b5a36524f71baafb5ab366dd68448d77687a681f092R729 so that we dont have to iterate twice over IoServiceBytesRecursive

mikebrow · 2025-12-04T14:40:58Z

As I understand it, that failure is unrelated

nod the failure was a flake.. re-ran the failed test and it's successful now

dims

LGTM

github-project-automation Bot added this to Pull Request Review Nov 29, 2025

github-project-automation Bot moved this to Needs Triage in Pull Request Review Nov 29, 2025

k8s-ci-robot added the size/XXL label Nov 29, 2025

dosubot Bot added area/cri Container Runtime Interface (CRI) kind/refactor labels Nov 29, 2025

fionera force-pushed the list-pod-sandbox-metrics branch 5 times, most recently from 2051bd9 to 637abe6 Compare November 30, 2025 19:37

k8s-ci-robot requested review from dims and mikebrow December 1, 2025 03:40

mikebrow reviewed Dec 1, 2025

View reviewed changes

fionera force-pushed the list-pod-sandbox-metrics branch from e7c20fe to 3fe38f5 Compare December 1, 2025 18:23

mikebrow approved these changes Dec 2, 2025

View reviewed changes

fionera force-pushed the list-pod-sandbox-metrics branch from 3fe38f5 to 3981541 Compare December 3, 2025 08:33

dims mentioned this pull request Dec 4, 2025

[WIP] Re-enable ResourceMetrics for node-e2e tests #12620

Closed

akhilerm reviewed Dec 4, 2025

View reviewed changes

dims approved these changes Dec 4, 2025

View reviewed changes

github-project-automation Bot moved this from Needs Triage to Review In Progress in Pull Request Review Dec 4, 2025

dims added this pull request to the merge queue Dec 4, 2025

Merged via the queue into containerd:main with commit 08b02ea Dec 4, 2025
91 of 92 checks passed

github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review Dec 4, 2025

Conversation

fionera commented Nov 29, 2025

Uh oh!

fionera commented Nov 30, 2025

Uh oh!

akhilerm commented Dec 1, 2025

Uh oh!

mikebrow left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fionera commented Dec 1, 2025

Uh oh!

mikebrow left a comment

Choose a reason for hiding this comment

Uh oh!

fionera commented Dec 3, 2025

Uh oh!

akhilerm Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

mikebrow commented Dec 4, 2025

Uh oh!

dims left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants