Skip to content

fix: refactor ListPodSandboxMetrics#12594

Merged
dims merged 1 commit into
containerd:mainfrom
monogon:list-pod-sandbox-metrics
Dec 4, 2025
Merged

fix: refactor ListPodSandboxMetrics#12594
dims merged 1 commit into
containerd:mainfrom
monogon:list-pod-sandbox-metrics

Conversation

@fionera

@fionera fionera commented Nov 29, 2025

Copy link
Copy Markdown
Contributor

This PR improves #10691. As far as we can tell, this code never actually worked with e.g. Kubelet, and the original PR was merged without unit tests or even manual tests with a Prometheus server.

21:48:31.065239 cri_metrics.go:113] "Error getting CRI prometheus metric" err="inconsistent label cardinality: expected 5 label values but got 4 in []string{\"foo\", \"foo\", \"default\", \"a7a6fbfcfd90a6b5dc08e30e3e45c19aad2e5dd541cc1356c2883009a4b9b04e\"}" descriptor="Desc{fqName: \"container_fs_inodes_free\", help: \"[INTERNAL] Number of available Inodes\", constLabels: {}, variableLabels: {id,name,device}}"

Some extra scrutiny probably won't hurt, this PR only fixes the most critical issues.

@github-project-automation github-project-automation Bot moved this to Needs Triage in Pull Request Review Nov 29, 2025
@dosubot dosubot Bot added area/cri Container Runtime Interface (CRI) kind/refactor labels Nov 29, 2025
@fionera fionera force-pushed the list-pod-sandbox-metrics branch 5 times, most recently from 2051bd9 to 637abe6 Compare November 30, 2025 19:37
@fionera

fionera commented Nov 30, 2025

Copy link
Copy Markdown
Contributor Author

After some fighting with golangci-lint, this finally passes 😄

@akhilerm

akhilerm commented Dec 1, 2025

Copy link
Copy Markdown
Member

/cc @mikebrow @dims

@k8s-ci-robot k8s-ci-robot requested review from dims and mikebrow December 1, 2025 03:40

@mikebrow mikebrow left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good.. lots of refactoring... hard to tell the difference/identify critical fixes from just refactoring preferences.

Realize the human readable help text was carry over.. but figured I'd read through them while this refactoring is occurring and provide some suggested improvements.

"merged without unit tests or even manual tests with a Prometheus server"
The goal was to get results from WIP node e2e test cases and normalize the results for both containerd/crio. Noting that this PR does not add test cases... did you do a manual prometheus server run?

Cheers and thanks for the contributions.

Comment thread internal/cri/server/container_stats_list.go
Comment thread internal/cri/server/container_stats_list.go
Comment thread internal/cri/server/container_stats_list.go
Comment thread internal/cri/server/list_metric_descriptors.go Outdated
Comment thread internal/cri/server/list_metric_descriptors.go Outdated
Comment thread internal/cri/server/list_metric_descriptors.go Outdated
Comment thread internal/cri/server/list_metric_descriptors.go Outdated
Comment thread internal/cri/server/list_metric_descriptors.go Outdated
Comment thread internal/cri/server/list_metric_descriptors.go Outdated
Comment thread internal/cri/server/list_metric_descriptors.go Outdated
@fionera

fionera commented Dec 1, 2025

Copy link
Copy Markdown
Contributor Author

The goal was to get results from WIP node e2e test cases and normalize the results for both containerd/crio. Noting that this PR does not add test cases... did you do a manual prometheus server run?

Ah good to know. I just saw the previous commit and was fairly confused because when using with the CRI-Metrics FeatureGate, you only get errors in kubelet. If there are e2e tests planned, then ignore the "rant" 😄 Yes I actually deployed this commit and validated the metrics are correct.

@fionera fionera force-pushed the list-pod-sandbox-metrics branch from e7c20fe to 3fe38f5 Compare December 1, 2025 18:23

@mikebrow mikebrow left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The original implementation provided a lot of unfilled or wrong
filled metrics. This tries to do better by only setting things I
am fairly certain are correct.

Signed-off-by: Tim Windelschmidt <tim@monogon.tech>
Co-authored-by: Mike Brown <brownwm@us.ibm.com>
@fionera fionera force-pushed the list-pod-sandbox-metrics branch from 3fe38f5 to 3981541 Compare December 3, 2025 08:33
@fionera

fionera commented Dec 3, 2025

Copy link
Copy Markdown
Contributor Author

As I understand it, that failure is unrelated


if s.Blkio != nil {
// Process blkio device usage stats
for _, entry := range s.Blkio.IoServiceBytesRecursive {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine this loop and https://github.com/containerd/containerd/pull/12594/files#diff-3cce73d8ebad8a6376426b5a36524f71baafb5ab366dd68448d77687a681f092R729 so that we dont have to iterate twice over IoServiceBytesRecursive

@mikebrow

mikebrow commented Dec 4, 2025

Copy link
Copy Markdown
Member

As I understand it, that failure is unrelated

nod the failure was a flake.. re-ran the failed test and it's successful now

@dims dims left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-project-automation github-project-automation Bot moved this from Needs Triage to Review In Progress in Pull Request Review Dec 4, 2025
@dims dims added this pull request to the merge queue Dec 4, 2025
Merged via the queue into containerd:main with commit 08b02ea Dec 4, 2025
91 of 92 checks passed
@github-project-automation github-project-automation Bot moved this from Review In Progress to Done in Pull Request Review Dec 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cri Container Runtime Interface (CRI) kind/refactor size/XXL

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants