fix: refactor ListPodSandboxMetrics#12594
Conversation
2051bd9 to
637abe6
Compare
|
After some fighting with golangci-lint, this finally passes 😄 |
mikebrow
left a comment
There was a problem hiding this comment.
looking good.. lots of refactoring... hard to tell the difference/identify critical fixes from just refactoring preferences.
Realize the human readable help text was carry over.. but figured I'd read through them while this refactoring is occurring and provide some suggested improvements.
"merged without unit tests or even manual tests with a Prometheus server"
The goal was to get results from WIP node e2e test cases and normalize the results for both containerd/crio. Noting that this PR does not add test cases... did you do a manual prometheus server run?
Cheers and thanks for the contributions.
Ah good to know. I just saw the previous commit and was fairly confused because when using with the CRI-Metrics FeatureGate, you only get errors in kubelet. If there are e2e tests planned, then ignore the "rant" 😄 Yes I actually deployed this commit and validated the metrics are correct. |
e7c20fe to
3fe38f5
Compare
The original implementation provided a lot of unfilled or wrong filled metrics. This tries to do better by only setting things I am fairly certain are correct. Signed-off-by: Tim Windelschmidt <tim@monogon.tech> Co-authored-by: Mike Brown <brownwm@us.ibm.com>
3fe38f5 to
3981541
Compare
|
As I understand it, that failure is unrelated |
|
|
||
| if s.Blkio != nil { | ||
| // Process blkio device usage stats | ||
| for _, entry := range s.Blkio.IoServiceBytesRecursive { |
There was a problem hiding this comment.
Can we combine this loop and https://github.com/containerd/containerd/pull/12594/files#diff-3cce73d8ebad8a6376426b5a36524f71baafb5ab366dd68448d77687a681f092R729 so that we dont have to iterate twice over IoServiceBytesRecursive
nod the failure was a flake.. re-ran the failed test and it's successful now |
This PR improves #10691. As far as we can tell, this code never actually worked with e.g. Kubelet, and the original PR was merged without unit tests or even manual tests with a Prometheus server.
Some extra scrutiny probably won't hurt, this PR only fixes the most critical issues.