Skip to content

Fix the bug where cri-o doesn't emit any metrics when all is set.#9719

Merged
openshift-merge-bot[bot] merged 1 commit into
cri-o:mainfrom
bitoku:fix-all-metrics
Feb 24, 2026
Merged

Fix the bug where cri-o doesn't emit any metrics when all is set.#9719
openshift-merge-bot[bot] merged 1 commit into
cri-o:mainfrom
bitoku:fix-all-metrics

Conversation

@bitoku

@bitoku bitoku commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

This PR fixes a bug where CRI-O didn't return all metrics when "all" is set.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a bug where CRI-O didn't return all metrics when "all" is set.

Summary by CodeRabbit

  • Bug Fixes

    • More reliable handling of the special "all" metric setting to prevent unexpected gaps.
    • Fixed inconsistent selection so pod- and container-level network/metric collection align.
  • Changes

    • Metric selection is unified across the system via a stable enabled-metrics accessor.
    • A canonical list of available metrics was added; descriptor output and collected metrics may change.

@bitoku bitoku requested a review from mrunalp as a code owner January 20, 2026 11:22
@openshift-ci openshift-ci Bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Jan 20, 2026
@openshift-ci openshift-ci Bot requested review from klihub and littlejawa January 20, 2026 11:22
@coderabbitai

coderabbitai Bot commented Jan 20, 2026

Copy link
Copy Markdown

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaced uses of IncludedPodMetrics with StatsConfig.EnabledPodMetrics(), exported AvailableMetrics, and removed the special-case early return for the "all" case in PopulateMetricDescriptors, changing descriptor-map population and allowing nil entries for unknown keys.

Changes

Cohort / File(s) Summary
Configuration & Metrics Accessor
pkg/config/config.go
Add exported AvailableMetrics, internal includedPodMetrics, and StatsConfig.EnabledPodMetrics() accessor; update StatsConfig.Validate() to expand single-element ["all"]AvailableMetrics, reject mixed "all" entries, and sync internal state.
Metrics Descriptor Population
internal/lib/statsserver/metrics.go
Remove early-return special case for AllMetrics in PopulateMetricDescriptors; always initialize descriptorsMap with always-on metrics and iterate includedKeys to assign entries (may produce nil for unknown keys).
Server call sites
server/metric_descriptors_list.go, internal/lib/statsserver/stats_server_linux.go
Switch callers to use ss.Config().EnabledPodMetrics() (instead of IncludedPodMetrics) when selecting pod/container metrics and when listing metric descriptors.
CLI flag handling (nolint annotations)
internal/criocli/criocli.go
Add // nolint:staticcheck annotations around processing/initialization of the included-pod-metrics flag; no behavioral change.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant Server as "Metric Server"
    participant Config as "StatsConfig"
    participant Pop as "PopulateMetricDescriptors"

    Client->>Server: Request metric descriptors
    Server->>Config: call EnabledPodMetrics()
    Note over Config: return internal includedPodMetrics\n(single "all" → AvailableMetrics)
    Config-->>Server: includedKeys

    Server->>Pop: PopulateMetricDescriptors(includedKeys)
    Note over Pop: init descriptorsMap with always-on metrics\niterate includedKeys to set descriptorsMap[k] (no early return)
    Pop-->>Server: descriptorsMap

    Server->>Server: flatten descriptorsMap to slice
    Server-->>Client: MetricDescriptors[]
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • mrunalp
  • klihub

Poem

🐇 I hopped through fields of config and code,
Moved "all" to lists down the tidy road.
Descriptors now bloom where once a shortcut lay,
I checked each key and cleared the way.
Cheers — a rabbit, pleased with the day.

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main bug fix: enabling metrics emission when 'all' is set, which aligns with the core changes across multiple files that implement EnabledPodMetrics() and fix the metric selection logic.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/config/config.go (1)

2221-2238: Exclude "all" from the expanded internal list.
includedPodMetrics is documented as not containing "all", but AvailableMetrics includes it. This can surface as “unknown metric: all” warnings or break descriptor population if "all" is not handled downstream.

🛠️ Suggested fix
 if len(c.IncludedPodMetrics) == 1 && c.IncludedPodMetrics[0] == AllMetrics {
-	c.includedPodMetrics = AvailableMetrics
+	c.includedPodMetrics = make([]string, 0, len(AvailableMetrics)-1)
+	for _, m := range AvailableMetrics {
+		if m == AllMetrics {
+			continue
+		}
+		c.includedPodMetrics = append(c.includedPodMetrics, m)
+	}
 
 	return nil
 }

@codecov

codecov Bot commented Jan 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.46%. Comparing base (c5c86ec) to head (020b308).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9719      +/-   ##
==========================================
+ Coverage   64.45%   67.46%   +3.01%     
==========================================
  Files         210      210              
  Lines       29109    29116       +7     
==========================================
+ Hits        18761    19644     +883     
+ Misses       8657     7784     -873     
+ Partials     1691     1688       -3     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bitoku

bitoku commented Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

@cri-o/cri-o-maintainers PTAL

Comment thread pkg/config/config.go

func (c *StatsConfig) Validate() error {
if len(c.IncludedPodMetrics) == 1 && c.IncludedPodMetrics[0] == AllMetrics {
c.includedPodMetrics = AvailableMetrics

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we filter out AllMetrics here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, good catch!

@bitoku bitoku Jan 21, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed "all" from AvailableMetrics. It's only used for this Validate function anyway.

Comment thread pkg/config/config.go Outdated
Comment on lines 792 to 793
// Do NOT use this field directly. Use Metrics() instead.
IncludedPodMetrics []string `toml:"included_pod_metrics"`

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should depreciate it to make linters aware.

@saschagrunert saschagrunert left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one non-blocking nit, otherwise LGTM

/hold

Feel free to unhold if you think we should not deprecate the config field.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 21, 2026
@openshift-ci openshift-ci Bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 21, 2026
Comment thread pkg/config/config.go Outdated
return nil
}

func (c *StatsConfig) Metrics() []string {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe EnabledPodMetrics or something? I wonder if Metrics is too generic, there is a whole metrics table in the config

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking for a better naming. Thanks!

@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 21, 2026
Comment thread pkg/config/config.go Outdated
return nil
}

func (c *StatsConfig) EnabledMetrics() []string {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel including pod is good here, cri-o has server metrics it exposes which are different than these

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It slipped my eyes. Thanks.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/config/config.go (1)

2223-2247: Use fmt.Errorf instead of errors.New() on line 2232 to align with Go coding guidelines and codebase patterns.

The validation logic correctly handles:

  1. Single "all" element → expands to AvailableMetrics
  2. "all" mixed with other metrics → returns error (clear guidance)
  3. Individual metrics → validates against allowed list

However, line 2232 must be changed from errors.New() to fmt.Errorf() to comply with the coding guidelines and match the codebase pattern (used consistently throughout config.go). When IncludedPodMetrics is empty/nil, validation passes and no metrics are collected—this is the intended default behavior.

@bitoku

bitoku commented Jan 23, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@bitoku

bitoku commented Jan 26, 2026

Copy link
Copy Markdown
Contributor Author

/unhold

@cri-o/cri-o-maintainers PTAL

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2026
@bitoku

bitoku commented Feb 3, 2026

Copy link
Copy Markdown
Contributor Author

@cri-o/cri-o-maintainers PTAL

1 similar comment
@bitoku

bitoku commented Feb 6, 2026

Copy link
Copy Markdown
Contributor Author

@cri-o/cri-o-maintainers PTAL

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Feb 9, 2026
@bitoku

bitoku commented Feb 9, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 16, 2026
Signed-off-by: Ayato Tokubi <atokubi@redhat.com>
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 16, 2026
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 16, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/config/config.go (1)

2271-2275: Shared slice reference to package-level AvailableMetrics — consider a defensive copy.

c.includedPodMetrics = AvailableMetrics makes the internal field point to the same backing array as the package-level global. If any consumer ever mutates the returned slice (e.g., via append or element assignment), it would silently corrupt AvailableMetrics for all subsequent callers.

Suggested fix
 	if len(c.IncludedPodMetrics) == 1 && c.IncludedPodMetrics[0] == AllMetrics {
-		c.includedPodMetrics = AvailableMetrics
+		c.includedPodMetrics = slices.Clone(AvailableMetrics)
 
 		return nil
 	}

@bitoku

bitoku commented Feb 18, 2026

Copy link
Copy Markdown
Contributor Author

sorry rebase removed lgtm, PTAL again @cri-o/cri-o-maintainers

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Feb 18, 2026
@openshift-ci

openshift-ci Bot commented Feb 18, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bitoku, saschagrunert

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bitoku

bitoku commented Feb 19, 2026

Copy link
Copy Markdown
Contributor Author

/retest

3 similar comments
@bitoku

bitoku commented Feb 23, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@bitoku

bitoku commented Feb 23, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@bitoku

bitoku commented Feb 24, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@openshift-merge-bot openshift-merge-bot Bot merged commit 11eaffa into cri-o:main Feb 24, 2026
73 checks passed
@bitoku

bitoku commented Feb 25, 2026

Copy link
Copy Markdown
Contributor Author

/cherry-pick release-1.35

@openshift-cherrypick-robot

Copy link
Copy Markdown

@bitoku: new pull request created: #9781

Details

In response to this:

/cherry-pick release-1.35

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants