Skip to content

Resync feature metrics documentation#44114

Merged
fristonio merged 5 commits intocilium:mainfrom
ajmmm:pr/datapath-mode-auto-4-missed-doc
Feb 24, 2026
Merged

Resync feature metrics documentation#44114
fristonio merged 5 commits intocilium:mainfrom
ajmmm:pr/datapath-mode-auto-4-missed-doc

Conversation

@ajmmm
Copy link
Copy Markdown
Member

@ajmmm ajmmm commented Feb 2, 2026

In order to facilitate automatic datapath mode selection, the configured datapath became separate to the operational datapath mode. This was reflected in metrics but a documentation update was missed.

This led to the discovery that the CI workflow to detect out-of-sync docs was broken.

This led to the discovery that version information was leaking into the auto-generated documentation.

This PR:

  1. Expands pkg/metrics with a new environment variable that removes host-info from metrics when set (specifically k8s and kernel version)

  2. Updates the CI smoke test workflow to set the new environment variable, and to properly fail if docs are out of sync

  3. Resynchronises feature metrics docs.

Updates to feature metrics documentation.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 2, 2026
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 2, 2026

/test

@ajmmm ajmmm marked this pull request as ready for review February 2, 2026 10:26
@ajmmm ajmmm requested review from a team as code owners February 2, 2026 10:26
@ajmmm ajmmm requested review from fristonio and qmonnet February 2, 2026 10:26
Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This should have been caught in CI, for PR #43062 🙀.

Looking at the relevant workflow in .github/workflows/tests-smoke.yml, I see that at step Check prometheus feature metrics documentation, we check for differences with if ! git diff. This is incorrect, by default git diff does not return 1 if there is a diff. It needs to be if ! git diff --exit-code, or, to be consistent with other workflows in the repo, if ! git diff --quiet.

Can you please fix the workflow, and run the feature-helm-generator to update feature-metrics-agent.txt (look at how the workflow does it)? It looks like other updates are missing, too.

@qmonnet qmonnet added area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. kind/bug/CI This is a bug in the testing code. release-note/misc This PR makes changes that have no direct user impact. labels Feb 2, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 2, 2026
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 2, 2026

Thanks! This should have been caught in CI, for PR #43062 🙀.

Hah - oops :-) I didn't know CI checked it!

Can you please fix the workflow, and run the feature-helm-generator to update feature-metrics-agent.txt (look at how the workflow does it)? It looks like other updates are missing, too.

Ack, yep

@ajmmm ajmmm force-pushed the pr/datapath-mode-auto-4-missed-doc branch from 16a5451 to a530ba5 Compare February 3, 2026 18:02
@ajmmm ajmmm requested review from a team as code owners February 3, 2026 18:02
@ajmmm ajmmm requested review from chancez and nbusseneau February 3, 2026 18:02
@ajmmm ajmmm changed the title docs: update datapath_config metrics Resync feature metrics documentation Feb 3, 2026
@ajmmm ajmmm requested a review from qmonnet February 3, 2026 18:06
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 3, 2026

@qmonnet

Been down a bit of a rabbit hole with this one, but got there. :-)

I've done a test run direct from my dev branch, which correctly detects failures now. See test PR #44156 and the correctly failed test run.

Also note I've rebranded the PR summary to make it generic.

@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 3, 2026

I've just spotted that I've missed something - moving back to draft to fix.

@ajmmm ajmmm marked this pull request as draft February 3, 2026 18:19
@ajmmm ajmmm force-pushed the pr/datapath-mode-auto-4-missed-doc branch from a530ba5 to d986a5f Compare February 4, 2026 12:28
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 4, 2026

/test

@ajmmm ajmmm marked this pull request as ready for review February 4, 2026 12:42
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 4, 2026

@qmonnet fixed :)

Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't expect the fix to be that involved, I thought we just missed the --quiet option. Thank you so much for looking into this and making it work!

Looks good to me now

Copy link
Copy Markdown
Member

@fristonio fristonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit.

@ajmmm ajmmm force-pushed the pr/datapath-mode-auto-4-missed-doc branch from d986a5f to 5d80231 Compare February 5, 2026 17:46
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 5, 2026

/test

@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 6, 2026

/ci-l3-l4

@ajmmm ajmmm force-pushed the pr/datapath-mode-auto-4-missed-doc branch from 5d80231 to b8d0928 Compare February 12, 2026 11:34
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 12, 2026

/test

Cilium looks if CILIUM_FEATURE_METRICS_WITH_DEFAULTS is set in the
local environment to determine if it should include default metric
values, which is useful for testing and generating documentation. The
was introduced in commit [0]  and is not intended to be used in
production scenarios.

The metrics package also includes host-specific version information,
such as Kubernetes and Linux kernel version information. This info
should not appear in published documents, and will cuses issues if we
want to automatically detect out-of-sync documents in CI.

This commit introduces a similar environment-based mechanism to hide
host-specific versions such as that noted above. If the new variable
CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION is set in the local env,
both the agent and the operator will disable version metrics.

Rationale for a new variable was to avoid overloading of the existing
variable, while following the existing precedent for modifying the
behaviour of metrics for development purposes. This also means existing
test mechanisms are unaffected.

The following commit will modify the relevant metrics to be disabled
if version metrics are to be excluded.

[0] 18fcd45 ("pkg/metrics: do not enable all metric defaults")

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit builds on the previous commit by actively disabling the
control-plane Kubernetes version and data-path Linux kernel version
feature metrics when CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION is
set in the local environment.

This commit also updates the featuresParams and mockFeaturesParams
interfaces to include a helper function to get the Linux kernel version
as per the existing interface for the Kubernetes version.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit updates existing logic that tests the handling of Kubernetes
version in the operator, and also adds the same tests for the Linux
kernel version in the agent.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Fix the smoke test workflow to fail correctly if feature metric docs
are out of sync.

This has required the workflow to set the new environment variable
CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION when installing, to avoid CI
runner version information being added to the generated documentation
that is then used to compare diffs with. Otherwise, this will always
cause failures.

The variable is passed in extraEnv[3] to avoid conflicts with other
environment variables passed in actions/set-env-variables.

Fixes: eabba03 ("docs: document feature metrics and add rst generator")
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Re-generate metrics documentation to synchronise with the current metrics
expressed by Cilium, which look to have been out of date for a while.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
@ajmmm ajmmm force-pushed the pr/datapath-mode-auto-4-missed-doc branch from b8d0928 to e13d167 Compare February 20, 2026 17:11
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 20, 2026

/test

@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 24, 2026

/ci-gateway-api

@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 24, 2026

/ci-ipsec-e2e

@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 24, 2026

/ci-l7

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 24, 2026
@fristonio fristonio added this pull request to the merge queue Feb 24, 2026
Merged via the queue into cilium:main with commit a273c82 Feb 24, 2026
77 checks passed
@ajmmm ajmmm deleted the pr/datapath-mode-auto-4-missed-doc branch February 24, 2026 17:28
Comment on lines +275 to +276
if ! git diff --quiet; then
echo "Feature metrics documentation out-of-sync"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this now quiet?

In order to make this actionable for the contributor who faces this failure, ideally this failure condition should show (1) what the diff is so they know what needs to be updated and ideally (2) a specific command the contributor can run to fix this. I can see that the message echoed to the console was already in place here, but now the failure is pretty hard to understand because there's no context in the test failure to demonstrate what's failing or how you're supposed to fix it.

Example:
https://github.com/cilium/cilium/actions/runs/22370657414/job/64756117898?pr=44459#step:17:47

@ajmmm would you mind at least reverting the --quiet and if there's a simple command to fix it, suggest that to the contributor? Similar to how we do for cases like this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. kind/bug/CI This is a bug in the testing code. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants