Resync feature metrics documentation#44114
Conversation
|
/test |
There was a problem hiding this comment.
Thanks! This should have been caught in CI, for PR #43062 🙀.
Looking at the relevant workflow in .github/workflows/tests-smoke.yml, I see that at step Check prometheus feature metrics documentation, we check for differences with if ! git diff. This is incorrect, by default git diff does not return 1 if there is a diff. It needs to be if ! git diff --exit-code, or, to be consistent with other workflows in the repo, if ! git diff --quiet.
Can you please fix the workflow, and run the feature-helm-generator to update feature-metrics-agent.txt (look at how the workflow does it)? It looks like other updates are missing, too.
Hah - oops :-) I didn't know CI checked it!
Ack, yep |
16a5451 to
a530ba5
Compare
|
Been down a bit of a rabbit hole with this one, but got there. :-) I've done a test run direct from my dev branch, which correctly detects failures now. See test PR #44156 and the correctly failed test run. Also note I've rebranded the PR summary to make it generic. |
|
I've just spotted that I've missed something - moving back to draft to fix. |
a530ba5 to
d986a5f
Compare
|
/test |
|
@qmonnet fixed :) |
qmonnet
left a comment
There was a problem hiding this comment.
I didn't expect the fix to be that involved, I thought we just missed the --quiet option. Thank you so much for looking into this and making it work!
Looks good to me now
d986a5f to
5d80231
Compare
|
/test |
|
/ci-l3-l4 |
5d80231 to
b8d0928
Compare
|
/test |
Cilium looks if CILIUM_FEATURE_METRICS_WITH_DEFAULTS is set in the local environment to determine if it should include default metric values, which is useful for testing and generating documentation. The was introduced in commit [0] and is not intended to be used in production scenarios. The metrics package also includes host-specific version information, such as Kubernetes and Linux kernel version information. This info should not appear in published documents, and will cuses issues if we want to automatically detect out-of-sync documents in CI. This commit introduces a similar environment-based mechanism to hide host-specific versions such as that noted above. If the new variable CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION is set in the local env, both the agent and the operator will disable version metrics. Rationale for a new variable was to avoid overloading of the existing variable, while following the existing precedent for modifying the behaviour of metrics for development purposes. This also means existing test mechanisms are unaffected. The following commit will modify the relevant metrics to be disabled if version metrics are to be excluded. [0] 18fcd45 ("pkg/metrics: do not enable all metric defaults") Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit builds on the previous commit by actively disabling the control-plane Kubernetes version and data-path Linux kernel version feature metrics when CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION is set in the local environment. This commit also updates the featuresParams and mockFeaturesParams interfaces to include a helper function to get the Linux kernel version as per the existing interface for the Kubernetes version. Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
This commit updates existing logic that tests the handling of Kubernetes version in the operator, and also adds the same tests for the Linux kernel version in the agent. Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Fix the smoke test workflow to fail correctly if feature metric docs are out of sync. This has required the workflow to set the new environment variable CILIUM_FEATURE_METRICS_WITHOUT_ENV_VERSION when installing, to avoid CI runner version information being added to the generated documentation that is then used to compare diffs with. Otherwise, this will always cause failures. The variable is passed in extraEnv[3] to avoid conflicts with other environment variables passed in actions/set-env-variables. Fixes: eabba03 ("docs: document feature metrics and add rst generator") Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Re-generate metrics documentation to synchronise with the current metrics expressed by Cilium, which look to have been out of date for a while. Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
b8d0928 to
e13d167
Compare
|
/test |
|
/ci-gateway-api |
|
/ci-ipsec-e2e |
|
/ci-l7 |
| if ! git diff --quiet; then | ||
| echo "Feature metrics documentation out-of-sync" |
There was a problem hiding this comment.
Why is this now quiet?
In order to make this actionable for the contributor who faces this failure, ideally this failure condition should show (1) what the diff is so they know what needs to be updated and ideally (2) a specific command the contributor can run to fix this. I can see that the message echoed to the console was already in place here, but now the failure is pretty hard to understand because there's no context in the test failure to demonstrate what's failing or how you're supposed to fix it.
Example:
https://github.com/cilium/cilium/actions/runs/22370657414/job/64756117898?pr=44459#step:17:47
@ajmmm would you mind at least reverting the --quiet and if there's a simple command to fix it, suggest that to the contributor? Similar to how we do for cases like this.
In order to facilitate automatic datapath mode selection, the configured datapath became separate to the operational datapath mode. This was reflected in metrics but a documentation update was missed.
This led to the discovery that the CI workflow to detect out-of-sync docs was broken.
This led to the discovery that version information was leaking into the auto-generated documentation.
This PR:
Expands
pkg/metricswith a new environment variable that removes host-info from metrics when set (specifically k8s and kernel version)Updates the CI smoke test workflow to set the new environment variable, and to properly fail if docs are out of sync
Resynchronises feature metrics docs.