Skip to content

Auto-generate feature metrics documentation for agent and operator#44556

Closed
ajmmm wants to merge 3 commits intocilium:mainfrom
ajmmm:pr/auto-generate-feature-metrics
Closed

Auto-generate feature metrics documentation for agent and operator#44556
ajmmm wants to merge 3 commits intocilium:mainfrom
ajmmm:pr/auto-generate-feature-metrics

Conversation

@ajmmm
Copy link
Copy Markdown
Member

@ajmmm ajmmm commented Feb 27, 2026

Following #44114 CI will now fail if feature metric documentation is out of sync. However, as noted, the developer U/X for this is poor.

This PR:

  1. Introduces a new kind install target: make kind-install-cilium-metrics.
    This can be run to install Cilium with the necessary config to express default metrics. This is required to auto-generate the documentation.

  2. Introduces a new docs target: make -C Documentation update-metrics.
    This queries the installed kind cluster and auto-generates the feature metric files in Documentation/observability.

  3. Updates the tests-smoke.yaml file to better express any detected sync issues for these files, and also mentions the new build targets that can be used to re-generate them from command line.

  4. The standard make -C Documentation check is also updated to check the metrics files.

Note: the smoke test workflow is now duplicating some logic, but thought it best to leave this alone for now until these changes are accepted. I envisage a follow-up PR in due course that could split the testing out into its own workflow and refactoring to reducing duplication.


CI Test example

Screenshot below from test CI run with a commit that modifies one of the documents.

image
Introduce build targets to resynchronise feature metric documentation files for the agent and operator.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 27, 2026
@ajmmm ajmmm force-pushed the pr/auto-generate-feature-metrics branch from 510b9a1 to 3acf103 Compare February 27, 2026 16:19
Introduce a new make target that installs Cilium with the necessary
configuration and environment variables to export default metrics.

This will enable us to programmatically generate feature metrics
documentation in the next commit.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Introduce a new update-metrics build target in Documentation. When
combined with installing Cilium via `make kind-install-cilium-metrics`,
this provides a mechanism to automatically generate feature metrics
documentation for the agent and operator.

This commit also adds Documentation/check-metrics.sh which is now
called via `make -C Documentation check`.

Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Commit 677f909 ("gh: fix metric updates in smoke test workflow")
updated the IPv4 smoke test to fail if feature metrics are out of
sync. However, when hitting this in CI, it's not obvious how to
go about fixing this.

The previous two commits have provided a simplified way to:

1. Install Cilium with the correct configuration and environment
   variables to export default metrics. This is:

   `make kind-install-cilium-metrics`

2. Query the installed kind cluster for feature metrics and process
   these using the included tooling to produce the relevant docs for
   the agent and operator. This is:

   `make -C Documentation update-metrics`

Therefore, this commit updates the smoke test to include instructions
on how to fix this issue.

Fixes: 677f909 ("gh: fix metric updates in smoke test workflow")
Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
@ajmmm ajmmm force-pushed the pr/auto-generate-feature-metrics branch from 3acf103 to 2d492fe Compare February 27, 2026 16:22
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Feb 27, 2026

/test

@ajmmm ajmmm marked this pull request as ready for review February 27, 2026 16:42
@ajmmm ajmmm requested review from a team as code owners February 27, 2026 16:42
@ajmmm ajmmm requested review from brlbil, qmonnet and thorn3r February 27, 2026 16:42
Copy link
Copy Markdown
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK but I'm concerned with the readability and maintainability of the Makefile recipe from the second commit.

Ideally, these changes should also come with an update to Documentation/contributing/docs/docsframework.rst 🙂

update-metrics: OPERATOR_METRICS_DOC := ./observability/feature-metrics-operator.txt
update-metrics: OPERATOR_METRICS_PREFIX := cilium_operator_feature
update-metrics: OPERATOR_METRICS_SEPARATORS := adv_connect_and_lb
update-metrics:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this whole Makefile recipe rather difficult to read. Have you considered moving it to a dedicated script?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 would be good to follow the current pattern for Documentation/update-*

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did actually start with an update script, but I considered the need to then re-acquire variables from the Makefile. Combined with needing to then run promtool via docker, it all got a bit messy. (That was admittedly before the 8 make variables that appeared!)

Ack, will refactor into a script :)

Copy link
Copy Markdown
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on keeping these metrics docs up to date btw. I brought up some thoughts below on how to reduce toil for contributors, but we can probably tackle this in steps. Some minor feedback below to address in this PR though.

--version=

.PHONY: kind-install-cilium-metrics
kind-install-cilium-metrics: check_deps kind-ready
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a comment at the end for help text. You can see how it shows up using make help | grep install.

$(CILIUM_CLI) install \
--chart-directory=$(ROOT_DIR)/install/kubernetes/cilium \
$(KIND_VALUES_FILES) \
--helm-values=$(ROOT_DIR)/contrib/testing/kind-metrics.yaml \
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, is this something we should just integrate into an existing target? I see this is a new target but it's mostly just adding the extra metrics. I don't see this new target being used anywhere. Maybe we can collectively as a developer community be more consistent about metrics if the default dev install targets also configure the metrics(?)

Copy link
Copy Markdown
Member Author

@ajmmm ajmmm Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did think about just adding them to the base kind-common.yaml. But then I didn't want to disrupt the status quo of everyone else who uses make kind*. I can just make it default if it's cleaner. :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if there's anything inconvenient about enabling this by default? If yes, that could be a reason to keep it separate. If no, I'd be somewhat inclined to just update it for everyone so all developers see the same thing. If someone complains, we can reopen the discussion.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of anything much, but that doesn't mean there isn't an impact! But, it's a bit moot, I've got a replacement PR coming in.

update-metrics: OPERATOR_METRICS_DOC := ./observability/feature-metrics-operator.txt
update-metrics: OPERATOR_METRICS_PREFIX := cilium_operator_feature
update-metrics: OPERATOR_METRICS_SEPARATORS := adv_connect_and_lb
update-metrics:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 would be good to follow the current pattern for Documentation/update-*

--env INCREMENTAL=$(INCREMENTAL)
DOCKER_RUN := $(DOCKER_CTR) $(DOCS_BUILDER_IMG)

PROMTOOL_IMG ?= quay.io/prometheus/prometheus:latest
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's pin this to a specific sha and let renovate handle versioning. We don't want to arbitrarily break the tree because prometheus releases a new version of their linter.

nit, not a requirement for this PR, but .github/workflows/tests-smoke.yaml also pulls in promtool, maybe that could be rebased to invoke the same check rather than open-coding the install & use of promtool in the GitHub workflow.


if ! git diff --quiet -- "${metrics_dir}" ; then
git --no-pager diff "${metrics_dir}"
echo "HINT: to fix this, install via 'make kind-install-cilium-metrics' and run 'make -C Documentation update-metrics'"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Idly wondering if we could have a model more similar to cmdref where we don't have to run a full cluster & docker image in order to test, but if we could instead just build cells with the metrics related code and then invoke a new entrypoint with the metrics configuration in order to validate. I'm not sure how much work this would be but it might make it easier for us to just expect someone to run make -C Documentation update-metrics, have that do a similar dependency graph to update-cmdref and directly invoke static binaries rather than requiring the contributor to do extra steps & setup

Copy link
Copy Markdown
Member Author

@ajmmm ajmmm Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer if this can just be spat out of the build somehow, but I did it as-is just to get something in quickly. If the environment variable(s) are instead added to kind-common, in the interim, I just echo the variable that needs to be set and the make -C ...? Then in future this can be updated to remove the dependency on the env variable being set.

@joestringer joestringer added the release-note/misc This PR makes changes that have no direct user impact. label Feb 27, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 27, 2026
@ajmmm
Copy link
Copy Markdown
Member Author

ajmmm commented Mar 10, 2026

Because I've reworked how this is done to better align to the cmdref way of working, I've got a V2 PR that supersedes this now: #44715

I will close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-note/misc This PR makes changes that have no direct user impact.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants