Auto-generate feature metrics documentation for agent and operator#44556
Auto-generate feature metrics documentation for agent and operator#44556ajmmm wants to merge 3 commits intocilium:mainfrom
Conversation
510b9a1 to
3acf103
Compare
Introduce a new make target that installs Cilium with the necessary configuration and environment variables to export default metrics. This will enable us to programmatically generate feature metrics documentation in the next commit. Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Introduce a new update-metrics build target in Documentation. When combined with installing Cilium via `make kind-install-cilium-metrics`, this provides a mechanism to automatically generate feature metrics documentation for the agent and operator. This commit also adds Documentation/check-metrics.sh which is now called via `make -C Documentation check`. Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
Commit 677f909 ("gh: fix metric updates in smoke test workflow") updated the IPv4 smoke test to fail if feature metrics are out of sync. However, when hitting this in CI, it's not obvious how to go about fixing this. The previous two commits have provided a simplified way to: 1. Install Cilium with the correct configuration and environment variables to export default metrics. This is: `make kind-install-cilium-metrics` 2. Query the installed kind cluster for feature metrics and process these using the included tooling to produce the relevant docs for the agent and operator. This is: `make -C Documentation update-metrics` Therefore, this commit updates the smoke test to include instructions on how to fix this issue. Fixes: 677f909 ("gh: fix metric updates in smoke test workflow") Signed-off-by: Alasdair McWilliam <alasdair.mcwilliam@isovalent.com>
3acf103 to
2d492fe
Compare
|
/test |
qmonnet
left a comment
There was a problem hiding this comment.
Looks OK but I'm concerned with the readability and maintainability of the Makefile recipe from the second commit.
Ideally, these changes should also come with an update to Documentation/contributing/docs/docsframework.rst 🙂
| update-metrics: OPERATOR_METRICS_DOC := ./observability/feature-metrics-operator.txt | ||
| update-metrics: OPERATOR_METRICS_PREFIX := cilium_operator_feature | ||
| update-metrics: OPERATOR_METRICS_SEPARATORS := adv_connect_and_lb | ||
| update-metrics: |
There was a problem hiding this comment.
I find this whole Makefile recipe rather difficult to read. Have you considered moving it to a dedicated script?
There was a problem hiding this comment.
+1 would be good to follow the current pattern for Documentation/update-*
There was a problem hiding this comment.
I did actually start with an update script, but I considered the need to then re-acquire variables from the Makefile. Combined with needing to then run promtool via docker, it all got a bit messy. (That was admittedly before the 8 make variables that appeared!)
Ack, will refactor into a script :)
| --version= | ||
|
|
||
| .PHONY: kind-install-cilium-metrics | ||
| kind-install-cilium-metrics: check_deps kind-ready |
There was a problem hiding this comment.
Please also add a comment at the end for help text. You can see how it shows up using make help | grep install.
| $(CILIUM_CLI) install \ | ||
| --chart-directory=$(ROOT_DIR)/install/kubernetes/cilium \ | ||
| $(KIND_VALUES_FILES) \ | ||
| --helm-values=$(ROOT_DIR)/contrib/testing/kind-metrics.yaml \ |
There was a problem hiding this comment.
Actually, is this something we should just integrate into an existing target? I see this is a new target but it's mostly just adding the extra metrics. I don't see this new target being used anywhere. Maybe we can collectively as a developer community be more consistent about metrics if the default dev install targets also configure the metrics(?)
There was a problem hiding this comment.
Yeah, I did think about just adding them to the base kind-common.yaml. But then I didn't want to disrupt the status quo of everyone else who uses make kind*. I can just make it default if it's cleaner. :)
There was a problem hiding this comment.
Do you know if there's anything inconvenient about enabling this by default? If yes, that could be a reason to keep it separate. If no, I'd be somewhat inclined to just update it for everyone so all developers see the same thing. If someone complains, we can reopen the discussion.
There was a problem hiding this comment.
I can't think of anything much, but that doesn't mean there isn't an impact! But, it's a bit moot, I've got a replacement PR coming in.
| update-metrics: OPERATOR_METRICS_DOC := ./observability/feature-metrics-operator.txt | ||
| update-metrics: OPERATOR_METRICS_PREFIX := cilium_operator_feature | ||
| update-metrics: OPERATOR_METRICS_SEPARATORS := adv_connect_and_lb | ||
| update-metrics: |
There was a problem hiding this comment.
+1 would be good to follow the current pattern for Documentation/update-*
| --env INCREMENTAL=$(INCREMENTAL) | ||
| DOCKER_RUN := $(DOCKER_CTR) $(DOCS_BUILDER_IMG) | ||
|
|
||
| PROMTOOL_IMG ?= quay.io/prometheus/prometheus:latest |
There was a problem hiding this comment.
Let's pin this to a specific sha and let renovate handle versioning. We don't want to arbitrarily break the tree because prometheus releases a new version of their linter.
nit, not a requirement for this PR, but .github/workflows/tests-smoke.yaml also pulls in promtool, maybe that could be rebased to invoke the same check rather than open-coding the install & use of promtool in the GitHub workflow.
|
|
||
| if ! git diff --quiet -- "${metrics_dir}" ; then | ||
| git --no-pager diff "${metrics_dir}" | ||
| echo "HINT: to fix this, install via 'make kind-install-cilium-metrics' and run 'make -C Documentation update-metrics'" |
There was a problem hiding this comment.
nit: Idly wondering if we could have a model more similar to cmdref where we don't have to run a full cluster & docker image in order to test, but if we could instead just build cells with the metrics related code and then invoke a new entrypoint with the metrics configuration in order to validate. I'm not sure how much work this would be but it might make it easier for us to just expect someone to run make -C Documentation update-metrics, have that do a similar dependency graph to update-cmdref and directly invoke static binaries rather than requiring the contributor to do extra steps & setup
There was a problem hiding this comment.
It would be nicer if this can just be spat out of the build somehow, but I did it as-is just to get something in quickly. If the environment variable(s) are instead added to kind-common, in the interim, I just echo the variable that needs to be set and the make -C ...? Then in future this can be updated to remove the dependency on the env variable being set.
|
Because I've reworked how this is done to better align to the I will close this. |
Following #44114 CI will now fail if feature metric documentation is out of sync. However, as noted, the developer U/X for this is poor.
This PR:
Introduces a new kind install target:
make kind-install-cilium-metrics.This can be run to install Cilium with the necessary config to express default metrics. This is required to auto-generate the documentation.
Introduces a new docs target:
make -C Documentation update-metrics.This queries the installed kind cluster and auto-generates the feature metric files in Documentation/observability.
Updates the
tests-smoke.yamlfile to better express any detected sync issues for these files, and also mentions the new build targets that can be used to re-generate them from command line.The standard
make -C Documentation checkis also updated to check the metrics files.Note: the smoke test workflow is now duplicating some logic, but thought it best to leave this alone for now until these changes are accepted. I envisage a follow-up PR in due course that could split the testing out into its own workflow and refactoring to reducing duplication.
CI Test example
Screenshot below from test CI run with a commit that modifies one of the documents.