fix(chart): set valid job label on pod/service monitors #8162
cert-manager-prow[bot] merged 1 commit into cert-manager:master from LiquidPL:fix-job-label
Hi @LiquidPL. Thanks for your PR. I'm waiting for a cert-manager member to verify that this patch is reasonable to test.
Apologies for the large diff in |
😆 The diff looks sane to me, but is probably better done in a separate commit/PR. /ok-to-test |
scrapeTimeout: 30s

# The label on the service resource that will be used as the job label for all metrics.
jobLabel: app.kubernetes.io/name
Would it ever be helpful for a user to configure this label?
To be honest I'm not sure. I figured there might be some obscure case that makes it worthwhile to allow configuring it.
It seems like the best practice is to use a static value for all app instances doing the same thing.
For job labels in Prometheus, use the value to group targets that perform the same function, like job="frontend" or job="redis". Best practices include using lowercase and hyphens for names, keeping them descriptive but concise, and avoiding high-cardinality values like user IDs or specific request IDs. This ensures consistent grouping, better organization, and improved query performance by preventing the creation of too many unique time series.
I'll vote for starting simple with a hard-coded value for the jobLabel field (app.kubernetes.io/name). We can always make it configurable later on, when we have evidence that it's needed. This was also suggested in the referenced issue.
Sounds good to me.
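As a rough illustration of the field discussed in this thread, here is a minimal sketch of a Service and a ServiceMonitor, assuming the Prometheus Operator CRDs are installed. The resource names, the `http-metrics` port name, port 9402 and the 60s interval are assumptions made for this example and are not taken from this PR's diff; the chart's actual templates may differ.

```yaml
# Hypothetical manifests for illustration only; not the chart's rendered output.
apiVersion: v1
kind: Service
metadata:
  name: my-release-cert-manager            # assumed name; depends on the Helm release
  namespace: cert-manager                  # assumed namespace
  labels:
    app.kubernetes.io/name: cert-manager   # this label's value becomes the job label
spec:
  selector:
    app.kubernetes.io/name: cert-manager
  ports:
    - name: http-metrics                   # assumed port name
      port: 9402                           # assumed metrics port
      targetPort: 9402
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-release-cert-manager            # assumed name
  namespace: cert-manager
spec:
  # jobLabel names a label on the Service; it is not a literal job name.
  jobLabel: app.kubernetes.io/name
  selector:
    matchLabels:
      app.kubernetes.io/name: cert-manager
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 60s                        # assumed; must not be shorter than scrapeTimeout
      scrapeTimeout: 30s
```

With manifests along these lines, scraped series would be expected to carry job="cert-manager" regardless of the release name or namespace, because the job label is read from the value of the Service's app.kubernetes.io/name label.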
# +docs:property
nodeSelector: {}
@LiquidPL, can you try to make the release note a bit more descriptive? This "could" be a breaking change for some users. |
Sorry, my statement is untrue. There has been a misunderstanding about how the |
Is the current version okay? |
Indeed! Thanks! Are you able to squash everything into a single commit, please? And remove the unintended change. |
Signed-off-by: Krzysztof Gutkowski <krzysio.gutkowski@gmail.com>
All done now. 🚀 |
erikgb left a comment
Thanks a lot for working on this, @LiquidPL! 🚀 I couldn't find an official recommendation about using the app.kubernetes.io/name here, but it seems like a good choice considering the more general recommendations for this field:
- The Prometheus / monitoring best practices (e.g. in blogs and community guides) repeatedly caution against high-cardinality or dynamically changing label values. Labels should be stable and meaningful.
- The CNCF blog “Prometheus Labels: Understanding and Best Practices” emphasizes using consistent and meaningful keys, and avoiding dynamic or high-cardinality labels.
- More broadly, in “Prometheus Best Practices” style articles, one of the first rules is “Don’t use high cardinality labels.” You can infer that using a label that changes often (e.g. version, instance ID) is ill-advised.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: erikgb
📢 A new pre-release is available which contains this fix or feature: Please test and report back. |
Can confirm that on a fresh install of |
…#4581)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [cert-manager/cert-manager](https://github.com/cert-manager/cert-manager) | minor | `v1.19.4` → `v1.20.0` |

---

### Release Notes

<details>
<summary>cert-manager/cert-manager (cert-manager/cert-manager)</summary>

### [`v1.20.0`](https://github.com/cert-manager/cert-manager/releases/tag/v1.20.0)

[Compare Source](cert-manager/cert-manager@v1.19.4...v1.20.0)

cert-manager is the easiest way to automatically manage certificates in Kubernetes and OpenShift clusters.

v1.20.0 adds support for the new ListenerSet resource, adds support for Azure Private DNS; parentRefs are no longer required when using ACME with Gateway API, and OtherNames was promoted to Beta.

#### Changes by Kind

##### Feature

- Added a set of flags to permit setting NetworkPolicy across all deployed containers. Remove redundant global IP ranges from example policies. ([#8370](cert-manager/cert-manager#8370), [@jcpunk](https://github.com/jcpunk))
- Added selectable fields to custom resource definitions for .spec.issuerRef.{group, kind, name} ([#8256](cert-manager/cert-manager#8256), [@tareksha](https://github.com/tareksha))
- Added support for specifying `imagePullSecrets` in the `startupapicheck-job` Helm template to enable pulling images from private registries. ([#8186](cert-manager/cert-manager#8186), [@mathieu-clnk](https://github.com/mathieu-clnk))
- Added 'extraContainers' helm chart value, allowing the deployment of arbitrary sidecar containers within the cert-manager operator pod. This can be used to support, for e.g., AWS IAM Roles Anywhere for Route53 DNS01 verification. ([#8355](cert-manager/cert-manager#8355), [@dancmeyers](https://github.com/dancmeyers))
- Added `parentRef` override annotations on the Certificate resource. ([#8518](cert-manager/cert-manager#8518), [@hjoshi123](https://github.com/hjoshi123))
- Added support for azure private zones for dns01 issuer. ([#8494](cert-manager/cert-manager#8494), [@hjoshi123](https://github.com/hjoshi123))
- Added support for configuring PEM decoding size limits, allowing operators to handle larger certificates and keys. ([#7642](cert-manager/cert-manager#7642), [@robertlestak](https://github.com/robertlestak))
- Added support for unhealthyPodEvictionPolicy in PodDisruptionBudget ([#7728](cert-manager/cert-manager#7728), [@jcpunk](https://github.com/jcpunk))
- For Venafi provider, read `venafi.cert-manager.io/custom-fields` annotation on Issuer/ClusterIssuer and use it as base with override/append capabilities on Certificate level. ([#8301](cert-manager/cert-manager#8301), [@k0da](https://github.com/k0da))
- Improve error message when CA issuers are misconfigured to use a clashing secret name ([#8374](cert-manager/cert-manager#8374), [@majiayu000](https://github.com/majiayu000))
- Introduce a new Ingress annotation `acme.cert-manager.io/http01-ingress-ingressclassname` to override `http01.ingress.ingressClassName` field in HTTP-01 challenge solvers. ([#8244](cert-manager/cert-manager#8244), [@lunarwhite](https://github.com/lunarwhite))
- Update `global.nodeSelector` to helm chart to perform a `merge` and allow for a single `nodeSelector` to be set across all services. ([#8195](cert-manager/cert-manager#8195), [@StingRayZA](https://github.com/StingRayZA))
- Vault issuers will now include the Vault server address as one of the default audiences on generated service account tokens. ([#8228](cert-manager/cert-manager#8228), [@terinjokes](https://github.com/terinjokes))
- Added experimental `XListenerSet` feature gate ([#8394](cert-manager/cert-manager#8394), [@hjoshi123](https://github.com/hjoshi123))

##### Documentation

- Add GWAPI documentation to NOTES.TXT in helm chart ([#8353](cert-manager/cert-manager#8353), [@jaxels10](https://github.com/jaxels10))

##### Bug or Regression

- Adds logs for cases when acme server returns us a fatal error in the order controller ([#8199](cert-manager/cert-manager#8199), [@Peac36](https://github.com/Peac36))
- Fixed an issue where kind or group in the issuerRef of a Certificate was omitted, upgrading to 1.19.x incorrectly caused the certificate to be renewed ([#8160](cert-manager/cert-manager#8160), [@inteon](https://github.com/inteon))
- Changes to the Duration and RenewBefore annotations on ingress and gateway-api resources will now trigger certificate updates. ([#8232](cert-manager/cert-manager#8232), [@eleanor-merry](https://github.com/eleanor-merry))
- Fix an issue where ACME challenge TXT records are not cleaned up when there are many resource records in CloudDNS. ([#8456](cert-manager/cert-manager#8456), [@tkna](https://github.com/tkna))
- Fix unregulated retries with the DigitalOcean DNS-01 solver. Add full detailed DNS-01 errors to the events attached to the Challenge, for easier debugging ([#8221](cert-manager/cert-manager#8221), [@wallrj-cyberark](https://github.com/wallrj-cyberark))
- Fixed an infinite re-issuance loop that could occur when an issuer returns a certificate with a public key that doesn't match the CSR. The issuing controller now validates the certificate before storing it and fails with backoff on mismatch. ([#8403](cert-manager/cert-manager#8403), [@calm329](https://github.com/calm329))
- Fixed an issue where HTTP-01 challenges failed when the Host header contains an IPv6 address. This means that users can now issue IP address certificates for IPv6 address subjects. ([#8424](cert-manager/cert-manager#8424), [@SlashNephy](https://github.com/SlashNephy))
- Fixed the HTTP-01 Gateway solver creating invalid HTTPRoutes by not setting spec.hostnames when the challenge DNSName is an IP address. ([#8443](cert-manager/cert-manager#8443), [@alviss7](https://github.com/alviss7))
- Revert API defaults for issuer reference kind and group introduced in 0.19.0 ([#8173](cert-manager/cert-manager#8173), [@erikgb](https://github.com/erikgb))
- Security (MODERATE): Fix a potential panic in the cert-manager controller when a DNS response in an unexpected order was cached. If an attacker was able to modify DNS responses (or if they controlled the DNS server) it was possible to cause denial of service for the cert-manager controller. ([#8469](cert-manager/cert-manager#8469), [@SgtCoDFish](https://github.com/SgtCoDFish))
- Update Go to `v1.25.5` to fix `CVE-2025-61727` and `CVE-2025-61729` ([#8290](cert-manager/cert-manager#8290), [@octo-sts](https://github.com/octo-sts)\[bot])
- When Prometheus monitoring is enabled, the metrics label is now set to the intended value of `cert-manager`. Previously, it was set depending on various factors (namespace cert-manager is installed in and/or Helm release name). ([#8162](cert-manager/cert-manager#8162), [@LiquidPL](https://github.com/LiquidPL))

##### Other (Cleanup or Flake)

- Promoted the OtherNames feature to Beta and enabled it by default ([#8288](cert-manager/cert-manager#8288), [@wallrj-cyberark](https://github.com/wallrj-cyberark))
- Promoting `xlistenerset` feature gate to `listenerset` ([#8501](cert-manager/cert-manager#8501), [@hjoshi123](https://github.com/hjoshi123))
- Rebranding of the Venafi Issuer to CyberArk ([#8215](cert-manager/cert-manager#8215), [@iossifbenbassat123](https://github.com/iossifbenbassat123))
- Switched to SSA for challenge finalizer updates ([#8519](cert-manager/cert-manager#8519), [@inteon](https://github.com/inteon))
- The default container user (UID) is now 65532 (previously 1000) and the default container group (GID) is now 65532 (previously 0) ([#8408](cert-manager/cert-manager#8408), [@wallrj-cyberark](https://github.com/wallrj-cyberark))
- The feature-gate DefaultPrivateKeyRotationPolicyAlways moved from Beta to GA and can no longer be disabled. ([#8287](cert-manager/cert-manager#8287), [@wallrj-cyberark](https://github.com/wallrj-cyberark))
- Update cert-manager's ACME client, forked from golang/x/crypto ([#8268](cert-manager/cert-manager#8268), [@SgtCoDFish](https://github.com/SgtCoDFish))
- Use the latest version of Kyverno (1.16.2) in the best-practice installation tests ([#8389](cert-manager/cert-manager#8389), [@wallrj-cyberark](https://github.com/wallrj-cyberark))
- We stopped testing with Contour due to it not supporting the new XListenerSet resource, and moved to kgateway. ([#8426](cert-manager/cert-manager#8426), [@hjoshi123](https://github.com/hjoshi123))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/4581
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Pull Request Motivation
Fixes #7088.
As per the Prometheus Operator documentation, the jobLabel field on both service and pod monitors refers to a label on the monitored resource, which is set as the label of metrics pulled into Prometheus.
If that field is set to a label which doesn't exist on the respective resource, the metrics label will default to some other value (<namespace>/<resource name> for pod monitors, <resource name> for service monitors). This can break things: for instance, cert-manager-mixin expects the label to be set to cert-manager, whereas while using a pod monitor the label is set to cert-manager/cert-manager, causing the absence alerts to fire when cert-manager is running perfectly fine.
This PR will set the field's value to app.kubernetes.io/name, which will set the metric job label to the previously intended cert-manager, as well as make the value configurable via Helm values.
Kind
/kind bug
Release Note