Skip to content

feat(metrics): adding certmanager_certificate_challenge_status metric#7736

Merged
cert-manager-prow[bot] merged 1 commit intocert-manager:masterfrom
hjoshi123:feat/cert-challenge-status-metrics
Jul 8, 2025
Merged

feat(metrics): adding certmanager_certificate_challenge_status metric#7736
cert-manager-prow[bot] merged 1 commit intocert-manager:masterfrom
hjoshi123:feat/cert-challenge-status-metrics

Conversation

@hjoshi123
Copy link
Copy Markdown
Collaborator

@hjoshi123 hjoshi123 commented May 6, 2025

Pull Request Motivation

feature

Fixes #7700. This PR adds the new challenge status metric with a controller which monitors the challenges.

/kind

Release Note

Added `certmanager_certificate_challenge_status` Prometheus metric.

@cert-manager-prow
Copy link
Copy Markdown
Contributor

@hjoshi123: The label(s) kind/release, kind/note cannot be applied, because the repository doesn't have them.

Details

In response to this:

Pull Request Motivation

Feature

#7700. This PR adds the new challenge status metric with a controller which monitors the challenges.

/kind

Release Note

Added `certmanager_certificate_challenge_status` prometheus metric.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@cert-manager-prow cert-manager-prow bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. area/acme Indicates a PR directly modifies the ACME Issuer code needs-kind Indicates a PR lacks a `kind/foo` label and requires one. area/monitoring Indicates a PR or issue relates to monitoring needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 6, 2025
@cert-manager-prow
Copy link
Copy Markdown
Contributor

Hi @hjoshi123. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@cert-manager-prow cert-manager-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 6, 2025
@hjoshi123
Copy link
Copy Markdown
Collaborator Author

/kind feature

@cert-manager-prow cert-manager-prow bot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 6, 2025
@ThatsMrTalbot
Copy link
Copy Markdown
Contributor

/ok-to-test

@cert-manager-prow cert-manager-prow bot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 15, 2025
@cert-manager-prow cert-manager-prow bot added area/testing Issues relating to testing dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. and removed dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. labels May 20, 2025
@hjoshi123 hjoshi123 force-pushed the feat/cert-challenge-status-metrics branch from 2812dfc to 501a401 Compare May 20, 2025 14:47
@cert-manager-prow cert-manager-prow bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. and removed dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. labels May 20, 2025
@hjoshi123
Copy link
Copy Markdown
Collaborator Author

@ThatsMrTalbot there was also talk of adding age of challenge as a label.. I was thinking of a different. metric for that since this metric is only supposed to tell the status.. what's your take on that?

@ThatsMrTalbot
Copy link
Copy Markdown
Contributor

@ThatsMrTalbot there was also talk of adding age of challenge as a label.. I was thinking of a different. metric for that since this metric is only supposed to tell the status.. what's your take on that?

Age should not be a label, age is a constantly changing value, having it as a label would introduce high cardinality

@hjoshi123
Copy link
Copy Markdown
Collaborator Author

@ThatsMrTalbot there was also talk of adding age of challenge as a label.. I was thinking of a different. metric for that since this metric is only supposed to tell the status.. what's your take on that?

Age should not be a label, age is a constantly changing value, having it as a label would introduce high cardinality

Ah yes I didnt think of that.. so how would we solve it if we wanted to? it doesnt have to be this PR I am guessing but in general.

@ThatsMrTalbot
Copy link
Copy Markdown
Contributor

Ah yes I didnt think of that.. so how would we solve it if we wanted to? it doesnt have to be this PR I am guessing but in general.

The way I have seen time metrics work in the past is having a metric that contains a unix timestamp, for example kube_cronjob_status_last_schedule_time in kube-state-metric. These kind of metrics can be used with the time function to get relative time (time() - metric value).

@ThatsMrTalbot
Copy link
Copy Markdown
Contributor

The code looks good, I want to run it locally, make sure it behaves as the code reads.

@cert-manager-prow cert-manager-prow bot added kind/design Categorizes issue or PR as related to design. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 6, 2025
@hjoshi123
Copy link
Copy Markdown
Collaborator Author

Code looks great now - can you squash your commits before I approve

@ThatsMrTalbot looks like I messed up while squashing commits through git rebase -i. Do you think its better to reopen a different PR?

@erikgb
Copy link
Copy Markdown
Member

erikgb commented Jul 6, 2025

Should be no need to create a new PR. You just need som git-fu. 😉

This recipe usually works well for me. Assuming your fork is the origin remote and upstream is the original cert-manager remote.

git fetch --all
git checkout your-branch
git reset --hard upstream/master
git merge --squash origin/your-branch

Now check that all eventual conflicts are resolved and the changeset in your working area looks good. Then just commit and force push.

git commit -am "your-commit-message"
git push -f

Before the last step, ensure your commit looks good. Good luck! 👍

@hjoshi123 hjoshi123 force-pushed the feat/cert-challenge-status-metrics branch from d159660 to bc2bbe3 Compare July 6, 2025 22:05
@cert-manager-prow cert-manager-prow bot added area/ca Indicates a PR directly modifies the CA Issuer code area/vault Indicates a PR directly modifies the Vault Issuer code and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 6, 2025
@hjoshi123 hjoshi123 force-pushed the feat/cert-challenge-status-metrics branch 2 times, most recently from 381b889 to 35841d5 Compare July 6, 2025 22:17
@cert-manager-prow cert-manager-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 6, 2025
@hjoshi123 hjoshi123 force-pushed the feat/cert-challenge-status-metrics branch from 35841d5 to f467393 Compare July 6, 2025 23:43
@cert-manager-prow
Copy link
Copy Markdown
Contributor

cert-manager-prow bot commented Jul 7, 2025

@hjoshi123: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cert-manager-master-e2e-v1-32 a2b366c link true /test pull-cert-manager-master-e2e-v1-32

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hjoshi123
Copy link
Copy Markdown
Collaborator Author

Thank you @erikgb @ThatsMrTalbot for the git-fu suggestions. The PR should be ready to merge now. Let me know if there are some final changes needed.

func TestMetricsController(t *testing.T) {
config, stopFn := framework.RunControlPlane(t)
t.Cleanup(stopFn)
defer stopFn()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has t.Cleanup been changed to defer?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point.. I think this was one of the merge errors I had and forgot to clean that up (because my base branch was before the master had this code I guess). Will quickly amend it now

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@hjoshi123 hjoshi123 force-pushed the feat/cert-challenge-status-metrics branch from f467393 to 002e2bd Compare July 7, 2025 16:52
}
}()
defer func() {
shutdownCtx, cancel := context.WithTimeout(context.WithoutCancel(t.Context()), time.Second*5)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that looks to have changed - was this intentional or part of the merge fun?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh missed this.. yup seems to be part of the merge fun.. fixing it.. sorry for the minor mistakes 😅

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: hjoshi123 <hemant.joshi@vizio.com>
@hjoshi123 hjoshi123 force-pushed the feat/cert-challenge-status-metrics branch from 002e2bd to 2558e46 Compare July 7, 2025 21:15
@ThatsMrTalbot
Copy link
Copy Markdown
Contributor

ThatsMrTalbot commented Jul 8, 2025

Thanks for your persistence on this!

/lgtm
/approve

@cert-manager-prow cert-manager-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jul 8, 2025
@cert-manager-prow
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ThatsMrTalbot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 8, 2025
@cert-manager-prow cert-manager-prow bot merged commit cc4b6f3 into cert-manager:master Jul 8, 2025
6 checks passed
@hjoshi123
Copy link
Copy Markdown
Collaborator Author

Thank you @ThatsMrTalbot for helping me out and briefing me on things. Also thank you @erikgb

alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Oct 8, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [cert-manager](https://cert-manager.io) ([source](https://github.com/cert-manager/cert-manager)) | minor | `v1.18.2` -> `v1.19.0` |

---

### Release Notes

<details>
<summary>cert-manager/cert-manager (cert-manager)</summary>

### [`v1.19.0`](https://github.com/cert-manager/cert-manager/releases/tag/v1.19.0)

[Compare Source](cert-manager/cert-manager@v1.18.2...v1.19.0)

cert-manager is the easiest way to automatically manage certificates in Kubernetes and OpenShift clusters.

This release focuses on expanding platform compatibility, improving deployment flexibility, enhancing observability, and addressing key reliability issues.

> 📖  Read the full release notes at cert-manager.io: <https://cert-manager.io/docs/releases/release-notes/release-notes-1.19>

Changes since `v1.18.0`:

#### Feature

- Add IPv6 rules to the default network policy ([#&#8203;7726](cert-manager/cert-manager#7726), [@&#8203;jcpunk](https://github.com/jcpunk))
- Add `global.nodeSelector` to helm chart to allow for a single `nodeSelector` to be set across all services. ([#&#8203;7818](cert-manager/cert-manager#7818), [@&#8203;StingRayZA](https://github.com/StingRayZA))
- Add a feature gate to default to Ingress `pathType` `Exact` in ACME HTTP01 Ingress challenge solvers. ([#&#8203;7795](cert-manager/cert-manager#7795), [@&#8203;sspreitzer](https://github.com/sspreitzer))
- Add generated `applyconfigurations` allowing clients to make type-safe server-side apply requests for cert-manager resources. ([#&#8203;7866](cert-manager/cert-manager#7866), [@&#8203;erikgb](https://github.com/erikgb))
- Added API defaults to issuer references group (cert-manager.io) and kind (Issuer). ([#&#8203;7414](cert-manager/cert-manager#7414), [@&#8203;erikgb](https://github.com/erikgb))
- Added `certmanager_certificate_challenge_status` Prometheus metric. ([#&#8203;7736](cert-manager/cert-manager#7736), [@&#8203;hjoshi123](https://github.com/hjoshi123))
- Added `protocol` field for `rfc2136` DNS01 provider ([#&#8203;7881](cert-manager/cert-manager#7881), [@&#8203;hjoshi123](https://github.com/hjoshi123))
- Added experimental field `hostUsers` flag to all pods. Not set by default. ([#&#8203;7973](cert-manager/cert-manager#7973), [@&#8203;hjoshi123](https://github.com/hjoshi123))
- Support configurable resource requests and limits for ACME HTTP01 solver pods through ClusterIssuer and Issuer specifications, allowing granular resource management that overrides global `--acme-http01-solver-resource-*` settings. ([#&#8203;7972](cert-manager/cert-manager#7972), [@&#8203;lunarwhite](https://github.com/lunarwhite))
- The `CAInjectorMerging` feature has been promoted to BETA and is now enabled by default ([#&#8203;8017](cert-manager/cert-manager#8017), [@&#8203;ThatsMrTalbot](https://github.com/ThatsMrTalbot))
- The controller, webhook and ca-injector now log their version and git commit on startup for easier debugging and support. ([#&#8203;8072](cert-manager/cert-manager#8072), [@&#8203;prasad89](https://github.com/prasad89))
- Updated `certificate` metrics to the collector approach. ([#&#8203;7856](cert-manager/cert-manager#7856), [@&#8203;hjoshi123](https://github.com/hjoshi123))

#### Bug or Regression

- ACME: Increased challenge authorization timeout to 2 minutes to fix `error waiting for authorization` ([#&#8203;7796](cert-manager/cert-manager#7796), [@&#8203;hjoshi123](https://github.com/hjoshi123))
- BUGFIX: permitted URI domains were incorrectly used to set the excluded URI domains in the CSR's name constraints ([#&#8203;7816](cert-manager/cert-manager#7816), [@&#8203;kinolaev](https://github.com/kinolaev))
- Enforced ACME HTTP-01 solver validation to properly reject configurations when multiple ingress options (`class`, `ingressClassName`, `name`) are specified simultaneously ([#&#8203;8021](cert-manager/cert-manager#8021), [@&#8203;lunarwhite](https://github.com/lunarwhite))
- Increase maximum sizes of PEM certificates and chains which can be parsed in cert-manager, to handle leaf certificates with large numbers of DNS names or other identities ([#&#8203;7961](cert-manager/cert-manager#7961), [@&#8203;SgtCoDFish](https://github.com/SgtCoDFish))
- Reverted adding the `global.rbac.disableHTTPChallengesRole` Helm option. ([#&#8203;7836](cert-manager/cert-manager#7836), [@&#8203;inteon](https://github.com/inteon))
- This change removes the `path` label of core ACME client metrics and will require users to update their monitoring dashboards and alerting rules if using those metrics. ([#&#8203;8109](cert-manager/cert-manager#8109), [@&#8203;mladen-rusev-cyberark](https://github.com/mladen-rusev-cyberark))
- Use the latest version of `ingress-nginx` in E2E tests to ensure compatibility ([#&#8203;7792](cert-manager/cert-manager#7792), [@&#8203;wallrj](https://github.com/wallrj))

#### Other (Cleanup or Flake)

- Helm: Fix naming template of `tokenrequest` RoleBinding resource to improve consistency ([#&#8203;7761](cert-manager/cert-manager#7761), [@&#8203;lunarwhite](https://github.com/lunarwhite))
- Improve error messages when certificates, CRLs or private keys fail admission due to malformed or missing PEM data ([#&#8203;7928](cert-manager/cert-manager#7928), [@&#8203;SgtCoDFish](https://github.com/SgtCoDFish))
- Major upgrade of Akamai SDK. NOTE: The new version has not been fully tested end-to-end due to the lack of cloud infrastructure. ([#&#8203;8003](cert-manager/cert-manager#8003), [@&#8203;hjoshi123](https://github.com/hjoshi123))
- Update kind images to include the Kubernetes 1.33 node image ([#&#8203;7786](cert-manager/cert-manager#7786), [@&#8203;wallrj](https://github.com/wallrj))
- Use `maps.Copy` for cleaner map handling ([#&#8203;8092](cert-manager/cert-manager#8092), [@&#8203;quantpoet](https://github.com/quantpoet))
- Vault: Migrate Vault E2E add-on tests from deprecated `vault-client-go` to the new `vault/api` client. ([#&#8203;8059](cert-manager/cert-manager#8059), [@&#8203;armagankaratosun](https://github.com/armagankaratosun))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4xMzUuNCIsInVwZGF0ZWRJblZlciI6IjQxLjEzNS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJjaGFydCJdfQ==-->

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1711
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
@wallrj-cyberark
Copy link
Copy Markdown
Member

@hjoshi123 We have released this. Please test and feedback: https://github.com/cert-manager/cert-manager/releases/tag/v1.19.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/acme/dns01 Indicates a PR modifies ACME DNS01 provider code area/acme/http01 Indicates a PR modifies ACME HTTP01 provider code area/acme Indicates a PR directly modifies the ACME Issuer code area/api Indicates a PR directly modifies the 'pkg/apis' directory area/ca Indicates a PR directly modifies the CA Issuer code area/deploy Indicates a PR modifies deployment configuration area/monitoring Indicates a PR or issue relates to monitoring area/testing Issues relating to testing area/vault Indicates a PR directly modifies the Vault Issuer code dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/design Categorizes issue or PR as related to design. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose metrics regarding challenge status

4 participants