Fix unregulated retries with the DigitalOcean, Azure DNS, and AWS Route53 DNS-01 solver#8221
Pull Request Overview
This PR refactors error handling for AWS Route53 and Azure DNS providers by removing custom error stabilization logic and consolidating it into a centralized location. The changes ensure that error messages remain stable (without unique request IDs) to prevent spurious challenge updates.
- Removed removeReqID() function from Route53 provider and stabilizeError() function from Azure DNS provider
- Added centralized stabilizeSolverErrorMessage() function in the ACME challenge controller that handles error message stabilization for both AWS and Azure SDK errors
- Updated error formatting to use %w instead of %s or %v for proper error wrapping
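The centralized stabilization described in the list above can be illustrated with a minimal sketch. It is not the PR's stabilizeSolverErrorMessage(): the sdkError type, the failWith helper, and the placeholder text are assumptions made for illustration, and only standard-library calls are used.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sdkError is a hypothetical stand-in for an SDK error whose message
// contains a unique request ID. It is not a real AWS or Azure type.
type sdkError struct {
	RequestID string
}

func (e *sdkError) Error() string {
	return fmt.Sprintf("https response error StatusCode: 403, RequestID: %s", e.RequestID)
}

// redacted is an illustrative placeholder, not the exact text used by the PR.
const redacted = "<redacted SDK error: see events and logs for details>"

// stabilize returns a message that is identical across retries: if the error
// chain contains an *sdkError, that error's text is swapped for a fixed
// placeholder while the stable wrapping context is kept.
func stabilize(err error) string {
	full := err.Error()
	var target *sdkError
	if errors.As(err, &target) {
		return strings.ReplaceAll(full, target.Error(), redacted)
	}
	return full
}

// failWith simulates a solver call that fails with the given request ID and
// wraps the SDK error with %w so that callers can still reach it.
func failWith(requestID string) error {
	return fmt.Errorf("failed to determine hosted zone ID: %w", &sdkError{RequestID: requestID})
}

func main() {
	first := failWith("37cfa379")
	second := failWith("fc78b20c")
	fmt.Println(first.Error() == second.Error())       // raw messages differ: false
	fmt.Println(stabilize(first) == stabilize(second)) // stabilized messages match: true
	fmt.Println(stabilize(first))
}
```

The sketch depends on the lower layers wrapping with %w; with %s the errors.As lookup would find nothing and the unstable message would pass through unchanged.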
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pkg/issuer/acme/dns/route53/route53.go | Removed removeReqID() function and its calls; updated error messages to use %w for proper error wrapping |
| pkg/issuer/acme/dns/route53/route53_test.go | Removed tests for the deleted removeReqID() function |
| pkg/issuer/acme/dns/azuredns/azuredns.go | Removed stabilizeError() function and NormalizedError type; updated error messages to use %w for proper error wrapping |
| pkg/issuer/acme/dns/azuredns/azuredns_test.go | Removed tests for the deleted stabilizeError() function |
| pkg/issuer/acme/dns/dns.go | Updated error message to use %w for proper error wrapping |
| pkg/controller/acmechallenges/sync.go | Added centralized stabilizeSolverErrorMessage() function and applied it to error messages stored in challenge status |
| pkg/controller/acmechallenges/sync_test.go | Added tests for the new stabilizeSolverErrorMessage() function |
//
// TODO(wallrj): Ideally this would not be necessary. It should be possible to
// add the unique error message to the status without triggering another
// reconcile.
I'll see if I can get that alternative approach working, in :
But that would be too big a change to backport to release-1.19.
This simpler change in this PR might be backported so that we can fix the problem now for Digital Ocean users.
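To make the problem behind the TODO concrete, here is a toy simulation, assuming (as the TODO implies) that every change to the Challenge status leads to another reconcile. The challenge struct, syncOnce, and the message helpers are hypothetical names, not cert-manager code.

```go
package main

import "fmt"

// challenge is a toy stand-in for the Challenge resource; only the status
// reason matters for this illustration.
type challenge struct {
	Reason string
}

// syncOnce mimics one reconcile: it records the solver's error message in
// the status and reports whether the object changed. The assumption is that
// a changed object is written back and the write queues another reconcile.
func syncOnce(ch *challenge, msg string) bool {
	if ch.Reason == msg {
		return false
	}
	ch.Reason = msg
	return true
}

// countWrites runs n reconciles and counts how many changed the status.
func countWrites(n int, message func(attempt int) string) int {
	ch := &challenge{}
	writes := 0
	for i := 0; i < n; i++ {
		if syncOnce(ch, message(i)) {
			writes++
		}
	}
	return writes
}

// uniqueMessage embeds a per-attempt request ID, like the raw SDK errors.
func uniqueMessage(attempt int) string {
	return fmt.Sprintf("401 (request \"id-%d\") Unable to authenticate you", attempt)
}

// stableMessage is the same on every attempt, like a redacted error.
func stableMessage(attempt int) string {
	return "<redacted SDK error: see events and logs for details>"
}

func main() {
	fmt.Println(countWrites(5, uniqueMessage)) // prints 5: every attempt changes the status
	fmt.Println(countWrites(5, stableMessage)) // prints 1: only the first attempt changes it
}
```

With a unique message every attempt produces a status change, which is the unregulated retry loop; with a stable message only the first attempt does, and later retries are left to the normal backoff.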
/retest
}
fullMessage := err.Error()
{
var target *awshttp.ResponseError
Wouldn't it be easier to maintain if we introduced some inversion of control (as is already done in some other parts of the code base) and let each provider provide its own implementation?
I considered it. For example, I considered adding a new "StabilizeErrors" function to the "Provider" interface, but hesitated because it would imply a level of permanence and design endorsement that isn’t appropriate for a workaround. This PR is meant to be a cleanup and extension of existing hacks, not a new architectural direction.
Once the proper fix is in place, this workaround can and should be removed.
I've added a note to the PR description (with the help of Copilot).
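For readers following the thread, the inversion-of-control alternative that was considered and rejected might look roughly like the sketch below. Everything here is hypothetical: the provider interface is a toy, and the StabilizeErrors signature is a guess at what such a hook could be, not something that exists in cert-manager.

```go
package main

import "fmt"

// textError is a trivial error type used to keep the sketch self-contained.
type textError string

func (e textError) Error() string { return string(e) }

// provider is a toy version of a DNS-01 provider interface.
type provider interface {
	Present(fqdn string) error
}

// errorStabilizer is the hypothetical optional hook: a provider that knows
// its errors are unstable can supply its own stable rendering.
type errorStabilizer interface {
	StabilizeErrors(err error) string
}

// plainProvider returns stable errors and does not implement the hook.
type plainProvider struct{}

func (plainProvider) Present(fqdn string) error {
	return textError("zone not found for " + fqdn)
}

// noisyProvider returns errors containing a request ID and implements the hook.
type noisyProvider struct{}

func (noisyProvider) Present(fqdn string) error {
	return textError("401 (request \"186967b9\") Unable to authenticate you")
}

func (noisyProvider) StabilizeErrors(err error) string {
	return "<redacted provider error>"
}

// reasonFor picks the provider's own stabilization when it offers one and
// falls back to the raw message otherwise.
func reasonFor(p provider, err error) string {
	if s, ok := p.(errorStabilizer); ok {
		return s.StabilizeErrors(err)
	}
	return err.Error()
}

func main() {
	for _, p := range []provider{plainProvider{}, noisyProvider{}} {
		err := p.Present("_acme-challenge.example.com.")
		fmt.Println(reasonFor(p, err))
	}
}
```

The trade-off raised in the reply is visible here: the hook becomes part of the provider contract, which is a heavier commitment than a single function in the controller that can be deleted once the proper fix lands.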
Further testing
# resources.yaml
---
# AWS Route53
apiVersion: v1
kind: Namespace
metadata:
  name: aws
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials
  namespace: aws
stringData:
  AWS_ACCESS_KEY_ID: DEADBEEF
  AWS_SECRET_ACCESS_KEY: DEADBEEF
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: issuer-1
  namespace: aws
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ${EMAIL_ADDRESS}
    profile: tlsserver
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - dns01:
        route53:
          region: us-east-2
          accessKeyIDSecretRef:
            name: aws-credentials
            key: AWS_ACCESS_KEY_ID
          secretAccessKeySecretRef:
            name: aws-credentials
            key: AWS_SECRET_ACCESS_KEY
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: www
  namespace: aws
spec:
  secretName: www-tls
  privateKey:
    rotationPolicy: Always
  dnsNames:
  - aws.cert-manager.richard-gcp.jetstacker.net
  usages:
  - digital signature
  - key encipherment
  - server auth
  issuerRef:
    name: issuer-1
    kind: Issuer
    group: cert-manager.io
---
# Azure DNS
apiVersion: v1
kind: Namespace
metadata:
  name: azure
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-credentials
  namespace: azure
stringData:
  SECRET: DEADBEEF
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: issuer-1
  namespace: azure
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ${EMAIL_ADDRESS}
    profile: tlsserver
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - dns01:
        azureDNS:
          resourceGroupName: rg-1
          subscriptionID: sub-1
          clientID: client-1
          tenantID: tenant-1
          clientSecretSecretRef:
            name: azure-credentials
            key: SECRET
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: www
  namespace: azure
spec:
  secretName: www-tls
  privateKey:
    rotationPolicy: Always
  dnsNames:
  - azure.cert-manager.richard-gcp.jetstacker.net
  usages:
  - digital signature
  - key encipherment
  - server auth
  issuerRef:
    name: issuer-1
    kind: Issuer
    group: cert-manager.io
---
# Digital Ocean
apiVersion: v1
kind: Namespace
metadata:
  name: digitalocean
---
apiVersion: v1
kind: Secret
metadata:
  name: digitalocean-credentials
  namespace: digitalocean
stringData:
  TOKEN: DEADBEEF
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: issuer-1
  namespace: digitalocean
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ${EMAIL_ADDRESS}
    profile: tlsserver
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - dns01:
        digitalocean:
          tokenSecretRef:
            name: digitalocean-credentials
            key: TOKEN
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: www
  namespace: digitalocean
spec:
  secretName: www-tls
  privateKey:
    rotationPolicy: Always
  dnsNames:
  - do.cert-manager.richard-gcp.jetstacker.net
  usages:
  - digital signature
  - key encipherment
  - server auth
  issuerRef:
    name: issuer-1
    kind: Issuer
    group: cert-manager.io
$ kubectl describe challenge -A | fgrep -e Name -e Reason -e Warning
Name: www-1-2083194893-1034299529
Namespace: aws
Name: www-1-2083194893
Dns Name: aws.cert-manager.richard-gcp.jetstacker.net
Name: issuer-1
Name: aws-credentials
Name: aws-credentials
Reason: failed to determine Route 53 hosted zone ID: operation error Route 53: ListHostedZonesByName, <redacted AWS SDK error: http.ResponseError: see events and logs for details>
Type Reason Age From Message
Warning PresentError 90s cert-manager-challenges Error presenting challenge: failed to determine Route 53 hosted zone ID: operation error Route 53: ListHostedZonesByName, https response error StatusCode: 403, RequestID: 37cfa379-b028-4273-934e-c848c57935d9, api error InvalidClientTokenId: The security token included in the request is invalid.
Warning PresentError 90s cert-manager-challenges Error presenting challenge: failed to determine Route 53 hosted zone ID: operation error Route 53: ListHostedZonesByName, https response error StatusCode: 403, RequestID: fc78b20c-40d3-44dd-8d6a-d2a4deb6df09, api error InvalidClientTokenId: The security token included in the request is invalid.
Warning PresentError 84s cert-manager-challenges Error presenting challenge: failed to determine Route 53 hosted zone ID: operation error Route 53: ListHostedZonesByName, https response error StatusCode: 403, RequestID: a0dd686f-2fdb-483a-ab5c-3f1017cb7178, api error InvalidClientTokenId: The security token included in the request is invalid.
Warning PresentError 63s cert-manager-challenges Error presenting challenge: failed to determine Route 53 hosted zone ID: operation error Route 53: ListHostedZonesByName, https response error StatusCode: 403, RequestID: d1294754-ca28-4378-9f9a-7829ff4daa40, api error InvalidClientTokenId: The security token included in the request is invalid.
Warning PresentError 22s cert-manager-challenges Error presenting challenge: failed to determine Route 53 hosted zone ID: operation error Route 53: ListHostedZonesByName, https response error StatusCode: 403, RequestID: 64c5325d-24a1-4f90-bb1d-1400eaa433a7, api error InvalidClientTokenId: The security token included in the request is invalid.
Name: www-1-137601487-3491019926
Namespace: azure
Name: www-1-137601487
Dns Name: azure.cert-manager.richard-gcp.jetstacker.net
Name: issuer-1
Name: azure-credentials
Resource Group Name: rg-1
Reason: Zone richard-gcp.jetstacker.net. not found in AzureDNS for domain _acme-challenge.azure.cert-manager.richard-gcp.jetstacker.net.. Err: <redacted Azure SDK error: azidentity.AuthenticationFailedError: see events and logs for details>
Type Reason Age From Message
Warning PresentError 90s cert-manager-challenges Error presenting challenge: Zone richard-gcp.jetstacker.net. not found in AzureDNS for domain _acme-challenge.azure.cert-manager.richard-gcp.jetstacker.net.. Err: ClientSecretCredential authentication failed.
Warning PresentError 90s cert-manager-challenges Error presenting challenge: Zone richard-gcp.jetstacker.net. not found in AzureDNS for domain _acme-challenge.azure.cert-manager.richard-gcp.jetstacker.net.. Err: ClientSecretCredential authentication failed.
Warning PresentError 85s cert-manager-challenges Error presenting challenge: Zone richard-gcp.jetstacker.net. not found in AzureDNS for domain _acme-challenge.azure.cert-manager.richard-gcp.jetstacker.net.. Err: ClientSecretCredential authentication failed.
Warning PresentError 65s cert-manager-challenges Error presenting challenge: Zone richard-gcp.jetstacker.net. not found in AzureDNS for domain _acme-challenge.azure.cert-manager.richard-gcp.jetstacker.net.. Err: ClientSecretCredential authentication failed.
Warning PresentError 24s cert-manager-challenges Error presenting challenge: Zone richard-gcp.jetstacker.net. not found in AzureDNS for domain _acme-challenge.azure.cert-manager.richard-gcp.jetstacker.net.. Err: ClientSecretCredential authentication failed.
Name: www-1-2131901473-3262237110
Namespace: digitalocean
Name: www-1-2131901473
Dns Name: do.cert-manager.richard-gcp.jetstacker.net
Name: issuer-1
Name: digitalocean-credentials
Reason: <redacted DigitalOcean SDK error: godo.ErrorResponse: see events and logs for details>
Type Reason Age From Message
Warning PresentError 91s cert-manager-challenges Error presenting challenge: GET https://api.digitalocean.com/v2/domains/richard-gcp.jetstacker.net/records?type=TXT: 401 (request "186967b9-8430-4ffb-be6f-2fdbf06f71d9") Unable to authenticate you
Warning PresentError 90s cert-manager-challenges Error presenting challenge: GET https://api.digitalocean.com/v2/domains/richard-gcp.jetstacker.net/records?type=TXT: 401 (request "467e41c6-c1f4-47f5-bd43-eee0c1793337") Unable to authenticate you
Warning PresentError 85s cert-manager-challenges Error presenting challenge: GET https://api.digitalocean.com/v2/domains/richard-gcp.jetstacker.net/records?type=TXT: 401 (request "b15242f4-f522-4ca4-b8db-afa5a0543f8d") Unable to authenticate you
Warning PresentError 65s cert-manager-challenges Error presenting challenge: GET https://api.digitalocean.com/v2/domains/richard-gcp.jetstacker.net/records?type=TXT: 401 (request "63f0a3be-fdca-48f7-9808-e82c7f606596") Unable to authenticate you
Warning PresentError 24s cert-manager-challenges Error presenting challenge: GET https://api.digitalocean.com/v2/domains/richard-gcp.jetstacker.net/records?type=TXT: 401 (request "42d9487b-0888-4090-aee7-19a053761dec") Unable to authenticate you
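The backoff between retries can be read off the event ages in the output above. As a small worked example, the sketch below turns the ages of the five AWS PresentError events (90s, 90s, 84s, 63s, 22s) into the intervals between consecutive events. The gaps helper is illustrative only, and kubectl rounds ages, so the intervals are approximate.

```go
package main

import "fmt"

// gaps converts event ages (seconds ago, oldest first) into the intervals
// between consecutive events.
func gaps(ages []int) []int {
	out := make([]int, 0, len(ages))
	for i := 1; i < len(ages); i++ {
		out = append(out, ages[i-1]-ages[i])
	}
	return out
}

func main() {
	// Ages of the AWS PresentError events as printed by kubectl describe.
	fmt.Println(gaps([]int{90, 90, 84, 63, 22})) // prints [0 6 21 41]
}
```

The intervals grow from attempt to attempt, which is what a regulated retry looks like, in contrast to the rapid retries the PR title describes.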
- Redact AWS and Azure SDK http errors in controller error normalizer
- Remove azuredns stabilizeError/NormalizedError and return original errors
- Use %w when wrapping Azure errors so callers can redact them
- Update tests to expect redacted messages and remove obsolete checks
- Stabilize DigitalOcean ErrorResponse errors too

Signed-off-by: Richard Wall <richard.wall@cyberark.com>
/approve
Thanks @wallrj-cyberark for tackling this long-standing issue & investing the time to find a more universal fix for these problems.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dunglas, inteon
/kind bug
📢 A new pre-release is available which contains this fix or feature: Please test and report back.
@wallrj-cyberark Will #7234 be fixed by your PR, too? I noticed you told users to try the alpha and give feedback in #6230 and #8166, but I haven't seen that same message in #7234. |
…#4581) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [cert-manager/cert-manager](https://github.com/cert-manager/cert-manager) | minor | `v1.19.4` → `v1.20.0` | --- ### Release Notes <details> <summary>cert-manager/cert-manager (cert-manager/cert-manager)</summary> ### [`v1.20.0`](https://github.com/cert-manager/cert-manager/releases/tag/v1.20.0) [Compare Source](cert-manager/cert-manager@v1.19.4...v1.20.0) cert-manager is the easiest way to automatically manage certificates in Kubernetes and OpenShift clusters. v1.20.0 adds support for the new ListenerSet resource, adds support for Azure Private DNS; parentRefs are no longer required when using ACME with Gateway API, and OtherNames was promoted to Beta. #### Changes by Kind ##### Feature - Added a set of flags to permit setting NetworkPolicy across all deployed containers. Remove redundant global IP ranges from example policies. ([#​8370](cert-manager/cert-manager#8370), [@​jcpunk](https://github.com/jcpunk)) - Added selectable fields to custom resource definitions for .spec.issuerRef.{group, kind, name} ([#​8256](cert-manager/cert-manager#8256), [@​tareksha](https://github.com/tareksha)) - Added support for specifying `imagePullSecrets` in the `startupapicheck-job` Helm template to enable pulling images from private registries. ([#​8186](cert-manager/cert-manager#8186), [@​mathieu-clnk](https://github.com/mathieu-clnk)) - Added 'extraContainers' helm chart value, allowing the deployment of arbitrary sidecar containers within the cert-manager operator pod. This can be used to support, for e.g., AWS IAM Roles Anywhere for Route53 DNS01 verification. ([#​8355](cert-manager/cert-manager#8355), [@​dancmeyers](https://github.com/dancmeyers)) - Added `parentRef` override annotations on the Certificate resource. ([#​8518](cert-manager/cert-manager#8518), [@​hjoshi123](https://github.com/hjoshi123)) - Added support for azure private zones for dns01 issuer. 
([#​8494](cert-manager/cert-manager#8494), [@​hjoshi123](https://github.com/hjoshi123)) - Added support for configuring PEM decoding size limits, allowing operators to handle larger certificates and keys. ([#​7642](cert-manager/cert-manager#7642), [@​robertlestak](https://github.com/robertlestak)) - Added support for unhealthyPodEvictionPolicy in PodDisruptionBudget ([#​7728](cert-manager/cert-manager#7728), [@​jcpunk](https://github.com/jcpunk)) - For Venafi provider, read `venafi.cert-manager.io/custom-fields` annotation on Issuer/ClusterIssuer and use it as base with override/append capabilities on Certificate level. ([#​8301](cert-manager/cert-manager#8301), [@​k0da](https://github.com/k0da)) - Improve error message when CA issuers are misconfigured to use a clashing secret name ([#​8374](cert-manager/cert-manager#8374), [@​majiayu000](https://github.com/majiayu000)) - Introduce a new Ingress annotation `acme.cert-manager.io/http01-ingress-ingressclassname` to override `http01.ingress.ingressClassName` field in HTTP-01 challenge solvers. ([#​8244](cert-manager/cert-manager#8244), [@​lunarwhite](https://github.com/lunarwhite)) - Update `global.nodeSelector` to helm chart to perform a `merge` and allow for a single `nodeSelector` to be set across all services. ([#​8195](cert-manager/cert-manager#8195), [@​StingRayZA](https://github.com/StingRayZA)) - Vault issuers will now include the Vault server address as one of the default audiences on generated service account tokens. 
([#​8228](cert-manager/cert-manager#8228), [@​terinjokes](https://github.com/terinjokes)) - Added experimental `XListenerSet` feature gate ([#​8394](cert-manager/cert-manager#8394), [@​hjoshi123](https://github.com/hjoshi123)) ##### Documentation - Add GWAPI documentation to NOTES.TXT in helm chart ([#​8353](cert-manager/cert-manager#8353), [@​jaxels10](https://github.com/jaxels10)) ##### Bug or Regression - Adds logs for cases when acme server returns us a fatal error in the order controller ([#​8199](cert-manager/cert-manager#8199), [@​Peac36](https://github.com/Peac36)) - Fixed an issue where kind or group in the issuerRef of a Certificate was omitted, upgrading to 1.19.x incorrectly caused the certificate to be renewed ([#​8160](cert-manager/cert-manager#8160), [@​inteon](https://github.com/inteon)) - Changes to the Duration and RenewBefore annotations on ingress and gateway-api resources will now trigger certificate updates. ([#​8232](cert-manager/cert-manager#8232), [@​eleanor-merry](https://github.com/eleanor-merry)) - Fix an issue where ACME challenge TXT records are not cleaned up when there are many resource records in CloudDNS. ([#​8456](cert-manager/cert-manager#8456), [@​tkna](https://github.com/tkna)) - Fix unregulated retries with the DigitalOcean DNS-01 solver Add full detailed DNS-01 errors to the events attached to the Challenge, for easier debugging ([#​8221](cert-manager/cert-manager#8221), [@​wallrj-cyberark](https://github.com/wallrj-cyberark)) - Fixed an infinite re-issuance loop that could occur when an issuer returns a certificate with a public key that doesn't match the CSR. The issuing controller now validates the certificate before storing it and fails with backoff on mismatch. ([#​8403](cert-manager/cert-manager#8403), [@​calm329](https://github.com/calm329)) - Fixed an issue where HTTP-01 challenges failed when the Host header contains an IPv6 address. 
This means that users can now issue IP address certificates for IPv6 address subjects. ([#​8424](cert-manager/cert-manager#8424), [@​SlashNephy](https://github.com/SlashNephy)) - Fixed the HTTP-01 Gateway solver creating invalid HTTPRoutes by not setting spec.hostnames when the challenge DNSName is an IP address. ([#​8443](cert-manager/cert-manager#8443), [@​alviss7](https://github.com/alviss7)) - Revert API defaults for issuer reference kind and group introduced in 0.19.0 ([#​8173](cert-manager/cert-manager#8173), [@​erikgb](https://github.com/erikgb)) - Security (MODERATE): Fix a potential panic in the cert-manager controller when a DNS response in an unexpected order was cached. If an attacker was able to modify DNS responses (or if they controlled the DNS server) it was possible to cause denial of service for the cert-manager controller. ([#​8469](cert-manager/cert-manager#8469), [@​SgtCoDFish](https://github.com/SgtCoDFish)) - Update Go to `v1.25.5` to fix `CVE-2025-61727` and `CVE-2025-61729` ([#​8290](cert-manager/cert-manager#8290), [@​octo-sts](https://github.com/octo-sts)\[bot]) - When Prometheus monitoring is enabled, the metrics label is now set to the intended value of `cert-manager`. Previously, it was set depending on various factors (namespace cert-manager is installed in and/or Helm release name). 
([#​8162](cert-manager/cert-manager#8162), [@​LiquidPL](https://github.com/LiquidPL)) ##### Other (Cleanup or Flake) - Promoted the OtherNames feature to Beta and enabled it by default ([#​8288](cert-manager/cert-manager#8288), [@​wallrj-cyberark](https://github.com/wallrj-cyberark)) - Promoting `xlistenerset` feature gate to `listenerset` ([#​8501](cert-manager/cert-manager#8501), [@​hjoshi123](https://github.com/hjoshi123)) - Rebranding of the Venafi Issuer to CyberArk ([#​8215](cert-manager/cert-manager#8215), [@​iossifbenbassat123](https://github.com/iossifbenbassat123)) - Switched to SSA for challenge finalizer updates ([#​8519](cert-manager/cert-manager#8519), [@​inteon](https://github.com/inteon)) - The default container user (UID) is now 65532 (previously 1000) and the default container group (GID) is now 65532 (previously 0) ([#​8408](cert-manager/cert-manager#8408), [@​wallrj-cyberark](https://github.com/wallrj-cyberark)) - The feature-gate DefaultPrivateKeyRotationPolicyAlways moved from Beta to GA and can no longer be disabled. ([#​8287](cert-manager/cert-manager#8287), [@​wallrj-cyberark](https://github.com/wallrj-cyberark)) - Update cert-manager's ACME client, forked from golang/x/crypto ([#​8268](cert-manager/cert-manager#8268), [@​SgtCoDFish](https://github.com/SgtCoDFish)) - Use the latest version of Kyverno (1.16.2) in the best-practice installation tests ([#​8389](cert-manager/cert-manager#8389), [@​wallrj-cyberark](https://github.com/wallrj-cyberark)) - We stopped testing with Coutour due to it not supporting the new XListenerSet resource, and moved to kgateway. ([#​8426](cert-manager/cert-manager#8426), [@​hjoshi123](https://github.com/hjoshi123)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. 
♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My41OS4yIiwidXBkYXRlZEluVmVyIjoiNDMuNTkuMiIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiaW1hZ2UiXX0=--> Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/4581 Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net> Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
Fixes: #8166
Fixes: #6230
I've moved the error-message "stabilization" closer to the place where it is needed, so that it is clearer that redaction is only necessary for certain error messages and only where they are used in the Challenge.Status.Reason value.
This has the added benefit that the unredacted errors can be added to events, whereas previously the user had to examine the logs to see the trace IDs and response details for debugging.
- Error stabilization now lives in pkg/controller/acmechallenges/sync.go.
- Changed fmt.Errorf("%s", err) calls to %w so errors can be inspected with errors.As.
/kind cleanup
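Why the switch from %s to %w matters can be shown with a short standard-library example. The apiError type and the wrapS/wrapW helpers are made up for illustration; only fmt.Errorf and errors.As are real APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// apiError is a hypothetical typed error that a caller may want to detect.
type apiError struct {
	Code int
}

func (e *apiError) Error() string { return fmt.Sprintf("api error %d", e.Code) }

// wrapS formats the cause into the message; the original value is lost.
func wrapS(err error) error { return fmt.Errorf("present failed: %s", err) }

// wrapW keeps the cause in the error chain so it can still be inspected.
func wrapW(err error) error { return fmt.Errorf("present failed: %w", err) }

// canInspect reports whether an *apiError is reachable from err.
func canInspect(err error) bool {
	var target *apiError
	return errors.As(err, &target)
}

func main() {
	cause := &apiError{Code: 403}
	fmt.Println(canInspect(wrapS(cause)))                     // prints false
	fmt.Println(canInspect(wrapW(cause)))                     // prints true
	fmt.Println(wrapS(cause).Error() == wrapW(cause).Error()) // prints true
}
```

Both wrappers produce the same message text, so the change is invisible in logs and events; the difference is only that the %w version lets a caller higher up find the typed error.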
xrefs
%w. I originally considered it a leaky abstraction, but this PR is an example of a case where it is useful to be able to reach the wrapped errors from a higher-level function. After all, redacting the errors in the lower functions so as not to cause problems for the parent reconciler is also a leaky abstraction.
Testing
I've built Docker images from this branch and published them to my GitHub packages:
Install cert-manager:
helm upgrade test cert-manager \
  --repo https://charts.jetstack.io \
  --version 1.19.1 \
  --install \
  --create-namespace \
  --namespace cert-manager \
  --set global.logLevel=4 \
  --set crds.enabled=true \
  --set image.repository=ghcr.io/wallrj-cyberark/cert-manager-controller \
  --set image.tag=v1.19.0-51-g52737d18b2f780
Create an AWS Route 53 Issuer with a bad credential:
Observe the redacted error in the Reason field and the actual errors in the events and notice the backoff interval between the events.
And observe the detailed errors in the logs: