feat: Add resource requirements support to ACME HTTP01 solver pod templates#7972
Conversation
|
Hi @lunarwhite. Thanks for your PR. I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
cdd839f to
879add0
Compare
879add0 to
cd48873
Compare
|
Why do we need API support for this? Wouldn't the requirements be the same for all solvers pods? |
|
@erikgb Thanks for looking into this. I think you are sharing the same concern with @SgtCoDFish: https://kubernetes.slack.com/archives/CDEQJ0Q8M/p1754994793323799?thread_ts=1754971216.744569&cid=CDEQJ0Q8M. I still believe it's a good-to-have capability that brings more flexibility. Apart from what I documented in #7825, here are other points that I can think of:
I understand it may seem overkill, but other common pod template fields (nodeSelector, tolerations, etc) are already configurable per-issuer/solver. I'm also open to any feedback |
deploy/charts/cert-manager/templates/crd-acme.cert-manager.io_challenges.yaml
Outdated
Show resolved
Hide resolved
deploy/charts/cert-manager/templates/crd-acme.cert-manager.io_challenges.yaml
Outdated
Show resolved
Hide resolved
deploy/charts/cert-manager/templates/crd-acme.cert-manager.io_challenges.yaml
Outdated
Show resolved
Hide resolved
deploy/charts/cert-manager/templates/crd-acme.cert-manager.io_challenges.yaml
Outdated
Show resolved
Hide resolved
e94e8bc to
45f5ad5
Compare
|
/ok-to-test |
This enables users to specify custom resource requirements for ACME HTTP01 solver pods through the issuer configuration. It provides granular per-issuer/solver resource settings and allows users to override global `--acme-http01-solver-resource-*` controller flags. Signed-off-by: Yuedong Wu <dwcn22@outlook.com>
45f5ad5 to
5eaacd9
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR enables users to specify custom resource requirements for ACME HTTP01 solver pods through the issuer configuration. It provides granular per-issuer/solver resource settings and allows users to override global --acme-http01-solver-resource-* controller flags.
- Adds a new
Resourcesfield toACMEChallengeSolverHTTP01IngressPodSpecAPI structures - Updates resource merging logic to prioritize issuer-specific values over global defaults
- Includes comprehensive unit tests to verify precedence handling
Reviewed Changes
Copilot reviewed 15 out of 16 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/apis/acme/v1/types_issuer.go | Adds Resources field to public API spec |
| internal/apis/acme/types_issuer.go | Adds Resources field to internal API spec and updates documentation |
| pkg/issuer/acme/http/pod.go | Implements resource merging logic with precedence handling |
| pkg/issuer/acme/http/pod_test.go | Adds comprehensive unit tests for resource precedence scenarios |
| Multiple CRD files | Updates CRDs to include the new resources field schema |
| Generated files | Auto-generated deepcopy, conversion, and client code updates |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
erikgb
left a comment
There was a problem hiding this comment.
This looks really good! I am just not sure if I understand the rationale for having this as part of the API. Shouldn't the solver pod have the same resource requirements for any issuer? The change increases the API surface considerably, which scares me a bit. I will discuss this with others joining our stand-up today. Feel free to join if you can, @lunarwhite!
/lgtm
erikgb
left a comment
There was a problem hiding this comment.
I still think this feature makes sense. 🚀 I've added some serialization nit comments.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 17 out of 18 changed files in this pull request and generated 1 comment.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
…and Limits fields, rather than fully importing `ResourceRequirements` type from k8s.io/api/core/v1, in order to maintain only essential API surfaces Signed-off-by: Yuedong Wu <dwcn22@outlook.com>
6264804 to
37584ca
Compare
erikgb
left a comment
There was a problem hiding this comment.
/lgtm
But I still need support from at least one other maintainer to merge this.
|
/hold |
| // Note that when only specifying resource limits, ensure they are greater than or equal | ||
| // to the corresponding global resource requests configured via controller flags | ||
| // (--acme-http01-solver-resource-request-cpu, --acme-http01-solver-resource-request-memory). | ||
| // Kubernetes will reject pod creation if limits are lower than requests, causing challenge failures. |
There was a problem hiding this comment.
Re: @erikgb As per your suggestion here.
This is a valid scenario exactly (e.g. the resource limit of 32Mi set in the pod template is lower than the request value of 64Mi set in the global flag, which would result in a solver pod creation failure). IIUC, you are suggesting adding validations to detect such invalid combinations during the Issuer/ClusterIssuer creation phase, correct?
I tried to implement this with minimal changes, but I found that we cannot avoid duplicating what the K8s native API validation already handles. A more complex issue is that the cert-manager webhook validation framework does not have access to the controller config (global defaults) as per current design. We may also not really want to introduce any non-trivial arch changes to pass the config context through the validation chain.
So I end up making this trade-off - just added above "Note that ..." in the API fields, to guide users to configure carefully and avoid conflicts or violations in such scenarios. What do you think?
There was a problem hiding this comment.
I think this is good for the initial PR. We probably have more potential "footgun" solutions hiding in dark corners. Can be improved later on, and is a known issue with resource requirements in general.
wallrj-cyberark
left a comment
There was a problem hiding this comment.
Thanks @lunarwhite for persevering with this.
And thanks for taking the time to show how you've tested this E2E.
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: erikgb, wallrj, wallrj-cyberark The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/unhold Since @wallrj has also reviewed this now. |
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [cert-manager](https://cert-manager.io) ([source](https://github.com/cert-manager/cert-manager)) | minor | `v1.18.2` -> `v1.19.0` | --- ### Release Notes <details> <summary>cert-manager/cert-manager (cert-manager)</summary> ### [`v1.19.0`](https://github.com/cert-manager/cert-manager/releases/tag/v1.19.0) [Compare Source](cert-manager/cert-manager@v1.18.2...v1.19.0) cert-manager is the easiest way to automatically manage certificates in Kubernetes and OpenShift clusters. This release focuses on expanding platform compatibility, improving deployment flexibility, enhancing observability, and addressing key reliability issues. > 📖 Read the full release notes at cert-manager.io: <https://cert-manager.io/docs/releases/release-notes/release-notes-1.19> Changes since `v1.18.0`: #### Feature - Add IPv6 rules to the default network policy ([#​7726](cert-manager/cert-manager#7726), [@​jcpunk](https://github.com/jcpunk)) - Add `global.nodeSelector` to helm chart to allow for a single `nodeSelector` to be set across all services. ([#​7818](cert-manager/cert-manager#7818), [@​StingRayZA](https://github.com/StingRayZA)) - Add a feature gate to default to Ingress `pathType` `Exact` in ACME HTTP01 Ingress challenge solvers. ([#​7795](cert-manager/cert-manager#7795), [@​sspreitzer](https://github.com/sspreitzer)) - Add generated `applyconfigurations` allowing clients to make type-safe server-side apply requests for cert-manager resources. ([#​7866](cert-manager/cert-manager#7866), [@​erikgb](https://github.com/erikgb)) - Added API defaults to issuer references group (cert-manager.io) and kind (Issuer). ([#​7414](cert-manager/cert-manager#7414), [@​erikgb](https://github.com/erikgb)) - Added `certmanager_certificate_challenge_status` Prometheus metric. ([#​7736](cert-manager/cert-manager#7736), [@​hjoshi123](https://github.com/hjoshi123)) - Added `protocol` field for `rfc2136` DNS01 provider ([#​7881](cert-manager/cert-manager#7881), [@​hjoshi123](https://github.com/hjoshi123)) - Added experimental field `hostUsers` flag to all pods. Not set by default. ([#​7973](cert-manager/cert-manager#7973), [@​hjoshi123](https://github.com/hjoshi123)) - Support configurable resource requests and limits for ACME HTTP01 solver pods through ClusterIssuer and Issuer specifications, allowing granular resource management that overrides global `--acme-http01-solver-resource-*` settings. ([#​7972](cert-manager/cert-manager#7972), [@​lunarwhite](https://github.com/lunarwhite)) - The `CAInjectorMerging` feature has been promoted to BETA and is now enabled by default ([#​8017](cert-manager/cert-manager#8017), [@​ThatsMrTalbot](https://github.com/ThatsMrTalbot)) - The controller, webhook and ca-injector now log their version and git commit on startup for easier debugging and support. ([#​8072](cert-manager/cert-manager#8072), [@​prasad89](https://github.com/prasad89)) - Updated `certificate` metrics to the collector approach. ([#​7856](cert-manager/cert-manager#7856), [@​hjoshi123](https://github.com/hjoshi123)) #### Bug or Regression - ACME: Increased challenge authorization timeout to 2 minutes to fix `error waiting for authorization` ([#​7796](cert-manager/cert-manager#7796), [@​hjoshi123](https://github.com/hjoshi123)) - BUGFIX: permitted URI domains were incorrectly used to set the excluded URI domains in the CSR's name constraints ([#​7816](cert-manager/cert-manager#7816), [@​kinolaev](https://github.com/kinolaev)) - Enforced ACME HTTP-01 solver validation to properly reject configurations when multiple ingress options (`class`, `ingressClassName`, `name`) are specified simultaneously ([#​8021](cert-manager/cert-manager#8021), [@​lunarwhite](https://github.com/lunarwhite)) - Increase maximum sizes of PEM certificates and chains which can be parsed in cert-manager, to handle leaf certificates with large numbers of DNS names or other identities ([#​7961](cert-manager/cert-manager#7961), [@​SgtCoDFish](https://github.com/SgtCoDFish)) - Reverted adding the `global.rbac.disableHTTPChallengesRole` Helm option. ([#​7836](cert-manager/cert-manager#7836), [@​inteon](https://github.com/inteon)) - This change removes the `path` label of core ACME client metrics and will require users to update their monitoring dashboards and alerting rules if using those metrics. ([#​8109](cert-manager/cert-manager#8109), [@​mladen-rusev-cyberark](https://github.com/mladen-rusev-cyberark)) - Use the latest version of `ingress-nginx` in E2E tests to ensure compatibility ([#​7792](cert-manager/cert-manager#7792), [@​wallrj](https://github.com/wallrj)) #### Other (Cleanup or Flake) - Helm: Fix naming template of `tokenrequest` RoleBinding resource to improve consistency ([#​7761](cert-manager/cert-manager#7761), [@​lunarwhite](https://github.com/lunarwhite)) - Improve error messages when certificates, CRLs or private keys fail admission due to malformed or missing PEM data ([#​7928](cert-manager/cert-manager#7928), [@​SgtCoDFish](https://github.com/SgtCoDFish)) - Major upgrade of Akamai SDK. NOTE: The new version has not been fully tested end-to-end due to the lack of cloud infrastructure. ([#​8003](cert-manager/cert-manager#8003), [@​hjoshi123](https://github.com/hjoshi123)) - Update kind images to include the Kubernetes 1.33 node image ([#​7786](cert-manager/cert-manager#7786), [@​wallrj](https://github.com/wallrj)) - Use `maps.Copy` for cleaner map handling ([#​8092](cert-manager/cert-manager#8092), [@​quantpoet](https://github.com/quantpoet)) - Vault: Migrate Vault E2E add-on tests from deprecated `vault-client-go` to the new `vault/api` client. ([#​8059](cert-manager/cert-manager#8059), [@​armagankaratosun](https://github.com/armagankaratosun)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4xMzUuNCIsInVwZGF0ZWRJblZlciI6IjQxLjEzNS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJjaGFydCJdfQ==--> Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1711 Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net> Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
|
@lunarwhite We have released this. Please test and feedback: https://github.com/cert-manager/cert-manager/releases/tag/v1.19.1 |
This PR enables users to specify custom resource requirements for ACME HTTP01 solver pods through the issuer configuration. It provides granular per-issuer/solver resource settings and allows users to override global
--acme-http01-solver-resource-*controller flags.Resourcesfield toACMEChallengeSolverHTTP01IngressPodSpecin both internal and public APIs.mergePodObjectMetaWithPodTemplatelogic to use issuer-specific resource values when available, falling back toACMEOptionsvalues derived from global flags.Pull Request Motivation
Closes #7825
E2E Verification
pod template with non-default resource requirements set
pod template with partial resource requirements set
pod template with empty resource requirements set
Same tests were conducted with relevant flags configured explicitly:
Kind
/kind feature
Release Note