Increase maximum PEM sizes for leaf certificates and chains#7961
Conversation
We've seen in cert-manager#7506 that the PEM size maxima we have can be too restrictive for some users who have large numbers of DNS identities in their leaf certificates, preventing them from using cert-manager for those certificates (or stopping them from upgrading). This change adds a large estimate for extra data which might be present in a leaf certificate, based on an experimentally observed increase in the size of a certificate request object by adding new DNS names. We assume 250 large DNS names, totalling about 30kB of size. We shouldn't expect to see a performance change in our existing benchmark with these changes, because the maxBundleSize is used for those tests and that size remains larger than the new maxCertificateChainSize. That seems to be the case from testing: ```console $ go test -bench=. ./internal/pem/... goos: darwin goarch: arm64 pkg: github.com/cert-manager/cert-manager/internal/pem cpu: Apple M3 Max BenchmarkPathologicalInput-16 52 22606764 ns/op PASS ok github.com/cert-manager/cert-manager/internal/pem 2.485s $ go test -bench=. ./internal/pem/... goos: darwin goarch: arm64 pkg: github.com/cert-manager/cert-manager/internal/pem cpu: Apple M3 Max BenchmarkPathologicalInput-16 52 22455893 ns/op PASS ok github.com/cert-manager/cert-manager/internal/pem 2.408s ``` Signed-off-by: Ashley Davis <ashley.davis@cyberark.com>
|
By way of testing, I created a chain in cert-manager using 4096-bit RSA keys exclusively and with a leaf certificate with 350 identities in (attached). First I tested with 200 identities, and the size of the whole chain (2x CA certs, 1x leaf cert) was 16320B (16.3kB). The size of the full chain in PEM format with 350 identities was 24.4kB: $ kubectl get -n sandbox secrets myleaf -ojson | jq -r '.data."tls.crt"' | base64 -d | wc -c
24445The leaf certificate on its own was 20757B (21kB) and the two CA certs in the chain together were 3688B (3.7kB). This PR currently raises the size limit for the whole chain to |
|
/cherry-pick release-1.17 |
|
@SgtCoDFish: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick release-1.18 |
|
@SgtCoDFish: once the present PR merges, I will cherry-pick it on top of DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
| // testing the worst case with pathologicalFuzzFile. This guards against future changes making these tests invalid; | ||
| // e.g. if, maxCertificateChainSize actually became the largest we accept, we'd want to test against that instead. | ||
| if largestLimit < maxPrivateKeyPEMSize || largestLimit < maxCertificateChainSize { | ||
| panic(fmt.Errorf("invalid test: expected max cert bundle size %d to be larger than maxPrivateKeyPEMSize %d and maxCertificateChainSize %d", maxBundleSize, maxPrivateKeyPEMSize, maxCertificateChainSize)) |
There was a problem hiding this comment.
I can see that this was not changed in this PR, but what does the term "invalid test" refer to in this case? Seems like a code assert to me.
There was a problem hiding this comment.
Invalid test means that the assumption the test is built on is no longer valid - so, the test is meant to try decoding using the largest max size we have available, to see how the worst possible SafeDecode* function performs. Before and after this PR, the worst possible is trust bundles.
This check guards against the max chain size growing larger than the max trust bundle size due to an unrelated change. In that case we'd want to make the test against the max chain size instead of the max bundle size.
largestLimit is used for the benchmark tests, which can be used to check for regressions in the fix we made for PEM decoding.
erikgb
left a comment
There was a problem hiding this comment.
/lgtm
/approve
/hold
Adding a hold in case you want to rephrase the validation test. Thanks!
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: erikgb The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/unhold Thank you for the review! I think I could probably reword this stuff better - these files are very wordy - but I think for now this is an improvement and I'm going to get stuck into the cherry picks. |
|
@SgtCoDFish: #7961 failed to apply on top of branch "release-1.17": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@SgtCoDFish: new pull request created: #7966 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [cert-manager](https://cert-manager.io) ([source](https://github.com/cert-manager/cert-manager)) | minor | `v1.18.2` -> `v1.19.0` | --- ### Release Notes <details> <summary>cert-manager/cert-manager (cert-manager)</summary> ### [`v1.19.0`](https://github.com/cert-manager/cert-manager/releases/tag/v1.19.0) [Compare Source](cert-manager/cert-manager@v1.18.2...v1.19.0) cert-manager is the easiest way to automatically manage certificates in Kubernetes and OpenShift clusters. This release focuses on expanding platform compatibility, improving deployment flexibility, enhancing observability, and addressing key reliability issues. > 📖 Read the full release notes at cert-manager.io: <https://cert-manager.io/docs/releases/release-notes/release-notes-1.19> Changes since `v1.18.0`: #### Feature - Add IPv6 rules to the default network policy ([#​7726](cert-manager/cert-manager#7726), [@​jcpunk](https://github.com/jcpunk)) - Add `global.nodeSelector` to helm chart to allow for a single `nodeSelector` to be set across all services. ([#​7818](cert-manager/cert-manager#7818), [@​StingRayZA](https://github.com/StingRayZA)) - Add a feature gate to default to Ingress `pathType` `Exact` in ACME HTTP01 Ingress challenge solvers. ([#​7795](cert-manager/cert-manager#7795), [@​sspreitzer](https://github.com/sspreitzer)) - Add generated `applyconfigurations` allowing clients to make type-safe server-side apply requests for cert-manager resources. ([#​7866](cert-manager/cert-manager#7866), [@​erikgb](https://github.com/erikgb)) - Added API defaults to issuer references group (cert-manager.io) and kind (Issuer). ([#​7414](cert-manager/cert-manager#7414), [@​erikgb](https://github.com/erikgb)) - Added `certmanager_certificate_challenge_status` Prometheus metric. ([#​7736](cert-manager/cert-manager#7736), [@​hjoshi123](https://github.com/hjoshi123)) - Added `protocol` field for `rfc2136` DNS01 provider ([#​7881](cert-manager/cert-manager#7881), [@​hjoshi123](https://github.com/hjoshi123)) - Added experimental field `hostUsers` flag to all pods. Not set by default. ([#​7973](cert-manager/cert-manager#7973), [@​hjoshi123](https://github.com/hjoshi123)) - Support configurable resource requests and limits for ACME HTTP01 solver pods through ClusterIssuer and Issuer specifications, allowing granular resource management that overrides global `--acme-http01-solver-resource-*` settings. ([#​7972](cert-manager/cert-manager#7972), [@​lunarwhite](https://github.com/lunarwhite)) - The `CAInjectorMerging` feature has been promoted to BETA and is now enabled by default ([#​8017](cert-manager/cert-manager#8017), [@​ThatsMrTalbot](https://github.com/ThatsMrTalbot)) - The controller, webhook and ca-injector now log their version and git commit on startup for easier debugging and support. ([#​8072](cert-manager/cert-manager#8072), [@​prasad89](https://github.com/prasad89)) - Updated `certificate` metrics to the collector approach. ([#​7856](cert-manager/cert-manager#7856), [@​hjoshi123](https://github.com/hjoshi123)) #### Bug or Regression - ACME: Increased challenge authorization timeout to 2 minutes to fix `error waiting for authorization` ([#​7796](cert-manager/cert-manager#7796), [@​hjoshi123](https://github.com/hjoshi123)) - BUGFIX: permitted URI domains were incorrectly used to set the excluded URI domains in the CSR's name constraints ([#​7816](cert-manager/cert-manager#7816), [@​kinolaev](https://github.com/kinolaev)) - Enforced ACME HTTP-01 solver validation to properly reject configurations when multiple ingress options (`class`, `ingressClassName`, `name`) are specified simultaneously ([#​8021](cert-manager/cert-manager#8021), [@​lunarwhite](https://github.com/lunarwhite)) - Increase maximum sizes of PEM certificates and chains which can be parsed in cert-manager, to handle leaf certificates with large numbers of DNS names or other identities ([#​7961](cert-manager/cert-manager#7961), [@​SgtCoDFish](https://github.com/SgtCoDFish)) - Reverted adding the `global.rbac.disableHTTPChallengesRole` Helm option. ([#​7836](cert-manager/cert-manager#7836), [@​inteon](https://github.com/inteon)) - This change removes the `path` label of core ACME client metrics and will require users to update their monitoring dashboards and alerting rules if using those metrics. ([#​8109](cert-manager/cert-manager#8109), [@​mladen-rusev-cyberark](https://github.com/mladen-rusev-cyberark)) - Use the latest version of `ingress-nginx` in E2E tests to ensure compatibility ([#​7792](cert-manager/cert-manager#7792), [@​wallrj](https://github.com/wallrj)) #### Other (Cleanup or Flake) - Helm: Fix naming template of `tokenrequest` RoleBinding resource to improve consistency ([#​7761](cert-manager/cert-manager#7761), [@​lunarwhite](https://github.com/lunarwhite)) - Improve error messages when certificates, CRLs or private keys fail admission due to malformed or missing PEM data ([#​7928](cert-manager/cert-manager#7928), [@​SgtCoDFish](https://github.com/SgtCoDFish)) - Major upgrade of Akamai SDK. NOTE: The new version has not been fully tested end-to-end due to the lack of cloud infrastructure. ([#​8003](cert-manager/cert-manager#8003), [@​hjoshi123](https://github.com/hjoshi123)) - Update kind images to include the Kubernetes 1.33 node image ([#​7786](cert-manager/cert-manager#7786), [@​wallrj](https://github.com/wallrj)) - Use `maps.Copy` for cleaner map handling ([#​8092](cert-manager/cert-manager#8092), [@​quantpoet](https://github.com/quantpoet)) - Vault: Migrate Vault E2E add-on tests from deprecated `vault-client-go` to the new `vault/api` client. ([#​8059](cert-manager/cert-manager#8059), [@​armagankaratosun](https://github.com/armagankaratosun)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0MS4xMzUuNCIsInVwZGF0ZWRJblZlciI6IjQxLjEzNS40IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJjaGFydCJdfQ==--> Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1711 Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net> Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
|
@SgtCoDFish We have released this. Please test and feedback: https://github.com/cert-manager/cert-manager/releases/tag/v1.19.1 |
We've seen in #7506 that the PEM size maxima we have can be too restrictive for some users who have large numbers of DNS identities in their leaf certificates, preventing them from using cert-manager for those certificates (or stopping them from upgrading).
This change adds a large estimate for extra data which might be present in a leaf certificate, based on an experimentally observed increase in the size of a certificate request object by adding new DNS names.
We assume 250 large DNS names, totalling about 30kB of size.
We shouldn't expect to see a performance change in our existing benchmark with these changes, because the maxBundleSize is used for those tests and that size remains larger than the new maxCertificateChainSize. That seems to be the case from testing:
Kind
/kind bug
Release Note