Is your feature request related to a problem? Please describe.
We recently had someone change a role in AWS IAM, which broke DNS-01 challenges via Route53. The following error repeated several times per second:
E0814 12:49:27.775393 1 sync.go:282] "error cleaning up challenge" err=<
error instantiating route53 challenge solver: unable to assume role: AccessDenied: User: arn:aws:iam::XXXXXXXXXXXX:user/REDACTED is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXXX:role/REDACTED
status code: 403, request id: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
> logger="cert-manager.challenges.finalizer" resource_name="REDACTED" resource_namespace="REDACTED" resource_kind="Challenge" resource_version="v1" dnsName="REDACTED" type="DNS-01"
Describe the solution you'd like
If a Challenge keeps failing, it should be deleted after a configurable timeout so that the operator can recreate it. Recreating the Challenge fixed our issue, because by then the ambient credentials annotation on the service account had been updated.
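To make the request concrete, here is a minimal sketch of the decision I have in mind. It is not cert-manager code: `ChallengeInfo`, `should_delete`, and the state strings are hypothetical names used only for illustration, and the timeout would come from whatever configuration option the maintainers prefer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ChallengeInfo:
    """Hypothetical, simplified view of a Challenge resource."""
    name: str
    created_at: datetime
    state: str  # illustrative values: "pending", "errored", "valid"


def should_delete(challenge: ChallengeInfo, timeout: timedelta, now: datetime) -> bool:
    """Return True if a Challenge that has not succeeded is older than the timeout."""
    if challenge.state == "valid":
        # A Challenge that succeeded is never cleaned up by this rule.
        return False
    return now - challenge.created_at >= timeout


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stuck = ChallengeInfo("example-challenge", now - timedelta(hours=2), "pending")
    if should_delete(stuck, timedelta(hours=1), now):
        print(f"{stuck.name} exceeded the timeout; delete it so it can be recreated")
```

Deleting (rather than retrying in place) is the point of the request: in our case only a freshly created Challenge picked up the corrected credentials.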
Describe alternatives you've considered
Alternatively, Challenges could avoid storing the role ARN and instead look it up on each run.
Additional context
- An AWS role was changed, which made ambient credentials stop working
- Challenges were created and got stuck, failing over and over
- The ambient credentials annotation was fixed and the role restored
- Existing Challenges continued to fail until they were manually deleted and allowed to be recreated
Environment details (remove if not applicable):
- Kubernetes version: 1.28.9
- Cloud-provider/provisioner: AWS/OpenShift
- cert-manager version: v1.1.0
- Install method: Operator Lifecycle Manager (OLM)
/kind feature