cert-manager crash looping in AKS #5195

@Danpiel

Describe the bug:
The cert-manager container crash loops. The Kubernetes event log shows nothing except the back-off events caused by the crash loop itself.
Nothing was changed that would explain the crashes.

From the cert-manager log:

I0609 08:14:41.294169       1 sync.go:81] cert-manager/certificaterequests-issuer-ca "msg"="certificate request Ready condition true so skipping processing" "resource_kind"="CertificateRequest" "resource_name"="domain-tls-b9s5w" "resource_namespace"="domain-tls" "resource_version"="v1" 
I0609 08:14:41.294877       1 controller.go:173] cert-manager/certificaterequests-issuer-vault "msg"="finished processing work item" "key"="domain-tls/domain-tls-c4zn6" 
I0609 08:14:41.295249       1 controller.go:173] cert-manager/certificaterequests-issuer-ca "msg"="finished processing work item" "key"="domain-tls/domain-tls-tls-b9s5w" 
I0609 08:14:41.295281       1 sync.go:81] cert-manager/certificaterequests-issuer-ca "msg"="certificate request Ready condition true so skipping processing" "resource_kind"="CertificateRequest" "resource_name"="domain-tls-cw8rz" "resource_namespace"="default" "resource_version"="v1" 
I0609 08:14:41.295363       1 sync.go:81] cert-manager/certificaterequests-issuer-ca "msg"="certificate request Ready condition true so skipping processing" "resource_kind"="CertificateRequest" "resource_name"="domain-tls-m8rz9" "resource_namespace"="default" "resource_version"="v1" 
I0609 08:14:41.296455       1 controller.go:173] cert-manager/certificaterequests-issuer-ca "msg"="finished processing work item" "key"="default/domain-tls-cw8rz" 
I0609 08:14:41.296634       1 controller.go:173] cert-manager/certificaterequests-issuer-vault "msg"="finished processing work item" "key"="monitoring/domain-tls-fbpxq" 
I0609 08:14:41.297332       1 controller.go:173] cert-manager/certificaterequests-issuer-ca "msg"="finished processing work item" "key"="default/echo-prod-graip-xyz-tls-m8rz9"
------ CRASHED HERE ------
I0609 08:14:43.190284       1 start.go:75] cert-manager "msg"="starting controller"  "git-commit"="e466a521bc5455def8c224599c6edcd37e86410c" "version"="v1.8.0"
I0609 08:14:43.190351       1 controller.go:242] cert-manager/controller/build-context "msg"="configured acme dns01 nameservers" "nameservers"=["1.1.1.1:53","1.0.0.1:53","8.8.4.4:53"] 
W0609 08:14:43.190403       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0609 08:14:43.191062       1 context.go:297] cert-manager/controller "msg"="creating event broadcaster"  
I0609 08:14:43.191171       1 controller.go:70] cert-manager/controller "msg"="enabled controllers: [certificaterequests-approver certificaterequests-issuer-acme certificaterequests-issuer-ca certificaterequests-issuer-selfsigned certificaterequests-issuer-vault certificaterequests-issuer-venafi certificates-issuing certificates-key-manager certificates-metrics certificates-readiness certificates-request-manager certificates-revision-manager certificates-trigger challenges clusterissuers ingress-shim issuers orders]"  
I0609 08:14:43.191451       1 controller.go:134] cert-manager/controller "msg"="starting leader election"  
I0609 08:14:43.191558       1 controller.go:91] cert-manager/controller "msg"="starting metrics server"  "address"={"IP":"::","Port":9402,"Zone":""}
I0609 08:14:43.191722       1 context.go:297] cert-manager/controller "msg"="creating event broadcaster"  
I0609 08:14:43.191806       1 leaderelection.go:248] attempting to acquire leader lease cert-manager/cert-manager-controller...
I0609 08:14:43.256636       1 leaderelection.go:258] successfully acquired lease cert-manager/cert-manager-controller
I0609 08:14:43.257566       1 logs.go:177] cert-manager/controller "msg"="Event(v1.ObjectReference{Kind:\"Lease\", Namespace:\"cert-manager\", Name:\"cert-manager-controller\", UID:\"fea819ac-2e68-44e3-965a-bd61f3064d66\", APIVersion:\"coordination.k8s.io/v1\", ResourceVersion:\"69685377\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' cert-manager-857f57cb7b-xx9cx-external-cert-manager-controller became leader"  
I0609 08:14:43.257992       1 context.go:297] cert-manager/controller "msg"="creating event broadcaster"  
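
To locate the restart points in a longer log dump, something like the following might help. It is only a minimal sketch: it assumes the klog line layout seen above (severity letter, `MMDD`, time with microseconds, thread id, `file:line]`) and treats every `"msg"="starting controller"` line as the start of a new container run. The names `parse_line` and `find_restarts` are made up for this example and are not part of cert-manager. It shows the last line logged before each restart and the time gap, not the reason for the crash.

```python
import re
from datetime import datetime

# klog header: severity, MMDD, HH:MM:SS.micros, thread id, "file:line]", message
KLOG_RE = re.compile(
    r'^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ ([^\]]+)\] (.*)$'
)

# Assumption: this message marks the first line of a fresh container run.
START_MARKER = '"msg"="starting controller"'


def parse_line(line):
    """Return a dict for a klog line, or None if the line is not in klog format."""
    m = KLOG_RE.match(line)
    if m is None:
        return None
    severity, mmdd, clock, source, message = m.groups()
    # klog omits the year; a fixed year is assumed because only time
    # differences between neighbouring lines are used.
    ts = datetime.strptime("2000" + mmdd + " " + clock, "%Y%m%d %H:%M:%S.%f")
    return {"severity": severity, "time": ts, "source": source, "message": message}


def find_restarts(lines):
    """Return (last_entry_before_restart, first_entry_after, gap_seconds) tuples."""
    restarts = []
    previous = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip annotations and other non-klog lines
        if START_MARKER in entry["message"] and previous is not None:
            gap = (entry["time"] - previous["time"]).total_seconds()
            restarts.append((previous, entry, gap))
        previous = entry
    return restarts


if __name__ == "__main__":
    import sys

    # Usage: python find_restarts.py < cert-manager.log
    for before, after, gap in find_restarts(sys.stdin):
        print(f"restart after {before['source']} "
              f"at {before['time'].time()} (gap {gap:.3f}s)")
```

In the excerpt above, the last entry before the restart is an ordinary "finished processing work item" line, so the log itself carries no error or panic message before the process dies.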

Expected behaviour:
cert-manager should run stably, without random crashes.

Steps to reproduce the bug:
Not known yet.

Anything else we need to know?:

cert-manager is deployed in a split-view (split-horizon) DNS scenario, so it is configured with public DNS servers and with credentials for the Azure public zones in order to complete Let's Encrypt DNS-01 validation.

Environment details:

  • Kubernetes version: 1.21.9
  • Cloud-provider/provisioner: Azure Kubernetes Service (AKS)
  • cert-manager version: 1.8.0
  • Install method: Helm

/kind bug
