CAInjector entering crashloop with "timed out waiting for cache to be synced" #7147

@dhumphries-sainsburys

Describe the bug:

Since upgrading to 1.15.0, CAInjector has been entering a crashloop on some of our EKS clusters. The actual error appears to be a timeout while listing resources, but from the EKS side we cannot see any long-running or failed calls to the control plane that match it. It also appears to use more memory since the upgrade: this pod has been OOMing frequently, but that is a secondary issue.

cert-manager-cainjector-dfd4bd499-p2x48.log

Expected behaviour:
The pod should not be crashing.

Steps to reproduce the bug:

  • EKS 1.30 running bottlerocket
  • Install cert manager 1.15.0
    Not sure exactly, as it is inconsistent for us. 2 of our 12 clusters are currently affected, and the only notable thing about these two is that they are our two biggest, so our best guess is that it is load related. Here are stats for the smaller of the two in case it helps (feel free to request others; I'm just sharing the kind of information I have seen matter in other issues):
  • 70 m5d.8xlarge nodes
  • 2558 running pods
  • 151 CRDs
  • 19952 secrets
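
For anyone wanting to compare their own cluster against the figures above, the following is a minimal sketch of one way such counts could be collected. It assumes `kubectl` is on the PATH and configured for the target cluster. The helper names (`count_names`, `kubectl_count`) are hypothetical, and this is not necessarily how the numbers above were gathered. Filtering on `status.phase=Running` is only an approximation of "running pods".

```python
import subprocess


def count_names(output: str) -> int:
    """Count resources in `kubectl get ... -o name` output (one name per line)."""
    return sum(1 for line in output.splitlines() if line.strip())


def kubectl_count(*args: str) -> int:
    """Run `kubectl get <args> -o name` and count the resources it returns."""
    result = subprocess.run(
        ["kubectl", "get", *args, "-o", "name"],
        check=True,
        capture_output=True,
        text=True,
    )
    return count_names(result.stdout)


if __name__ == "__main__":
    # Assumed queries; adjust to match what "running" means in your cluster.
    queries = {
        "nodes": ["nodes"],
        "running pods": [
            "pods",
            "--all-namespaces",
            "--field-selector=status.phase=Running",
        ],
        "CRDs": ["customresourcedefinitions"],
        "secrets": ["secrets", "--all-namespaces"],
    }
    for label, args in queries.items():
        print(f"{label}: {kubectl_count(*args)}")
```

Using `-o name` keeps the response small, which matters when a cluster holds tens of thousands of secrets.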

Anything else we need to know?:

Environment details:

  • Kubernetes version: 1.30
  • Cloud-provider/provisioner: AWS EKS 1.30
  • cert-manager version: 1.15.0
  • Install method: helm

/kind bug
