
ESO 0.16.2 wigs out if certain SecretStores are incorrect, to the point of being non-functional #5222

@jbilliau-rcd


Describe the bug
We recently upgraded from ESO 0.10.5 to 0.16.2. For some reason, only when certain SecretStores are in an errored state, ESO flips out and starts logging errors nonstop, so fast that the output fills my screen and hangs my terminal. The only way to fix it is to scale down ESO, fix the SecretStore, and scale it back up.

We support 1000+ developers who often don't get their SecretStore object 100% right out of the gate, so this pretty much renders the controller non-functional: it won't do anything else because it's "hung up" on this one bad SecretStore. By "bad" I mean anything: wrong IAM role, wrong AWS region, or some other misconfiguration.

In 0.10.5, it would simply log the error and move on, retrying every 1m.

Example logs (repeated thousands of times a minute):

external-secrets-operator-85d7bff98f-6k5bb external-secrets {"level":"error","ts":"2025-08-28T23:08:06.377Z","msg":"Reconciler error","controller":"secretstore","controllerGroup":"external-secrets.io","controllerKind":"SecretStore","SecretStore":{"name":"beta-cq-workflow-scheduler-aws-systemmanager-secret-store","namespace":"beta-workflow-scheduler"},"namespace":"beta-workflow-scheduler","name":"beta-cq-workflow-scheduler-aws-systemmanager-secret-store","reconcileID":"d6c72ba5-778e-404c-8227-b947a68600b9","error":"could not validate provider: AccessDenied: User: arn:aws:sts::049306942178:assumed-role/cqeks-nonprod-049306942178-us-east-2-external-secrets/token-file-web-identity is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::049306942178:role/es/role-eks-beta-222262-cq-workflow-scheduler-us-east-2\n\tstatus code: 403, request id: 4eaf8694-474d-48c5-b4f5-44d124780182","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255"}
external-secrets-operator-85d7bff98f-6k5bb external-secrets {"level":"error","ts":"2025-08-28T23:08:06.402Z","logger":"controllers.SecretStore","msg":"unable to validate store","secretstore":{"name":"beta-cq-workflow-scheduler-aws-systemmanager-secret-store","namespace":"beta-workflow-scheduler"},"error":"could not validate provider: AccessDenied: User: arn:aws:sts::049306942178:assumed-role/cqeks-nonprod-049306942178-us-east-2-external-secrets/token-file-web-identity is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::049306942178:role/es/role-eks-beta-222262-cq-workflow-scheduler-us-east-2\n\tstatus code: 403, request id: 1c02a194-c2d2-4065-bc6a-9425a376ee85","stacktrace":"github.com/external-secrets/external-secrets/pkg/controllers/secretstore.reconcile\n\t/home/runner/work/external-secrets/external-secrets/pkg/controllers/secretstore/common.go:76\ngithub.com/external-secrets/external-secrets/pkg/controllers/secretstore.(*StoreReconciler).Reconcile\n\t/home/runner/work/external-secrets/external-secrets/pkg/controllers/secretstore/secretstore_controller.go:66\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:334\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255"}
external-secrets-operator-85d7bff98f-6k5bb external-secrets {"level":"error","ts":"2025-08-28T23:08:06.413Z","msg":"Reconciler error","controller":"secretstore","controllerGroup":"external-secrets.io","controllerKind":"SecretStore","SecretStore":{"name":"beta-cq-workflow-scheduler-aws-systemmanager-secret-store","namespace":"beta-workflow-scheduler"},"namespace":"beta-workflow-scheduler","name":"beta-cq-workflow-scheduler-aws-systemmanager-secret-store","reconcileID":"b0f5ee44-6f7e-4abd-b6ed-af4a42ea6133","error":"could not validate provider: AccessDenied: User: arn:aws:sts::049306942178:assumed-role/cqeks-nonprod-049306942178-us-east-2-external-secrets/token-file-web-identity is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::049306942178:role/es/role-eks-beta-222262-cq-workflow-scheduler-us-east-2\n\tstatus code: 403, request id: 1c02a194-c2d2-4065-bc6a-9425a376ee85","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:347\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255"}
external-secrets-operator-85d7bff98f-6k5bb external-secrets {"level":"error","ts":"2025-08-28T23:08:06.440Z","logger":"controllers.SecretStore","msg":"unable to validate store","secretstore":{"name":"beta-cq-workflow-scheduler-aws-systemmanager-secret-store","namespace":"beta-workflow-scheduler"},"error":"could not validate provider: AccessDenied: User: arn:aws:sts::049306942178:assumed-role/cqeks-nonprod-049306942178-us-east-2-external-secrets/token-file-web-identity is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::049306942178:role/es/role-eks-beta-222262-cq-workflow-scheduler-us-east-2\n\tstatus code: 403, request id: 2c5d0935-1b1e-4812-9745-0a9fb2e04be3","stacktrace":"github.com/external-secrets/external-secrets/pkg/controllers/secretstore.reconcile\n\t/home/runner/work/external-secrets/external-secrets/pkg/controllers/secretstore/common.go:76\ngithub.com/external-secrets/external-secrets/pkg/controllers/secretstore.(*StoreReconciler).Reconcile\n\t/home/runner/work/external-secrets/external-secrets/pkg/controllers/secretstore/secretstore_controller.go:66\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:334\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:294\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.4/pkg/internal/controller/controller.go:255"}

To Reproduce
Reproducing is... transient. If I create a brand-new SecretStore incorrectly, there's no issue; it behaves like 0.10.5. If I modify certain existing SecretStores, same thing. But other ones cause this issue, even though they look identical to me (we create all of our SecretStores with an in-house Helm chart, so they are all identical from a YAML schema perspective).

apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  managedFields:
    - apiVersion: external-secrets.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:capabilities: {}
      manager: external-secrets
      operation: Update
      subresource: status
      time: '2024-07-29T02:44:17Z'
    - apiVersion: external-secrets.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:meta.helm.sh/release-name: {}
            f:meta.helm.sh/release-namespace: {}
          f:labels:
            .: {}
            f:app: {}
            f:app-id: {}
            f:app.kubernetes.io/managed-by: {}
            f:deploy-date: {}
            f:development-team-email: {}
            f:environment: {}
            f:helm-chart-release: {}
            f:release: {}
        f:spec:
          .: {}
          f:provider:
            .: {}
            f:aws:
              .: {}
              f:region: {}
              f:service: {}
      manager: helm
      operation: Update
      time: '2025-06-05T14:15:04Z'
    - apiVersion: external-secrets.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:provider:
            f:aws:
              f:role: {}
      manager: agent
      operation: Update
      time: '2025-08-28T22:59:51Z'
    - apiVersion: external-secrets.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:conditions: {}
      manager: external-secrets
      operation: Update
      subresource: status
      time: '2025-08-28T23:00:00Z'
  name: beta-ci-analyzer-aws-systemmanager-secret-store
  namespace: backend
spec:
  provider:
    aws:
      region: us-east-2
      role: arn:aws:iam::999999999:role/eks/role-eks-beta-219787-ci-analyzer-us-east-2
      service: ParameterStore

Expected behavior
Simply log the problem and retry in 1m.

Metadata

Labels: kind/bug, triage/not-reproducible, triage/support
Status: Done