
[Oracle] clustersecretstores using Workload principals are leaking TCP connections #5490

@adutchak

Description


Describe the bug
When using a ClusterSecretStore with principalType: Workload in an Oracle OKE cluster, the operator opens a new TCP connection to the control plane each time it reconciles an ExternalSecret. The connections accumulate until local ephemeral ports are exhausted, at which point errors like the ones below appear ("cannot assign requested address") and the secrets enter a failed state.

error processing spec.data[0] (key:***), err: ClusterSecretStore "default" is not ready
could not get provider client: cannot setup new oracle client: can not create client, bad configuration: failed to renew security token: failed to get security token: error Post "https://10.96.0.1:12250/resourcePrincipalSessionTokens": dial tcp 10.96.0.1:12250: connect: cannot assign requested address

To Reproduce
Steps to reproduce the behavior:

  1. Provide all relevant manifests:
  • Create a ClusterSecretStore that uses principalType: Workload:
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: default
spec:
  provider:
    oracle:
      compartment: ocid1.compartment.oc1...
      principalType: Workload
      region: us-ashburn-1
      serviceAccountRef:
        name: external-secrets-default
        namespace: default
      vault: ocid1.vault.oc1...
  retrySettings:
    maxRetries: 10
    retryInterval: 30s
  • Configure a policy that allows interacting with the vault:
Allow any-user to read secret-family in compartment id ocid1.compartment... where ALL {request.principal.type='workload', request.principal.namespace='***', request.principal.service_account='***', request.principal.cluster_id='ocid1.cluster.oc1...', target.secret.name=/*/}

Allow any-user to use secret-family in compartment id ocid1.compartment... where ALL {request.principal.type='workload', request.principal.namespace='***', request.principal.service_account='***', request.principal.cluster_id='ocid1.cluster.oc1...', target.secret.name=/*/}

  • Create an ExternalSecret that references the default ClusterSecretStore.
  • After exec-ing into the pod's network namespace, many open TCP connections from the pod's IP to the control plane on port 12250 are visible:
root@external-secrets-operator-584fc6cbc-p98x4:/# ss -tan | grep "12250" | wc -l
28232
Note that 28232 is exactly the size of the default Linux ephemeral port range (32768–60999), i.e. every usable local port toward the control plane endpoint is occupied, which is why new connects fail with "cannot assign requested address".
  2. Provide the Kubernetes and ESO version:
    ESO v0.20.2

Expected behavior
ESO should reuse existing connections, or close the connections it opened earlier.
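The leak pattern is consistent with constructing a fresh HTTP client (with its own transport and idle-connection pool) on every reconcile instead of caching one per store. The sketch below is a minimal, hypothetical illustration using only net/http against a local test server — it is not the actual ESO or OCI SDK code — contrasting a per-request transport, where every request dials a new TCP connection, with a single shared client that reuses its pooled connection:

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// connCounts returns how many TCP connections the server accepted for five
// requests made with a fresh http.Transport each time, versus five requests
// made through one shared client. (Hypothetical demo, not ESO code.)
func connCounts() (freshTransport, sharedClient int64) {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusOK) }))
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew { // count every TCP connection the server accepts
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	// Leaky pattern: a new Transport per request keeps its own idle pool,
	// so the idle connection is never reused and a new one is dialed each time.
	for i := 0; i < 5; i++ {
		c := &http.Client{Transport: &http.Transport{}}
		resp, err := c.Get(srv.URL)
		if err == nil {
			resp.Body.Close() // conn goes idle in a pool that is never consulted again
		}
	}
	freshTransport = atomic.SwapInt64(&opened, 0)

	// Fixed pattern: one shared client returns the connection to its pool
	// after each request and reuses it.
	shared := &http.Client{}
	for i := 0; i < 5; i++ {
		resp, err := shared.Get(srv.URL)
		if err == nil {
			resp.Body.Close()
		}
	}
	sharedClient = atomic.LoadInt64(&opened)
	return
}

func main() {
	fresh, shared := connCounts()
	fmt.Printf("fresh transports opened %d conns; shared client opened %d\n", fresh, shared)
}
```

Under this reading, the fix would be either caching the provider client (or at least its underlying transport) across reconciles, or calling the transport's CloseIdleConnections when the client is discarded.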


Metadata

Labels: kind/bug (Categorizes issue or PR as related to a bug.), triage/pending-triage (This issue was not triaged.)
Status: Done
Assignees: none
Milestone: none