chore: design for generators/stores v2alpha1 / out of tree provider #4792
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,376 @@ | ||
| # SecretStores and Generators v2 (Out-of-Tree Providers) | ||
|
|
||
| <!-- toc --> | ||
| - [Summary](#summary) | ||
| - [Goals](#goals) | ||
| - [Proposal](#proposal) | ||
| - [Overview](#overview) | ||
| - [API Resources](#api-resources) | ||
| - [SecretStore](#secretstore) | ||
| - [ClusterSecretStore](#clustersecretstore) | ||
| - [Generator](#generator) | ||
| - [ClusterGenerator](#clustergenerator) | ||
| - [Changes to ExternalSecret and PushSecret](#changes-to-externalsecret-and-pushsecret) | ||
| - [New Provider Interfaces](#new-provider-interfaces) | ||
| - [Out-of-Tree Providers Maintenance](#out-of-tree-providers-maintenance) | ||
| - [Deployment](#deployment) | ||
| - [Governance](#governance) | ||
| - [Notes/Constraints/Caveats](#notesconstraintscaveats) | ||
| - [Risks and Mitigations](#risks-and-mitigations) | ||
| - [Design Details](#design-details) | ||
| - [Test Plan](#test-plan) | ||
| - [Prerequisite testing updates](#prerequisite-testing-updates) | ||
| - [Unit tests](#unit-tests) | ||
| - [Integration tests](#integration-tests) | ||
| - [e2e tests](#e2e-tests) | ||
| - [Graduation Criteria](#graduation-criteria) | ||
| - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) | ||
| - [Monitoring Requirements](#monitoring-requirements) | ||
| - [Troubleshooting](#troubleshooting) | ||
| - [Drawbacks](#drawbacks) | ||
| - [Alternatives](#alternatives) | ||
| <!-- /toc --> | ||
|
|
||
| ## Summary | ||
|
|
||
| This proposes a v2 architecture and API for SecretStores and Generators in External Secrets Operator. The primary goals are to: | ||
|
|
||
| - Support out-of-tree providers as first-class citizens, allowing independent versioning and distribution. | ||
| - Unify feature sets of SecretStores and Generators (e.g. refresh, gating) under consistent CRDs and controllers. | ||
| - Make referent authentication modes explicit and easier to use. | ||
| - Allow users to install only the providers they need. | ||
| - Reduce the dependency footprint of ESO to improve its security posture. | ||
|
|
||
| There are several limitations in the current (v1) SecretStore and Generator architectures that hinder flexibility and maintainability: | ||
|
|
||
| - Providers cannot be versioned independently of ESO or of each other. | ||
| - Out-of-tree providers are not first-class - they're only supported through the webhook provider. | ||
| - Users cannot easily install/uninstall only the desired providers. | ||
| - Referent authentication modes are implicit and hard to learn/use. | ||
|
|
||
| ## Goals | ||
|
|
||
| - Implement new CRDs: SecretStore/v2alpha1, ClusterSecretStore/v2alpha1, Generator/v2alpha1, ClusterGenerator/v2alpha1. | ||
| - Enable ESO to run without in-tree providers; users install providers separately. | ||
|
Contributor
Wouldn't ESO become an empty shell with no real e2e testing then? |
||
| - Provide a provider configuration model that connects to out-of-tree providers via gRPC/TLS. | ||
|
Skarlso marked this conversation as resolved.
|
||
| - Make referent authentication explicit (e.g., authentication scope for cluster-scoped resources). | ||
| - Add unified behaviors: refresh intervals, controller classes, retry settings, and gating policies. | ||
|
Contributor
A complex system cannot be made simple. It's inherent to its nature. However, being afraid doesn't help anyone here, we most likely need to do a PoC, and see how it goes... WDYT? |
||
| - Maintain ExternalSecret/PushSecret compatibility via apiVersion on storeRef. | ||
| - Provide a migration path from v1 to v2, including a v1 plugin provider bridge and dedicated builds without v1 code. | ||
|
Contributor
Is this really necessary? If we were to build all the current plugins into an RPC, we could use the helm chart and lots of explanations to move things forward? |
||
| - Deliver all providers (e.g., AWS/GCP/Vault) as out-of-tree projects. | ||
|
moolen marked this conversation as resolved.
|
||
| - Migrate the existing end-to-end tests into the new out-of-tree structure. | ||
| - Provide documentation to validate and explain the new model. | ||
|
|
||
| ## Proposal | ||
|
|
||
| ### Overview | ||
|
|
||
| ESO will run without bundled providers. Users deploy desired providers independently as separate services. ESO connects to providers over the network using secure gRPC/TLS. | ||
|
|
||
|  | ||
|
|
||
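A minimal sketch of how eso-core might build the client-side TLS settings for the gRPC/TLS connection described in the overview. The function name `newClientTLSConfig`, its parameters, and the example server name are assumptions made for illustration; the proposal does not specify how certificates are loaded.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newClientTLSConfig is a hypothetical helper: it trusts only the given CA
// bundle and, when a client key pair is supplied, presents it for mutual TLS.
func newClientTLSConfig(caPEM, certPEM, keyPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificate found in caPEM")
	}
	cfg := &tls.Config{
		RootCAs:    pool,
		ServerName: serverName,
		MinVersion: tls.VersionTLS12,
	}
	// A client certificate is optional in this sketch.
	if len(certPEM) > 0 || len(keyPEM) > 0 {
		pair, err := tls.X509KeyPair(certPEM, keyPEM)
		if err != nil {
			return nil, fmt.Errorf("loading client certificate: %w", err)
		}
		cfg.Certificates = []tls.Certificate{pair}
	}
	return cfg, nil
}

func main() {
	// An invalid CA bundle is rejected before any connection is attempted.
	_, err := newClientTLSConfig([]byte("not a certificate"), nil, nil, "provider-aws.default.svc")
	fmt.Println("error:", err)
}
```

With grpc-go, a configuration like this would typically be wrapped with `credentials.NewTLS` and passed as a dial option; that wiring is left out to keep the sketch free of external dependencies.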
| ### API Resources | ||
|
|
||
| #### SecretStore | ||
|
|
||
| Remove `spec.provider` and introduce `spec.providerConfig`, which contains the endpoint and authentication required to reach an out-of-tree provider, plus a provider-owned reference forwarded on requests. | ||
|
jakobmoellerdev marked this conversation as resolved.
As it will be the same Kind object, what will be the “stored” version in the k8s API? This makes me wonder if creating a new Kind object would be better, like `apiVersion: secretstore.external-secrets.io/v1alpha1`, `kind: ExternalSecretStore`.
Member
Author
We are leveraging a new group already. For what it's worth, it's not the same resource, and migration isn't via controllers/webhooks: it's user-facing.
Changing groups is really not an obvious trick. On user experience: On controller side:
Member
Author
There will be no good solution anyways. Either users will have to learn about a new resource or to learn about the existence of the same resource with a group. We've already tried keeping things the same to minimize user impact (by adding conversion logic ourselves between versionings and by having different group/kind names when we moved from KES to ESO), and the hard truth is that none minimized user impact, they still needed to run fully named
That would only be applicable if we intended to leverage the same controller code to reconcile both resources - and I think that's a bad idea 😄. We can create new controller with watches for specific GVK matching that. Controller level, it would only get the correct resource. |
||
|
|
||
| ```yaml | ||
| apiVersion: secretstore.external-secrets.io/v2alpha1 | ||
| kind: SecretStore | ||
| metadata: | ||
|   name: my-aws-store | ||
|   namespace: default | ||
| spec: | ||
|   refreshInterval: 1m | ||
|   controller: dev | ||
|   # ESO reconciles only if the store is healthy or unknown | ||
|   gatingPolicy: Enabled # or Disabled | ||
|   retrySettings: | ||
|     maxRetries: 3 | ||
|     retryInterval: 10s | ||
|   providerConfig: | ||
|     address: http+unix:///path/to/aws.sock | ||
|
Contributor
See my comment above about risks. |
||
|     auth: | ||
|       clientCertificate: {} | ||
|       serviceAccountRef: {} | ||
|     providerRef: | ||
|       name: my-aws-store | ||
|       namespace: default | ||
|       kind: AWSSecretManager | ||
| --- | ||
| apiVersion: provider.secretstore.external-secrets.io/v2alpha1 | ||
| kind: AWSSecretManager | ||
| metadata: | ||
|   name: my-aws-store | ||
|   namespace: default | ||
| spec: | ||
|   role: arn:aws:iam::123456789012:role/external-secrets | ||
|   region: eu-central-1 | ||
|   auth: | ||
|     secretRef: | ||
|       accessKeyIDSecretRef: | ||
|         name: awssm-secret | ||
|         key: access-key | ||
|       secretAccessKeySecretRef: | ||
|         name: awssm-secret | ||
|         key: secret-access-key | ||
| status: {} | ||
| ``` | ||
|
|
||
| Notes: | ||
| - the resource referenced by `providerRef` is owned and managed by the provider and lives in a separate API group (`provider.secretstore.external-secrets.io`). | ||
|
Contributor
I like this. |
||
|
|
||
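The comment on `gatingPolicy` in the SecretStore manifest ("ESO reconciles only if the store is healthy or unknown") can be read as a small decision function. The sketch below is one interpretation under stated assumptions: the type names, the three health states, and the rule that `Disabled` always allows reconciliation are illustrative and not taken from the proposal.

```go
package main

import "fmt"

// GatingPolicy mirrors the two values shown in the manifest.
type GatingPolicy string

const (
	GatingEnabled  GatingPolicy = "Enabled"
	GatingDisabled GatingPolicy = "Disabled"
)

// StoreHealth is a hypothetical summary of the store's status conditions.
type StoreHealth string

const (
	HealthHealthy   StoreHealth = "Healthy"
	HealthUnknown   StoreHealth = "Unknown"
	HealthUnhealthy StoreHealth = "Unhealthy"
)

// shouldReconcile reports whether dependants of a store may be reconciled.
// Assumption: with gating disabled, store health is ignored entirely.
func shouldReconcile(policy GatingPolicy, health StoreHealth) bool {
	if policy == GatingDisabled {
		return true
	}
	return health == HealthHealthy || health == HealthUnknown
}

func main() {
	for _, h := range []StoreHealth{HealthHealthy, HealthUnknown, HealthUnhealthy} {
		fmt.Printf("gating=Enabled health=%s reconcile=%t\n", h, shouldReconcile(GatingEnabled, h))
	}
}
```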
| #### ClusterSecretStore | ||
|
|
||
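| `ClusterSecretStore` makes referent authentication explicit via `authenticationScope`, which selects whether credentials are resolved from the provider's namespace (`ProviderNamespace`) or from the namespace of the manifest that references the store (`ManifestNamespace`). Cluster-scoped resources delegate to namespaced providers. | ||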
|
Contributor
This "Cluster-scoped resources delegate to namespaced providers." is not clear to me (sorry I am new here). Could you clarify the manifestNamespace for a noob like me please? :) |
||
|
|
||
| ```yaml | ||
| apiVersion: secretstore.external-secrets.io/v2alpha1 | ||
| kind: ClusterSecretStore | ||
| metadata: | ||
|   name: my-cluster-store | ||
| spec: | ||
|   refreshInterval: 1m | ||
|   controller: dev | ||
|   retrySettings: | ||
|     maxRetries: 3 | ||
|     retryInterval: 10s | ||
|   providerConfig: | ||
|     address: http+unix:///path/to/socket.sock | ||
|     providerRef: | ||
|       name: my-aws-store | ||
|       namespace: default | ||
|       kind: AWSSecretManager | ||
|     auth: {} | ||
|   gatingPolicy: Enabled | ||
|   authenticationScope: ProviderNamespace # or ManifestNamespace | ||
|   conditions: | ||
|     - namespaceSelector: {} | ||
|       namespaces: [] | ||
|       namespaceRegexes: [] | ||
| ``` | ||
|
|
||
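To make the two `authenticationScope` values concrete, here is a minimal sketch of how a controller could pick the namespace used to resolve credentials. It assumes that `ManifestNamespace` refers to the namespace of the resource that references the cluster store (for example an `ExternalSecret`); the function and constant names are hypothetical.

```go
package main

import "fmt"

// AuthenticationScope mirrors the values shown in the ClusterSecretStore manifest.
type AuthenticationScope string

const (
	ScopeProviderNamespace AuthenticationScope = "ProviderNamespace"
	ScopeManifestNamespace AuthenticationScope = "ManifestNamespace"
)

// credentialNamespace returns the namespace in which credentials are looked up.
// providerNS is the namespace from providerRef; manifestNS is the namespace of
// the referencing manifest (an assumption about the intended semantics).
func credentialNamespace(scope AuthenticationScope, providerNS, manifestNS string) (string, error) {
	switch scope {
	case ScopeProviderNamespace:
		return providerNS, nil
	case ScopeManifestNamespace:
		return manifestNS, nil
	default:
		return "", fmt.Errorf("unknown authenticationScope %q", scope)
	}
}

func main() {
	ns, err := credentialNamespace(ScopeManifestNamespace, "default", "team-a")
	fmt.Println(ns, err)
}
```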
| #### Generator | ||
|
Contributor
Still a noob question: Why would generator be a particular kind of resource in this case? In that model, couldn't we directly use providers? |
||
|
|
||
| Generators adopt `providerConfig` to delegate generation to out-of-tree providers and gain parity features. | ||
|
|
||
| - `providerConfig` to delegate to an out-of-tree provider. | ||
| - `gatingPolicy` to enable/disable floodgating for generators. | ||
|
|
||
| ```yaml | ||
| apiVersion: generator.external-secrets.io/v2alpha1 | ||
| kind: Generator | ||
| metadata: | ||
|   name: my-password | ||
|   namespace: default | ||
| spec: | ||
|   gatingPolicy: Enabled | ||
|   providerConfig: | ||
|     address: http+unix:///path/to/socket.sock | ||
|     providerRef: | ||
|       name: password-gen | ||
|       namespace: default | ||
|       kind: Password | ||
| --- | ||
| apiVersion: provider.generator.external-secrets.io/v2alpha1 | ||
| kind: Password | ||
| metadata: | ||
|   name: password-gen | ||
|   namespace: default | ||
| spec: | ||
|   digits: 5 | ||
|   symbols: 5 | ||
|   symbolCharacters: "-_$@" | ||
|   noUpper: false | ||
|   allowRepeat: true | ||
| ``` | ||
|
|
||
| #### ClusterGenerator | ||
|
|
||
| ClusterGenerators mirror ClusterSecretStores and extend namespaced Generators cluster-wide. | ||
|
|
||
| ```yaml | ||
| apiVersion: generator.external-secrets.io/v2alpha1 | ||
| kind: ClusterGenerator | ||
| metadata: | ||
|   name: my-cluster-generator | ||
| spec: | ||
|   refreshInterval: 1m | ||
|   controller: dev | ||
|   providerConfig: | ||
|     address: http+unix:///path/to/socket.sock | ||
|     providerRef: | ||
|       name: password-gen | ||
|       namespace: default | ||
|       kind: Password | ||
|   gatingPolicy: Enabled | ||
|   authenticationScope: ProviderNamespace # or ManifestNamespace | ||
|   conditions: | ||
|     - namespaceSelector: {} | ||
|       namespaces: [] | ||
|       namespaceRegexes: [] | ||
| ``` | ||
|
|
||
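The `conditions` block on cluster-scoped resources restricts which namespaces may use them. The following sketch evaluates the `namespaces` and `namespaceRegexes` fields only; `namespaceSelector` is omitted because it needs namespace labels. The matching semantics (any condition may match, and an empty list allows every namespace) are assumptions for illustration, not something the proposal states.

```go
package main

import (
	"fmt"
	"regexp"
)

// Condition is a simplified, hypothetical stand-in for one entry of
// spec.conditions (namespaceSelector is intentionally left out).
type Condition struct {
	Namespaces       []string
	NamespaceRegexes []string
}

// namespaceAllowed reports whether ns satisfies at least one condition.
// Assumption: no conditions at all means no restriction.
func namespaceAllowed(conds []Condition, ns string) (bool, error) {
	if len(conds) == 0 {
		return true, nil
	}
	for _, c := range conds {
		for _, n := range c.Namespaces {
			if n == ns {
				return true, nil
			}
		}
		for _, expr := range c.NamespaceRegexes {
			re, err := regexp.Compile(expr)
			if err != nil {
				return false, fmt.Errorf("invalid namespace regex %q: %w", expr, err)
			}
			if re.MatchString(ns) {
				return true, nil
			}
		}
	}
	return false, nil
}

func main() {
	conds := []Condition{{Namespaces: []string{"default"}, NamespaceRegexes: []string{"^team-.*$"}}}
	for _, ns := range []string{"default", "team-a", "kube-system"} {
		ok, err := namespaceAllowed(conds, ns)
		fmt.Println(ns, ok, err)
	}
}
```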
| ### Changes to ExternalSecret and PushSecret | ||
|
|
||
| To maintain compatibility, `ExternalSecret` and `PushSecret` add `secretStoreRef.apiVersion`. Controllers use this field to decide whether to call v1 providers or v2 out-of-tree providers. No other changes are required. | ||
|
|
||
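The routing decision based on `secretStoreRef.apiVersion` could look roughly like the sketch below. The `StoreRef` type and `usesV2Provider` function are hypothetical; treating an empty or unrecognised `apiVersion` as v1 is an assumption chosen to keep existing manifests working unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// v2StoreGroup is the API group used by the v2alpha1 examples in this document.
const v2StoreGroup = "secretstore.external-secrets.io"

// StoreRef is a simplified, hypothetical stand-in for secretStoreRef.
type StoreRef struct {
	APIVersion string
	Kind       string
	Name       string
}

// usesV2Provider reports whether a reference should be served by an
// out-of-tree (v2) provider instead of an in-tree (v1) provider.
func usesV2Provider(ref StoreRef) bool {
	group, version, ok := strings.Cut(ref.APIVersion, "/")
	if !ok {
		// Empty or group-less apiVersion: fall back to v1 behaviour.
		return false
	}
	return group == v2StoreGroup && strings.HasPrefix(version, "v2")
}

func main() {
	refs := []StoreRef{
		{Kind: "SecretStore", Name: "legacy"},
		{APIVersion: "external-secrets.io/v1", Kind: "SecretStore", Name: "legacy-explicit"},
		{APIVersion: "secretstore.external-secrets.io/v2alpha1", Kind: "SecretStore", Name: "my-aws-store"},
	}
	for _, r := range refs {
		fmt.Printf("%s -> v2=%t\n", r.Name, usesV2Provider(r))
	}
}
```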
| ### New Provider Interfaces | ||
|
|
||
| Provider and Generator interfaces are updated to pass full specs and enable provider-side processing. | ||
|
|
||
| ```go | ||
| type ProviderV2 interface { | ||
| 	GetSecret(SecretStoreSpec, ExternalSecretDataRemoteRef) ([]byte, error) | ||
| 	PushSecret(SecretStoreSpec, *corev1.Secret, PushSecretData) error | ||
| 	DeleteSecret(SecretStoreSpec, PushSecretRemoteRef) error | ||
| 	SecretExists(SecretStoreSpec, PushSecretRemoteRef) (bool, error) | ||
| 	GetAllSecrets(SecretStoreSpec, ExternalSecretFind) (map[string][]byte, error) | ||
| 	Validate(SecretStoreSpec) (admission.Warnings, error) | ||
| 	Capabilities(SecretStoreSpec) SecretStoreCapabilities | ||
| } | ||
|
|
||
| type GeneratorV2 interface { | ||
|
Contributor
Is this still necessary? I am really forcing an out of the box thinking here :) |
||
| 	Generate(GeneratorSpec) (map[string][]byte, GeneratorProviderState, error) | ||
| 	Cleanup(GeneratorSpec, GeneratorProviderState) error | ||
|
Contributor
If we keep Generator interface, is cleanup still necessary? Interface could be RotateSecret() or something like that, which would be noop on most generator, and different behaviour based on provider. But don't trust me on this, I am no expert. |
||
| } | ||
| ``` | ||
|
|
||
| ### Out-of-Tree Providers Maintenance | ||
|
|
||
| #### Deployment | ||
|
|
||
| Out-of-tree providers are separate projects with their own repos, images, and Helm charts. Users deploy ESO and the providers they need. ESO connects to providers via a Kubernetes Service indicated by `providerConfig.address`. Co-locating providers as sidecars is **discouraged** to preserve isolation and scalability. | ||
|
Contributor
What's the point of making it so hard for users? If you intend ESO to be a framework, maybe we need to be closer to Kubernetes SIG... The reason people love ESO is the friendliness to get secrets. If they now have to deal with each provider's idiosyncrasies, it might have a large impact on perceived value. I feel truth is maybe in between... ESO as an org, getting contributions from cloud providers in their own repos while guaranteeing quality as an org. But that's maybe me... Or even closer, if you keep a monorepo structure with a good org. It might get weird though, because the privileges you give on git are different than having a codeowners... |
||
|
|
||
| #### Governance | ||
|
|
||
| - One repo per "official" ESO-maintained provider (e.g., `provider-aws`). | ||
|
Contributor
I'm wondering: what's the benefit of us tracking official providers in separate repositories? Is there a specific benefit we'd like to see over e.g. maintaining a go.mod in a subfolder? Especially wondering because provider-contrib is also just one big repo in the proposal?
Contributor
I would also say it's harder on the tracking side. Like issue wise it would be harder to set up project automation and the likes which limit to 5 repositories on a pro account. I couldn't set up more.
Member
Having them in a subdirectory would work for me, too. I guess I'm just used to having separate repositories. What I dislike about having everything in one repo is that in some shared areas (CI, root-level Makefile, CODEOWNERS etc) it gets quite cluttered over time.
Member
If you vouch for a
Member
Author
Mono repo makes eventual provider transfers (e.g. AWS SM to AWS people) way harder. Just my own two cents.
Contributor
Why would it be harder? With clear separation and modules it's literally a copy and paste and updating some references.
Contributor
Or maybe you're thinking once people start adopting it, it's more difficult to point them to a new repository? I'm not too concerned about that. We could always just start deprecating it and moving to a new repo with ample time for people to migrate out.
Contributor
This seems to be at the crux of some of the proposition. I think this needs a call to understand alignment... Or, if no call would resolve that, maybe we should write the code and see? |
||
| - Promotion lifecycle (experimental → stable). | ||
| - CODEOWNERS and standard PR workflows per provider. | ||
| - A collective community repo `provider-contrib` for community-maintained providers. | ||
|
|
||
| ### Notes/Constraints/Caveats | ||
|
|
||
| - Do not implement Unix domain sockets for sidecars; providers should run as independent deployments to ensure horizontal scalability, separate network policies, and stronger isolation. | ||
|
Contributor
I would be interested to see why it's a no go. I don't see how horizontal scalability is limited with sockets. It's just a different (harder) design. The fact to go networked also brings downsides. This needs to be clarified IMO. |
||
| - Cluster-scoped resources delegate to namespaced providers; referent authentication is explicit via `authenticationScope`. | ||
|
|
||
| ### Risks and Mitigations | ||
|
|
||
| - Operational overhead: Users manage separate provider deployments | ||
| - mitigated by dedicated Helm charts and independent versioning per provider. | ||
| - Maintenance overhead: every provider needs similar infrastructure, such as a Helm chart, an e2e test framework and test cases, and a common library for bootstrapping the provider's gRPC server, metrics, client initialisation, etc. | ||
|
Contributor
I am wondering why.... (I mean, it's in accordance with the rest of the document.... but I am not understanding why we go so far). |
||
| - mitigated by providing one or more `common` repos (e.g. `provider-common`) that host the code shared by eso-core and all providers (similar to https://github.com/fluxcd/pkg). | ||
| - TLS management: someone needs to manage the TLS certs/keys. Do we push this to the user or handle it ourselves? | ||
| - mitigated by implementing certificate management with `cert-controller` and providing an integration mechanism with cert-manager. | ||
|
jakobmoellerdev marked this conversation as resolved.
|
||
|
|
||
|
Contributor
Some users are relying on ESO to bootstrap their infra to then have cert-manager running. You're basically killing that by introducing a circular dependency. I would believe we need an alternative way to establish trust between the plugin and the main controller.
Contributor
What alternative ways are you thinking of? We could for example introduce service account authentication and RBAC based tokens, but I'm interested to hear thoughts.
Member
Author
We should not use SA auth here, as the SA authenticating it would be external-secrets' to validate the call to the plugin. (as in, interception of a highly privileged k8s SA). (i.e. eso is the client here - the plugin is the one needing to validate good clients) IMO whatever we do should be x509 rooted. IMO, I am in favor of leveraging Integration with
Contributor
First, it's hard for me to give an opinion on what to secure, if we are not clear about conventions, what we want to do and their associated risks. If we consider that it's plain http between pods in a cluster, it's not the same problem as plain http between containers of the same pod. I really would like us to clarify what's configurable to know how far someone will take the plugin. Nevertheless, establishing a trust tunnel can be done in multiple ways. And it doesn't need to be asymmetric btw. For @gusfcarvalho 's comment - I am even more confused: You don't need to use eso's sa for that. In fact you definitely shouldn't. (I think we agree there). Simpler solutions are simpler to maintain. Are we sure we are aiming for simple here?
Member
Author
@evrardjp I was referring to this:
I understood it as leverage existing K8s SAs.
Contributor
Adding a dependency looks deceptively simple. But by adding it, you've coupled to a project. It means we're gonna need to start maintaining this other project should something happen there, in order to keep our software working. We're gonna break many people's CI/CD because now we'd need this intertwining. Vendors will curse us, because they would have to change their tooling. For me, bringing another project as a dependency is a last resort move: we should reduce the code we maintain, not add more. Side note: when I read myself I think "Damn, I start to be a grumpy old git." 🤣
Member
Author
But cert-controller is ours already, we install it everywhere by default 🙂
Contributor
LOL, I misread it all. I read cert-manager out of habit! Of course that makes more sense :) |
||
| ## Design Details | ||
|
|
||
| ### Test Plan | ||
|
|
||
| This plan validates core changes in eso-core and out-of-tree providers, including security boundaries and migration flows. | ||
|
|
||
| #### Prerequisite testing updates | ||
| - Create a provider conformance test suite (library) that providers run against locally and in CI (covers GetSecret, GetAllSecrets, Push/Delete/Exists when implemented, error semantics, auth, TLS). | ||
| - Add a fake gRPC provider for tests to simulate success, latency, failures, and version skew. | ||
|
moolen marked this conversation as resolved.
|
||
| - CI jobs: | ||
| - fast unit/integration on PR | ||
| - extended e2e and conformance nightly | ||
| - artifacts for v1 providerless build | ||
|
|
||
| #### Unit tests | ||
| - eso-core | ||
| - Routing by secretStoreRef.apiVersion (v1 vs v2). | ||
| - Validation of providerConfig and references; gating policy; retry/backoff. | ||
| - Referent auth: deny cross-namespace access for namespaced SecretStores; respect ClusterSecretStore scope rules. | ||
| - Metrics emission (.ObserveAPICall or provider-side equivalents) and labels. | ||
| - Robust error mapping from provider responses to conditions/events. | ||
| - Providers (adapters and common libs) | ||
| - gRPC client/server option wiring, TLS, timeouts, backoff and cancellation. | ||
| - Auth resolvers from SecretRefs/ServiceAccounts, including namespace scoping. | ||
| - Serialization of providerRef and request/response schemas. | ||
| - Generators | ||
| - providerConfig passthrough, state persistence, and gating. | ||
|
|
||
| #### Integration tests | ||
| - envtest-based controller tests installing v2alpha1 CRDs (SecretStore/ClusterSecretStore/Generator/ClusterGenerator). | ||
| - In-process fake provider; verify refresh, target templates, dataFrom, GetAllSecrets, and error propagation. | ||
| - Security boundaries | ||
| - Namespaced store cannot read Secret from another namespace; Cluster store may when allowed by selector/scope. | ||
| - Conditions and namespace constraints on ClusterSecretStore resources are enforced. | ||
| - Migration | ||
| - v1 plugin provider forwards to v2 provider; existing ExternalSecrets continue to sync; switch to v2 stores without data loss. | ||
| - Failure injection: DNS/TLS failures, expired certs, non-retryable vs retryable errors, deadline exceeded. | ||
| - Version skew: eso-core N with provider N-1/N+1 where compatible; reject incompatible versions with clear status. | ||
| - Run with -race; ensure no data races in reconcile paths. | ||
|
|
||
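The version-skew item (eso-core N with provider N-1/N+1) implies a compatibility check somewhere in the connection handshake. A minimal sketch follows, assuming versions are compared as `major.minor`, that majors must match, and that minors may differ by at most one. These rules and the function names are illustrative; the document leaves the actual contract open ("where compatible").

```go
package main

import "fmt"

// parseMajorMinor extracts the leading "major.minor" from a version string
// such as "2.4" or "2.4.1".
func parseMajorMinor(v string) (major, minor int, err error) {
	if _, err = fmt.Sscanf(v, "%d.%d", &major, &minor); err != nil {
		return 0, 0, fmt.Errorf("parsing version %q: %w", v, err)
	}
	return major, minor, nil
}

// skewCompatible reports whether a provider version is within one minor
// release of eso-core (assumed policy, see the lead-in).
func skewCompatible(core, provider string) (bool, error) {
	cMajor, cMinor, err := parseMajorMinor(core)
	if err != nil {
		return false, err
	}
	pMajor, pMinor, err := parseMajorMinor(provider)
	if err != nil {
		return false, err
	}
	if cMajor != pMajor {
		return false, nil
	}
	diff := cMinor - pMinor
	return diff >= -1 && diff <= 1, nil
}

func main() {
	for _, provider := range []string{"2.3", "2.4", "2.5", "2.6", "3.4"} {
		ok, err := skewCompatible("2.4", provider)
		fmt.Println("core 2.4 / provider", provider, "->", ok, err)
	}
}
```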
| #### e2e tests | ||
| - kind-based suites deploying: | ||
| - eso-core (v1+plugin / v2-enabled) and all ESO-maintained out-of-tree providers. | ||
| - Scenario matrix: namespaced and cluster stores, referent auth modes, gating on/off, refresh and templating. | ||
| - Scale smoke: O(1000) ExternalSecrets across O(50) namespaces; measure sync latency and resource usage. | ||
| - Disruption: roll provider Deployment/Service; TLS cert rotation; verify recovery without manual intervention. | ||
| - Upgrade: v1-only -> mixed (v1 plugin + v2 provider) -> v2-only. | ||
| - Conformance suite executed against each supported provider repo (as optional gate for community providers). | ||
|
|
||
| ### Graduation Criteria | ||
|
|
||
| Alpha | ||
| - v2alpha1 CRDs published; feature behind a feature gate and disabled by default in stable images. | ||
| - Unit/integration tests implemented; initial e2e with fake/sample provider green on 2 supported Kubernetes versions. | ||
| - Basic metrics and status/conditions wired; documentation draft and examples provided. | ||
|
|
||
| Beta | ||
| - Enabled by default (feature gate remains for rollback); docs complete; migration guide published. | ||
| - Provider conformance test suite v1.0 released; at least two providers pass the suite and run e2e in CI. | ||
| - Broad provider support: ≥3 major providers pass conformance and publish compatibility matrix. | ||
| - Security: referent auth boundaries verified by automated tests; fuzz tests for provider API payloads; TLS required by default. | ||
| - Version skew policy documented and tested (eso-core N with provider N-1/N+1 where contract allows). | ||
| - Performance baselines documented; scale e2e passing; no critical open bugs; user feedback from early adopters. | ||
|
|
||
| GA | ||
| - Feature gate removed (or defaulted on permanently); CRDs promoted (e.g., v2beta1->v2 or equivalent) without breaking API. | ||
| - Upgrade/downgrade and rollback procedures validated in CI; migration from v1 documented and tooling available. | ||
| - SLOs met (availability and sync latency); telemetry and dashboards documented. | ||
| - No known security gaps in namespace isolation/auth; passing periodic conformance in provider repos. | ||
|
|
||
| ### Upgrade / Downgrade Strategy | ||
|
|
||
| A phased migration enables safe adoption from v1 to v2: | ||
|
|
||
| 1) Early adoption via a v1 plugin provider | ||
| - Introduce a special `plugin` provider within `SecretStore/v1` that forwards requests to v2 out-of-tree providers. This allows testing v2 providers without changing existing v1 resources. | ||
|
|
||
| 2) Dedicated builds | ||
| - Provide ESO builds without in-tree provider code to reduce footprint for fully migrated users. | ||
|
|
||
| 3) Full migration | ||
| - Define v2 SecretStore and Generator CRDs pointing to out-of-tree provider deployments/CRs. | ||
| - Update `ExternalSecret` manifests to use `secretStoreRef.apiVersion: secretstore.external-secrets.io/v2alpha1` and reference v2 stores. | ||
| - Decommission v1 stores after all `ExternalSecret` resources are migrated. | ||
|
|
||
|
|
||
| ### Monitoring Requirements | ||
| - Add gRPC-related metrics to eso-core (client side) as well as to the provider (server side). | ||
| - Migrate metrics (`.ObserveAPICall()`) to the provider. | ||
|
|
||
| ### Troubleshooting | ||
| A user would have the same troubleshooting flow: | ||
| - inspect Secret | ||
| - describe ExternalSecret | ||
| - describe SecretStore | ||
| - (new) describe provider-owned resource | ||
| - inspect logs of eso-core | ||
| - (new) inspect logs of provider pods | ||
|
|
||
| ## Drawbacks | ||
|
|
||
| - Increased operational complexity and responsibility for users to deploy and manage provider lifecycles in addition to ESO. | ||
| - New CRDs introduce a learning curve and require updated documentation. | ||
| - Separate repositories, issue trackers, and release pipelines for each provider increase maintenance overhead. | ||
| - Distributed maintenance across community providers can fragment ownership. | ||
|
|
||
| ## Alternatives | ||
|
|
||
| - Just provide a gRPC plugin mechanism and move providers out of tree without a v2 SecretStore. | ||
| - Get rid of the SecretStore altogether and point directly from an `ExternalSecret` at a `Kind=AWSSecretsManager`. | ||
|
Contributor
Interesting approach.
Contributor
If we go as far as evaluating this: wouldn't this be simpler? CRDs per provider, all using the same ESO go modules? Having providers in different repos with the same code structure would allow wholesale PRs. Bots can help maintain versions of providers by bumping our eso modules. A single helm chart could deploy three different providers implementations (x or y would be installed based on values).
Member
Author
This would be a benefit regardless if we get rid of SecretStore or not. It is a more direct approach, which means we will lose some features (like cluster-scoping, and floodgate controls if a given store is out). I think the simplicity will be less controllers for us to maintain and merge - as in everything could be handled easily on a simple core-controller loop.
Contributor
But what is the hardest: maintain modules for n controllers and have controllers a bit more specific in some places OR a grpc system + cert-manager? Same question for operations.
Contributor
And I don't say that because some of my customers are scared of self signed CAs...!
Member
Author
But how would eso work with them? Are you thinking about brewing one eso distribution per provider? Several orgs use more than one provider, this won’t scale if I understood it correctly x
Member
Author
As in - users will now need to figure out controller classes on every single install; even simple ones.
Member
Author
Plus, an ExternalSecret using two providers will be a 💫 headache |
||
Without explaining the terms you use, I can't follow you.
What is referent authentication? (The current docs use the term on only ONE page, the provider support page, and the providers pages don't even mention it.)
Would you mind rephrasing two things: