Binary file added assets/eso-out-of-tree.png
376 changes: 376 additions & 0 deletions design/014-secretstore-generator-v2.md
@@ -0,0 +1,376 @@
# SecretStores and Generators v2 (Out-of-Tree Providers)

<!-- toc -->
- [Summary](#summary)
- [Goals](#goals)
- [Proposal](#proposal)
  - [Overview](#overview)
  - [API Resources](#api-resources)
    - [SecretStore](#secretstore)
    - [ClusterSecretStore](#clustersecretstore)
    - [Generator](#generator)
    - [ClusterGenerator](#clustergenerator)
  - [Changes to ExternalSecret and PushSecret](#changes-to-externalsecret-and-pushsecret)
  - [New Provider Interfaces](#new-provider-interfaces)
  - [Out-of-Tree Providers Maintenance](#out-of-tree-providers-maintenance)
    - [Deployment](#deployment)
    - [Governance](#governance)
  - [Notes/Constraints/Caveats](#notesconstraintscaveats)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Troubleshooting](#troubleshooting)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Summary

This proposes a v2 architecture and API for SecretStores and Generators in External Secrets Operator. The primary goals are to:

- Support out-of-tree providers as first-class citizens, allowing independent versioning and distribution.
- Unify feature sets of SecretStores and Generators (e.g. refresh, gating) under consistent CRDs and controllers.
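- Make referent authentication modes (whether a cluster-scoped store resolves credentials in the provider's namespace or in the namespace of the resource that references it) explicit and easier to use.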

Contributor

Without explaining the terms you use, I can't follow you.
What is referent authentication? (The current docs mention the term on only ONE page, the provider support page, and the provider pages themselves don't even use the term.)

Would you mind rephrasing two things:

- What are you trying to achieve by making it explicit?
- How would making it explicit make the feature easier to use when some people are not even aware they are using it?

- Allow users to install only the providers they need.
- Reduce the dependency footprint of ESO to improve its security posture.

There are several limitations in the current (v1) SecretStore and Generator architectures that hinder flexibility and maintainability:

- Providers cannot be versioned independently of ESO.
- Out-of-tree providers are not first-class: they are only supported through the webhook provider.
- Users cannot easily install/uninstall only the desired providers.
- Referent authentication modes are implicit and hard to learn/use.

## Goals

- Implement new CRDs: SecretStore/v2alpha1, ClusterSecretStore/v2alpha1, Generator/v2alpha1, ClusterGenerator/v2alpha1.
- Enable ESO to run without in-tree providers; users install providers separately.

Contributor

Wouldn't ESO become an empty shell with no real e2e testing then?
Wouldn't it make sense to have at least ONE example code (if possible, without external deps), in-tree?

- Provide a provider configuration model that connects to out-of-tree providers via gRPC/TLS.
- Make referent authentication explicit (e.g., authentication scope for cluster-scoped resources).
- Add unified behaviors: refresh intervals, controller classes, retry settings, and gating policies.

Contributor

A complex system cannot be made simple. It's inherent to its nature.
I am afraid you'll end up with a poor abstraction that is a terrible shell. Or maybe I didn't get it.

However, being afraid doesn't help anyone here, we most likely need to do a PoC, and see how it goes... WDYT?

- Maintain ExternalSecret/PushSecret compatibility via apiVersion on storeRef.
- Provide a migration path from v1 to v2, including a v1 plugin provider bridge and dedicated builds without v1 code.

Contributor

Is this really necessary?
That sounds very complex.

If we were to build all the current plugins into an RPC, we could use the helm chart and lots of explanations to move things forward?

- Deliver all providers (e.g., AWS/GCP/Vault) as out-of-tree projects.
- Migrate the existing end-to-end tests into the new out-of-tree structure.
- Provide documentation to validate and explain the new model.

## Proposal

### Overview

ESO will run without bundled providers. Users deploy desired providers independently as separate services. ESO connects to providers over the network using secure gRPC/TLS.

![Out of Tree](../assets/eso-out-of-tree.png)

### API Resources

#### SecretStore

Remove `spec.provider` and introduce `spec.providerConfig`, which contains the endpoint and authentication required to reach an out-of-tree provider, plus a provider-owned reference forwarded on requests.

As it will be the same Kind object, what will be the “stored” version in the k8s API?
How to migrate from one version to another?
Also, if we want to keep both provider and providerConfig it makes controller behavior quite “hacky” (spec is not fully predictable).

This makes me wonder if creating a new Kind object would be better, like

apiVersion: secretstore.external-secrets.io/v1alpha1
kind: ExternalSecretStore

Member Author

We are already leveraging a new group. For what it's worth, it's not the same resource, and migration isn't done via controllers/webhooks: it's user-facing.


Changing groups is really not an obvious trick.
It makes upgrading really error-prone:
With a "simple" look it will be difficult to know which resource it is; just missing the group (something no-one really had a look at before on CRD) could break an object.

On user experience:
Doing a "kubectl get" will give 1 resource, or the other or an ambiguous error.

On controller side:
As the controller runtime just reconciles on name/namespace, it will have to "guess" what version to reconcile.
Or we would have to duplicate another controller just for this new group.

Member Author

> Changing groups is really not an obvious trick.
> It makes upgrading really error-prone:
> With a "simple" look it will be difficult to know which resource it is; just missing the group (something no-one really had a look at before on CRD) could break an object.

There will be no good solution anyways. Either users will have to learn about a new resource or to learn about the existence of the same resource with a group. We've already tried keeping things the same to minimize user impact (by adding conversion logic ourselves between versionings and by having different group/kind names when we moved from KES to ESO), and the hard truth is that none minimized user impact, they still needed to run fully named kubectl commands to figure out versions (and we introduced a new point of failure via the conversion webhook).

> On controller side:
> As the controller runtime just reconciles on name/namespace, it will have to "guess" what version to reconcile.
> Or we would have to duplicate another controller just for this new group.

That would only be applicable if we intended to leverage the same controller code to reconcile both resources - and I think that's a bad idea 😄. We can create new controller with watches for specific GVK matching that. Controller level, it would only get the correct resource.


```yaml
apiVersion: secretstore.external-secrets.io/v2alpha1
kind: SecretStore
metadata:
  name: my-aws-store
  namespace: default
spec:
  refreshInterval: 1m
  controller: dev
  # ESO reconciles only if the store is healthy or unknown
  gatingPolicy: Enabled # or Disabled
  retrySettings:
    maxRetries: 3
    retryInterval: 10s
  providerConfig:
    address: http+unix:///path/to/aws.sock
    auth:
      clientCertificate: {}
      serviceAccountRef: {}
    providerRef:
      name: my-aws-store
      namespace: default
      kind: AWSSecretManager
---
apiVersion: provider.secretstore.external-secrets.io/v2alpha1
kind: AWSSecretManager
metadata:
  name: my-aws-store
  namespace: default
spec:
  role: arn:aws:iam::123456789012:role/external-secrets
  region: eu-central-1
  auth:
    secretRef:
      accessKeyIDSecretRef:
        name: awssm-secret
        key: access-key
      secretAccessKeySecretRef:
        name: awssm-secret
        key: secret-access-key
status: {}
```

Contributor, on `providerConfig.address`:

see my comment above about risks.

Notes:
- The resource referenced by `providerRef` is owned and managed by the provider and lives in a separate API group (`provider.secretstore.external-secrets.io`).

Contributor

I like this.


#### ClusterSecretStore

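`ClusterSecretStore` makes referent authentication explicit via `authenticationScope`, which selects the namespace in which the provider resolves credentials (for example, the Secret referenced by `accessKeyIDSecretRef`): `ProviderNamespace` uses the namespace of the provider-owned resource referenced by `providerRef`, while `ManifestNamespace` uses the namespace of the manifest that references the store (e.g. the `ExternalSecret` or `PushSecret`). Cluster-scoped resources delegate to namespaced providers: the provider-specific settings (such as `region` or `role`) live in a namespaced, provider-owned resource referenced by `providerRef`.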

Contributor

This "Cluster-scoped resources delegate to namespaced providers." is not clear to me (sorry I am new here).

Could you clarify the manifestNamespace for a noob like me please? :)


```yaml
apiVersion: secretstore.external-secrets.io/v2alpha1
kind: ClusterSecretStore
metadata:
  name: my-cluster-store
spec:
  refreshInterval: 1m
  controller: dev
  retrySettings:
    maxRetries: 3
    retryInterval: 10s
  providerConfig:
    address: http+unix:///path/to/socket.sock
    providerRef:
      name: my-aws-store
      namespace: default
      kind: AWSSecretManager
    auth: {}
  gatingPolicy: Enabled
  authenticationScope: ProviderNamespace # or ManifestNamespace
  conditions:
    - namespaceSelector: {}
      namespaces: []
      namespaceRegexes: []
```
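The following is a hedged sketch of how eso-core might evaluate `authenticationScope` and `conditions` for a `ClusterSecretStore`. The function names are hypothetical, the "no conditions allows everything, any matching clause grants access" rule is an assumption, and the namespace selector is simplified to exact `matchLabels`-style matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// AuthenticationScope mirrors spec.authenticationScope in the example.
type AuthenticationScope string

const (
	ProviderNamespace AuthenticationScope = "ProviderNamespace"
	ManifestNamespace AuthenticationScope = "ManifestNamespace"
)

// credentialNamespace returns the namespace in which credentials (e.g. the
// Secret behind accessKeyIDSecretRef) are resolved for one request.
func credentialNamespace(scope AuthenticationScope, providerRefNS, manifestNS string) (string, error) {
	switch scope {
	case ProviderNamespace, "": // assumption: unset defaults to ProviderNamespace
		return providerRefNS, nil
	case ManifestNamespace:
		return manifestNS, nil
	default:
		return "", fmt.Errorf("unknown authenticationScope %q", scope)
	}
}

// Condition mirrors one entry of spec.conditions. NamespaceSelector is
// simplified to matchLabels: nil means "no selector", while a non-nil empty
// map matches every namespace, like an empty Kubernetes label selector.
type Condition struct {
	NamespaceSelector map[string]string
	Namespaces        []string
	NamespaceRegexes  []string
}

// namespaceAllowed reports whether a manifest in namespace ns (with the given
// namespace labels) may use the cluster store.
func namespaceAllowed(conds []Condition, ns string, labels map[string]string) (bool, error) {
	if len(conds) == 0 {
		return true, nil // assumption: no conditions means no restriction
	}
	for _, c := range conds {
		if c.NamespaceSelector != nil && labelsMatch(c.NamespaceSelector, labels) {
			return true, nil
		}
		for _, n := range c.Namespaces {
			if n == ns {
				return true, nil
			}
		}
		for _, expr := range c.NamespaceRegexes {
			re, err := regexp.Compile(expr)
			if err != nil {
				return false, fmt.Errorf("invalid namespaceRegex %q: %w", expr, err)
			}
			if re.MatchString(ns) {
				return true, nil
			}
		}
	}
	return false, nil
}

// labelsMatch reports whether every selector key/value is present in labels.
func labelsMatch(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

func main() {
	ns, err := credentialNamespace(ManifestNamespace, "default", "team-a")
	if err != nil {
		panic(err)
	}
	ok, err := namespaceAllowed([]Condition{{NamespaceRegexes: []string{"^team-"}}}, ns, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(ns, ok) // team-a true
}
```

Under this sketch's rules, the example's `namespaceSelector: {}` admits every namespace, so a real manifest would narrow it.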

#### Generator

Contributor

Still a noob question: Why would generator be a particular kind of resource in this case?

In that model, couldn't we directly use providers?


Generators adopt `providerConfig` to delegate generation to out-of-tree providers and gain feature parity with SecretStores:

- `providerConfig` to delegate to an out-of-tree provider.
- `gatingPolicy` to enable/disable floodgating for generators.

```yaml
apiVersion: generator.external-secrets.io/v2alpha1
kind: Generator
metadata:
  name: my-password
  namespace: default
spec:
  gatingPolicy: Enabled
  providerConfig:
    address: http+unix:///path/to/socket.sock
    providerRef:
      name: password-gen
      namespace: default
      kind: Password
---
apiVersion: provider.generator.external-secrets.io/v2alpha1
kind: Password
metadata:
  name: password-gen
  namespace: default
spec:
  digits: 5
  symbols: 5
  symbolCharacters: "-_$@"
  noUpper: false
  allowRepeat: true
```
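The `gatingPolicy` semantics, as described in the `SecretStore` example ("ESO reconciles only if the store is healthy or unknown"), reduce to a small decision function. A hedged sketch; the `Health` values and the behaviour for an unset policy are assumptions, not part of the proposal:

```go
package main

import "fmt"

// GatingPolicy mirrors the gatingPolicy field in the examples.
type GatingPolicy string

const (
	GatingEnabled  GatingPolicy = "Enabled"
	GatingDisabled GatingPolicy = "Disabled"
)

// Health is a hypothetical summary of the store's or generator's last known status.
type Health string

const (
	Healthy   Health = "Healthy"
	Unhealthy Health = "Unhealthy"
	Unknown   Health = "Unknown"
)

// shouldCall reports whether the controller should call the provider now.
// With gating enabled, ESO proceeds only if health is Healthy or Unknown.
func shouldCall(policy GatingPolicy, h Health) bool {
	if policy == GatingDisabled {
		return true // floodgating off: always call the provider
	}
	// Assumption: an unset policy behaves like Enabled.
	return h == Healthy || h == Unknown
}

func main() {
	for _, h := range []Health{Healthy, Unknown, Unhealthy} {
		fmt.Printf("Enabled, %s -> %v\n", h, shouldCall(GatingEnabled, h))
	}
	fmt.Println("Disabled, Unhealthy ->", shouldCall(GatingDisabled, Unhealthy))
}
```

Treating `Unknown` as callable lets a freshly created store or generator make its first call before any health information exists.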

#### ClusterGenerator

ClusterGenerators mirror ClusterSecretStores and extend namespaced Generators cluster-wide.

```yaml
apiVersion: generator.external-secrets.io/v2alpha1
kind: ClusterGenerator
metadata:
  name: my-cluster-generator
spec:
  refreshInterval: 1m
  controller: dev
  providerConfig:
    address: http+unix:///path/to/socket.sock
    providerRef:
      name: password-gen
      namespace: default
      kind: Password
  gatingPolicy: Enabled
  authenticationScope: ProviderNamespace # or ManifestNamespace
  conditions:
    - namespaceSelector: {}
      namespaces: []
      namespaceRegexes: []
```

### Changes to ExternalSecret and PushSecret

To maintain compatibility, `ExternalSecret` and `PushSecret` add `secretStoreRef.apiVersion`. Controllers use this field to decide whether to call v1 providers or v2 out-of-tree providers. No other changes are required.
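The routing decision can be sketched as a switch on `secretStoreRef.apiVersion`. This is a minimal sketch assuming that an empty `apiVersion` keeps today's v1 behaviour; the `Backend` type and `routeStore` function are hypothetical:

```go
package main

import "fmt"

// Backend identifies which code path serves a store reference.
type Backend int

const (
	V1InTree    Backend = iota // existing SecretStore/v1 providers compiled into ESO
	V2OutOfTree                // SecretStore/v2alpha1: gRPC call to an out-of-tree provider
)

// routeStore picks a backend from secretStoreRef.apiVersion.
func routeStore(apiVersion string) (Backend, error) {
	switch apiVersion {
	case "", "external-secrets.io/v1":
		// Assumption: an empty apiVersion keeps v1 behaviour so that
		// existing manifests continue to work unchanged.
		return V1InTree, nil
	case "secretstore.external-secrets.io/v2alpha1":
		return V2OutOfTree, nil
	default:
		return 0, fmt.Errorf("unsupported secretStoreRef.apiVersion %q", apiVersion)
	}
}

func main() {
	for _, v := range []string{"", "secretstore.external-secrets.io/v2alpha1", "example.com/v9"} {
		b, err := routeStore(v)
		fmt.Printf("%q -> backend=%d err=%v\n", v, b, err)
	}
}
```

Rejecting unknown values explicitly, rather than silently falling back to v1, turns a typo in `apiVersion` into a visible error.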

### New Provider Interfaces

Provider and Generator interfaces are updated to pass full specs and enable provider-side processing.

```go
import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// ProviderV2 is implemented by out-of-tree SecretStore providers. Each call
// receives the full store spec so the provider can do its own processing.
// Types other than corev1.Secret and admission.Warnings are ESO API types (not shown).
type ProviderV2 interface {
	GetSecret(SecretStoreSpec, ExternalSecretDataRemoteRef) ([]byte, error)
	PushSecret(SecretStoreSpec, *corev1.Secret, PushSecretData) error
	DeleteSecret(SecretStoreSpec, PushSecretRemoteRef) error
	SecretExists(SecretStoreSpec, PushSecretRemoteRef) (bool, error)
	GetAllSecrets(SecretStoreSpec, ExternalSecretFind) (map[string][]byte, error)
	Validate(SecretStoreSpec) (admission.Warnings, error)
	Capabilities(SecretStoreSpec) SecretStoreCapabilities
}

// GeneratorV2 is implemented by out-of-tree generator providers.
type GeneratorV2 interface {
	// Generate returns the generated data plus any state needed for cleanup.
	Generate(GeneratorSpec) (map[string][]byte, GeneratorProviderState, error)
	// Cleanup releases resources created by a previous Generate call.
	Cleanup(GeneratorSpec, GeneratorProviderState) error
}
```

Contributor, on `GeneratorV2`:

Is this still necessary? I am really forcing some out-of-the-box thinking here :)

Contributor, on `Cleanup`:

If we keep the Generator interface, is cleanup still necessary?
It's only used by Grafana. For me this is not an interface, it's something the provider should implement.

The interface could be RotateSecret() or something like that, which would be a no-op on most generators, with different behaviour based on the provider. But don't trust me on this, I am no expert.

### Out-of-Tree Providers Maintenance

#### Deployment

Out-of-tree providers are separate projects with their own repos, images, and Helm charts. Users deploy ESO and the providers they need. ESO connects to providers via a Kubernetes Service indicated by `providerConfig.address`. Co-locating providers as sidecars is **discouraged** to preserve isolation and scalability.

Contributor

what's the point of making it so hard for users?

If you intend ESO to be a framework, maybe we need to be closer to Kubernetes SIG... The reason ppl love ESO is the friendliness to get secrets. If they now have to deal with each provider's idiosyncrasies, it might have a large impact on perceived value.

I feel truth is maybe in between... ESO as an org, getting contributions from cloud providers in their own repos while guaranteeing quality as an org. But that's maybe me...

Or even closer, if you keep a monorepo structure with a good org. It might get weird though, cause the privileges you give on git are different than having a codeowners...
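`providerConfig.address` has to be turned into something a gRPC client can dial. A minimal sketch, assuming the `http+unix://` form used in the examples and `https://` URLs for a provider reached through a Kubernetes Service; the accepted schemes, the `dialTarget` name and the Service name below are hypothetical, since the proposal does not define them:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// dialTarget maps a providerConfig.address to a (network, address) pair for
// net.Dial-style APIs. Plain http:// is rejected because the proposal requires
// TLS when ESO talks to providers over the network.
func dialTarget(address string) (network, addr string, err error) {
	u, err := url.Parse(address)
	if err != nil {
		return "", "", fmt.Errorf("invalid providerConfig.address: %w", err)
	}
	switch u.Scheme {
	case "http+unix":
		if u.Path == "" {
			return "", "", fmt.Errorf("missing socket path in %q", address)
		}
		return "unix", u.Path, nil
	case "https":
		if u.Hostname() == "" {
			return "", "", fmt.Errorf("missing host in %q", address)
		}
		port := u.Port()
		if port == "" {
			port = "443"
		}
		return "tcp", net.JoinHostPort(u.Hostname(), port), nil
	default:
		return "", "", fmt.Errorf("unsupported scheme %q in %q", u.Scheme, address)
	}
}

func main() {
	for _, a := range []string{
		"http+unix:///path/to/aws.sock",
		"https://provider-aws.external-secrets.svc:8443", // hypothetical Service name
		"http://provider-aws.external-secrets.svc",
	} {
		n, addr, err := dialTarget(a)
		fmt.Println(n, addr, err)
	}
}
```

Resolving the address up front lets a SecretStore validation step reject a non-TLS endpoint before any ExternalSecret tries to use it.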


#### Governance

- One repo per official, ESO-maintained provider (e.g., `provider-aws`).

Contributor

I'm wondering: what's the benefit of tracking official providers in separate repositories? Is there a specific benefit we'd like to see over, e.g., maintaining a go.mod in a subfolder? I'm especially wondering because provider-contrib is also just one big repo in the proposal.

Contributor

I would also say it's harder on the tracking side. Like issue wise it would be harder to set up project automation and the likes which limit to 5 repositories on a pro account. I couldn't set up more.

Member

Having them in a subdirectory would work for me, too. I guess I'm just used to having separate repositories. What I dislike about having everything in one repo is that some shared areas (CI, root-level Makefile, CODEOWNERS, etc.) get quite cluttered over time.

Member

If you vouch for a monorepo, I'm not opposed. LMK and I'll amend the proposal.

Member Author

A monorepo makes eventual provider transfers (e.g., AWS SM to the AWS people) way harder. Just my own two cents.

Contributor

Why would it be harder? With clear separation and modules it's literally a copy and paste and updating some references.

Contributor

Or maybe you're thinking once people start adopting it, it's more difficult to point them to a new repository? I'm not too concerned about that. We could always just start deprecating it and moving to a new repo with ample of time for people to migrate out.

Contributor

This seems to be at the crux of the proposal. I think this needs a call to reach alignment... Or, if no call would resolve that, maybe we should write the code and see?

- Promotion lifecycle (experimental → stable).
- CODEOWNERS and standard PR workflows per provider.
- A collective community repo `provider-contrib` for community-maintained providers.

### Notes/Constraints/Caveats

- Do not implement Unix domain sockets for sidecars; providers should run as independent deployments to ensure horizontal scalability, separate network policies, and stronger isolation.

Contributor

I would be interested to see why it's a no go. I don't see how horizontal scalability is limited with sockets. It's just a different (harder) design. The fact to go networked also brings downsides.

This needs to be clarified IMO.

- Cluster-scoped resources delegate to namespaced providers; referent authentication is explicit via `authenticationScope`.

### Risks and Mitigations

- Operational overhead: users manage separate provider deployments.
  - Mitigated by dedicated Helm charts and independent versioning per provider.
- Maintenance overhead: every provider needs similar infrastructure, such as a Helm chart, an e2e test framework and test cases, a common library to bootstrap the provider's gRPC server, metrics, initialising clients, etc.

Contributor

I am wondering why.... (I mean, it's in accordance with the rest of the document.... but i am not understanding why we go so far).

  - Mitigated by providing one or more `common` repos, e.g. `provider-common`, which host the code shared between eso-core and all providers (similar to https://github.com/fluxcd/pkg).
- TLS management: someone needs to manage the TLS certificates and keys. Do we push this to the user, or do we provide it ourselves?
  - Mitigated by implementing certificate management with `cert-controller` and providing an integration mechanism with cert-manager.

Contributor

Some users are relying on ESO to bootstrap their infra to then have cert-manager running.

You're basically killing that by introducing a circular dependency. I would believe we need an alternative way to establish trust between the plugin and the main controller.

Contributor

What alternative ways are you thinking of? We could, for example, introduce service account authentication and RBAC-based tokens, but I'm interested to hear thoughts.

Member Author (@gusfcarvalho, Oct 8, 2025)

We should not use SA auth here, as the SA authenticating it would be external-secrets' to validate the call to the plugin. (as in, interception of a highly privileged k8s SA). (i.e. eso is the client here - the plugin is the one needing to validate good clients)

IMO whatever we do should be x509 rooted. IMO, I am in favor of leveraging cert-controller for that.

Integration with cert-manager should be a plus, just like it is today: everything works without it, but if people want to ditch cert-controller in favor of it, they can, sort of (for setups that do not have the circular dependency mentioned by JP).

Contributor

First, it's hard for me to give an opinion on what to secure, if we are not clear about conventions, what we want to do and their associated risks.

If we consider that it's plain http between pods in a cluster, it's not the same problem as plain http between containers of the same pod. I really would like us to clarify what's configurable to know how far someone will take the plugin.

Nevertheless, establishing a trust tunnel can be done in multiple ways. And It doesn't need to be asymmetric btw.

For @gusfcarvalho 's comment - I am even more confused: You don't need to use eso's sa for that. In fact you definitely shouldn't. (I think we agree there).
That doesn't prevent the use of another SA with zero pod access, which is especially useful if you want to federate between the kube cluster and something else. While we could do that, it gets tricky for some managed Kubernetes offerings, so I would rather avoid it too.

Simpler solutions are simpler to maintain. Are we sure we are aiming for simple here?

Member Author

@evrardjp I was referring to this:

> We could, for example, introduce service account authentication and RBAC-based tokens, but I'm interested to hear thoughts.

I understood it as leverage existing K8s SAs.

cert-controller IMO is the simplest solution 😄. It just adds self-signed certificates to a Secret, and does that by following labels today (though we only search for the webhook label). It is very simple to extend it to a common label to add Self Signed Certs to any providers.

Contributor (@evrardjp, Oct 17, 2025)

Adding a dependency looks deceptively simple.

But by adding it, you've coupled to a project.

It means we're gonna need to start maintaining this other project should something happen there, in order to keep our software working.

We're gonna break many ppl CI/CD cause now that we'd need this intertwining. Vendors will curse us, cause they would have to change their tooling.

For me, bringing another project as a dependency is a last resort move : we should reduce the code we maintain, not add more.

Side note: when I read myself I think "Damn, I start to be a grumpy old git." 🤣

Member Author (@gusfcarvalho, Oct 17, 2025)

But cert-controller is ours already, we install it everywhere by default 🙂

Contributor

LOL, I all misread. I read cert-manager out of habit! Ofc that makes more sense :)
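The discussion above on the TLS-management risk converges on x509-rooted trust, with the provider validating eso-core as its client. A minimal standard-library sketch of the provider side, assuming the certificate, key and client CA arrive as PEM bytes (for instance from a Secret populated by cert-controller); the function name is hypothetical:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// providerServerTLS builds a mutual-TLS config for a provider's gRPC server:
// the provider presents its own certificate and only accepts clients (eso-core)
// whose certificate chains to clientCAPEM.
func providerServerTLS(certPEM, keyPEM, clientCAPEM []byte) (*tls.Config, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("loading provider key pair: %w", err)
	}
	clientCAs := x509.NewCertPool()
	if !clientCAs.AppendCertsFromPEM(clientCAPEM) {
		return nil, errors.New("no valid client CA certificate in PEM input")
	}
	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    clientCAs,
		ClientAuth:   tls.RequireAndVerifyClientCert, // reject clients without a valid cert
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// With real PEM data from a mounted Secret this returns a usable config;
	// here only the error path for malformed input is shown.
	_, err := providerServerTLS([]byte("not a cert"), []byte("not a key"), nil)
	fmt.Println(err != nil) // true
}
```

With grpc-go, such a config would typically be wrapped with `credentials.NewTLS(cfg)` and passed to `grpc.NewServer` via `grpc.Creds(...)`. Because the provider enforces client verification, a compromised pod without an eso-core client certificate cannot call it.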

## Design Details

### Test Plan

This plan validates core changes in eso-core and out-of-tree providers, including security boundaries and migration flows.

#### Prerequisite testing updates
- Create a provider conformance test suite (library) that providers run against locally and in CI (covers GetSecret, GetAllSecrets, Push/Delete/Exists when implemented, error semantics, auth, TLS).
- Add a fake gRPC provider for tests to simulate success, latency, failures, and version skew.
- CI jobs:
  - fast unit/integration on PR
  - extended e2e and conformance nightly
  - artifacts for v1 providerless build

#### Unit tests
- eso-core
  - Routing by `secretStoreRef.apiVersion` (v1 vs v2).
  - Validation of providerConfig and references; gating policy; retry/backoff.
  - Referent auth: deny cross-namespace access for namespaced SecretStores; respect ClusterSecretStore scope rules.
  - Metrics emission (`.ObserveAPICall()` or provider-side equivalents) and labels.
  - Robust error mapping from provider responses to conditions/events.
- Providers (adapters and common libs)
  - gRPC client/server option wiring, TLS, timeouts, backoff and cancellation.
  - Auth resolvers from SecretRefs/ServiceAccounts, including namespace scoping.
  - Serialization of providerRef and request/response schemas.
- Generators
  - providerConfig passthrough, state persistence, and gating.

#### Integration tests
- envtest-based controller tests installing v2alpha1 CRDs (SecretStore/ClusterSecretStore/Generator/ClusterGenerator).
- In-process fake provider; verify refresh, target templates, dataFrom, GetAllSecrets, and error propagation.
- Security boundaries
  - Namespaced store cannot read Secret from another namespace; Cluster store may when allowed by selector/scope.
  - Conditions and namespace constraints on ClusterSecretStore resources are enforced.
- Migration
  - v1 plugin provider forwards to v2 provider; existing ExternalSecrets continue to sync; switch to v2 stores without data loss.
- Failure injection: DNS/TLS failures, expired certs, non-retryable vs retryable errors, deadline exceeded.
- Version skew: eso-core N with provider N-1/N+1 where compatible; reject incompatible versions with clear status.
- Run with -race; ensure no data races in reconcile paths.
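For the version-skew tests, eso-core needs some rule for "compatible". A hedged sketch, assuming both sides advertise a `vMAJOR.MINOR` contract version and that the supported window is N-1..N+1 minor versions within one major; the proposal does not define a versioning scheme, so every name here is hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// contractVersion is a hypothetical "vMAJOR.MINOR" API contract version.
type contractVersion struct{ major, minor int }

func parseVersion(s string) (contractVersion, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 2 {
		return contractVersion{}, fmt.Errorf("version %q: want vMAJOR.MINOR", s)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return contractVersion{}, fmt.Errorf("version %q: %w", s, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return contractVersion{}, fmt.Errorf("version %q: %w", s, err)
	}
	return contractVersion{major, minor}, nil
}

// compatible returns nil if provider is within N-1..N+1 minor versions of
// core in the same major version, otherwise an error explaining why not.
func compatible(core, provider string) error {
	c, err := parseVersion(core)
	if err != nil {
		return err
	}
	p, err := parseVersion(provider)
	if err != nil {
		return err
	}
	if c.major != p.major {
		return fmt.Errorf("provider contract %s incompatible with eso-core %s: major version differs", provider, core)
	}
	if d := p.minor - c.minor; d < -1 || d > 1 {
		return fmt.Errorf("provider contract %s is outside the supported skew (N-1..N+1) of eso-core %s", provider, core)
	}
	return nil
}

func main() {
	for _, p := range []string{"v2.2", "v2.4", "v2.5", "v3.3"} {
		fmt.Printf("eso-core v2.3, provider %s: %v\n", p, compatible("v2.3", p))
	}
}
```

Returning an error rather than a bool makes it straightforward to surface the reason in the store's status conditions, matching the "reject incompatible versions with clear status" item above.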

#### e2e tests
- kind-based suites deploying:
  - eso-core (v1+plugin / v2-enabled) and all ESO-maintained out-of-tree providers.
- Scenario matrix: namespaced and cluster stores, referent auth modes, gating on/off, refresh and templating.
- Scale smoke: O(1000) ExternalSecrets across O(50) namespaces; measure sync latency and resource usage.
- Disruption: roll provider Deployment/Service; TLS cert rotation; verify recovery without manual intervention.
- Upgrade: v1-only -> mixed (v1 plugin + v2 provider) -> v2-only.
- Conformance suite executed against each supported provider repo (as optional gate for community providers).

### Graduation Criteria

Alpha
- v2alpha1 CRDs published; feature behind a feature gate and disabled by default in stable images.
- Unit/integration tests implemented; initial e2e with fake/sample provider green on 2 supported Kubernetes versions.
- Basic metrics and status/conditions wired; documentation draft and examples provided.

Beta
- Enabled by default (feature gate remains for rollback); docs complete; migration guide published.
- Provider conformance test suite v1.0 released; at least two providers pass the suite and run e2e in CI.
- Broad provider support: ≥3 major providers pass conformance and publish compatibility matrix.
- Security: referent auth boundaries verified by automated tests; fuzz tests for provider API payloads; TLS required by default.
- Version skew policy documented and tested (eso-core N with provider N-1/N+1 where contract allows).
- Performance baselines documented; scale e2e passing; no critical open bugs; user feedback from early adopters.

GA
- Feature gate removed (or defaulted on permanently); CRDs promoted (e.g., v2beta1->v2 or equivalent) without breaking API.
- Upgrade/downgrade and rollback procedures validated in CI; migration from v1 documented and tooling available.
- SLOs met (availability and sync latency); telemetry and dashboards documented.
- No known security gaps in namespace isolation/auth; passing periodic conformance in provider repos.

### Upgrade / Downgrade Strategy

A phased migration enables safe adoption from v1 to v2:

1) Early adoption via a v1 plugin provider
   - Introduce a special `plugin` provider within `SecretStore/v1` that forwards requests to v2 out-of-tree providers. This allows testing v2 providers without changing existing v1 resources.

2) Dedicated builds
   - Provide ESO builds without in-tree provider code to reduce footprint for fully migrated users.

3) Full migration
   - Define v2 SecretStore and Generator CRDs pointing to out-of-tree provider deployments/CRs.
   - Update `ExternalSecret` manifests to use `secretStoreRef.apiVersion: secretstore.external-secrets.io/v2alpha1` and reference v2 stores.
   - Decommission v1 stores after all `ExternalSecret` resources are migrated.


### Monitoring Requirements
- Add gRPC-related metrics to eso-core (client side) as well as to the provider (server side).
- Migrate metrics (`.ObserveAPICall()`) to the provider.

### Troubleshooting
Users keep essentially the same troubleshooting flow, with two new steps marked below:
- inspect Secret
- describe ExternalSecret
- describe SecretStore
- (new) describe provider-owned resource
- inspect logs of eso-core
- (new) inspect logs of provider pods

## Drawbacks

- Increased operational complexity and responsibility for users to deploy and manage provider lifecycles in addition to ESO.
- New CRDs introduce a learning curve and require updated documentation.
- Separate repositories, issue trackers, and release pipelines for each provider increase maintenance overhead.
- Distributed maintenance across community providers can fragment ownership.

## Alternatives

- Just provide a gRPC plugin mechanism and move providers out of tree without a v2 SecretStore.
- Get rid of the SecretStore altogether and point an `ExternalSecret` directly at a `Kind=AWSSecretsManager` resource.

Contributor

interesting approach.

Contributor

If we go as far as evaluating this : wouldn't this be simpler? Crds per provider, all using the same ESO go modules? Having providers in different repos with the same code structure would allow wholesale PRs. Bots can help maintain versions of providers by bumping our eso modules. A single helm chart could deploy three different providers implementations (x or y would be installed based on values).
Code completely independent in each repo, so no dependency hell. Ownership very clear in each repo. Easy to phase out a provider when some repo isn't up to our standards: we remove from helm chart and archive the repo.
Yes, there is code repeated. But it looks simpler to me from my window. I need to know why this shouldn't be feasible before having an opinion on this whole effort.

Member Author

This would be a benefit regardless of whether we get rid of SecretStore or not. It is a more direct approach, which means we will lose some features (like cluster-scoping, and floodgate controls if a given store is out). I think the simplicity will be fewer controllers for us to maintain and merge: everything could be handled easily in a simple core-controller loop.

Contributor

But what is harder: maintaining modules for n controllers and having controllers a bit more specific in some places, OR a gRPC system + cert-manager?

Same question for operations.

Contributor

And I don't say that because some of my customers are scared of self signed CAs...!

Member Author

But how would eso work with them?

are you thinking about brewing one eso distribution per provider?

Several orgs use more than one provider, this won’t scale if I understood it correctly x

Member Author

As in - users will now need to figure out controller classes on every single install; even simple ones.

Member Author

Plus, an ExternalSecret using two providers will be a 💫 headache
