Skip to content

chore: design for generators/stores v2alpha1 / out of tree provider#4792

Closed
gusfcarvalho wants to merge 5 commits intomainfrom
gc/design/gen-secretstore-v2
Closed

chore: design for generators/stores v2alpha1 / out of tree provider#4792
gusfcarvalho wants to merge 5 commits intomainfrom
gc/design/gen-secretstore-v2

Conversation

@gusfcarvalho
Copy link
Copy Markdown
Member

No description provided.

Signed-off-by: Gustavo Carvalho <gustavo@externalsecrets.com>
@gusfcarvalho gusfcarvalho requested a review from a team as a code owner May 18, 2025 18:03
@gusfcarvalho gusfcarvalho requested a review from knelasevero May 18, 2025 18:03
@gusfcarvalho gusfcarvalho marked this pull request as draft May 18, 2025 18:03
@sonarqubecloud
Copy link
Copy Markdown

@gusfcarvalho
Copy link
Copy Markdown
Member Author

Also worth considering: if we do it, should we just break generators/v1alpha1 or should we reimplement it on a v2alpha1 and mark v1alpha1 deprecated?

@Skarlso
Copy link
Copy Markdown
Contributor

Skarlso commented May 19, 2025

Hm. I don't know. This is a massive rewrite. Like, it's a complete rewrite of everything we have.

@gusfcarvalho
Copy link
Copy Markdown
Member Author

Hm. I don't know. This is a massive rewrite. Like, it's a complete rewrite of everything we have.

YES.

Unfortunately, this was my conclusion after working with both provider versioning and custom templating tickets... I think we need to do massive rewrites on both ExternalSecrets/PushSecrets and Secretstores/Generators... otherwise we will always be trying to explain the nuances and implicit feastures available in one place and not available in another place.

Even if we decide to dont change the API Spec itself, to implement those two tickets alone, a massive rewrite of the codebase would already need to happen (and then it would be full of workarounds, gotchas, because of the way the codebase is structured).

So... if we need to do it 🤷 might as well make it clean. That was my logic.

@gusfcarvalho
Copy link
Copy Markdown
Member Author

re: #4792 (comment) ( I cannot add a comment for some reason) - the idea is to have an extension point for users. My original thought was shared volume so they can add whatever they want with e.g. CSI drivers + a daemonset.

I'm not that concerned with this approach, as the idea really is to support a better way for outOfTree plugins - not move our stuff to be out of tree.

@Skarlso
Copy link
Copy Markdown
Contributor

Skarlso commented May 19, 2025

Hm. I don't know. This is a massive rewrite. Like, it's a complete rewrite of everything we have.

YES.

Unfortunately, this was my conclusion after working with both provider versioning and custom templating tickets... I think we need to do massive rewrites on both ExternalSecrets/PushSecrets and Secretstores/Generators... otherwise we will always be trying to explain the nuances and implicit feastures available in one place and not available in another place.

Even if we decide to dont change the API Spec itself, to implement those two tickets alone, a massive rewrite of the codebase would already need to happen (and then it would be full of workarounds, gotchas, because of the way the codebase is structured).

So... if we need to do it 🤷 might as well make it clean. That was my logic.

Yah I get it. But I'm concerned about effort and time basically. 🤔 I need to think about this. :D Also, really really would like to involve @moolen in this. :) After 1.5 years, I still don't feel qualified to decide on a completely new architecture and API contract. :D

@github-actions
Copy link
Copy Markdown

This pr is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions github-actions bot added the Stale This issue/Pull Request is stale and will be automatically closed label Aug 19, 2025
Signed-off-by: Gustavo Carvalho <gustavo@externalsecrets.com>
@github-actions github-actions bot removed the Stale This issue/Pull Request is stale and will be automatically closed label Aug 26, 2025
@evrardjp
Copy link
Copy Markdown
Contributor

Many questions were asked during community meeting about this topic and you mentionned the need to have a discussion.

I can't promise I will be available to discuss this synchronously. If you want to bounce ideas with me, don't hesitate to contact me on slack or in github (same username).

@webstradev
Copy link
Copy Markdown
Contributor

I'd be happy to try and help out on this, I imagine it's a rather large job.

@moolen moolen mentioned this pull request Aug 27, 2025
5 tasks
Copy link
Copy Markdown
Member

@moolen moolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CRD structure makes sense to me. I think the scope is may be a bit too broad in this design as it covers SecretStores, Generators and additional fields in each resource. We can not design/review/implement that all at once.
Maybe we can limit the scope to how we get to "out of tree SecretStore providers" for now?

As for the design, i think we should cover a couple of more areas:

  1. How would users deploy out of tree providers?
    A. I guess the implicit assumption here is, that these are separate deployments running in separate pods. I would NOT want to support unix domain sockets for providers running in the same pod - simply because one should not run a provider in the same pod (sharing the same service account by default etc). I think we should add that explicitly to the design doc.
  2. How can users migrate to the new SecretStores?
    A. It just came to my mind that it may be worthwhile to add a plugin provider (hello #5057) to SecretStore v1 which essentially talks to a SecretStore v2 provider. That allows users to opt in early to use the new out of tree providers. If we then also offer a separate build without in-tree providers our users get the benefits already with SecretStore v1 and we get early feedback.
    B. Please add a section to the design doc about how users would migrate to the new approach. What would they need change on their end to facilitate the migration? What benefit would they get?
  3. With out of tree providers we need to come up with a repository and governance structure, i think this should be part of the proposal as well.

Just a few initial thoughts, but i'm confident this is the right direction!

@gusfcarvalho gusfcarvalho mentioned this pull request Aug 28, 2025
5 tasks
feat: add migration path

feat: add consequences and acceptance criteria
Signed-off-by: Gustavo Carvalho <gustavo@externalsecrets.com>
@github-actions github-actions bot added kind/chore Categorizes Pull Requests for chore activities (like bumping versions) size/m labels Aug 28, 2025
@gusfcarvalho
Copy link
Copy Markdown
Member Author

added a new version addressing all of these comments 🙇 if you all can take another pass, I'd appreciate it

@gusfcarvalho
Copy link
Copy Markdown
Member Author

I'd be happy to try and help out on this, I imagine it's a rather large job.

@webstradev just for the sake of tracking, can you also express your will to join on #5218 ? Trying to be more transparent on these huge-effort issues so community can join them easier.

Copy link
Copy Markdown
Member

@moolen moolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the diff was a good read, i like it. I'll take another pass on the whole document soon.

Copy link
Copy Markdown
Contributor

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool concept! In general I think this has potential to be fleshed out but Im missing some (for me) very important parts for such a big project (I am very uncreative and usually steal a lot of this stuff from the outline of a KEP, I know its very generic and arguably "over-engineered" but I have come to trust this process):

  • A set of Constraints / Risks and Mitigation Strategies transparently outlined
  • A Production readiness check
    • How users can enable this feature as opt-in, rollback from it
    • How users can monitor out of tree providers
    • How provider communication can be scaled horizontally
    • How troubleshooting is performed (i.e. standardized errors in interfaces)
  • Drawbacks and Alternatives considered
  • A version skew overview (as in what happens if we change an interface of the OOT provider API and the provider was not updated yet)


#### Governance

- One repo per "official" eso maintained provider (e.g., `provider-aws`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also say it's harder on the tracking side. Like issue wise it would be harder to set up project automation and the likes which limit to 5 repositories on a pro account. I couldn't set up more.

@moolen moolen changed the title chore: design for generators/stores v2alpha1 chore: design for generators/stores v2alpha1 / out of tree provider Sep 18, 2025

#### SecretStore

Remove `spec.provider` and introduce `spec.providerConfig`, which contains the endpoint and authentication required to reach an out-of-tree provider, plus a provider-owned reference forwarded on requests.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it will be the same Kind object, what will be the “stored” version in the k8s API?
How to migrate from one version to another?
Also, if we want to keep both provider and providerConfig it makes controller behavior quite “hacky” (spec is not fully predictable).

This makes me wonder if creating a new Kind object would be better, like

apiVersion: secretstore.external-secrets.io/v1alpha1
kind: ExternalSecretStore

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are leveraging a new group already, for what it’s worth it’s not the same resource and migration isn’t via controllers/webhooks - it’s user faced

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing groups is really not an obvious trick.
It makes upgrading really error-prone:
With a "simple" look it will be difficult to know which resource it is; just missing the group (something no-one really had a look at before on CRD) could break an object.

On user experience:
Doing a "kubectl get" will give 1 resource, or the other or an ambiguous error.

On controller side:
As the controller runtime just reconciles on name/namespace, it will have to "guess" what version to reconcile.
Or we would have to duplicate another controller just for this new group.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing groups is really not an obvious trick.
It makes upgrading really error-prone:
With a "simple" look it will be difficult to know which resource it is; just missing the group (something no-one > really had a look at before on CRD) could break an object.

There will be no good solution anyways. Either users will have to learn about a new resource or to learn about the existence of the same resource with a group. We've already tried keeping things the same to minimize user impact (by adding conversion logic ourselves between versionings and by having different group/kind names when we moved from KES to ESO), and the hard truth is that none minimized user impact, they still needed to run fully named kubectl commands to figure out versions (and we introduced a new point of failure via the conversion webhook).

On controller side:
As the controller runtime just reconciles on name/namespace, it will have to "guess" what version to reconcile.
Or we would have to duplicate another controller just for this new group.

That would only be applicable if we intended to leverage the same controller code to reconcile both resources - and I think that's a bad idea 😄. We can create new controller with watches for specific GVK matching that. Controller level, it would only get the correct resource.

@webstradev
Copy link
Copy Markdown
Contributor

Really thorough proposal, pretty easy to understand the intentions even for someone new to ESO like myself.
Only concern e2e wise would be that there are quite a few ESO maintained providers at the moment (though maybe that will change after this).

Does that mean for the ESO maitained ones we always add e2e tests, in which case they would become quite large and quite slow

@gusfcarvalho
Copy link
Copy Markdown
Member Author

Really thorough proposal, pretty easy to understand the intentions even for someone new to ESO like myself. Only concern e2e wise would be that there are quite a few ESO maintained providers at the moment (though maybe that will change after this).

Does that mean for the ESO maitained ones we always add e2e tests, in which case they would become quite large and quite slow

we only really have 5 of them as per the stability support docs: https://external-secrets.io/latest/introduction/stability-support/

for all of them, our current e2e test suite already covers the use cases. The work here would be more of a porting work.

Also, I think once this is done, e2e test for a given provider should be ran on that specific provider repo; on core external-secrets we run a conformance suite maybe with the fake provider - and let any provider repository run the conformance suite.

@webstradev
Copy link
Copy Markdown
Contributor

Really thorough proposal, pretty easy to understand the intentions even for someone new to ESO like myself. Only concern e2e wise would be that there are quite a few ESO maintained providers at the moment (though maybe that will change after this).

Does that mean for the ESO maitained ones we always add e2e tests, in which case they would become quite large and quite slow

we only really have 5 of them as per the stability support docs: https://external-secrets.io/latest/introduction/stability-support/

for all of them, our current e2e test suite already covers the use cases. The work here would be more of a porting work.

Also, I think once this is done, e2e test for a given provider should be ran on that specific provider repo; on core external-secrets we run a conformance suite maybe with the fake provider - and let any provider repository run the conformance suite.

Regarding the last point I think it was mentioned in the community meeting that separate repo's is hard because project automation becomes no longer possible because ur limited to max 5 repo's or something like that.

I thought the current leaning was one monorepo

@gusfcarvalho
Copy link
Copy Markdown
Member Author

gusfcarvalho commented Sep 24, 2025

Regarding the last point I think it was mentioned in the community meeting that separate repo's is hard because project automation becomes no longer possible because ur limited to max 5 repo's or something like that.

I thought the current leaning was one monorepo

I've seen that on the community meeting note docs and added a comment there. Because we are under CNCF enterprise org (I think?), we actually have 20 repos to support - but even so this is just a 'nice' feature from github. We could just simply have a template repository for providers (provider-template or something like that) that contains a set of github actions we need across repositories.

For the project one, we can simply have a workflow that triggers actions/add-to-project and that's it.

IMO I think monorepoing will cause the same pain for people like the OTEL community has with the otel-collector-contrib monorepo.

(and we will run on the same ownership limitations we have today x:)

That said, I don't think having one versus the other is a blocker. I just think one option has more payoffs than downsides 😆 both of them will work

(though, IMO, one will work better ;P)

@Skarlso
Copy link
Copy Markdown
Contributor

Skarlso commented Sep 25, 2025

Like I said in the other comment, project automation breaks after a couple of repos. The rest of the repositories would not be able to automatically add issues and such to the global project board.

@gusfcarvalho
Copy link
Copy Markdown
Member Author

Like I said in the other comment, project automation breaks after a couple of repos. The rest of the repositories would not be able to automatically add issues and such to the global project board.

I know - but from what I understood this was the automatic project config to watch over repos. The other way around would still work with https://github.com/actions/add-to-project (i.e. actions running on each repo to auto add issues and PRs to the same global project). This is the recommended way to do that for multiple repositories and one project.

@Skarlso
Copy link
Copy Markdown
Contributor

Skarlso commented Sep 25, 2025

As long as it works... :D

@gusfcarvalho
Copy link
Copy Markdown
Member Author

As long as it works... :D

yeah, if it doesnt work, multiple repos will be bad...

My problem with monorepo is that releases/tags/semver promotion will be hard to manage & coordinate, if we want to keep them independent of each other.

Take a look at https://github.com/open-telemetry/opentelemetry-collector-contrib/tags. They don't even propose to release individual components. If we can pull monorepo off without going to that path, it is for sure the best approach.

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 25, 2025

There are a lot of opinions on monorepo vs. several repos. I'm wondering what we would like to start with and put that into the initial design.
With the recent community i'm leaning towards leaving it in-repo, just for the sake of simplicity and cohesion. I would like to make a decision. Do we need to vote on it, or can we find consensus?

I think it's only you @gusfcarvalho who is opposed to this. Could you live with a monorepo initially, and then revise this along the way to see if we want to keep it like that?

@gusfcarvalho
Copy link
Copy Markdown
Member Author

I think it's only you @gusfcarvalho who is opposed to this. Could you live with a monorepo initially, and then revise this along the way to see if we want to keep it like that?

I was more giving advice 😺 . Don't consider myself a blocker here, as I'm likely not the one that will face these issues anyways 🤷

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 26, 2025

I think @gusfcarvalho made a very good point around the release process, which is missing in the proposal:

Take a look at https://github.com/open-telemetry/opentelemetry-collector-contrib/tags. They don't even propose to release individual components. If we can pull monorepo off without going to that path, it is for sure the best approach.

For context: they have a bunch of components and they share the same tag v0.136.0. These components are grouped up as a "distribution". There are many distributions, like core, contrib or k8s. The components are versioned in bulk, it seems that some distributions even are versioned/released coherently.

Anyhow, we should think of an strategy on how we would create releases for the providers. The design proposal explicitly says allowing independent versioning is one of the goals, though i'm wondering if this is a problem we need to solve. I'm mostly concerned about the large pile of dependencies we ship with eso and i want to declutter that. I don't care much about individual versioning.

🤔 WDYT?

@gusfcarvalho
Copy link
Copy Markdown
Member Author

gusfcarvalho commented Sep 27, 2025

Anyhow, we should think of an strategy on how we would create releases for the providers. The design proposal explicitly says allowing independent versioning is one of the goals, though i'm wondering if this is a problem we need to solve. I'm mostly concerned about the large pile of dependencies we ship with eso and i want to declutter that. I don't care much about individual versioning.

If we are taking away individual versioning as a goal, IMO, we don't need out of tree providers at all (not in its shape at least). We can safely resolve dependency clutter by building custom images - @Skarlso suggested a way forward which is very simple to do.

IMO - if we want to solve only the dependency clutter, we should not go through this proposal and instead just provide a way for users to go build their eso images with only a set of providers enabled. We could even build and ship eso versions that are only enabled for only one provider (though I believe limited to the 5 ones we support)

Much easier, faster, safer, opt-in, and with limited (if any) breaking changes for users.

Then, maybe, and only maybe, we can further separate CRDs as to avoid the SecretStore manifest clutter and some sort of versioning (but weak one, as the controller will be a bundled version anyways).

IMO, out of tree providers as it is drafted out, is only worth pursuing because of the strong separation it brings everywhere (code, distribution, runtime, permissions, CRD).

@gusfcarvalho
Copy link
Copy Markdown
Member Author

gusfcarvalho commented Sep 27, 2025

and just for the sake of clarity what I mean is:

- Support out-of-tree providers as first-class citizens, allowing independent versioning and distribution.
- Unify feature sets of SecretStores and Generators (e.g. refresh, gating) under consistent CRDs and controllers.
- Make referent authentication modes explicit and easier to use.
- Allow users to install only the providers they need.
- reduce the dependency footprint of ESO to improve its security posture

If you take away point 1, everything else can be achieved with non-breaking additions to the SecretStore/Generator specs + go build flags + new builds for some cases we provide + docs for users who want to build their specific setup - which is simpler to do and will cause fewer breaking changes.

Also, just for the sake of clarity - I'm fine going that route 😄 Improving the security posture was also the most important thing for me when originally thinking about this design.

I just don't think it is worth it to pursue out-of-tree as way to achieve the goal of improving the security posture alone. There are easier ways to achieve that 😄

@gusfcarvalho
Copy link
Copy Markdown
Member Author

gusfcarvalho commented Sep 27, 2025

Also worth saying we can still find a way to produce coherent tags + releases + packages for individual providers in a monorepo setup. I just think this is going to be tricky, purely based on my past frustrations with monorepos (later found out to be also shared pain in other cncf projects).

@evrardjp
Copy link
Copy Markdown
Contributor

Is it me or the goals+trajectory are not clear? With so many comments, can we actually decide in an asynchronous manner, or should we organise a dedicated call? Might a call help align goals and trajectory?

I personally need more time to provide good insights.
My opinion so far will only be partial.

please ignore this comment if not relevant.

@evrardjp
Copy link
Copy Markdown
Contributor

evrardjp commented Sep 27, 2025

and just for the sake of clarity what I mean is:


- Support out-of-tree providers as first-class citizens, allowing independent versioning and distribution.

- Unify feature sets of SecretStores and Generators (e.g. refresh, gating) under consistent CRDs and controllers.

- Make referent authentication modes explicit and easier to use.

- Allow users to install only the providers they need.

- reduce the dependency footprint of ESO to improve its security posture

If you take away point 1, everything else can be achieved with non-breaking additions to the SecretStore/Generator specs + go build flags + new builds for some cases we provide + docs for users who want to build their specific setup - which is simpler to do and will cause fewer breaking changes.

Also, just for the sake of clarity - I'm fine going that route 😄 Improving the security posture was also the most important thing for me when originally thinking about this design.

I just don't think it is worth it to pursue out-of-tree as way to achieve the goal of improving the security posture alone. There are easier ways to achieve that 😄

I agree with these goals btw.

Minor point on the first one: the fact someone release in a vendor independent way will always result in compat matrix. So this is a bit of a lie to believe it will all work independently/regardless of version. But i think i got the gist :)

@evrardjp
Copy link
Copy Markdown
Contributor

evrardjp commented Sep 27, 2025

Anyhow, we should think of an strategy on how we would create releases for the providers. The design proposal explicitly says allowing independent versioning is one of the goals, though i'm wondering if this is a problem we need to solve. I'm mostly concerned about the large pile of dependencies we ship with eso and i want to declutter that. I don't care much about individual versioning.

If we are taking away individual versioning as a goal, IMO, we don't need out of tree providers at all (not in its shape at least). We can safely resolve dependency clutter by building custom images - @Skarlso suggested a way forward which is very simple to do.

IMO - if we want to solve only the dependency clutter, we should not go through this proposal and instead just provide a way for users to go build their eso images with only a set of providers enabled. We could even build and ship eso versions that are only enabled for only one provider (though I believe limited to the 5 ones we support)

Much easier, faster, safer, opt-in, and with limited (if any) breaking changes for users.

Then, maybe, and only maybe, we can further separate CRDs as to avoid the SecretStore manifest clutter and some sort of versioning (but weak one, as the controller will be a bundled version anyways).

IMO, out of tree providers as it is drafted out, is only worth pursuing because of the strong separation it brings everywhere (code, distribution, runtime, permissions, CRD).

@gusfcarvalho Do you believe there is a way to work incrementally? This proposal seems very sane to get started.

Copy link
Copy Markdown
Contributor

@evrardjp evrardjp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now firmly believing this is too large to get merged. So many blind spots.

I think this idea has really good stuff. And it feels it's coming from the heart and pains.
But I am also believing we should not fix everything at once.
For me, we need to build a clear roadmap with milestones instead.

@gusfcarvalho do you believe we could get to a point where YOUR major painpoints would be documented, sorted by order of importance, and then we figure out all together a path towards this design? Some people might feel differently about the pains, so if we explain them well, we might get a consensus quicker. WDYT?


- Support out-of-tree providers as first-class citizens, allowing independent versioning and distribution.
- Unify feature sets of SecretStores and Generators (e.g. refresh, gating) under consistent CRDs and controllers.
- Make referent authentication modes explicit and easier to use.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without explaining the terms you use, I can't follow you.
What is the referent authentication? (The current docs refer to the term only on ONE page, which is provider support, and the providers page don't even mention the same term).

Would you mind rephrasing two things:

  • What you're trying to have with the explicitness of it?
  • How would the explicitness make some feature easier to use when some ppl are not even aware of their usage?

## Goals

- Implement new CRDs: SecretStore/v2alpha1, ClusterSecretStore/v2alpha1, Generator/v2alpha1, ClusterGenerator/v2alpha1.
- Enable ESO to run without in-tree providers; users install providers separately.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't ESO become an empty shell with no real e2e testing then?
Wouldn't it make sense to have at least ONE example code (if possible, without external deps), in-tree?

- Enable ESO to run without in-tree providers; users install providers separately.
- Provide a provider configuration model that connects to out-of-tree providers via gRPC/TLS.
- Make referent authentication explicit (e.g., authentication scope for cluster-scoped resources).
- Add unified behaviors: refresh intervals, controller classes, retry settings, and gating policies.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A complex system cannot be made simple. It's inherent to its nature.
I am afraid you'll end up with a poor abstraction that is a terrible shell. Or maybe I didn't get it.

However, being afraid doesn't help anyone here, we most likely need to do a PoC, and see how it goes... WDYT?

- Make referent authentication explicit (e.g., authentication scope for cluster-scoped resources).
- Add unified behaviors: refresh intervals, controller classes, retry settings, and gating policies.
- Maintain ExternalSecret/PushSecret compatibility via apiVersion on storeRef.
- Provide a migration path from v1 to v2, including a v1 plugin provider bridge and dedicated builds without v1 code.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary?
That sounds very complex.

If we were to build all the current plugins into an RPC, we could use the helm chart and lots of explanations to move things forward?


#### Governance

- One repo per "official" eso maintained provider (e.g., `provider-aws`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be at the crux of some of the proposition. I think this need a call to understand alignment... Or, if no call would resolve that, maybe we should write the code and see?


### Notes/Constraints/Caveats

- Do not implement Unix domain sockets for sidecars; providers should run as independent deployments to ensure horizontal scalability, separate network policies, and stronger isolation.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be interested to see why it's a no go. I don't see how horizontal scalability is limited with sockets. It's just a different (harder) design. The fact to go networked also brings downsides.

This needs to be clarified IMO.


- Operational overhead: Users manage separate provider deployments
- mitigated by dedicated Helm charts and independent versioning per provider.
- Maintenance overhead: every provider needs similar infrastructure such as a helm chart, e2e test framework and test cases, a common library to bootstrap the provider's GRPC server, metrics, initialising clients etc.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering why.... (I mean, it's in accordance with the rest of the document.... but i am not understanding why we go so far).

- mitigated by providing one or more `common` repos, e.g. `provider-common` which host the shared code among eso-core and all providers (similar to https://github.com/fluxcd/pkg).
- TLS management: someone needs to manage the TLS certs/keys: do we push it to the user or do we provide it ourselves?
- mitigated by (1) implement certificate management with `cert-controller` and provide an integration mechanism with cert-manager.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some users are relying on ESO to bootstrap their infra to then have cert-manager running.

You're basically killing that by introducing a circular dependency. I would believe we need an alternative way to establish trust between the plugin and the main controller.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What alternative ways are you thinking of? we could for example introduce service account authentication and RBAC based tokens, but im interested to hear thoughts

Copy link
Copy Markdown
Member Author

@gusfcarvalho gusfcarvalho Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use SA auth here, as the SA authenticating it would be external-secrets' to validate the call to the plugin. (as in, interception of a highly privileged k8s SA). (i.e. eso is the client here - the plugin is the one needing to validate good clients)

IMO whatever we do should be x509 rooted. IMO, I am in favor of leveraging cert-controller for that.

Integration with cert-manager should be a plus just like it is today - everything works without it, but if people want to ditch cert-controller over it, they can, sort of. (for setups that do not have the circular dependency mentioned by JP.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, it's hard for me to give an opinion on what to secure, if we are not clear about conventions, what we want to do and their associated risks.

If we consider that it's plain http between pods in a cluster, it's not the same problem as plain http between containers of the same pod. I really would like us to clarify what's configurable to know how far someone will take the plugin.

Nevertheless, establishing a trust tunnel can be done in multiple ways. And It doesn't need to be asymmetric btw.

For @gusfcarvalho 's comment - I am even more confused: You don't need to use eso's sa for that. In fact you definitely shouldn't. (I think we agree there).
That doesn't prevent the use of another sa with 0 pod access. Especially useful if you want to federate between the kube cluster and something else. While We could do that, it gets tricky for some managed kubernetes offering, so i would rather avoid it too.

Simpler solutions are simpler to maintain. Are we sure we are aiming for simple here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evrardjp I was refering to this:

we could for example introduce service account authentication and RBAC based tokens, but im interested to hear thoughts

I understood it as leverage existing K8s SAs.

cert-controller IMO is the simplest solution 😄. It just adds self-signed certificates to a Secret, and does that by following labels today (though we only search for the webhook label). It is very simple to extend it to a common label to add Self Signed Certs to any providers.

Copy link
Copy Markdown
Contributor

@evrardjp evrardjp Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a dependency looks deceptively simple.

But by adding it, you've coupled to a project.

It means we're gonna need to start maintain this other project should something happen there, in order to keep our software working.

We're gonna break many ppl CI/CD cause now that we'd need this intertwining. Vendors will curse us, cause they would have to change their tooling.

For me, bringing another project as a dependency is a last resort move : we should reduce the code we maintain, not add more.

Side note: when I read myself I think "Damn, I start to be a grumpy old git." 🤣

Copy link
Copy Markdown
Member Author

@gusfcarvalho gusfcarvalho Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But cert-controller is ours already, we install it everywhere by default 🙂

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL, I all misread. I read cert-manager out of habit! Ofc that makes more sense :)

## Alternatives

- just provide a GRPC plugin mechanism and move providers out of tree without a v2 SecretStore
- get rid of the SecretStore alltogether and directly point from a `ExternalSecret` at a `Kind=AWSSecretsManager`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting approach.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go as far as evaluating this : wouldn't this be simpler? Crds per provider, all using the same ESO go modules? Having providers in different repos with the same code structure would allow wholesale PRs. Bots can help maintain versions of providers by bumping our eso modules. A single helm chart could deploy three different providers implementations (x or y would be installed based on values).
Code completely independent in each repo, so no dependency hell. Ownership very clear in each repo. Easy to phase out a provider when some repo isn't up to our standards: we remove from helm chart and archive the repo.
Yes, there is code repeated. But it looks simpler to me from my window. I need to know why this shouldn't be feasible before having an opinion on this whole effort.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a benefit regardless if we get rid of SecretStore or not. It is a more direct approach, which means we will lose some features (like cluster-scoping, and flood gaate controls if a given store is out). I think the simplicity will be less controllers for us to maintain and merge - as in everything could be handled easily on a simple core-controller loop.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what is the hardest: maintain modules for n controllers and have controllers a bit more specific in some places OR a grpc system + cert-manager?

Same question for operations.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I don't say that because some of my customers are scared of self signed CAs...!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how would eso work with them?

are you thinking about brewing one eso distribution per provider?

Several orgs use more than one provider, this won’t scale if I understood it correctly x

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in - users will now need to figure out controller classes on every single install; even simple ones.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus, an ExternalSecret using two providers will be a 💫 headache

@gusfcarvalho
Copy link
Copy Markdown
Member Author

I missed the last community calls - maybe you've discussed this in there - did we find a decision point on the scope of this design? as in - are we encompassing dep-tree + independent versioning, or just dep-trees ? :)

@Skarlso
Copy link
Copy Markdown
Contributor

Skarlso commented Oct 17, 2025

No, not yet. Moritz was actually asking for one. :D So we really should. :)

@gusfcarvalho
Copy link
Copy Markdown
Member Author

gusfcarvalho commented Oct 17, 2025

Let’s vote then 😃

🚀 = scope for deps-tree + independent versioning

❤️ = scope for deps-tree only

👀 = abstain / any option is fine

@gusfcarvalho
Copy link
Copy Markdown
Member Author

gusfcarvalho commented Oct 17, 2025

Re: my vote: I think encompassing both is way too complex to pull and even more complex to maintain. None of the options we’ve been discussing thus far come with easy trade offs 😅. Plus, while we are onboarding people, I’m still skeptical of our longer term ability to maintain this extra code / repos / charts, so I’m picking one (reps-tree reduction) over the other (independent versioning).

we still need them both, but maybe a different approach without neccessaroly moving to a plugin system. e.g we could separate CRDs for providers still, keep them all in tree; and have builds that learn how to manage / control that provider.

For now We’ll have big CRDs but that’s fine I guess if users can just build eso focusing on the providers they need. Security will be heavily improved.

@Skarlso
Copy link
Copy Markdown
Contributor

Skarlso commented Oct 20, 2025

@moolen ?

@moolen
Copy link
Copy Markdown
Member

moolen commented Oct 20, 2025

I think we should do a scoping session (see link): I think we need to clarify at least these three things:

  1. goal setting: What is the minimal viable subset we aim to land first, what will we defer to later and what the implications of that are
  2. provider runtime model
  3. Governance (+repository) model for providers

@moolen
Copy link
Copy Markdown
Member

moolen commented Nov 5, 2025

superseded by #5544

@moolen moolen closed this Nov 5, 2025
@github-project-automation github-project-automation bot moved this from In Review to Done in External Secrets Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/chore Categorizes Pull Requests for chore activities (like bumping versions) size/m

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants