chore: design for generators/stores v2alpha1 / out of tree provider#4792
chore: design for generators/stores v2alpha1 / out of tree provider#4792gusfcarvalho wants to merge 5 commits intomainfrom
Conversation
Signed-off-by: Gustavo Carvalho <gustavo@externalsecrets.com>
|
|
Also worth considering: if we do it, should we just break |
|
Hm. I don't know. This is a massive rewrite. Like, it's a complete rewrite of everything we have. |
YES. Unfortunately, this was my conclusion after working with both provider versioning and custom templating tickets... I think we need to do massive rewrites on both Even if we decide to dont change the API Spec itself, to implement those two tickets alone, a massive rewrite of the codebase would already need to happen (and then it would be full of workarounds, gotchas, because of the way the codebase is structured). So... if we need to do it 🤷 might as well make it clean. That was my logic. |
|
re: #4792 (comment) ( I cannot add a comment for some reason) - the idea is to have an extension point for users. My original thought was shared volume so they can add whatever they want with e.g. CSI drivers + a daemonset. I'm not that concerned with this approach, as the idea really is to support a better way for outOfTree plugins - not move our stuff to be out of tree. |
Yah I get it. But I'm concerned about effort and time basically. 🤔 I need to think about this. :D Also, really really would like to involve @moolen in this. :) After 1.5 years, I still don't feel qualified to decide on a completely new architecture and API contract. :D |
|
This pr is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
Signed-off-by: Gustavo Carvalho <gustavo@externalsecrets.com>
|
Many questions were asked during community meeting about this topic and you mentionned the need to have a discussion. I can't promise I will be available to discuss this synchronously. If you want to bounce ideas with me, don't hesitate to contact me on slack or in github (same username). |
|
I'd be happy to try and help out on this, I imagine it's a rather large job. |
moolen
left a comment
There was a problem hiding this comment.
The CRD structure makes sense to me. I think the scope is may be a bit too broad in this design as it covers SecretStores, Generators and additional fields in each resource. We can not design/review/implement that all at once.
Maybe we can limit the scope to how we get to "out of tree SecretStore providers" for now?
As for the design, i think we should cover a couple of more areas:
- How would users deploy out of tree providers?
A. I guess the implicit assumption here is, that these are separate deployments running in separate pods. I would NOT want to support unix domain sockets for providers running in the same pod - simply because one should not run a provider in the same pod (sharing the same service account by default etc). I think we should add that explicitly to the design doc. - How can users migrate to the new SecretStores?
A. It just came to my mind that it may be worthwhile to add a plugin provider (hello #5057) to SecretStore v1 which essentially talks to a SecretStore v2 provider. That allows users to opt in early to use the new out of tree providers. If we then also offer a separate build without in-tree providers our users get the benefits already with SecretStore v1 and we get early feedback.
B. Please add a section to the design doc about how users would migrate to the new approach. What would they need change on their end to facilitate the migration? What benefit would they get? - With out of tree providers we need to come up with a repository and governance structure, i think this should be part of the proposal as well.
Just a few initial thoughts, but i'm confident this is the right direction!
feat: add migration path feat: add consequences and acceptance criteria Signed-off-by: Gustavo Carvalho <gustavo@externalsecrets.com>
|
added a new version addressing all of these comments 🙇 if you all can take another pass, I'd appreciate it |
@webstradev just for the sake of tracking, can you also express your will to join on #5218 ? Trying to be more transparent on these huge-effort issues so community can join them easier. |
moolen
left a comment
There was a problem hiding this comment.
the diff was a good read, i like it. I'll take another pass on the whole document soon.
jakobmoellerdev
left a comment
There was a problem hiding this comment.
Very cool concept! In general I think this has potential to be fleshed out but Im missing some (for me) very important parts for such a big project (I am very uncreative and usually steal a lot of this stuff from the outline of a KEP, I know its very generic and arguably "over-engineered" but I have come to trust this process):
- A set of Constraints / Risks and Mitigation Strategies transparently outlined
- A Production readiness check
- How users can enable this feature as opt-in, rollback from it
- How users can monitor out of tree providers
- How provider communication can be scaled horizontally
- How troubleshooting is performed (i.e. standardized errors in interfaces)
- Drawbacks and Alternatives considered
- A version skew overview (as in what happens if we change an interface of the OOT provider API and the provider was not updated yet)
|
|
||
| #### Governance | ||
|
|
||
| - One repo per "official" eso maintained provider (e.g., `provider-aws`). |
There was a problem hiding this comment.
I would also say it's harder on the tracking side. Like issue wise it would be harder to set up project automation and the likes which limit to 5 repositories on a pro account. I couldn't set up more.
|
|
||
| #### SecretStore | ||
|
|
||
| Remove `spec.provider` and introduce `spec.providerConfig`, which contains the endpoint and authentication required to reach an out-of-tree provider, plus a provider-owned reference forwarded on requests. |
There was a problem hiding this comment.
As it will be the same Kind object, what will be the “stored” version in the k8s API?
How to migrate from one version to another?
Also, if we want to keep both provider and providerConfig it makes controller behavior quite “hacky” (spec is not fully predictable).
This makes me wonder if creating a new Kind object would be better, like
apiVersion: secretstore.external-secrets.io/v1alpha1
kind: ExternalSecretStoreThere was a problem hiding this comment.
We are leveraging a new group already, for what it’s worth it’s not the same resource and migration isn’t via controllers/webhooks - it’s user faced
There was a problem hiding this comment.
Changing groups is really not an obvious trick.
It makes upgrading really error-prone:
With a "simple" look it will be difficult to know which resource it is; just missing the group (something no-one really had a look at before on CRD) could break an object.
On user experience:
Doing a "kubectl get" will give 1 resource, or the other or an ambiguous error.
On controller side:
As the controller runtime just reconciles on name/namespace, it will have to "guess" what version to reconcile.
Or we would have to duplicate another controller just for this new group.
There was a problem hiding this comment.
Changing groups is really not an obvious trick.
It makes upgrading really error-prone:
With a "simple" look it will be difficult to know which resource it is; just missing the group (something no-one > really had a look at before on CRD) could break an object.
There will be no good solution anyways. Either users will have to learn about a new resource or to learn about the existence of the same resource with a group. We've already tried keeping things the same to minimize user impact (by adding conversion logic ourselves between versionings and by having different group/kind names when we moved from KES to ESO), and the hard truth is that none minimized user impact, they still needed to run fully named kubectl commands to figure out versions (and we introduced a new point of failure via the conversion webhook).
On controller side:
As the controller runtime just reconciles on name/namespace, it will have to "guess" what version to reconcile.
Or we would have to duplicate another controller just for this new group.
That would only be applicable if we intended to leverage the same controller code to reconcile both resources - and I think that's a bad idea 😄. We can create new controller with watches for specific GVK matching that. Controller level, it would only get the correct resource.
|
Really thorough proposal, pretty easy to understand the intentions even for someone new to ESO like myself. Does that mean for the ESO maitained ones we always add e2e tests, in which case they would become quite large and quite slow |
we only really have 5 of them as per the stability support docs: https://external-secrets.io/latest/introduction/stability-support/ for all of them, our current e2e test suite already covers the use cases. The work here would be more of a porting work. Also, I think once this is done, e2e test for a given provider should be ran on that specific provider repo; on core external-secrets we run a conformance suite maybe with the fake provider - and let any provider repository run the conformance suite. |
Regarding the last point I think it was mentioned in the community meeting that separate repo's is hard because project automation becomes no longer possible because ur limited to max 5 repo's or something like that. I thought the current leaning was one monorepo |
I've seen that on the community meeting note docs and added a comment there. Because we are under CNCF enterprise org (I think?), we actually have 20 repos to support - but even so this is just a 'nice' feature from github. We could just simply have a template repository for providers ( For the project one, we can simply have a workflow that triggers IMO I think monorepoing will cause the same pain for people like the OTEL community has with the otel-collector-contrib monorepo. (and we will run on the same ownership limitations we have today x:) That said, I don't think having one versus the other is a blocker. I just think one option has more payoffs than downsides 😆 both of them will work (though, IMO, one will work better ;P) |
|
Like I said in the other comment, project automation breaks after a couple of repos. The rest of the repositories would not be able to automatically add issues and such to the global project board. |
I know - but from what I understood this was the automatic project config to watch over repos. The other way around would still work with https://github.com/actions/add-to-project (i.e. actions running on each repo to auto add issues and PRs to the same global project). This is the recommended way to do that for multiple repositories and one project. |
|
As long as it works... :D |
yeah, if it doesnt work, multiple repos will be bad... My problem with monorepo is that releases/tags/semver promotion will be hard to manage & coordinate, if we want to keep them independent of each other. Take a look at https://github.com/open-telemetry/opentelemetry-collector-contrib/tags. They don't even propose to release individual components. If we can pull monorepo off without going to that path, it is for sure the best approach. |
|
There are a lot of opinions on monorepo vs. several repos. I'm wondering what we would like to start with and put that into the initial design. I think it's only you @gusfcarvalho who is opposed to this. Could you live with a monorepo initially, and then revise this along the way to see if we want to keep it like that? |
I was more giving advice 😺 . Don't consider myself a blocker here, as I'm likely not the one that will face these issues anyways 🤷 |
|
I think @gusfcarvalho made a very good point around the release process, which is missing in the proposal:
For context: they have a bunch of components and they share the same tag Anyhow, we should think of an strategy on how we would create releases for the providers. The design proposal explicitly says 🤔 WDYT? |
If we are taking away individual versioning as a goal, IMO, we don't need out of tree providers at all (not in its shape at least). We can safely resolve dependency clutter by building custom images - @Skarlso suggested a way forward which is very simple to do. IMO - if we want to solve only the dependency clutter, we should not go through this proposal and instead just provide a way for users to Much easier, faster, safer, opt-in, and with limited (if any) breaking changes for users. Then, maybe, and only maybe, we can further separate CRDs as to avoid the SecretStore manifest clutter and some sort of versioning (but weak one, as the controller will be a bundled version anyways). IMO, out of tree providers as it is drafted out, is only worth pursuing because of the strong separation it brings everywhere (code, distribution, runtime, permissions, CRD). |
|
and just for the sake of clarity what I mean is: If you take away point 1, everything else can be achieved with non-breaking additions to the SecretStore/Generator specs + go build flags + new builds for some cases we provide + docs for users who want to build their specific setup - which is simpler to do and will cause fewer breaking changes. Also, just for the sake of clarity - I'm fine going that route 😄 Improving the security posture was also the most important thing for me when originally thinking about this design. I just don't think it is worth it to pursue out-of-tree as way to achieve the goal of improving the security posture alone. There are easier ways to achieve that 😄 |
|
Also worth saying we can still find a way to produce coherent tags + releases + packages for individual providers in a monorepo setup. I just think this is going to be tricky, purely based on my past frustrations with monorepos (later found out to be also shared pain in other cncf projects). |
|
Is it me or the goals+trajectory are not clear? With so many comments, can we actually decide in an asynchronous manner, or should we organise a dedicated call? Might a call help align goals and trajectory? I personally need more time to provide good insights. please ignore this comment if not relevant. |
I agree with these goals btw. Minor point on the first one: the fact someone release in a vendor independent way will always result in compat matrix. So this is a bit of a lie to believe it will all work independently/regardless of version. But i think i got the gist :) |
@gusfcarvalho Do you believe there is a way to work incrementally? This proposal seems very sane to get started. |
There was a problem hiding this comment.
I am now firmly believing this is too large to get merged. So many blind spots.
I think this idea has really good stuff. And it feels it's coming from the heart and pains.
But I am also believing we should not fix everything at once.
For me, we need to build a clear roadmap with milestones instead.
@gusfcarvalho do you believe we could get to a point where YOUR major painpoints would be documented, sorted by order of importance, and then we figure out all together a path towards this design? Some people might feel differently about the pains, so if we explain them well, we might get a consensus quicker. WDYT?
|
|
||
| - Support out-of-tree providers as first-class citizens, allowing independent versioning and distribution. | ||
| - Unify feature sets of SecretStores and Generators (e.g. refresh, gating) under consistent CRDs and controllers. | ||
| - Make referent authentication modes explicit and easier to use. |
There was a problem hiding this comment.
Without explaining the terms you use, I can't follow you.
What is the referent authentication? (The current docs refer to the term only on ONE page, which is provider support, and the providers page don't even mention the same term).
Would you mind rephrasing two things:
- What you're trying to have with the explicitness of it?
- How would the explicitness make some feature easier to use when some ppl are not even aware of their usage?
| ## Goals | ||
|
|
||
| - Implement new CRDs: SecretStore/v2alpha1, ClusterSecretStore/v2alpha1, Generator/v2alpha1, ClusterGenerator/v2alpha1. | ||
| - Enable ESO to run without in-tree providers; users install providers separately. |
There was a problem hiding this comment.
Wouldn't ESO become an empty shell with no real e2e testing then?
Wouldn't it make sense to have at least ONE example code (if possible, without external deps), in-tree?
| - Enable ESO to run without in-tree providers; users install providers separately. | ||
| - Provide a provider configuration model that connects to out-of-tree providers via gRPC/TLS. | ||
| - Make referent authentication explicit (e.g., authentication scope for cluster-scoped resources). | ||
| - Add unified behaviors: refresh intervals, controller classes, retry settings, and gating policies. |
There was a problem hiding this comment.
A complex system cannot be made simple. It's inherent to its nature.
I am afraid you'll end up with a poor abstraction that is a terrible shell. Or maybe I didn't get it.
However, being afraid doesn't help anyone here, we most likely need to do a PoC, and see how it goes... WDYT?
| - Make referent authentication explicit (e.g., authentication scope for cluster-scoped resources). | ||
| - Add unified behaviors: refresh intervals, controller classes, retry settings, and gating policies. | ||
| - Maintain ExternalSecret/PushSecret compatibility via apiVersion on storeRef. | ||
| - Provide a migration path from v1 to v2, including a v1 plugin provider bridge and dedicated builds without v1 code. |
There was a problem hiding this comment.
Is this really necessary?
That sounds very complex.
If we were to build all the current plugins into an RPC, we could use the helm chart and lots of explanations to move things forward?
|
|
||
| #### Governance | ||
|
|
||
| - One repo per "official" eso maintained provider (e.g., `provider-aws`). |
There was a problem hiding this comment.
This seems to be at the crux of some of the proposition. I think this need a call to understand alignment... Or, if no call would resolve that, maybe we should write the code and see?
|
|
||
| ### Notes/Constraints/Caveats | ||
|
|
||
| - Do not implement Unix domain sockets for sidecars; providers should run as independent deployments to ensure horizontal scalability, separate network policies, and stronger isolation. |
There was a problem hiding this comment.
I would be interested to see why it's a no go. I don't see how horizontal scalability is limited with sockets. It's just a different (harder) design. The fact to go networked also brings downsides.
This needs to be clarified IMO.
|
|
||
| - Operational overhead: Users manage separate provider deployments | ||
| - mitigated by dedicated Helm charts and independent versioning per provider. | ||
| - Maintenance overhead: every provider needs similar infrastructure such as a helm chart, e2e test framework and test cases, a common library to bootstrap the provider's GRPC server, metrics, initialising clients etc. |
There was a problem hiding this comment.
I am wondering why.... (I mean, it's in accordance with the rest of the document.... but i am not understanding why we go so far).
| - mitigated by providing one or more `common` repos, e.g. `provider-common` which host the shared code among eso-core and all providers (similar to https://github.com/fluxcd/pkg). | ||
| - TLS management: someone needs to manage the TLS certs/keys: do we push it to the user or do we provide it ourselves? | ||
| - mitigated by (1) implement certificate management with `cert-controller` and provide an integration mechanism with cert-manager. | ||
|
|
There was a problem hiding this comment.
Some users are relying on ESO to bootstrap their infra to then have cert-manager running.
You're basically killing that by introducing a circular dependency. I would believe we need an alternative way to establish trust between the plugin and the main controller.
There was a problem hiding this comment.
What alternative ways are you thinking of? we could for example introduce service account authentication and RBAC based tokens, but im interested to hear thoughts
There was a problem hiding this comment.
We should not use SA auth here, as the SA authenticating it would be external-secrets' to validate the call to the plugin. (as in, interception of a highly privileged k8s SA). (i.e. eso is the client here - the plugin is the one needing to validate good clients)
IMO whatever we do should be x509 rooted. IMO, I am in favor of leveraging cert-controller for that.
Integration with cert-manager should be a plus just like it is today - everything works without it, but if people want to ditch cert-controller over it, they can, sort of. (for setups that do not have the circular dependency mentioned by JP.
There was a problem hiding this comment.
First, it's hard for me to give an opinion on what to secure, if we are not clear about conventions, what we want to do and their associated risks.
If we consider that it's plain http between pods in a cluster, it's not the same problem as plain http between containers of the same pod. I really would like us to clarify what's configurable to know how far someone will take the plugin.
Nevertheless, establishing a trust tunnel can be done in multiple ways. And It doesn't need to be asymmetric btw.
For @gusfcarvalho 's comment - I am even more confused: You don't need to use eso's sa for that. In fact you definitely shouldn't. (I think we agree there).
That doesn't prevent the use of another sa with 0 pod access. Especially useful if you want to federate between the kube cluster and something else. While We could do that, it gets tricky for some managed kubernetes offering, so i would rather avoid it too.
Simpler solutions are simpler to maintain. Are we sure we are aiming for simple here?
There was a problem hiding this comment.
@evrardjp I was refering to this:
we could for example introduce service account authentication and RBAC based tokens, but im interested to hear thoughts
I understood it as leverage existing K8s SAs.
cert-controller IMO is the simplest solution 😄. It just adds self-signed certificates to a Secret, and does that by following labels today (though we only search for the webhook label). It is very simple to extend it to a common label to add Self Signed Certs to any providers.
There was a problem hiding this comment.
Adding a dependency looks deceptively simple.
But by adding it, you've coupled to a project.
It means we're gonna need to start maintain this other project should something happen there, in order to keep our software working.
We're gonna break many ppl CI/CD cause now that we'd need this intertwining. Vendors will curse us, cause they would have to change their tooling.
For me, bringing another project as a dependency is a last resort move : we should reduce the code we maintain, not add more.
Side note: when I read myself I think "Damn, I start to be a grumpy old git." 🤣
There was a problem hiding this comment.
But cert-controller is ours already, we install it everywhere by default 🙂
There was a problem hiding this comment.
There was a problem hiding this comment.
LOL, I all misread. I read cert-manager out of habit! Ofc that makes more sense :)
| ## Alternatives | ||
|
|
||
| - just provide a GRPC plugin mechanism and move providers out of tree without a v2 SecretStore | ||
| - get rid of the SecretStore alltogether and directly point from a `ExternalSecret` at a `Kind=AWSSecretsManager`. |
There was a problem hiding this comment.
If we go as far as evaluating this : wouldn't this be simpler? Crds per provider, all using the same ESO go modules? Having providers in different repos with the same code structure would allow wholesale PRs. Bots can help maintain versions of providers by bumping our eso modules. A single helm chart could deploy three different providers implementations (x or y would be installed based on values).
Code completely independent in each repo, so no dependency hell. Ownership very clear in each repo. Easy to phase out a provider when some repo isn't up to our standards: we remove from helm chart and archive the repo.
Yes, there is code repeated. But it looks simpler to me from my window. I need to know why this shouldn't be feasible before having an opinion on this whole effort.
There was a problem hiding this comment.
This would be a benefit regardless if we get rid of SecretStore or not. It is a more direct approach, which means we will lose some features (like cluster-scoping, and flood gaate controls if a given store is out). I think the simplicity will be less controllers for us to maintain and merge - as in everything could be handled easily on a simple core-controller loop.
There was a problem hiding this comment.
But what is the hardest: maintain modules for n controllers and have controllers a bit more specific in some places OR a grpc system + cert-manager?
Same question for operations.
There was a problem hiding this comment.
And I don't say that because some of my customers are scared of self signed CAs...!
There was a problem hiding this comment.
But how would eso work with them?
are you thinking about brewing one eso distribution per provider?
Several orgs use more than one provider, this won’t scale if I understood it correctly x
There was a problem hiding this comment.
As in - users will now need to figure out controller classes on every single install; even simple ones.
There was a problem hiding this comment.
Plus, an ExternalSecret using two providers will be a 💫 headache
|
I missed the last community calls - maybe you've discussed this in there - did we find a decision point on the scope of this design? as in - are we encompassing dep-tree + independent versioning, or just dep-trees ? :) |
|
No, not yet. Moritz was actually asking for one. :D So we really should. :) |
|
Let’s vote then 😃 🚀 = scope for deps-tree + independent versioning ❤️ = scope for deps-tree only 👀 = abstain / any option is fine |
|
Re: my vote: I think encompassing both is way too complex to pull and even more complex to maintain. None of the options we’ve been discussing thus far come with easy trade offs 😅. Plus, while we are onboarding people, I’m still skeptical of our longer term ability to maintain this extra code / repos / charts, so I’m picking one (reps-tree reduction) over the other (independent versioning). we still need them both, but maybe a different approach without neccessaroly moving to a plugin system. e.g we could separate CRDs for providers still, keep them all in tree; and have builds that learn how to manage / control that provider. For now We’ll have big CRDs but that’s fine I guess if users can just build eso focusing on the providers they need. Security will be heavily improved. |
|
@moolen ? |
|
I think we should do a scoping session (see link): I think we need to clarify at least these three things:
|
|
superseded by #5544 |



No description provided.