Use a single cache for all dynamic controllers (i.e. XRs and claims)#5651
Use a single cache for all dynamic controllers (i.e. XRs and claims)#5651negz merged 10 commits intocrossplane:masterfrom
Conversation
f2f7122 to
ab57920
Compare
|
CC @sttts - this makes big changes to the realtime compositions implementation. |
c590da9 to
0ba7db9
Compare
|
Looks like the E2E tests are passing pretty reliably now. The only thing blocking moving this out of draft is writing some more unit tests. |
I am working on implementing watches in provider-kubernetes, and the PR is based on the previous real-time composition implementation, which is being refactored here. While the provider-kubernetes use case is more straightforward (no controller start/stop at runtime), I believe the new structure is more usable with the unified/shared cache then the previous implementation. When I check the functionality introduced here, I have a feeling like I could leverage things like |
@turkenh How would you feel about duplicating them to start with? We could then consolidate later. I'm hopeful we can get the tracking cache and stoppable source functionality added upstream to controller-runtime. If that happens I could imagine a world where:
|
|
@turkenh Thanks for approving! I can't proceed until crossplane/crossplane-runtime#689 is also approved. Could you take a look? It's pretty much just deleting code. |
a8da7ab to
24b03c1
Compare
This just moved the files, unedited, as of the below commit. crossplane/crossplane-runtime@8641eb2 Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
This bumps crossplane-runtime, controller-runtime, and k8s.io dependencies to latest. Per the below PR, the latest crossplane-runtime doesn't have the controller engine anymore. It moved into c/c. crossplane/crossplane-runtime#689 Signed-off-by: Nic Cope <nicc@rk0n.org>
Updating our controller-runtime and Kubernetes dependencies bumped our minimum Go version to v1.22. That in turn enables some new linters, since we no longer need to copy range vars in Go v1.22. Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
It started as a function, but now we pass several arguments that are all fields of the Reconciler. It's only called once, by the Reconciler. Making it a method shortens the function signature, and makes it clear which things change on each reconcile and which are fixed. Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Crossplane uses a controller engine to dynamically start claim and XR controllers when a new XRD is installed. Before this commit, each controller gets at least one cache. This is because when I built this functionality, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches. Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26). kubernetes-sigs/controller-runtime#2285 kubernetes-sigs/controller-runtime#2046 This commit uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Compared to the current implementation, this commit: * Takes fewer global locks when realtime compositions are enabled. Locking is now mostly at the controller scope. * Works with the breaking changes to source.Source introduced in controller-runtime v0.18. :) I think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine, but that's pretty subjective. Signed-off-by: Nic Cope <nicc@rk0n.org>
| // RemoveInformer removes an informer entry and stops it if it was running. | ||
| // | ||
| // Removing an informer marks the informer as inactive. | ||
| func (c *InformerTrackingCache) RemoveInformer(ctx context.Context, obj client.Object) error { |
There was a problem hiding this comment.
thought: a smoke (unit) test might help to detect when the cache.Cache interface changes. The overridden methods are must be exhaustive for a given gvk. So there is risk that this breaks and will be hard to debug.
There was a problem hiding this comment.
Another approach that could work would be to stop embedding cache.Cache. Instead wrap one and implement all of its methods. That way we'll stop satisfying the interface if it changes. Will follow up in another PR.
| func NewFunctionComposer(kube client.Client, r FunctionRunner, o ...FunctionComposerOption) *FunctionComposer { | ||
| // TODO(negz): Can we avoid double-wrapping if the supplied client is | ||
| // already wrapped? Or just do away with unstructured.NewClient completely? | ||
| kube = unstructured.NewClient(kube) |
| func NewPTComposer(kube client.Client, o ...PTComposerOption) *PTComposer { | ||
| // TODO(negz): Can we avoid double-wrapping if the supplied client is | ||
| // already wrapped? Or just do away with unstructured.NewClient completely? | ||
| kube = unstructured.NewClient(kube) |
Description of your changes
Fixes #5338
Fixes #2645
Closes #5463
Closes #5468
Closes #5422
It's possible this will fix #5400, #5151, #5533, and #5228 given that it makes large changes to how realtime compositions work. I'd rather not assume though - we can see if folks can reproduce if/when this PR is merged.
Crossplane uses a "controller engine" to dynamically start claim and XR controllers when a new XRD is installed, and stop them when the XRD is uninstalled.
Before this PR, each controller gets at least one cache. This is because when I built the controller engine, you couldn't stop a single informer within a cache (a cache is basically a map of informers by GVK). Instead, we stop entire caches.
When realtime composition is enabled, there are even more caches. One per composed resource GVK. A GVK routed cache routes cache lookups to these various delegate caches.
Meanwhile, controller-runtime recently made it possible to stop an informer within a cache. It's also been possible to remove an event handler from an informer for some time (since Kubernetes 1.26):
This PR uses a single client, backed by a single cache, across all dynamic controllers (specifically the definition, offered, claim, and XR controllers). Each controller tracks what watches ("sources" in controller-runtime) it has active and stops them when it doesn't need them anymore. The controller engine also garbage collects custom resource informers when their CRDs are deleted.
Compared to the current implementation, this:
I also think this makes the realtime composition code a little easier to follow by consolodating it into the ControllerEngine (and into one repo), but that's pretty subjective.
Here's a questionable Mermaid diagram of how it works:
It's conceptually pretty similar to a typical set of controller-runtime controllers run by a
controller.Manager, but with the ability to stop watches and informers.Reviewer Note
This PR depends on crossplane/crossplane-runtime#689.
Unfortunately, this required bumping a bunch of dependencies, mostly to get the latest controller-runtime. Those dependency bumps in turn required a bunch of other little fixes (CRD generation issues, new linter warnings, etc).
I also opted to move the ControllerEngine implementation from c/cr to c/c. I think that will make changes like this easier in future.
While this all makes this PR pretty huge, it's all broken out into in separate commits. The last commit in the PR contains the bulk of the changes - reviewers can focus on that.
I'm opening this as draft since it's a big change. I still need to add more tests for the new code (e.g. the new engine implementation).
I have:
make reviewableto ensure this PR is ready for review.Added or updated e2e tests.Linked a PR or a docs tracking issue to document this change.Addedbackport release-x.ylabels to auto-backport this PR.Need help with this checklist? See the cheat sheet.