Promote realtime composition to beta - don't poll in the XR reconciler#6407
Promote realtime composition to beta - don't poll in the XR reconciler#6407negz merged 12 commits intocrossplane:mainfrom
Conversation
1c29af3 to
8a777c6
Compare
4c0c456 to
b2de309
Compare
| func WithPollInterval(interval time.Duration) ReconcilerOption { | ||
| return WithPollIntervalHook(func(_ context.Context, _ *composite.Unstructured) time.Duration { | ||
| // Jitter the poll interval +/- 10%. | ||
| return interval + time.Duration((rand.Float64()-0.5)*2*(float64(interval)*0.1)) //nolint:gosec // No need for secure randomness |
There was a problem hiding this comment.
Wouldn't this still be needed if someone needs/decides to disable the beta feature for whatever reason? I am particularly concerned about resource consumption/scalability of real time compositions and wondering if there is a way to get back to the previous configuration if something goes wrong.
There was a problem hiding this comment.
The definition reconciler still sets a poll interval when realtime compositions is disabled. The jittering happens inline - I wanted to remote pollIntervalHook as a simplification.
| o = append(o, composite.WithWatchStarter(composite.ControllerName(d.GetName()), h, r.engine)) | ||
| o = append(o, | ||
| composite.WithWatchStarter(composite.ControllerName(d.GetName()), h, r.engine), | ||
| composite.WithPollInterval(0), // Disable polling. |
There was a problem hiding this comment.
@turkenh You can see on line 551 of this file that it'll default to composite.WithPollInterval(r.options.PollInterval).
We only disable polling here inside the if r.options.Features.Enabled(features.EnableBetaRealtimeCompositions) branch.
Nothing has read this field for a long time - claims watch their XRs. Signed-off-by: Nic Cope <nicc@rk0n.org>
After this commit realtime composition is enabled by default. You can disable it by setting --enable-realtime-composition=false. When realtime composition is enabled XRs won't requeue on a poll interval by default. XRs will requeue a reconcile when one of the following happens: * The XR changes * A composed resource changes * A new CompositionRevision is created, and the XR should auto-update * Crossplane's sync interval (default 1 hour) is exceeded * The last function pipeline run's TTL is exceeded We consider a function pipeline run's TTL to be the shortest non-zero TTL of any RunFunctionResult returned by the pipeline. If the pipeline's TTL is non-zero we requeue after that TTL. This lets functions control when Crossplane will call them again. If a function needs to be called periodically it should set a TTL. A function might need to do this if it needs to periodically refresh its extra resources (Crossplane doesn't watch them) or read an input from an external system. Crossplane also reserves the right to use the TTL for caching. This isn't implemented, but when it is Crossplane may not call the function _before_ the TTL. Today many functions return a TTL of 1 minute. This is because function-sdk-go and function-sdk-python use 1 minute as the default TTL. I think we should update the SDKs to use a longer default TTL, since the current default will effectively be equivalent to polling every minute. Signed-off-by: Nic Cope <nicc@rk0n.org>
Instead use regular composition tests to test for realtime behavior. Signed-off-by: Nic Cope <nicc@rk0n.org>
We don't want the watch garbage collector to stop a watch for a composed resource prematurely. If it did, the XR controller wouldn't be called to reconcile the XR when that type of composed resource changed. Before this commit the watch garbage collector determined what watches were still needed only by listing from cache. If the cache was stale, it might've incorrectly determined a watch was no longer needed and stopped it. After this commit, the watch garbage collector double checks by making direct calls to the API server to determine what watches are still needed before it stops any. The race still exists, but it's much smaller now. A watch would have to become needed between when Crossplane lists from the API server and when it calls StopWatches. Signed-off-by: Nic Cope <nicc@rk0n.org>
This'll let folks monitor things like: * How many XR controllers are running * How many watches are running * Whether an XR controller is flapping * Whether an XR controller is watching its XR and CompRev * How many types of composed resource an XR controller is watching Signed-off-by: Nic Cope <nicc@rk0n.org>
It's functionally identical (no pun intended) to the variant that uses functions, but the native version has a nasty bug. When it patches the MR it removes the MR's finalizer, which causes the XR and MR controllers to fight. This is bad when realtime compositions are enabled. We could potentially fix the native P&T variant (e.g. by patching in the finalizer) but I don't think it's worth it. Apart from this very specific scenario where we artificially set a finalizer to block the delete, the composition implementation shouldn't matter. Given native P&T is deprecated let's drop it and save some time in E2E runs. Signed-off-by: Nic Cope <nicc@rk0n.org>
It's a beta feature now - on by default. Signed-off-by: Nic Cope <nicc@rk0n.org>
Detect whether a revision is compatible earlier and more often, and return early. Signed-off-by: Nic Cope <nicc@rk0n.org>
30 seconds seems mostly okay, but sometimes it's taking ~45 in CI. Signed-off-by: Nic Cope <nicc@rk0n.org>
Just a drive-by nit. The convention in this struct is for XR scoped stuff to be top-level, e.g. ConnectionDetails, events, etc. Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Nic Cope <nicc@rk0n.org>
Fixing some places where I missed saying that in the feature flag. Signed-off-by: Nic Cope <nicc@rk0n.org>
|
This now appears to be failing E2E 100% of the time due to #6413. Looking into it. Edit: For some reason this is failing just as much on main right now. Still looking into it, but I doubt it's related to this PR so I'm going to merge it. |
|
Added this to the draft release notes for v1.20 Crossplane no longer reconciles XRs on a regular poll interval. The
|
Description of your changes
Fixes #4828
Fixes #4316
Fixes #6408
See also recent realtime composition bug fixes in #6395.
After this commit realtime composition is enabled by default. You can disable it by setting
--enable-realtime-composition=false.When realtime composition is enabled XRs won't requeue on a poll interval by default.
XRs will requeue a reconcile when one of the following happens:
We consider a function pipeline run's TTL to be the shortest non-zero TTL of any
RunFunctionResultreturned by the pipeline. If the pipeline's TTL is non-zero we requeue after that TTL.This lets functions control when Crossplane will call them again.
If a function needs to be called periodically it should set a TTL. A function might need to do this if it needs to periodically refresh its extra resources (Crossplane doesn't watch them) or read an input from an external system.
Crossplane also reserves the right to use the TTL for caching. This isn't implemented, but when it is Crossplane may not call the function before the TTL.
Today many functions return a TTL of 1 minute. This is because function-sdk-go and function-sdk-python use 1 minute as the default TTL (see
response.goandresponse.py).I think we should update the SDKs to use a longer default TTL, since the current default will effectively be equivalent to polling every minute.
I don't think this is really a breaking behavior change, but it is a significant one. I've added the scary label so we remember to include a prominent release note about the change.
This PR also adds a few basic metrics for the controller engine:
I have:
earthly +reviewableto ensure this PR is ready for review.Addedbackport release-x.ylabels to auto-backport this PR.Followed the API promotion workflow if this PR introduces, removes, or promotes an API.Need help with this checklist? See the cheat sheet.