
Add a retroactive one-pager for rate limiting#5415

Merged
negz merged 1 commit into crossplane:master from negz:the-going-rate
Feb 25, 2024

Conversation

@negz
Member

@negz negz commented Feb 21, 2024

Description of your changes

I spent a bunch of time this evening trying to remember how Crossplane's rate limiting works, and why it works that way. I figured I'd write it all down for next time.


Signed-off-by: Nic Cope <nicc@rk0n.org>
@negz negz requested a review from turkenh February 21, 2024 06:02
@negz negz requested a review from a team as a code owner February 21, 2024 06:02
@negz negz requested a review from phisco February 21, 2024 06:02
@negz negz force-pushed the the-going-rate branch 2 times, most recently from cab334e to 9615979, February 21, 2024 06:08
@negz
Member Author

negz commented Feb 21, 2024

@turkenh two thoughts come to mind after writing this up.

First, there's not really a need for --max-reconcile-rate in core Crossplane. In theory I like the simplicity of the --max-reconcile-rate flag and the symmetry with providers, but unlike a provider there's no reason to globally limit all Crossplane controllers (e.g. all claims, XRs, packages, etc.) to achieve a maximum reconcile rate across all of Crossplane, given it's only talking to the API server. At a minimum, we can probably make Crossplane's default --max-reconcile-rate a lot higher (e.g. 100).

Second, I'm not sure why the global rate limiter would affect the workqueue_depth metric as you saw in crossplane-contrib/provider-kubernetes#203. It should be re-adding anything that's rate limited back to the same work queue and thus maintaining the same depth. I would expect it to affect workqueue_duration.
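One way to see this point concretely is a toy accounting model (hypothetical, not the actual workqueue metrics code): each rate-limited pass pops and re-adds the item, so depth ends up where it started, while the number of passes, and hence the item's time in the queue, grows.

```go
package main

import "fmt"

func main() {
	// Toy accounting for a work queue where a rate-limited item is
	// popped and immediately re-added (what a RequeueAfter does).
	depth := 5    // items in the queue
	attempts := 0 // reconcile passes for one item

	// Simulate one item being rate limited twice before succeeding.
	for limited := 2; ; limited-- {
		depth-- // item popped from the queue
		attempts++
		if limited == 0 {
			break // tokens available: reconciled for real
		}
		depth++ // rate limited: re-added, depth is back where it was
	}

	fmt.Println(depth)    // 4: one item left the queue for good
	fmt.Println(attempts) // 3: but it took three passes (longer duration)
}
```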

> the middleware `Reconciler`, subject to its token bucket reconciler. If there
> are sufficient tokens available in the bucket, the reconcile is passed to the
> wrapped (inner) `Reconciler` immediately. If there aren't sufficient tokens
> available, the reconcile is returned to the tail of the work queue by returning
Member

@turkenh turkenh Feb 23, 2024


Wouldn't it be more fair if we put it back to the head of the queue? Why do we change the ordering in the queue?
Maybe I misunderstood something but, as an analogy, I am waiting in a bank queue and when it is my turn I am told by the officer to go to the tail of the queue because he processed enough for that given hour 😅

Member Author


> Wouldn't it be more fair if we put it back to the head of the queue?

It would.

> Why do we change the ordering in the queue?

No reason, really. 🙂 It just seems to be how all the Kubernetes work queue rate limiting works, and the topic didn't come up until now.

The queueing is all using https://pkg.go.dev/k8s.io/client-go/util/workqueue. When we return RequeueAfter I believe that translates to an AddAfter, i.e. "add back to the end of the queue after duration D":

https://github.com/crossplane/crossplane-runtime/blob/v1.15.1/pkg/ratelimiter/reconciler.go#L51

We could potentially improve on this by using a RateLimitingQueue inside our ratelimiter.Reconciler. This way there would be two levels of workqueue:

  • Outer queue handles per-object exponential backoff. Requests sent to back of outer queue.
  • Inner queue handles global max-reconcile-rate. Requests sent to back of inner queue.

I think if we wanted to support requests being sent to the front of the queue after a certain duration we'd need to implement our own work queue.
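To make the ordering point concrete, here's a toy FIFO (hypothetical, far simpler than client-go's workqueue, which also deduplicates and handles delays) showing why an AddAfter-style requeue puts a rate-limited request behind requests that arrived after it:

```go
package main

import "fmt"

// queue is a toy FIFO. client-go's workqueue is more sophisticated,
// but a re-added item still lands at the tail.
type queue struct{ items []string }

func (q *queue) Add(item string) { q.items = append(q.items, item) }

func (q *queue) Get() string {
	item := q.items[0]
	q.items = q.items[1:]
	return item
}

func main() {
	q := &queue{}
	q.Add("first") // first in line
	q.Add("second")

	// "first" is popped but rate limited, so it's re-added
	// (AddAfter-style) and now waits behind "second".
	q.Add(q.Get())

	fmt.Println(q.items) // [second first]
}
```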

There's talk of allowing users to supply their own work queue (not only their own rate limiter) in controller-runtime in kubernetes-sigs/controller-runtime#857. If we want to optimize this as much as possible, it might be worth pursuing that upstream.

@negz
Member Author

negz commented Feb 23, 2024

@turkenh Given that this is just documenting how things work today, WDYT about approving so I can merge? We can then discuss how it could be better.

Member

@turkenh turkenh left a comment


Thanks for taking the time to write this down 🙌
