multitenant: implement a fallback rate #70163
Conversation
andy-kimball
left a comment
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
ajwerner
left a comment
If I had a nit, it's that I don't like the word "backup" as much as I like "fallback". How come you chose "backup", which is very much a term in CockroachDB, and probably something users can initiate?
Reviewed 1 of 5 files at r1, 9 of 14 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
pkg/ccl/multitenantccl/tenantcostclient/tenant_side.go, line 366 at r2 (raw file):
    ctx context.Context, resp *roachpb.TokenBucketResponse,
    ) {
        if log.V(1) {
drive-by: this should be log.ExpensiveLogEnabled(ctx, 1)
Yeah, I agree. I thought of "fallback" after I wrote most of the code. I will rename.
e135a8b to 52e88c8
I fell back to the backup word
ajwerner
left a comment
Reviewed 1 of 5 files at r1, 12 of 12 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)
52e88c8 to 56f94dd
bors r+

Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:
- You might need to create your backport manually using the backport tool.
- error creating merge commit from 56f94dd to blathers/backport-release-21.2-70163: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []; you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
tenantcostclient: maintain a "buffer" of RUs
This change adjusts the tenant cost controller logic to try to
maintain a "buffer" of 5000 RUs. This is useful to prevent waiting for
more RUs if an otherwise lightly loaded pod suddenly gets a spike of
traffic.
Release note: None
multitenant: implement a "backup" rate
This change implements a "backup" (fallback) throttling rate that a
SQL pod can use if it stops being able to complete token bucket
requests.
The goal is to keep tenants without burst RUs throttled and tenants with
lots of RUs unthrottled (or throttled at a high rate). To achieve
this, we calculate a rate at which the tenant would burn through all
their available RUs within 1 hour. The premise here is that if we have
some kind of infrastructure problem, 1 hour is a reasonable time frame
to address it. Beyond 1 hour, the tenant will continue at the same
rate, consuming more RUs than they had available.
Informs #68479.
Release note: None
Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.