Skip to content

tenantcostclient: smooth over debt payback#71149

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:token-bucket-debt
Oct 8, 2021
Merged

tenantcostclient: smooth over debt payback#71149
craig[bot] merged 1 commit intocockroachdb:masterfrom
RaduBerinde:token-bucket-debt

Conversation

@RaduBerinde
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde commented Oct 5, 2021

Prior to this commit, the tenant's local token bucket stalls all
traffic whenever it incurs debt. This is a problem because some debt
comes in discrete "packets" (CPU usage is accounted for once a
second), which can lead to large occasional latency spikes.

This change improves the debt repayment mechanism: instead of paying
the debt as soon as possible, we aim to pay the debt within 2 seconds
of when the debt is incurred and earmark part of the rate for debt
repayment; the remaining rate can be used by requests in the mean
time. The comment in token_bucket.go goes over more details.

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

Plots for latency data from the debt test

Before:
image

After:

image

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised you don't have any data-driven tests here. Are those not useful for this new case?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @RaduBerinde)


pkg/ccl/multitenantccl/tenantcostclient/token_bucket.go, line 48 at r1 (raw file):

// This rate is recalculated every time we incur debt. So in essence, every time
// we incur debt, we put it together with all existing debt and plan to pay it
// within time T (we can think of this as "refinancing" the existing debt).

😂 We should charge an interest rate.


pkg/ccl/multitenantccl/tenantcostclient/token_bucket.go, line 172 at r1 (raw file):

// increasing or decreasing them.
//
// A positive amount indicates

NIT: This comment needs fixing up.

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The datadriven tests are very hard to set up. I added one now - it took me ~1hr to get it right :)
TestTokenBucketTryToFulfill was much more useful.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @andy-kimball)

Copy link
Copy Markdown
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @andy-kimball)

Copy link
Copy Markdown
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @andy-kimball)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 7, 2021

Build failed:

@RaduBerinde
Copy link
Copy Markdown
Member Author

The new datadriven test flaked (even though I tested it with both stress and stressrace..)

Prior to this commit, the tenant's local token bucket stalls all
traffic whenever it incurs debt. This is a problem because some debt
comes in discrete "packets" (CPU usage is accounted for once a
second), which can lead to large occasional latency spikes.

This change improves the debt repayment mechanism: instead of paying
the debt as soon as possible, we aim to pay the debt within 2 seconds
of when the debt is incurred and earmark part of the rate for debt
repayment; the remaining rate can be used by requests in the mean
time. The comment in `token_bucket.go` goes over more details.

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.
@RaduBerinde
Copy link
Copy Markdown
Member Author

Ah, the problem was that I wasn't rebased on top of the latest cost model changes. Fixed, will try again after CI runs.

@RaduBerinde
Copy link
Copy Markdown
Member Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Oct 8, 2021

Build succeeded:

@craig craig bot merged commit 7e4ba61 into cockroachdb:master Oct 8, 2021
@RaduBerinde RaduBerinde deleted the token-bucket-debt branch October 12, 2021 03:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants