admission: smooth compaction token calculation #104577
craig[bot] merged 2 commits into cockroachdb:master from bananabrick:io_load_smooth
Conversation
sumeerbhola
left a comment
There was a problem hiding this comment.
Can you include some graphs from your experimental results in the PR description (and the experiment setup).
Reviewed 5 of 5 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bananabrick)
pkg/util/admission/grant_coordinator.go line 54 at r2 (raw file):
settings *cluster.Settings
makeStoreRequesterFunc makeStoreRequesterFunc
kvIOTokensExhaustedDuration *metric.Counter
we should add a TODO to make all these metrics per store
pkg/util/admission/granter.go line 312 at r2 (raw file):
ioTokensExhaustedDurationMetric *metric.Counter
availableTokensMetrics *metric.Gauge
tookWithoutPermissionMetric *metric.Counter
the total tokens given out (with or without permission) may also be helpful if we are trying to compare whether the without permission rate is significant.
pkg/util/admission/granter.go line 394 at r2 (raw file):
if sg.coordMu.availableIOTokens > 0 {
    sg.subtractTokensLocked(count, false)
    sg.availableTokensMetrics.Update(sg.coordMu.availableIOTokens)
We should update the metric where we update availableIOTokens, i.e., in subtractTokensLocked and in setAvailableTokens. Otherwise it is easy to forget some caller.
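A minimal, self-contained sketch of the pattern being suggested here, with stand-in types (the `gauge` type and the struct fields are simplified placeholders, not the real granter.go code): the gauge is updated at the single point where the token count changes, so no caller can forget it.

```go
package main

import "fmt"

// gauge is a stand-in for *metric.Gauge.
type gauge struct{ v int64 }

func (g *gauge) Update(v int64) { g.v = v }

// kvStoreTokenGranter is a simplified placeholder for the real granter.
type kvStoreTokenGranter struct {
	availableIOTokens     int64
	availableTokensMetric *gauge
}

// subtractTokensLocked is the single place where availableIOTokens changes,
// so updating the metric here keeps every caller in sync automatically.
func (sg *kvStoreTokenGranter) subtractTokensLocked(count int64) {
	sg.availableIOTokens -= count
	sg.availableTokensMetric.Update(sg.availableIOTokens)
}

func main() {
	sg := &kvStoreTokenGranter{availableIOTokens: 100, availableTokensMetric: &gauge{}}
	sg.subtractTokensLocked(30)
	fmt.Println(sg.availableIOTokens, sg.availableTokensMetric.v) // both now 70
}
```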
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
// and less smoothed behaviour than multiplying the score by 2, and checking
// if the score is < 1, or >= 2, and performing linear interpolation for the
// range [1, 2).
I don't quite believe this. There must be a bug if that was the behavior that occurred. I don't mind this current code since the interval [1, 2) is easy to reason about in the equations below. So maybe just say that.
pkg/util/admission/io_load_listener.go line 847 at r3 (raw file):
// since we want to get under the thresholds over time. This
// scaling could be adjusted based on how much above the threshold we are,
// but for now we just use a constant.
the last sentence "... could be adjusted ..." is what this PR is doing, so we should drop the sentence.
irfansharif
left a comment
There was a problem hiding this comment.
Could you backport the additional metrics to 22.2 and 23.1? They’re very valuable. +1 to adding a metric for L0 tokens produced and including in backport (feel free to do that separately too, or take advantage of the LGTM here).
Reviewed 3 of 5 files at r2, 3 of 4 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bananabrick and @sumeerbhola)
-- commits line 24 at r3:
s/pr/commit. Nit: "we the keep the previous definition of score, but we're only overloaded if score >= 1" makes it seem that previously we had a different definition of overload, but we didn't. We're instead introducing medium load, and re-defining underload.
-- commits line 31 at r3:
This could be more specific in mentioning the exact piecewise linear function we're using, or that it's a linear interpolation from X to Y.
pkg/util/admission/grant_coordinator.go line 991 at r2 (raw file):
KVSlotAdjusterDecrements *metric.Counter
KVIOTokensExhaustedDuration *metric.Counter
KVIONumIOTokensTookWithoutPermission *metric.Counter
[Minor nit]: "Num" is unnecessary, here and elsewhere.
pkg/util/admission/granter.go line 394 at r2 (raw file):
Previously, sumeerbhola wrote…
We should update the metric where we update availableIOTokens, i.e., in subtractTokensLocked and in setAvailableTokens. Otherwise it is easy to forget some caller.
+1. There's one other callsite below where we invoke subtractTokensLocked where we're not updating this metric (was that intentional? if so, why?)
pkg/util/admission/io_load_listener.go line 541 at r3 (raw file):
l0WriteLM, l0IngestLM, ingestLM := io.perWorkTokenEstimator.getModelsAtDone()
io.kvGranter.setLinearModels(l0WriteLM, l0IngestLM, ingestLM)
if score, _ := io.ioThreshold.Score(); score >= 0.5 || io.aux.doLogFlush ||
This logging is very useful. Could you also add a log.V(1) condition so we can enable it selectively even on unloaded clusters through vmodule?
pkg/util/admission/io_load_listener.go line 829 at r3 (raw file):
score *= 2
if score < 1 {
    // Under the threshold. Maintain a smoothedCompactionByteTOkens based on
Accidental capitalization of "O" in smoothedCompactionByteTOkens.
pkg/util/admission/io_load_listener.go line 836 at r3 (raw file):
numTokens := intL0CompactedBytes
// Smooth it. This may seem peculiar since we are already using
// smoothedIntL0CompactedBytes, but the clauses below uses different
Nit: "clauses below use different"
pkg/util/admission/io_load_listener.go line 850 at r3 (raw file):
fTotalNumByteTokens = float64(smoothedIntL0CompactedBytes / 2.0)
} else {
    // score in [1, 2).
This could use a comment block explaining why we're using the piece-wise linear function. You could also annotate this else block with the "medium load" naming convention you used in the commit message. Ditto above for "underload" and "overload".
pkg/util/admission/testdata/format_adjust_tokens_stats.txt line 4 at r3 (raw file):
---- zero: compaction score 0.000 (0 ssts, 0 sub-levels), L0 growth 0 B (write 0 B ingest 0 B ignored 0 B): requests 0 (0 bypassed) with 0 B acc-write (0 B bypassed) + 0 B acc-ingest (0 B bypassed) + write-model 0.00x+0 B (smoothed 0.00x+0 B) + ingested-model 0.00x+0 B (smoothed 0.00x+0 B) + at-admission-tokens 0 B, compacted 0 B [≈0 B], flushed 0 B [≈0 B]; admitting all; elastic tokens 0 B (used 0 B, regular used 0 B): write model 0.00x+0 B ingest model 0.00x+0 B, disk bw read 0 B write 0 B provisioned 0 B; write stalls 12
Why is this non-zero for the zero state this test wants to print?
pkg/util/admission/testdata/io_load_listener line 575 at r3 (raw file):
setAvailableTokens: io-tokens=unlimited elastic-disk-bw-tokens=1 max-byte-tokens=unlimited max-disk-bw-tokens=105
# Test scoring and interpolation for compaction token smoothing.
This is not for this PR, but I continue to find tests in this file difficult to read. There's so much state, the lines are very long (so not very friendly to read in IDEs, with or without text wrapping), and every set-state directive sets all state, so it's hard to visually point out exactly which one was changed line by line. Do you have any ideas for how we could improve it? I just glanced at these diffs.
bananabrick
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @sumeerbhola)
Previously, irfansharif (irfan sharif) wrote…
s/pr/commit. Nit: "we the keep the previous definition of score, but we're only overloaded if score >= 1" makes it seem that previously we had a different definition of overload, but we didn't. We're instead introducing medium load, and re-defining underload.
Done.
Previously, irfansharif (irfan sharif) wrote…
This could be more specific in mentioning the exact piecewise linear function we're using, or that it's a linear interpolation from X to Y.
Done.
pkg/util/admission/grant_coordinator.go line 54 at r2 (raw file):
Previously, sumeerbhola wrote…
we should add a TODO to make all these metrics per store
Done.
pkg/util/admission/grant_coordinator.go line 991 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[Minor nit]: "Num" is unnecessary, here and elsewhere.
Done.
pkg/util/admission/granter.go line 312 at r2 (raw file):
Previously, sumeerbhola wrote…
the total tokens given out (with or without permission) may also be helpful if we are trying to compare whether the without permission rate is significant.
Will add in the next pr.
pkg/util/admission/granter.go line 394 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
+1. There's one other callsite below where we invoke subtractTokensLocked where we're not updating this metric (was that intentional? if so, why?)
Oversight.
But I didn't put this in subtractTokensLocked on purpose. In setAvailableTokens we call subtractTokensLocked, and then we might further restrict the available tokens, and I didn't want this to be updated twice.
But I guess updating twice is fine, so I moved this in.
pkg/util/admission/io_load_listener.go line 541 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This logging is very useful. Could you also add a log.V(1) condition so we can enable it selectively even on unloaded clusters through vmodule?
Nice. This is once every 15 seconds anyway, we just always log this. Added log.V(1) for now.
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
Previously, sumeerbhola wrote…
I don't quite believe this. There must be a bug if that was the behavior that occurred. I don't mind this current code since it the interval [1, 2) is easy to reason about in the equations below. So maybe just say that.
I ran this multiple times with another pr, where the only diff was that we don't multiply the score by 2, and we divide the thresholds beneath by 0.5. Fairly certain this is what was going on, which is why I think we should keep this.
I'm going to run it once more to verify.
pkg/util/admission/io_load_listener.go line 829 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Accidental capitalization of "O" in smoothedCompactionByteTOkens.
Done.
pkg/util/admission/io_load_listener.go line 847 at r3 (raw file):
Previously, sumeerbhola wrote…
the last sentence "... could be adjusted ..." is what this PR is doing, so we should drop the sentence.
Done.
pkg/util/admission/io_load_listener.go line 850 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This could use a comment block explaining why we're using the piece-wise linear function. You could also annotate this else block with the "medium load" naming convention you used in the commit message. Ditto above for "underload" and "overload".
Done.
pkg/util/admission/testdata/format_adjust_tokens_stats.txt line 4 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Why is this non-zero for the zero state this test wants to print?
The test just calls adjustTokensInner with a write stall count of 12. The previous value for the zero case is 0.
pkg/util/admission/testdata/io_load_listener line 575 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This is not for this PR, but I continue to find tests in this file difficult to read. There's so much state, the lines are very long (so not very friendly to read in IDEs, with or without text wrapping), and every set-state directive sets all state, so it's hard to visually point out exactly which one was changed line by line. Do you have any ideas for how we could improve it? I just glanced at these diffs.
I think this test prints out a bunch of state, but when I was writing the tests here, I only cared about three variables. We could probably modify this to only selectively print the variables we want.
Not sure if that was true for the other tests here.
This commit adds metrics for available IO tokens, and tokens taken without permission in the kvStoreTokenGranter.

Epic: none

Release note: None
We change the way we compute the score for compaction token calculation, and use that score to smooth the calculation. The score is calculated as max(sublevels/20, l0files/1000).

Prior to this commit, we were overloaded if score was above 1 and underloaded if score was below 1. This meant that it was easy to have an underloaded system with unlimited tokens, and then become overloaded due to the unlimited tokens. Note that our score calculations also use the admission control overload thresholds of 20 and 1000.

In this commit, we keep the previous definition of score and overload, but we introduce a medium load for scores in the range [0.5, 1) and underload for scores < 0.5. During underload, we still give out unlimited tokens, and during overload we still give out C = smoothedIntL0CompactedBytes / 2 tokens. But during medium load we hand out (C/2, C] tokens. This scheme will ideally remove the sharp spikes in the granted tokens due to switching back and forth between overloaded and underloaded systems. The precise piecewise function is 3 * C / 2 - score * C / 2.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-25469

Fixes: #91519

Release note: None
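The token scheme described in the commit message can be sketched as a standalone Go program. This is illustrative only, not the actual io_load_listener.go code: it reads C as smoothedIntL0CompactedBytes, so overload hands out C/2 and the doubled-score interpolation 3*C/2 - score*C/2 over [1, 2) ranges over (C/2, C], continuously connecting the two regimes.

```go
package main

import "fmt"

// compactionTokens sketches the smoothed token scheme. rawScore stands for
// max(sublevels/20, l0Files/1000) and smoothed stands in for
// smoothedIntL0CompactedBytes. Names and signature are hypothetical.
func compactionTokens(rawScore, smoothed float64) (tokens float64, unlimited bool) {
	score := rawScore * 2 // map the medium-load range [0.5, 1) onto [1, 2)
	switch {
	case score < 1:
		// Underload: hand out unlimited tokens.
		return 0, true
	case score >= 2:
		// Overload: hand out smoothed/2 tokens, as before this change.
		return smoothed / 2, false
	default:
		// Medium load: linear interpolation 3*C/2 - score*C/2 over the
		// doubled score, ranging over (smoothed/2, smoothed].
		return 3*smoothed/2 - score*smoothed/2, false
	}
}

func main() {
	for _, s := range []float64{0.25, 0.5, 0.75, 1.0} {
		tokens, unlimited := compactionTokens(s, 1000)
		fmt.Printf("score=%.2f unlimited=%t tokens=%.0f\n", s, unlimited, tokens)
	}
}
```

With smoothed = 1000, scores 0.25, 0.5, 0.75, and 1.0 yield unlimited, 1000, 750, and 500 tokens respectively, with no discontinuity at either boundary.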
bananabrick
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
Previously, bananabrick (Arjun Nair) wrote…
I ran this multiple times with another pr, where the only diff was that we don't multiply the score by 2, and we divide the thresholds beneath by 0.5. Fairly certain this is what was going on, which is why I think we should keep this.
I'm going to run it once more to verify.
Hmm, rather than reproducing this, which would take forever, I wrote a Go script to compare the differences.
package main
import "fmt"
func main() {
diff := 0
for i := 0; i < 21; i++ {
var score1 float64
var score2 float64
score1 = float64(i) / float64(20)
score2 = score1 * 2
fmt.Println(score1, score2)
if score1 < 0.5 && score2 >= 1 {
diff++
} else if score1 >= 0.5 && score1 < 1 && (score2 < 1 || score2 >= 2) {
diff++
} else if score1 >= 1 && score2 < 2 {
diff++
}
}
fmt.Println(diff)
}
and it printed out no differences at all. Maybe I did have a bug. Okay, I removed that comment.
bananabrick
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
Previously, bananabrick (Arjun Nair) wrote…
Hmm, rather than reproducing this, which would take forever, I wrote a go script to compare the differences.
package main

import "fmt"

func main() {
	diff := 0
	for i := 0; i < 21; i++ {
		var score1 float64
		var score2 float64
		score1 = float64(i) / float64(20)
		score2 = score1 * 2
		fmt.Println(score1, score2)
		if score1 < 0.5 && score2 >= 1 {
			diff++
		} else if score1 >= 0.5 && score1 < 1 && (score2 < 1 || score2 >= 2) {
			diff++
		} else if score1 >= 1 && score2 < 2 {
			diff++
		}
	}
	fmt.Println(diff)
}

and it printed out no differences at all. Maybe I did have a bug. Okay, I removed that comment.
Oh this is silly, but I just realized that I was changing the score range to [0.5, 1), but I didn't recompute the linear function based on that range.
The GitHub CI lint failures are also happening on commits merged into master, and Bazel CI passed. Will merge this now.
bors r+ single on |
Build succeeded: |
We change the way we compute the score for compaction token
calculation, and use that score to smoothen the calculation.
The score is calculated as max(sublevels/20, l0files/1000).
Prior to this commit, we were overloaded if score was above 1 and
underloaded if score was below 1. This meant that it was easy to
have an underloaded system with unlimited tokens, and then become
overloaded due to the unlimited tokens. Note that our score calculations
also use the admission control overload threshold of 20, and 1000.
In this commit, we keep the previous definition of score, but we're only
overloaded if score >= 1. We can consider score < 0.5 as underload, and score
in [0.5, 1) as medium load.
During underload, we still give out unlimited tokens, and during
overload we still give out C = smoothedIntL0CompactedBytes / 2 tokens.
But during medium load we hand out (C/2, C] tokens. This scheme will
ideally remove the sharp spikes in the granted tokens due to switching
back and forth between overloaded and underloaded systems.
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-25469
Fixes: #91519
Release note: None