admission: smooth compaction token calculation #104577
craig[bot] merged 2 commits into cockroachdb:master from bananabrick:io_load_smooth
Conversation
sumeerbhola
left a comment
There was a problem hiding this comment.
Can you include some graphs from your experimental results in the PR description (and the experiment setup).
Reviewed 5 of 5 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bananabrick)
pkg/util/admission/grant_coordinator.go line 54 at r2 (raw file):
settings *cluster.Settings
makeStoreRequesterFunc makeStoreRequesterFunc
kvIOTokensExhaustedDuration *metric.Counter
we should add a TODO to make all these metrics per store
pkg/util/admission/granter.go line 312 at r2 (raw file):
ioTokensExhaustedDurationMetric *metric.Counter
availableTokensMetrics *metric.Gauge
tookWithoutPermissionMetric *metric.Counter
the total tokens given out (with or without permission) may also be helpful if we are trying to compare whether the without permission rate is significant.
pkg/util/admission/granter.go line 394 at r2 (raw file):
if sg.coordMu.availableIOTokens > 0 {
    sg.subtractTokensLocked(count, false)
    sg.availableTokensMetrics.Update(sg.coordMu.availableIOTokens)
We should update the metric where we update availableIOTokens, i.e., in subtractTokensLocked and in setAvailableTokens. Otherwise it is easy to forget some caller.
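A minimal, self-contained sketch of the pattern being suggested here, with stand-in types (the `gauge` type and the struct fields are simplified placeholders, not the real granter.go code): the gauge is updated at the single point where the token count changes, so no caller can forget it.

```go
package main

import "fmt"

// gauge is a stand-in for *metric.Gauge.
type gauge struct{ v int64 }

func (g *gauge) Update(v int64) { g.v = v }

// kvStoreTokenGranter is a simplified placeholder for the real granter.
type kvStoreTokenGranter struct {
	availableIOTokens     int64
	availableTokensMetric *gauge
}

// subtractTokensLocked is the single place where availableIOTokens changes,
// so updating the metric here keeps every caller in sync automatically.
func (sg *kvStoreTokenGranter) subtractTokensLocked(count int64) {
	sg.availableIOTokens -= count
	sg.availableTokensMetric.Update(sg.availableIOTokens)
}

func main() {
	sg := &kvStoreTokenGranter{availableIOTokens: 100, availableTokensMetric: &gauge{}}
	sg.subtractTokensLocked(30)
	fmt.Println(sg.availableIOTokens, sg.availableTokensMetric.v) // both now 70
}
```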
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
// and less smoothed behaviour than multiplying the score by 2, and checking
// if the score is < 1, or >= 2, and performing linear interpolation for the
// range [1, 2).
I don't quite believe this. There must be a bug if that was the behavior that occurred. I don't mind this current code since the interval [1, 2) is easy to reason about in the equations below. So maybe just say that.
pkg/util/admission/io_load_listener.go line 847 at r3 (raw file):
// since we want to get under the thresholds over time. This
// scaling could be adjusted based on how much above the threshold we are,
// but for now we just use a constant.
the last sentence "... could be adjusted ..." is what this PR is doing, so we should drop the sentence.
irfansharif
left a comment
There was a problem hiding this comment.
Could you backport the additional metrics to 22.2 and 23.1? They’re very valuable. +1 to adding a metric for L0 tokens produced and including in backport (feel free to do that separately too, or take advantage of the LGTM here).
Reviewed 3 of 5 files at r2, 3 of 4 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bananabrick and @sumeerbhola)
-- commits line 24 at r3:
s/pr/commit. Nit: "we the keep the previous definition of score, but we're only overloaded if score >= 1" makes it seem that previously we had a different definition of overload, but we didn't. We're instead introducing medium load, and re-defining underload.
-- commits line 31 at r3:
This could be more specific in mentioning the exact piecewise linear function we're using, or that it's a linear interpolation from X to Y.
pkg/util/admission/grant_coordinator.go line 991 at r2 (raw file):
KVSlotAdjusterDecrements *metric.Counter
KVIOTokensExhaustedDuration *metric.Counter
KVIONumIOTokensTookWithoutPermission *metric.Counter
[Minor nit]: "Num" is unnecessary, here and elsewhere.
pkg/util/admission/granter.go line 394 at r2 (raw file):
Previously, sumeerbhola wrote…
We should update the metric where we update availableIOTokens, i.e., in subtractTokensLocked and in setAvailableTokens. Otherwise it is easy to forget some caller.
+1. There's one other callsite below where we invoke subtractTokensLocked where we're not updating this metric (was that intentional? if so, why?)
pkg/util/admission/io_load_listener.go line 541 at r3 (raw file):
l0WriteLM, l0IngestLM, ingestLM := io.perWorkTokenEstimator.getModelsAtDone()
io.kvGranter.setLinearModels(l0WriteLM, l0IngestLM, ingestLM)
if score, _ := io.ioThreshold.Score(); score >= 0.5 || io.aux.doLogFlush ||
This logging is very useful. Could you also add a log.V(1) condition so we can enable it selectively even on unloaded clusters through vmodule?
pkg/util/admission/io_load_listener.go line 829 at r3 (raw file):
score *= 2
if score < 1 {
    // Under the threshold. Maintain a smoothedCompactionByteTOkens based on
Accidental capitalization of "O" in smoothedCompactionByteTOkens.
pkg/util/admission/io_load_listener.go line 836 at r3 (raw file):
numTokens := intL0CompactedBytes
// Smooth it. This may seem peculiar since we are already using
// smoothedIntL0CompactedBytes, but the clauses below uses different
Nit: "clauses below use different"
pkg/util/admission/io_load_listener.go line 850 at r3 (raw file):
fTotalNumByteTokens = float64(smoothedIntL0CompactedBytes / 2.0)
} else {
    // score in [1, 2).
This could use a comment block explaining why we're using the piece-wise linear function. You could also annotate this else block with the "medium load" naming convention you used in the commit message. Ditto above for "underload" and "overload".
pkg/util/admission/testdata/format_adjust_tokens_stats.txt line 4 at r3 (raw file):
---- zero: compaction score 0.000 (0 ssts, 0 sub-levels), L0 growth 0 B (write 0 B ingest 0 B ignored 0 B): requests 0 (0 bypassed) with 0 B acc-write (0 B bypassed) + 0 B acc-ingest (0 B bypassed) + write-model 0.00x+0 B (smoothed 0.00x+0 B) + ingested-model 0.00x+0 B (smoothed 0.00x+0 B) + at-admission-tokens 0 B, compacted 0 B [≈0 B], flushed 0 B [≈0 B]; admitting all; elastic tokens 0 B (used 0 B, regular used 0 B): write model 0.00x+0 B ingest model 0.00x+0 B, disk bw read 0 B write 0 B provisioned 0 B; write stalls 12
Why is this non-zero for the zero state this test wants to print?
pkg/util/admission/testdata/io_load_listener line 575 at r3 (raw file):
setAvailableTokens: io-tokens=unlimited elastic-disk-bw-tokens=1 max-byte-tokens=unlimited max-disk-bw-tokens=105
# Test scoring and interpolation for compaction token smoothing.
This is not for this PR, but I continue to find tests in this file difficult to read. There's so much state, the lines are very long (so not very friendly to read in IDEs, with or without text wrapping), and every set-state directive sets all state, so it's hard to visually point out exactly which one was changed line by line. Do you have any ideas for how we could improve it? I just glanced at these diffs.
bananabrick
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @sumeerbhola)
Previously, irfansharif (irfan sharif) wrote…
s/pr/commit. Nit: "we the keep the previous definition of score, but we're only overloaded if score >= 1" makes it seem that previously we had a different definition of overload, but we didn't. We're instead introducing medium load, and re-defining underload.
Done.
Previously, irfansharif (irfan sharif) wrote…
This could be more specific in mentioning the exact piecewise linear function we're using, or that it's a linear interpolation from X to Y.
Done.
pkg/util/admission/grant_coordinator.go line 54 at r2 (raw file):
Previously, sumeerbhola wrote…
we should add a TODO to make all these metrics per store
Done.
pkg/util/admission/grant_coordinator.go line 991 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
[Minor nit]: "Num" is unnecessary, here and elsewhere.
Done.
pkg/util/admission/granter.go line 312 at r2 (raw file):
Previously, sumeerbhola wrote…
the total tokens given out (with or without permission) may also be helpful if we are trying to compare whether the without permission rate is significant.
Will add in the next pr.
pkg/util/admission/granter.go line 394 at r2 (raw file):
Previously, irfansharif (irfan sharif) wrote…
+1. There's one other callsite below where we invoke subtractTokensLocked where we're not updating this metric (was that intentional? if so, why?)
Oversight.
But I didn't put this in subtractTokensLocked on purpose. In setAvailableTokens we call subtractTokensLocked, and then we might further restrict the available tokens, and I didn't want this to be updated twice.
But I guess updating twice is fine, so I moved this in.
pkg/util/admission/io_load_listener.go line 541 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This logging is very useful. Could you also add a log.V(1) condition so we can enable it selectively even on unloaded clusters through vmodule?
Nice. This is once every 15 seconds anyway, we just always log this. Added log.V(1) for now.
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
Previously, sumeerbhola wrote…
I don't quite believe this. There must be a bug if that was the behavior that occurred. I don't mind this current code since it the interval [1, 2) is easy to reason about in the equations below. So maybe just say that.
I ran this multiple times with another pr, where the only diff was that we don't multiply the score by 2, and we divide the thresholds beneath by 0.5. Fairly certain this is what was going on, which is why I think we should keep this.
I'm going to run it once more to verify.
pkg/util/admission/io_load_listener.go line 829 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Accidental capitalization of "O" in smoothedCompactionByteTOkens.
Done.
pkg/util/admission/io_load_listener.go line 847 at r3 (raw file):
Previously, sumeerbhola wrote…
the last sentence "... could be adjusted ..." is what this PR is doing, so we should drop the sentence.
Done.
pkg/util/admission/io_load_listener.go line 850 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This could use a comment block explaining why we're using the piece-wise linear function. You could also annotate this else block with the "medium load" naming convention you used in the commit message. Ditto above for "underload" and "overload".
Done.
pkg/util/admission/testdata/format_adjust_tokens_stats.txt line 4 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Why is this non-zero for the zero state this test wants to print?
The test just calls adjustTokensInner with a write stall count of 12. The previous value for the zero case is 0.
pkg/util/admission/testdata/io_load_listener line 575 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
This is not for this PR, but I continue to find tests in this file difficult to read. There's so much state, the lines are very long (so not very friendly to read in IDEs, with or without text wrapping), and every set-state directive sets all state, so it's hard to visually point out exactly which one was changed line by line. Do you have any ideas for how we could improve it? I just glanced at these diffs.
I think this test prints out a bunch of state, but when I was writing the tests here, I only cared about three variables. We could probably modify this to only selectively print the variables we want.
Not sure if that was true for the other tests here.
This commit adds metrics for available IO tokens, and tokens taken without permission in the kvStoreTokenGranter.

Epic: none

Release note: None
We change the way we compute the score for compaction token calculation, and use that score to smooth the calculation. The score is calculated as max(sublevels/20, l0files/1000).

Prior to this commit, we were overloaded if score was above 1 and underloaded if score was below 1. This meant that it was easy to have an underloaded system with unlimited tokens, and then become overloaded due to the unlimited tokens. Note that our score calculations also use the admission control overload thresholds of 20 and 1000.

In this commit, we keep the previous definition of score and overload, but we introduce a medium load for scores in the range [0.5, 1) and underload for scores < 0.5. During underload, we still give out unlimited tokens, and during overload we still give out C = smoothedIntL0CompactedBytes / 2 tokens. But during medium load we hand out (C/2, C] tokens. This scheme will ideally remove the sharp spikes in the granted tokens due to switching back and forth between overloaded and underloaded systems. The precise piecewise function is 3 * C / 2 - score * C / 2.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-25469

Fixes: #91519

Release note: None
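The token scheme described in the commit message can be sketched as a standalone Go program. This is illustrative only, not the actual io_load_listener.go code: it reads C as smoothedIntL0CompactedBytes, so overload hands out C/2 and the doubled-score interpolation 3*C/2 - score*C/2 over [1, 2) ranges over (C/2, C], continuously connecting the two regimes.

```go
package main

import "fmt"

// compactionTokens sketches the smoothed token scheme. rawScore stands for
// max(sublevels/20, l0Files/1000) and smoothed stands in for
// smoothedIntL0CompactedBytes. Names and signature are hypothetical.
func compactionTokens(rawScore, smoothed float64) (tokens float64, unlimited bool) {
	score := rawScore * 2 // map the medium-load range [0.5, 1) onto [1, 2)
	switch {
	case score < 1:
		// Underload: hand out unlimited tokens.
		return 0, true
	case score >= 2:
		// Overload: hand out smoothed/2 tokens, as before this change.
		return smoothed / 2, false
	default:
		// Medium load: linear interpolation 3*C/2 - score*C/2 over the
		// doubled score, ranging over (smoothed/2, smoothed].
		return 3*smoothed/2 - score*smoothed/2, false
	}
}

func main() {
	for _, s := range []float64{0.25, 0.5, 0.75, 1.0} {
		tokens, unlimited := compactionTokens(s, 1000)
		fmt.Printf("score=%.2f unlimited=%t tokens=%.0f\n", s, unlimited, tokens)
	}
}
```

With smoothed = 1000, scores 0.25, 0.5, 0.75, and 1.0 yield unlimited, 1000, 750, and 500 tokens respectively, with no discontinuity at either boundary.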
bananabrick
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
Previously, bananabrick (Arjun Nair) wrote…
I ran this multiple times with another pr, where the only diff was that we don't multiply the score by 2, and we divide the thresholds beneath by 0.5. Fairly certain this is what was going on, which is why I think we should keep this.
I'm going to run it once more to verify.
Hmm, rather than reproducing this, which would take forever, I wrote a Go script to compare the differences.
package main
import "fmt"
func main() {
diff := 0
for i := 0; i < 21; i++ {
var score1 float64
var score2 float64
score1 = float64(i) / float64(20)
score2 = score1 * 2
fmt.Println(score1, score2)
if score1 < 0.5 && score2 >= 1 {
diff++
} else if score1 >= 0.5 && score1 < 1 && (score2 < 1 || score2 >= 2) {
diff++
} else if score1 >= 1 && score2 < 2 {
diff++
}
}
fmt.Println(diff)
}
and it printed out no differences at all. Maybe I did have a bug. Okay, I removed that comment.
bananabrick
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)
pkg/util/admission/io_load_listener.go line 826 at r3 (raw file):
Previously, bananabrick (Arjun Nair) wrote…
Hmm, rather than reproducing this, which would take forever, I wrote a go script to compare the differences.
package main

import "fmt"

func main() {
	diff := 0
	for i := 0; i < 21; i++ {
		var score1 float64
		var score2 float64
		score1 = float64(i) / float64(20)
		score2 = score1 * 2
		fmt.Println(score1, score2)
		if score1 < 0.5 && score2 >= 1 {
			diff++
		} else if score1 >= 0.5 && score1 < 1 && (score2 < 1 || score2 >= 2) {
			diff++
		} else if score1 >= 1 && score2 < 2 {
			diff++
		}
	}
	fmt.Println(diff)
}

and it printed out no differences at all. Maybe I did have a bug. Okay, I removed that comment.
Oh this is silly, but I just realized that I was changing the score range to [0.5, 1), but I didn't recompute the linear function based on that range.
The GitHub CI lint failures are also happening on commits merged into master, and Bazel CI passed. Will merge this now.
bors r+ single on |
Build succeeded: |
We change the way we compute the score for compaction token
calculation, and use that score to smoothen the calculation.
The score is calculated as max(sublevels/20, l0files/1000).
Prior to this commit, we were overloaded if score was above 1 and
underloaded if score was below 1. This meant that it was easy to
have an underloaded system with unlimited tokens, and then become
overloaded due to the unlimited tokens. Note that our score calculations
also use the admission control overload threshold of 20, and 1000.
In this commit, we keep the previous definition of score, but we're only
overloaded if score >= 1. We can consider score < 0.5 as underload, and score
in [0.5, 1) as medium load.
During underload, we still give out unlimited tokens, and during
overload we still give out C = smoothedIntL0CompactedBytes / 2 tokens.
But during medium load we hand out (C/2, C] tokens. This scheme will
ideally remove the sharp spikes in the granted tokens due to switching
back and forth between overloaded and underloaded systems.
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-25469
Fixes: #91519
Release note: None