grunning: add library for precise on-CPU time measurement #85977
craig[bot] merged 1 commit into cockroachdb:master
Conversation
@rickystewart: PTAL at the build tags, to make sure I've got them right. @tbg, @sumeerbhola: PTAL at everything else. Future PRs will wire this up into the elastic CPU limiter we're building towards, for latency isolation for backups.
Force-pushed from f961c60 to 8605a33.
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:
package grunning
// Time returns the time spent by the current goroutine in the
// running state.
func Time() time.Duration
The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.
We build CRDB using the patched Go runtime for all officially
supported platforms when built using Bazel (cockroachdb#84867). Engineers
building CRDB also happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See cockroachdb#84867 for more details.
This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lots of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:
=== RUN TestEquivalentGoroutines
thread=03 expected≈10.00% got= 9.98% of on-cpu time
thread=06 expected≈10.00% got=10.00% of on-cpu time
thread=02 expected≈10.00% got=10.01% of on-cpu time
thread=10 expected≈10.00% got=10.01% of on-cpu time
thread=07 expected≈10.00% got= 9.99% of on-cpu time
thread=04 expected≈10.00% got= 9.99% of on-cpu time
thread=09 expected≈10.00% got=10.00% of on-cpu time
thread=01 expected≈10.00% got= 9.99% of on-cpu time
thread=08 expected≈10.00% got=10.02% of on-cpu time
thread=05 expected≈10.00% got=10.02% of on-cpu time
--- PASS: TestEquivalentGoroutines (0.56s)
=== RUN TestProportionalGoroutines
thread=01 got 1.82% of on-cpu time: expected≈ 1.00x got=1.00x
thread=02 got 3.64% of on-cpu time: expected≈ 2.00x got=2.00x
thread=03 got 5.47% of on-cpu time: expected≈ 3.00x got=3.00x
thread=04 got 7.28% of on-cpu time: expected≈ 4.00x got=4.00x
thread=05 got 9.09% of on-cpu time: expected≈ 5.00x got=4.99x
thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
--- PASS: TestProportionalGoroutines (1.72s)
=== RUN TestPingPongHog
pinger/ponger expected≈1.00x got=0.96x
--- PASS: TestPingPongHog (0.91s)
Release note: None
Force-pushed from 8605a33 to 5b0982e.
@irfansharif now's a good time to diversify away from me as a reviewer since I am sort of swamped and also likely won't be involved with admission control much. I assume Sumeer has got you covered, but it's also worth thinking about who else might be the third person on the squad, ideally someone from KV or an Obs team.
@kvoli or @lidorcarmel: mind giving this a look from the KV side of things? The immediate use is going to be in admission control, but I know both of you are thinking about load balancing/allocation generally, and at some level, I think this will play a part (even if indirectly through AC): #83490.
Thanks! bors r+
Build succeeded:
sumeerbhola
left a comment
Reviewed 7 of 8 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/util/grunning/enabled_test.go line 111 at r2 (raw file):
for i := trip; i > 0; i-- {
	ret += i
	ret = ret ^ (i + 0xcafebabe)
why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different numbers of iterations.
pkg/util/grunning/enabled_test.go line 118 at r2 (raw file):
}
results := make([]int64, 10)
why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?
pkg/util/grunning/enabled_test.go line 133 at r2 (raw file):
total := int64(0)
for _, result := range results {
	total += result
Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?
pkg/util/grunning/enabled_test.go line 186 at r2 (raw file):
}()
ping <- true // start ping-pong
<-stop
The test immediately stops the CPU hog? What is the purpose of the CPU hog?
It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.
pkg/util/grunning/enabled_test.go line 201 at r2 (raw file):
for n := 0; n < b.N; n++ {
	_ = grunning.Time()
}
what was the result of this benchmark?
irfansharif
left a comment
Sent test changes over at #86185.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/util/grunning/enabled_test.go line 111 at r2 (raw file):
Previously, sumeerbhola wrote…
why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different numbers of iterations.
The computation is just something I copied off of golang/go#36821, and did so originally when evaluating CPU accuracy of this measurement compared to profiler data (same as that golang/go issue). And yes, the only thing that matters is the different number of iterations.
pkg/util/grunning/enabled_test.go line 118 at r2 (raw file):
Previously, sumeerbhola wrote…
why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?
No real reason, I was being sloppy.
pkg/util/grunning/enabled_test.go line 133 at r2 (raw file):
Previously, sumeerbhola wrote…
Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?
It does, I was being sloppy.
pkg/util/grunning/enabled_test.go line 186 at r2 (raw file):
Previously, sumeerbhola wrote…
The test immediately stops the CPU hog? What is the purpose of the CPU hog?
It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.
The test isn't stopping the CPU hog, that line is waiting for the CPU hog to finish. I'll add a comment.
pkg/util/grunning/enabled_test.go line 201 at r2 (raw file):
Previously, sumeerbhola wrote…
what was the result of this benchmark?
$ dev bench pkg/util/grunning -f BenchmarkGRunningTime -v
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkGRunningTime
BenchmarkGRunningTime-24 38336452 31.59 ns/op
PASS