grunning: add library for precise on-CPU time measurement #85977
craig[bot] merged 1 commit into cockroachdb:master
Conversation
@rickystewart: PTAL at the build tags, to make sure I've got them right. @tbg, @sumeerbhola: PTAL at everything else. Future PRs will wire this up into the elastic CPU limiter we're building towards, for latency isolation for backups.
Force-pushed from f961c60 to 8605a33.
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:
package grunning
// Time returns the time spent by the current goroutine in the
// running state.
func Time() time.Duration
The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.
We build CRDB using the patched Go runtime for all officially
supported platforms when built using Bazel (cockroachdb#84867). Engineers
building CRDB also happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See cockroachdb#84867 for more details.
This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lots of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:
=== RUN TestEquivalentGoroutines
thread=03 expected≈10.00% got= 9.98% of on-cpu time
thread=06 expected≈10.00% got=10.00% of on-cpu time
thread=02 expected≈10.00% got=10.01% of on-cpu time
thread=10 expected≈10.00% got=10.01% of on-cpu time
thread=07 expected≈10.00% got= 9.99% of on-cpu time
thread=04 expected≈10.00% got= 9.99% of on-cpu time
thread=09 expected≈10.00% got=10.00% of on-cpu time
thread=01 expected≈10.00% got= 9.99% of on-cpu time
thread=08 expected≈10.00% got=10.02% of on-cpu time
thread=05 expected≈10.00% got=10.02% of on-cpu time
--- PASS: TestEquivalentGoroutines (0.56s)
=== RUN TestProportionalGoroutines
thread=01 got 1.82% of on-cpu time: expected≈ 1.00x got=1.00x
thread=02 got 3.64% of on-cpu time: expected≈ 2.00x got=2.00x
thread=03 got 5.47% of on-cpu time: expected≈ 3.00x got=3.00x
thread=04 got 7.28% of on-cpu time: expected≈ 4.00x got=4.00x
thread=05 got 9.09% of on-cpu time: expected≈ 5.00x got=4.99x
thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
--- PASS: TestProportionalGoroutines (1.72s)
=== RUN TestPingPongHog
pinger/ponger expected≈1.00x got=0.96x
--- PASS: TestPingPongHog (0.91s)
Release note: None
Force-pushed from 8605a33 to 5b0982e.
@irfansharif now's a good time to diversify away from me as a reviewer since I am sort of swamped and also likely won't be involved with admission control much. I assume Sumeer has got you covered, but it's also worth thinking about who else might be the third person on the squad, ideally someone from KV or an Obs team.
@kvoli or @lidorcarmel: mind giving this a look from the KV side of things? The immediate use is going to be in admission control, but I know both of you are thinking about load balancing/allocation generally, and at some level, I think this will play a part (even if indirectly through AC): #83490.
Thanks! bors r+
Build succeeded:
sumeerbhola
left a comment
Reviewed 7 of 8 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/util/grunning/enabled_test.go line 111 at r2 (raw file):
for i := trip; i > 0; i-- {
	ret += i
	ret = ret ^ (i + 0xcafebabe)
why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different numbers of iterations.
pkg/util/grunning/enabled_test.go line 118 at r2 (raw file):
}
results := make([]int64, 10)
why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?
pkg/util/grunning/enabled_test.go line 133 at r2 (raw file):
total := int64(0)
for _, result := range results {
	total += result
Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?
pkg/util/grunning/enabled_test.go line 186 at r2 (raw file):
}()
ping <- true // start ping-pong
<-stop
The test immediately stops the CPU hog? What is the purpose of the CPU hog?
It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.
pkg/util/grunning/enabled_test.go line 201 at r2 (raw file):
for n := 0; n < b.N; n++ {
	_ = grunning.Time()
}
what was the result of this benchmark?
irfansharif
left a comment
Sent test changes over at #86185.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/util/grunning/enabled_test.go line 111 at r2 (raw file):
Previously, sumeerbhola wrote…
why the different computation compared to TestEquivalentGoroutines? I am assuming the only thing that matters in this test is that the different goroutines do different numbers of iterations.
The computation is just something I copied off of golang/go#36821, and did so originally when evaluating CPU accuracy of this measurement compared to profiler data (same as that golang/go issue). And yes, the only thing that matters is the different number of iterations.
pkg/util/grunning/enabled_test.go line 118 at r2 (raw file):
Previously, sumeerbhola wrote…
why is TestEquivalentGoroutines using a map when a slice like in this test would suffice?
No real reason, I was being sloppy.
pkg/util/grunning/enabled_test.go line 133 at r2 (raw file):
Previously, sumeerbhola wrote…
Does the barrier due to the WaitGroup obviate the need to do an atomic load here? If yes, why is there an atomic store above, when the different goroutines are writing to different ints?
It does, I was being sloppy.
pkg/util/grunning/enabled_test.go line 186 at r2 (raw file):
Previously, sumeerbhola wrote…
The test immediately stops the CPU hog? What is the purpose of the CPU hog?
It is not quite clear what this test is trying to demonstrate. Perhaps say something like
// Test that demonstrates that if 2 goroutines alternately cycle between running and waiting, they will get similar running times.
The test isn't stopping the CPU hog, that line is waiting for the CPU hog to finish. I'll add a comment.
pkg/util/grunning/enabled_test.go line 201 at r2 (raw file):
Previously, sumeerbhola wrote…
what was the result of this benchmark?
$ dev bench pkg/util/grunning -f BenchmarkGRunningTime -v
goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkGRunningTime
BenchmarkGRunningTime-24 38336452 31.59 ns/op
PASS