rfcs: fine-grained cpu attribution#82356
Conversation
7228848 to
75654ff
Compare
knz
left a comment
There was a problem hiding this comment.
Overall +1 but I'd like a more verbose description of what we need wrt org change.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @irfansharif, and @tbg)
docs/RFCS/20220602_fine_grained_cpu_attribution.md line 275 at r1 (raw file):
attribution to the major Go release the change possibly appears; the diff is conservative in size to make it easy to patch to minor/major Go releases going forward. We expect future proposals for runtime changes, if any, to be evaluated
I thought you were extracting this sentence in a separate section?
I think the need for org changes needs to be discussed as one of the top areas of focus in the RFC review.
irfansharif
left a comment
There was a problem hiding this comment.
I'm not sure what you're imagining is in scope for "org changes", but is a continually maintained wiki page a better place for it than a checked in (typically immutable) RFC? The omission here was intentional -- I'm happy to opine on such a page, but I feel it'd be easier to comment on and keep up-to-date if situated elsewhere. I also don't feel we need to hammer it out before merging this RFC, or have that discussion here -- I'm ok proceeding with a fuzzy understanding that the changes proposed fit our acceptable criteria at this point in time.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @tbg)
docs/RFCS/20220602_fine_grained_cpu_attribution.md line 275 at r1 (raw file):
Previously, knz (kena) wrote…
I thought you were extracting this sentence in a separate section?
I think the need for org changes needs to be discussed as one of the top areas of focus in the RFC review.
This block ("Future runtime changes") was pulled out from the earlier one ("Development and release"), and now includes a sketch of possible guidelines to consider when evaluating runtime patches. See top-level comment for org change discussions.
75654ff to
08cf393
Compare
CRDB lacks primitives to attribute CPU usage to specific scopes (ranges, tenants, sessions, statements) or processes (pebble compactions, snapshot generation). For subsystems that rely on such attribution, we use proxy signals like '# of batch requests' or 'size of read/write request' as observed by a range/replica or a given tenant. We propose using a patched Go runtime to track CPU use at the level of individual goroutines, using the primitive to derive an accurate resource consumption signal in various CRDB subsystems. The compiler, language, and tooling will remain the same. Release note: None
08cf393 to
3460044
Compare
|
bors r+ |
|
Build succeeded: |
petermattis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained
docs/RFCS/20220602_fine_grained_cpu_attribution.md line 55 at r3 (raw file):
The time spent per-goroutine in the running state (henceforth "CPU time") is an accurate and precise measure for CPU usage. This is not currently tracked by
"accurate and precise" seems mildly exaggerated because (as you note below) a goroutine may not actually be running while in the _Grunning state because the underlying thread is not being run by the kernel. I suspect this source of error is small in normal circumstances, though it becomes larger if another process on the node starts consuming significant CPU. The error also becomes larger if we're over-scheduling CPU. Are we doing that in Serverless? (I can't recall if we are over-scheduling CPU or memory or both).
docs/RFCS/20220602_fine_grained_cpu_attribution.md line 282 at r3 (raw file):
`crypto` ("core" functionality in general), and performance implications. ## Evaluation
Might be worthwhile to see how inaccurate the CPU time measurement is under adverse conditions. Notably, when there is another process on the CRDB node consuming significant CPU. If the inaccuracy is large, is there anything that could be done to mitigate it? For example, could we periodically call getrusage and normalize the goroutine CPU times with the actual process CPU times in some fashion (I'm hand-waving here)? I think this would be overkill initially. Mainly I'm looking for a small Future Work section.
docs/RFCS/20220602_fine_grained_cpu_attribution.md line 398 at r3 (raw file):
following idea. - We'd make use of profiler labels at points of interest (code paths where requests to a single range/tenant feed through for e.g.)
Nit: e.g. means for example. So when I see for e.g. I'm reading that as for for example which seems repetitive. Not sure if that is my misreading or not.
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:
package grunning
// Time returns the time spent by the current goroutine in the
// running state.
func Time() time.Duration
The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.
We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (cockroachdb#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See cockroachdb#84867 for more details.
This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:
=== RUN TestEquivalentGoroutines
thread=03 expected≈10.00% got= 9.98% of on-cpu time
thread=06 expected≈10.00% got=10.00% of on-cpu time
thread=02 expected≈10.00% got=10.01% of on-cpu time
thread=10 expected≈10.00% got=10.01% of on-cpu time
thread=07 expected≈10.00% got= 9.99% of on-cpu time
thread=04 expected≈10.00% got= 9.99% of on-cpu time
thread=09 expected≈10.00% got=10.00% of on-cpu time
thread=01 expected≈10.00% got= 9.99% of on-cpu time
thread=08 expected≈10.00% got=10.02% of on-cpu time
thread=05 expected≈10.00% got=10.02% of on-cpu time
--- PASS: TestEquivalentGoroutines (0.56s)
=== RUN TestProportionalGoroutines
thread=01 got 1.82% of on-cpu time: expected≈ 1.00x got=1.00x
thread=02 got 3.64% of on-cpu time: expected≈ 2.00x got=2.00x
thread=03 got 5.47% of on-cpu time: expected≈ 3.00x got=3.00x
thread=04 got 7.28% of on-cpu time: expected≈ 4.00x got=4.00x
thread=05 got 9.09% of on-cpu time: expected≈ 5.00x got=4.99x
thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
--- PASS: TestProportionalGoroutines (1.72s)
=== RUN TestPingPongHog
pinger/ponger expected≈1.00x got=0.96x
--- PASS: TestPingPongHog (0.91s)
Release note: None
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:
package grunning
// Time returns the time spent by the current goroutine in the
// running state.
func Time() time.Duration
The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.
We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (cockroachdb#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See cockroachdb#84867 for more details.
This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:
=== RUN TestEquivalentGoroutines
thread=03 expected≈10.00% got= 9.98% of on-cpu time
thread=06 expected≈10.00% got=10.00% of on-cpu time
thread=02 expected≈10.00% got=10.01% of on-cpu time
thread=10 expected≈10.00% got=10.01% of on-cpu time
thread=07 expected≈10.00% got= 9.99% of on-cpu time
thread=04 expected≈10.00% got= 9.99% of on-cpu time
thread=09 expected≈10.00% got=10.00% of on-cpu time
thread=01 expected≈10.00% got= 9.99% of on-cpu time
thread=08 expected≈10.00% got=10.02% of on-cpu time
thread=05 expected≈10.00% got=10.02% of on-cpu time
--- PASS: TestEquivalentGoroutines (0.56s)
=== RUN TestProportionalGoroutines
thread=01 got 1.82% of on-cpu time: expected≈ 1.00x got=1.00x
thread=02 got 3.64% of on-cpu time: expected≈ 2.00x got=2.00x
thread=03 got 5.47% of on-cpu time: expected≈ 3.00x got=3.00x
thread=04 got 7.28% of on-cpu time: expected≈ 4.00x got=4.00x
thread=05 got 9.09% of on-cpu time: expected≈ 5.00x got=4.99x
thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
--- PASS: TestProportionalGoroutines (1.72s)
=== RUN TestPingPongHog
pinger/ponger expected≈1.00x got=0.96x
--- PASS: TestPingPongHog (0.91s)
Release note: None
85850: importer: support {} format for arrays in CSV r=ecwall a=rafiss
fixes #84631
Release note (sql change): Arrays can now be imported in a CSV file
using the {} format, similar to COPY FROM. Importing array expressions
(e.g. ARRAY[1, 2, 3]) is still supported as well.
85854: clusterversion,storage: remove 22.1 PebbleFormat version gates r=jbowens a=celiala
This commit removes the following 22.1 version gates:
- PebbleFormatBlockPropertyCollector
- PebbleFormatSplitUserKeysMarked
Cleanup was done following guidance from [21.2 cleanup](#74270 (comment)):
> For the most part, if the gates were just simple if !version.IsActive { return x } or something, I just removed the block, and even if it was a little more complicated, like args = [x]; if version { args = append(args, y) }; foo(args) I still tried to mostly inline it such that it looked natural (i.e. remove that append and make it args = [x, y]).
> However for just a couple more complicated cases that were referring to <21.2 versions that needed to be replaced when those were deleted, I added a placeholder clusterversion.TODOPre21_2 alias for 21.2. Replacing those calls with this alias shouldn't change their behavior -- it was already always true, since the code today should never run in a <21.2 cluster -- but means we can delete those older versions in the meantime and then the owners of these bits can decide how to update them.
Partially addresses #80663
Release note: none
85909: kvserver: instrument RaftTransport workers with pprof labels r=tbg,erikgrinaker a=pavelkalinnikov
The unused arguments in the method signature were used to identify goroutines
in traces. This no longer works after Go 1.17 started passing arguments via
registers.
This commit adds pprof labels when starting these goroutines, to have a cleaner
code, more readable traces, and to work around the new Go convention.
Release note: None
85977: grunning: add library for precise on-CPU time measurement r=irfansharif a=irfansharif
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:
package grunning
// Time returns the time spent by the current goroutine in the
// running state.
func Time() time.Duration
The motivating RFC is over at #82356. Informs #82625.
We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See #84867 for more details.
This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:
=== RUN TestEquivalentGoroutines
thread=03 expected≈10.00% got= 9.98% of on-cpu time
thread=06 expected≈10.00% got=10.00% of on-cpu time
thread=02 expected≈10.00% got=10.01% of on-cpu time
thread=10 expected≈10.00% got=10.01% of on-cpu time
thread=07 expected≈10.00% got= 9.99% of on-cpu time
thread=04 expected≈10.00% got= 9.99% of on-cpu time
thread=09 expected≈10.00% got=10.00% of on-cpu time
thread=01 expected≈10.00% got= 9.99% of on-cpu time
thread=08 expected≈10.00% got=10.02% of on-cpu time
thread=05 expected≈10.00% got=10.02% of on-cpu time
--- PASS: TestEquivalentGoroutines (0.56s)
=== RUN TestProportionalGoroutines
thread=01 got 1.82% of on-cpu time: expected≈ 1.00x got=1.00x
thread=02 got 3.64% of on-cpu time: expected≈ 2.00x got=2.00x
thread=03 got 5.47% of on-cpu time: expected≈ 3.00x got=3.00x
thread=04 got 7.28% of on-cpu time: expected≈ 4.00x got=4.00x
thread=05 got 9.09% of on-cpu time: expected≈ 5.00x got=4.99x
thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
--- PASS: TestProportionalGoroutines (1.72s)
=== RUN TestPingPongHog
pinger/ponger expected≈1.00x got=0.96x
--- PASS: TestPingPongHog (0.91s)
Release note: None
85987: sql: fix aggregation of statistics r=maryliag a=maryliag
Previously, because we were using a join, we were double
counting statistics when we had the same fingerprint in
memory and persisted.
This commit adds a `DISTINCT` so we only count them once.
Fixes #85958
Release note: None
86008: sql: logic test to inform maximum builtin function oid change r=chengxiong-ruan a=chengxiong-ruan
This commit adds a logic test to let engineers who added a
new builtin function know that the new builtin function is
constructed earlier than some existing builtin functions at
init time.
Release note: None
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Celia La <celia@cockroachlabs.com>
Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com>
Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
Link to RFC text.
CRDB lacks primitives to attribute CPU usage to specific scopes (ranges,
tenants, sessions, statements) or processes (pebble compactions,
snapshot generation). For subsystems that rely on such attribution, we
use proxy signals like '# of batch requests' or 'size of read/write
request' as observed by a range/replica or a given tenant. We propose
using a patched Go runtime to track CPU use at the level of individual
goroutines, using the primitive to derive an accurate resource
consumption signal in various CRDB subsystems. The compiler, language,
and tooling will remain the same.
Release note: None