rfcs: fine-grained cpu attribution by irfansharif · Pull Request #82356 · cockroachdb/cockroach

irfansharif · 2022-06-02T15:25:08Z

CRDB lacks primitives to attribute CPU usage to specific scopes (ranges,
tenants, sessions, statements) or processes (pebble compactions,
snapshot generation). For subsystems that rely on such attribution, we
use proxy signals like '# of batch requests' or 'size of read/write
request' as observed by a range/replica or a given tenant. We propose
using a patched Go runtime to track CPU use at the level of individual
goroutines, using the primitive to derive an accurate resource
consumption signal in various CRDB subsystems. The compiler, language,
and tooling will remain the same.

Release note: None

cockroach-teamcity · 2022-06-02T15:25:18Z

This change is

knz

Overall +1 but I'd like a more verbose description of what we need wrt org change.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @irfansharif, and @tbg)

docs/RFCS/20220602_fine_grained_cpu_attribution.md line 275 at r1 (raw file):

attribution to the major Go release the change possibly appears; the diff is
conservative in size to make it easy to patch to minor/major Go releases going
forward. We expect future proposals for runtime changes, if any, to be evaluated

I thought you were extracting this sentence in a separate section?

I think the need for org changes needs to be discussed as one of the top areas of focus in the RFC review.

irfansharif

I'm not sure what you're imagining is in scope for "org changes", but is a continually maintained wiki page a better place for it than a checked in (typically immutable) RFC? The omission here was intentional -- I'm happy to opine on such a page, but I feel it'd be easier to comment on and keep up-to-date if situated elsewhere. I also don't feel we need to hammer it out before merging this RFC, or have that discussion here -- I'm ok proceeding with a fuzzy understanding that the changes proposed fit our acceptable criteria at this point in time.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @bdarnell, @knz, and @tbg)

docs/RFCS/20220602_fine_grained_cpu_attribution.md line 275 at r1 (raw file):

Previously, knz (kena) wrote…

I thought you were extracting this sentence in a separate section?

I think the need for org changes needs to be discussed as one of the top areas of focus in the RFC review.

This block ("Future runtime changes") was pulled out from the earlier one ("Development and release"), and now includes a sketch of possible guidelines to consider when evaluating runtime patches. See top-level comment for org change discussions.

kvoli

Exciting 🥳

docs/RFCS/20220602_fine_grained_cpu_attribution.md

CRDB lacks primitives to attribute CPU usage to specific scopes (ranges, tenants, sessions, statements) or processes (pebble compactions, snapshot generation). For subsystems that rely on such attribution, we use proxy signals like '# of batch requests' or 'size of read/write request' as observed by a range/replica or a given tenant. We propose using a patched Go runtime to track CPU use at the level of individual goroutines, using the primitive to derive an accurate resource consumption signal in various CRDB subsystems. The compiler, language, and tooling will remain the same. Release note: None

irfansharif · 2022-06-02T19:50:10Z

bors r+

craig · 2022-06-02T21:54:12Z

Build succeeded:

GitHub CI (Cockroach)

petermattis

Reviewable status: complete! 0 of 0 LGTMs obtained

docs/RFCS/20220602_fine_grained_cpu_attribution.md line 55 at r3 (raw file):

The time spent per-goroutine in the running state (henceforth "CPU time") is an
accurate and precise measure for CPU usage. This is not currently tracked by

"accurate and precise" seems mildly exaggerated because (as you note below) a goroutine may not actually be running while in the _Grunning state because the underlying thread is not being run by the kernel. I suspect this source of error is small in normal circumstances, though it becomes larger if another process on the node starts consuming significant CPU. The error also becomes larger if we're over-scheduling CPU. Are we doing that in Serverless? (I can't recall if we are over-scheduling CPU or memory or both).

docs/RFCS/20220602_fine_grained_cpu_attribution.md line 282 at r3 (raw file):

`crypto` ("core" functionality in general), and performance implications.

## Evaluation

Might be worthwhile to see how inaccurate the CPU time measurement is under adverse conditions. Notably, when there is another process on the CRDB node consuming significant CPU. If the inaccuracy is large, is there anything that could be done to mitigate it? For example, could we periodically call getrusage and normalize the goroutine CPU times with the actual process CPU times in some fashion (I'm hand-waving here)? I think this would be overkill initially. Mainly I'm looking for a small Future Work section.

docs/RFCS/20220602_fine_grained_cpu_attribution.md line 398 at r3 (raw file):

following idea.
- We'd make use of profiler labels at points of interest (code paths where
  requests to a single range/tenant feed through for e.g.)

Nit: e.g. means for example. So when I see for e.g. I'm reading that as for for example which seems repetitive. Not sure if that is my misreading or not.

Package grunning is a library that's able to retrieve on-CPU running time for individual goroutines. It relies on using a patched Go and provides a primitive for fine-grained CPU attribution and control through a single API: package grunning // Time returns the time spent by the current goroutine in the // running state. func Time() time.Duration The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625. We build CRDB using use the patched Go runtime for all officially supported platforms when built using Bazel (cockroachdb#84867). Engineers commonly building CRDB also use happen to use two platforms we don't use a patched Go for: - FreeBSD (we don't have cross-compilers setup), and - M1/M2 Macs (we don't have a code-signing pipeline, yet). We use '(darwin && arm64) || freebsd || !bazel' as the build tag to exclude such platforms. See cockroachdb#84867 for more details. This package tests various properties we should expect over the running time value. It does not make assertions given the CI environments we run these under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy CRDB deployments). This is also why they're skipped under stress. Still, these tests are useful to understand the properties we expect running time to have: === RUN TestEquivalentGoroutines thread=03 expected≈10.00% got= 9.98% of on-cpu time thread=06 expected≈10.00% got=10.00% of on-cpu time thread=02 expected≈10.00% got=10.01% of on-cpu time thread=10 expected≈10.00% got=10.01% of on-cpu time thread=07 expected≈10.00% got= 9.99% of on-cpu time thread=04 expected≈10.00% got= 9.99% of on-cpu time thread=09 expected≈10.00% got=10.00% of on-cpu time thread=01 expected≈10.00% got= 9.99% of on-cpu time thread=08 expected≈10.00% got=10.02% of on-cpu time thread=05 expected≈10.00% got=10.02% of on-cpu time --- PASS: TestEquivalentGoroutines (0.56s) === RUN TestProportionalGoroutines thread=01 got 1.82% of on-cpu time: expected≈ 1.00x got=1.00x thread=02 got 3.64% of on-cpu time: expected≈ 2.00x got=2.00x thread=03 got 5.47% of on-cpu time: expected≈ 3.00x got=3.00x thread=04 got 7.28% of on-cpu time: expected≈ 4.00x got=4.00x thread=05 got 9.09% of on-cpu time: expected≈ 5.00x got=4.99x thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x --- PASS: TestProportionalGoroutines (1.72s) === RUN TestPingPongHog pinger/ponger expected≈1.00x got=0.96x --- PASS: TestPingPongHog (0.91s) Release note: None

85850: importer: support {} format for arrays in CSV r=ecwall a=rafiss fixes #84631 Release note (sql change): Arrays can now be imported in a CSV file using the {} format, similar to COPY FROM. Importing array expressions (e.g. ARRAY[1, 2, 3]) is still supported as well. 85854: clusterversion,storage: remove 22.1 PebbleFormat version gates r=jbowens a=celiala This commit removes the following 22.1 version gates: - PebbleFormatBlockPropertyCollector - PebbleFormatSplitUserKeysMarked Cleanup was done following guidance from [21.2 cleanup](#74270 (comment)): > For the most part, if the gates were just simple if !version.IsActive { return x } or something, I just removed the block, and even if it was a little more complicated, like args = [x]; if version { args = append(args, y) }; foo(args) I still tried to mostly inline it such that it looked natural (i.e. remove that append and make it args = [x, y]). > However for just a couple more complicated cases that were referring to <21.2 versions that needed to be replaced when those were deleted, I added a placeholder clusterversion.TODOPre21_2 alias for 21.2. Replacing those calls with this alias shouldn't change their behavior -- it was already always true, since the code today should never run in a <21.2 cluster -- but means we can delete those older versions in the meantime and then the owners of these bits can decide how to update them. Partially addresses #80663 Release note: none 85909: kvserver: instrument RaftTransport workers with pprof labels r=tbg,erikgrinaker a=pavelkalinnikov The unused arguments in the method signature were used to identify goroutines in traces. This no longer works after Go 1.17 started passing arguments via registers. This commit adds pprof labels when starting these goroutines, to have a cleaner code, more readable traces, and to work around the new Go convention. Release note: None 85977: grunning: add library for precise on-CPU time measurement r=irfansharif a=irfansharif Package grunning is a library that's able to retrieve on-CPU running time for individual goroutines. It relies on using a patched Go and provides a primitive for fine-grained CPU attribution and control through a single API: package grunning // Time returns the time spent by the current goroutine in the // running state. func Time() time.Duration The motivating RFC is over at #82356. Informs #82625. We build CRDB using use the patched Go runtime for all officially supported platforms when built using Bazel (#84867). Engineers commonly building CRDB also use happen to use two platforms we don't use a patched Go for: - FreeBSD (we don't have cross-compilers setup), and - M1/M2 Macs (we don't have a code-signing pipeline, yet). We use '(darwin && arm64) || freebsd || !bazel' as the build tag to exclude such platforms. See #84867 for more details. This package tests various properties we should expect over the running time value. It does not make assertions given the CI environments we run these under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy CRDB deployments). This is also why they're skipped under stress. Still, these tests are useful to understand the properties we expect running time to have: === RUN TestEquivalentGoroutines thread=03 expected≈10.00% got= 9.98% of on-cpu time thread=06 expected≈10.00% got=10.00% of on-cpu time thread=02 expected≈10.00% got=10.01% of on-cpu time thread=10 expected≈10.00% got=10.01% of on-cpu time thread=07 expected≈10.00% got= 9.99% of on-cpu time thread=04 expected≈10.00% got= 9.99% of on-cpu time thread=09 expected≈10.00% got=10.00% of on-cpu time thread=01 expected≈10.00% got= 9.99% of on-cpu time thread=08 expected≈10.00% got=10.02% of on-cpu time thread=05 expected≈10.00% got=10.02% of on-cpu time --- PASS: TestEquivalentGoroutines (0.56s) === RUN TestProportionalGoroutines thread=01 got 1.82% of on-cpu time: expected≈ 1.00x got=1.00x thread=02 got 3.64% of on-cpu time: expected≈ 2.00x got=2.00x thread=03 got 5.47% of on-cpu time: expected≈ 3.00x got=3.00x thread=04 got 7.28% of on-cpu time: expected≈ 4.00x got=4.00x thread=05 got 9.09% of on-cpu time: expected≈ 5.00x got=4.99x thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x --- PASS: TestProportionalGoroutines (1.72s) === RUN TestPingPongHog pinger/ponger expected≈1.00x got=0.96x --- PASS: TestPingPongHog (0.91s) Release note: None 85987: sql: fix aggregation of statistics r=maryliag a=maryliag Previously, because we were using a join, we were double counting statistics when we had the same fingerprint in memory and persisted. This commit adds a `DISTINCT` so we only count them once. Fixes #85958 Release note: None 86008: sql: logic test to inform maximum builtin function oid change r=chengxiong-ruan a=chengxiong-ruan This commit adds a logic test to let engineers who added a new builtin function know that the new builtin function is constructed earlier than some existing builtin functions at init time. Release note: None Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Celia La <celia@cockroachlabs.com> Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com> Co-authored-by: Marylia Gutierrez <marylia@cockroachlabs.com> Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>

irfansharif requested a review from a team as a code owner June 2, 2022 15:25

irfansharif force-pushed the 220602.fine-grained-cpu branch from 7228848 to 75654ff Compare June 2, 2022 15:28

irfansharif requested review from ajwerner, bdarnell, knz and tbg June 2, 2022 15:33

knz reviewed Jun 2, 2022

View reviewed changes

irfansharif commented Jun 2, 2022

View reviewed changes

irfansharif requested a review from a team June 2, 2022 18:45

irfansharif force-pushed the 220602.fine-grained-cpu branch from 75654ff to 08cf393 Compare June 2, 2022 19:08

kvoli approved these changes Jun 2, 2022

View reviewed changes

docs/RFCS/20220602_fine_grained_cpu_attribution.md Show resolved Hide resolved

docs/RFCS/20220602_fine_grained_cpu_attribution.md Show resolved Hide resolved

docs/RFCS/20220602_fine_grained_cpu_attribution.md Outdated Show resolved Hide resolved

irfansharif force-pushed the 220602.fine-grained-cpu branch from 08cf393 to 3460044 Compare June 2, 2022 19:47

craig bot merged commit 8cdcaa3 into cockroachdb:master Jun 2, 2022

irfansharif deleted the 220602.fine-grained-cpu branch June 2, 2022 21:56

This was referenced Jun 3, 2022

tenantrate: use measured on-cpu time for rate limiting #77041

Open

rfcs: Task groups in Go: per-session resource usage tracking #60589

Open

petermattis reviewed Jun 6, 2022

View reviewed changes

This was referenced Jun 6, 2022

admission,kv,bulk: unify (local) store overload protection via admission control #75066

Closed

*: fine-grained cpu attribution #82625

Closed

This was referenced Jun 27, 2022

kvserver: throttle writes on followers #79215

Closed

allocator,admission: consider resource utilization + throttling signals directly #83490

Open

irfansharif mentioned this pull request Aug 11, 2022

grunning: add library for precise on-CPU time measurement #85977

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rfcs: fine-grained cpu attribution#82356

rfcs: fine-grained cpu attribution#82356
craig[bot] merged 1 commit intocockroachdb:masterfrom
irfansharif:220602.fine-grained-cpu

irfansharif commented Jun 2, 2022 •

edited

Loading

Uh oh!

cockroach-teamcity commented Jun 2, 2022

Uh oh!

knz left a comment

Uh oh!

irfansharif left a comment

Uh oh!

kvoli left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

irfansharif commented Jun 2, 2022

Uh oh!

craig bot commented Jun 2, 2022

Uh oh!

petermattis left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

irfansharif commented Jun 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Jun 2, 2022

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

irfansharif left a comment

Choose a reason for hiding this comment

Uh oh!

kvoli left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

irfansharif commented Jun 2, 2022

Uh oh!

craig bot commented Jun 2, 2022

Uh oh!

petermattis left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

irfansharif commented Jun 2, 2022 •

edited

Loading