
release-22.1: sql,tracing: per-stmt probabilistic trace sampling #89774

Merged
irfansharif merged 1 commit into cockroachdb:release-22.1 from irfansharif:221011.tail-tracing-22.1 on Oct 14, 2022

Conversation

@irfansharif (Contributor) commented Oct 11, 2022

This commit introduces a backportable alternative to #82750 and #83020, to help solve for #82896. It gives a palatable alternative to sql.trace.stmt.enable_threshold. See those PRs/issues for verbose commentary, but roughly we can do the following:

  Pick a stmt fingerprint, declare a sampling probability which controls
  when verbose tracing is enabled for it, and a latency threshold above
  which a trace is logged. With a given stmt rate (say 1000/s) and a
  given percentile we're trying to capture (say p99.9), we have N stmt/s
  in the 99.9th percentile (1 in our example). We should be able to set
  a sampling probability P such that with high likelihood (>95%) we
  capture at least one trace over the next S seconds. The sampling
  probability lets you control the overhead you're introducing for those
  statements in aggregate; dialing it up lets you lower S. You might
  want to do such a thing for infrequently executed statements.
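To make the relationship between P, N and S concrete: if outliers arrive at N per second and each execution is sampled independently with probability P, the chance of capturing at least one trace over S seconds is 1 - (1 - P)^(N*S). Solving for the smallest P that reaches a target likelihood gives a quick way to pick the setting. This is our own back-of-the-envelope sketch (the function name and parameters are illustrative, not part of the PR):

```python
def required_sampling_probability(outlier_rate, window_secs, target=0.95):
    """Smallest per-stmt sampling probability P such that
    1 - (1 - P)**(outlier_rate * window_secs) >= target."""
    expected_outliers = outlier_rate * window_secs
    return 1.0 - (1.0 - target) ** (1.0 / expected_outliers)

# 1 outlier/s (the p99.9 of 1000 stmt/s), captured within 5 minutes:
p = required_sampling_probability(outlier_rate=1, window_secs=300)
print(f"{p:.4f}")  # ~0.0099, i.e. roughly a 1% sampling probability
```

Note how quickly P shrinks as the window grows: for frequently executed statements a tiny probability suffices, while rare statements need a higher P (or a longer wait).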

This commit does the simplest thing: ask for all input through shoddy cluster settings and just log traces to our usual files. Below we outline an example of how to use these settings to catch outlier executions for writes to the history table in TPC-C. When using it, we did not see an overall increase in p99s as a result of the sampling. It also took only about 10s to capture this data, showing clearly that it was due to latch waits.

  > SELECT
      encode(fingerprint_id, 'hex'),
      (statistics -> 'statistics' ->> 'cnt')::INT AS count,
      metadata ->> 'query' AS query
    FROM system.statement_statistics
    ORDER BY count DESC LIMIT 10;

           encode    | count |                                             query
    -----------------+-------+--------------------------------------------------------------------------------------------------------------------
    ...
    4e4214880f87d799 |  2680 | INSERT INTO history(h_c_id, h_c_d_id, h_c_w_id, h_d_id, h_w_id, h_amount, h_date, h_data) VALUES ($1, $2, __more6__)

    > SET CLUSTER SETTING trace.fingerprint = '4e4214880f87d799'; -- fingerprint
    > SET CLUSTER SETTING trace.fingerprint.probability = 0.01;   -- 1% sampling probability
    > SET CLUSTER SETTING trace.fingerprint.threshold = '90ms';   -- latency threshold

     0.521ms      0.005ms    event:kv/kvserver/concurrency/concurrency_manager.go:260 [n1,s1,r105/1:/Table/109/1/{90/7/2…-105/10…}] acquiring latches
     0.532ms      0.011ms    event:kv/kvserver/spanlatch/manager.go:532 [n1,s1,r105/1:/Table/109/1/{90/7/2…-105/10…}] waiting to acquire read latch /Table/109/1/105/3/519/0@0,0, held by write latch /Table/109/1/105/3/519/0@1654849354.483984000,0
    99.838ms     99.306ms    event:kv/kvserver/concurrency/concurrency_manager.go:301 [n1,s1,r105/1:/Table/109/1/{90/7/2…-105/10…}] scanning lock table for conflicting locks
    99.851ms      0.013ms    event:kv/kvserver/replica_read.go:251 [n1,s1,r105/1:/Table/109/1/{90/7/2…-105/10…}] executing read-only batch

Compare this to sql.trace.stmt.enable_threshold, which enables verbose tracing for all statements with 100% probability. It introduces far too much overhead to be used in reasonable production clusters, and that overhead often masks the very problems we're looking to capture.

Release note (general change): We introduced three cluster settings to collect trace data for outlier executions with low overhead. These are only available in 22.1; in 22.2 and beyond we have other mechanisms to collect outlier traces. Traces come in handy when investigating latency spikes, and these three settings are intended to supplant most uses of sql.trace.stmt.enable_threshold. That setting enables verbose tracing for all statements with 100% probability, which can cause a lot of overhead in production clusters, as well as a lot of logging pressure. Instead we introduce the following:

  • trace.fingerprint
  • trace.fingerprint.probability
  • trace.fingerprint.threshold

Put together (all three have to be set), these enable tracing only for the statement with the given hex-encoded fingerprint, do so probabilistically (with probability trace.fingerprint.probability), and log the trace only if the latency threshold is exceeded (configured using trace.fingerprint.threshold). To obtain a hex-encoded fingerprint, consider looking at the contents of system.statement_statistics. For example:
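The combined gating the three settings perform can be sketched as follows. This is a hypothetical illustration of the decision logic, not the actual CockroachDB implementation; the constant and function names are ours:

```python
import random

# Hypothetical stand-ins for the three cluster settings.
TRACE_FINGERPRINT = "4e4214880f87d799"
TRACE_PROBABILITY = 0.01
TRACE_THRESHOLD_SECS = 0.090

def should_trace(stmt_fingerprint):
    """Pre-execution decision: enable verbose tracing only for the
    configured fingerprint, and only with the configured probability."""
    return (stmt_fingerprint == TRACE_FINGERPRINT
            and random.random() < TRACE_PROBABILITY)

def should_log(traced, latency_secs):
    """Post-execution decision: log the trace only if one was collected
    and the statement exceeded the latency threshold."""
    return traced and latency_secs > TRACE_THRESHOLD_SECS
```

The key property is that the sampling decision happens before execution (so only a small fraction of executions pay the verbose-tracing overhead), while the threshold check happens after (so only the slow outliers are logged).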

  > SELECT
      encode(fingerprint_id, 'hex'),
      (statistics -> 'statistics' ->> 'cnt')::INT AS count,
      metadata ->> 'query' AS query
    FROM system.statement_statistics
    ORDER BY count DESC LIMIT 10;

           encode    | count |                                             query
    -----------------+-------+--------------------------------------------------------------------------------------------------------------------
    ...
    4e4214880f87d799 |  2680 | INSERT INTO history(h_c_id, h_c_d_id, h_c_w_id, h_d_id, h_w_id, h_amount, h_date, h_data) VALUES ($1, $2, __more6__)

Release justification: adds helpful instrumentation possibilities for latency investigations.

@blathers-crl (bot) commented Oct 11, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?


irfansharif force-pushed the 221011.tail-tracing-22.1 branch from 0774848 to 0ad199d on October 11, 2022.
irfansharif requested review from a team and tbg on October 14, 2022.
@irfansharif (Contributor, Author) commented:

(Bump.)
