sql: per-statement statistics, step two#14181
Conversation
|
Review status: 0 of 8 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful. pkg/sql/app_stats.go, line 149 at r1 (raw file):
I wonder if we should dump these to the log before resetting. pkg/sql/crdb_internal.go, line 264 at r1 (raw file):
To abbreviate or not to abbreviate? pkg/sql/executor.go, line 362 at r1 (raw file):
That's sort of an odd construction. Normally we'd just call something like Comments from Reviewable |
|
Reviewed 8 of 8 files at r1, 3 of 3 files at r2. pkg/sql/app_stats.go, line 35 at r1 (raw file):
I'm fine doing in a followup, but I'd like to switch to a wrapped porto-defined struct as early as possible for holding the actual stats, since separating data from behavior in a normal struct later can get trickier than building it that way from the beginning. pkg/sql/app_stats.go, line 44 at r1 (raw file):
out of scope for this PR but... env var is fine for things we don't trust yet and want to hide, but in general, I'd like to be able to change settings like this at runtime without editing the startup scripts and rebooting the whole cluster. Seems like we should have a system.settings table that nodes can cache for ~1m or something. pkg/sql/app_stats.go, line 123 at r1 (raw file):
we could potentially size this map based on the old one (e.g. pkg/sql/app_stats.go, line 128 at r1 (raw file):
ditto pkg/sql/app_stats.go, line 144 at r1 (raw file):
I'm find landing this as-is for now, but I'll probably pull periodic triggering of clearing into the server reporting loop, so we don't risk accidentally clearing right before reporting. pkg/sql/app_stats.go, line 129 at r2 (raw file):
okay, yeah, now I really want to proto this. pkg/sql/executor.go, line 362 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
ditto above -- this goes away and we pul it into existing worker running a 24h loop in server. feel free to Comments from Reviewable |
|
(will process your other comments in a 2nd round of tweaks) Review status: 4 of 8 files reviewed at latest revision, 9 unresolved discussions. pkg/sql/app_stats.go, line 44 at r1 (raw file): Previously, dt (David Taylor) wrote…
That's it -- not trusted yet, not enabled by default. I envision that the feature will always be enabled later. Added a clarifying comment. pkg/sql/app_stats.go, line 123 at r1 (raw file): Previously, dt (David Taylor) wrote…
Done. pkg/sql/app_stats.go, line 128 at r1 (raw file): Previously, dt (David Taylor) wrote…
Done. Comments from Reviewable |
|
Review status: 2 of 10 files reviewed at latest revision, 7 unresolved discussions. pkg/sql/app_stats.go, line 35 at r1 (raw file): Previously, dt (David Taylor) wrote…
Done. pkg/sql/app_stats.go, line 144 at r1 (raw file): Previously, dt (David Taylor) wrote…
Let's talk more; I have a feeling we'll be more happy all around if we were to persist this data in a SQL table rather than our ts database. In which case, we'd issue a sql transaction here, at this custom interval. Then the reporting loop (actually the update loop, running once a day) can pick up the data to send from this SQL table. pkg/sql/app_stats.go, line 149 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Done. pkg/sql/app_stats.go, line 129 at r2 (raw file): Previously, dt (David Taylor) wrote…
Done. pkg/sql/crdb_internal.go, line 264 at r1 (raw file): Previously, petermattis (Peter Mattis) wrote…
Not to. Done. pkg/sql/executor.go, line 362 at r1 (raw file): Previously, dt (David Taylor) wrote…
Done. Comments from Reviewable |
4fc7eeb to
e1ef3ac
Compare
|
Review status: 2 of 10 files reviewed at latest revision, 4 unresolved discussions, some commit checks pending. pkg/sql/app_stats.proto, line 42 at r5 (raw file):
pkg/sql/app_stats.proto, line 51 at r5 (raw file):
Tabs on the above lines. Also, looking at this again, the terms Comments from Reviewable |
|
Review status: 2 of 10 files reviewed at latest revision, 8 unresolved discussions, some commit checks pending. pkg/sql/app_stats.proto, line 26 at r5 (raw file):
These don't need to be capitalized and usually aren't -- the generated Go code will cap them for us. pkg/sql/app_stats.proto, line 36 at r5 (raw file):
we usually snake_case proto field names -- again, generated Go code will convert to more idiomatic CamelCase identifiers. pkg/sql/app_stats.proto, line 50 at r5 (raw file):
is that a tab? spaces in protos. pkg/sql/crdb_internal.go, line 264 at r5 (raw file):
nit: I'm slightly apprehensive of presenting the node-specific, unaggregated stats directly to users via SQL. Getting different answers depending on what node you ask (e.g. behind a load-balancer) seems like a ux-surprise. I understand that this is a step towards some form of aggregation that can come later, it just seems that surfacing this intermediate step in SQL, a medium where we usually are consistent (thanks to being on top of a common kv) has the potential to be surprising. my concerns would be mostly assuaged by something as simple as making it obvious what it is, e.g. Comments from Reviewable |
|
Review status: 2 of 10 files reviewed at latest revision, 8 unresolved discussions, all commit checks successful. pkg/sql/app_stats.proto, line 26 at r5 (raw file): Previously, dt (David Taylor) wrote…
I had forgotten. Thanks for the reminder. pkg/sql/app_stats.proto, line 36 at r5 (raw file): Previously, dt (David Taylor) wrote…
Done. pkg/sql/app_stats.proto, line 42 at r5 (raw file): Previously, petermattis (Peter Mattis) wrote…
Done. pkg/sql/app_stats.proto, line 50 at r5 (raw file): Previously, dt (David Taylor) wrote…
Done. pkg/sql/app_stats.proto, line 51 at r5 (raw file): Previously, petermattis (Peter Mattis) wrote…
"exec" senamed to "service latency", added comments too. pkg/sql/crdb_internal.go, line 264 at r5 (raw file): Previously, dt (David Taylor) wrote…
Of the other 4 tables in Comments from Reviewable |
|
Please suggest what remains to be done here in light of today's discussion :) |
|
documenting our discussion this morning:
|
…e_`. Requested by @dt: the name of the internal tables that only retrieve node-local information (i.e. not aggregated across the cluster) should be clearly marked (in their name) as such. This patch does this by adding a prefix `node_` to these tables. Also for good measure a column `node_id` is added to the tables so that a client inspecting the table can detect when the source of the information changes (e.g. via a load balancer, reconnect etc).
|
I have renamed the other crdb_internal tables as suggested. Note: value removal will occur in a subsequent PR. For now the bucketing includes the values. This is why the feature is currently gated. |
This patch extends/replaces the previously introduced mechanism that tracks per-application statistics: statistics are now bucketed by statement/query instead, but each application has its own set of buckets (so we collect different stats per application). Only the statement types that do not implement the (new) interface `parser.HiddenFromStats` are collected. For example SET and SHOW are excluded. The collected data is exposed in the new virtual table `crdb_internal.statement_statistics`, which replaces the previous `app_stats` virtual table. These statistics are only enabled if the environment variable COCKROACH_SQL_STMT_STATS_ENABLE is set and true. This feature get is meant to be temporary, so as to not risk the stability of default setups. Also, the data is cleared from memory periodically, at the interval set by COCKROACH_SQL_STMT_STATS_RESET_INTERVAL, by default set to 1 hour. When this happens, the data collected so far is dump to the INFO log. Later this can be replaced by a dump to some system table in the database.
This patch starts tracking the following details per statement: - total number of executions - total number of first attempts (to separate implicit retries) - maximum observed number of retries - last error encountered - average and variance for: - total number of rows - parse latency - logical plan latency - execution latency - overhead latency (total latency - parse - logical - execution) The statements are also separately bucketed depending on: - whether the execution failed with an error - whether the query was distributed. (This results in potentially 4 buckets per statement.)
|
👍 |
|
Thanks! |
This PR follows up on #14092 towards #13968.
sql: track per-statement statistics.
This patch extends/replaces the previously introduced mechanism that
tracks per-application statistics: statistics are now bucketed by
statement/query instead, but each application has its own set of
buckets (so we collect different stats per application).
Only the statement types that do not implement the (new) interface
parser.HiddenFromStatsare collected. For example SET and SHOW areexcluded.
The collected data is exposed in the new virtual table
crdb_internal.stmt_stats, which replaces the previousapp_statsvirtual table.
These statistics are only enabled if the environment variable
COCKROACH_SQL_STMT_STATS_ENABLE is set and true.
Also, the data is cleared from memory periodically, at the interval
set by COCKROACH_SQL_STMT_STATS_RESET_INTERVAL, by default set to 1
day.
sql: track per-statement execution details.
This patch starts tracking the following details per statement:
The statements are also separately bucketed depending on:
(This results in potentially 4 buckets per statement.)
cc @petermattis
This change is