sql: add storage iteration stats to fingerprint stats tables#96016
Conversation
|
Re: #71351, here are some Comparing this with the nightly As a result, I think we might want to move forward with the first approach laid out in 72048 (or something similar):
|
36f8b73 to
dd8a6b4
Compare
maryliag
left a comment
There was a problem hiding this comment.
Although the level of detail is great, it would probably be too overwhelming for users and hard to understand. The end goal is to show % of live data, so you don't need to be too granular to what the MVCC is. Create just one value and add all the ones from your list
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo and @yuzefovich)
|
Re: benchmarking of I see that this PR is marked as draft, so I haven't reviewed it. Please ping me when it is RFAL. I also agree with Marylia that this level of detail might be too much (I'd guess even most CRDB engineers wouldn't be able to meaningfully utilize the newly-added fields). Perhaps it's worth collaborating with someone from the storage team to come up with a handful of useful statistics there. I think recently a large comment about these details was merged somewhere in the codebase. |
dd8a6b4 to
9bcc5e4
Compare
4e028a9 to
7cd0af5
Compare
ericharmeling
left a comment
There was a problem hiding this comment.
Thanks for the feedback!
I briefly chatted with @sumeerbhola, and then with @kevin-v-ngo, and @maryliag, and it looks like the best path forward is to expose all of the storage iteration stats in the internal execution statistics tables where advanced users/L2s can find them, and then provide documentation on how to interpret the stats. In a future PR, we can pare down the stats or derive some other statistic in the status server, for exposure to the Console.
I've updated this PR to add the stats to the crdb_internal.node_{transaction,statement}_statistics tables, and then surface them in the crdb_internal.{transaction,statement}_statistics and system.{transaction,statement}_statistics tables under statistics->'execution_statistics'->'mvcc_iterator_stats'. I've also added some high-level comments that point to internal storage docs.
The PR should now resolve #77580, but not #72048. After this PR is merged, we'll need to properly document the meaning of these stats in the most common cases in our user-facing docs.
I believe the main impact of #64503 was in the collection phase, i.e. where we call recordIteratorStats in storage package where we allocate roachpb.ScanStats object on every call (and we can have thousands or more calls per query).
@yuzefovich Thank you for clarifying that for me 😁 . With that context, the benchmark results make more sense. Given that the stats are already collected, this PR should help surface them more easily, at little cost. I think these changes are RFAL at this point.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @sumeerbhola, and @yuzefovich)
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 24 of 24 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @kevin-v-ngo, @sumeerbhola, and @yuzefovich)
pkg/sql/appstatspb/app_stats.proto line 342 at r2 (raw file):
// Internal storage iteration statistics. optional MvccIteratorStats mvcc_iterator_stats = 8 [(gogoproto.nullable) = false];
nit: MVCCIteratorStats, you can also add that as the custom name for this parameter
pkg/sql/appstatspb/app_stats.proto line 367 at r2 (raw file):
// Internal storage iteration statistics. message MvccIteratorStats {
nit: MVCCIteratorStats
pkg/sql/appstatspb/app_stats.proto line 369 at r2 (raw file):
message MvccIteratorStats { // StepCount collects the number of times the iterator moved forward or backward over the
these comments are great!
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):
}, "mvcc_iterator_stats": { "stepCount": {
nit: spacing seem off here, a lot of space compared to the line above
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 325 at r2 (raw file):
}, "mvcc_iterator_stats": { "stepCount": {
nit: same as above
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 495 at r2 (raw file):
}, "mvcc_iterator_stats": { "stepCount": {
nit: same as above
How would L2/on-call interpret the stats today and do we already have internal documentation/runbooks? Last thing we want is to have the stats unusable by advanced users so just want to confirm we're not missing any stats. And do we have a general idea of how we'd expose to the console yet? Happy to chat offline |
7cd0af5 to
e29e24f
Compare
ericharmeling
left a comment
There was a problem hiding this comment.
How would L2/on-call interpret the stats today and do we already have internal documentation/runbooks?
There are internal docs (code comments, really) in the storage package and in the pebble codebase. The comments I put on the protos for these stats point to those files. I do agree that we'll want to make some kind of runbook/internal wiki that will make this info a lot more accessible. We can base the user-facing docs on those.
Last thing we want is to have the stats unusable by advanced users so just want to confirm we're not missing any stats.
We are surfacing all stats that are being collected at the storage level, so I don't think we're missing any stats, but the docs are crucial to making these stats usable.
And do we have a general idea of how we'd expose to the console yet? Happy to chat offline
Off the top of my head, I think we can get the range stats for the database/table the stmt/txn is querying, and then report the % live data in that range, using the same query from the database details endpoint. We could compare this with the key + value bytes read by the scan, or just report it separate from the iterator stats. We should chat about this offline though and get some design input too.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @maryliag, @sumeerbhola, and @yuzefovich)
pkg/sql/appstatspb/app_stats.proto line 342 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: MVCCIteratorStats, you can also add that as the custom name for this parameter
Done.
pkg/sql/appstatspb/app_stats.proto line 367 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: MVCCIteratorStats
Done.
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: spacing seem off here, a lot of space compared to the line above
It looks like this comment is on an older revision. I believe this is fixed now.
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 325 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: same as above
Same as above
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 495 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: same as above
Same as above
maryliag
left a comment
There was a problem hiding this comment.
just a few nits from me
Reviewed 15 of 15 files at r3, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @kevin-v-ngo, @sumeerbhola, and @yuzefovich)
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 204 at r3 (raw file):
// "maxDiskUsage": { "$ref": "#/definitions/numeric_stats" }, // "cpuSQLNanos": { "$ref": "#/definitions/numeric_stats" }, // "mvcc_iterator_stats": { "$ref": "#/definitions/MVCCIteratorStats" }
nit: is this suppose to be camelCase?
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 215 at r3 (raw file):
// "maxDiskUsage", // "cpuSQLNanos", // "mvcc_iterator_stats"
nit: is this suppose to be camelCase?
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 371 at r3 (raw file):
// "maxDiskUsage": { "$ref": "#/definitions/numeric_stats" }, // "cpuSQLNanos": { "$ref": "#/definitions/numeric_stats" }, // "mvcc_iterator_stats": { "$ref": "#/definitions/MVCCIteratorStats" }
nit: same
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 381 at r3 (raw file):
// "maxDiskUsage", // "cpuSQLNanos", // "mvcc_iterator_stats"
nit: same
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_impl.go line 367 at r3 (raw file):
{"maxDiskUsage", (*numericStats)(&e.MaxDiskUsage)}, {"cpuSQLNanos", (*numericStats)(&e.CPUSQLNanos)}, {"mvcc_iterator_stats", (*iteratorStats)(&e.MVCCIteratorStats)},
nit: is this suppose to be camelCase?
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):
Previously, ericharmeling (Eric Harmeling) wrote…
It looks like this comment is on an older revision. I believe this is fixed now.
hm, I still see it (both on github and reviewable). Compare the stepCount and the mean from the one above, they're suppose to be on the same spacing
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 325 at r2 (raw file):
Previously, ericharmeling (Eric Harmeling) wrote…
Same as above
same as above +1 😅
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 495 at r2 (raw file):
Previously, ericharmeling (Eric Harmeling) wrote…
Same as above
+1
e29e24f to
89bc72e
Compare
Related to cockroachdb#72048. Fixes cockroachdb#77580. This commit adds storage iteration stats from tracing to internal `{statement,transaction}_statistics` tables. Release note (sql change): The internal `statement_statistics` and `transaction_statistics` tables now include sampled execution statistics on storage iteration.
89bc72e to
68b725a
Compare
ericharmeling
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @maryliag, @sumeerbhola, and @yuzefovich)
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 204 at r3 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: is this suppose to be camelCase?
Done.
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 215 at r3 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: is this suppose to be camelCase?
Done.
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_impl.go line 367 at r3 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: is this suppose to be camelCase?
Done.
pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
hm, I still see it (both on github and reviewable). Compare the
stepCountand themeanfrom the one above, they're suppose to be on the same spacing
You're right! It should be fixed now.
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @sumeerbhola, and @yuzefovich)
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ericharmeling, @kevin-v-ngo, @maryliag, and @yuzefovich)
pkg/sql/appstatspb/app_stats.proto line 370 at r4 (raw file):
} // Internal storage iteration statistics.
I made a quick pass here and it looks good
|
TFTRs! Tests are passing and I'll take the feedback as two approvals. I'm going to go ahead and bors. |
|
bors r+ |
|
Build succeeded: |


Related to #72048.
Fixes #77580.
This PR adds storage iteration stats from tracing to internal
{statement,transaction}_statisticstables.Release note (sql change): The internal
statement_statisticsandtransaction_statisticstables now include sampled execution statisticson storage iteration.