Skip to content

sql: add storage iteration stats to fingerprint stats tables#96016

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
ericharmeling:iterator-stats-in-statement-statistics
Feb 13, 2023
Merged

sql: add storage iteration stats to fingerprint stats tables#96016
craig[bot] merged 1 commit intocockroachdb:masterfrom
ericharmeling:iterator-stats-in-statement-statistics

Conversation

@ericharmeling
Copy link
Copy Markdown

@ericharmeling ericharmeling commented Jan 26, 2023

Related to #72048.
Fixes #77580.

This PR adds storage iteration stats from tracing to internal
{statement,transaction}_statistics tables.

Release note (sql change): The internal statement_statistics and
transaction_statistics tables now include sampled execution statistics
on storage iteration.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@ericharmeling
Copy link
Copy Markdown
Author

Re: #71351, here are some tpcch_concurrency benchmark results on 80de3db (with the default sample rate 0.01):

79, 82, 99, 94, 78

Comparing this with the nightly tpcch_concurrency on master roachperf that's been hovering around 80 [60,100] since 10/2022, I don't see a lot of evidence to suggest that adding additional fields to exec stats is having much of an added impact on memory usage beyond the regression already reported in #71351 due to exec stats collection.

As a result, I think we might want to move forward with the first approach laid out in 72048 (or something similar):

  1. Add all iterator stats sampled from tracing to exec stats (i.e., resolve sql: surface pebble internal iterator stats for scans #77580).
  2. Add status endpoint functions to process relevant iterator stats (i.e., PointCount/PointsCoveredByRangeTombstones) for exposure to the statement_statistics and transaction_statistics tables.
  3. Add new column(s) to SQL Activity page for relevant storage stats.

@ericharmeling ericharmeling force-pushed the iterator-stats-in-statement-statistics branch 3 times, most recently from 36f8b73 to dd8a6b4 Compare January 30, 2023 14:02
@ericharmeling ericharmeling marked this pull request as ready for review January 30, 2023 18:02
@ericharmeling ericharmeling requested review from a team January 30, 2023 18:02
@ericharmeling ericharmeling requested a review from a team as a code owner January 30, 2023 18:02
@ericharmeling ericharmeling requested a review from a team January 30, 2023 18:02
@ericharmeling ericharmeling requested a review from a team as a code owner January 30, 2023 18:02
@ericharmeling ericharmeling requested review from kevin-v-ngo and removed request for a team January 30, 2023 18:03
Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the level of detail is great, it would probably be too overwhelming for users and hard to understand. The end goal is to show % of live data, so you don't need to be too granular to what the MVCC is. Create just one value and add all the ones from your list

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo and @yuzefovich)

@ericharmeling ericharmeling marked this pull request as draft February 1, 2023 18:08
@yuzefovich
Copy link
Copy Markdown
Member

Re: benchmarking of tpch_concurrency: the fact that this PR doesn't change the numbers is expected - adding already-collected information into new fields in the protos is cheap. I believe the main impact of #64503 was in the collection phase, i.e. where we call recordIteratorStats in storage package where we allocate roachpb.ScanStats object on every call (and we can have thousands or more calls per query).

I see that this PR is marked as draft, so I haven't reviewed it. Please ping me when it is RFAL.

I also agree with Marylia that this level of detail might be too much (I'd guess even most CRDB engineers wouldn't be able to meaningfully utilize the newly-added fields). Perhaps it's worth collaborating with someone from the storage team to come up with a handful of useful statistics there. I think recently a large comment about these details was merged somewhere in the codebase.

@ericharmeling ericharmeling removed request for a team February 7, 2023 15:28
@ericharmeling ericharmeling force-pushed the iterator-stats-in-statement-statistics branch from dd8a6b4 to 9bcc5e4 Compare February 7, 2023 19:12
@ericharmeling ericharmeling changed the title sql: add pebble iterator stats to ExecStats, surface in fingerprint stats tables sql: surface storage iteration statistics in fingerprint statistics tables Feb 7, 2023
@ericharmeling ericharmeling force-pushed the iterator-stats-in-statement-statistics branch 3 times, most recently from 4e028a9 to 7cd0af5 Compare February 8, 2023 17:41
@ericharmeling ericharmeling changed the title sql: surface storage iteration statistics in fingerprint statistics tables sql: add storage iteration stats to fingerprint stats tables Feb 8, 2023
@ericharmeling ericharmeling marked this pull request as ready for review February 8, 2023 18:39
Copy link
Copy Markdown
Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback!

I briefly chatted with @sumeerbhola, and then with @kevin-v-ngo, and @maryliag, and it looks like the best path forward is to expose all of the storage iteration stats in the internal execution statistics tables where advanced users/L2s can find them, and then provide documentation on how to interpret the stats. In a future PR, we can pare down the stats or derive some other statistic in the status server, for exposure to the Console.

I've updated this PR to add the stats to the crdb_internal.node_{transaction,statement}_statistics tables, and then surface them in the crdb_internal.{transaction,statement}_statistics and system.{transaction,statement}_statistics tables under statistics->'execution_statistics'->'mvcc_iterator_stats'. I've also added some high-level comments that point to internal storage docs.

The PR should now resolve #77580, but not #72048. After this PR is merged, we'll need to properly document the meaning of these stats in the most common cases in our user-facing docs.

I believe the main impact of #64503 was in the collection phase, i.e. where we call recordIteratorStats in storage package where we allocate roachpb.ScanStats object on every call (and we can have thousands or more calls per query).

@yuzefovich Thank you for clarifying that for me 😁 . With that context, the benchmark results make more sense. Given that the stats are already collected, this PR should help surface them more easily, at little cost. I think these changes are RFAL at this point.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @sumeerbhola, and @yuzefovich)

Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 24 of 24 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @kevin-v-ngo, @sumeerbhola, and @yuzefovich)


pkg/sql/appstatspb/app_stats.proto line 342 at r2 (raw file):

  // Internal storage iteration statistics.
  optional MvccIteratorStats mvcc_iterator_stats = 8 [(gogoproto.nullable) = false];

nit: MVCCIteratorStats, you can also add that as the custom name for this parameter


pkg/sql/appstatspb/app_stats.proto line 367 at r2 (raw file):

// Internal storage iteration statistics.
message MvccIteratorStats {

nit: MVCCIteratorStats


pkg/sql/appstatspb/app_stats.proto line 369 at r2 (raw file):

message MvccIteratorStats {

  // StepCount collects the number of times the iterator moved forward or backward over the

these comments are great!


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):

         },
         "mvcc_iterator_stats": {
					 "stepCount": {

nit: spacing seem off here, a lot of space compared to the line above


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 325 at r2 (raw file):

         },
         "mvcc_iterator_stats": {
					 "stepCount": {

nit: same as above


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 495 at r2 (raw file):

    },
    "mvcc_iterator_stats": {
			"stepCount": {

nit: same as above

@kevin-v-ngo
Copy link
Copy Markdown

then provide documentation on how to interpret the stats. In a future PR, exposure to the Console.

How would L2/on-call interpret the stats today and do we already have internal documentation/runbooks? Last thing we want is to have the stats unusable by advanced users so just want to confirm we're not missing any stats.

And do we have a general idea of how we'd expose to the console yet? Happy to chat offline

@ericharmeling ericharmeling force-pushed the iterator-stats-in-statement-statistics branch from 7cd0af5 to e29e24f Compare February 9, 2023 16:45
Copy link
Copy Markdown
Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would L2/on-call interpret the stats today and do we already have internal documentation/runbooks?

There are internal docs (code comments, really) in the storage package and in the pebble codebase. The comments I put on the protos for these stats point to those files. I do agree that we'll want to make some kind of runbook/internal wiki that will make this info a lot more accessible. We can base the user-facing docs on those.

Last thing we want is to have the stats unusable by advanced users so just want to confirm we're not missing any stats.

We are surfacing all stats that are being collected at the storage level, so I don't think we're missing any stats, but the docs are crucial to making these stats usable.

And do we have a general idea of how we'd expose to the console yet? Happy to chat offline

Off the top of my head, I think we can get the range stats for the database/table the stmt/txn is querying, and then report the % live data in that range, using the same query from the database details endpoint. We could compare this with the key + value bytes read by the scan, or just report it separate from the iterator stats. We should chat about this offline though and get some design input too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @maryliag, @sumeerbhola, and @yuzefovich)


pkg/sql/appstatspb/app_stats.proto line 342 at r2 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: MVCCIteratorStats, you can also add that as the custom name for this parameter

Done.


pkg/sql/appstatspb/app_stats.proto line 367 at r2 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: MVCCIteratorStats

Done.


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: spacing seem off here, a lot of space compared to the line above

It looks like this comment is on an older revision. I believe this is fixed now.


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 325 at r2 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: same as above

Same as above


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 495 at r2 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: same as above

Same as above

Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few nits from me

Reviewed 15 of 15 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @kevin-v-ngo, @sumeerbhola, and @yuzefovich)


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 204 at r3 (raw file):

//	        "maxDiskUsage":    { "$ref": "#/definitions/numeric_stats" },
//	        "cpuSQLNanos":     { "$ref": "#/definitions/numeric_stats" },
//	        "mvcc_iterator_stats": { "$ref": "#/definitions/MVCCIteratorStats" }

nit: is this suppose to be camelCase?


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 215 at r3 (raw file):

//	        "maxDiskUsage",
//	        "cpuSQLNanos",
//	        "mvcc_iterator_stats"

nit: is this suppose to be camelCase?


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 371 at r3 (raw file):

//	        "maxDiskUsage":    { "$ref": "#/definitions/numeric_stats" },
//	        "cpuSQLNanos":     { "$ref": "#/definitions/numeric_stats" },
//	        "mvcc_iterator_stats": { "$ref": "#/definitions/MVCCIteratorStats" }

nit: same


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 381 at r3 (raw file):

//	        "maxDiskUsage",
//	        "cpuSQLNanos",
//	        "mvcc_iterator_stats"

nit: same


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_impl.go line 367 at r3 (raw file):

		{"maxDiskUsage", (*numericStats)(&e.MaxDiskUsage)},
		{"cpuSQLNanos", (*numericStats)(&e.CPUSQLNanos)},
		{"mvcc_iterator_stats", (*iteratorStats)(&e.MVCCIteratorStats)},

nit: is this suppose to be camelCase?


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

It looks like this comment is on an older revision. I believe this is fixed now.

hm, I still see it (both on github and reviewable). Compare the stepCount and the mean from the one above, they're suppose to be on the same spacing
Screen Shot 2023-02-09 at 11.52.20 AM.png
Screen Shot 2023-02-09 at 11.52.28 AM.png


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 325 at r2 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Same as above

same as above +1 😅


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 495 at r2 (raw file):

Previously, ericharmeling (Eric Harmeling) wrote…

Same as above

+1

@ericharmeling ericharmeling force-pushed the iterator-stats-in-statement-statistics branch from e29e24f to 89bc72e Compare February 9, 2023 18:09
Related to cockroachdb#72048.
Fixes cockroachdb#77580.

This commit adds storage iteration stats from tracing to internal
`{statement,transaction}_statistics` tables.

Release note (sql change): The internal `statement_statistics` and
`transaction_statistics` tables now include sampled execution statistics
on storage iteration.
@ericharmeling ericharmeling force-pushed the iterator-stats-in-statement-statistics branch from 89bc72e to 68b725a Compare February 9, 2023 18:28
Copy link
Copy Markdown
Author

@ericharmeling ericharmeling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @maryliag, @sumeerbhola, and @yuzefovich)


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 204 at r3 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: is this suppose to be camelCase?

Done.


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding.go line 215 at r3 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: is this suppose to be camelCase?

Done.


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_impl.go line 367 at r3 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

nit: is this suppose to be camelCase?

Done.


pkg/sql/sqlstats/persistedsqlstats/sqlstatsutil/json_encoding_test.go line 141 at r2 (raw file):

Previously, maryliag (Marylia Gutierrez) wrote…

hm, I still see it (both on github and reviewable). Compare the stepCount and the mean from the one above, they're suppose to be on the same spacing
Screen Shot 2023-02-09 at 11.52.20 AM.png
Screen Shot 2023-02-09 at 11.52.28 AM.png

You're right! It should be fixed now.

Copy link
Copy Markdown
Contributor

@maryliag maryliag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kevin-v-ngo, @sumeerbhola, and @yuzefovich)

Copy link
Copy Markdown
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ericharmeling, @kevin-v-ngo, @maryliag, and @yuzefovich)


pkg/sql/appstatspb/app_stats.proto line 370 at r4 (raw file):

}

// Internal storage iteration statistics.

I made a quick pass here and it looks good

@ericharmeling
Copy link
Copy Markdown
Author

TFTRs! Tests are passing and I'll take the feedback as two approvals. I'm going to go ahead and bors.

@ericharmeling
Copy link
Copy Markdown
Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 13, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: surface pebble internal iterator stats for scans

6 participants