server: flush SQL stats during drain by cameronnunez · Pull Request #76397 · cockroachdb/cockroach

cameronnunez · 2022-02-10T21:14:57Z

Fixes #72045.
Fixes #74413.

Previously, SQL stats would be lost when a node drains. Now a drain
triggers a flush of the SQL stats into the statement statistics
system table while the SQL layer is being drained.

Release note (cli change): draining a node now ensures that
SQL statistics are not lost during the process; they are now
preserved in the statement statistics system table.

Azhng

Thank you for working on this! I have a quick question. This PR seems like a very explicit fix for #74413. So is the plan to have a more general solution for #72045 eventually?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)

pkg/server/drain_test.go, line 143 at r1 (raw file):

	// Check that in-memory data was flushed into system tables during the drain.
	// Verify that the statement stats are in the reported stats pool.
	stats, err = ts.GetScrubbedReportingStats(ctx)

The reporting stats pool is an in-memory pool, so GetScrubbedReportingStats() doesn't actually read from the system table at all. My preference here would be to be a bit more explicit and ensure that the stats are flushed into the system table by reading them back. This is because the flush is done only on a best-effort basis: if the flush fails, we simply log a warning and it won't be retried. So it is possible for the stats to end up in the reporting pool but not in the system table.
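
To make the concern concrete, here is a minimal, self-contained sketch of the failure mode. It is not the CockroachDB implementation: `statsStore`, its fields, and `flush` are hypothetical stand-ins, and the ordering (reporting pool first, then a best-effort table write) is an assumption based only on the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// statsStore is a hypothetical, simplified model of a node's SQL stats state.
type statsStore struct {
	inMemory      []string // stats collected since the last flush
	reportingPool []string // in-memory pool used for reporting
	systemTable   []string // stand-in for rows persisted in the system table
	failWrite     bool     // when true, simulate a failing table write
}

// flush copies the pending stats into the reporting pool and then attempts a
// single, best-effort write to the system table. A failed write is returned
// so the caller can log a warning; it is never retried.
func (s *statsStore) flush() error {
	pending := s.inMemory
	s.inMemory = nil
	s.reportingPool = append(s.reportingPool, pending...)
	if s.failWrite {
		return errors.New("system table write failed")
	}
	s.systemTable = append(s.systemTable, pending...)
	return nil
}

func main() {
	s := &statsStore{inMemory: []string{"SELECT _"}, failWrite: true}
	if err := s.flush(); err != nil {
		fmt.Println("warning: flush failed:", err)
	}
	// A check against the reporting pool passes even though nothing persisted.
	fmt.Println("reporting pool entries:", len(s.reportingPool))
	fmt.Println("system table rows:", len(s.systemTable))
}
```

Under this model, a test that only inspects the reporting pool would pass after a failed flush, which is why reading the rows back from the table is the stronger assertion.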

cameronnunez · 2022-02-15T18:08:33Z

@Azhng thanks for the review! I was actually wondering: are there other use cases for which a more general solution is necessary?

cameronnunez · 2022-02-15T21:07:39Z

RFAL

Azhng

Good question. As of now, within SQL Observability, SQL Stats is the only subsystem that requires this hook. Though as we build up our observability stack, we will have more similar subsystems in the future that require this type of hook.
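
For illustration only, a general hook could look roughly like the sketch below. Nothing here exists in the codebase: `drainer`, `drainHook`, `register`, and `errFlushFailed` are hypothetical names, and the best-effort error handling is an assumption carried over from how the SQL stats flush is described in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// errFlushFailed is a hypothetical error used to simulate a failing hook.
var errFlushFailed = errors.New("flush failed")

// drainHook is a hypothetical callback that a subsystem registers so that it
// runs while the node is draining.
type drainHook struct {
	name string
	fn   func() error
}

// drainer keeps the registered hooks in registration order.
type drainer struct {
	hooks []drainHook
}

// register adds a named hook to the drainer.
func (d *drainer) register(name string, fn func() error) {
	d.hooks = append(d.hooks, drainHook{name: name, fn: fn})
}

// drain runs every hook once. Failures are collected as warnings rather than
// aborting, so one failing subsystem does not block the others or the drain.
func (d *drainer) drain() []string {
	var warnings []string
	for _, h := range d.hooks {
		if err := h.fn(); err != nil {
			warnings = append(warnings, fmt.Sprintf("%s: %v", h.name, err))
		}
	}
	return warnings
}

func main() {
	d := &drainer{}
	d.register("sql-stats", func() error {
		fmt.Println("flushing SQL stats")
		return nil
	})
	d.register("future-subsystem", func() error { return errFlushFailed })
	for _, w := range d.drain() {
		fmt.Println("warning:", w)
	}
}
```

The design choice in this sketch is that each subsystem owns its own flush logic and the drain path only iterates over callbacks, so adding a new subsystem would not require touching the drain code.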

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)

pkg/server/drain_test.go, line 140 at r2 (raw file):

	// Issue a drain.
	drainCtx.sendDrainNoShutdown()

nit: I would do a sanity check before we drain the node to ensure that we have nothing in the system table.

pkg/server/drain_test.go, line 149 at r2 (raw file):

	require.NotEqualf(t,
		func() int {
			q := sqlDB.Query(t, `SELECT count(*) FROM system.statement_statistics WHERE node_id = 1`)

nit: sqlDB.CheckQueryResults can come in pretty handy here :)

cameronnunez

Gotcha yeah I think it makes sense to eventually get to the general solution then.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @knz)

pkg/server/drain_test.go, line 143 at r1 (raw file):

Previously, Azhng (Archer Zhang) wrote…

The reporting stats pool is an in-memory pool, so GetScrubbedReportingStats() doesn't actually read from the system table at all. My preference here would be to be a bit more explicit and ensure that the stats are flushed into the system table by reading them back. This is because the flush is done only on a best-effort basis: if the flush fails, we simply log a warning and it won't be retried. So it is possible for the stats to end up in the reporting pool but not in the system table.

Done.

pkg/server/drain_test.go, line 140 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

nit: I would do a sanity check before we drain the node to ensure that we have nothing in the system table.

Done.

pkg/server/drain_test.go, line 149 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

nit: sqlDB.CheckQueryResults can come in pretty handy here :)

Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?

Azhng

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @knz)

pkg/server/drain_test.go, line 149 at r2 (raw file):

Previously, cameronnunez (Cameron Nunez) wrote…

Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?

Hmm, how about SELECT count(*) > 0 FROM ... and then just assert that the output is true in CheckQueryResults?
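
A small, self-contained sketch of why the boolean form is more future-proof than an exact count. The helpers `exactCount`, `nonEmpty`, and `matches` are hypothetical; they only simulate query results as string rows and the kind of comparison a CheckQueryResults-style helper performs. In the real test the call would presumably compare the query output against a single row containing `true`, but that is an assumption about the helper's usage, not something shown in this PR.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// exactCount simulates the result of `SELECT count(*) FROM ...` over n rows.
func exactCount(n int) [][]string {
	return [][]string{{strconv.Itoa(n)}}
}

// nonEmpty simulates the result of `SELECT count(*) > 0 FROM ...` over n rows.
func nonEmpty(n int) [][]string {
	return [][]string{{strconv.FormatBool(n > 0)}}
}

// matches is a hypothetical stand-in for the row-by-row comparison that a
// CheckQueryResults-style helper performs against the expected output.
func matches(got, expected [][]string) bool {
	return reflect.DeepEqual(got, expected)
}

func main() {
	for _, rows := range []int{0, 1, 5} {
		fmt.Printf("rows=%d exact==1: %t, count>0 is true: %t\n",
			rows,
			matches(exactCount(rows), [][]string{{"1"}}),
			matches(nonEmpty(rows), [][]string{{"true"}}))
	}
}
```

The exact-count expectation only holds for one specific row count, while the boolean expectation holds for any non-empty table, so recording more stats in the future would not break the assertion.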

knz

simple, elegant, wow, amaze 🐶

Reviewed 2 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cameronnunez)

cameronnunez · 2022-02-17T17:14:45Z

TFYRs!

bors r=knz,Azhng

craig · 2022-02-17T19:31:44Z

Build succeeded:

GitHub CI (Cockroach)

cameronnunez requested a review from a team as a code owner February 10, 2022 21:14

cameronnunez marked this pull request as draft February 10, 2022 21:15

cameronnunez force-pushed the flush-sql-stats-drain branch 4 times, most recently from 69058a5 to e3b2a88 Compare February 10, 2022 21:34

cameronnunez changed the title ~~server: flush SQL stats during drain~~ [WIP] server: flush SQL stats during drain Feb 10, 2022

cameronnunez force-pushed the flush-sql-stats-drain branch 2 times, most recently from 2482acf to 9bcb617 Compare February 14, 2022 20:10

cameronnunez marked this pull request as ready for review February 14, 2022 20:13

cameronnunez changed the title ~~[WIP] server: flush SQL stats during drain~~ server: flush SQL stats during drain Feb 14, 2022

cameronnunez force-pushed the flush-sql-stats-drain branch 2 times, most recently from 5cd3e7c to dd808f9 Compare February 15, 2022 14:19

cameronnunez requested review from Azhng and knz February 15, 2022 14:22

Azhng reviewed Feb 15, 2022

cameronnunez force-pushed the flush-sql-stats-drain branch 6 times, most recently from 7d0b350 to 9e0dc93 Compare February 15, 2022 21:06

cameronnunez requested a review from Azhng February 15, 2022 21:07

Azhng reviewed Feb 16, 2022

cameronnunez force-pushed the flush-sql-stats-drain branch from 9e0dc93 to e5e1609 Compare February 16, 2022 21:24

cameronnunez commented Feb 16, 2022

cameronnunez force-pushed the flush-sql-stats-drain branch from e5e1609 to b4036e5 Compare February 16, 2022 21:31

cameronnunez requested a review from Azhng February 16, 2022 21:32

Azhng approved these changes Feb 16, 2022

cameronnunez force-pushed the flush-sql-stats-drain branch from b4036e5 to 56ff1ac Compare February 16, 2022 22:07

knz approved these changes Feb 17, 2022

craig bot merged commit f2a722f into cockroachdb:master Feb 17, 2022

cockroach-teamcity mentioned this pull request Feb 17, 2022

server: flush SQL stats during drain cockroachdb/docs#13057

Closed
