server: flush SQL stats during drain#76397
Conversation
69058a5 to
e3b2a88
Compare
2482acf to
9bcb617
Compare
5cd3e7c to
dd808f9
Compare
Azhng
left a comment
There was a problem hiding this comment.
Thank you for working on this! I have a quick question. This PR seems like a very explicit fix for #74413. So is the plan to have a more general solution for #72045 eventually?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)
pkg/server/drain_test.go, line 143 at r1 (raw file):
// Check that in-memory data was flushed into system tables during the drain. // Verify that the statement stats are in the reported stats pool. stats, err = ts.GetScrubbedReportingStats(ctx)
the reporting stats pool is an in-memory pool, so this means GetScrubbedReportingStats() doesn't actually read from the system table at all. My preference here would be a bit explicit and to ensure that the stats is flushed into system by reading them back. This is because the flush is only done in the best-effort basis and if it the flush fails, we simply just log a warning and it won't be retried. So this means it is possible for the stats to end up in the reporting pool but not in the system table.
|
@Azhng thanks for the review! I was actually wondering are there other use cases for which a more general solution is necessary? |
7d0b350 to
9e0dc93
Compare
|
RFAL |
Azhng
left a comment
There was a problem hiding this comment.
Good question. As of now, within SQL Observability, SQL Stats is the only subsystem that require this hook. Though as we build up our observability stack, we will have more similar subsystems in the future that requires this type of hook.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)
pkg/server/drain_test.go, line 140 at r2 (raw file):
// Issue a drain. drainCtx.sendDrainNoShutdown()
nit: i would do a sanity check before we drain the node to ensure that we have nothing in the system table.
pkg/server/drain_test.go, line 149 at r2 (raw file):
require.NotEqualf(t, func() int { q := sqlDB.Query(t, `SELECT count(*) FROM system.statement_statistics WHERE node_id = 1`)
nit: sqlDB.CheckQueryResults can come in pretty handy here :)
9e0dc93 to
e5e1609
Compare
cameronnunez
left a comment
There was a problem hiding this comment.
Gotcha yeah I think it makes sense to eventually get to the general solution then.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @knz)
pkg/server/drain_test.go, line 143 at r1 (raw file):
Previously, Azhng (Archer Zhang) wrote…
the reporting stats pool is an in-memory pool, so this means
GetScrubbedReportingStats()doesn't actually read from the system table at all. My preference here would be a bit explicit and to ensure that the stats is flushed into system by reading them back. This is because the flush is only done in the best-effort basis and if it the flush fails, we simply just log a warning and it won't be retried. So this means it is possible for the stats to end up in the reporting pool but not in the system table.
Done.
pkg/server/drain_test.go, line 140 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
nit: i would do a sanity check before we drain the node to ensure that we have nothing in the system table.
Done.
pkg/server/drain_test.go, line 149 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
nit:
sqlDB.CheckQueryResultscan come in pretty handy here :)
Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?
e5e1609 to
b4036e5
Compare
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @knz)
pkg/server/drain_test.go, line 149 at r2 (raw file):
Previously, cameronnunez (Cameron Nunez) wrote…
Would've gone that route but figured it would be best just to check for the count being non-zero so as to future-proof this test; if we record more stats in the future, this test would fail because the count would increase. What're your thoughts on this?
Hmm how about SELECT count(*) > 0 FROM ... and then just assert the output is true in the CheckQueryResults ?
Previously, SQL stats would be lost when a node drains. Now a drain triggers a flush of the SQL stats into the statement statistics system table while the SQL layer is being drained. Release note (cli change): a drain of node now ensures that SQL statistics are not lost during the process; they are now preserved in the statement statistics system table.
b4036e5 to
56ff1ac
Compare
knz
left a comment
There was a problem hiding this comment.
Reviewed 2 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @cameronnunez)
|
TFYRs! bors r=knz,Azhng |
|
Build succeeded: |
Fixes #72045.
Fixes #74413.
Previously, SQL stats would be lost when a node drains. Now a drain
triggers a flush of the SQL stats into the statement statistics
system table while the SQL layer is being drained.
Release note (cli change): a drain of node now ensures that
SQL statistics are not lost during the process; they are now
preserved in the statement statistics system table.