sql: implement SQL Stats flush logic#67866
Conversation
ae73e88 to
88a898f
Compare
|
CI Failure seems to be due to some unrelated changeccl tests. |
88a898f to
23a00c1
Compare
19859e2 to
30a8330
Compare
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)
pkg/sql/exec_util.go, line 843 at r2 (raw file):
Unit: metric.Unit_COUNT, } MetaSQLStatsFlushStarted = metric.Metadata{
note for later: let's consider adding this to a db console dashboard. cc @kevin-v-ngo
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 32 at r2 (raw file):
// during the flush will be logged as warning. func (s *PersistedSQLStats) Flush(ctx context.Context) { log.Infof(ctx, "flushing %d sql stats after %s", s.SQLStats.GetTotalFingerprintCount(), timeutil.Since(s.lastFlushStarted))
It would be nice to also include the number of bytes in the sql stats, since that's a condition for flushing as well.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 52 at r2 (raw file):
err := s.SQLStats.Reset(ctx) if err != nil {
minor nit: could collapse these 2 lines (if err := stats.reset(); err != nil {
log...
}
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):
if rowsAffected == 0 { return errors.New("failed to update transaction statistics")
Should this be an assertion error? What does this mean? I guess that no rows were selected, which should not be possible since we did the read and found a row in the same txn.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):
if rowsAffected == 0 { return errors.New("failed to update statement statistics")
ditto above
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 430 at r2 (raw file):
AND aggregated_ts = $3 AND node_id = $4 FOR UPDATE
Technically I think this is not necessary since you've already tried to insert into it before... but I guess it doesn't matter either way. Maybe add a comment as to why the FOR UPDATE.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):
if row == nil { return errors.New("transaction statistics not found")
include fingerprint id like you did below
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):
if len(row) != 1 { return errors.AssertionFailedf("unexpected number of returning columns")
include unexpected number (yes, it seems impossible :))
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):
if len(row) != 1 { return errors.AssertionFailedf("unexpected number of returning columns")
ditto above
pkg/sql/sqlstats/persistedsqlstats/provider.go, line 34 at r2 (raw file):
// Once we are able get consistent hash value from a query plan, we should // update this. const dummyPlanHash = int64(0)
cc @cucaroach - please talk with @Azhng about your ideas for that reconstructable serialized plan vs an integer hash to discuss tradeoffs / requirements. thanks!
Noticed in cockroachdb#67866. Release note: None
68156: roachtest: fix replicagc-changed-peers (again) r=erikgrinaker a=tbg I wrote buggy code. Since it isn't hit in every invocation, it slipped through my local testing. Fixes #68155 Fixes #68162 Release note: None 68157: CODEOWNERS: own catalog package to obs-inf-prs r=erikgrinaker a=tbg Noticed in #67866. Release note: None 68203: roachtest: avoid releasing alloc multiple times r=erikgrinaker a=tbg As of #68103, existing problems in the cluster creation retry logic are tickled reliably: on retry we are releasing an alloc twice, resulting in a panic. Unfortunately, the alloc is acquired several layers above the retry, so for now the best I can do is to avoid releasing the alloc if we are bound for a retry. The reason the above PR tickles it more reliably is because it accidentally removed a sensible failfast: when the cluster already exists, we shouldn't destroy & try to recreate it again (which would then explode on the double-release anyway). Release note: None 68224: roachtest: fix inconsistency r=erikgrinaker a=tbg The way that test was passing the args no longer works. Fixes #64806. Release note: None 68226: roachtest: allow local runs with roachstress r=erikgrinaker a=tbg It's appealing to use roachstress also as a tool to simply run a roachtest, since it avoids having to think much about the flags and incantation. This commit adds a prompt to roachstress about running in local mode. If selected, the binaries are built to target the local architecture, and the builder is not invoked. Release note: None 68227: roachtest: skip acceptance/gossip/restart-node-one r=erikgrinaker a=tbg Touches #68107. Release note: None Co-authored-by: Tobias Grieger <tobias.schottdorf@gmail.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
30a8330 to
5af11bd
Compare
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 32 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
It would be nice to also include the number of bytes in the sql stats, since that's a condition for flushing as well.
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Should this be an assertion error? What does this mean? I guess that no rows were selected, which should not be possible since we did the read and found a row in the same txn.
Yeah it should be impossible because the otherwise the INSERT statement before should succeed. Changed to errors.AssertionFailf
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
ditto above
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 430 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Technically I think this is not necessary since you've already tried to insert into it before... but I guess it doesn't matter either way. Maybe add a comment as to why the
FOR UPDATE.
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
include fingerprint id like you did below
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
include unexpected number (yes, it seems impossible :))
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
ditto above
Done.
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewed 27 of 27 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Yeah it should be impossible because the otherwise the
INSERTstatement before should succeed. Changed toerrors.AssertionFailf
nit: there's a mismatch between number of args and number of fmt strings
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Done.
nit: there's a mismatch between number of args and number of fmt strings
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Done.
ditto
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Done.
ditto
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Done.
ditto again
5af11bd to
e932f17
Compare
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
nit: there's a mismatch between number of args and number of fmt strings
🤦 , oof. Thought Goland would have caught this. Fixed
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
nit: there's a mismatch between number of args and number of fmt strings
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
ditto
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
ditto
Done.
pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
ditto again
Done.
e932f17 to
0fa5321
Compare
|
Rebased to resolve merge conflict. RFAL @jordanlewis |
jordanlewis
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)
pkg/sql/catalog/systemschema/system.go, line 380 at r8 (raw file):
aggregated_ts TIMESTAMPTZ NOT NULL, fingerprint_id BYTES NOT NULL, plan_hash INT NOT NULL,
Let's change all of the INT to INT8 to make sure we don't run into problems later if defaults change.
Previously, system.statement_statistics and system.transaction_statistics table includes a `count` column that corresponds to `roachpb.StatementStatistics.Count` and `roachpb.TransactionStatistics.Count` fields respectively. The objective for that column is to make `INSERT ON CONFLICT DO UPDATE` style query easy. However, since early prototyping have shown that `INSERT ON CONFLICT DO UPDATE` style statement is quite inefficient, the SQL Stats flush mechanism will be implemented using separate queries INSERT and UPDATE statements. This column is no longer userful and it would require special handling. Removing this column simplifies the flush logic and removes the need for special handlings. Release note (sql change): count column is removed from system.statement_statistics and system.transaction_statistics tables.
This commit implements the initial flush logic of the persisted sql stats subsystem. Release note: None
0fa5321 to
8383509
Compare
|
TFTR! bors r+ |
|
Build succeeded: |
…68660 #68661 67090: sql: periodically flush sqlstats r=Azhng a=Azhng Previous PR: #67866 This commit introduces a new persisted sql stats subsystem that wraps the existing node-local sql stats subsystem. This new subsystem is responsible for flushing the in-meory statistics into the system table periodically, or when it detects memory pressure. This replaces sql.Server's in-memory sqlStats provider. Release note (sql change): SQL stats now can be persisted into system.statement_statistics and system.transaction_statistics tables by enabling the sql.stats.flush.enable cluster setting. The interval of persistence is determined by the new sql.stats.flush.interval cluster setting which defaults to 1 hour. 68195: streamingccl: allow stream ingestion processors to keep running on `GenerationEvent` r=annezhu98 a=annezhu98 Previously, a stream ingestion processor would shut down if it ever loses connection with its stream client. With generation support, the processor should not immediately move to draining state, instead, it should be in `StateRunning` to poll for cutover signal sent by the coordinator. Generation support will be implemented by the following PR: #67189 The first commit adds `GenerationEvent` as a valid event type that can be emitted over a cluster stream. The second commit implements the mechanism that keeps processors running when losing connection with the client. 68288: changefeedccl: Propagate pushback throughout changefeed pipeline. r=ajwerner a=miretskiy Propagate pushback information throughout changefeed pipeline. The pushback propagation is accomplished by associating a `Resource` with each event that is processed by changefeed system. The resource is propagated throughout changefeed, and is released when the event has been written to the sink. This change also updates and simplifies event memory accounting. Prior to this PR, memory accounting was incomplete and was error prone. This PR simplifies memory accounting by allocating resources once when the event arrives (blocking buffer), and releasing resources when event is written to the sink. Dynamic modifications to the amount of allocated memory are, in general, not safe (without additional complexity). To accommodate the fact that during event processing we use more memory (e.g. to parse and convert this event), we over-allocate resources to the the event. Release Notes: Enterprise change; changefeed will slow down correctly whenever there is a slow down in the system (i.e. downstream sink is slow). 68444: multiregionccl: add zone config waiting for partitioned tables r=arulajmani a=pawalt Previously, wait-for-zone-configs could only wait on tables, meaning we couldn't wait for zone config changes to apply to REGIONAL BY ROW partitions. This PR adds a `partition-name` flag to allow users to wait on a specific partition. Release note: None A future PR will extend the tracing to allow us to trace REGIONAL BY ROW queries, but those changes combined with these will likely be too large for a single PR. 68596: authors: add pseudomuto to authors r=pseudomuto a=pseudomuto Release note: None 68611: ci: move bazel build scripts to `build/teamcity` directory r=rail a=rickystewart This gives a little more structure to what was previously a completely flat `build` directory that contained one or more scripts for every single build configuration. When this PR is merged I'll move the affected build configurations to final destinations in the `Cockroach / CI` subproject in TC. Release note: None 68614: server: reduce flakiness of TestEnsureSQLStatsAreFlushedForTelemetry r=Azhng a=Azhng Previously, this unit tests generates statements with the fingerprint 'INSERT INTO _ VALUES (_)'. However, this is a very common fingerprint and can collide with actual statements issued by other subsystems, which in turn cause test failure. This commit change the test cases to use uncommon statements that should not cause fingerprint collision Resolves #66826 Release note: None 68637: kvserver/closedts: fix nil dereference in HTML generation r=aliher1911 a=erikgrinaker The `/debug/closedts-{sender,receiver}` endpoints could panic due to a nil dereference if the last update in the sidetransport buffer was removed by the time it was rendered. This patch adds a `nil` check to avoid that panic. Release note (bug fix): Fixed a crash in the `/debug/closedts-{sender,receiver}` advanced debug pages if the last message of the closed timestamp side transport buffer was removed before rendering. 68642: kvserver: deflake TestNonVoterCatchesUpViaRaftSnapshotQueue r=aayushshah15 a=aayushshah15 Fixes #68142 Release note: None 68660: authors: Add shiranka to authors r=samiskin a=samiskin 68661: add ding@ to AUTHORS r=davidwding a=davidwding Co-authored-by: Azhng <archer.xn@gmail.com> Co-authored-by: Anne Zhu <anne.zhu@cockroachlabs.com> Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com> Co-authored-by: Peyton Walters <peyton.walters@cockroachlabs.com> Co-authored-by: David Muto (pseudomuto) <david.muto@gmail.com> Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com> Co-authored-by: Aayush Shah <aayush.shah15@gmail.com> Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com> Co-authored-by: David Ding <ding@cockroachlabs.com>
Previous PR: #67805
Next Chained PR: #67090
First Commit
sql: remove
countfrom stmt/txn stats system tablePreviously, system.statement_statistics and
system.transaction_statistics table includes a
countcolumnthat corresponds to
roachpb.StatementStatistics.Countandroachpb.TransactionStatistics.Countfields respectively.The objective for that column is to make
INSERT ON CONFLICT DO UPDATEstyle query easy. However,since early prototyping have shown that
INSERT ON CONFLICT DO UPDATEstyle statement is quite inefficient,the SQL Stats flush mechanism will be implemented using
separate queries INSERT and UPDATE statements.
This column is no longer userful and it would require special handling.
Removing this column simplifies the flush logic and removes the
need for special handlings.
Release note (sql change): count column is removed from
system.statement_statistics and system.transaction_statistics
tables.
Second Commit
sql: implement persistedsqlstats flush logic
This commit implements the initial flush logic of the
persisted sql stats subsystem.
Release note: None