Skip to content

sql: implement SQL Stats flush logic#67866

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
Azhng:sqlstats-flush-logic
Aug 5, 2021
Merged

sql: implement SQL Stats flush logic#67866
craig[bot] merged 2 commits intocockroachdb:masterfrom
Azhng:sqlstats-flush-logic

Conversation

@Azhng
Copy link
Copy Markdown
Contributor

@Azhng Azhng commented Jul 21, 2021

Previous PR: #67805
Next Chained PR: #67090

First Commit

sql: remove count from stmt/txn stats system table

Previously, system.statement_statistics and
system.transaction_statistics table includes a count column
that corresponds to roachpb.StatementStatistics.Count and
roachpb.TransactionStatistics.Count fields respectively.
The objective for that column is to make
INSERT ON CONFLICT DO UPDATE style query easy. However,
since early prototyping have shown that
INSERT ON CONFLICT DO UPDATE style statement is quite inefficient,
the SQL Stats flush mechanism will be implemented using
separate queries INSERT and UPDATE statements.
This column is no longer userful and it would require special handling.
Removing this column simplifies the flush logic and removes the
need for special handlings.

Release note (sql change): count column is removed from
system.statement_statistics and system.transaction_statistics
tables.

Second Commit

sql: implement persistedsqlstats flush logic

This commit implements the initial flush logic of the
persisted sql stats subsystem.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@Azhng Azhng force-pushed the sqlstats-flush-logic branch 2 times, most recently from ae73e88 to 88a898f Compare July 21, 2021 20:40
@Azhng Azhng marked this pull request as ready for review July 21, 2021 22:04
@Azhng Azhng requested a review from a team July 21, 2021 22:05
@Azhng
Copy link
Copy Markdown
Contributor Author

Azhng commented Jul 21, 2021

CI Failure seems to be due to some unrelated changeccl tests.

@Azhng Azhng force-pushed the sqlstats-flush-logic branch from 88a898f to 23a00c1 Compare July 22, 2021 18:19
@Azhng Azhng changed the title sql: implement SQL Stats flush logc sql: implement SQL Stats flush logic Jul 22, 2021
@Azhng Azhng force-pushed the sqlstats-flush-logic branch 3 times, most recently from 19859e2 to 30a8330 Compare July 27, 2021 16:25
@Azhng Azhng requested a review from a team as a code owner July 27, 2021 16:25
@Azhng Azhng removed the request for review from a team July 27, 2021 16:36
Copy link
Copy Markdown
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)


pkg/sql/exec_util.go, line 843 at r2 (raw file):

		Unit:        metric.Unit_COUNT,
	}
	MetaSQLStatsFlushStarted = metric.Metadata{

note for later: let's consider adding this to a db console dashboard. cc @kevin-v-ngo


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 32 at r2 (raw file):

// during the flush will be logged as warning.
func (s *PersistedSQLStats) Flush(ctx context.Context) {
	log.Infof(ctx, "flushing %d sql stats after %s", s.SQLStats.GetTotalFingerprintCount(), timeutil.Since(s.lastFlushStarted))

It would be nice to also include the number of bytes in the sql stats, since that's a condition for flushing as well.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 52 at r2 (raw file):

	err := s.SQLStats.Reset(ctx)
	if err != nil {

minor nit: could collapse these 2 lines (if err := stats.reset(); err != nil {
log...
}


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):

	if rowsAffected == 0 {
		return errors.New("failed to update transaction statistics")

Should this be an assertion error? What does this mean? I guess that no rows were selected, which should not be possible since we did the read and found a row in the same txn.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):

	if rowsAffected == 0 {
		return errors.New("failed to update statement statistics")

ditto above


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 430 at r2 (raw file):

	  AND aggregated_ts = $3
    AND node_id = $4
FOR UPDATE

Technically I think this is not necessary since you've already tried to insert into it before... but I guess it doesn't matter either way. Maybe add a comment as to why the FOR UPDATE.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):

	if row == nil {
		return errors.New("transaction statistics not found")

include fingerprint id like you did below


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):

	if len(row) != 1 {
		return errors.AssertionFailedf("unexpected number of returning columns")

include unexpected number (yes, it seems impossible :))


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):

	if len(row) != 1 {
		return errors.AssertionFailedf("unexpected number of returning columns")

ditto above


pkg/sql/sqlstats/persistedsqlstats/provider.go, line 34 at r2 (raw file):

//  Once we are able get consistent hash value from a query plan, we should
//  update this.
const dummyPlanHash = int64(0)

cc @cucaroach - please talk with @Azhng about your ideas for that reconstructable serialized plan vs an integer hash to discuss tradeoffs / requirements. thanks!

tbg added a commit to tbg/cockroach that referenced this pull request Jul 28, 2021
craig bot pushed a commit that referenced this pull request Jul 29, 2021
68156: roachtest: fix replicagc-changed-peers (again) r=erikgrinaker a=tbg

I wrote buggy code. Since it isn't hit in every invocation, it slipped
through my local testing.

Fixes #68155
Fixes #68162

Release note: None


68157: CODEOWNERS: own catalog package to obs-inf-prs r=erikgrinaker a=tbg

Noticed in #67866.

Release note: None


68203: roachtest: avoid releasing alloc multiple times r=erikgrinaker a=tbg

As of #68103, existing problems in the cluster creation retry logic
are tickled reliably: on retry we are releasing an alloc twice,
resulting in a panic. Unfortunately, the alloc is acquired several
layers above the retry, so for now the best I can do is to avoid
releasing the alloc if we are bound for a retry.

The reason the above PR tickles it more reliably is because it
accidentally removed a sensible failfast: when the cluster already
exists, we shouldn't destroy & try to recreate it again (which
would then explode on the double-release anyway).

Release note: None


68224: roachtest: fix inconsistency r=erikgrinaker a=tbg

The way that test was passing the args no longer works.

Fixes #64806.

Release note: None


68226: roachtest: allow local runs with roachstress r=erikgrinaker a=tbg

It's appealing to use roachstress also as a tool to simply run
a roachtest, since it avoids having to think much about the flags
and incantation.

This commit adds a prompt to roachstress about running in local
mode. If selected, the binaries are built to target the local
architecture, and the builder is not invoked.

Release note: None


68227: roachtest: skip acceptance/gossip/restart-node-one r=erikgrinaker a=tbg

Touches #68107.

Release note: None


Co-authored-by: Tobias Grieger <tobias.schottdorf@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@Azhng Azhng force-pushed the sqlstats-flush-logic branch from 30a8330 to 5af11bd Compare August 3, 2021 16:15
Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 32 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

It would be nice to also include the number of bytes in the sql stats, since that's a condition for flushing as well.

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Should this be an assertion error? What does this mean? I guess that no rows were selected, which should not be possible since we did the read and found a row in the same txn.

Yeah it should be impossible because the otherwise the INSERT statement before should succeed. Changed to errors.AssertionFailf


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto above

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 430 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Technically I think this is not necessary since you've already tried to insert into it before... but I guess it doesn't matter either way. Maybe add a comment as to why the FOR UPDATE.

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

include fingerprint id like you did below

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

include unexpected number (yes, it seems impossible :))

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto above

Done.

Copy link
Copy Markdown
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 27 of 27 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Yeah it should be impossible because the otherwise the INSERT statement before should succeed. Changed to errors.AssertionFailf

nit: there's a mismatch between number of args and number of fmt strings


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

nit: there's a mismatch between number of args and number of fmt strings


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

ditto


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

ditto


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):

Previously, Azhng (Archer Zhang) wrote…

Done.

ditto again

@Azhng Azhng force-pushed the sqlstats-flush-logic branch from 5af11bd to e932f17 Compare August 3, 2021 18:09
Copy link
Copy Markdown
Contributor Author

@Azhng Azhng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 305 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: there's a mismatch between number of args and number of fmt strings

🤦 , oof. Thought Goland would have caught this. Fixed


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 355 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

nit: there's a mismatch between number of args and number of fmt strings

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 452 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 456 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto

Done.


pkg/sql/sqlstats/persistedsqlstats/flush.go, line 509 at r2 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

ditto again

Done.

@Azhng Azhng force-pushed the sqlstats-flush-logic branch from e932f17 to 0fa5321 Compare August 4, 2021 16:45
@Azhng
Copy link
Copy Markdown
Contributor Author

Azhng commented Aug 4, 2021

Rebased to resolve merge conflict. RFAL @jordanlewis

Copy link
Copy Markdown
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Azhng and @jordanlewis)


pkg/sql/catalog/systemschema/system.go, line 380 at r8 (raw file):

    aggregated_ts  TIMESTAMPTZ NOT NULL,
    fingerprint_id BYTES NOT NULL,
    plan_hash      INT NOT NULL,

Let's change all of the INT to INT8 to make sure we don't run into problems later if defaults change.

Azhng added 2 commits August 5, 2021 09:34
Previously, system.statement_statistics and
system.transaction_statistics table includes a `count` column
that corresponds to `roachpb.StatementStatistics.Count` and
`roachpb.TransactionStatistics.Count` fields respectively.
The objective for that column is to make
`INSERT ON CONFLICT DO UPDATE` style query easy. However,
since early prototyping have shown that
`INSERT ON CONFLICT DO UPDATE` style statement is quite inefficient,
the SQL Stats flush mechanism will be implemented using
separate queries INSERT and UPDATE statements.
This column is no longer userful and it would require special handling.
Removing this column simplifies the flush logic and removes the
need for special handlings.

Release note (sql change): count column is removed from
 system.statement_statistics and system.transaction_statistics
 tables.
This commit implements the initial flush logic of the
persisted sql stats subsystem.

Release note: None
@Azhng Azhng force-pushed the sqlstats-flush-logic branch from 0fa5321 to 8383509 Compare August 5, 2021 13:37
@Azhng
Copy link
Copy Markdown
Contributor Author

Azhng commented Aug 5, 2021

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 5, 2021

Build succeeded:

@craig craig bot merged commit 78a788f into cockroachdb:master Aug 5, 2021
craig bot pushed a commit that referenced this pull request Aug 10, 2021
…68660 #68661

67090: sql: periodically flush sqlstats r=Azhng a=Azhng

Previous PR: #67866

This commit introduces a new persisted sql stats subsystem
that wraps the existing node-local sql stats subsystem.
This new subsystem is responsible for flushing the in-meory
statistics into the system table periodically, or when it
detects memory pressure.
This replaces sql.Server's in-memory sqlStats provider.

Release note (sql change): SQL stats now can be persisted into
system.statement_statistics and system.transaction_statistics
tables by enabling the sql.stats.flush.enable cluster setting.
The interval of persistence is determined by the new
sql.stats.flush.interval cluster setting which defaults to 1 hour.

68195: streamingccl: allow stream ingestion processors to keep running on `GenerationEvent` r=annezhu98 a=annezhu98

Previously, a stream ingestion processor would shut down if it ever loses connection with its stream client. With generation support, the processor should not immediately move to draining state, instead, it should be in `StateRunning` to poll for cutover signal sent by the coordinator. Generation support will be implemented by the following PR: #67189

The first commit adds `GenerationEvent` as a valid event type that can be emitted over a cluster stream.
The second commit implements the mechanism that keeps processors running when losing connection with the client.

68288: changefeedccl: Propagate pushback throughout changefeed pipeline. r=ajwerner a=miretskiy

Propagate pushback information throughout changefeed pipeline.

The pushback propagation is accomplished by associating a `Resource`
with each event that is processed by changefeed system.  The resource is
propagated throughout changefeed, and is released when the event
has been written to the sink.

This change also updates and simplifies event memory accounting.
Prior to this PR, memory accounting was incomplete and was error prone.
This PR simplifies memory accounting by allocating resources once when
the event arrives (blocking buffer), and releasing resources
when event is written to the sink.  Dynamic modifications to the amount
of allocated memory are, in general, not safe (without additional
complexity).  To accommodate the fact that during event processing we
use more memory (e.g. to parse and convert this event), we over-allocate
resources to the the event.

Release Notes: Enterprise change; changefeed will slow down correctly
whenever there is a slow down in the system (i.e. downstream sink is
slow).

68444: multiregionccl: add zone config waiting for partitioned tables r=arulajmani a=pawalt

Previously, wait-for-zone-configs could only wait on tables, meaning we
couldn't wait for zone config changes to apply to REGIONAL BY ROW
partitions. This PR adds a `partition-name` flag to allow users to wait
on a specific partition.

Release note: None

A future PR will extend the tracing to allow us to trace REGIONAL BY ROW queries, but those changes combined with these will likely be too large for a single PR.

68596: authors: add pseudomuto to authors r=pseudomuto a=pseudomuto

Release note: None

68611: ci: move bazel build scripts to `build/teamcity` directory r=rail a=rickystewart

This gives a little more structure to what was previously a completely
flat `build` directory that contained one or more scripts for every
single build configuration.

When this PR is merged I'll move the affected build configurations to
final destinations in the `Cockroach / CI` subproject in TC.

Release note: None

68614: server: reduce flakiness of TestEnsureSQLStatsAreFlushedForTelemetry r=Azhng a=Azhng

Previously, this unit tests generates statements with the fingerprint
'INSERT INTO _ VALUES (_)'. However, this is a very common fingerprint
and can collide with actual statements issued by other subsystems, which
in turn cause test failure.
This commit change the test cases to use uncommon statements that should
not cause fingerprint collision

Resolves #66826

Release note: None

68637: kvserver/closedts: fix nil dereference in HTML generation r=aliher1911 a=erikgrinaker

The `/debug/closedts-{sender,receiver}` endpoints could panic due to a
nil dereference if the last update in the sidetransport buffer was
removed by the time it was rendered. This patch adds a `nil` check to
avoid that panic.

Release note (bug fix): Fixed a crash in the
`/debug/closedts-{sender,receiver}` advanced debug pages if the last
message of the closed timestamp side transport buffer was removed before
rendering.

68642: kvserver: deflake TestNonVoterCatchesUpViaRaftSnapshotQueue r=aayushshah15 a=aayushshah15

Fixes #68142

Release note: None

68660: authors: Add shiranka to authors r=samiskin a=samiskin



68661: add ding@ to AUTHORS r=davidwding a=davidwding



Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Anne Zhu <anne.zhu@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Peyton Walters <peyton.walters@cockroachlabs.com>
Co-authored-by: David Muto (pseudomuto) <david.muto@gmail.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com>
Co-authored-by: David Ding <ding@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants