sql: ensure SQL Stats cleanup job to not interfere with foreground traffic#80515
sql: ensure SQL Stats cleanup job to not interfere with foreground traffic#80515craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
544d788 to
67f4b62
Compare
67f4b62 to
4fc717f
Compare
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 3 of 5 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @mgartner)
pkg/sql/sqlstats/persistedsqlstats/compaction_exec.go line 168 at r1 (raw file):
for remainToBeRemoved := rowsToRemove; remainToBeRemoved > 0; { if len(lastDeletedRow) > 0 { stmt = ops.subsequentDeleteStmt
I think it'd simplify things to make a method on cleanupOperations (or maybe just a method of StatsCompactor, do you need the cleanupOperations struct?) that returns the stmt to run, given lastDeletedRow. It would return the unconstrained query if lastDeletedRow was empty.
pkg/sql/sqlstats/persistedsqlstats/compaction_exec.go line 224 at r1 (raw file):
func getQargs(shardIdx, limit int64, lastDeletedRow tree.Datums) []interface{} { qargs := make([]interface{}, 0, len(lastDeletedRow)+2)
nit: you could make this a field of StatsCompactor so it can be reused to avoid allocations.
pkg/sql/sqlstats/persistedsqlstats/query_utils.go line 41 at r1 (raw file):
deleteFromStmtStatsWithWideScan = ` DELETE FROM system.statement_statistics
It would be nice to have some tests that verified the query plans of these statements. We typically keep those type of tests in pkg/sql/opt/exec/execbuilder/testdata/, but I'm not sure of a good way to ensure that any queries we put there stay consistent with the queries here... I suppose once we have UDFs we can define these in a single place and reference them here and in execbuilder tests.
pkg/sql/sqlstats/persistedsqlstats/query_utils.go line 105 at r1 (raw file):
} func (c *StatsCompactor) cleanupOpForTable(tableName string) *cleanupOperations {
In most cases I think it's best to define methods of a struct in a single place. Any particular reason for creating a new file here?
pkg/sql/sqlstats/persistedsqlstats/query_utils.go line 108 at r1 (raw file):
switch tableName { case "system.statement_statistics": return &cleanupOperations{
Might be simpler to make this singleton vars and eliminate this function altogether. You could also define the SQL string directly in the structs so the unconstrained and constrained queries could be more compared side-by-side.
|
Previously, mgartner (Marcus Gartner) wrote…
We could define these as builtins that run the internal executor, like cockroach/pkg/sql/sem/builtins/pg_builtins.go Lines 910 to 943 in cac13b8 |
|
I'm realizing that some of my comments are duplicated from the ones I made in #80341. I thought that reviewable had lost them. Next time you have a PR stacked on-top of another that's in review, can you post a link to them in a comment so it's easy to find the other one? |
4fc717f to
662d203
Compare
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/sqlstats/persistedsqlstats/query_utils.go line 41 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
We could define these as builtins that run the internal executor, like
, but I'm skeptical that this is a pattern we want to repeat.cockroach/pkg/sql/sem/builtins/pg_builtins.go
Lines 910 to 943 in cac13b8
To clarify, is this something I should implement for this commit or we should wait for the implementation of UDF?
|
Previously, Azhng (Archer Zhang) wrote…
I was trying to think of ways we can notice and prevent query plan regressions for these delete queries before shipping them. I think it'd be wise to add these queries to a new file |
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 80 at r4 (raw file):
estimated row count: 9 (missing stats) table: statement_statistics@primary spans
@mgartner for some reason the spans here is not rendered at all when i run this test with -rewrite flag. Any thoughts on how I should fix this?
pkg/sql/sqlstats/persistedsqlstats/query_utils.go line 41 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I was trying to think of ways we can notice and prevent query plan regressions for these delete queries before shipping them. I think it'd be wise to add these queries to a new file
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compactionso we can view the query plans and make sure they are still efficient. This should also help prevent query plan regressions as the optimizer evolves. I'd leave notes both above the definition of these queries to make it clear that the tests you add inexecbuilderneed to be updated if the queries change, and a note above the tests pointing to where they are defined.
Test added
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 80 at r4 (raw file):
Previously, Azhng (Archer Zhang) wrote…
@mgartner for some reason the
spanshere is not rendered at all when i run this test with-rewriteflag. Any thoughts on how I should fix this?
Or perhaps it's fine to ignore? Since unit test already check for the scan spans through KV request filters
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @mgartner)
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 30 at r4 (raw file):
AND aggregated_ts < date_trunc('hour', now()) ORDER BY aggregated_ts ASC LIMIT 10
nit: Let's use the default batch size for the LIMIT here so that we mimic the default query as closely as possible.
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 80 at r4 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Or perhaps it's fine to ignore? Since unit test already check for the scan spans through KV request filters
The spans aren't rendered because there aren't any - the optimizer has determined there is a contradiction in the query filter (and for reasons not super important here, the scan cannot be reduced to an empty values operator). So this delete statement won't delete any rows. We should not ignore this - it seems like a serious issue.
I'm trying to figure out what the contradiction is, but no luck yet. Let me know if you think of anything.
|
Previously, mgartner (Marcus Gartner) wrote…
Ahh, the contradiction is from the filters on |
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 30 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: Let's use the default batch size for the LIMIT here so that we mimic the default query as closely as possible.
Done.
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 80 at r4 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Ahh, the contradiction is from the filters on
aggregated_ts— it cannot be both>= now()and< date_trunc('hour', now())(unlessnow()is at the exact top of the hour).
Ah thanks so much! Done.
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 80 at r4 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Ah thanks so much! Done.
Also hardcoded the timestamp, just realized that using now() will make the test non-deterministic.
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 4 files at r2, 2 of 4 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @mgartner)
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 20 at r5 (raw file):
query T EXPLAIN (VERBOSE) DELETE FROM system.statement_statistics
nit: add the unconstrained queries for both tables here too.
…affic Related to cockroachdb#79548 Previously, if a workload generated large amount of unique statement/transaction fingerprints, the number of distinct fingerprint within the latest aggregation interval can become so large that it would exceed the limit defined by `sql.stats.persisted_rows.max` When this happened, the background SQL Stats cleanup job would be issuing DELETE query that contended with the flush traffic, which could lead to spiking query latencies. This commit ensured that the background SQL Stats cleanup job completely avoids scanning over the overlapping key ranges touched by the SQL Stats flush. Release note: None
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/exec/execbuilder/testdata/sql_activity_stats_compaction line 20 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: add the unconstrained queries for both tables here too.
Done.
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r6, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
|
TFTR! bors r+ |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 2aab526 to blathers/backport-release-21.2-80515: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 21.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/sql/sqlstats/persistedsqlstats/compaction_test.go line 271 at r6 (raw file):
} func TestSQLStatsForegroundInterference(t *testing.T) {
please comment what "interference" this test is concerned with, and how it tests that there's no such interference. I've read it and couldn't figure out.
pkg/sql/sqlstats/persistedsqlstats/compaction_test.go line 303 at r6 (raw file):
serverSQLStats.Flush(ctx) statsCompactor := persistedsqlstats.NewStatsCompactor(
just curious - is this statsCompactor running in parallel with the server's?
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/sql/sqlstats/persistedsqlstats/compaction_test.go line 271 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please comment what "interference" this test is concerned with, and how it tests that there's no such interference. I've read it and couldn't figure out.
pkg/sql/sqlstats/persistedsqlstats/compaction_test.go line 303 at r6 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
just curious - is this
statsCompactorrunning in parallel with the server's?
Yes it is
Related to #79548
Previously, if a workload generated large amount of unique
statement/transaction fingerprints, the number of distinct fingerprint
within the latest aggregation interval can become so large that it would
exceed the limit defined by
sql.stats.persisted_rows.maxWhen thishappened, the background SQL Stats cleanup job would be issuing DELETE
query that contended with the flush traffic, which could lead to spiking
query latencies.
This commit ensured that the background SQL Stats cleanup job completely
avoids scanning over the overlapping key ranges touched by the SQL Stats
flush.
Release note: None