sql/stats: fix some flaky partial stats tests#127395
sql/stats: fix some flaky partial stats tests#127395craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @Uzair5162)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
# statistic. statement ok ALTER TABLE int_outer_buckets INJECT STATISTICS '$int_outer_buckets_stats'
Could you explain about what these steps achieve? It seems like they should have no effect because the existing stats are just begin re-injected.
pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 1421 at r1 (raw file):
CREATE STATISTICS ka_partialstat ON a FROM ka USING EXTREMES query T retry
I think we might only need this first retry because if this one succeeds, then the stats have been updated and the remaining tests should pass.
I supposed multiple retries will reduce other forms of potential flakiness, but I'd prefer to be conservative with retries because they can increase test duration.
e5fddc4 to
d0eb6e5
Compare
Uzair5162
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Could you explain about what these steps achieve? It seems like they should have no effect because the existing stats are just begin re-injected.
Injecting stats invalidates the stat cache synchronously (here) whereas just creating stats updates the cache async. Doing this guarantees that the upcoming partial stat collections see the new full stats instead of relying on the cache being updated quickly enough. More context in #92617.
pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 1421 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I think we might only need this first retry because if this one succeeds, then the stats have been updated and the remaining tests should pass.
I supposed multiple retries will reduce other forms of potential flakiness, but I'd prefer to be conservative with retries because they can increase test duration.
Done, good call.
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
Previously, Uzair5162 (Uzair Ahmad) wrote…
Injecting stats invalidates the stat cache synchronously (here) whereas just creating stats updates the cache async. Doing this guarantees that the upcoming partial stat collections see the new full stats instead of relying on the cache being updated quickly enough. More context in #92617.
Maybe it's finally time to add a builtin that flushes the stats cache. 🤔
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @Uzair5162)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
Maybe it's finally time to add a builtin that flushes the stats cache. 🤔
👍
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @Uzair5162)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Maybe it's finally time to add a builtin that flushes the stats cache. 🤔
👍
Should we be going through the stats cache at all when building new stats? I think it happens here and here.
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @Uzair5162)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
Maybe it's finally time to add a builtin that flushes the stats cache. 🤔
:) #127414
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @Uzair5162)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
Should we be going through the stats cache at all when building new stats? I think it happens here and here.
The stats cache is nice because it already has all the machinery we need to read the stats from system.table_statistics and convert them to the in-memory format. And most of the time the cached stats will be fresh; we only see stale stats when we collect partial stats within a few milliseconds after a full stats collection.
If we really wanted to, we could add a flag to GetTableStats that forces invalidation, and then could use that flag in these two calls. But I think this is mostly a test-only issue, because (a) as unlikely as it is, everything still functions correctly if partial stats are based off a stale full stats collection, they're just not as optimal, and (b) in real-world scenarios I would expect it to be very rare to collect partial stats a few ms after a full stats collection. IMO it seems better to take advantage of the cache.
:) https://reviewable.io/reviews/cockroachdb/cockroach/127414
haha, nice
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @Uzair5162)
d0eb6e5 to
37f97d4
Compare
Uzair5162
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Should we be going through the stats cache at all when building new stats? I think it happens here and here.
The stats cache is nice because it already has all the machinery we need to read the stats from
system.table_statisticsand convert them to the in-memory format. And most of the time the cached stats will be fresh; we only see stale stats when we collect partial stats within a few milliseconds after a full stats collection.If we really wanted to, we could add a flag to
GetTableStatsthat forces invalidation, and then could use that flag in these two calls. But I think this is mostly a test-only issue, because (a) as unlikely as it is, everything still functions correctly if partial stats are based off a stale full stats collection, they're just not as optimal, and (b) in real-world scenarios I would expect it to be very rare to collect partial stats a few ms after a full stats collection. IMO it seems better to take advantage of the cache.:) https://reviewable.io/reviews/cockroachdb/cockroach/127414
haha, nice
Thanks @DrewKimball ! I've updated this PR to use clear_table_stats_cache().
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r3.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @Uzair5162)
-- commits line 5 at r3:
[nit] Update the commit message.
This commit adds a retry to a partial stats test on EXPLAIN output. It also clears the stat cache between full and partial stat collections to ensure that the new full stat is seen by the partial stat collection, which should resolve flakiness causing 'column %s does not have a prior statistic' errors. Fixes: cockroachdb#92495 Fixes: cockroachdb#127023 Release note: None
37f97d4 to
5707cf1
Compare
|
bors r+ |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 5707cf1 to blathers/backport-release-24.2-127395: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.2.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This commit adds a retry to a partial stats test on EXPLAIN output. It also clears the stat cache between full and partial stat collections to ensure that the new full stat is seen by the partial stat collection, which should resolve flakiness causing 'column %s does not have a prior statistic' errors.
Fixes: #92495
Fixes: #127023
Release note: None