Skip to content

sql/stats: fix some flaky partial stats tests#127395

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
Uzair5162:fix-partial-stats-test-flakes
Jul 25, 2024
Merged

sql/stats: fix some flaky partial stats tests#127395
craig[bot] merged 1 commit intocockroachdb:masterfrom
Uzair5162:fix-partial-stats-test-flakes

Conversation

@Uzair5162
Copy link
Copy Markdown
Contributor

@Uzair5162 Uzair5162 commented Jul 17, 2024

This commit adds a retry to a partial stats test on EXPLAIN output. It also clears the stat cache between full and partial stat collections to ensure that the new full stat is seen by the partial stat collection, which should resolve flakiness causing 'column %s does not have a prior statistic' errors.

Fixes: #92495
Fixes: #127023

Release note: None

@Uzair5162 Uzair5162 requested a review from DrewKimball July 17, 2024 20:58
@Uzair5162 Uzair5162 requested a review from a team as a code owner July 17, 2024 20:58
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @Uzair5162)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

# statistic.
statement ok
ALTER TABLE int_outer_buckets INJECT STATISTICS '$int_outer_buckets_stats'

Could you explain about what these steps achieve? It seems like they should have no effect because the existing stats are just begin re-injected.


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 1421 at r1 (raw file):

CREATE STATISTICS ka_partialstat ON a FROM ka USING EXTREMES

query T retry

I think we might only need this first retry because if this one succeeds, then the stats have been updated and the remaining tests should pass.

I supposed multiple retries will reduce other forms of potential flakiness, but I'd prefer to be conservative with retries because they can increase test duration.

@Uzair5162 Uzair5162 force-pushed the fix-partial-stats-test-flakes branch from e5fddc4 to d0eb6e5 Compare July 17, 2024 21:40
Copy link
Copy Markdown
Contributor Author

@Uzair5162 Uzair5162 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Could you explain about what these steps achieve? It seems like they should have no effect because the existing stats are just begin re-injected.

Injecting stats invalidates the stat cache synchronously (here) whereas just creating stats updates the cache async. Doing this guarantees that the upcoming partial stat collections see the new full stats instead of relying on the cache being updated quickly enough. More context in #92617.


pkg/sql/opt/exec/execbuilder/testdata/partial_stats line 1421 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I think we might only need this first retry because if this one succeeds, then the stats have been updated and the remaining tests should pass.

I supposed multiple retries will reduce other forms of potential flakiness, but I'd prefer to be conservative with retries because they can increase test duration.

Done, good call.

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @mgartner)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

Previously, Uzair5162 (Uzair Ahmad) wrote…

Injecting stats invalidates the stat cache synchronously (here) whereas just creating stats updates the cache async. Doing this guarantees that the upcoming partial stat collections see the new full stats instead of relying on the cache being updated quickly enough. More context in #92617.

Maybe it's finally time to add a builtin that flushes the stats cache. 🤔

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @Uzair5162)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

Maybe it's finally time to add a builtin that flushes the stats cache. 🤔

👍

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @Uzair5162)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Maybe it's finally time to add a builtin that flushes the stats cache. 🤔

👍

Should we be going through the stats cache at all when building new stats? I think it happens here and here.

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @Uzair5162)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

Maybe it's finally time to add a builtin that flushes the stats cache. 🤔

:) #127414

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @Uzair5162)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

Should we be going through the stats cache at all when building new stats? I think it happens here and here.

The stats cache is nice because it already has all the machinery we need to read the stats from system.table_statistics and convert them to the in-memory format. And most of the time the cached stats will be fresh; we only see stale stats when we collect partial stats within a few milliseconds after a full stats collection.

If we really wanted to, we could add a flag to GetTableStats that forces invalidation, and then could use that flag in these two calls. But I think this is mostly a test-only issue, because (a) as unlikely as it is, everything still functions correctly if partial stats are based off a stale full stats collection, they're just not as optimal, and (b) in real-world scenarios I would expect it to be very rare to collect partial stats a few ms after a full stats collection. IMO it seems better to take advantage of the cache.

:) https://reviewable.io/reviews/cockroachdb/cockroach/127414

haha, nice

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @Uzair5162)

Copy link
Copy Markdown
Contributor Author

@Uzair5162 Uzair5162 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/logictest/testdata/logic_test/distsql_stats line 2927 at r1 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Should we be going through the stats cache at all when building new stats? I think it happens here and here.

The stats cache is nice because it already has all the machinery we need to read the stats from system.table_statistics and convert them to the in-memory format. And most of the time the cached stats will be fresh; we only see stale stats when we collect partial stats within a few milliseconds after a full stats collection.

If we really wanted to, we could add a flag to GetTableStats that forces invalidation, and then could use that flag in these two calls. But I think this is mostly a test-only issue, because (a) as unlikely as it is, everything still functions correctly if partial stats are based off a stale full stats collection, they're just not as optimal, and (b) in real-world scenarios I would expect it to be very rare to collect partial stats a few ms after a full stats collection. IMO it seems better to take advantage of the cache.

:) https://reviewable.io/reviews/cockroachdb/cockroach/127414

haha, nice

Thanks @DrewKimball ! I've updated this PR to use clear_table_stats_cache().

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @Uzair5162)


-- commits line 5 at r3:
[nit] Update the commit message.

This commit adds a retry to a partial stats test on EXPLAIN output. It
also clears the stat cache between full and partial stat collections to
ensure that the new full stat is seen by the partial stat collection,
which should resolve flakiness causing 'column %s does not have a prior
statistic' errors.

Fixes: cockroachdb#92495
Fixes: cockroachdb#127023

Release note: None
@Uzair5162 Uzair5162 force-pushed the fix-partial-stats-test-flakes branch from 37f97d4 to 5707cf1 Compare July 25, 2024 15:04
@Uzair5162
Copy link
Copy Markdown
Contributor Author

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 25, 2024

@craig craig bot merged commit 6075607 into cockroachdb:master Jul 25, 2024
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Jul 25, 2024

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 5707cf1 to blathers/backport-release-24.2-127395: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

5 participants