roachtest: use SQL comparison in TLP by michae2 · Pull Request #79551 · cockroachdb/cockroach

michae2 · 2022-04-06T22:39:34Z

Assists: #77279

TLP compares results of two queries, one unpartitioned and one
partitioned three ways by a predicate. We were comparing TLP results by
printing values to strings. Unfortunately there are some values which
are equal according to SQL comparison, but print differently. And TLP,
being the nefarious chaos monkey that it is, tends to find these. For
example:

  0::FLOAT
  -0::FLOAT

  'hello' COLLATE "en-US-u-ks-level2"
  'HELLO' COLLATE "en-US-u-ks-level2"

  INTERVAL '1 day'
  INTERVAL '24 hours'

When equal-yet-different values like these are used in MAX or MIN
aggregations, or are the keys for GROUP BY bucketing, it is
nondeterministic which value will be chosen. (This is also true in
PostgreSQL.) And sometimes the two TLP queries choose differently. This
is not indicative of a bug, yet TLP fails.
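
The same effect can be reproduced outside SQL. The following is a minimal Go sketch, offered as an analogy rather than as code from this patch: Go's `float64` follows the same IEEE 754 rules as `FLOAT`, so `0` and `-0` compare equal yet print differently. The helper names `negativeZero`, `equalByValue`, and `equalByString` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// negativeZero returns IEEE 754 negative zero. The Go constant -0.0 is
// positive zero, so math.Copysign is used to set the sign bit.
func negativeZero() float64 {
	return math.Copysign(0, -1)
}

// equalByValue compares the way a value comparison does: 0 == -0.
func equalByValue(a, b float64) bool {
	return a == b
}

// equalByString compares printed forms, as the old TLP check did.
func equalByString(a, b float64) bool {
	return fmt.Sprint(a) == fmt.Sprint(b)
}

func main() {
	zero, negZero := 0.0, negativeZero()
	fmt.Println(fmt.Sprint(zero), fmt.Sprint(negZero)) // 0 -0
	fmt.Println(equalByValue(zero, negZero))           // true
	fmt.Println(equalByString(zero, negZero))          // false
}
```

If one TLP query happens to return `0` and the other `-0`, a string-based check sees a difference where a value-based check does not.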

So this patch teaches TLP to use SQL comparison instead of string
comparison. To do so, we wrap both queries in a bigger query:

  WITH unpart AS MATERIALIZED (
    <unpartitioned query>
  ), part AS MATERIALIZED (
    <partitioned query>
  ), undiff AS (
    TABLE unpart EXCEPT ALL TABLE part
  ), diff AS (
    TABLE part EXCEPT ALL TABLE unpart
  )
  SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff)
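
As a rough illustration, the wrapper above could be assembled in Go along these lines. This is a sketch under assumptions, not the code in `tlp.go`: the function name `wrapForComparison`, the trimming of trailing semicolons, and the example table `t` and predicate `p` are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// wrapForComparison (hypothetical helper) embeds the two TLP queries in the
// comparison query from the commit message. Surrounding whitespace and a
// trailing semicolon are stripped so each query is valid inside a CTE.
func wrapForComparison(unpartitioned, partitioned string) string {
	clean := func(q string) string {
		return strings.TrimSuffix(strings.TrimSpace(q), ";")
	}
	return fmt.Sprintf(`WITH unpart AS MATERIALIZED (
  %s
), part AS MATERIALIZED (
  %s
), undiff AS (
  TABLE unpart EXCEPT ALL TABLE part
), diff AS (
  TABLE part EXCEPT ALL TABLE unpart
)
SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff)`,
		clean(unpartitioned), clean(partitioned))
}

func main() {
	// Placeholder queries: table t and predicate p are illustrative only.
	unpartitioned := "SELECT * FROM t;"
	partitioned := "SELECT * FROM t WHERE p " +
		"UNION ALL SELECT * FROM t WHERE NOT p " +
		"UNION ALL SELECT * FROM t WHERE p IS NULL"
	fmt.Println(wrapForComparison(unpartitioned, partitioned))
}
```

The two queries are passed as format arguments rather than spliced into the format string, so a `%` inside a generated query is not misread as a formatting verb.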

Only if this query returns counts other than 0 (meaning the SQL
comparison detected a real mismatch) do we diff the printed results.
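
To see why two counts are needed, here is a small in-memory model of `EXCEPT ALL` for a single `FLOAT` column, written in Go. It is only a sketch of the multiset semantics: in the roachtest the database evaluates the query, and the names `exceptAllCount` and `mismatch` are hypothetical. Go map keys compare with `==`, so `0` and `-0` share a bucket, which mirrors comparison by value.

```go
package main

import (
	"fmt"
	"math"
)

// exceptAllCount models count(*) over `TABLE left EXCEPT ALL TABLE right`:
// each row in right cancels at most one equal row in left, and the rows of
// left that remain are counted.
func exceptAllCount(left, right []float64) int {
	remaining := make(map[float64]int)
	for _, v := range right {
		remaining[v]++
	}
	count := 0
	for _, v := range left {
		if remaining[v] > 0 {
			remaining[v]--
		} else {
			count++
		}
	}
	return count
}

// mismatch reports whether either direction leaves rows over. Checking one
// direction alone would miss rows that appear only on the other side.
func mismatch(unpart, part []float64) bool {
	return exceptAllCount(unpart, part) != 0 || exceptAllCount(part, unpart) != 0
}

func main() {
	negZero := math.Copysign(0, -1)
	unpart := []float64{0, 1.5, 1.5}

	// Same multiset up to row order and the sign of zero: no mismatch.
	fmt.Println(mismatch(unpart, []float64{1.5, negZero, 1.5})) // false

	// One duplicate row is missing from the partitioned side: mismatch.
	fmt.Println(mismatch(unpart, []float64{0, 1.5})) // true
}
```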

Release note: None

msirek

Very clever solution!

yuzefovich

Nice!

Reviewed 2 of 2 files at r1, all commit messages.

mgartner

pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

		}

		if diff := unsortedMatricesDiff(unpartitionedRows, partitionedRows); diff != "" {

You can remove all of the sqlutils.RowsToStrMatrix and unsortedMatricesDiff stuff now, right?

mgartner

pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

You can remove all of the sqlutils.RowsToStrMatrix and unsortedMatricesDiff stuff now, right?

I suppose you kept this so that the diff is still printed? I don't find it all that useful - the partitioned and unpartitioned queries are enough to begin diagnosing. But if you think it's useful, we can keep it.

But you will have to remove the error below to prevent these false positives.

mgartner · 2022-04-08T16:37:19Z

pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

But you will have to remove the error below to prevent these false positives.

Oh, never mind, I see the early returns.

yuzefovich

pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

But you will have to remove the error below to prevent these false positives.

Oh, never mind, I see the early returns.

I don't find the diffs very useful either, so I'm +1 on removing the old stringified comparison altogether.

michae2

pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I don't find the diffs very useful either, so I'm +1 on removing the old stringified comparison altogether.

Without the printed diff, won't it be more difficult to triage TLP issues? For example, when TLP was hitting #77279 it was easy to look at the diff in the issue and see "-0" and dupe it. But maybe I'm overthinking it?

mgartner · 2022-04-08T18:05:56Z

it was easy to look at the diff in the issue and see "-0" and dupe it.

Well, the diff fooled us into thinking it was a duplicate, but there were actually two separate cases: one was a false positive, but the other was an actual correctness bug.

But I'm fine leaving it for now and seeing if it proves useful. Up to you.

michae2 · 2022-04-08T18:42:15Z

Well, the diff fooled us into thinking it was a duplicate, but there were actually two separate cases: one was a false positive, but the other was an actual correctness bug.

But I'm fine leaving it for now and seeing if it proves useful. Up to you.

Good point! Hmm. After thinking about it, I left the diff but changed the logic slightly so that the test only depends on SQL comparison, and the diff is purely informative. This will make it easy to remove the diff if we so decide.
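
The resulting control flow might look roughly like the Go sketch below. It is an illustration under assumptions, not the code that was merged: `checkTLP`, its parameters, and the error text are hypothetical. The point it shows is that only the two SQL counts decide pass or fail, while the printed diff is attached purely as context.

```go
package main

import (
	"errors"
	"fmt"
)

// checkTLP (hypothetical) decides the test outcome from the two counts
// returned by the SQL comparison. printedDiff is informative only and never
// affects whether an error is returned.
func checkTLP(undiffCount, diffCount int, printedDiff string) error {
	if undiffCount == 0 && diffCount == 0 {
		// The results are equal by SQL comparison, so any printed
		// difference (such as 0 versus -0) is ignored.
		return nil
	}
	msg := fmt.Sprintf(
		"TLP mismatch: %d row(s) only in unpartitioned, %d row(s) only in partitioned",
		undiffCount, diffCount,
	)
	if printedDiff != "" {
		msg += "\n" + printedDiff
	}
	return errors.New(msg)
}

func main() {
	// Printed values differ but the SQL comparison found no mismatch.
	fmt.Println(checkTLP(0, 0, "-0\n+-0"))

	// The SQL comparison found a mismatch; the diff helps with triage.
	fmt.Println(checkTLP(1, 0, "-1.5"))
}
```

With this shape, dropping the printed diff later only means removing one parameter and the lines that append it.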

michae2 · 2022-04-11T20:39:35Z

TFTRs!

Bazel CI failure looks unrelated, so I'll go ahead and merge.

bors r+

craig · 2022-04-11T23:24:53Z

Build succeeded:

GitHub CI (Cockroach)

michae2 · 2022-05-04T22:54:57Z

blathers backport 22.1

michae2 requested review from mgartner and yuzefovich April 6, 2022 22:39

michae2 requested a review from a team as a code owner April 6, 2022 22:39

michae2 force-pushed the tlp branch 2 times, most recently from c132fd0 to fdacf77 April 6, 2022 22:41

msirek approved these changes Apr 6, 2022

yuzefovich approved these changes Apr 7, 2022

mgartner suggested changes Apr 8, 2022

yuzefovich reviewed Apr 8, 2022

michae2 commented Apr 8, 2022

michae2 force-pushed the tlp branch from fdacf77 to c3b6822 April 8, 2022 18:38

craig bot merged commit 6de7a18 into cockroachdb:master Apr 11, 2022

michae2 deleted the tlp branch April 11, 2022 23:26

michae2 mentioned this pull request Apr 14, 2022

sql: v22.1: spurious tlp failure due to string comparison #79947

Closed

blathers-crl bot mentioned this pull request May 4, 2022

release-22.1: roachtest: use SQL comparison in TLP #81015

Merged

This was referenced May 31, 2022

release-22.1.0: roachtest: use SQL comparison in TLP #82135

Closed

roachtest: tlp failed #81885

Closed

michae2 mentioned this pull request Jan 16, 2024

roachtest: unoptimized-query-oracle/disable-rules=all/rand-tables failed #117806

Closed
