Skip to content

roachtest: use SQL comparison in TLP#79551

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
michae2:tlp
Apr 11, 2022
Merged

roachtest: use SQL comparison in TLP#79551
craig[bot] merged 1 commit intocockroachdb:masterfrom
michae2:tlp

Conversation

@michae2
Copy link
Copy Markdown
Collaborator

@michae2 michae2 commented Apr 6, 2022

Assists: #77279

TLP compares results of two queries, one unpartitioned and one
partitioned three ways by a predicate. We were comparing TLP results by
printing values to strings. Unfortunately there are some values which
are equal according to SQL comparison, but print differently. And TLP,
being the nefarious chaos monkey that it is, tends to find these. For
example:

  0::FLOAT
  -0::FLOAT

  'hello' COLLATE "en-US-u-ks-level2"
  'HELLO' COLLATE "en-US-u-ks-level2"

  INTERVAL '1 day'
  INTERVAL '24 hours'

When equal-yet-different values like these are used in MAX or MIN
aggregations, or are the keys for GROUP BY bucketing, it is
nondeterministic which value will be chosen. (This is also true in
PostgreSQL.) And sometimes the two TLP queries choose differently. This
is not indicative of a bug, yet TLP fails.

So this patch teaches TLP to use SQL comparison instead of string
comparison. To do so, we wrap both queries in a bigger query:

  WITH unpart AS MATERIALIZED (
    <unpartitioned query>
  ), part AS MATERIALIZED (
    <partitioned query>
  ), undiff AS (
    TABLE unpart EXCEPT ALL TABLE part
  ), diff AS (
    TABLE part EXCEPT ALL TABLE unpart
  )
  SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff)

Only if this query returns counts other than 0 (meaning the SQL
comparison detected a real mismatch) do we diff the printed results.

Release note: None

@michae2 michae2 requested review from mgartner and yuzefovich April 6, 2022 22:39
@michae2 michae2 requested a review from a team as a code owner April 6, 2022 22:39
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@michae2 michae2 force-pushed the tlp branch 2 times, most recently from c132fd0 to fdacf77 Compare April 6, 2022 22:41
Copy link
Copy Markdown
Contributor

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clever solution!
:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @michae2)


pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

		}

		if diff := unsortedMatricesDiff(unpartitionedRows, partitionedRows); diff != "" {

You can remove all the of sqlutils.RowsToStrMatrix and unsortedMatricesDiff stuff now, right?

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @michae2)


pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

You can remove all the of sqlutils.RowsToStrMatrix and unsortedMatricesDiff stuff now, right?

I suppose you kept this so that the diff is still printed? I don't find it all that useful - the partitioned and unpartitioned queries are enough to begin diagnosing. But if you think it's useful, we can keep it.

But you will have to remove the error below to prevent these false positives.

@mgartner
Copy link
Copy Markdown
Contributor

mgartner commented Apr 8, 2022

pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

But you will have to remove the error below to prevent these false positives.

Oh, nevermind, I see the early returns.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)


pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

But you will have to remove the error below to prevent these false positives.

Oh, nevermind, I see the early returns.

I think I also don't find the diffs very useful either, so I'm +1 on removing the old stringified comparison altogether.

Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)


pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think I also don't find the diffs very useful either, so I'm +1 on removing the old stringified comparison altogether.

Without the printed diff, won't it be more difficult to triage TLP issues? For example, when TLP was hitting #77279 it was easy to look at the diff in the issue and see "-0" and dupe it. But maybe I'm overthinking it?

@mgartner
Copy link
Copy Markdown
Contributor

mgartner commented Apr 8, 2022

it was easy to look at the diff in the issue and see "-0" and dupe it.

Well, the diff fooled us into thinking it was a duplicate, but there was actually two separate cases - one was a false positive but the other was an actual correctness bug.

But I'm fine leaving it for now and seeing if it proves useful. Up to you.

Assists: cockroachdb#77279

TLP compares results of two queries, one unpartitioned and one
partitioned three ways by a predicate. We were comparing TLP results by
printing values to strings. Unfortunately there are some values which
are equal according to SQL comparison, but print differently. And TLP,
being the nefarious chaos monkey that it is, tends to find these. For
example:

```sql
  0::FLOAT
  -0::FLOAT

  'hello' COLLATE "en-US-u-ks-level2"
  'HELLO' COLLATE "en-US-u-ks-level2"

  INTERVAL '1 day'
  INTERVAL '24 hours'
```

When equal-yet-different values like these are used in MAX or MIN
aggregations, or are the keys for GROUP BY bucketing, it is
nondeterministic which value will be chosen. (This is also true in
PostgreSQL.) And sometimes the two TLP queries choose differently. This
is not indicative of a bug, yet TLP fails.

So this patch teaches TLP to use SQL comparison instead of string
comparison. To do so, we wrap both queries in a bigger query:

```sql
  WITH unpart AS MATERIALIZED (
    <unpartitioned query>
  ), part AS MATERIALIZED (
    <partitioned query>
  ), undiff AS (
    TABLE unpart EXCEPT ALL TABLE part
  ), diff AS (
    TABLE part EXCEPT ALL TABLE unpart
  )
  SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff)
```

Only if this query returns counts other than 0 (meaning the SQL
comparison detected a real mismatch) do we diff the printed results.

Release note: None
@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Apr 8, 2022

Well, the diff fooled us into thinking it was a duplicate, but there was actually two separate cases - one was a false positive but the other was an actual correctness bug.

But I'm fine leaving it for now and seeing if it proves useful. Up to you.

Good point! Hmm. After thinking about it, I left the diff but changed the logic slightly so that the test only depends on SQL comparison, and the diff is purely informative. This will make it easy to remove the diff if we so decide.

@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Apr 11, 2022

TFTRs!

Bazel CI failure looks unrelated, so I'll go ahead and merge.

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 11, 2022

Build succeeded:

@craig craig bot merged commit 6de7a18 into cockroachdb:master Apr 11, 2022
@michae2 michae2 deleted the tlp branch April 11, 2022 23:26
@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented May 4, 2022

blathers backport 22.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants