roachtest: use SQL comparison in TLP#79551
Conversation
c132fd0 to
fdacf77
Compare
msirek
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @mgartner)
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @michae2)
pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):
} if diff := unsortedMatricesDiff(unpartitionedRows, partitionedRows); diff != "" {
You can remove all the of sqlutils.RowsToStrMatrix and unsortedMatricesDiff stuff now, right?
mgartner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @michae2)
pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
You can remove all the of
sqlutils.RowsToStrMatrixandunsortedMatricesDiffstuff now, right?
I suppose you kept this so that the diff is still printed? I don't find it all that useful - the partitioned and unpartitioned queries are enough to begin diagnosing. But if you think it's useful, we can keep it.
But you will have to remove the error below to prevent these false positives.
|
pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):
Oh, nevermind, I see the early returns. |
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @mgartner)
pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
But you will have to remove the error below to prevent these false positives.
Oh, nevermind, I see the early returns.
I think I also don't find the diffs very useful either, so I'm +1 on removing the old stringified comparison altogether.
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/cmd/roachtest/tests/tlp.go, line 256 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I think I also don't find the diffs very useful either, so I'm +1 on removing the old stringified comparison altogether.
Without the printed diff, won't it be more difficult to triage TLP issues? For example, when TLP was hitting #77279 it was easy to look at the diff in the issue and see "-0" and dupe it. But maybe I'm overthinking it?
Well, the diff fooled us into thinking it was a duplicate, but there was actually two separate cases - one was a false positive but the other was an actual correctness bug. But I'm fine leaving it for now and seeing if it proves useful. Up to you. |
Assists: cockroachdb#77279 TLP compares results of two queries, one unpartitioned and one partitioned three ways by a predicate. We were comparing TLP results by printing values to strings. Unfortunately there are some values which are equal according to SQL comparison, but print differently. And TLP, being the nefarious chaos monkey that it is, tends to find these. For example: ```sql 0::FLOAT -0::FLOAT 'hello' COLLATE "en-US-u-ks-level2" 'HELLO' COLLATE "en-US-u-ks-level2" INTERVAL '1 day' INTERVAL '24 hours' ``` When equal-yet-different values like these are used in MAX or MIN aggregations, or are the keys for GROUP BY bucketing, it is nondeterministic which value will be chosen. (This is also true in PostgreSQL.) And sometimes the two TLP queries choose differently. This is not indicative of a bug, yet TLP fails. So this patch teaches TLP to use SQL comparison instead of string comparison. To do so, we wrap both queries in a bigger query: ```sql WITH unpart AS MATERIALIZED ( <unpartitioned query> ), part AS MATERIALIZED ( <partitioned query> ), undiff AS ( TABLE unpart EXCEPT ALL TABLE part ), diff AS ( TABLE part EXCEPT ALL TABLE unpart ) SELECT (SELECT count(*) FROM undiff), (SELECT count(*) FROM diff) ``` Only if this query returns counts other than 0 (meaning the SQL comparison detected a real mismatch) do we diff the printed results. Release note: None
Good point! Hmm. After thinking about it, I left the diff but changed the logic slightly so that the test only depends on SQL comparison, and the diff is purely informative. This will make it easy to remove the diff if we so decide. |
|
TFTRs! Bazel CI failure looks unrelated, so I'll go ahead and merge. bors r+ |
|
Build succeeded: |
|
blathers backport 22.1 |
Assists: #77279
TLP compares results of two queries, one unpartitioned and one
partitioned three ways by a predicate. We were comparing TLP results by
printing values to strings. Unfortunately there are some values which
are equal according to SQL comparison, but print differently. And TLP,
being the nefarious chaos monkey that it is, tends to find these. For
example:
When equal-yet-different values like these are used in MAX or MIN
aggregations, or are the keys for GROUP BY bucketing, it is
nondeterministic which value will be chosen. (This is also true in
PostgreSQL.) And sometimes the two TLP queries choose differently. This
is not indicative of a bug, yet TLP fails.
So this patch teaches TLP to use SQL comparison instead of string
comparison. To do so, we wrap both queries in a bigger query:
Only if this query returns counts other than 0 (meaning the SQL
comparison detected a real mismatch) do we diff the printed results.
Release note: None