sql/tests: TestRandomSyntaxSchemaChangeColumn use a resettable timeout#84358
sql/tests: TestRandomSyntaxSchemaChangeColumn use a resettable timeout#84358craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
b67504a to
4964a8f
Compare
|
Marked for backport since this test issue is also happening on older releases |
rafiss
left a comment
There was a problem hiding this comment.
nice idea
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @fqazi and @rafiss)
pkg/sql/tests/rsg_test.go line 213 at r1 (raw file):
// Recompute our target as the intended target minus any time // that has passed since the last statement. targetDuration = duration - db.mu.lastCompletedStmt.Add(duration).Sub(timeutil.Now())
i don't fully follow this recomputation step. why not just keep waiting for the same duration during each loop iteration?
postamar
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)
pkg/sql/tests/rsg_test.go line 213 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i don't fully follow this recomputation step. why not just keep waiting for the same
durationduring each loop iteration?
My understanding of the general idea is that the timeout should apply more to elapsed CPU time instead of elapsed wall clock time. I have no idea whether this is preferable to a simpler scheme like you propose @rafiss . I'm biased towards simple schemes so this deserves more explanation.
fqazi
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)
pkg/sql/tests/rsg_test.go line 213 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
My understanding of the general idea is that the timeout should apply more to elapsed CPU time instead of elapsed wall clock time. I have no idea whether this is preferable to a simpler scheme like you propose @rafiss . I'm biased towards simple schemes so this deserves more explanation.
The reason here is behaviour was for longer timeout values (previously the timeout was 2 minutes, so we can remove this if you feel strongly). One example would be that we have 2 connections:
(1) => Executes work in 1 second (setting the last completed query)
(2) => Times out after 2 minutes
If we do the naive reset approach then (2) will wait an additional 2 minutes before timing out. This approach says, that 2 gains only 1 extra second. Let me know your thoughts @postamar @rafiss, for the current timeout values we can just do full resets.
4964a8f to
fc0f21e
Compare
|
Personally I'm fine with a "time quota" like you propose. |
Fixes: cockroachdb#65736 Previously, TestRandomSyntaxSchemaChangeColumn had a fixed timeout, which meant that if schema change got stuck behind each other, this timeout may not have been sufficient. Previously, we tried bumping up this timeout, but this is not the most reliable for this test. To address this, this patch introduces the concept of resettable timeouts, which states that the timeout expires only if no other statements are complete within the given timeout (otherwise, its recalculated since the completion of the last statement. To avoid potential starvation there is a limit on the number of resets, which guarantees eventual expiry if a query is always bypassed. Release note: None
fc0f21e to
d2f78ea
Compare
|
@rafiss Are you good with the approach or feel strongly about it? |
rafiss
left a comment
There was a problem hiding this comment.
i still wonder if just using the naive reset would be more robust to failures, but i guess we can see if it’s still flaky after this is merged
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)
Definitely, if see flakes we can revisit this. Thanks, @rafiss bors r+ |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build failed (retrying...): |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from d2f78ea to blathers/backport-release-22.1-84358: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
fixes #65736
Previously, TestRandomSyntaxSchemaChangeColumn had a fixed
timeout, which meant that if schema change got stuck behind
each other, this timeout may not have been sufficient. Previously,
we tried bumping up this timeout, but this is not the most reliable
for this test. To address this, this patch introduces the concept of
resettable timeouts, which states that the timeout expires only if
no other statements are complete within the given timeout (otherwise,
its recalculated since the completion of the last statement. To avoid
potential starvation there is a limit on the number of resets,
which guarantees eventual expiry if a query is always bypassed.
Release note: None