Skip to content

sql/tests: TestRandomSyntaxSchemaChangeColumn use a resettable timeout#84358

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:addColumnRandomTimeiout
Aug 11, 2022
Merged

sql/tests: TestRandomSyntaxSchemaChangeColumn use a resettable timeout#84358
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:addColumnRandomTimeiout

Conversation

@fqazi
Copy link
Copy Markdown
Collaborator

@fqazi fqazi commented Jul 13, 2022

fixes #65736

Previously, TestRandomSyntaxSchemaChangeColumn had a fixed
timeout, which meant that if schema change got stuck behind
each other, this timeout may not have been sufficient. Previously,
we tried bumping up this timeout, but this is not the most reliable
for this test. To address this, this patch introduces the concept of
resettable timeouts, which states that the timeout expires only if
no other statements are complete within the given timeout (otherwise,
its recalculated since the completion of the last statement. To avoid
potential starvation there is a limit on the number of resets,
which guarantees eventual expiry if a query is always bypassed.

Release note: None

@fqazi fqazi requested review from a team July 13, 2022 15:09
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@fqazi fqazi force-pushed the addColumnRandomTimeiout branch from b67504a to 4964a8f Compare July 13, 2022 15:10
@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Jul 13, 2022

Marked for backport since this test issue is also happening on older releases

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice idea

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi and @rafiss)


pkg/sql/tests/rsg_test.go line 213 at r1 (raw file):

						// Recompute our target as the intended target minus any time
						// that has passed since the last statement.
						targetDuration = duration - db.mu.lastCompletedStmt.Add(duration).Sub(timeutil.Now())

i don't fully follow this recomputation step. why not just keep waiting for the same duration during each loop iteration?

Copy link
Copy Markdown

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)


pkg/sql/tests/rsg_test.go line 213 at r1 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

i don't fully follow this recomputation step. why not just keep waiting for the same duration during each loop iteration?

My understanding of the general idea is that the timeout should apply more to elapsed CPU time instead of elapsed wall clock time. I have no idea whether this is preferable to a simpler scheme like you propose @rafiss . I'm biased towards simple schemes so this deserves more explanation.

Copy link
Copy Markdown
Collaborator Author

@fqazi fqazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)


pkg/sql/tests/rsg_test.go line 213 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

My understanding of the general idea is that the timeout should apply more to elapsed CPU time instead of elapsed wall clock time. I have no idea whether this is preferable to a simpler scheme like you propose @rafiss . I'm biased towards simple schemes so this deserves more explanation.

The reason here is behaviour was for longer timeout values (previously the timeout was 2 minutes, so we can remove this if you feel strongly). One example would be that we have 2 connections:

(1) => Executes work in 1 second (setting the last completed query)
(2) => Times out after 2 minutes

If we do the naive reset approach then (2) will wait an additional 2 minutes before timing out. This approach says, that 2 gains only 1 extra second. Let me know your thoughts @postamar @rafiss, for the current timeout values we can just do full resets.

@fqazi fqazi force-pushed the addColumnRandomTimeiout branch from 4964a8f to fc0f21e Compare July 18, 2022 13:52
@fqazi fqazi requested review from postamar and rafiss July 19, 2022 15:06
@postamar
Copy link
Copy Markdown

Personally I'm fine with a "time quota" like you propose.

Fixes: cockroachdb#65736

Previously, TestRandomSyntaxSchemaChangeColumn had a fixed
timeout, which meant that if schema change got stuck behind
each other, this timeout may not have been sufficient. Previously,
we tried bumping up this timeout, but this is not the most reliable
for this test. To address this, this patch introduces the concept of
resettable timeouts, which states that the timeout expires only if
no other statements are complete within the given timeout (otherwise,
its recalculated since the completion of the last statement. To avoid
potential starvation there is a limit on the number of resets,
which guarantees eventual expiry if a query is always bypassed.

Release note: None
@fqazi fqazi force-pushed the addColumnRandomTimeiout branch from fc0f21e to d2f78ea Compare August 4, 2022 03:02
@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Aug 9, 2022

@rafiss Are you good with the approach or feel strongly about it?

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still wonder if just using the naive reset would be more robust to failures, but i guess we can see if it’s still flaky after this is merged

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)

@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Aug 10, 2022

i still wonder if just using the naive reset would be more robust to failures, but i guess we can see if it’s still flaky after this is merged

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @postamar and @rafiss)

Definitely, if see flakes we can revisit this. Thanks, @rafiss

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 10, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 11, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 11, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 11, 2022

Build succeeded:

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Aug 11, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from d2f78ea to blathers/backport-release-22.1-84358: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql/tests: TestRandomSyntaxSchemaChangeColumn failed

4 participants