[DNM] kv: add environment variable to inject retries#54695
Draft
ajwerner wants to merge 1 commit intocockroachdb:masterfrom
Draft
[DNM] kv: add environment variable to inject retries#54695ajwerner wants to merge 1 commit intocockroachdb:masterfrom
ajwerner wants to merge 1 commit intocockroachdb:masterfrom
Conversation
Member
e0a3c10 to
e76fbc3
Compare
e76fbc3 to
0a68a74
Compare
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 23, 2020
…ptors Before this patch, if a restart occurred while trying to publish descriptors, we would short-circuit publishing the descriptors on the subsequent attempt. The approach was validated by stressing `TestBackupRestoreResume` with the patch in cockroachdb#54695 which previously reproduced the failure immediately. Release note (bug fix): Fixed a bug whereby a transaction restart at the wrong moment during a restore could leave descriptors offline after the restore completed successfully.
This patch extends the testserver to inject retry errors in calls to send batch requests at a rate indicated by an environment variable. This should help us uncover bugs due to bad handling of transaction retries. Release note: None
0a68a74 to
a53eb00
Compare
craig bot
pushed a commit
that referenced
this pull request
Sep 24, 2020
54748: backupccl: fix bug due to transaction restart while publishing descri… r=ajwerner a=ajwerner …ptors Before this patch, if a restart occurred while trying to publish descriptors, we would short-circuit publishing the descriptors on the subsequent attempt. The approach was validated by stressing `TestBackupRestoreResume` with the patch in #54695 which previously reproduced the failure immediately. Release note (bug fix): Fixed a bug whereby a transaction restart at the wrong moment during a restore could leave descriptors offline after the restore completed successfully. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 24, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail almost immediately, now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.
craig bot
pushed a commit
that referenced
this pull request
Sep 24, 2020
54261: ui: add transactions page to Admin UI r=dhartunian a=elkmaster This commit adds the new Transactions Page to the Admin UI. This page is much like the Statements Page but differs in that it shows Transaction-level statistics in tabular and detail form. Every Transaction is able to display its contained Statements for more detailed analysis. The page and its components are imported from the `admin-ui-components` library. Depends on: cockroachdb/yarn-vendored#38 Release justification: low-risk high impact addition to Admin UI Release note (admin ui change): add Transactions and Transactions details pages. These pages allow for viewing stats at the transaction level. 54749: cli/demo: provision an initial non-`root` "demo" user r=irfansharif a=knz Fixes #54557. Requested by @jseldess First 2 commits from #54741 (can be ignored during review) Release note (cli change): `cockroach demo` now pre-creates a `demo` user account with a random password, instead of letting (and encouraging) the user to use the `root` account directly. The `demo` account is currently granted the `admin` role. 54755: sql: fix bug whereby backfiller would drop spans on txn restart r=ajwerner a=ajwerner This bug was caught by testing with #54695. Before that change, it would fail almost immediately, now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts. Co-authored-by: Vlad <carrott9@gmail.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 28, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail almost immediately, now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 28, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail almost immediately, now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 29, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail almost immediately, now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 29, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail almost immediately, now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 29, 2020
This bug was caught by testing with cockroachdb#54695. Before that change, it would fail almost immediately, now it does not fail under stress. I'm open to suggestions on how to more generally test this. Release note (bug fix): Fixed a rare bug which can lead to index backfills failing in the face of transaction restarts.
ajwerner
pushed a commit
to ajwerner/cockroach
that referenced
this pull request
Sep 29, 2020
…ptors Before this patch, if a restart occurred while trying to publish descriptors, we would short-circuit publishing the descriptors on the subsequent attempt. The approach was validated by stressing `TestBackupRestoreResume` with the patch in cockroachdb#54695 which previously reproduced the failure immediately. Release note (bug fix): Fixed a bug whereby a transaction restart at the wrong moment during a restore could leave descriptors offline after the restore completed successfully.
pbardea
added a commit
to pbardea/cockroach
that referenced
this pull request
Oct 19, 2020
…lishing tables Before this patch, if a restart occurred while trying to publish descriptors, we would short-circuit publishing the descriptors on the subsequent attempt. The approach was validated by stressing TestBackupRestoreResume with the patch in cockroachdb#54695 which previously reproduced the failure immediately. Release note (bug fix): Fixed a bug whereby a transaction restart at the wrong moment during a restore could leave descriptors offline after the restore completed successfully.
Contributor
|
I've been playing around with improving bulk jobs' testing around these txn retries and am curious if there's a path forward to getting something like this merged/what can be done to help our testing in this area. I think that a setting like this is useful. Do we have any worries about introducing this setting generally? I played around with a very early prototype of introducing these kinds of retires specifically when we create new txns in bulk jobs: #56074, but it might be nice to get the more general testing that this provides. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This patch extends the testserver to inject retry errors in calls to send
batch requests at a rate indicated by an environment variable. This should
help us uncover bugs due to bad handling of transaction retries.
Release note: None