leaktest, stop: don't recover from panics#92627
Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom Dec 2, 2022
Merged
leaktest, stop: don't recover from panics#92627craig[bot] merged 2 commits intocockroachdb:masterfrom
craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
tbg
approved these changes
Nov 29, 2022
Member
andreimatei
commented
Nov 30, 2022
Contributor
Author
andreimatei
left a comment
There was a problem hiding this comment.
@knz do I have your blessing too?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz)
Contributor
|
This one I don't have an opinion on. It's all yours (and tobi's). |
This patch effectively reverts cockroachdb#45983. The leaktest will now re-panic when invoked on a panic unwind, instead of failing the test but otherwise swallowing the panic and letting further tests run. Recovering from panics is generally not a Go thing to do. The leaktest can only recover from panics on the test's main goroutine (panics on other goroutines, for example panics in a TestServer/TestCluster still crash the binary), so the leaktest's behavior is inconsistent at best. The panic-swallowing behavior is also surprising and magical - tests using leaktest get it, tests not using leaktests don't get it, and figuring out this difference is hard. Additionally, having the leaktest recover panics is contageous - it puts pressure on other random code to be mindful of its behavior under panic unwind. For example, the Stopper has complex logic for recovering panics and doing best-effort cleanup, recovering more panics from confused cleanup code, so that ultimately future tests can run in a more or less sane environment. I don't think the magic is worth it. Let's try again without it. I believe that, at the time the leaktest magic was added, panics in go tests were not resulting in a very clean experience in CI - it wasn't trivial to see what test failed. I don't know if that's still the case. Release note: None Epic: None
Before this patch, Stopper.Stop() would do a "best-effort" shutdown when called while a panic is unwinding: it would signal quiescence, and then run all closer async while swallowing their panics, then wait a bit, then re-panic. If the stopper is to do any shutdown on panics, this shutdown has to be somewhat involved, since some closers are known to panic easily when closed in an unclean state (e.g. memory accounts with bytes taken out of them panic), and tasks can fail to respond to quiescence, etc. This is all too complicated, and pretty unsound, for little, if any, gain, so this patch gets rid of it all in favor of not doing any shutdown and re-panicking immediately. This best-effort shutdown was added in cockroachdb#3581, for the purported benefit of CLI tests that wanted to trigger panic code-paths. The respective code is long gone - as far as I can dig up, it had to do with ancient, per-SQL cli kv-client commands. Over time, this panic recovery became load-bearing in cockroachdb#45983, when leaktest was modified to not crash the package binary on panics, and instead continue with further tests. If we are to continue with other tests after a panicking one, we'd better try to cleanup the panicking TestServer as best we can. The leaktest panic behavior is reverted in the prior commit. Release note: None
61a8885 to
94a25f8
Compare
andreimatei
commented
Dec 2, 2022
Contributor
Author
andreimatei
left a comment
There was a problem hiding this comment.
I've tweaked this so that leaktest does a t.Log(<recovered panic>) so that it looks better in TeamCity. With this, the experience on panic on TC is good - the panicking test is identified, and you see the panic in the snippet, etc.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz)
andreimatei
commented
Dec 2, 2022
Contributor
Author
andreimatei
left a comment
There was a problem hiding this comment.
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @knz)
Contributor
|
Build failed (retrying...): |
Contributor
|
Build succeeded: |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See individual commits.
Epic: None