Skip to content

leaktest, stop: don't recover from panics#92627

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
andreimatei:stopper.leaktest
Dec 2, 2022
Merged

leaktest, stop: don't recover from panics#92627
craig[bot] merged 2 commits intocockroachdb:masterfrom
andreimatei:stopper.leaktest

Conversation

@andreimatei
Copy link
Copy Markdown
Contributor

See individual commits.

Epic: None

@andreimatei andreimatei requested review from knz and tbg November 28, 2022 23:04
@andreimatei andreimatei requested a review from a team as a code owner November 28, 2022 23:04
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knz do I have your blessing too?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)

@knz
Copy link
Copy Markdown
Contributor

knz commented Nov 30, 2022

This one I don't have an opinion on. It's all yours (and tobi's).

This patch effectively reverts cockroachdb#45983. The leaktest will now re-panic
when invoked on a panic unwind, instead of failing the test but
otherwise swallowing the panic and letting further tests run. Recovering
from panics is generally not a Go thing to do. The leaktest can only
recover from panics on the test's main goroutine (panics on other
goroutines, for example panics in a TestServer/TestCluster still crash
the binary), so the leaktest's behavior is inconsistent at best. The
panic-swallowing behavior is also surprising and magical - tests using
leaktest get it, tests not using leaktests don't get it, and figuring
out this difference is hard. Additionally, having the leaktest recover
panics is contageous - it puts pressure on other random code to be
mindful of its behavior under panic unwind. For example, the Stopper has
complex logic for recovering panics and doing best-effort cleanup,
recovering more panics from confused cleanup code, so that ultimately
future tests can run in a more or less sane environment.

I don't think the magic is worth it. Let's try again without it.

I believe that, at the time the leaktest magic was added, panics in go
tests were not resulting in a very clean experience in CI - it wasn't
trivial to see what test failed. I don't know if that's still the case.

Release note: None
Epic: None
Before this patch, Stopper.Stop() would do a "best-effort" shutdown when
called while a panic is unwinding: it would signal quiescence, and then
run all closer async while swallowing their panics, then wait a bit,
then re-panic. If the stopper is to do any shutdown on panics, this
shutdown has to be somewhat involved, since some closers are known to
panic easily when closed in an unclean state (e.g. memory accounts with
bytes taken out of them panic), and tasks can fail to respond to
quiescence, etc. This is all too complicated, and pretty unsound, for
little, if any, gain, so this patch gets rid of it all in favor of not
doing any shutdown and re-panicking immediately.

This best-effort shutdown was added in cockroachdb#3581, for the purported benefit
of CLI tests that wanted to trigger panic code-paths. The respective
code is long gone - as far as I can dig up, it had to do with ancient,
per-SQL cli kv-client commands.
Over time, this panic recovery became load-bearing in cockroachdb#45983, when
leaktest was modified to not crash the package binary on panics, and
instead continue with further tests. If we are to continue with other
tests after a panicking one, we'd better try to cleanup the panicking
TestServer as best we can. The leaktest panic behavior is reverted in
the prior commit.

Release note: None
Copy link
Copy Markdown
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tweaked this so that leaktest does a t.Log(<recovered panic>) so that it looks better in TeamCity. With this, the experience on panic on TC is good - the panicking test is identified, and you see the panic in the snippet, etc.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)

Copy link
Copy Markdown
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 2, 2022

Build failed (retrying...):

@craig craig bot merged commit 865ad56 into cockroachdb:master Dec 2, 2022
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 2, 2022

Build succeeded:

@andreimatei andreimatei deleted the stopper.leaktest branch December 5, 2022 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants