storage: avoid timer deadlock in quotaPool + add lint check#17796
storage: avoid timer deadlock in quotaPool + add lint check#17796nvb merged 2 commits intocockroachdb:masterfrom
Conversation
Maybe fixes cockroachdb#17524. When a `timeutil.Timer` is read from, it needs to set `timer.Read = true` before calling `timer.Reset` or it risks deadlocking itself. A deadlock in the `quotaPool` was possible because it missed this step. Once the `quotaPool` deadlocked, it would block the `CommandQueue` and wreak havoc on a cluster.
|
Wow, nice lint! LGTM, but @tamird should probably take a look at your AST craziness as well. Either way, hope we can merge this tomorrow. Reviewed 1 of 1 files at r1, 3 of 3 files at r2. pkg/cmd/metacheck/main.go, line 160 at r2 (raw file):
This is silly, but if we renamed that package, your lint would silently fail, right? Can we somehow import that package to make sure we explode if we move it? Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks pending. pkg/storage/quota_pool.go, line 147 at r2 (raw file):
Is there a reason you moved this Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful. pkg/storage/quota_pool.go, line 147 at r2 (raw file): Previously, petermattis (Peter Mattis) wrote…
I think it is desirable. We were staring at logs today that had zero activity and we had to grep through the history to figure out if there was ever a message. Starting to repetitively log loads of messages here has its own problems, but we should work really hard to never get into that state. Since each of these repetitive messages would be bound to a client's context, I think we should try this out. Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 2 unresolved discussions, all commit checks successful. pkg/storage/quota_pool.go, line 147 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Ok, maybe we should change the other Comments from Reviewable |
ee4b01b to
631aa79
Compare
|
Review status: 3 of 4 files reviewed at latest revision, 2 unresolved discussions. pkg/cmd/metacheck/main.go, line 160 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Good point. @tamird, is there a standard way we've addressed this in the past? pkg/storage/quota_pool.go, line 147 at r2 (raw file): Previously, petermattis (Peter Mattis) wrote…
It was already inside the loop, just in the The other three instances of slow timers are in:
As you point out, none of these warn multiple times. These should probably be consistent with Comments from Reviewable |
|
Reviewed 1 of 1 files at r1, 1 of 1 files at r3. pkg/cmd/metacheck/main.go, line 160 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yep, pkg/cmd/metacheck/main.go, line 195 at r3 (raw file):
this will also silently break if the name of this field changes. probably fine. Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 3 unresolved discussions, all commit checks successful. pkg/cmd/metacheck/main.go, line 160 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Oh, and I imagine you can use reflection to confirm the type name (and the field name, "C", below). Comments from Reviewable |
This change adds a new metacheck that assures that `timeutil.Timer`
is used correctly.
> checkSetTimeutilTimerRead assures that timeutil.Timer objects are used
> correctly, to avoid race conditions and deadlocks. These timers require
> callers to set their Read field to true when their channel has been received
> on. If this field is not set and the timer's Reset method is called, we will
> deadlock. This lint assures that the Read field is set in the most common
> case where Reset is used, within a for-loop where each iteration blocks
> on a select statement. The timers are usually used as timeouts on these
> select statements, and need to be reset after each iteration.
>
> for {
> timer.Reset(...)
> select {
> case <-timer.C:
> timer.Read = true <-- lint verifies that this line is present
> case ...:
> }
> }
Note that the lint is triggered even if `Reset` is not present in each iteration.
Even though this case will not deadlock without the `timer.Read = true`, I
think it's safer to mandate that all select cases set the `Read` flag anyway.
This prevents against issues calling `Reset` sometime after the `for` loop, which
could still deadlock.
631aa79 to
bcfb816
Compare
|
Review status: 3 of 4 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending. pkg/cmd/metacheck/main.go, line 160 at r2 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done. pkg/cmd/metacheck/main.go, line 195 at r3 (raw file): Previously, tamird (Tamir Duberstein) wrote…
Done. Comments from Reviewable |
|
Reviewed 2 of 3 files at r2, 1 of 1 files at r4. Comments from Reviewable |
Maybe fixes #17524.
When a
timeutil.Timeris read from, it needs to settimer.Read = truebefore calling
timer.Resetor it risks deadlocking itself. A deadlockin the
quotaPoolwas possible because it missed this step. Once thequotaPooldeadlocked, it would block theCommandQueueand wreak havocon a cluster.
To prevent against this happening again, this change adds a new lint check that
assures that
timeutil.Timeris used correctly.Note that the lint is triggered even if
Resetis not present in each iteration.Even though this case will not deadlock without the
timer.Read = true, Ithink it's safer to mandate that all select cases set the
Readflag anyway.This prevents against issues calling
Resetsometime after theforloop, whichcould still deadlock.
@tschottdorf for the first commit, @tamird for the second.