timeutil: prevent illegal sharing of Timers#119595
timeutil: prevent illegal sharing of Timers#119595craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
nvb
left a comment
There was a problem hiding this comment.
Nice debugging! The fix in the second commit here . I left a question about the first patch though.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @yuzefovich)
pkg/util/timeutil/timer.go line 110 at r1 (raw file):
} } if !t.fromPool {
I don't entirely follow why this provides protection from misuse if the contract is still that "a Timer must never be used again after calls to Stop". Is the idea that stack allocated timers (which I assume still escape to the heap and would always benefit from just using NewTimer) can now be re-used after Stop is called? In other words, is this allowing us to change the contract? Or just making a common misuse of the contract less impactful, with the assumption that stack allocation is often paired with violating this contract?
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @nvanbenschoten)
pkg/util/timeutil/timer.go line 110 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I don't entirely follow why this provides protection from misuse if the contract is still that "a Timer must never be used again after calls to Stop". Is the idea that stack allocated timers (which I assume still escape to the heap and would always benefit from just using
NewTimer) can now be re-used after Stop is called? In other words, is this allowing us to change the contract? Or just making a common misuse of the contract less impactful, with the assumption that stack allocation is often paired with violating this contract?
This is a defensive mechanism that prevents problems from one class of misuse (like the one the second commit fixes). In particular, now with the "stack-allocated" (that, indeed, should actually escape) timer it won't ever be possible to share the timer by mistake. It is still possible for this problem to occur if the timer is created via NewTimer and the contract of Stop not being honored.
I'm somewhat on the fence about the first commit, but I'm leaning towards being defensive and merging it due to:
- if the timer originally didn't come from the pool, then it seems somewhat strange to put it into the pool on
Stop - this mechanism does protect from misuse - with low overhead - at least in some cases, and even very experienced engineers can make mistakes like this (the bug fixed in the second commit was introduced by Andrew W in 4ef67a0).
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @nvanbenschoten and @yuzefovich)
-- commits line 15 at r1:
nit: remove one of the two "only"'s
pkg/util/timeutil/timer.go line 110 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This is a defensive mechanism that prevents problems from one class of misuse (like the one the second commit fixes). In particular, now with the "stack-allocated" (that, indeed, should actually escape) timer it won't ever be possible to share the timer by mistake. It is still possible for this problem to occur if the timer is created via
NewTimerand the contract ofStopnot being honored.I'm somewhat on the fence about the first commit, but I'm leaning towards being defensive and merging it due to:
- if the timer originally didn't come from the pool, then it seems somewhat strange to put it into the pool on
Stop- this mechanism does protect from misuse - with low overhead - at least in some cases, and even very experienced engineers can make mistakes like this (the bug fixed in the second commit was introduced by Andrew W in 4ef67a0).
Is there any reasonable use case for a stack allocated timer? If not, why not prevent stack allocated timers by making Timer a point to an internal type struct:
type Timer = *t
type t struct {
timer *time.Timer
// C is a local "copy" of timer.C that can be used in a select case before
// the timer has been initialized (via Reset).
C <-chan time.Time
Read bool
}Then trying to use an uninitialized var t Timer will result in a NPE, which should force all timers to be allocated via the pool.
Currently, whenever `Timer.Stop` is called, we unconditionally put that timer into `timerPool`. This works well assuming that the contract of `Stop` is satisfied - namely that the stopped timer can no longer be used. However, we have at least one case where this contract is violated (fixed in the following commit), and in such a scenario it is possible for multiple users to access the same timer object which can lead to an undefined behavior (we've seen a deadlock on a test server shutdown). This commit hardens the timer code to prevent a class of such issues by putting the timer into the pool on `Stop` only if the timer originally came from the pool. Release note: None
The contract of `timeutil.Timer.Stop` is such that the timer can no longer be used, and it was violated in `Registry.poll`. This is now fixed by allocating a fresh timer after each `Stop` call. Release note: None
a2245e2 to
196dfd5
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @mgartner and @nvanbenschoten)
pkg/util/timeutil/timer.go line 110 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is there any reasonable use case for a stack allocated timer? If not, why not prevent stack allocated timers by making
Timera point to an internal type struct:type Timer = *t type t struct { timer *time.Timer // C is a local "copy" of timer.C that can be used in a select case before // the timer has been initialized (via Reset). C <-chan time.Time Read bool }Then trying to use an uninitialized
var t Timerwill result in a NPE, which should force all timers to be allocated via the pool.
On a quick search there are like 20 places where the timers are allocated on the "stack" and don't use the pool. For example, replicaScanner from server/scanner.go - it seems reasonable to me that we embed the timer into the parent struct to avoid accessing timerPool (although, later, on Reset we'll still access the shared timeTimerPool). To me this seems like a reasonable use case since we avoid an extra pointer to track / extra access to timerPool - WDYT?
|
Previously, yuzefovich (Yahor Yuzefovich) wrote…
I see, makes sense. No need to disallow stack allocation here then. |
|
TFTRs! bors r+ |
|
Build succeeded: |
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale)
pkg/util/timeutil/timer.go line 110 at r1 (raw file):
Is the idea that stack allocated timers (which I assume still escape to the heap and would always benefit from just using NewTimer) can now be re-used after Stop is called? In other words, is this allowing us to change the contract?
So can the contact be relaxed now for timers that didn't come from the pool? Seems like code actually does want to reuse timers after Stop.
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
Informs cockroachdb#119593. Closes cockroachdb#38055. This commit removes the pooling of timeutil.Timer structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (cockroachdb#61373 and cockroachdb#119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` Epic: None Release note: None
119901: timeutil: stack-allocate Timer, remove pooling r=nvanbenschoten a=nvanbenschoten Informs #119593. Closes #38055. This PR removes the pooling of `timeutil.Timer` structs and, in doing so, permits the structs to be stack allocated so that no pooling is necessary. This superfluous (and in hindsight, harmful) memory pooling was introduced in f11ec1c, which also added very necessary pooling for the internal time.Timer structs. The pooling was harmful because it mandated a contract where Timer structs could not be used after their Stop method was called. This was surprising (time.Timer has no such limitation) and led to subtle use-after-free bugs over time (#61373 and #119595). It was also unnecessary because the outer Timer structs can be stack allocated. Ironically, the only thing that causes them to escape to the heap was the pooling mechanism itself. Removing pooling solves the issue. ``` name old time/op new time/op delta Timer-10 153µs ± 1% 152µs ± 1% ~ (p=0.589 n=10+9) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 200B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` ---- The PR then improves the memory pooling of the inner `time.Timer` so that it is always recycled. This was originally identified by `@andreimatei` in #13466 (review). Doing so has a positive impact on the microbenchmark introduced in the first commit, demonstrating that timers can be stack-allocated and require zero heap allocations: ``` name old time/op new time/op delta Timer-10 152µs ± 1% 153µs ± 1% ~ (p=0.133 n=9+10) name old alloc/op new alloc/op delta Timer-10 200B ± 0% 0B -100.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Timer-10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10) ``` ---- cc. `@andreimatei` `@ajwerner` Epic: None Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
timeutil: prevent illegal sharing of Timers
Currently, whenever
Timer.Stopis called, we unconditionally put that timer intotimerPool. This works well assuming that the contract ofStopis satisfied - namely that the stopped timer can no longer be used.However, we have at least one case where this contract is violated (fixed in the following commit), and in such a scenario it is possible for multiple users to access the same timer object which can lead to an undefined behavior (we've seen a deadlock on a test server shutdown).
This commit hardens the timer code to prevent a class of such issues by putting the timer into the pool on
Stoponly if the timer originally came from the pool.Release note: None
stmtdiagnostics: fix incorrect usage of timeutil.Timer
The contract of
timeutil.Timer.Stopis such that the timer can no longer be used, and it was violated inRegistry.poll. This now fixed by allocating a fresh timer after eachStopcall.Fixes: #119593.
Release note: None