Add loopvarcapture linter#82324
Conversation
a5bd265 to
2390a7a
Compare
|
Take or leave this feedback: I wish this thing was just a tad more sophisticated: I wish it would accept it in the case where the variable is never written to again. In practice that's a tad complicated. I guess you'd want to see if the variable were captured in a different closure and to see if its address were taken and then generally looking for assignments. In the vast majority of the changes you have to make here, the variables are never changed after their initial assignment. All that being said, nice job writing a linter. The analysis framework is cool. @cockroachdb/sql-schema code LGTM |
This linter's opinion does not correspond to mine :).
I checked a couple of the cases touched by this PR, and in all of them adding some sort of new concurrency would cause all sorts of problems unrelated to the loop variable. It's very common to have iterations of a loop that use goroutines internally, and wait for all these goroutines to finish before moving to the next iteration. In fact, I think you'll find this is the common case. I don't want this linter in the way of this pattern. So I for one think I'm against this linter. Tell me the numbers again: how many bugs were actually fixed here, versus how many false positives? We already have too many linters; let's think twice before adding more. If this does go forward after all, at the very least let's fix the "violations" by making a copy of the variable in question inside the loop and have the closure capture the copy (or by using the index-only range iteration), rather than introducing parameters to the closures. Also, reorder the new linter commit at the end, so that the tests always pass (for bisections for example). |
rickystewart
left a comment
There was a problem hiding this comment.
Drive-by: add the package to the existing list of lint checks in the top-level BUILD.bazel file. No opinion expressed on the linter itself.
@ajwerner Not sure I understand your comment here. Why would it matter whether the variable is written to again? This linter only applies to loop variables (which are "assigned" in each iteration of the loop). As long as there's a chance that the Go routine will outlive the iteration where it was created, there's a data race. Let me know if I'm missing the point you were making. |
|
@andreimatei thank you for your feedback 👍 My opinion is that loop variables being captured by reference in Go routines increase the chances of data races even if it's safe to do so at the time they were introduced. At the same time, I also understand your point that the pattern of creating a Go routine and waiting for it to finish in the same iteration is a common pattern, and I didn't want this linter to get too much in the way of that.
This is indeed the common case as I mentioned in the commit messages and PR description. The majority (if not all) of the changes here don't fix data races that exist at this point in time either because the tests run sequentially or because there's extra synchronization involved. I wanted to point out that I saw this linter not as a guaranteed data-race detector, but as an indicator of a potentially dangerous pattern. IMO (and as substantiated by the paper linked in the PR description), this behaviour of closures capturing variables in scope by reference is not obvious to most people and lead to subtle bugs. This can happen even when it looks like we're waiting for the Go routines to finish at the end of the loop, like below: for _, tc := range array {
// do things
wg.Add(len(tc.queries))
for _, query := range tc.queries {
go func() {
defer wg.Done()
doSomethingWithQuery(query)
}()
}
wg.Wait()
// etc
}Maybe the issue is obvious when looking at the code above because we have our minds set to look for that pattern and beause the code is not complex enough; in more realistic scenarios, I think it's easy to miss the data race there. All of that said, I'm open to hearing everyone's opinions and if most people think it's better to not have the linter, I'm fine with that as well. 👍 Your comment also gave me an idea to change the linter so that it will recognize "obvious" synchronization happening in the loop. The linter could not report issues if one of the following is true:
I prototyped this change, and the vast majority of test cases changed here would not need to be changed if we implemented it like that. The linter will not be able to detect more sophisticated synchronization mechanisms, but I think this gets us a long way. Happy to hear thoughts and opinions on this approach or on anything else here. 🙂 |
|
I'm +1 on this. Capturing loop variables is a frequent and subtle footgun, especially for newcomers to Go. The upside here definitely outweighs the downside. |
srosenberg
left a comment
There was a problem hiding this comment.
Reviewed 4 of 6 files at r1, 2 of 2 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt and @renatolabs)
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 61 at r2 (raw file):
pass.Report(issue) } return false
What about nested loops?
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 75 at r2 (raw file):
pass *analysis.Pass // closures keeps a map from local closures that capture loop
Nit: maps a closure to the captured-by-reference loop variable.
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 105 at r2 (raw file):
// 'closures' map is updated. // // When a `go` statement is found, we look for function literals
May want to tighten this a bit by emphasizing that only the body of the closure is checked; specifically, the actuals passed into the closure formals are not checked; i.e., the following is ok because the closure has no "free" vars.
for k, v := range myMap {
go func(k, v) {
fmt.Printf("k = %v, v = %v\n", k, v)
})(k, v)
}
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 146 at r2 (raw file):
} ast.Inspect(rhs, v.funcLitInspector(func(id *ast.Ident) {
Might be worth adding above ast.Inspect(...),
// Inspect closure's body, looking for captured variables; if found, store the mapping below.
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 173 at r2 (raw file):
} // findVariableCaptures returns a function that can be passed to
Nit,
// findVariableCaptures inspects closure's body. When a reference to a loop variable is found...
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 178 at r2 (raw file):
// reference is called, the `onInvalidReference` function passed is // called. func (v *Visitor) findVariableCaptures(onInvalidReference func(*ast.Ident)) func(ast.Node) bool {
I wonder if it's more precise to call it "findFreeVarsInClosure" since "free" and "bound" variables have formal definitions [1] whereas "captured" is less formally defined :)
[1] https://en.wikipedia.org/wiki/Free_variables_and_bound_variables
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 182 at r2 (raw file):
switch expr := n.(type) { case *ast.Ident: if expr.Obj == nil {
Curious, when is it nil?
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 187 at r2 (raw file):
for _, loopVar := range v.loop.Vars { if expr.Obj == loopVar.Obj {
This implies that the loop variable occurs freely inside the closure's body by virtue of having the same Obj; might be worth a comment since Obj is not super well-defined in ast.go.
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 194 at r2 (raw file):
return false case *ast.CallExpr:
I wonder if it's better to report the issue at the declaration site? We know that the function declaration has to be used, otherwise the compiler will raise an error. However, if it's a false-positive, and the function is used frequently, then the comment to suppress it must be copied several times.
pkg/testutils/lint/passes/goroutinerefcapture/testdata/src/p/p.go line 208 at r2 (raw file):
//nolint:goroutinerefcapture badClosure()
I wonder if it's better to report the issue inside badClosure's body, i.e., at the declaration site. This way, if it's a false-positive, only one comment is needed to suppress the linter vs. k for every kth invocation.
andreimatei
left a comment
There was a problem hiding this comment.
I wanted to point out that I saw this linter not as a guaranteed data-race detector, but as an indicator of a potentially dangerous pattern. IMO (and as substantiated by the paper linked in the PR description), this behaviour of closures capturing variables in scope by reference is not obvious to most people and lead to subtle bug
From the paper:
We have deployed this detector in Uber’s 46 million lines of
Go codebase hosting 2100 distinct microservices, found over
2000 data races, and fixed over 1000 data races, spanning 790
distinct code patches submitted by 210 unique developers
over a six-month period. Based on a detailed investigation
of these data race patterns in Go, we make seven high-level
observations relating to the complex interplay between the
Go language paradigm and data races.
So their study is on races that are detected by standard tooling. We use the respective race detector. And it's being upgraded to tsan v3 in go1.19, which is supposed to make it significantly better still.
I personally remain unconvinced that the benefits here outweigh the risks. Did Uber itself implement something like this?
I prototyped this change, and the vast majority of test cases changed here would not need to be changed if we implemented it like that. The linter will not be able to detect more sophisticated synchronization mechanisms, but I think this gets us a long way.
Consider also ctxgroup.Group and errorgroup.Group as popular synchronization mechanisms.
But, I personally would much prefer the dumber govet test. Do we already run that one?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @renatolabs)
srosenberg
left a comment
There was a problem hiding this comment.
I wonder if still need to run golang.org/x/tools/go/analysis/passes/loopclosure? Technically, this linter subsumes the inspection of go, but it leaves defer open.
Reviewed 2 of 6 files at r1, 4 of 4 files at r3, 1 of 1 files at r4, 4 of 4 files at r5, 1 of 1 files at r6, 1 of 1 files at r7, 10 of 10 files at r8, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt and @renatolabs)
srosenberg
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @renatolabs)
pkg/testutils/lint/passes/goroutinerefcapture/goroutinerefcapture.go line 139 at r8 (raw file):
} case *ast.AssignStmt:
Looks like it might be missing IncDecStmt as done in loopclosure [1].
stevendanna
left a comment
There was a problem hiding this comment.
It's interesting that most of the violations it found here are in test code.
Regarding the overall usefulness of this linter, one of my main questions would be how long this takes to run on some of the larger packages. I imagine it is rather fast.
I'm OK turning this linter on. If we can't get to agreement, perhaps we could still commit this and make it easy to run occasionally (some of the features of the nilness linter would also benefit from this)
| // loop variable, that is also reported. | ||
| func (v *Visitor) visitLoopBody(n ast.Node) bool { | ||
| switch stmt := n.(type) { | ||
| case *ast.GoStmt: |
There was a problem hiding this comment.
Should we check *ast.DeferStmt as well?
2390a7a to
b9f5d8b
Compare
This commit removes references to loop variables in Go routines in `ccl`. This is in preparation for a linter that will detect such references, in an attempt to reduce the chances of introducing data races. Even though the captures fixed here are safe (since there was synchronization involved), we want to avoid implicitly capturing loop variables in Go routines whenever possible. Release note: None.
This is in preparation for a linter that will detect such references. Release note: None.
Release note: None.
Even though this use was safe (due to the use of the WaitGroup), capturing loop variables by reference in Go routines should not be encouraged. Release note: None.
Release note: None.
Release note: None.
Release note: None.
Release note: None.
Release note: None.
This commit introduces a new linter: `loopvarcapture`. It reports uses of loop variables captured by reference in Go routines or defer statements, a common source of data races [1]. `govet` currently has a similar linter [2]; however, that project prioritizes having no false positives at the expense of allowing false negatives. This linter, on the other hand, represents the opinion that loop variables should not be captured by reference in Go routines even when it's safe to do so. That behavior is confusing and concurrency added to related code over time could lead to the introduction of data races, potentially manifesting as bugs in the product or flakiness in the tests. These issues are hard to debug and take a lot of developer time. Developers are still able to use their own judgement and disable this linter in specific instances by using a `nolint` comment. [1] A Study of Real-World Data Races in Golang: https://arxiv.org/pdf/2204.00764.pdf [2] https://github.com/golangci/govet/blob/44ddbe260190d79165f4150b828650780405d801/rangeloop.go#L36 Resolves: cockroachdb#80803. Release note: None.
b61dffa to
b3d21b2
Compare
|
Canceled. |
|
bors r=srosenberg |
|
Build succeeded: |
|
This seems to have triggered a false positive on a random PR in bors: The PR in question is: Line 411 is I think there's no goroutines at play there, just two nested anonymous functions. As far as I can see, at least. |
It's captured inside the |
|
But look at the code. The defer does not outlive the loop iteration; it's inside a function called inline. |
That's true; this defer is always executed sequentially. @renatolabs Thoughts on whether it's easy to allow these cases? i.e., exclude defer when it's declared inside an applied closure? Personally, I would refactor to this form, Not only does it improve readability by removing the common subexpression ( |
|
I'm aware that the linter's treatment of |
This rewrite is worse, in my opinion. You're introducing an argument ( @renatolabs given that this false-positive was triggered by, I believe, literally the first bors batch after merging the linter, can I convince you to revert it until you fix it (if the fix is not imminent)? The use of anonymous functions called inline for the explicit purpose of putting a |
Yep, that's true. For example, I just merged PR #83083 that added a dozen of those. |
This is a good example from that PR, |
|
There's a PR up to handle the situation discussed here: #83418. |
|
Thank you! |
This PR introduces a new linter:
loopvarcapture. It reportsuses of loop variables captured by reference in Go routines or
deferstatements, a common source of data races [1].
govetcurrently has a similar linter [2]; however, that projectprioritizes having no false positives at the expense of allowing
false negatives. This linter, on the other hand, represents the
opinion that loop variables should not be captured by reference in Go
routines even when it's safe to do so. That behavior is confusing and
concurrency added to related code over time could lead to the
introduction of data races, potentially manifesting as bugs in the
product or flakiness in the tests. These issues are hard to debug and
take a lot of developer time.
For instance, the following code will be identified by the linter:
To fix such cases, the code is expected to not use the loop variable in the
closure, as in the examples below:
Developers are still able to use their own judgement and disable this
linter in specific instances by using a
nolintcomment.The commits after the first one fix violations found by this new linter in multiple
packages; all of the violations happened in test files. In most cases, the fixes
did not actually solve data races because most of our unit tests run
sequentially. There was a minor roachtest bug fix related to progress
logging.
Apologies for updating this many packages in one PR; the majority of the
changes were mechanical and don't require much review by the different teams.
Reviewing of the first commit should be mostly done by @cockroachdb/test-eng.
[1] A Study of Real-World Data Races in Golang: https://arxiv.org/pdf/2204.00764.pdf
[2] https://github.com/golangci/govet/blob/44ddbe260190d79165f4150b828650780405d801/rangeloop.go#L36