Skip to content

lint: add new deferunlockcheck pass#107577

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
Santamaura:deferunlocktest
Sep 12, 2023
Merged

lint: add new deferunlockcheck pass#107577
craig[bot] merged 3 commits intocockroachdb:masterfrom
Santamaura:deferunlocktest

Conversation

@Santamaura
Copy link
Copy Markdown
Contributor

This commit adds a new pass which checks for
...Unlock() expressions without defer which
could result in a lock being held indefinitely.

Resolves #105366

Release note: None

@Santamaura Santamaura requested a review from a team as a code owner July 25, 2023 20:37
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@Santamaura Santamaura requested review from a team and ericharmeling and removed request for a team July 25, 2023 20:38
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also arrange for the check to be disabled if there's a special // nolint:XXX comment next to an Unlock call to say to disable the linter?

See the other linters - we have // nolint:errwrap, // nolint:errcmp etc.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)

Copy link
Copy Markdown
Contributor Author

@Santamaura Santamaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check does account for this 😄 the exclusion rule is // nolint:deferunlock

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)

Copy link
Copy Markdown
Collaborator

@rickystewart rickystewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like CI is failing, have a look.

Copy link
Copy Markdown
Collaborator

@rickystewart rickystewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, also, add //pkg/testutils/lint/passes/deferunlockcheck along with the other lint checks in the list in top-level BUILD.bazel as part of nogo.

@rickystewart
Copy link
Copy Markdown
Collaborator

Looks like you will have to go into build/bazelutil/nogo_config.json and add a stanza for deferunlockcheck so that we only check first-party code. You can copy one of the existing blocks. Let me know if you have any questions.

Copy link
Copy Markdown
Contributor Author

@Santamaura Santamaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

"deferunlockcheck": {

"only\_files": {

"cockroach/pkg/.\*$": "first-party code",

"cockroach/bazel-out/.\*/bin/pkg/.\*$": "first-party code"

}

},

to the nogo_config.json but build still seems to fail on external code..anything additional needed?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)

@rickystewart
Copy link
Copy Markdown
Collaborator

What's going on with the backslash?

"only\_files":

Copy link
Copy Markdown
Contributor Author

@Santamaura Santamaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd it wasn't a typo, it's inserted when copy-pasting
Screenshot 2023-07-26 at 12.32.33 PM.png

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)

@rickystewart
Copy link
Copy Markdown
Collaborator

You spelled the name of your linter wrong. It's called deferunlockcall, not deferunlockcheck.

This diff works. Should you copy-paste it, be very careful you do not insert backslashes.

diff --git a/build/bazelutil/nogo_config.json b/build/bazelutil/nogo_config.json
index cddfbc90d5e..b5e9b80b09e 100644
--- a/build/bazelutil/nogo_config.json
+++ b/build/bazelutil/nogo_config.json
@@ -28,6 +28,12 @@
             "_test\\.go$": "tests"
         }
     },
+    "deferunlockcall": {
+        "only_files": {
+            "cockroach/pkg/.*$": "first-party code",
+            "cockroach/bazel-out/.*/bin/pkg/.*$": "first-party code"
+        }
+    },
     "errcheck": {
         "exclude_files": {
             "pkg/.*\\.eg\\.go$": "generated code",

@rickystewart
Copy link
Copy Markdown
Collaborator

It looks like you use deferunlockcheck in most of the places besides the actual name of the Analyzer. Is that intentional? You may find it easier just to update the Analyzer name to deferunlockcheck if that's what you want it to be called.

@rickystewart
Copy link
Copy Markdown
Collaborator

Oh, and the PR calls it deferunlocktest. :P It will probably be much less confusing for everyone if we select one name.

@Santamaura Santamaura changed the title lint: add new deferunlocktest pass lint: add new deferunlockcheck pass Jul 26, 2023
Copy link
Copy Markdown
Contributor Author

@Santamaura Santamaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that was the problem thanks! In this scenario should I nolint the current offenders to pass CI?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling)

@rickystewart
Copy link
Copy Markdown
Collaborator

In this scenario should I nolint the current offenders to pass CI?

If you do, at least file bugs so we can keep track of the current offenders. Otherwise they'll just get lost.

@Santamaura
Copy link
Copy Markdown
Contributor Author

There is a group effort internally right now to address unsafe locks so I will hold off on merging this until that work progresses since it will resolve the problems this linter is picking up anyways.

@Santamaura
Copy link
Copy Markdown
Contributor Author

@rickystewart I have PRs that are addressing #107946 but there will still be many manual Unlocks which this linter will catch, is there a way to set the linter such that it would only capture new cases or is the only way to nolint all the remaining safe cases?

@rickystewart
Copy link
Copy Markdown
Collaborator

@rickystewart I have PRs that are addressing #107946 but there will still be many manual Unlocks which this linter will catch, is there a way to set the linter such that it would only capture new cases or is the only way to nolint all the remaining safe cases?

You want to nolint all the remaining cases. Otherwise it becomes impossible to lint unless you perform a diff between the commit in question and the commit before this one, which is very confusing -- we don't want lints depending on git history in some way.

@Santamaura Santamaura force-pushed the deferunlocktest branch 3 times, most recently from 0dc79ce to 3904acc Compare August 11, 2023 14:39
@rafiss
Copy link
Copy Markdown
Collaborator

rafiss commented Aug 16, 2023

Make automatic PR comments -- let the reviewer and the author sort things out. No need for lint that prevents legitimate uses.

IMO a linter is exactly the right place to "start a conversation" between the reviewer and author about what needs to be sorted out. I don't think automatic PR comments are as effective at that.

Seems like the major concern here would be resolved if we made sure there were fewer nolint exceptions before merging this.

@renatolabs
Copy link
Copy Markdown

I'm also in favour of some form of linting for this pattern, thanks for working on it.

Others already covered the reasons why I think it's important to have tooling to stop these kind of errors instead of relying on human reviews so I won't repeat that; at the end of the day, people are fallible and we will miss issues in the code when it's sufficiently complex.

I also agree with the general sentiment that we could make the linter a little smarter. Things like:

non-linear control flow in-between (i.e. flag if there's a condition, function call or a loop in the middle).

no more than 5 lines of code between the lock and the unlock. No function or method calls in the critical section

make sense to me and would help us focus on the more likely to be problematic "unsafe" uses of Unlock.

We'd have to do something where we track the Lock() calls and record their positions. I think this is possible but haven't worked with the linter tooling in a little while to know how much work it is off hand.

This shouldn't be too much work. We can keep track of which object is being locked and unlocked in a way that is similar to [1]. Line numbers can also be computed by calling Position on the underlying *token.FileSet [2]. Feel free to ping me if working on something like this, @Santamaura, I've used these Analyzer APIs before and would be happy to help if I can.

[1]

closures map[*ast.Object]*ast.Ident

[2] https://pkg.go.dev/go/token#FileSet.Position

if !ok {
return
}
if sel.Sel != nil && (sel.Sel.Name == "Unlock" || sel.Sel.Name == "RUnlock") && !passesutil.HasNolintComment(pass, n, noLintName) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to make sure we're only reporting issues with Unlock calls called on instances of sync{,util}.{RW}Mutex. Currently if a custom struct implements an Unlock function, the linter will complain about it and we probably don't want that, as the semantics are unknown to the linter.

Some similar code:

// isGoRoutineFunction takes a call expression node and returns
// whether that call is being made to one of the functions in the
// GoRoutineFunctions slice.
func (v *Visitor) isGoRoutineFunction(call *ast.CallExpr) bool {

@renatolabs
Copy link
Copy Markdown

The linter will also currently complain about code like below, and it probably shouldn't:

mu.Lock()
defer func() {
	cancel()
    mu.Unlock()
}()

@Santamaura
Copy link
Copy Markdown
Contributor Author

Made the "smarter" linter but still doing some testing. Need to update the nogo_config once done with testing.

@Santamaura Santamaura force-pushed the deferunlocktest branch 5 times, most recently from fb21941 to 70f1b24 Compare August 28, 2023 20:32
@Santamaura
Copy link
Copy Markdown
Contributor Author

OK this should be RFAL

@Santamaura Santamaura requested a review from renatolabs August 28, 2023 21:31
@Santamaura
Copy link
Copy Markdown
Contributor Author

Should also mention my team wants me to timebox any more time spent on this linter since it's not really obs inf domain so if this iteration ain't it then I'll probably need to hand it off to dev inf.

Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking time to address the concerns around this linter.

For me, the inclusion of the "close enough" exception makes this something worth trying.

I'm not convinced we need to report an error on for/range statements. but can live with it if y'all feel it is important.

I briefly read over the linter code, but it should probably have some review from someone more familiar with the relevant APIs.

@Santamaura
Copy link
Copy Markdown
Contributor Author

Santamaura commented Aug 31, 2023

Not too sure who to get a review from maybe @renatolabs since you're familiar with the analyzer apis?

// shouldReportNonLinearControlFlow checks if locks has a *LockInfo without a matching unlock
// and if the node has a nolint comment near it. If it finds both it returns true.
func (lt *LockTracker) shouldReportNonLinearControlFlow(node ast.Node) bool {
hasNoLintComment := passesutil.HasNolintComment(lt.pass, node, noLintName)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you can immediately if hasNoLintComment { return false }, right? We will always return false in this case with the way this function is currently written.

}

var (
noLintName = "deferunlock"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: noLintName and maxLineDistance should be const, not var.

if lockDistance > maxLineDistance && !passesutil.HasNolintComment(lt.pass, node, noLintName) {
lt.issues = append(lt.issues, analysis.Diagnostic{
Pos: unlockPos,
Message: fmt.Sprintf("{RW}Mutex {R}Unlock is >%d lines away from matching {R}Lock", maxLineDistance),
Copy link
Copy Markdown
Collaborator

@rickystewart rickystewart Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message here (and on line 311) is not great. Note that this is the message that will be displayed to devs when they violate the lint rule. In particular it's unclear what someone is supposed to do to solve the problem. e.g. "Oh, I can't have more than 5 lines between the lock and unlock, so I need to start combining multiple lines of code into one line with semicolons" or something like that.

The error message should succinctly describe what the actual problem is and suggest a fix like "move the Unlock() call to a defer statement immediately after the Lock() call". If you produce something the developer can directly copy-paste (like defer x.y.whatever.Unlock() -- I think you actually already have the pieces you need to make this work), even better.

The error message on line 311 has an additional problem because what is "non-linear control flow"? Why is it a problem if I have non-linear control flow, and how do I fix it?

This commit adds a new pass which checks for
`...Unlock()` expressions without `defer` which
could result in a lock being held indefinitely.

Resolves cockroachdb#105366

Release note: None
for linter to pass

This commit adds a number of files to the nogo config for the linter
to ignore. These files have manual unlocks that are catagorized as
potentially problematic since there are unlocks >10 lines away from
the lock. The commit also applies the nolint rule to any remaining
manual unlock cases that are <=10 lines away from the lock.

Release note: None
This commit udpates the lock linter to work by
keeping track of locks and if we find any
if conditions, loops, or function calls before
finding a matching unlock it will report. It will
also report if the matching unlock is >5 lines away
from the lock. It will ignore cases where a
`nolint:deferunlock` is present.

Release note: None
Copy link
Copy Markdown
Contributor Author

@Santamaura Santamaura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ericharmeling, @renatolabs, and @rickystewart)


pkg/testutils/lint/passes/deferunlockcheck/deferunlockcheck.go line 40 at r13 (raw file):

Previously, rickystewart (Ricky Stewart) wrote…

Nit: noLintName and maxLineDistance should be const, not var.

Done


pkg/testutils/lint/passes/deferunlockcheck/deferunlockcheck.go line 286 at r13 (raw file):

Previously, rickystewart (Ricky Stewart) wrote…

The error message here (and on line 311) is not great. Note that this is the message that will be displayed to devs when they violate the lint rule. In particular it's unclear what someone is supposed to do to solve the problem. e.g. "Oh, I can't have more than 5 lines between the lock and unlock, so I need to start combining multiple lines of code into one line with semicolons" or something like that.

The error message should succinctly describe what the actual problem is and suggest a fix like "move the Unlock() call to a defer statement immediately after the Lock() call". If you produce something the developer can directly copy-paste (like defer x.y.whatever.Unlock() -- I think you actually already have the pieces you need to make this work), even better.

The error message on line 311 has an additional problem because what is "non-linear control flow"? Why is it a problem if I have non-linear control flow, and how do I fix it?

Thanks for the feedback, I made the error messages a bit simpler and added what the dev can do to fix it.


pkg/testutils/lint/passes/deferunlockcheck/deferunlockcheck.go line 298 at r13 (raw file):

Previously, rickystewart (Ricky Stewart) wrote…

Seems like you can immediately if hasNoLintComment { return false }, right? We will always return false in this case with the way this function is currently written.

Agree, done.

@Santamaura
Copy link
Copy Markdown
Contributor Author

TFTRs!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 12, 2023

Build succeeded:

@craig craig bot merged commit b99bcd9 into cockroachdb:master Sep 12, 2023
@Santamaura Santamaura deleted the deferunlocktest branch September 12, 2023 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

lint: add check to ensure mutex Unlock() calls are deferred

10 participants