Fix deadlock in Remove (linux/inotify) #203
Merged
markbates merged 1 commit into fsnotify:master on Mar 29, 2017
Several people have reported a deadlock that occurs when a single goroutine both watches for fs events and calls Remove. The cause is that Remove was made synchronous by PR fsnotify#73, in an attempt to make sure the maps were no longer leaking.

In that PR, IN_IGNORED was used as the event that triggers map cleanup. This worked fine when Remove() was called and the next event was IN_IGNORED. When a different event was received first, however, the goroutine that is supposed to be reading from the Events channel was stuck in Remove waiting on the sync.Cond. That condition could never be signalled: the event loop's select was blocked waiting for someone to receive the non-IN_IGNORED event from the channel, so it could never proceed to the IN_IGNORED event waiting behind it in the queue. Deadlock :)

Simply removing the synchronization then created two nasty races: one where Remove followed by Remove would error unnecessarily, and one where Remove followed by Add could result in the maps being cleaned up AFTER the Add call. In the second case the inotify watch is active, but our maps no longer have its values, so it becomes impossible to delete the watch through fsnotify, since fsnotify checks its local data before calling InotifyRmWatch.

This change instead uses IN_DELETE_SELF to detect when a watch was deleted as part of an unlink(). In that case the watch was not removed via the fsnotify library, and since it no longer exists we should clean up our maps. This lets Remove clean up the maps immediately, because IN_IGNORED is no longer used as the synchronization point for cleanup.

- Fix fsnotify#195
- Fix fsnotify#123
- Fix fsnotify#115
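A minimal sketch of the map bookkeeping described above, assuming a heavily simplified model: `watchTable`, `Add`, `Remove`, and `handleEvent` are hypothetical names, not fsnotify's API, and no real inotify calls are made (the spot where the real code would call InotifyRmWatch is marked with a comment). The mask constants use the values from inotify(7). It shows why a late IN_IGNORED for an old watch descriptor can no longer wipe out a watch that was re-added after Remove.

```go
package main

import "fmt"

// Mask values as defined in inotify(7).
const (
	inDeleteSelf uint32 = 0x400  // IN_DELETE_SELF
	inIgnored    uint32 = 0x8000 // IN_IGNORED
)

// watchTable is a hypothetical, simplified model of the two maps kept by the
// Linux backend: path -> watch descriptor and watch descriptor -> path.
type watchTable struct {
	nextWd  int
	watches map[string]int
	paths   map[int]string
}

func newWatchTable() *watchTable {
	return &watchTable{nextWd: 1, watches: map[string]int{}, paths: map[int]string{}}
}

// Add stands in for inotify_add_watch: a path that is already watched keeps
// its descriptor, otherwise a fresh descriptor is handed out.
func (t *watchTable) Add(name string) int {
	if wd, ok := t.watches[name]; ok {
		return wd
	}
	wd := t.nextWd
	t.nextWd++
	t.watches[name] = wd
	t.paths[wd] = name
	return wd
}

// Remove deletes both map entries immediately instead of waiting for
// IN_IGNORED, so it never blocks on the event loop.
func (t *watchTable) Remove(name string) error {
	wd, ok := t.watches[name]
	if !ok {
		return fmt.Errorf("can't remove non-existent watch for: %s", name)
	}
	delete(t.watches, name)
	delete(t.paths, wd)
	// The real code would call InotifyRmWatch(fd, wd) at this point.
	return nil
}

// handleEvent models the bookkeeping done by the event loop.
func (t *watchTable) handleEvent(wd int, mask uint32) {
	if mask&inDeleteSelf == inDeleteSelf {
		// The watched file itself went away (e.g. unlink()), so the kernel
		// dropped the watch without Remove being called: clean up here.
		if name, ok := t.paths[wd]; ok {
			delete(t.watches, name)
			delete(t.paths, wd)
		}
	}
	// IN_IGNORED is intentionally not a cleanup trigger any more.
}

func main() {
	tbl := newWatchTable()
	oldWd := tbl.Add("/tmp/watched")
	_ = tbl.Remove("/tmp/watched")
	newWd := tbl.Add("/tmp/watched")

	// A late IN_IGNORED for the old descriptor arrives after the re-Add.
	tbl.handleEvent(oldWd, inIgnored)
	fmt.Println(tbl.watches["/tmp/watched"] == newWd) // prints: true
}
```

Because Remove no longer depends on IN_IGNORED arriving, the event loop can be blocked on a send to the user without Remove ever having to wait for it.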
An example of a library with problems that this patch fixes: markbates/refresh#6
markbates approved these changes on Mar 29, 2017
@aarondl RE: the CLA requirement: I too am strongly against this! +1

Thanks for the patch. I look forward to testing this in https://github.com/purpleidea/mgmt/ shortly.
This resolves a fairly common, easy-to-hit deadlock in the Linux (inotify) backend. The reported issues are referenced below.
Commit message follows:
What does this pull request do?
Fixes the deadlocks reported in the quoted issues.
How should this be manually tested?
Write a small program that watches a directory and, in response to a create event, removes the watch on that directory. Run the program, then create a file inside the directory. If you delete and re-create the file rapidly enough times, the old code should deadlock; with this change it no longer does.
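The same interleaving can also be reproduced deterministically, without a filesystem, in a toy model. This is a hypothetical sketch, not fsnotify code: `toyWatcher`, `readEvents`, `run`, and the string events are illustrative names, and a buffered channel stands in for the inotify event queue. The pre-fix behaviour is modelled as a Remove that waits on a sync.Cond until the event loop has seen IN_IGNORED, while a second CREATE is still queued ahead of it.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// toyWatcher is a hypothetical model of the pre-fix design, not fsnotify code.
// kernel stands in for the inotify event queue; Events is unbuffered, like
// the channel the user's goroutine reads from.
type toyWatcher struct {
	kernel     chan string
	Events     chan string
	syncRemove bool // true models the old synchronous Remove

	mu      sync.Mutex
	cond    *sync.Cond
	ignored bool
}

func newToyWatcher(syncRemove bool) *toyWatcher {
	w := &toyWatcher{
		kernel:     make(chan string, 8),
		Events:     make(chan string),
		syncRemove: syncRemove,
	}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// readEvents is the internal event loop: it signals the Cond on IN_IGNORED
// and forwards every other event to the user.
func (w *toyWatcher) readEvents() {
	for ev := range w.kernel {
		if ev == "IN_IGNORED" {
			w.mu.Lock()
			w.ignored = true
			w.cond.Broadcast()
			w.mu.Unlock()
			continue
		}
		w.Events <- ev // blocks until the user goroutine receives
	}
}

// Remove queues IN_IGNORED behind any pending events, as the kernel would.
func (w *toyWatcher) Remove() {
	w.kernel <- "IN_IGNORED"
	if !w.syncRemove {
		return // post-fix model: cleanup already happened, nothing to wait for
	}
	w.mu.Lock()
	for !w.ignored {
		w.cond.Wait()
	}
	w.mu.Unlock()
}

// run models the manual test: the user's single goroutine receives a CREATE
// and calls Remove while a second CREATE is still queued. It reports whether
// Remove returned before the timeout.
func run(syncRemove bool) bool {
	w := newToyWatcher(syncRemove)
	w.kernel <- "CREATE a"
	w.kernel <- "CREATE b"
	go w.readEvents()

	done := make(chan struct{})
	go func() {
		<-w.Events // receive "CREATE a", then remove the watch
		w.Remove()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(200 * time.Millisecond):
		return false // both goroutines are left blocked: a deadlock
	}
}

func main() {
	fmt.Println("synchronous Remove returned:", run(true))  // prints: false
	fmt.Println("immediate Remove returned:", run(false))   // prints: true
}
```

With `syncRemove` set, the only goroutine that would receive from Events is parked inside Remove, so the event loop never gets past the second CREATE to reach IN_IGNORED. The timeout exists only so the demo terminates instead of hanging.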
I've signed the CLA and REALLY didn't want to. I'm fairly upset that this repo requires this, and then I read this only to find out I signed it without reason: golang/go#4068 (comment)
We should remove the requirement from CONTRIBUTING.md immediately.