Fix deadlock in Remove (linux/inotify)#203

Merged
markbates merged 1 commit into fsnotify:master from aarondl:fix-deadlocks
Mar 29, 2017

aarondl commented Mar 29, 2017

This resolves a fairly common, easy-to-hit deadlock in the linux/inotify backend. The reported issues are listed below.
Commit message follows:

Several people have reported this issue: if you use a single
goroutine to watch for fs events and you call Remove in that
goroutine, it can deadlock. The cause is that Remove was made
synchronous by PR #73, in an attempt to ensure that the maps were no
longer leaking.

That PR used IN_IGNORED as the event that confirms map cleanup. This
worked fine when Remove() was called and the next event was
IN_IGNORED. But when a different event arrived first, the main
goroutine that's supposed to be reading from the Events channel would
be stuck waiting on the sync.Cond, which would never be signaled: the
select in the reader would block until someone received the
non-IN_IGNORED event from the channel, and only then could it process
the IN_IGNORED event that was waiting in the queue. Deadlock :)

Removing the synchronization then created two nasty races: one where
Remove followed by Remove would error unnecessarily, and one where
Remove followed by an Add could result in the maps being cleaned up
AFTER the Add call, which means the inotify watch is active but our
maps don't have the values anymore. It then becomes impossible to
delete the watches via the fsnotify code, since it checks its local
data before calling InotifyRmWatch.

This code attempts to use IN_DELETE_SELF as a means to know when a watch
was deleted as part of an unlink(). That means that we didn't delete the
watch via the fsnotify lib and we should clean up our maps, since that
watch no longer exists. This allows us to clean up the maps immediately
when calling Remove, since we no longer try to synchronize cleanup
using IN_IGNORED as the sync point.
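
As a rough illustration of that bookkeeping, here is a sketch under stated assumptions rather than the actual patch: `registry`, `handleEvent`, `Watching` and the `eventKind` values are hypothetical names, watch descriptors are simulated with a counter, and no inotify syscalls are made. Remove deletes the map entries right away, a delete-self event cleans up watches the kernel dropped on its own, and a late ignored event is not used for cleanup.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// eventKind is a simplified stand-in for the inotify event mask.
type eventKind int

const (
	kindIgnored    eventKind = iota // models IN_IGNORED
	kindDeleteSelf                  // models IN_DELETE_SELF
)

var errNotWatched = errors.New("path is not being watched")

// registry holds the two lookup maps the commit message refers to.
type registry struct {
	mu      sync.Mutex
	watches map[string]int // path -> watch descriptor
	paths   map[int]string // watch descriptor -> path
	nextWD  int            // simulated descriptor counter (assumption)
}

func newRegistry() *registry {
	return &registry{watches: map[string]int{}, paths: map[int]string{}}
}

// Add registers path and returns its (simulated) watch descriptor.
func (r *registry) Add(path string) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	if wd, ok := r.watches[path]; ok {
		return wd
	}
	r.nextWD++
	r.watches[path] = r.nextWD
	r.paths[r.nextWD] = path
	return r.nextWD
}

// Remove cleans up both maps immediately instead of waiting for an event.
func (r *registry) Remove(path string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	wd, ok := r.watches[path]
	if !ok {
		return errNotWatched
	}
	delete(r.watches, path)
	delete(r.paths, wd)
	return nil
}

// handleEvent is what the reader goroutine would call for each event.
// Only a delete-self event touches the maps; an ignored event does not.
func (r *registry) handleEvent(wd int, kind eventKind) {
	if kind != kindDeleteSelf {
		return
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if path, ok := r.paths[wd]; ok {
		delete(r.paths, wd)
		delete(r.watches, path)
	}
}

// Watching reports whether path currently has an entry.
func (r *registry) Watching(path string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.watches[path]
	return ok
}

func main() {
	r := newRegistry()

	// Remove followed by Add, then a late ignored event for the old watch.
	old := r.Add("/tmp/dir")
	fmt.Println(r.Remove("/tmp/dir")) // <nil>
	r.Add("/tmp/dir")
	r.handleEvent(old, kindIgnored)
	fmt.Println(r.Watching("/tmp/dir")) // true

	// The watched path is unlinked: delete-self cleans up the maps.
	wd := r.Add("/tmp/dir")
	r.handleEvent(wd, kindDeleteSelf)
	fmt.Println(r.Watching("/tmp/dir")) // false
}
```

Because nothing waits on the reader goroutine, Remove can be called from the same goroutine that receives events, and the late ignored event cannot wipe out the entry created by the subsequent Add.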

What does this pull request do?

Fixes the deadlocks reported in the quoted issues.

How should this be manually tested?

Make a small program that simply watches a directory and, in response to the create event, removes the watch from the directory. Then run the program and create a file inside the directory. If you delete and re-create the file rapidly enough times, the deadlock should occur with the old code but no longer occur with this code.

I've signed the CLA and REALLY didn't want to. I'm fairly upset that this repo requires this, and then I read this only to find out I signed it without reason: golang/go#4068 (comment)
We should remove the requirement from the CONTRIBUTING.md immediately.

- Fix fsnotify#195
- Fix fsnotify#123
- Fix fsnotify#115

nullbio commented Mar 29, 2017

An example of a library with problems that this patch fixes: markbates/refresh#6

@markbates markbates merged commit 4da3e2c into fsnotify:master Mar 29, 2017
@purpleidea

@aarondl RE:

I've signed the CLA and REALLY didn't want to. I'm fairly upset that this repo requires this, and then I read this only to find out I signed it without reason: golang/go#4068 (comment)

I too am strongly against this! +1

Thanks for the patch. Look forward to testing this in https://github.com/purpleidea/mgmt/ shortly.
