sweeper: avoid deadlock on shutdown#4851
Conversation
carlaKC
left a comment
There was a problem hiding this comment.
Fix looks good to me, would it be possible to add a unit test for this case, or is it too involved?
bhandras
left a comment
There was a problem hiding this comment.
Looks good just have a question with regard to how the deadlock can actually happen.
sweep/sweeper.go
Outdated
There was a problem hiding this comment.
In which case does the block notifier shut down? Aren't sweeper clients listen to the same quit channel?
There was a problem hiding this comment.
When lnd is signalled to shut down, the notifier will be stopped. This means the the blockEpoch channel will close and the collector goroutine in the sweeper will exit.
This all happens before the sweeper's quit channel has been closed, and any of the subsystems attemping to sweep an input at this point will block which stalls lnd shut down.
In the case I saw the contractcourt resolvers would attempt to sweep inputs, but since the notifier already had exited, the collector go routine wouldn't pick up their request. This caused shutdown to deadlock, since the contractcourt subsystems are stopped before the sweeper, and they would never exit.
ca9b9cd to
ae9bdb4
Compare
|
Unit test added and comments addressed, PTAL! |
sweep/sweeper_test.go
Outdated
sweep/sweeper_test.go
Outdated
There was a problem hiding this comment.
nit: defaultTestTimeout?
nitnit: newline between cases
ae9bdb4 to
d64c988
Compare
We risked deadlocking on shutdown if a client (in our case a contract resolver) attempted to schedule a sweep of an input after the ChainNotifier had been shut down. This would cause the `collector` goroutine to exit, and not handle incoming requests, causing a deadlock (since the ChainArbitrator is being stopped before the Sweeper in the server). To fix this we could change the order these subsystems are stopped, but this doesn't ensure there aren't other clients that could end up in the same deadlock scenario. So instead we keep handling the incoming requests even after the collector has exited (immediatly returning an error), until the sweeper is signalled to shutdown.
d64c988 to
77daa3d
Compare
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in lightningnetwork#4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
This commit removes the hack introduced in #4851. Previously we had this issue because the chain notifier was stopped before the sweeper, which was changed a while back and we now always stop the chain notifier last. In addition, since we no longer subscribe to the block epoch chan directly, this issue can no longer happen.
We risked deadlocking on shutdown if a client (in our case a contract
resolver) attempted to schedule a sweep of an input after the
ChainNotifier had been shut down. This would cause the
collectorgoroutine to exit, and not handle incoming requests, causing a deadlock
(since the ChainArbitrator is being stopped before the Sweeper in the
server).
To fix this we could change the order these subsystems are stopped, but
this doesn't ensure there aren't other clients that could end up in the
same deadlock scenario. So instead we keep handling the incoming
requests even after the collector has exited (imemdiately returning an
error), until the sweeper is signalled to shutdown.