Skip to content

fix a potential deadlock (restartmanager/restartmanager.go)#42576

Closed
charlesxsh wants to merge 1 commit intomoby:masterfrom
charlesxsh:fix-deadlock-3
Closed

fix a potential deadlock (restartmanager/restartmanager.go)#42576
charlesxsh wants to merge 1 commit intomoby:masterfrom
charlesxsh:fix-deadlock-3

Conversation

@charlesxsh
Copy link
Contributor

Signed-off-by: Shihao Xia charlesxsh@hotmail.com

- What I did

- How I did it

ch := make(chan error)
go func() {
timeout := time.NewTimer(rm.timeout)
defer timeout.Stop()
select {
case <-rm.cancel:
ch <- ErrRestartCanceled
close(ch)
case <-timeout.C:
rm.Lock()
close(ch)
rm.active = false
rm.Unlock()
}
}()
return true, ch, nil

If line 117 happened first, and there is no receiver for ch created at line 108, line 115 might be blocked forever.

One example is

should, _, err := rm.ShouldRestart(0, false, duration)

The channel returned is directly been omitted.

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Shihao Xia <charlesxsh@hotmail.com>
@AkihiroSuda AkihiroSuda changed the title fix a potential deadlock fix a potential deadlock (moby/restartmanager/restartmanager.go) Jun 29, 2021
@AkihiroSuda AkihiroSuda changed the title fix a potential deadlock (moby/restartmanager/restartmanager.go) fix a potential deadlock (restartmanager/restartmanager.go) Jun 29, 2021
@thaJeztah
Copy link
Member

Looks like another flaky libnetwork test; TestCreateParallel. Opened #42582

=== RUN   TestCreateParallel
time="2021-06-29T04:56:20Z" level=warning msg="bridge store not initialized. kv object docker/network/v1.0/bridge/net26/ is not added to the store"
time="2021-06-29T04:56:20Z" level=warning msg="bridge store not initialized. kv object docker/network/v1.0/bridge/net70/ is not added to the store"
    bridge_test.go:1133: Success should be 1 instead: 2
--- FAIL: TestCreateParallel (0.09s)

@songlh
Copy link

songlh commented Jul 23, 2021

Can we merge the pull request? The change can fix the potential deadlock, and it seems that it does not have any bad side effects.

@samuelkarp
Copy link
Member

Not LGTM

If line 117 happened first, and there is no receiver for ch created at line 108, line 115 might be blocked forever.

Line 117 and 115 are mutually exclusive. If one case executes, the other doesn't.

@tianon
Copy link
Member

tianon commented Jul 23, 2021

@songlh what's driving your interest in this? Is there a concrete edge case this resolves that you're running into? If we had a way to reproduce the deadlock, this would be much more interesting.

As it stands, we don't really have enough information here to determine whether this is as critical as you seem to believe it to be, so it would be helpful to get more details.

@songlh
Copy link

songlh commented Jul 29, 2021

@tianon this is not critical. The bug is found by our dynamic tool. We just want to know programmers' attitude awards this bug.

@cpuguy83
Copy link
Member

cpuguy83 commented Sep 2, 2021

This is a bit more tricky to tell what affect this is going to have. I'm hesitant to take this change.

@thaJeztah
Copy link
Member

If it's not critical, and we're not 100% sure if there may be side-effects, I'll go ahead and close this one for now. Feel free to continue the discussion though!

@thaJeztah thaJeztah closed this Sep 2, 2021
@charlesxsh
Copy link
Contributor Author

Not LGTM

If line 117 happened first, and there is no receiver for ch created at line 108, line 115 might be blocked forever.

Line 117 and 115 are mutually exclusive. If one case executes, the other doesn't.

More precise description should be:
We cannot guarantee ch has a receiver when <-rm.cancel happened in all possible situations. Many rm.ShouldRestart(...) calls directly ignore the channel return value, which directly make ch has no receiver. That's I think making it buffer increase 1 can ensure sending operation will not block its child goroutine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants