p2p: resolved deadlock on p2p server shutdown#2183

Merged
MatusKysel merged 2 commits intodevelopfrom
fix-p2p-server-timeout
Jan 26, 2024

Conversation

@MatusKysel
Contributor

Description

This PR removes the forceful stop of the p2p server, which was needed because of a deadlock on channels. In protoTracker() we should wait for a signal on handlerDoneCh from every active handler, but the return on stopCh made it exit early, so a protocol handler got stuck sending its signal to handlerDoneCh. This change removes the deadlock and also the part of the p2p server code that forcefully killed the server.

@zzzckck
Collaborator

zzzckck commented Jan 26, 2024

Could you also post the deadlock callstack?

@zzzckck zzzckck requested a review from galaio January 26, 2024 05:43
close(srv.quit)
srv.lock.Unlock()

stopChan := make(chan struct{})
Contributor

The previous code tries to skip srv.loopWG.Wait() if stopTimeout is reached; maybe this was meant to solve an earlier issue?
Is it OK to just remove the case <-cs.handler.stopCh: above and keep this logic unchanged?

Contributor Author

Maybe it's OK, but I removed it because:

  • it would hide potential deadlocks in the future
  • there is no right timeout value; it was 5s, but a graceful shutdown of a node with 2k connections takes much longer

I am OK with restoring it and increasing the timeout to something like 30s, but geth has nothing like that in its code; it just waits.

@MatusKysel MatusKysel force-pushed the fix-p2p-server-timeout branch from f08eda2 to f0d9f61 Compare January 26, 2024 08:29

3 participants