abciClient.BeginBlockSync should not hang on crashed server #1891
abciClient.BeginBlockSync should not hang on crashed server #1891melekes merged 4 commits intotendermint:developfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #1891 +/- ##
===========================================
+ Coverage 60.86% 60.87% +0.01%
===========================================
Files 217 217
Lines 16963 16960 -3
===========================================
Hits 10325 10325
+ Misses 5745 5740 -5
- Partials 893 895 +2
|
| cmn.Service, abcicli.Client) { | ||
| // some port between 20k and 30k | ||
| port := 20000 + cmn.RandInt32()%10000 | ||
| addr := fmt.Sprintf("localhost:%d", port) |
There was a problem hiding this comment.
can't we use :0 as port here so that the server would pick an ephemeral port for us?
There was a problem hiding this comment.
We could, and it would be better, but I didn't know how to read the port out, so we can pass the same number into the client.
I do bet this would cause a random 1 in 10000 failure on some dev machines as is, so not ideal.
There was a problem hiding this comment.
I think the server should expose Addr() method, but this is for another PR. thanks
func (s *Server) Addr() net.Addr
Addr returns the server's network address, either a *TCPAddr or *UnixAddr.
ebuchman
left a comment
There was a problem hiding this comment.
Thanks for the fix frey - tested and indeed it seems to work. Also the reasoning makes sense since we clear the reqQueue but forget about anything thats already been read off the reqQueue and marked as willSend
I'm just wary of the sleep dependent tests - think there's any way we could eliminate the sleeps?
| flush := c.FlushAsync() | ||
| // wait 20 ms for all events to travel socket, but | ||
| // no response yet from server | ||
| time.Sleep(20 * time.Millisecond) |
There was a problem hiding this comment.
this seems non-deterministic. is there some way we can test this without depending on sleeps ?
There was a problem hiding this comment.
I couldn't think of any way without major changes, as we wait for another goroutine to process an event and there is no clean way to hook into some event system.
Good thing to keep in mine for a refactor of this code.
| s.Stop() | ||
|
|
||
| // wait for the response from BeginBlock | ||
| fmt.Println("waiting for begin block") |
There was a problem hiding this comment.
can we get rid of the print statements please?
There was a problem hiding this comment.
Yeah, my bad.
I'm away from coding machines for a few days. Can you just push that onto this branch? I think I gave maintainers push access.
Addressing issue #1890
Reproduced the errant behavior, and pushing to demo that.
Working on a solution.