Description
Imagine a system with 4 nodes p1, p2, p3 and p4, where p4 is a faulty process. Assume that p3 is initially disconnected from the rest of the nodes, and that at time t p1, p2 and p4 are in round 10. The faulty process always sends nil, so it does not help the correct processes commit a value.
After time t, we enter a synchronous period, and p3 is now able to communicate in a timely and reliable manner with the other correct processes. p4 is still sending nil, so p1, p2 and p4 keep advancing through rounds while, in parallel, helping p3 catch up. Note that although the other correct processes are in a higher round, and the consensus layer has a common exit condition that allows processes to jump ahead, at the gossip layer we always send messages from the peer's current round. So p3 will receive messages from round 0, and in the worst case it will wait at least TimeoutCommit before moving to round 1. Timeouts increase with the round number, so round synchronisation gets slower as rounds increase.

Can we guarantee that in this scenario p3 will be able to catch up with the other correct processes so that we can decide? Furthermore, how long do we need to wait in the worst case before this happens? Is it a problem if this is on the order of minutes, for example? During this period we are not delivering blocks even though the network is synchronous.