Round synchronisation could be very slow #1496

@milosevic

Description

Imagine a system with four nodes, p1, p2, p3 and p4, where p4 is a faulty process. Assume that p3 is initially disconnected from the rest of the nodes, and that at time t, p1, p2 and p4 are in round 10. The faulty process always sends nil, so it does not help the correct processes commit a value.
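Why p3 matters can be made concrete with Tendermint's threshold of more than two thirds of the voting power. The Go sketch below assumes four validators with equal voting power; the helper names `hasQuorum` and `tally` and the block value `"B"` are hypothetical and not part of Tendermint's API.

```go
package main

import "fmt"

// hasQuorum reports whether votes out of n equally weighted validators is
// strictly more than two thirds, i.e. 3*votes > 2*n.
func hasQuorum(votes, n int) bool {
	return 3*votes > 2*n
}

// tally counts the votes for a given block in one round.
// An empty string stands for a nil vote.
func tally(votes map[string]string, block string) int {
	count := 0
	for _, v := range votes {
		if v == block {
			count++
		}
	}
	return count
}

func main() {
	const n = 4
	// Without p3: p1 and p2 vote for block "B", the faulty p4 votes nil.
	without := map[string]string{"p1": "B", "p2": "B", "p4": ""}
	fmt.Println(hasQuorum(tally(without, "B"), n)) // false

	// Once p3 is in the same round and also votes for "B".
	with := map[string]string{"p1": "B", "p2": "B", "p3": "B", "p4": ""}
	fmt.Println(hasQuorum(tally(with, "B"), n)) // true
}
```

Without p3, only two of the four votes match, which is not more than two thirds, so no decision is possible; as soon as p3 takes part in the same round, three matching votes are enough.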

After time t, the system enters a synchronous period and p3 is now able to communicate with the other correct processes in a timely and reliable manner. p4 keeps sending nil, so p1, p2 and p4 keep moving through rounds while, in parallel, helping p3 catch up. Note that although the other correct processes are in a higher round, and the consensus layer has a common exit condition that allows processes to jump ahead, the gossip layer always sends messages from the peer's current round. So p3 will receive messages from round 0 and, in the worst case, will wait at least TimeoutCommit before moving to round 1. Timeouts increase with the round number, so round synchronisation gets slower as rounds increase.

Can we guarantee that in this scenario p3 will be able to catch up with the other correct processes so that we can decide? Furthermore, how long will we need to wait in the worst case before this happens? Is it a problem if this is on the order of minutes, for example? During this period we are not delivering blocks even though the network is synchronous.

Labels

C:consensus (Component: Consensus), T:bug (Type Bug (Confirmed)), stale (for use by stalebot)