Rationalize the timeout parameters #7274

@williambanfield

This issue proposes changing the number of timeout parameters from 7 to 3.

We currently have 7 parameters for configuring the timing of Tendermint consensus. They are defined here in the local config. The parameters are as follows:

	// How long we wait for a proposal block before prevoting nil
	TimeoutPropose time.Duration `mapstructure:"timeout-propose"`
	// How much timeout-propose increases with each round
	TimeoutProposeDelta time.Duration `mapstructure:"timeout-propose-delta"`
	// How long we wait after receiving +2/3 prevotes for “anything” (ie. not a single block or nil)
	TimeoutPrevote time.Duration `mapstructure:"timeout-prevote"`
	// How much the timeout-prevote increases with each round
	TimeoutPrevoteDelta time.Duration `mapstructure:"timeout-prevote-delta"`
	// How long we wait after receiving +2/3 precommits for “anything” (ie. not a single block or nil)
	TimeoutPrecommit time.Duration `mapstructure:"timeout-precommit"`
	// How much the timeout-precommit increases with each round
	TimeoutPrecommitDelta time.Duration `mapstructure:"timeout-precommit-delta"`
	// How long we wait after committing a block, before starting on the new
	// height (this gives us a chance to receive some more precommits, even
	// though we already have +2/3).
	TimeoutCommit time.Duration `mapstructure:"timeout-commit"`

The consensus paper does not appear to require this many timeout parameters. The paper describes the following relation between timeouts:

timeoutPropose(r) > 2∆+timeoutPrecommit(r−1)
timeoutPrevote(r) > 2∆ 
timeoutPrecommit(r) > 2∆

I.e., once the timeouts have grown large enough to satisfy these relations (in particular, once each exceeds 2∆), the algorithm will terminate.
Here, ∆ is a bound on the message delay between any two processes. I.e., it is assumed in the paper that if a process sends a message, it will be received by any other process in at most ∆ time. ∆ is not known in advance, so the algorithm tries to determine it by increasing the timeout each round. This per-round increase is specified as timeoutDelta in the paper.

It would therefore be possible to simply fix an initial assumed estimatedDelta and a timeoutDelta, and grow all of the timeouts until they reach the system's true ∆, instead of having a different configured timeout for each step. This would allow us to use only 2 timeout parameters instead of 7. We could still include a third parameter, TimeoutCommit, for chains that want to allow precommits to arrive from slow validators.

This has the upside of increasing the simplicity of the configuration file and of the algorithm as written in code. This is also a UX improvement for networks/operators, as keeping track of what all of these parameters do is likely confusing.

It has the downside of forcing all steps to use the same timeout. Different steps are likely to take different amounts of time. Namely, the propose step is likely to take the longest, while the prevote and precommit steps are likely to take much less time. This is because blocks are much larger than votes and will therefore take longer to gossip across the network. We could simply let networks use the longest timeout of any step as the single value, but this may slow down the prevote and precommit steps.
