Upon receiving +2/3 `nil` precommits, the consensus state-machine does not proceed to the step `Propose(H,R+1)` as specified in the documentation; instead, it waits for a timeout (#1431)

## Bug Report

### Setup

**CometBFT version**: `v0.37.0`, and also present on `main` at the time of writing. Code links point to `v0.37.x` to illustrate the issue.

**Have you tried the latest version**: yes

**ABCI app**: N/A

**Environment**: N/A

**node command runtime flags**: N/A

### What happened?

A validator took a long time to sync to a high round number.

This likely has downstream effects, which I haven't formally proven. For example, I suspect it could be impossible (or at least very slow, with catch-up possible only during the precommit step) for a validator to reach the latest round if +2/3 of the other validators are online and incrementing the round number.

While the algorithm theoretically provides ways for a validator to skip ahead to future rounds once it sees +2/3 prevotes or precommits for those rounds, in reality the validator may be reading serially from the consensus WAL, which means it must process all earlier messages before processing newer (that is, higher-round) ones. Because the validator is incorrectly forced to wait out the timeout in each round, this serial processing is slowed considerably and the validator cannot process the latest messages in a timely manner. The problem also carries over once the chain resumes increasing in height: if the validator is processing a long WAL backlog, it would be slow to catch up to the latest height because it would be waiting for timeouts on rounds from previous heights.

### What did you expect to happen?

The validator should sync to the latest round quickly after receiving (via gossip, or by reading from the WAL) +2/3 `nil` precommits for every round before the latest one.

### How to reproduce it

Reproduced by halting a chain after several completed rounds of +2/3 `nil` precommits. This could be done, for example, by hacking the code to reject all block proposals.

Then, restart a validator node at the latest height and round 0. It takes a long time to sync to the highest round because it always times out in the `Precommit` step, even after seeing +2/3 `nil` precommits. Instead, it should proceed directly to the next round.

This is further exacerbated by the timeout periods, which increase linearly per round, so the total time spent catching up to the latest round grows as `round^2`.

### Where is the code

The incorrect code seems to be [here](https://github.com/cometbft/cometbft/blob/v0.37.x/consensus/state.go#L2224)

The spec can be found [here](https://github.com/cometbft/cometbft/blob/v0.37.x/spec/consensus/consensus.md?plain=1#L184)
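The difference between the reported behavior and the documented one can be summarized as a small decision function. This is a hypothetical, simplified Go model, not CometBFT's actual `state.go`; the `Tally` type, the `Step` values, and `nextStep` are illustrative names. It assumes three rules: +2/3 precommits for a block lead to `Commit`, +2/3 `nil` precommits lead directly to `Propose(H,R+1)` as the spec describes, and any other +2/3 mix of precommits waits for the precommit timeout. Setting `followSpec` to `false` models the behavior reported in this issue, where a `nil` majority also falls through to the timeout.

```go
package main

import "fmt"

// Step is a hypothetical label for the state-machine step entered next.
type Step string

const (
	StepNone          Step = ""               // not enough precommits yet
	StepPrecommitWait Step = "PrecommitWait"  // wait for the precommit timeout
	StepCommit        Step = "Commit"         // +2/3 precommits for a block
	StepNextRound     Step = "Propose(H,R+1)" // move directly to the next round
)

// Tally is a hypothetical summary of the precommits seen for one round.
type Tally struct {
	TwoThirdsMajority bool // +2/3 precommits for a single value (a block or nil)
	MajorityIsNil     bool // that single value is nil
	TwoThirdsAny      bool // +2/3 precommits in total, for any mix of values
}

// nextStep returns the step entered after tallying precommits.
// With followSpec=false, a nil majority falls through to the generic
// "+2/3 any" branch and waits for the timeout, as reported in this issue.
func nextStep(t Tally, followSpec bool) Step {
	switch {
	case t.TwoThirdsMajority && !t.MajorityIsNil:
		return StepCommit
	case t.TwoThirdsMajority && t.MajorityIsNil && followSpec:
		return StepNextRound
	case t.TwoThirdsAny:
		return StepPrecommitWait
	default:
		return StepNone
	}
}

func main() {
	nilMajority := Tally{TwoThirdsMajority: true, MajorityIsNil: true, TwoThirdsAny: true}
	fmt.Println("reported:", nextStep(nilMajority, false)) // PrecommitWait
	fmt.Println("spec:    ", nextStep(nilMajority, true))  // Propose(H,R+1)
}
```

In this model, the fix amounts to handling a `nil` majority as its own case before the generic "+2/3 any" branch, instead of letting it fall through to the timeout.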

This code was introduced in this [PR](https://github.com/tendermint/tendermint/pull/2540/files). A [comment](https://github.com/tendermint/tendermint/pull/2540/files#r1250796562) on that PR suggests that this should be changed.