Conversation
|
Making this a network parameter is great! I fully support doing that! I actually think we can do one-step better as well though. Right now a block's time is roughly set by 4 steps
Each of the steps could have variable delay for whatever reason. It would be nice if we made timeout_commit react to the time-since-last-block. In PBTS, we could do this by saying, block N+1 proposer should wait And then we lower the |
What I like about the current proposal is that it doesn't restrict the ABCI dev in any way. There's no |
|
Oh true, didnt think about that. That sounds great to me! |
cason
left a comment
There was a problem hiding this comment.
I think that the ADR can be extended/reworded as follows:
- The problem description should make it more precise what we mean by predictable block times. If I understand correctly, we want the medium-to-long term average block time to converge to a desired value. To achieve that, we need to define a form of minimum block time, but also be able to changing it dynamically. Namely, if a block takes longer than expected, we should be able to render the next block(s) faster in order to converge to the desired (average) block time.
- Properly define how
timeout_commitworks, at which extent it can be adopted to achieve the predictable block times goal, and what are its limitations. In addition to the one explained in the text (it is not a global parameter), we have the fact that it is static. - There is a confusion in the text, that I assume also comes from the confusing in the operation of
timeout_commit, between block time and the delay between committing a block and proposing the next block. And this is a main concern: we cannot precisely predict the consensus delay for a block, but we can establish a minimum duration for the processing step of a block, i.e., the time from which consensus decides it and the next consensus instance effectively starts. - We need to make it clear that this approach is always best-effort, not only because delays are variable (by block size, by system size, by network conditions), but also because we have nodes that don't follow the protocol. By the way, running a validator node with a
timeout_commitparameter that differs from the one defined for the network is a form of misbehavior.
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
| `next_block_delay`. This field's semantics stays essentially the same: delay | ||
| between this block and the time when the next block is proposed. | ||
|
|
||
| If Comet goes into multiple rounds of consensus, the ABCI application can react |
There was a problem hiding this comment.
Not sure if the application is aware of the commit round. In any case, different nodes may commit at different rounds.
There was a problem hiding this comment.
different nodes may commit at different rounds.
could you please explain?
There was a problem hiding this comment.
decided_last_commit https://docs.cometbft.com/v0.38/spec/abci/abci++_methods#finalizeblock ?
There was a problem hiding this comment.
A node can commit at round 0, other might decide the same block on block 1. This happens if part of the nodes don't see the precommits of round 0 timely and proceed to round 1.
The canonical commit, in fact, is the one observed by the proposer (for the previous block) and added to the proposed block.
There was a problem hiding this comment.
Not sure if the application is aware of the commit round.
does the ABCI dev need it? They can simply use real time, no?
There was a problem hiding this comment.
Yes, it should use real time, in particular because the duration of rounds can be pretty variable.
There was a problem hiding this comment.
But the problem of real time is, of course, that different nodes will process a block at (slightly) different times.
There was a problem hiding this comment.
I have read up to here, so what I say here might be redundant with text below.
My impression is that most applications, in order to leverage this feature:
- will need to use real --wallclock-- time (having Berachain's use case in mind, where you want to conform to regular wall-clock deadlines)
- will mandate its nodes to have synchronized clocks (NTP, or other). This is not a big deal since we also require this for PBTS
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
| Move `timeout_commit` into `FinalizeBlockResponse` as `next_block_delay`. This | ||
| field's semantics stays essentially the same: delay between the time when the | ||
| current block is committed and the next height is started. The idea is | ||
| literally to have the same behavior as `timeout_commit`, while allowing the | ||
| application to pick a different value for each height. |
There was a problem hiding this comment.
fwiw, if I'm interpreting this correctly we've tried this here
in our network tests, everything worked as expected and we have very consistent block times. However when we shipped this to a testnet, we frequently failed to reach consensus.
replicating and debugging this issue by changing the clocks of the network instances is next on our list of todos after we switch network testing backends celestiaorg/celestia-core#1333
There was a problem hiding this comment.
Thanks for flagging this. We will keep this in mind when implementing this feature.
sergio-mena
left a comment
There was a problem hiding this comment.
Thanks @melekes for this ADR!
I support this way of dealing with timeout_commit over the alternative that has been discussed internally in the past months: moving timeout_commit to ConsensusParams. Your solution is more versatile and powerful for apps and I believe the upgrade path can be made painless (unlike the "move timeout_commit to ConsensusParams" solution).
| `timeout_commit` dynamically (if such feature existed) to give the state | ||
| machine some extra time to finish execution. |
There was a problem hiding this comment.
If a block takes a lot of time to execute, that time is expected to be roughly the same at all nodes (although there may be HW differences). So I don't clearly see how increasing timeout_commit in this case would help. While all nodes are executing a long block, the consensus algorithm stays blocked until FinalizeBlock returns.
There was a problem hiding this comment.
Agree. I copied this from tendermint/tendermint#5911 (comment). @ValarDragon @sunnya97 can you please clarify what you're hoping to archive in tendermint/tendermint#5911?
There was a problem hiding this comment.
Because timeout_commit encompasses the block execution.
So, faster nodes will start next height (proposing blocks or expecting for proposals) before slower nodes.
| `next_block_delay`. This field's semantics stays essentially the same: delay | ||
| between this block and the time when the next block is proposed. | ||
|
|
||
| If Comet goes into multiple rounds of consensus, the ABCI application can react |
There was a problem hiding this comment.
I have read up to here, so what I say here might be redundant with text below.
My impression is that most applications, in order to leverage this feature:
- will need to use real --wallclock-- time (having Berachain's use case in mind, where you want to conform to regular wall-clock deadlines)
- will mandate its nodes to have synchronized clocks (NTP, or other). This is not a big deal since we also require this for PBTS
docs/references/architecture/adr-115-predictable-block-times.md
Outdated
Show resolved
Hide resolved
as `next_block_delay` ADR-115: #2966
as `next_block_delay` ADR-115: #2966 Closes #2655 --- #### PR checklist - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Sergio Mena <sergio@informal.systems>
…bft#3089) as `next_block_delay` ADR-115: cometbft#2966 Closes cometbft#2655 --- - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Sergio Mena <sergio@informal.systems>
…bft#3089) as `next_block_delay` ADR-115: cometbft#2966 Closes cometbft#2655 --- - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Sergio Mena <sergio@informal.systems>
as `next_block_delay` ADR-115: #2966 Closes #2655 --- - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Sergio Mena <sergio@informal.systems>
…bft#3089) as `next_block_delay` ADR-115: cometbft#2966 Closes cometbft#2655 --- - [ ] ~~Tests written/updated~~ - [x] Changelog entry added in `.changelog` (we use [unclog](https://github.com/informalsystems/unclog) to manage our changelog) - [x] Updated relevant documentation (`docs/` or `spec/`) and code comments - [x] Title follows the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) spec --------- Co-authored-by: Sergio Mena <sergio@informal.systems> (cherry picked from commit 259793a)
RENDERED
Refs #2655
PR checklist
Tests written/updatedChangelog entry added in.changelog(we use unclog to manage our changelog)docs/orspec/) and code comments