[Versions] Versioning the DAG to remove the need for premature round advancement
I gave some thought to the problem of advancing rounds with zero-blocks. A potential solution is to decouple batch certification from the DAG. Under this scheme, a primary can choose either to advance a round or to "version" the round. Versioning means that the primary sends a new block for the same round, Block(R.version_number), which extends its previous block for that round. If it gets certified, it can be proposed upstream.
When a primary advances its round, it uses its latest versioned block from the previous round as a parent, so that the DAG does not accumulate many dangling blocks.
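To make the parent-selection rule concrete, here is a minimal Python sketch. The `Block` type and the `latest_versions` helper are hypothetical names, not taken from the Narwhal codebase. The proposal only states the rule for a primary's own block; applying "highest version wins" to every author's blocks is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Block:
    """Hypothetical certified block, identified by (author, round, version)."""
    author: str
    round: int
    version: int  # 0 for the first block of a round, +1 for each new version


def latest_versions(certified: Iterable[Block], round: int) -> List[Block]:
    """Return, for each author, the highest-version certified block of `round`.

    Assumption: these are the blocks a primary would reference as parents
    when it advances to `round + 1`.
    """
    best: Dict[str, Block] = {}
    for block in certified:
        if block.round != round:
            continue  # ignore blocks from other rounds
        current = best.get(block.author)
        if current is None or block.version > current.version:
            best[block.author] = block
    # Sort by author so the result is deterministic.
    return sorted(best.values(), key=lambda b: b.author)
```

Referencing only the latest version per author means earlier versions stay reachable through the version chain instead of being left as dangling blocks.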
Potential benefits of this proposal:
- No empty blocks are made in order to help the liveness of transactions
- Without empty blocks, the primary no longer races with the workers for the NIC, since rounds do not advance unless the worker has sent out at least one batch
- Backpressure becomes less complex, as advancing a DAG round is merely a local decision
So it means that if a primary is ready to advance the round (i.e., it has 2f+1 certificates from the previous round) but has no payload to propose, it can instead propose a new version of the current round (rather than advancing the round)?
The exact opposite. When a node has transactions to propose but not enough parents, it proposes a new version of its block for the current round.
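As a rough illustration of this rule, the decision could be sketched in Python as below. The names `Action` and `next_action` are hypothetical. Returning `WAIT` when there is no payload is an assumption based on the earlier point that rounds do not advance unless a worker has sent out at least one batch.

```python
from enum import Enum


class Action(Enum):
    ADVANCE = "advance"  # move to the next round
    VERSION = "version"  # re-propose in the same round with a higher version
    WAIT = "wait"        # nothing to propose yet


def next_action(has_payload: bool, parent_certs: int, f: int) -> Action:
    """Decide what a primary does next, tolerating up to `f` faulty nodes."""
    quorum = 2 * f + 1
    if not has_payload:
        # Assumption: no empty blocks, so a primary without payload waits.
        return Action.WAIT
    if parent_certs >= quorum:
        # Enough parents from the previous round: advance as usual.
        return Action.ADVANCE
    # Payload is ready but parents are missing: version the current round.
    return Action.VERSION
```

For example, with `f = 1` a node holding payload but only two parent certificates would version its current block, and would advance once a third certificate arrives.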
Got it. So this prevents a potentially ever-growing backlog of batch digests on the primaries of popular nodes. Popular nodes are nodes that always have abundant client transactions and thus always produce non-empty batches.
Did I get it correctly? If so, aren't we getting the properties you mentioned at the cost of increasing latency (which may be a fine tradeoff)?
We can still set a max_delay after which empty blocks are produced, but this is mostly useful for Narwhal as a mempool; otherwise we also slow down coin flips. To answer your question: yes, this could increase latency if the load is so imbalanced that more than f nodes have no transactions for a long time.
Got it. Indeed I see the direct benefit for Narwhal as mempool. For Bullshark/Tusk it is more delicate indeed (but once in the codebase, it may as well be used).
There is already a timer for that. We currently produce empty headers only when (i) we have received a quorum of parent certificates and (ii) the timer has triggered. This timer is configurable through the system's parameters.
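The two conditions can be written as a small predicate. This Python sketch only illustrates the rule as described here. The function name and parameters are hypothetical and do not mirror the actual implementation.

```python
def should_propose_empty_header(parent_certs: int, f: int, timer_expired: bool) -> bool:
    """Return True only if both conditions for an empty header hold.

    (i)  a quorum (2f + 1) of parent certificates has been received, and
    (ii) the configurable header timer has triggered.
    """
    has_quorum = parent_certs >= 2 * f + 1
    return has_quorum and timer_expired
```

Neither condition alone is enough: a quorum without the timer keeps waiting for payload, and a timer without a quorum cannot form a valid header.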