
[Versions] Versioning the DAG to remove the need for premature round advancement

Open LefKok opened this issue 4 years ago • 5 comments

I gave some thought to the problem of advancing rounds with empty blocks. A potential solution is to decouple batch certification from the DAG. Instead, a primary can choose either to advance a round or to "version" the round. Versioning means that the primary sends a new block for the same round, Block(R.version_number), which extends the previous block. If it gets certified, it can be proposed upstream.

When the primary advances its round, it uses its last versioned block of the previous round as a parent, so that the DAG does not accumulate dangling blocks.
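A minimal sketch of the versioning idea, to make the block relationships concrete. The `Block` and `Primary` classes, the `(author, round, version)` identifier, and the choice to carry a version's parents over unchanged are all assumptions for illustration, not Narwhal's data structures; quorum checks and certification are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BlockId = Tuple[str, int, int]  # hypothetical id: (author, round, version)


@dataclass(frozen=True)
class Block:
    author: str
    round: int
    version: int
    payload: Tuple[str, ...]                 # batch digests carried by this block
    parents: Tuple[BlockId, ...] = ()        # blocks referenced from the previous round
    prev_version: Optional[BlockId] = None   # earlier version of the same round, if any

    @property
    def id(self) -> BlockId:
        return (self.author, self.round, self.version)


class Primary:
    def __init__(self, name: str):
        self.name = name
        self.last_block: Optional[Block] = None  # latest version of our current round

    def version_round(self, payload):
        """Propose Block(R.v+1): same round R, extending our previous version."""
        last = self.last_block
        if last is None:
            raise ValueError("nothing to version yet")
        block = Block(self.name, last.round, last.version + 1, tuple(payload),
                      parents=last.parents,  # assumption: parents carried over
                      prev_version=last.id)
        self.last_block = block
        return block

    def advance_round(self, payload, other_parents):
        """Propose Block(R+1.0), using our LAST version of round R as our own parent.

        The 2f+1 parent-quorum check is omitted here for brevity.
        """
        last = self.last_block
        rnd = last.round + 1 if last else 1
        own = (last.id,) if last else ()
        block = Block(self.name, rnd, 0, tuple(payload),
                      parents=own + tuple(other_parents))
        self.last_block = block
        return block


a = Primary("A")
b1 = a.advance_round(["d1"], [])                          # Block(1.0)
b2 = a.version_round(["d2"])                              # Block(1.1), extends Block(1.0)
b3 = a.advance_round(["d3"], [("B", 1, 0), ("C", 1, 0)])  # Block(2.0)
print(b3.parents)  # (('A', 1, 1), ('B', 1, 0), ('C', 1, 0))
```

Because Block(2.0) points at Block(1.1) rather than Block(1.0), the whole version chain of round 1 stays reachable from round 2 and no version is left dangling.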

Potential benefits of this proposal are:

  1. No empty blocks are created solely to preserve the liveness of transactions.
  2. Since no empty blocks are made, the primary no longer races with the workers for the NIC: rounds do not advance unless the worker has sent out at least one batch.
  3. Backpressure becomes simpler, as advancing a DAG round is merely a local decision.

LefKok, Mar 27 '22 14:03

So it means that if a primary is ready to advance its round (i.e., it has 2f+1 certificates from the previous round) but has no payload to propose, it can instead propose a new version of the current round (rather than advancing the round)?

asonnino, Jun 15 '22 16:06

The exact opposite. When a node has transactions to propose but not enough parents, they propose a new version of this round.
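The clarified rule can be written as a small decision function. This is a hypothetical helper that encodes only what the thread states (advance with payload and a 2f+1 parent quorum, version with payload but too few parents, and no empty blocks otherwise); it ignores the max_delay escape hatch discussed later.

```python
from enum import Enum


class Action(Enum):
    ADVANCE = "advance to round R+1"
    VERSION = "propose a new version of round R"
    WAIT = "wait"


def next_action(has_payload: bool, num_parents: int, f: int) -> Action:
    """Hypothetical decision rule for a primary under the versioning proposal."""
    quorum = 2 * f + 1  # parent certificates needed to advance a round
    if not has_payload:
        return Action.WAIT  # no empty blocks under the proposal
    if num_parents >= quorum:
        return Action.ADVANCE
    return Action.VERSION  # transactions are waiting, but not enough parents yet


for parents in (2, 3):
    print(parents, next_action(True, parents, f=1).name)
```

With f = 1, two parents yield `VERSION` and three parents yield `ADVANCE`.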

LefKok, Jun 15 '22 16:06

Got it. So this prevents a potentially ever-growing backlog of batch digests at the primaries of popular nodes, i.e., nodes that always have abundant client transactions and thus always produce non-empty batches.

Did I understand correctly? If so, don't we get the properties you mentioned at the cost of increased latency (which may be a fine tradeoff)?

asonnino, Jun 15 '22 16:06

We can still set a max_delay to produce empty blocks, but this is mostly useful for Narwhal as a mempool; otherwise we also slow down the coin flips. To answer your question: yes, this could increase latency if the load is so imbalanced that more than f nodes have no transactions for a long time.
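The "more than f idle nodes" threshold follows from quorum arithmetic. Assuming n = 3f + 1 equal-stake nodes, a quorum of 2f + 1, and that idle nodes produce no blocks at all (no max_delay fallback), this hypothetical helper checks whether the remaining active nodes can still form a quorum to advance rounds.

```python
def rounds_can_advance(f: int, idle: int) -> bool:
    """Hypothetical check: can active nodes alone gather a 2f+1 parent quorum?"""
    n = 3 * f + 1       # committee size, assuming n = 3f + 1 equal-stake nodes
    quorum = 2 * f + 1  # parent certificates needed to advance a round
    active = n - idle   # nodes that still have transactions to propose
    return active >= quorum


for f in (1, 2, 3):
    print(f, rounds_can_advance(f, f), rounds_can_advance(f, f + 1))
```

With f idle nodes, exactly 2f + 1 remain, which is just enough; with f + 1 idle nodes only 2f remain, so rounds stall until the idle nodes propose something (or a delay timer forces empty blocks).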

LefKok, Jun 15 '22 16:06

Got it. I do see the direct benefit for Narwhal as a mempool. For Bullshark/Tusk it is indeed more delicate (but once it is in the codebase, it may as well be used).

There is already a timer for that. We currently produce empty headers only if (i) we have received a quorum of parent certificates, and (ii) the timer has fired. This timer is configurable through the system's parameters.
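The existing empty-header condition can be sketched as follows. `HeaderTimer` and its `max_delay` field are hypothetical names, not the actual parameter; the assumption that the timer restarts whenever a header is proposed is also ours and is not stated in the thread.

```python
from dataclasses import dataclass


@dataclass
class HeaderTimer:
    """Sketch: an empty header needs a parent quorum AND an expired timer."""
    max_delay: float             # hypothetical parameter name, in seconds
    last_header_at: float = 0.0  # time the last header was proposed

    def may_propose_empty(self, num_parents: int, f: int, now: float) -> bool:
        has_quorum = num_parents >= 2 * f + 1         # condition (i)
        timer_fired = now - self.last_header_at >= self.max_delay  # condition (ii)
        return has_quorum and timer_fired

    def record_header(self, now: float) -> None:
        # Assumption: proposing any header restarts the timer.
        self.last_header_at = now


t = HeaderTimer(max_delay=0.1)
print(t.may_propose_empty(3, f=1, now=0.05))  # quorum, but timer not yet fired
print(t.may_propose_empty(3, f=1, now=0.2))   # both conditions hold
```

Neither condition alone is enough: a quorum without the timer, or the timer without a quorum, produces no empty header.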

asonnino, Jun 15 '22 16:06