
[timestamp MVP] Add a timestamp to Narwhal blocks

Open huitseeker opened this issue 3 years ago • 13 comments

Description of the simplest thing that could possibly work

We would like to add a basic timestamp to the Narwhal block header (in the context of an external consensus, "Narwhal collection headers").

The simplest approach would be recursive, using the parents of the block itself: each block has 2f+1 parents, each of which has a timestamp, and the median of these is necessarily a value within the interval of honest timestamps (the median has a breakdown point of 50%, and a 2f+1 sample is majority-honest).

One simple approach would be to ask the validators to produce a timestamp on header minting that is greater than the median of their parents.

The timestamp would be considered correct, and voted on as part of header verification, if it is not more than 15s in the future by the time it arrives at validators.
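A minimal sketch of the two rules above (median-of-parents at minting, 15s future bound at verification); the constant, function names, and `u64`-seconds representation are all illustrative, not Narwhal's actual API:

```rust
/// Maximum tolerated clock-ahead drift, following the Ethereum-derived 15s bound.
const MAX_FUTURE_DRIFT_SECS: u64 = 15;

/// Median of the 2f+1 parent timestamps. With a breakdown point of 50% and a
/// majority-honest sample, the result lies within the honest timestamp interval.
fn parent_median(mut parent_timestamps: Vec<u64>) -> u64 {
    parent_timestamps.sort_unstable();
    parent_timestamps[parent_timestamps.len() / 2]
}

/// On header minting: pick a timestamp strictly greater than the parents' median
/// (normally the local clock, bumped up if the clock lags the median).
fn mint_timestamp(parent_timestamps: Vec<u64>, local_now: u64) -> u64 {
    local_now.max(parent_median(parent_timestamps) + 1)
}

/// On header verification: vote only if the timestamp exceeds the parents'
/// median and is not more than 15s ahead of the verifier's local clock.
fn timestamp_acceptable(header_ts: u64, parent_timestamps: Vec<u64>, local_now: u64) -> bool {
    header_ts > parent_median(parent_timestamps)
        && header_ts <= local_now + MAX_FUTURE_DRIFT_SECS
}
```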

Synchronicity

The requirement of 15s is derived from Ethereum practice. It's got a lot of mileage and should not pose practical problems. Note that a key to not introducing any synchronicity assumptions is that the block should not be rejected if it is > 15s in the future. Instead, it should be "parked" for later examination (at a time when it's no longer > 15s in the future).

Since every block proposal has a round, and those rounds advance once the voter has received 2f+1 certificates, the "shelf" of blocks with a timestamp too high to be voted on should be cleared:

  • either through re-examinations,
  • or when the round advances, whichever comes first.
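The parking-and-clearing behavior described above could be sketched as follows; the `Shelf` type and its methods are hypothetical names for illustration, not the synchronizer's real types:

```rust
/// Illustrative shelf for headers whose timestamp is too far in the future.
/// Entries are (round, timestamp) pairs; timestamps are in seconds.
struct Shelf {
    parked: Vec<(u64, u64)>,
}

impl Shelf {
    fn new() -> Self {
        Shelf { parked: Vec::new() }
    }

    /// Park a header instead of rejecting it when its timestamp is > 15s ahead.
    fn park(&mut self, round: u64, ts: u64) {
        self.parked.push((round, ts));
    }

    /// Re-examination: release headers that are no longer > 15s in the future.
    fn release_mature(&mut self, local_now: u64) -> Vec<(u64, u64)> {
        let (ready, still_parked): (Vec<_>, Vec<_>) = self
            .parked
            .drain(..)
            .partition(|&(_, ts)| ts <= local_now + 15);
        self.parked = still_parked;
        ready
    }

    /// Round advance: drop anything parked for rounds we have moved past.
    fn clear_old_rounds(&mut self, current_round: u64) {
        self.parked.retain(|&(round, _)| round >= current_round);
    }
}
```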

Modifications

The header creation is here: https://github.com/MystenLabs/narwhal/blob/2572c24b034a447a0b23e5df08843d733e868b09/primary/src/proposer.rs#L82-L105

It should integrate a timestamp creation.

The header verification is here: https://github.com/MystenLabs/narwhal/blob/2572c24b034a447a0b23e5df08843d733e868b09/types/src/primary.rs#L149-L173

It should integrate timestamp verification as indicated above, and gate voting on the header by returning a sentinel error (aka a rust Error enum with an appropriate variant) to this caller: https://github.com/MystenLabs/narwhal/blob/2572c24b034a447a0b23e5df08843d733e868b09/primary/src/core.rs#L384-L388

This should change the return type of verify to return that sentinel error.
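One way the sentinel-error plumbing could look; the enum, variant, and `verify` shape here are assumptions for illustration, not the actual types in `types/src/primary.rs`:

```rust
use std::fmt;

/// Hypothetical error type; a real change would add a variant like this to
/// the existing verification error enum rather than define a new one.
#[derive(Debug, PartialEq)]
enum HeaderError {
    /// Sentinel: the timestamp is more than 15s ahead of the verifier's clock,
    /// so the caller should shelve the header for re-examination, not reject it.
    TimestampTooFarInFuture { header_ts: u64, local_now: u64 },
}

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HeaderError::TimestampTooFarInFuture { header_ts, local_now } => write!(
                f,
                "header timestamp {} exceeds local time {} by more than 15s",
                header_ts, local_now
            ),
        }
    }
}

/// Sketch of the gating check: verification surfaces the sentinel so the
/// caller in core.rs can distinguish "park" from "reject".
fn verify_timestamp(header_ts: u64, local_now: u64) -> Result<(), HeaderError> {
    if header_ts > local_now + 15 {
        return Err(HeaderError::TimestampTooFarInFuture { header_ts, local_now });
    }
    Ok(())
}
```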

For remediation, we can take inspiration from the "temporary shelving and re-execution" pattern we have implemented for the block synchronizer (when we're trying to import a certificate and don't yet have all its parents): https://github.com/MystenLabs/narwhal/blob/410ed5d51b7bd2a1e67fd8b2960986c39be58da9/primary/src/synchronizer.rs#L50

Side-effect

Since this interacts with local time, it becomes a core performance issue that the node has an accurate local time in the first place. We should recommend and document that nodes keep accurate time, e.g. using an NTP client (or better).

Beware: since Docker does not run with the CAP_SYS_TIME capability enabled by default, it's useless to integrate time synchronization in software on our side. We will have to cover this in the "running a node" instructions.

huitseeker avatar Jun 02 '22 17:06 huitseeker

Another way to derive the timestamp would be to assume each block's timestamp is the median of the parent round's timestamps plus some "round_delay". Round_delay could be the average of the differences between the median timestamps of some number of previous rounds.

Then we wouldn't need all this logic for blocks that are too far in the future, since they anyway affect neither the median nor the round_delay.

LefKok avatar Jun 02 '22 20:06 LefKok

@LefKok It's a bit more complicated than what I propose (at worst, parking a bounded number of certificates of the current round), but I'm not against it. However: how does this ensure we have some "closeness" of the timestamp time with wall clock time (which is what smart contract users want in practice)? It seems that as long as the prior round timestamps drift together, they can drift arbitrarily far.

huitseeker avatar Jun 02 '22 23:06 huitseeker

The drift seems orthogonal to the 15 seconds, since the 15 seconds are also relative to local time. For this part, we will need some external clock synchronization, whether a smart-contract oracle inputting GMT on-chain or some lower-level protocol (NTP).

LefKok avatar Jun 03 '22 08:06 LefKok

For this part, we will need some external clock-synchronization

I must have missed something: you described a timestamp computation mechanism where all the inputs are in the rounds of the past (some median over parent timestamps, plus a duration, itself computed as an average statistic over prior rounds). That sounds like an endogenous computation, which I'm worried may be entirely determined by the bootstrapping values in the initial rounds. But I may have gotten it wrong?

There's nothing wrong with logical clocks, but I don't think that's what we're trying to solve here.

huitseeker avatar Jun 03 '22 11:06 huitseeker

The blocks still have a real-time (UTC) timestamp. I agree with you that this, if left unchecked, can lead to arbitrary drift from actual UTC. Hence, as you mentioned in your proposal:

"node needs an accurate local time in the first place. We should recommend and document that nodes should have an accurate time, e.g. using an NTP client (or better)."

This is no longer performance-critical, but if we use it as a real-time input for smart contracts then it needs to happen as well.

LefKok avatar Jun 03 '22 12:06 LefKok

@LefKok I don't think that's the problem, if I understood your proposal right.

Let's formalize what you're suggesting above a bit (which should let you describe where I'm misinterpreting):

  • the block timestamp is the median of its parents, plus
  • a block interval, which is the average of inter-block delays (timestamp differences) along all parenthood edges in the last five rounds of our lineage (transitive closure of the block's parenthood).

Let's look at this starting from the happy case: at first, all nodes have perfect (atomic) clocks, and produce blocks perfectly synchronously (i.e. the inter-block delays are all equal) every 200ms, there is no Byzantine influence, and the whole network advances in sync: on the first round everyone timestamps at midnight, and for subsequent rounds, the timestamp advances by 200ms. The timestamp is perfectly in sync with the wall clock.

Let's now assume at noon (round N = 216000) there is a real-world delay in block production, and that it takes 300ms for everyone to produce a block, even if everyone stays in sync. The timestamp calculation does not change a bit, since past medians and past intervals are not affected by this new delay. So

  • the timestamp at round N+1 is 12:00.200, 100ms in the past,
  • the timestamp at round N+2 is 12:00.400, 200ms in the past,
  • the timestamp at round N+3 is 12:00.600, 300ms in the past ...

The drift with real time increases by 100ms per round, and that's despite every node being honest & equipped with an atomic clock, simply because there is no correcting factor involved in the computation. Hence my thinking there is something I missed in your proposal.
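The drift arithmetic above can be checked numerically; the constants (200ms timestamp step, 300ms real delay) come from the scenario in this comment, and the function is purely illustrative:

```rust
/// Per the scenario above: after round N, the endogenous timestamp keeps
/// advancing by the historically derived 200ms per round, while wall-clock
/// time advances by the new real delay of 300ms per round. All values in ms.
fn drift_after(rounds: u64) -> u64 {
    let timestamp_step_ms = 200; // derived from past medians/intervals, unchanged
    let wall_clock_step_ms = 300; // the actual post-noon block production delay
    (wall_clock_step_ms - timestamp_step_ms) * rounds
}
```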

The point of defining a timestamp as valid if it's anywhere within the interval [median of past round, local clock of recipient + 15s] is that honest nodes can use the breadth of this interval to act as controllers on every round. The issue I see with a proposal that functionally derives the timestamp from past rounds is that the only degree of freedom is in the initial rounds, after which this is a logical clock with counters that happen to be formatted according to RFC 3339[^1].

[^1]: that's not entirely true: if there is enough variability in the initial timestamps, a clever parenthood choice may affect the outcome of a block timestamp. But then that parenthood choice starts having an impact on performance.

huitseeker avatar Jun 03 '22 13:06 huitseeker

Thanks for the breakdown! I still do not clearly see how this calculation happens, but I probably didn't make the description clear enough.

What I meant is that every block gets the median of its parents' local_time clocks (which is the external input and does not rely on the past), plus some delay that captures the necessary transmission delay (this second part is not strictly necessary). So instead of this:

  • the timestamp at round N+1 is 12:00.200, 100ms in the past,
  • the timestamp at round N+2 is 12:00.400, 200ms in the past,
  • the timestamp at round N+3 is 12:00.600, 300ms in the past ...

I would expect:

  • the timestamp at round N+1 is 12:00.200, 100ms in the past,
  • the timestamp at round N+2 is 12:00.300 (median of round N+1 local clocks since it took 300ms until they actually entered round N+1) + 200ms= 12:00.500, 100ms in the past,
  • the timestamp at round N+3 is 12:00.600 (median of round N+2 local clocks since it took 2*300ms until they actually entered round N+2) + 200ms = 12:00.800, 100ms in the past

until eventually the averaging catches up to adding 300ms, at which point we will be accurate.
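In code, the clarified rule might look like the sketch below; the function names and the fixed-window average are illustrative assumptions, not a concrete design:

```rust
/// Median helper over an odd-sized sample (e.g. 2f+1 parents).
fn median(mut xs: Vec<u64>) -> u64 {
    xs.sort_unstable();
    xs[xs.len() / 2]
}

/// The block timestamp is the median of the parents' *local clock readings*
/// (fresh external input every round, so the real 300ms delay feeds back in),
/// plus a round_delay averaged over recent inter-round gaps. Values in ms.
fn block_timestamp(parent_local_clocks: Vec<u64>, recent_inter_round_gaps: &[u64]) -> u64 {
    let round_delay =
        recent_inter_round_gaps.iter().sum::<u64>() / recent_inter_round_gaps.len() as u64;
    median(parent_local_clocks) + round_delay
}
```

For example, with parent local clocks around 12:00.300 (here 300ms past noon) and a learned round_delay of 200ms, the block timestamp lands at 12:00.500, matching the round N+2 bullet above.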

Would this make sense?

LefKok avatar Jun 03 '22 18:06 LefKok

Let me start with a stupid question, why do we want to add timestamps into Narwhal? Shouldn’t we have timestamps at the consensus layer?

For the Tusk/Bullshark protocols, it seems the timestamps that matter are those of the leader certificates (all the certificates in their sub-dag can be assumed to have the same timestamp as the leader, since they will be committed at that time).

For traditional leader-based consensus, it seems we should add the timestamp directly into the consensus protocol (e.g. in the QC of HotStuff).

asonnino avatar Jun 03 '22 18:06 asonnino

So for instance, for Tusk, why can't we simply add a timestamp to the certificates (e.g. by embedding timestamps in the votes and taking the median when interpreting the certificate's timestamp)? Then upon consensus, we take the timestamp of the leader certificate and apply it to its whole sub-dag. What am I missing?

asonnino avatar Jun 03 '22 18:06 asonnino

@asonnino

  • For Tusk specifically, there should be no certificates since it is an implicit consensus (am I missing something?). The timestamp would be the timestamp of the head of the DAG which gets us back to the question "What is the timestamp of a block".

  • If we were to run Hotstuff (not happening) we could say the timestamp of all transactions is the median timestamp of the QC, however, with Narwhal as the underlying mempool we would have hundreds of thousands of transactions that look "concurrent".

  • The QC solution is nothing more than a special case of the median of parents solution described above, where the DAG is collapsed to 2f+1 links to copies of the same data (parent block) with a different timestamp.

  • Finally, Narwhal could provide a much more fine-grained and accurate time for Move smart contracts than Tusk, especially for time-sensitive applications. If we take it one step further we could investigate some notion of Order Fairness over Narwhal as well (where transaction duplication is actually desired)

LefKok avatar Jun 07 '22 13:06 LefKok

Yes, the question is indeed what you consider to be the timestamp of the block. I guessed we are trying to approximate the time of the transaction's execution (or, to be precise, the time when 2f+1 validators executed the transaction).

  • For Tusk specifically, there should be no certificates since it is an implicit consensus (am I missing something?). The timestamp would be the timestamp of the head of the DAG [..].

Exactly that, I meant the timestamp of the head of the sub-DAG that commits the transaction.

  • If we were to run Hotstuff (not happening) we could say the timestamp of all transactions is the median timestamp of the QC, however, with Narwhal as the underlying mempool we would have hundreds of thousands of transactions that look "concurrent".

We would indeed have many transactions with the same timestamp (same as Tusk actually). Is this a problem? We can always differentiate those transactions using the totally ordered sequence (the DAG's linearisation).

  • Finally, Narwhal could provide a much more coarse-grained and accurate time for Move smart contracts than Tusk, especially for time-sensitive applications. If we take it one step further we could investigate some notion of Order Fairness over Narwhal as well (where transaction duplication is actually desired)

How? Do we assume pre-execution? Otherwise how can Narwhal guess the execution time (or even the commit time)?

asonnino avatar Jun 07 '22 14:06 asonnino

I guessed we are trying to approximate the time of the transaction's execution

I don't see much value in knowing the transaction execution time (especially if there are invalid transactions in the DAG, which means that proofs need to be against state commitments post-consensus), since this is internal and irrelevant to any MEV challenges. Even for the blockchain explorer, the most natural definition would be inclusion time, later linked to the effects when the block is executed.

In my mind transaction creation time is the holy grail which of course we cannot know in a distributed system. We try to go as close as possible with transaction inclusion in the DAG which is what I think is the goal of this discussion (@huitseeker, @gdanezis feel free to chime in). In any case, as discussed above the problem in Tusk is equivalent so significant discussion at this point might not be necessary. Let's have fine-grained timestamps per block and the dapp can decide to use the timestamp of the committing block or the timestamp of the inclusion in a block.

How? Do we assume pre-execution? Otherwise, how can Narwhal guess the execution time (or even the commit time)?

Think of a smart contract that reads current_time() and executes something. What is current_time()? I would assume the time of the block containing the transaction that calls the smart contract, but as mentioned before we can let either the Move module or the developer decide.

LefKok avatar Jun 08 '22 09:06 LefKok