Currently, if the relayer process exits for any reason, or if the submission flow is interrupted after a BlobTx is broadcast to Celestia (e.g. by a gRPC timeout), there is a high chance that the relayer will resubmit the same data in a new tx.
While this isn't a problem for the end-to-end flow, we want to minimize the chances of it happening for efficiency and cost reasons.
To handle the restart case:
Once we have prepared a new BlobTx and before we broadcast it to Celestia, we should update the presubmit.json file with the TxHash of the BlobTx and the current timestamp.
In BlobSubmitter::run, on startup and after initializing the client, we can then read that optional value from submission_state. If it is Some, we can use the client to confirm whether that tx exists on Celestia by polling GetTx as normal, but only for up to one minute after the recorded timestamp.
If the tx isn't confirmed, or if no in-flight tx was recorded in the presubmit file, the flow continues as it does currently. If the tx is confirmed, we just skip blocks in the main select! loop until we reach the sequencer height specified in the presubmit file.
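The restart check could look roughly like the following sketch. It is not the relayer's actual code: InFlightTx, StartupAction, confirm_in_flight, should_skip and the tx_exists closure (standing in for the client's GetTx polling) are all hypothetical names. It also assumes that the recorded sequencer height is the last height covered by the in-flight tx, and that we poll at least once even if the one-minute window has already elapsed by the time the relayer restarts.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical in-flight record, as it might be read from presubmit.json.
#[derive(Debug, Clone, PartialEq)]
pub struct InFlightTx {
    pub tx_hash: String,
    pub sequencer_height: u64,
    pub recorded_at: SystemTime,
}

/// What the relayer should do once the startup check has finished.
#[derive(Debug, PartialEq)]
pub enum StartupAction {
    /// Nothing was in flight, or the tx could not be confirmed in time.
    ResumeNormally,
    /// The tx was confirmed; skip blocks up to and including this height.
    SkipUntil(u64),
}

/// Assumed window: one minute after the recorded timestamp.
const CONFIRMATION_WINDOW: Duration = Duration::from_secs(60);

/// Polls `tx_exists` until the tx is found or the window has elapsed.
/// `tx_exists` is a stand-in for a GetTx query; `now` is injectable for tests.
pub fn confirm_in_flight(
    in_flight: Option<&InFlightTx>,
    mut tx_exists: impl FnMut(&str) -> bool,
    mut now: impl FnMut() -> SystemTime,
    poll_interval: Duration,
) -> StartupAction {
    let tx = match in_flight {
        Some(tx) => tx,
        None => return StartupAction::ResumeNormally,
    };
    let deadline = tx.recorded_at + CONFIRMATION_WINDOW;
    loop {
        // Always poll at least once, even if the deadline has already passed.
        if tx_exists(&tx.tx_hash) {
            return StartupAction::SkipUntil(tx.sequencer_height);
        }
        if now() >= deadline {
            return StartupAction::ResumeNormally;
        }
        std::thread::sleep(poll_interval);
    }
}

/// Whether a block at `height` should be skipped in the main loop.
pub fn should_skip(action: &StartupAction, height: u64) -> bool {
    match action {
        StartupAction::SkipUntil(last) => height <= *last,
        StartupAction::ResumeNormally => false,
    }
}
```

Injecting the clock and the existence check as closures keeps the deadline logic testable without a Celestia node. In the relayer itself this would presumably be async and use the real client.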
To handle the timeout case:
In submit_with_retry, we already pass a last_error_receiver to the client's try_submit method (so that it can potentially get the required fee from the error message). Rather than passing the receiver to the client, we can retrieve the error inside tryhard::retry_fn. If the error indicates a broadcast timeout, we can use the same new confirm method on the client as in the restart case (i.e. it only tries for up to one minute before assuming the broadcast was unsuccessful). If the tx is confirmed, the flow continues as normal. If the tx is not confirmed, we fall into the retry flow and attempt to resubmit from scratch.
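The timeout handling could be sketched as below. This is a simplified, synchronous stand-in that uses a plain loop instead of tryhard::retry_fn. SubmitError, submit_with_retry_sketch and both closures are hypothetical: try_submit stands in for the client's submission call and receives the previous error (e.g. for fee extraction), while confirm stands in for the new time-limited confirm method and returns the Celestia height if the tx was found.

```rust
/// Hypothetical error type; the real client's errors will differ.
#[derive(Debug, Clone, PartialEq)]
pub enum SubmitError {
    /// The broadcast timed out, so the tx may or may not have landed.
    BroadcastTimeout { tx_hash: String },
    /// Any other failure, e.g. an insufficient fee.
    Other(String),
}

/// Retries `try_submit` up to `max_attempts` times. After a broadcast
/// timeout, `confirm` is consulted first so that a tx which actually
/// landed is not resubmitted.
pub fn submit_with_retry_sketch(
    max_attempts: u32,
    mut try_submit: impl FnMut(Option<&SubmitError>) -> Result<u64, SubmitError>,
    mut confirm: impl FnMut(&str) -> Option<u64>,
) -> Result<u64, SubmitError> {
    let mut last_error: Option<SubmitError> = None;
    for _ in 0..max_attempts {
        let error = match try_submit(last_error.as_ref()) {
            Ok(height) => return Ok(height),
            Err(error) => error,
        };
        if let SubmitError::BroadcastTimeout { tx_hash } = &error {
            // The tx may have been included despite the timeout.
            if let Some(height) = confirm(tx_hash.as_str()) {
                return Ok(height);
            }
        }
        // Not confirmed, or a different error: retry from scratch.
        last_error = Some(error);
    }
    Err(last_error.unwrap_or_else(|| SubmitError::Other("no attempts made".to_string())))
}
```

Keeping the last error in the retry loop, rather than sending it to the client through a receiver, means the decision to confirm or resubmit is made in one place.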