Calling `engine_preparePayload` in advance

## Description

In a post-merge beacon chain, a CL (consensus layer/eth2) node will need to call two functions in order to prepare a block:

- [`engine_preparePayload`](https://github.com/ethereum/execution-apis/blob/v1.0.0-alpha.2/src/engine/interop/specification.md#engine_preparepayload): returns a `payloadId`.
- [`engine_getPayload`](https://github.com/ethereum/execution-apis/blob/v1.0.0-alpha.2/src/engine/interop/specification.md#engine_getpayload): accepts a `payloadId`.

The ultimate goal of these two calls is to return an [`ExecutionPayload`](https://github.com/ethereum/consensus-specs/blob/v1.1.2/specs/merge/beacon-chain.md#executionpayload), which is effectively an execution (eth1) block to be included in a consensus (eth2) block.

The preparePayload and getPayload calls are separate so that CL nodes can give the EL (execution layer/eth1) nodes some time to prepare the payload (i.e., find the best set of transactions they can). To this end, in the ideal case we should call preparePayload some time before we call getPayload.
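To make the two-step flow concrete, here is a toy sketch with an in-memory stand-in for the EL. `MockExecutionEngine`, its `work` method and the `tx_budget` parameter are all hypothetical names invented for illustration; none of this is the real engine API, which is a JSON-RPC interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MockExecutionEngine:
    """Hypothetical in-memory stand-in for an EL node (not a real client API)."""

    mempool: List[str]
    _jobs: Dict[int, dict] = field(default_factory=dict)
    _next_id: int = 0

    def prepare_payload(self, parent_hash: str) -> int:
        """Start building on top of `parent_hash` and return a payloadId."""
        payload_id = self._next_id
        self._next_id += 1
        self._jobs[payload_id] = {"parent_hash": parent_hash, "transactions": []}
        return payload_id

    def work(self, payload_id: int, tx_budget: int) -> None:
        """Simulate build time: move up to `tx_budget` transactions into the payload."""
        job = self._jobs[payload_id]
        for _ in range(min(tx_budget, len(self.mempool))):
            job["transactions"].append(self.mempool.pop(0))

    def get_payload(self, payload_id: int) -> dict:
        """Return whatever has been built so far for `payload_id`."""
        return self._jobs.pop(payload_id)


engine = MockExecutionEngine(mempool=["tx_a", "tx_b", "tx_c"])
payload_id = engine.prepare_payload(parent_hash="0xabc")
engine.work(payload_id, tx_budget=2)  # time passes between the two calls
payload = engine.get_payload(payload_id)
```

The longer the gap between `prepare_payload` and `get_payload`, the more `work` the mock engine gets to do, which is the whole point of calling preparePayload early.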

The purpose of this issue is to establish when the CL nodes should call preparePayload and to consider the engineering requirements for CL implementations (e.g., Lighthouse).

## When to call preparePayload

Let's start with three basic constraints about when and how to call preparePayload:

1. preparePayload only needs to be called if we expect to call getPayload during some slot `s`.
    - I.e., only call preparePayload if a beacon node (BN) expects to *propose* a block in slot `s`.
1. Since preparePayload accepts a `parentHash`, we can only call it *after* we know the parent of the block at slot `s`.
    - I.e., preparePayload needs to be called sometime during slot `s - 1`.
1. preparePayload parameters are determined by what we expect to be the canonical head block at the start of slot `s`.

Given these constraints, we could say that preparePayload should be called whenever the canonical head changes during slot `s - 1`.

But alas, there is an edge-case. What if the node never receives a block at slot `s - 1` (i.e., `s - 1` is a "skip slot")? The head could remain unchanged (e.g. the block at slot `s - 2`) and therefore we'd never call preparePayload.

In light of skip slots, it seems we may need to decide at some point during slot `s - 1` that we're probably not going to get a block and that we should call preparePayload with the current head (e.g. `s - 2`). This point would be the threshold at which we assume there is a skip slot, so let's call it `assumed_skip_slot_threshold`.

We can now form a general definition of when to call preparePayload:

### General definition

If a CL node expects to propose a block at slot `s`, then it should call preparePayload with values computed from the canonical head whenever the following events occur during slot `s - 1`:

1. The canonical head changes.
2. The `assumed_skip_slot_threshold` is reached, and the first condition (1) has not already been triggered.
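The two conditions above could be wired up roughly as follows. This is only a sketch: `PreparePayloadTrigger`, its method names and the `prepare_payload` callback (which here takes just a head block root) are hypothetical, and a real implementation would compute the full set of preparePayload parameters from the head.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreparePayloadTrigger:
    """Tracks events during slot `s - 1` for a proposal expected at slot `s`."""

    proposal_slot: int
    prepare_payload: Callable[[str], None]  # hypothetical callback taking the head root
    head_changed: bool = False

    def on_head_change(self, current_slot: int, head_root: str) -> None:
        # Condition 1: every head change during slot s - 1 triggers a call.
        if current_slot == self.proposal_slot - 1:
            self.head_changed = True
            self.prepare_payload(head_root)

    def on_threshold(self, current_slot: int, head_root: str) -> None:
        # Condition 2: at the threshold, call only if condition 1 never fired.
        if current_slot == self.proposal_slot - 1 and not self.head_changed:
            self.prepare_payload(head_root)
```

In the skip-slot case only `on_threshold` fires, so preparePayload is still called once with the unchanged head (e.g. the block at `s - 2`).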

## The nitty gritty of implementation

### Proposer shuffling

Our previous definition makes the assumption that we always know the proposers for slot `s` at slot `s - 1`. This is not strictly true. The proposer shuffling for epoch `e` can only be known after the final block in epoch `e - 1` is processed. 

This means that if slot `s - 1` is the last slot of its epoch (i.e., `s % SLOTS_PER_EPOCH == 0`), we won't know the proposer shuffling for slot `s` until we either (a) receive a block at slot `s - 1` or (b) hit `assumed_skip_slot_threshold` and assume that there is no block at `s - 1`.

With this in mind, we can create a more implementation-specific definition that is aware of proposer-shuffling constraints:

#### Proposer-shuffling aware definition

If the CL node is performing duties for any active validators, then it should run the `maybe_prepare_payload` routine whenever:

1. The canonical head changes.
2. The `assumed_skip_slot_threshold` is reached, and the first condition (1) has not already been triggered.

Where `maybe_prepare_payload` involves:

1. Taking the state of the canonical head block and running [`process_slots`](https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/beacon-chain.md#beacon-chain-state-transition-function) to advance it to slot `s`.
1. Determining if the CL node is performing duties for the block proposer at slot `s`. If so, continue, else exit.
1. Computing the values for preparePayload and issuing the request to the EL node.

*Note: `maybe_prepare_payload` can be optimized in the non-epoch-boundary scenario to avoid calling `process_slots`, but this definition aims to be simple and general.*
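A minimal sketch of the three `maybe_prepare_payload` steps is below. The state is a plain `dict` and the three callables are hypothetical stand-ins supplied by the caller; they are only loosely modelled on the spec's `process_slots` and proposer lookup and do not share their signatures.

```python
from typing import Callable, Optional, Set


def maybe_prepare_payload(
    head_state: dict,
    proposal_slot: int,
    managed_validators: Set[int],
    process_slots: Callable[[dict, int], dict],
    get_proposer_index: Callable[[dict], int],
    prepare_payload: Callable[[dict], int],
) -> Optional[int]:
    """Return a payloadId if we manage the proposer for `proposal_slot`, else None."""
    # Step 1: advance the head state to slot s so the proposer shuffling is known.
    advanced = process_slots(head_state, proposal_slot)

    # Step 2: exit early unless one of our validators is the proposer.
    if get_proposer_index(advanced) not in managed_validators:
        return None

    # Step 3: compute the preparePayload values and issue the request to the EL.
    return prepare_payload(advanced)
```

Passing the helpers in as arguments keeps the sketch independent of any particular client; a real BN would call its own state-transition and engine API code instead.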

### Is the VC or BN driving?

You may notice that I've used "CL node" instead of referring to the duties of a beacon node (BN) or validator client (VC). That's because it's not immediately clear whether the BN or VC should be the one driving this series of events.

#### VC driving

In the "VC driving" scenario, the BN has no idea about which validators may produce blocks at slot `s`. It is up to the VC to ensure that the BN issues a relevant preparePayload request at the correct time(s). The "VC driving" process looks like this:

If the VC is performing duties for any active validators, then it should run the `maybe_prepare_payload` routine whenever:

1. The canonical head changes (i.e., it receives a `head` [SSE event](https://ethereum.github.io/beacon-APIs/#/Events/eventstream)).
2. The `assumed_skip_slot_threshold` is reached, and the first condition (1) has not already been triggered.

Where `maybe_prepare_payload` involves:

1. Determining the proposer duties for slot `s`
    - It may have these cached, or it may need to use the BN's [`duties/proposer`](https://ethereum.github.io/beacon-APIs/#/Validator/getProposerDuties) endpoint.
1. Determining if the VC is performing duties for the block proposer at slot `s`. If so, continue, else exit.
1. Issuing a request to the BN API which, in turn, makes it issue a preparePayload request to the EL node.
    - Such a BN API does not yet exist, but let's call it `validator/prepare_payload` for the time being.
    
The definition of `validator/prepare_payload` requires some thought too. I propose it should take `(slot, head_block_root)` as parameters and return nothing. It will be the duty of the BN to hold the `payloadId` and provide it during a getPayload request. For the input parameters, `slot` is the slot in which the VC expects to propose a block (i.e., `s`) and `head_block_root` is the root of the head block at the time of the call (i.e., the expected parent of the beacon block it expects to propose at `s`).
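As a rough illustration of the BN side of this proposal, the sketch below stores the `payloadId` keyed by `(slot, head_block_root)`. `PayloadIdCache`, `payload_id_for` and the `engine_prepare_payload` callback are hypothetical names, and skipping a repeated request for the same key is an assumption of this sketch rather than part of the proposal.

```python
from typing import Callable, Dict, Optional, Tuple


class PayloadIdCache:
    """Hypothetical BN-side store backing the proposed `validator/prepare_payload`."""

    def __init__(self, engine_prepare_payload: Callable[[int, str], int]) -> None:
        # Stand-in for the code that computes parameters and calls the EL.
        self._engine_prepare_payload = engine_prepare_payload
        self._ids: Dict[Tuple[int, str], int] = {}

    def prepare_payload(self, slot: int, head_block_root: str) -> None:
        """Handle the VC's request; returns nothing, as proposed."""
        key = (slot, head_block_root)
        if key not in self._ids:  # assumption: do not re-issue for the same key
            self._ids[key] = self._engine_prepare_payload(slot, head_block_root)

    def payload_id_for(self, slot: int, parent_root: str) -> Optional[int]:
        """Look up the payloadId to use for getPayload when producing the block."""
        return self._ids.get((slot, parent_root))
```

If the head changes again before slot `s`, the VC issues a new request with the new `head_block_root`, and block production later looks up the entry matching the parent it actually builds on.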

#### BN driving

In the "BN driving" scenario, the VC knows nothing of the preparePayload request. Instead, it just tells the BN which validators it is managing and the BN transparently calls preparePayload when it sees fit.

The "BN driving" process looks like this:

1. The VC sends a message to the BN with the list of validator indices it controls
    - The [`validator/beacon_committee_subscriptions`](https://ethereum.github.io/beacon-APIs/#/Validator/prepareBeaconCommitteeSubnet) endpoint could theoretically be repurposed to also do this.
    - Alternatively we could create a new `validator/potential_beacon_proposers` endpoint (naming can be improved).
    - It would probably make sense for this "subscription" to potential beacon proposers to expire after some time, since it does incur effort for the EL node and a once-and-forever subscription could end up wasteful.
1. The BN follows exactly the steps described in the [Proposer-shuffling aware definition](#proposer-shuffling-aware-definition).
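The expiring "subscription" idea could look something like this on the BN. `PotentialProposers` and `ttl_slots` are hypothetical, and measuring the expiry in slots is just one possible choice.

```python
from typing import Dict, Iterable, Set


class PotentialProposers:
    """Hypothetical BN-side registry of validators that may propose via this BN."""

    def __init__(self, ttl_slots: int) -> None:
        self._ttl_slots = ttl_slots
        # validator index -> first slot at which the subscription is expired
        self._expiry: Dict[int, int] = {}

    def subscribe(self, validator_indices: Iterable[int], current_slot: int) -> None:
        """Register or refresh subscriptions sent by the VC."""
        for index in validator_indices:
            self._expiry[index] = current_slot + self._ttl_slots

    def active(self, current_slot: int) -> Set[int]:
        """Drop expired subscriptions and return the indices still subscribed."""
        self._expiry = {i: e for i, e in self._expiry.items() if e > current_slot}
        return set(self._expiry)
```

The BN would check the proposer for slot `s` against `active(...)` and only bother the EL when there is a match, so a VC that goes away stops costing the EL anything once its subscription lapses.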

#### What does @paulhauner think about VC or BN driving?

At this stage, I think I prefer BN driving because it strives for simplicity in the VC (the scary secret-key-holding thing) and it also allows for more optimization inside the BN. Some clients (Lighthouse and Teku, at least) already compute the proposer duties for epoch `e` at the end of `e - 1`; these optimizations could be leveraged to make preparePayload more efficient.
    
## Open Questions

I'm not sure what to define `assumed_skip_slot_threshold` as. One way to do it would be to set it at roughly the latest time at which we usually expect a beacon block. In my experience this would be somewhere between 4s and 8s after slot start. However, it would be good to know if there's a point of diminishing returns regarding the delay between preparePayload and getPayload. For example, if it never takes the EL more than 3s to build the ideal `ExecutionPayload`, then let's just set it to 9s (12s - 3s) after slot start.
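The second option is simple arithmetic; a sketch, assuming the 12s slot used above and a hypothetical `max_build_seconds` measurement that nobody has actually taken yet:

```python
SECONDS_PER_SLOT = 12  # slot duration assumed in the example above


def assumed_skip_slot_threshold(
    max_build_seconds: float, seconds_per_slot: float = SECONDS_PER_SLOT
) -> float:
    """Seconds after slot start at which to assume slot `s - 1` is a skip slot."""
    if not 0 <= max_build_seconds <= seconds_per_slot:
        raise ValueError("max_build_seconds must lie within one slot")
    # Leave the EL exactly the time it needs before the next slot starts.
    return seconds_per_slot - max_build_seconds
```

With `max_build_seconds=3` this gives the 9s figure from the example.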
