Rationale
The header chain for the mainnet is needed by every Ethereum client. The data is effectively append only and regularly accessed. This makes it a prime candidate for storage on the Swarm network.
Owner
@pipermerriam
Stakeholder Point of Contact
Description
- a mechanism for header data to be stored in Swarm
- a mechanism for swarm to learn about new headers
- a mechanism for ethereum to retrieve headers by their native ethereum hash (not swarm chunk hash)
Storage
For swarm to store and serve headers, they need to behave like other chunks. This only requires an extra chunk validator that uses Keccak-256 (Ethereum's variant of SHA3) addressing.
For swarm to access header chunks, the localstore needs to be prepopulated. The following options are viable and non-exclusive:
- manual import
- Ethereum node pushes data to Swarm nodes/network
- Swarm node pulls data from Ethereum node/network
The latter option is easiest if we subsume the pull mechanism (swarm nodes requesting data from ethereum clients) under the protocol with which swarm and eth clients communicate.
Communication
For ethereum nodes to retrieve this data, they will need a communication channel with swarm nodes. Obvious options include:
- use the http-based public swarm gateway
- use the external JSON-RPC API exposed by a swarm node.
- use the DevP2P network to talk directly to swarm nodes (probably over a new sub-protocol like `bzzeth`)
Candidate for DevP2P communication
One way for nodes to communicate would be over devp2p. This has the following benefits
- it does not require users to run both an Ethereum and a Swarm node to benefit from this functionality, since both types of node already connect to this network.
- it allows for bidirectional communication, so swarm nodes can be informed about new block hashes.
The following commands define version 1 of a new sub-protocol identified by the string `bzz-eth`.
This p2p protocol is somewhat special in that it is asymmetrical, i.e., the two peers do not send the same types of messages.
In particular the swarm nodes never send NewBlockHeaders, only receive them.
Handshake (0x00)
This MUST be the first message sent (under this protocol) after a p2p connection has been established.
[serve_headers: uint8|bool]
TODO: I removed the head from this since it seems like swarm nodes shouldn't be required to track the chain head, and the NewBlockHeaders message serves as a mechanism for eth nodes to broadcast this information. Consider adding an Announce message to serve as a more concrete way to update a peer about stateful protocol information (like chain head).
serve_headers: boolean indicating if this node can be expected to serve requests for headers.
If later we find that swarm nodes do not always need new headers announced, a serve_new_headers: uint8|bool field could be added.
NewBlockHeaders (0x01)
[[hash: B_32, number: uint256], ...]
hash: the block hash
number: the block number corresponding to the provided block hash.
Advertise headers that the connected peer may be interested in. For a given session with a peer, no block hash should be sent more than once (never re-advertise the same block hash).
If later we find that swarm nodes do not always need new headers announced, a GetNewHeaders message could be introduced.
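The no-re-advertisement rule above amounts to simple per-session bookkeeping on the sending side. A minimal sketch (the class and method names are illustrative, not part of the protocol):

```python
class AnnouncementTracker:
    """Tracks which block hashes have already been advertised to a
    peer during the current session, so none is sent twice."""

    def __init__(self):
        self._advertised: set[bytes] = set()

    def filter_new(self, announcements):
        """Given (hash, number) pairs, return only the not-yet-sent
        ones and mark them as advertised."""
        fresh = []
        for block_hash, number in announcements:
            if block_hash not in self._advertised:
                self._advertised.add(block_hash)
                fresh.append((block_hash, number))
        return fresh
```

A fresh tracker would be created per peer connection and discarded when the session ends, since the rule is scoped to a session.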
GetBlockHeaders (0x02)
[request_id: uint32, hashes: [hash_0: B_32, hash_1: B_32, ...]]
request_id: any 32-bit integer
hashes: array of 32-byte hashes.
Request a set of headers referenced by their ethereum hashes.
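Like other devp2p sub-protocol messages, these payloads would be RLP encoded. As an illustration of the wire format, the following sketch builds a GetBlockHeaders payload with a minimal encode-only RLP helper (not a full RLP implementation; the function names are illustrative):

```python
def length_prefix(length: int, offset: int) -> bytes:
    # Payloads shorter than 56 bytes get a single-byte prefix;
    # longer ones get a length-of-length prefix.
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item) -> bytes:
    # Byte strings: a single byte below 0x80 encodes as itself.
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return length_prefix(len(item), 0x80) + item
    # Lists: concatenate the encoded items, prefix with a list marker.
    payload = b"".join(rlp_encode(x) for x in item)
    return length_prefix(len(payload), 0xC0) + payload


def encode_int(value: int) -> bytes:
    # RLP integers are big-endian with no leading zero bytes.
    if value == 0:
        return b""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def get_block_headers_payload(request_id: int, hashes: list[bytes]) -> bytes:
    # [request_id: uint32, hashes: [hash_0: B_32, hash_1: B_32, ...]]
    return rlp_encode([encode_int(request_id), list(hashes)])
```

A request for one header, `get_block_headers_payload(1, [some_hash])`, yields a two-item outer list whose second item is the nested list of 32-byte hashes.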
BlockHeaders (0x03)
[request_id: uint32, headers: [header_0, header_1, ...]]
request_id: The request_id from the GetBlockHeaders message.
headers: array of rlp encoded block headers.
Response to GetBlockHeaders. The headers array must contain RLP encoded block headers corresponding to a subset of the requested hashes. No ordering is enforced on the response headers.
TODO: discuss semantics of multi-response. Maybe add nonce to response. Maybe enforce uniqueness across response headers.
Note that it is allowed to send several BlockHeaders responses to the same request. This way, the swarm node can send whatever it has as soon as it has it, serving the eth client with minimal latency.
Note that it is allowed to send a BlockHeaders message with an empty headers array. This serves as an indication to the requesting eth client that the peer has no more headers available out of the requested batch. Even though this cannot be enforced, it is prudent so that the eth client can register the request context as closed and fire alternative requests for the outstanding headers. This will become more relevant once requests are no longer free, in order to control the cost vs. concurrency trade-off.
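On the eth client side, these multi-response semantics imply tracking each outstanding request until either all hashes are served or the empty terminator arrives. A possible sketch, with the hash function injected to keep the example self-contained (all names here are illustrative):

```python
class RequestTracker:
    """Tracks outstanding GetBlockHeaders requests and consumes
    BlockHeaders responses, which may arrive in several batches."""

    def __init__(self):
        # request_id -> set of requested hashes not yet served
        self._pending: dict[int, set[bytes]] = {}

    def register(self, request_id: int, hashes) -> None:
        self._pending[request_id] = set(hashes)

    def on_response(self, request_id: int, headers, hash_fn):
        """Process one BlockHeaders batch. An empty batch closes the
        request; any still-unserved hashes should then be requested
        from other peers. Returns (closed, outstanding_hashes)."""
        outstanding = self._pending.get(request_id)
        if outstanding is None:
            return True, set()  # unknown or already-closed request
        if not headers:
            # Empty array: the peer has nothing more for this request.
            del self._pending[request_id]
            return True, outstanding
        for header in headers:
            outstanding.discard(hash_fn(header))
        if not outstanding:
            del self._pending[request_id]
            return True, set()
        return False, outstanding
```

In real use `hash_fn` would be Keccak-256 over the RLP-encoded header bytes; the sketch leaves it pluggable.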
Context
Ethereum node implementation notes
It's worth noting that Ethereum clients that want to retrieve this data will need to learn about the latest headers through a separate mechanism, such as other ETH peers, since it will not be possible to request headers by their block number. Once an ETH peer has a recent header that it trusts, it can use the `parent_hash` to track its way backwards to the genesis block. At a later stage, swarm nodes will need to be able to do the same, see https://hackmd.io/oj9_cT2KQimMdIPe_W_ejQ#
It seems that a reasonable algorithm for syncing the header chain when connected to both a set of ETH peers and a set of BZZ peers would be to use the ETH peers to construct a "header skeleton" which is the header chain with large gaps, and then to use the bzz-eth peers to fill the gaps.
Swarm node data validation notes
Swarm nodes will want to validate headers they receive. The things that can be validated are:
- bytes are a valid RLP encoded header
- `keccak(rlp-encoded-header-bytes)` matches the expected content hash
- `ethash` validation of the proof-of-work seal
- recursive validation via lookup of the parent header using the `parent_hash` field
  - this should probably have a maximum depth limit since naively tracing back to genesis is probably undesirable.
For the POC, the RLP and keccak validations are likely adequate to catch obvious bugs.
Issues
No external issues at this time
Dependencies
Swarm needs to support a new hash type, `keccak(raw-binary-data)`, so that Ethereum nodes can request data using the hashes they already have and Swarm nodes can tell what data is being requested.
Timeline
The Trinity client team should be able to deliver on each of these phases in a 1-2 week time frame, which suggests that if the Swarm team can deliver on a similar timeline, we should have a fully working POC within about 4 weeks.
Acceptance criteria
- An ETH peer with access to a trusted recent header can populate its header chain with data pulled from Swarm nodes.
- A Swarm node can stay reasonably up-to-date with the Ethereum header chain as well as applying reasonable validation to new header data.
- Swarm nodes can retrieve an Ethereum header that has been referenced by its ethereum hash from other swarm nodes or fall back to other ETH nodes.
To implement an easy test harness, we will assume the swarm node will be connected to at least two eth clients:
- a light (fast syncing) node; and
- a full node, which will serve new headers upon request to the swarm node