
RFC-024: block structure consolidation#8457

Merged
mergify[bot] merged 7 commits into main from wb/rfc-block-structure
Aug 16, 2022
Conversation

@williambanfield (Contributor) commented May 3, 2022

This RFC lays out a proposed set of changes for consolidating the set of information that may be included in the block structure.

{{ [rendered](https://github.com/tendermint/tendermint/blob/wb/rfc-block-structure/docs/rfc/rfc-020-block-structure-consolidation.md) }}

original block and therefore transitively included in the Merkle root hash of
every block. The redundant field is a candidate for removal from the root hash
of each block. However, aesthetically, it's somewhat nice to include in each
block, as if the block was 'stamped' with the ID. Additionally, re-validating

Debuggability seems to be the main argument in favour of keeping this one.

Leaving the field will allow for transaction execution issues that are not
correctly reflected in the `AppHash` to be more completely diagnosed.

Take the case of mismatch of `LastResultsHash` between two nodes, A and B, where both

It seems like the more likely scenario is that A and B report different AppHashes for the same cumulative transaction hash? Not that it couldn't be the opposite too, but if I saw that in a trace I'd assume the node had computed the wrong values rather than the app diverging.

contains the same data but for the next height.

This data is effectively redundant. Having both values present in the block
structure is helpful for light client verification. The light client is able to

But if the light client is going to go on to verify the next block, won't it need to fetch both headers anyway?

I suppose if the light client only wants to verify one block it's a savings, but I'm not sure optimizing for that case makes much sense. It seems like the normal case is the light client is verifying a bunch of blocks, where this only affects the fencepost.

Contributor Author

I suppose if the light client only wants to verify one block it's a savings, but I'm not sure optimizing for that case makes much sense.

In the bisection algorithm, the light client will verify a header without verifying the adjacent header. It is attempting to bootstrap trust by looking for validators that behaved well at a previous height with the intention of using those newly trusted validators to verify a particular candidate header.


#### ProposerAddress

The section below details a change to allow the `ProposerAddress` to be calculated

I agree, it probably doesn't make sense to remove this. Issues like #8326 suggest that if anything, we might want to consider making sure we compute the value "correctly". I think the value we store here now doesn't necessarily reflect changes in a height with multiple rounds.

@williambanfield williambanfield marked this pull request as ready for review May 6, 2022 14:01
@williambanfield williambanfield changed the title from "(wip) RFC-019: block structure consolidation" to "RFC-020: block structure consolidation" May 6, 2022
@cmwaters (Contributor) left a comment

Haven't gone through the entire RFC yet but posting my comments now and will return to this tomorrow


A block is the structure produced as the result of an instance of the Tendermint
consensus algorithm. At its simplest, the 'block' can be represented as a Merkle
root hash of all of the data used to construct and produce the hash. Our current

Contributor

Which "hash" are you referring to? Do you mind being a little more explicit

Contributor

Also your definition of a "block" seems more apt to the definition of the "header". A block at the end of the day must include the set of state transitions to be applied against application state

Contributor Author

Hmm, the point here is a bit abstract, but, broadly: not every piece of data needed to execute the state transition is included in the 'block' structure. Most concretely, the application state is not stored in the 'block' structure. A hash of it is gossiped in the block, and validators use this to ensure they have reached the same state as each other. We have a protocol, statesync, for disseminating and verifying the data referenced by the AppHash. We could just as easily have a txsync protocol, where validators gossip transaction sets separately instead of storing them in a data structure we call the block. A version of Tendermint, albeit an inefficient one, could just gossip the single hash during consensus and wait for other protocols to gather the data that constitutes the root hash. I'm not really saying we need to do this, just that the two presentations of the data - i.e. what we include in the block data structure and what we include as part of the Merkle root hash - are separable.

consensus algorithm can use proof-of-stake consensus to validate all of the
properties of the chain that we would like?
1. How can we create views of the data that can be easily retrieved, stored, and
verified by the relevant protocols?

Contributor

Easily retrieved and stored isn't really a concern of the structure of the block per se, only verification.

Contributor Author

The block structure is a view of the underlying data. There is a consistent refrain percolating around the Tendermint ecosystem that removing fields from the block will save storage. This is largely because the storage and presentation of the data are, at the moment, 1:1 mapped. This digression exists to clarify that this link is not necessary. Different protocols - light client, consensus, statesync - need to see different slices of the data, with accompanying proofs, to function correctly. We can slice the data into many different pieces as long as the different protocols can retrieve and store it in the requisite ways. A LightBlock is the clearest example of this re-slicing. Consensus may be able to re-slice the data as well, as long as we're able to ensure that it can verify and execute the correct data.

Comment on lines +124 to +126
1. What values need to be included as part of the Merkle tree so that the
consensus algorithm can use proof-of-stake consensus to validate all of the
properties of the chain that we would like?

Contributor

As a side note: We use a merkle tree for aggregating the hashes in the header but this might actually not be necessary. A regular sha256 hash could be sufficient as it's rare that we need to prove that a hash exists in a header (the header itself is so small you might as well verify the entire header).

Contributor Author

Yeah, definitely. We have a total of 14 fields included in the root hash.

return merkle.HashFromByteSlices([][]byte{

Recalculating a hash over the concatenated data is quick and trivial, so the Merkle nature of the header may ultimately not be needed.
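To illustrate the alternative being discussed, a single SHA-256 over a length-prefixed concatenation of the field bytes is just as deterministic as a Merkle root. This is a minimal sketch only; it is not Tendermint's actual header encoding, and `flatHeaderHash` is a made-up name:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// flatHeaderHash sketches the alternative discussed above: instead of a
// Merkle tree over the ~14 header fields, take one SHA-256 over a
// length-prefixed concatenation of the fields. Illustrative only.
func flatHeaderHash(fields [][]byte) [32]byte {
	var buf bytes.Buffer
	for _, f := range fields {
		var lenPrefix [8]byte
		binary.BigEndian.PutUint64(lenPrefix[:], uint64(len(f)))
		buf.Write(lenPrefix[:]) // length prefix keeps the encoding unambiguous
		buf.Write(f)
	}
	return sha256.Sum256(buf.Bytes())
}

func main() {
	fields := [][]byte{[]byte("ChainID"), []byte("Height"), []byte("AppHash")}
	fmt.Printf("%x\n", flatHeaderHash(fields))
}
```

The length prefix matters: without it, `{"ab", ""}` and `{"a", "b"}` would concatenate to the same bytes and collide.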

Comment on lines +175 to +186
The [BlockID][block-id] field comprises the [PartSetHeader][part-set-header] and the hash of the block.
The `PartSetHeader` is used by nodes to gossip the block by dividing it into
parts. Nodes receive the `PartSetHeader` from their peers, informing them of
what pieces of the block to gather. There is no strong reason to include this
value in the block. Validators will still be able to gossip and validate the
blocks that they received from their peers using this mechanism even if it is
not written into the block. The `BlockID` can therefore be consolidated into
just the hash of the block. This is by far the most uncontroversial change
and there appears to be no good reason _not_ to do it. Further evidence that
the field is not meaningful can be found in the fact that the field is not
actually validated to ensure it is correct during block validation. Validation
only checks that the [field is well formed][psh-check].

Contributor

We would still need to gossip the PartSetHeader in the proposal msg

Contributor Author

Yes definitely, it is key to our block propagation mechanism at the moment, but it isn't directly a concern of the block data.

Comment on lines +213 to +214
I would advocate for keeping this field. This field provides an additional check
for determinism across nodes. Logic to update the application hash is more

Contributor

Is there any actual value in providing additional checks for non-determinism? Put another way, is there anything wrong with relying solely on the AppHash as the single value we are trying to replicate?

You mention debuggability, which I think is fair, but if applications are using Merkle trees then they should be able to include a way of working out which leaf in the tree differs and thus better deduce where the divergence occurred.

Contributor Author

Put another way, is there anything wrong with relying solely on the AppHash as the single value we are trying to replicate?

No, there definitely is nothing wrong with that. The AppHash is intended to encode this information. The mild value of this field is just in catching chain issues faster, there is nothing that this field encodes that AppHash does not also encode.

@cmwaters (Contributor) left a comment

Nice write up.

A similar question to what goes into the block is what is important to have in the header and what can be relegated to the rest of the block.

Hashes in headers are somewhat first-class citizens in that they make it faster to verify other data. As an example, having the consensus hash in the header makes it faster to verify the consensus parameters than if the hash were in Data, whereby you'd need to get the Merkle path that verifies the consensus hash was present in Data and then that it correlated to the hash of the consensus params.

A suggestion from me would be to consolidate DataHash, LastCommitHash, ConsensusHash and EvidenceHash into a single merkle root hash.

Comment on lines +250 to +252
This creates a recursive problem. To verify the validator set that signed the
block at height `H`, what information do we need? We could fetch the
`NextValidatorsHash` from height `H-1`, but how do we verify that that hash is correct?

Contributor

I remember thinking over this myself and drawing the conclusion that it was possible to not require the ValidatorsHash for the light client.

We do have this check here:

if !bytes.Equal(untrustedHeader.ValidatorsHash, trustedHeader.NextValidatorsHash) {
but I don't think it serves any extra security. For sequential verification we always start with a trusted header. We get the light block from the next height. We check that the hash of the untrusted validator set is equal to the NextValidatorsHash of the trusted header. We then have verified the validator set. To verify the corresponding commit we check that 2/3+ of the validator set correctly signed for a single non-nil blockID. If that's true, the commit is valid. Finally we check that the hash of the untrusted header is equal to the blockID.Hash that the commit is for. If that's true then we have verified the new header.

For verifyAdjacent, we don't even use the ValidatorsHash because a valid validator set is any that has at least "TrustLevel" (in most cases 1/3) overlap in voting power.

Interestingly, we don't even verify that the validator set we received and used was actually the one that correlates with the untrustedHeader.ValidatorsHash

@williambanfield (Contributor Author) May 10, 2022

In the verify non-adjacent case, we're trying to verify a block of height H. To do so, we try to build a chain of validators that we trust to reach from the trusted header to the untrusted header at height H. When we need to perform non-adjacent verification, we fetch a block header along with the commit that supposedly corresponds to that header. Once we verify this header was signed by enough of the validators that we trust, we update the trusted set of validators to include the other signers of that header.

If we remove the ValidatorsHash, it seems that we open ourselves to the following issue:

  • We have a known good validator set, vs1, and attempt to use it to verify a non-adjacent header.
  • We fetch the non-adjacent header along with the untrusted validator set, vs2, and the commit, c, that justifies this header.
  • We check that TrustLevel of vs1 has signatures in c for the header and that vs2 also has signatures in c for the header.
  • We now trust vs2. However, nothing in vs1 actually signed off on the veracity of vs2.

A malicious node could compose a c and vs2 that comports with vs1 so that the client now trusts phony validators in vs2. With ValidatorsHash in the header, we have reason to trust vs2: enough of vs1 signed off on its hash in the header. Definitely possible I've made a mistake here since my understanding of the light client algorithm is far from perfect.

Contributor

Do we do this currently? i.e. check hash(vs2) = h2.ValidatorsHash

Contributor Author

Yes:

if !bytes.Equal(untrustedHeader.ValidatorsHash, untrustedVals.Hash()) {

Called From:

if err := verifyNewHeaderAndVals(

Contributor

Ok yeah that makes sense. Perhaps then we do need to keep ValidatorsHash around. Thanks for investigating

@williambanfield (Contributor Author) May 11, 2022

I'll take a closer look at the light client algorithm definition to be certain and add a more detailed description to this doc.

Contributor

I'd be interested in a whiteboard session during the retreat on the subject of how the validator set evolves, and how it's used for (light-client) verification

Comment on lines +261 to +265
I would advocate against this. Any consumer of the chain that wanted to
know which validator proposed a block would have to run the proposer selection
algorithm. This algorithm is not widely implemented, meaning that consumers
in other languages would need to implement the algorithm to determine a piece
of basic information about the chain.

Contributor

The main consumer of the knowledge of the proposer is usually the application itself, which might have some logic rewarding the validator. Any other use, I believe, is just statistical, i.e. block explorers say which validator proposed each block.

Rather than removing it completely, you could instead have the index of the validator in the validator set.

Contributor Author

If we're going to keep 4 bytes to represent the validator index to allow a consumer of the chain to reconstruct the data without implementing the full proposer selection algorithm, is there a reason to not just keep the additional 16 bytes to identify the proposer directly?

Contributor

Well we're trying to reduce the size of the header. If there's not something we think truly deserves to be in the header we should cut it

proposed the height. Instead, the validation logic simply checks that the value
is an address of one of the known validators.

Currently, we include this value in the `Commit` for height `H` which is

Contributor

Is the Round in Commit the proof of lock round or just the final round that received majority?

Contributor Author

Good question, it's the round in which the proposer committed in the previous height:

cs.LastCommit = cs.Votes.Precommits(cs.CommitRound)

I.e. the second of the options you listed. Added language into this document to clarify.

Comment on lines +292 to +296
Currently we store [every piece of each block][save-block] in the `BlockStore`.
I suspect that this has led to some mistakes in reasoning around the merits of
consolidating fields in the block. We could update the storage scheme we use to
store only some pieces of each block and still achieve a space savings without having
to change the block structure at all.

Contributor

I think @creachadair also alluded to having separate data structures for storage and networking

Yes, that is what I'm planning to do as part of the database work in v0.37 et seq. But that's separate from the question of which parts of the block we might want to eliminate/rework. Both must be projections of the same abstract schema.

Contributor Author

Yeah, and to clarify, this section is more hand-wavy ideas to maybe chew on in the future than concrete and actionable changes.

My general sense is that we should switch to storing data only in cases where the values change, versus what we do at the moment: store everything all the time.
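The "store only when values change" idea could look something like the sketch below. This is a hand-wavy illustration to match the comment, not a proposed Tendermint API; `deltaStore` and its methods are made-up names:

```go
package main

import (
	"fmt"
	"sort"
)

// deltaStore sketches storing a per-height value (say, the consensus
// parameters) only at heights where it changes, instead of at every
// height. Heights are assumed to be recorded in increasing order.
type deltaStore struct {
	heights []int64          // sorted heights at which the value changed
	values  map[int64][]byte // change-height -> value recorded there
}

func newDeltaStore() *deltaStore {
	return &deltaStore{values: make(map[int64][]byte)}
}

// Record stores the value only if it differs from the latest recorded one.
func (s *deltaStore) Record(h int64, v []byte) {
	if n := len(s.heights); n > 0 && string(s.values[s.heights[n-1]]) == string(v) {
		return // unchanged: nothing new to store
	}
	s.heights = append(s.heights, h)
	s.values[h] = v
}

// Get returns the value in effect at height h: the one recorded at the
// greatest change-height <= h.
func (s *deltaStore) Get(h int64) ([]byte, bool) {
	// index of first change-height strictly greater than h
	i := sort.Search(len(s.heights), func(i int) bool { return s.heights[i] > h })
	if i == 0 {
		return nil, false // nothing recorded at or before h
	}
	return s.values[s.heights[i-1]], true
}

func main() {
	s := newDeltaStore()
	s.Record(1, []byte("paramsA"))
	s.Record(2, []byte("paramsA")) // unchanged, not stored again
	s.Record(5, []byte("paramsB"))
	v, _ := s.Get(4)
	fmt.Println(string(v)) // paramsA
}
```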

Comment on lines +318 to +320
previous height. We could achieve a speed-up in many cases by communicating the
hashes _first_ and letting peers request additional information when they do not
recognize the communicated hash.

Contributor

👍 really like this idea although it would be quite an overhaul to the gossip protocol.

I had similar ideas towards transactions (#7932) i.e. propagate blocks containing the hashes of txs and rely on the mempool to retrieve any missing transactions.

Contributor Author

Yeah, I think it would make the architecture much cleaner if we had a content addressing layer. To clarify though, I have no concrete plans to actually do this in the next release or two. I think it's something we can / should consider post 1.0. I think there are more low hanging fruit in the gossip layer to hit before implementing something like content addressing into Tendermint.

of each block. However, aesthetically, it's somewhat nice to include in each
block, as if the block was 'stamped' with the ID. Additionally, re-validating
the value from genesis would be painful and require reconstituting potentially
large chains. I'm therefore mildly in favor of maintaining this redundant

Contributor

We could remove it from the block but still make it part of the header hashing. Then, in order to validate, you don't need to go back to genesis if you have the value.

Comment on lines +213 to +214
I would advocate for keeping this field. This field provides an additional check
for determinism across nodes. Logic to update the application hash is more

Contributor

I totally agree with this. This could be crucial for troubleshooting non-determinism in some cases, and it's not big in terms of size.

block at height `H`, what information do we need? We could fetch the
`NextValidatorsHash` from height `H-1`, but how do we verify that that hash is correct?

#### ProposerAddress

Contributor

I know what I'm going to say is controversial 😄
ProposerAddress and Round (in LastCommit) should not be exposed to the Application. These are details of Tendermint's operation. If you replaced Tendermint with another consensus algorithm offering the same set of properties, the App should be able to work with it. Yes, I know this is utopian.
But:

  • All uses of ProposerAddress I've seen so far (all rewards-related) are controversial, but I may be unaware of others
  • Is there any use of round other than stats and monitoring? (which can be done without showing it to the App via ABCI)

Back to reality, as ProposerAddress is being used now (e.g., SDK), I don't think we can easily remove it.

to propose a height. Because of this, our [validation logic][proposer-check] does not check that
the `ProposerAddress` included in the block corresponds to the validator that
proposed the height. Instead, the validation logic simply checks that the value
is an address of one of the known validators.

Contributor

One idea. If the App is so keen on knowing, recording, or proving who the proposer is at each block, why don't we leave it up to the App?
I mean, with ABCI++, the proposer's app can add a special tx with an app-specific proof that it is the proposer (in PrepareProposal), which other validators could validate in ProcessProposal.

was updated and construct each block using the data that existed at the time.

This document does not make any specific recommendations around storage since
that is likely to change with upcoming improvements to to the database infrastructure.

Contributor

Suggested change:
- that is likely to change with upcoming improvements to to the database infrastructure.
+ that is likely to change with upcoming improvements to the database infrastructure.

Either a soft-upgrades scheme will need to be implemented or a hard fork will
be required for chains and light clients to function with the new hashes.
This means that all of the additions and deletions from the Merkle root hash
proposed by this document will be light client breaking.

Contributor

I think the actions proposed in this document would be an excellent first use-case for soft-upgrades, because we don't change any functionality, but only the way fields are organized, hashes are calculated and blocks are verified.


## Discussion

### Data to consider removing

Contributor

I went through the data structures myself and I think that the following fields in Commit

  • height
  • round
  • block_id

are redundant, and could therefore be removed.

They are redundant in that they can be "gleaned" from the rest of the fields in the block.

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale for use by stalebot label May 29, 2022
@creachadair creachadair removed the stale for use by stalebot label May 29, 2022
@github-actions github-actions bot added the stale for use by stalebot label Jun 9, 2022
@cmwaters cmwaters removed the stale for use by stalebot label Jun 10, 2022
@github-actions github-actions bot added the stale for use by stalebot label Jun 21, 2022
@cmwaters cmwaters removed the stale for use by stalebot label Jun 24, 2022
@github-actions github-actions bot added the stale for use by stalebot label Jul 5, 2022
@github-actions github-actions bot closed this Jul 10, 2022
@cmwaters cmwaters removed the stale for use by stalebot label Jul 11, 2022
@cmwaters cmwaters reopened this Jul 11, 2022
@github-actions github-actions bot added the stale for use by stalebot label Jul 22, 2022
@creachadair creachadair removed the stale for use by stalebot label Jul 22, 2022
@creachadair creachadair requested a review from samricotta as a code owner July 22, 2022 00:31
@github-actions github-actions bot added the stale for use by stalebot label Aug 2, 2022
@cmwaters cmwaters removed the stale for use by stalebot label Aug 3, 2022
@github-actions github-actions bot added the stale for use by stalebot label Aug 14, 2022
@williambanfield williambanfield changed the base branch from master to main August 16, 2022 13:33
@williambanfield williambanfield requested review from a team August 16, 2022 13:33
@williambanfield williambanfield changed the title from "RFC-020: block structure consolidation" to "RFC-024: block structure consolidation" Aug 16, 2022
@williambanfield williambanfield added the S:automerge Automatically merge PR when requirements pass label Aug 16, 2022
@mergify mergify bot merged commit c322b89 into main Aug 16, 2022
@mergify mergify bot deleted the wb/rfc-block-structure branch August 16, 2022 13:51
cmwaters pushed a commit that referenced this pull request Aug 19, 2022
This RFC lays out a proposed set of changes for consolidating the set of information that may be included in the block structure.

{{ [rendered](https://github.com/tendermint/tendermint/blob/wb/rfc-block-structure/docs/rfc/rfc-020-block-structure-consolidation.md) }}
@williambanfield williambanfield restored the wb/rfc-block-structure branch September 9, 2022 16:12