Skip to content

Return specific hash type when calculating merkle tree#2853

Closed
tcharding wants to merge 3 commits intorust-bitcoin:masterfrom
tcharding:06-11-calculate-root
Closed

Return specific hash type when calculating merkle tree#2853
tcharding wants to merge 3 commits intorust-bitcoin:masterfrom
tcharding:06-11-calculate-root

Conversation

@tcharding
Copy link
Copy Markdown
Member

@tcharding tcharding commented Jun 11, 2024

Currently we return a Txid or Wtxid when calculating a Merkle tree, however we have specific hash types for this - the TxMerkleNode and WitnessMerkleNode types.

Add a trait that ties together the Txid/TxMerkleNode and Wtxid/WitnessMerkleNode types and return the specific type when calculating a Merkle tree.

Patch 1 is preparatory, moves the Merkle tree hash types as suggested below.

@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate labels Jun 11, 2024
@tcharding tcharding force-pushed the 06-11-calculate-root branch 2 times, most recently from 37e338c to 89cc0cc Compare June 11, 2024 03:32
@apoelstra
Copy link
Copy Markdown
Member

The original function doesn't get "clobbered". Its argument is just a type that has two names, and the API script only uses one of them (which happens to be different than the old one). So I don't see the issue here.

@apoelstra
Copy link
Copy Markdown
Member

Approach NACK 89cc0cc81bdb640a0ce5a4cce07f09ef202e773d. I think we should take a different tack.

Note that currently we have a compute_merkle_root method that takes a Txid/Wtxid and produces a Txid/Wtixd. This is conceptually wrong because (as Kix pointed out elsewhere) a hash of two Txids is not a Txid. It's a merkle node. And we seem to know this, because our existing code has TxMerkleNode and WitnessMerkleNode types which we immediately convert our bastard-txids into.

Instead, we should:

  • Move TxMerkleNode and WitnessMerkleNode into merkle_tree/mod.rs from blockdata/block.rs.
  • Implement a trait MerkleNode on these by which has an associated type (either Txid or Wtxid), the ability to convert the associated type to the MerkleNode type, and a function for combining two MerkleNode types.
  • Then reimplement compute_merkle_node in terms of this trait.

This way our types at all steps are the correct types, we avoid making the internals of our txid type pub(crate) (which is what raised my eyebrows that we were doing something questionable here), and the API matches what's conceptually happening.

@tcharding
Copy link
Copy Markdown
Member Author

Ah nice, I like it. Thanks for the patient explanation.

The original function doesn't get "clobbered". Its argument is just a type that has two names

Derp, type alias ...

Currently we are defining the two merkle tree hash types in the `block`
module, a better home for them is the `merkle_tree` module.

This is an API breaking change because the types were public in the
`block` module, however the change should/could be unnoticeable to users
if they use the crate level re-export - which is maintained.
@tcharding
Copy link
Copy Markdown
Member Author

tcharding commented Jun 12, 2024

I can't get calculate_root_inline to play nicely with merkle_root_r after adding a new trait with an associated type because it introduces a cycle between the type implementing the trait and the associated type. I tried adding MerkleNode with an associated Leaf (Txid) and implementing it on TxMerkleNode. And I tried adding an associated Root and swapping Txid/TxMerkleNode around. I also tried adding a separate LeafNode trait.

I guess we could use an unsafe block to convert the types in calculate_root_inline?

Oh I think I've got it.

Currently we return a `Txid` or `Wtxid` when calculating a Merkle tree,
however we have specific hash types for this - the `TxMerkleNode` and
`WitnessMerkleNode` types.

Add a trait that ties together the `Txid`/`TxMerkleNode` and
`Wtxid`/`WitnessMerkleNode` types and return the specific type when
calculating a Merkle tree.
@tcharding tcharding force-pushed the 06-11-calculate-root branch from 89cc0cc to a718079 Compare June 12, 2024 01:06
@tcharding tcharding marked this pull request as ready for review June 12, 2024 01:06
@github-actions github-actions bot removed the C-hashes PRs modifying the hashes crate label Jun 12, 2024
@tcharding tcharding changed the title bitcoin: Make merkle_root functions operate on Txid and Wtxid Return specific hash type when calculating merkle tree Jun 12, 2024
@tcharding
Copy link
Copy Markdown
Member Author

FTR this PR does not implement the suggestions by @Kixunil in #2770 (comment) for two reasons

  • I tried moving to hashes but the added complexity of how to handle consensus encoding seemed to not be worth it
  • Its not generic over all hashes only over types that implement the new MerkleNode trait, which is currently only Txid and Wtxid.

@apoelstra
Copy link
Copy Markdown
Member

That sounds good. This particular Merkle tree construction is questionable anyway because it does not domain-separate nodes and leaves, meaning that it is possible to have one Merkle root representing different trees. (IIRC there is some out-of-bound consensus rule preventing this from happening, but I don't remember now.) It also does a weird doubling-up construction to handle incomplete trees, which is okay for transactions which are guaranteed to be unique within a block but not great in general.

So I think having a specific trait for Txid and Wtxid is probably the way to go.

impl From<Txid> for TxMerkleNode {
fn from(txid: Txid) -> Self { Self::from_byte_array(txid.to_byte_array()) }
/// A trait that defines a hash that can appear as a node in a Merkle tree.
pub trait MerkleNode: Hash + Encodable {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a718079:

This trait should be inverted. It should be implemented on the node types, have type Leaf (which I don't think needs any bounds) and a from_leaf method.

Right now it is confusingly named because it's called MerkleNode but it's implemented on the leaves of the merkle tree, not the nodes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, after sleeping on this I'm not happy with it. The reasons you state are valid, I tried the other way and it also has problems. Its problematic because of the recursive function coupled with the inline function, because of inline we can't change the type (without unsafe) and because of the recursive function the leaf and internal nodes need to be the same type.

Second thing I woke up thinking, this is not going to work with the make-txid-not-a-general-purpose-hash thing, it might using to/from_byte_array but I'm not sure. I'm starting to think that we should just have two separate functions with suffix _txid and _wtxid and just be done with it. Like you say, this merkle construct is unique to merkelising transaction ids.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its problematic because of the recursive function coupled with the inline function, because of inline we can't change the type (without unsafe) and because of the recursive function the leaf and internal nodes need to be the same type.

Possibly you need a recursive function combined with a thin wrapper function.

Second thing I woke up thinking, this is not going to work with the make-txid-not-a-general-purpose-hash thing, it might using to/from_byte_array but I'm not sure

It will. to/from_byte_array is how you "cast" hashes between each other, which is exactly what's happening here. It looks ugly because it's a bad idea that resulted in these Merkle trees having bad behavior on-chain. So that is a win for our new API to make the ugly thing look ugly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take this one on if you're tired of it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it in #2864. I didn't have any issue with recursion and definitely did not need unsafe code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sweet, thanks man.

@tcharding
Copy link
Copy Markdown
Member Author

Replaced by #2864

@tcharding tcharding closed this Jun 17, 2024
apoelstra added a commit that referenced this pull request Jun 19, 2024
90b6d67 merkle_tree: remove some now-redundant code from block.rs (Andrew Poelstra)
48ad5e4 api changes for "rewrite merkle root calculation" (Andrew Poelstra)
f7ce9bb merkle_node: rewrite algorithm (Andrew Poelstra)
2bc97b2 api changes for new calculate_root method (Andrew Poelstra)
8d5cb01 merkle_tree: introduce MerkleNode trait to better-type merkle tree calculation (Andrew Poelstra)
0435390 api changes after moving the hashes (Andrew Poelstra)
5d8aa94 Move merkle_tree hash types (Andrew Poelstra)

Pull request description:

  Alternate to #2853. Takes patch 1 from that PR.

ACKs for top commit:
  Kixunil:
    ACK 90b6d67
  tcharding:
    ACK 90b6d67

Tree-SHA512: 0e0a3f532e2efb4116f4ebc50719d3a2022c37405b0a22b15f79189b54b623e906e8fdd53997efc611eac5ceaee4c0ef12dab1d5cb595755c0c9e091c0a5904f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-bitcoin PRs modifying the bitcoin crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants