Skip to content

Implement support for Hidden nodes in taproot trees and Fix taproot serde bugs#1533

Merged
apoelstra merged 6 commits intorust-bitcoin:masterfrom
sanket1729:taproot_fixes
Mar 4, 2023
Merged

Implement support for Hidden nodes in taproot trees and Fix taproot serde bugs#1533
apoelstra merged 6 commits intorust-bitcoin:masterfrom
sanket1729:taproot_fixes

Conversation

@sanket1729
Copy link
Copy Markdown
Member

@sanket1729 sanket1729 commented Jan 8, 2023

This PR changes/removes the serde implementation for the following types

  • TaprootSpendInfo: Removed. This data structure contains derived information for taproot spending that cannot be validated easily. To elaborate, TaprootSpendInfo is constructed from a tree, but loses information about the tree structure and maintains handy information like script_control_block_map, cached tweaked key, Merkle root etc.
  • TaprootBuillder: Removed. Hard to implement and not very useful.
  • TapTree: Modified to check invariants.
  • NodeInfo: Now implements serde with support for Hidden nodes
  • ScriptLeaf: Removed serde. Users should not directly construct this. This is just an output iterator item of TapTree::script_leaves()

Data structure changes:

  • Introduced LeafNode: Supports Hidden leaves. Has serde implemented
  • Cleanly separate TapTree and NodeInfo: TapTree is a full BIP370 compatible tree with no hidden nodes. NodeInfo is a tree that can potentially contain hidden leaves.
  • Added NodeInfo::leaf_nodes: Iterator that iterates over known and hidden leaves of NodeInfo.
  • Updated TapTree::script_leaves: Iterator that is guaranteed to output known leaves.

@sanket1729
Copy link
Copy Markdown
Member Author

Based on #1523

tcharding
tcharding previously approved these changes Jan 9, 2023
Copy link
Copy Markdown
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 7cf04aa0070e45c6dd322a695239dd43dc0f7d68

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're implementing this manually (presumably to improve error handling) I suggest to use size hint both to pre-allocate the vec and to return early.

And we should use Error::invalid_length instead of custom.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing it automatically with #[serde(into = "Vec<sha256::Hash>")]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the optimizer will see you're comparing twice and remove the duplicate, so the commit message is not accurate, the only cost is returning the extra bool. Or if you want to be explicit assign a < b to a variable and use it at both places.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing these panics, why are we even storing TaprootBuilder and not NodeInfo and script leaves? Make invalid states unrepresentable...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I already am working on this locally. Raised this PR initially to move things forward.

@sanket1729 sanket1729 force-pushed the taproot_fixes branch 3 times, most recently from 38aab00 to 174092d Compare January 9, 2023 11:51
@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Jan 9, 2023

Crap, you pushed in the middle of me reviewing. Do you intend to make more changes?

@sanket1729
Copy link
Copy Markdown
Member Author

sanket1729 commented Jan 9, 2023

@Kixunil, yes. Converting this to draft. I have a few other changes to address hidden nodes cleanly. I should be done by end of day today.

@sanket1729 sanket1729 marked this pull request as draft January 9, 2023 12:10
@sanket1729 sanket1729 force-pushed the taproot_fixes branch 2 times, most recently from 8819d3d to 7472f29 Compare January 9, 2023 17:51
@sanket1729 sanket1729 marked this pull request as ready for review January 9, 2023 18:00
@sanket1729
Copy link
Copy Markdown
Member Author

I plan to squash the last 4 commits into one after I get access to my dev machine. This should be ready for review otherwise. There are still rough edges, which I can fix after we have #1479 in. I will spend time on it first

@sanket1729
Copy link
Copy Markdown
Member Author

There are some TODOs in this PR that are dependent on #1479. If this PR gets in PR, I can create a followup PR to address those or address them in a subsequent commit if #1479 is merged first

@sanket1729
Copy link
Copy Markdown
Member Author

I squashed the last 4 commits into one. It might be a big commit to review, but could not easily split them while maintaining modularity and making sure each commit passes tests.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. I think we should get #1479 in first.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since serialize accepts &self this is forced to clone. I think into shouldn't be needed since serde treats newtypes as transparent.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not. It is still serialized as a new struct. Removing this line results in

thread 'taproot::test::test_merkle_branch_serde' panicked at 'expected Token::Seq { len: Some(2) } but serialized as NewtypeStruct { name: "TaprootMerkleBranch", }

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, serializers should treat it as transaparent but they don't have to. Maybe #[serde(transparent)] would work? I'm not sure if it'd clash with try_from though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, #[serde(transparent)] conflicts with try_from

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these two delegate to a different field? It looks foot-gunny since they are supposed to be related.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, I should add some documentation to this. For taproot trees, because the leaves get sorted by hash while constructing the root, a tree with root A, left child B and right child C is the same as root A with left child B and right child A.

We had Eq reflect this property in TapTree, so I thought Hash should reflect this property. The Eq property is really handy while round-tripping psbts.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Eq returns true then comparing hashes must also return true.

I'm confused by this explanation:

a tree with root A, left child B and right child C is the same as root A with left child B and right child A.

Maybe you meant to say this?

a tree with root A, left child B and right child C is the same as root A with left child C and right child B.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that is what I meant to say. This is reason for custom implementation of Eq and Hash in NodeInfo. But TapTree now auto drives it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? There's no panic here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have fn with_capacity(capacity: usize) -> TaprootBuilder and call it here? I suppose we would need to internally allocate Vec with log2(capacity + 1) (size_of::<usize>() * 8 - (capacity + 1).leading_zeros() if I'm not mistaken)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not follow the formula, but conceptually I don't think it is possible to estimate the height of a tree from the number of leaves.
For example, there can be 128 depth tree with 129 leaves. One leaf at each level and 2 leaves at level 128.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have to optimize for unoptimal trees - they will have bigger problems than a vec reallocating too many times.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. capacity is just a good enough guess, does not have to be strict upper bound.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I don't think there is a +1 here. Used size_of::<usize>() * 8 - (capacity).leading_zeros() instead of capacity + 1

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like outside code could break invariants when this is pub?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, it might break invariants. I remember thinking about this while adding pub, but I was mistaken :(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the leaf is not known we already know the hash and should return it?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it might be confusing because the hidden node might not necessarily be a leaf. It could be a hidden tree.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a note for adding node_hash() method that returns the TapNodeHash

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. Agree with adding node_hash.

@sanket1729 sanket1729 force-pushed the taproot_fixes branch 2 times, most recently from 4686241 to 0b4ac5e Compare January 11, 2023 12:29
Copy link
Copy Markdown
Member Author

@sanket1729 sanket1729 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few fixes planned for the PR. Basically, there will be two iterators.

  1. Iterator from NodeInfo that returns ScriptLeaves.
  2. Iterator from TapTree that returns a tuple (depth:u8, ver: LeafVersion, script: &Script)

I am also not thrilled with the name ScriptLeaf because it also now has hidden nodes, but couldn't think of anything better

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Jan 11, 2023

I am also not thrilled with the name ScriptLeaf because it also now has hidden nodes

Is there any problem with Node? (and call iterator Nodes, method nodes()...)

@sanket1729
Copy link
Copy Markdown
Member Author

Is there any problem with Node? (and call iterator Nodes, method nodes()...)

Nodes might represent an internal node too. We are only dealing with tree leaves over here. But they may not be Scripts. So, ScriptLeaf is not really appropriate. I wanted to use TapLeaf, we are using it for the bare enum without the Merkle path caching. Perhaps, TapLeafInfo?

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Jan 11, 2023

My understanding is that hidden node could potentially hide subtree rather than a single leaf so then "Leaf" depend on whether you define it as "leaf of the whole tree" or "leaf of the known part of the tree". But I admit I don't see a good way to express the latter.

LeafNode maybe?

@sanket1729
Copy link
Copy Markdown
Member Author

Sorry for the delay. This is ready for review again.

@sanket1729
Copy link
Copy Markdown
Member Author

If there are no significant changes required to get this one in, we should try to get this in 0.30

@sanket1729 sanket1729 dismissed stale reviews from Kixunil, tcharding, and apoelstra via a5544b9 March 1, 2023 01:07
@sanket1729 sanket1729 force-pushed the taproot_fixes branch 3 times, most recently from 61d8fe9 to b621465 Compare March 1, 2023 01:21
@sanket1729
Copy link
Copy Markdown
Member Author

@Kixunil addressed all the comments. This is ready for review.

@apoelstra Great that we have the test scripts to test these kinds of silent merge failures in these kinds of long-range PRs.

tcharding
tcharding previously approved these changes Mar 1, 2023
Copy link
Copy Markdown
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK b6214651cb7579a5773f3da062b620b92d8f74e6

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some bugs unfortunately.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look correct, the value needs to be number of bytes but len() is returning number of items.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps unreachable! rather than silently ignoring potential bug?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a bug. TaprootSpendInfo is a cached data structure for storing information about spending leaves. As of now, we don't cache anything for hidden leaves.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is cleaner:

    pub fn from_node_info<C: secp256k1::Verification>(
        secp: &Secp256k1<C>,
        internal_key: UntweakedPublicKey,
        node: NodeInfo,
    ) -> TaprootSpendInfo {
        // Create as if it is a key spend path with the given merkle root
        let root_hash = Some(node.hash);
        let mut info = TaprootSpendInfo::new_key_spend(secp, internal_key, root_hash);

        for leaves in node.leaves {
            match leaves.leaf {
                TapLeaf::Hidden(_) => {
                    // We don't store any information about hidden nodes in TaprootSpendInfo.
                }
                TapLeaf::Script(script, ver) => {
                    let key = (script, ver);
                    let value = leaves.merkle_branch;
                    match info.script_map.get_mut(&key) {
                        None => {
                            let mut set = BTreeSet::new();
                            set.insert(value);
                            info.script_map.insert(key, set);
                        },
                        Some(set) => {
                            set.insert(value);
                        }
                    }
                },
            }
        }
        info
    }

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be multiplied by 2 since there are two calls of serialize_element.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x needs to be divided by 2.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, just realized this is wrong because it has to forward size_hint correctly. This method also doesn't need to be implemented.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing size_hint.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other traits for consideration here and above: FusedIterator, DoubleEndedIterator

@sanket1729 sanket1729 force-pushed the taproot_fixes branch 2 times, most recently from 31b283d to 82e4f5b Compare March 1, 2023 21:38
@sanket1729
Copy link
Copy Markdown
Member Author

Found some bugs unfortunately.

@Kixunil, good eye and great work on catching those bugs on *2 and /2. Surprisingly easy to miss and overlook. Force pushed fixes

@tcharding
Copy link
Copy Markdown
Member

Ouch, that was a hard review to read @Kixunil after my ack. You are a good bloke to have on the team.

tcharding
tcharding previously approved these changes Mar 1, 2023
Copy link
Copy Markdown
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to put a bit more review effort in, many suggestions are only mildly better, feel free to ignore.

And just to be clear, I'm not a domain expert on taproot, take my review as code/style only please.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this is cleaner:

    pub fn from_node_info<C: secp256k1::Verification>(
        secp: &Secp256k1<C>,
        internal_key: UntweakedPublicKey,
        node: NodeInfo,
    ) -> TaprootSpendInfo {
        // Create as if it is a key spend path with the given merkle root
        let root_hash = Some(node.hash);
        let mut info = TaprootSpendInfo::new_key_spend(secp, internal_key, root_hash);

        for leaves in node.leaves {
            match leaves.leaf {
                TapLeaf::Hidden(_) => {
                    // We don't store any information about hidden nodes in TaprootSpendInfo.
                }
                TapLeaf::Script(script, ver) => {
                    let key = (script, ver);
                    let value = leaves.merkle_branch;
                    match info.script_map.get_mut(&key) {
                        None => {
                            let mut set = BTreeSet::new();
                            set.insert(value);
                            info.script_map.insert(key, set);
                        },
                        Some(set) => {
                            set.insert(value);
                        }
                    }
                },
            }
        }
        info
    }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OT: These use self::<ErrorType>::* statements are sprinkled around the codebase because I was coding sometime with only half my brain connected, the self:: is totally useless.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the * 32 when we push just the length here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another good catch. I really need to think before I am typing :(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not important right now but just FYI; we don't need the : here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: And here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason next get inlined but not size_hint?

Cleanly separate `TapTree` and `NodeInfo`. Fix serde not respecting
invariants for several data structures

Repurpose some tests from removed taproot builder for taptree
This was incorrect and not needed. Users should not be able to create
only tree leaves directly without going through the tree construction in
rust-bitcoin
Implementing this for spendinfo is really complicated because it
contains some cached data without retaining the components that are used
to compute them.

Users should serde the 1) NodeInfo and 2) internal key and reconstruct
TaprootSpendInfo from it.
ScriptLeaf feels like leaf has to be a script, but it can a hidden
subtree. LeafNode conveys this better
@sanket1729
Copy link
Copy Markdown
Member Author

@tcharding, thanks for the detailed review. Force pushed addressing your comments.

Copy link
Copy Markdown
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 74022ba

code/style ack

@tcharding
Copy link
Copy Markdown
Member

Review needed on this one please crew, this is the last big PR to go in before we can consider release (other than formatting changes and the hashes/hex debate). Thanks.

@apoelstra
Copy link
Copy Markdown
Member

apoelstra commented Mar 3, 2023

Sorry @tcharding I am ready to ACK this, but I spent all day yesterday trying to test it using my new nix setup, and running into unimplemented stuff that I had to fix (with pretty slow iteration times, and a long time spent trawling through cargo source code). You can see my efforts on the crate2nix repo PR list.

Copy link
Copy Markdown
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 74022ba

@apoelstra apoelstra mentioned this pull request Mar 4, 2023
@apoelstra apoelstra merged commit 10eb0da into rust-bitcoin:master Mar 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants