New Witness struct to improve ser/de perfomance#672
New Witness struct to improve ser/de perfomance#672dr-orlovsky merged 2 commits intorust-bitcoin:masterfrom
Conversation
|
Have you seen https://github.com/rust-bitcoin/rust-bitcoin/pull/668/files#diff-c55d0186f3d5990b5e223c1a8aa1bbe60988201c452dd13b3324e892bc33119eR749-R825 ? It should be very complimentary to this PR |
I didn't but I've looked it now, I am not sure to grasp the complementarity since one it's address witness and the other is the witness stack in the inputs, could you elaborate? |
af9711e to
8dcf3d6
Compare
|
By "complementing" I mean they solve strong types issue for segwit related structures. I gave you link just to prevent double work from you if you will occasionally do that type which I already did |
TheBlueMatt
left a comment
There was a problem hiding this comment.
This is awesome! I've historically always wanted to do this on a broader scale - think a similar "store it in serialized form" for an entire transaction/block and just store pointers to the start of various datastructures to provide accessors. Of course that could come later, but this is awesome by itself.
|
|
||
| /// contains the witness Vec<Vec<u8>> serialization without the initial varint indicating the | ||
| /// number of elements (which is stored in len) | ||
| content: Vec<u8>, |
There was a problem hiding this comment.
If we're going down this path, does it make sense to also consider something like Bitcoin Core's "prevec" for the witness contents?
There was a problem hiding this comment.
There was a problem hiding this comment.
Yea, I mean such a thing is super-duper trivial to write.
There was a problem hiding this comment.
I think a tinyvec like structure could probably be applicable broadly (like also for inputs/outpus) and as an addition to this, probably better to address in another PR
|
|
||
| #[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Hash)] | ||
| #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] //TODO manually implement? | ||
| pub struct Witness { |
There was a problem hiding this comment.
Does it make sense to have a "pointers to the start of each witness element" vec as well? I guess given most witnesses are like 2-5 elements long maybe its not worth it, but would be cool to add a benchmark for iterating a lightning HTLC transaction witness and just test.
There was a problem hiding this comment.
Pointers to the start of each witness element would allow efficient index access but require another Vec.
I think for witness is not worth it because, as you said usually the witness is a few elements.
There was a problem hiding this comment.
Maybe have offsets in SmallVec?
We could also optimistically assume that transactions are smaller than 4M bytes, so we could cram offsets into struct U24(u8, u8, u8);
There was a problem hiding this comment.
Maybe have offsets in SmallVec?
#672 (comment)
Also, having offsets would allow efficient indexed access, but mutable access it's another story... In the end I don't think there are really use case requiring that. I think I would not add offsets here even with tinyvec/smallvec.
We could also optimistically assume that transactions are smaller than 4M bytes, so we could cram offsets into struct U24(u8, u8, u8);
the overall weight of the struct would not change because of padding (on most common 64 bits arch), I would leave this optimization in case we need other fields in the struct
One allocation improves cache locality so even ignoring allocations, it's faster to go through one chunk of linear memory than jump between many. |
8dcf3d6 to
8ccc278
Compare
|
Some updates:
|
for reference, I found this unmaintained implementation that stores blockchain data structures in slices while providing walker to access structured data |
|
Newtype for witness structure - yes please! It always felt strange to me having just Storing block in serialized form with pointers - this may help us with electrs performance. We're troubleshooting/brainstorming right now: romanz/electrs#547 cc @romanz I think it'd require a bit of reasonable |
Kixunil
left a comment
There was a problem hiding this comment.
Concept ACK. Didn't do as deep review as I do usually. Maybe I will look at it more later.
|
|
||
| #[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug, Hash)] | ||
| #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] //TODO manually implement? | ||
| pub struct Witness { |
There was a problem hiding this comment.
Maybe have offsets in SmallVec?
We could also optimistically assume that transactions are smaller than 4M bytes, so we could cram offsets into struct U24(u8, u8, u8);
src/blockdata/witness.rs
Outdated
| /// Support structure to allow efficient and convenient iteration over the Witness elements | ||
| pub struct WitnessIterator<'a> { | ||
| witness: &'a Witness, | ||
| cursor: usize, |
There was a problem hiding this comment.
Why not just core::slice::Iter<'a, u8>?
There was a problem hiding this comment.
Because you can't get slices from it and should not work as Read?
Or maybe I don't get what you have in mind
There was a problem hiding this comment.
You can, it has as_slice() method. I used it in #662 and #673
Was also thinking of doing take_slice_or_kill via extension trait, so it could be reused. You seem to be doing something similar so maybe worth revisiting that idea.
(Was also considering adding it to core as it seems widely applicable.)
src/blockdata/witness.rs
Outdated
| where | ||
| S: serde::Serializer, | ||
| { | ||
| let vec: Vec<_> = self.iter().map(|e| e.to_vec()).collect(); |
There was a problem hiding this comment.
I think this could be faster but for initial code probably good. (Manual serde impls are annoying as hell.)
Not sure how applicable it is, but in |
|
@dpc yes, also |
Kixunil
left a comment
There was a problem hiding this comment.
The latest update looks great!
| self.cursor = end; | ||
| Some(&vec[start..end]) | ||
| let varint = VarInt::consensus_decode(self.0.as_slice()).ok()?; | ||
| self.0.nth(varint.len() - 1)?; // VarInt::len returns at least 1 |
There was a problem hiding this comment.
I'd like to see VarInt::consensus_decode_iter() one day but good enough for this PR.
|
Needs a rebase |
|
Can you also pls stick in impl Display for Witness {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
for item in self.iter() {
writeln!(f, "{}", item.to_hex())?;
}
Ok(())
}
} |
|
Here are all the improvements I'd like to have in this PR: RCasatta#4 |
I don't think it's literally impossible to access elements without
Also, I can't find extensive indexed access in the miniscript lib (please point me to the code if I am wrong). So I am not convinced to implement I am open to changing my mind, what do other people think? |
|
Why |
This PR seems to be skipping deserializing witness, so not everyone has to pay for it, and code that do need it, can pay for it how they want. Seems like a win in most cases, and a small inconvenience for other. |
Well, I'd like to have in command-line pro wallets (I have one myself) printing tx witness like And not |
|
@RCasatta replied in that PR, but briefly, we need things like "accessing second to last element" in many cases, including parsing witness data for taproot and segwit v0, so why not to have a index map as a part of the structure and const-time |
Adding an index map would defeat the main [1] purpose of this PR [2] because it would require another allocation for the map. If "accessing second to last element" is a common use case (and also the last, I guess), we can add two [1]: main because I think deserialization happens more than operating on the witness |
|
I think that is a good solution! |
|
If there are multiple strong usecases here, how about having two (or even multiple)
Edit. One could even have Edit. I guess some types don't need to implement both decode and encode. If any is missing it will just make them un-encodeable/decodeabe. A |
sanket1729
left a comment
There was a problem hiding this comment.
Did a detailed code review. Looks good, left one comment about clear function.
I was going to ask for a pop function, but I think for verifying and interpreting scripts witnesses, we should convert it to Vec<Vec>.
src/blockdata/witness.rs
Outdated
| self.witness_elements = 0; | ||
| } | ||
|
|
||
| /// Push a new element on the witness, require an allocation |
There was a problem hiding this comment.
nit: requires an allocation.
src/blockdata/witness.rs
Outdated
| assert_eq!(witness_serialized, serialize(&witness)); | ||
|
|
||
| //assert_eq!(32, std::mem::size_of::<Witness>()); | ||
| //assert_eq!(24, std::mem::size_of::<Option<Vec<u8>>>()); |
There was a problem hiding this comment.
They were removed in latest commit, but present in old one, after the squash they are removed completely
src/blockdata/witness.rs
Outdated
There was a problem hiding this comment.
last and second_to_last are not reset when calling the clear function
…eeping most usability. Witness struct is in place of the Vec<Vec<u8>> we have before this commit. from_vec() and to_vec() methods are provided to switch between this type and Vec<Vec<u8>> Moreover, implementation of Default, Iterator and others allows to have similar behaviour but using a single Vec prevent many allocations during deserialization which in turns results in better performance, even 20% better perfomance on recent block. last() and second_to_last() allows to access respective element without going through costly Vec transformation
ba224cf to
106acdc
Compare
|
Fixed the clear method. I squashed all but the fuzz commit into one since they were changing the same code. |
Do you mean the tip commit? |
|
@dr-orlovsky could you remove the "changes requested" if everything is ok? I cannot merge otherwise |
| /// let _sighash = sig_hasher.segwit_signature_hash(inp, &prevout_script, 42, EcdsaSigHashType::All); | ||
| /// // ... sign the sighash | ||
| /// sig_hasher.witness_mut(inp).unwrap().push(Vec::new()); | ||
| /// sig_hasher.witness_mut(inp).unwrap().push(&Vec::new()); |
There was a problem hiding this comment.
Nit:
| /// sig_hasher.witness_mut(inp).unwrap().push(&Vec::new()); | |
| /// sig_hasher.witness_mut(inp).unwrap().push(&[]); |
| #[cfg(feature = "serde")] | ||
| use serde; |
There was a problem hiding this comment.
Nit: this seems redundant (we always have access to the crate without useing it)
|
|
||
| impl Witness { | ||
| /// Creates [`Witness`] object from an array of byte-arrays | ||
| pub fn from_vec(vec: Vec<Vec<u8>>) -> Self { |
There was a problem hiding this comment.
Nit: why not impl From<Vec<Vec<u8>> for Witness instead?
And probably pub fn from<T: AsRef<[u8]>>(bytes: &[T]) will be better.
There was a problem hiding this comment.
Because From is for cheap/trivial conversions. This is neither.
There was a problem hiding this comment.
Fromis for cheap/trivial conversions
Not really, impl From<&str> for String exists for instance and that conversion is not usually considered "cheap".
@dr-orlovsky I consider AsRef<[u8]> a footgun since it would accept strings without explicit as_bytes()
| } | ||
|
|
||
| /// Returns a struct implementing [`Iterator`] | ||
| pub fn iter(&self) -> Iter { |
There was a problem hiding this comment.
I think the name here is confusing: it is not clear do we iterate over witness stack elements - or witness byte sequence. So I will prefer fn byte_iter(&self)
There was a problem hiding this comment.
We iterate over witness stack elements. It wouldn't be meaningful to iterate over bytes. I don't think there is any ambiguity here.
There was a problem hiding this comment.
From what I see in the code we do iterate over bytes here: self.content.iter() where content: Vec<u8> will iterate over bytes.
There was a problem hiding this comment.
self.content is used internally to provide the witness stack elements iterator via iter()
There was a problem hiding this comment.
I agree with @dr-orlovsky here, chars() and bytes() exist on &str for this reason.
There was a problem hiding this comment.
It is clearly meaningful to iterate over either the characters or bytes of a string. It is not clear to me what it means to iterate over the bytes of a witness, or even in what order they would be returned.
There was a problem hiding this comment.
OK, if it seems completely obvious that it means stack elements fine for me. I'd just like the docs to be more descriptive.
Just note that non-obvious iterators can be a footgun. I once got burned by &std::path::Path implementing IntoIterator (Can you guess what it iterates over without looking it up?)
…arking 247a14f Use test big block for bench_stream_reader instead of making one (Riccardo Casatta) b92dfbb exclude test_data when publishing the crate (Riccardo Casatta) f5a9681 include a big block in test_data, use it for ser/de benchmark (Riccardo Casatta) 09dada5 Move bip158 test vectors to test_data (Riccardo Casatta) 06d1a82 Remove testnet block hex from tests, use test_data with include_bytes! (Riccardo Casatta) Pull request description: In the first two commits I moved some data from source files to the newly introduced `test_data` dir, including it with `include_[str|bytes]!` macro. The second-to-last commit introduces a big block in test_data which is very handy in ser/de benchmark (I used it for #672) because with smaller blocks you may not notice performance improvements. Since I don't want to pollute the package the last commit excludes the `test_data` dir from the published package. I think it's fine to do it because dependent packages don't run dependencies tests. ACKs for top commit: apoelstra: ACK 247a14f Kixunil: tACK 247a14f Tree-SHA512: a2beb635b0a358737d0b57d3e7205b1ddf87652b9a8c889ce63e2867659a8eaf7e43a5b87a453345d56d953745913f40b58596f449e5fbc87340e0dd2aef0727
…ex), and get_tapscript 3c0d5ae Add get_tapscript to Witness (junderw) 4226d60 Add Index<usize> and nth(index) to Witness (junderw) Pull request description: Ref: #672 (comment) [Add Index<usize> and nth(index) to Witness](4226d60) [4226d60](4226d60) Arbitrary indexing into Witness fixes the API of last and second_to_last to be more flexible. This patch started off as an addition of third_to_last, but ended up evolving into arbitrary indexing to allow for future use cases. A list of the indices of the start byte for each witness element is stored as an ordered contiguous group of u32s represented as 4 bytes each in the Vec<u8> contents. The bytes are stored using to_ne_bytes for performance reasons. A helper function is added to the tests to allow for easier contruction of the contents Vec in test vectors. u32 was chosen because 22 bits are needed to store 4,000,000 which is the maximum weight limit for a block. This might need to be reworked in the event of consensus limits increasing, but u32 can hold 1000x the current limit, so it should be fine for the forseeable future. The push and consensus_deserialize functions utilize rotate_left and rotate_right to move the indices to the end of the new allocation. Depending on the size of the data, this might be more of a performance hit than just allocating a new temporary Vec to store the indices and append them after parsing is completed. However, for a majority of cases rotating the indices should be faster. Suggestions to use VecDeque instead of Vec for contents necessitate other considerations, since it is not a public facing change, those optimizations can be dealt with in future patches. The Index<usize> trait is implemented using the new nth method with expect. The Iter struct is reworked to make use of the new data representation. This new data structure makes it trivial to implement DoubleEndedIterator and other such traits, but I have decided to leave this as out of scope for this patch. --- [Add get_tapscript to Witness](a7501d9) [a7501d9](a7501d9) This new method will check the last witness element to see if it starts with 0x50, and depending on the result it will return the second to last or third to last witness element according to BIP341. In its current state, Witness can not know what type of script it is fulfilling, so it is up to the caller to verify if the previous output is a taproot output or not. --- Edit: This is the previous PR body: > In a taproot script payment with annex, quick access to the 3rd to last element (which is the actual script in this case) is convenient. > > This feels like kicking the can down the road again, but I think it's a nice to have method. > > RCasatta dr-orlovsky were discussing this issue. I would like to ask if they have any thoughts on the addition of this. ACKs for top commit: tcharding: ACK 3c0d5ae apoelstra: ACK 3c0d5ae Kixunil: ACK 3c0d5ae Tree-SHA512: 0038eed6ad56786b8dd6d98db0d1846753b8b25de0bc1089cdc75d5850d0ccc66dde9a10be7fe09589ad7db118fd50ee9f7993695968df5c389457ccfcdaa761
df64705 primitives: single allocation witness decoder (Nick Johnson) Pull request description: When attempting to simplify the `WitnessDecoder` in #5224, we noticed that we were double allocating which erased the performance gains of the old decoder from #672. This updates `WitnessDecoder` to use a single allocation like the old decoder. While the intention of this patch was purely for performance, three of the new tests fail against the current version of the `WitnessDecoder`: `decode_empty_element`, `decode_multiple_empty_elements`, and `decode_incomplete_witness_count`. For the first two tests, there is a functional change in how empty witness elements are handled. This patch introduces a short-circuit where the old version might get stuck in a loop forever requesting zero bytes. The third test, `decode_incomplete_witness_count`, shows that the impl has a more explicit failure now if `end` is called before the witness element count is complete, instead of just returning and empty witness. ACKs for top commit: tcharding: ACK df64705 jrakibi: ACK df64705 apoelstra: ACK df64705; successfully ran local tests Tree-SHA512: 5177410da2d3ceb65a7bfc3e8444a913d23cad1c8e83f045501882b45bac2a90f403d0f7e72fe6b88c25774bbf636e007726d0dc93aeb77bed4ec94d67efa2bb
At the moment the Witness struct is
Vec<Vec<u8>>, the vec inside a vec cause a lot of allocations, specifically:The proposed Witness struct contains the serialized format of the witness. This reduces the allocations to:
The inconvenience is having slightly less comfortable access to the witness, but the iterator is efficient (no allocations) and you can always collect the iteration to have a Vec of slices. If you collect the iteration you end up doing allocation anyway, but the rationale is that it is an operation you need to do rarely while ser/de is done much more often.
I had to add a bigger block to better see the improvement (ae860247e191e2136d7c87382f78c96e0908d700), these are the results of the benches on my machine:
With an increased performance to deserialize a block of about 21% and to serialize a block of about 16% (seems even higher than expected, need to do more tests to confirm, I'll appreciate tests results from reviewers)