Refactor Script::bytes_to_asm_fmt to use iterator#662
Refactor Script::bytes_to_asm_fmt to use iterator#662dr-orlovsky merged 2 commits intorust-bitcoin:masterfrom
Conversation
172aa0a to
6594686
Compare
|
Would be really nice to get a fuzz test for this, though. |
|
Will let @Kixunil answer Matt's comments before merging. |
|
Yeah, I hoped I can get rustc to prove there's no panic which would make fuzz redundant. Sadly I was unable to do it so fuzz it will be. |
|
I would like to review this(hopefully by tomorrow). Requesting to hold off merging till then |
dr-orlovsky
left a comment
There was a problem hiding this comment.
Just a small nit. Do not approve yet since run out of time to complete review and will do it later
src/blockdata/script.rs
Outdated
There was a problem hiding this comment.
Nit: I do not see any harm of doing derives here with just a single line. But pls do not discard other reviews just because of this one Nit
There was a problem hiding this comment.
The type is private (at least for now) and none of the derives is needed, so let's not needlessly blow up compile time. If we decide to make it public (I'm in favor of this) there will be all applicable derives and trait impls.
There was a problem hiding this comment.
"data types should eagerly implement derives" (from rust guidelines). :)
I am not insisting at all, just trying to understand the logic. Ok, the logic is not to blow the computing time.
However from my experience frequently these private error types become public and ppl forget to add derives to them - at least it cost me a lot of time to update rust-secp256k1, rust-bitcoin and minor crates with derives required to do a proper error handling on my own error types :(
There was a problem hiding this comment.
I'd rather not add gratuitous derives. I find it confusing to read derives on private types because then I need to understand why they are needed.
If this type ever becomes public we can add derives at that time.
There was a problem hiding this comment.
@dr-orlovsky I should have realized there's "public" word missing in that line when reviewing. Yeah, forgetting is unfortunate. Maybe have some checklists for various common tasks that could include this when changing a type to public? I'm thinking of something similar LND has.
dr-orlovsky
left a comment
There was a problem hiding this comment.
utACK 659468681743458e5467bbb92e47ee3c6f084411 with two nits
src/blockdata/script.rs
Outdated
There was a problem hiding this comment.
maybe even be more strict with size > 8 since we haven't arch greater than 64 bits?
There was a problem hiding this comment.
I agree. This line is clever but we probably don't get any extra generality vs just using 8 as a bound.
There was a problem hiding this comment.
Was thinking about it but felt wrong to return NumericOverflow for what is actually too large input. Was also thinking of
enum UintSize {
OneByte = 1,
TwoBytes = 2,
FourBytes = 4,
EightBytes = 8,
}This enforces sane values but is API-breaking and a bit cumbersome. Maybe worth it considering most uses will use literal anyway?
(Also, maybe "width" is better than "size"?)
c83ef8f
6594686 to
c83ef8f
Compare
|
I have force-pushed all obvious style changes. Will address size checking when replied and fuzz test in a different commit, not sure if I will have time to do it today. |
|
CI error: |
This refactors `Script::bytes_to_asm_fmt`` function to use an iterator instead of index. Such change makes it easier to reason about overflows or out-of-bounds accesses. As a result this also fixes three unlikely overflows and happens to improve formatting to not output space at the beginning in some weird cases. To improve robustness even better it also moves `read_uint` implementation to internal function which returns a more specific error type which can be exhaustively matched on to guarantee correct error handling. Probably because of lack of this the code was previously checking the same condition twice, the second time being unreachable and attempting to behave differently than the first one. Finally this uses macro to deduplicate code which differs only in single number, ensuring the code stays in sync across all branches.
c83ef8f to
2f237ee
Compare
|
Rebased on top of master to include #664 and added the fuzz test. Disclaimer: didn't try to compile the fuzz test, (ab)using CI for that. |
a6e6bb1 to
339664e
Compare
This adds fuzz target for `Script::bytes_to_asm_fmt` which could panic due to overflow in the past. Fuzzing should decrease the risk of other panics.
339664e to
0e1b993
Compare
|
@dr-orlovsky that was not CI related but my mistake. :) |
dr-orlovsky
left a comment
There was a problem hiding this comment.
utACK 0e1b993, however extended test will be welcome (see my comment)
| "OP_PUSHDATA2 <push past end>"); | ||
| assert_eq!(hex_script!("4effffffff01").asm(), | ||
| " OP_PUSHDATA4 <push past end>"); | ||
| "OP_PUSHDATA4 <push past end>"); |
There was a problem hiding this comment.
We may need tests with > 1 push opcode to check formatting when multiple mixed instructions are present
sanket1729
left a comment
There was a problem hiding this comment.
ACK 0e1b993 as this PR as it for bytes_to_asm_fmt. Though I would like to note the panic in instructions API I mentioned previously is not addressed by this.
This instructions().next().next() should panic for the last testcase in the first commit on 32 bit machines.
|
@sanket1729 ah, I got confused when reading your comment previously. You're right there's a bunch of similar issues. I hope I can look at them soon. (In a different PR, also changing the slice to iterator) I came here just for quick sync need to go AFK. |
|
@TheBlueMatt comment was addressed, so merging now |
This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin#662 (review)
2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak) e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak) 0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak) bc76325 Move repeated code to functions in script (Martin Habovstiak) 1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak) Pull request description: This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: #662 (review) ACKs for top commit: tcharding: ACK 2c28d3b sanket1729: ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later. Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6
This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin/rust-bitcoin#662 (review)
…pt::Instructions` 2c28d3b Fix handling of empty slice in Instructions (Martin Habovštiak) e6ff754 Fix doc of take_slice_or_kill (Martin Habovštiak) 0ec6d96 Cleanup after `Instructions` refactoring (Martin Habovstiak) bc76325 Move repeated code to functions in script (Martin Habovstiak) 1f55edf Use iterator in `blockdata::script::Instructions` (Martin Habovstiak) Pull request description: This refactors `blockdata::script::Instructions` to use `::core::slice::Iter<'a, u8>` instead of `&'a [u8]` to better express the intention and to avoid some slicing mistakes. Similarly to a previous change this uses a macro to deduplicate the common logic and the new `read_uint_iter` internal function to automatically advance the iterator. Addresses: rust-bitcoin/rust-bitcoin#662 (review) ACKs for top commit: tcharding: ACK 2c28d3b sanket1729: ACK 2c28d3b. I don't want to hold ACKs on minor things as they can be in a fixup later. Tree-SHA512: 9dc770b9f7958efbd0df2cc2d3546e23deca5df2f94ea2c42b089df628f4b99f08032ca4aa8822caf6643a8892903e1bda41228b78c8519b90bcaa1255d9acc6
Because of this change being somewhat large I didn't do it together with #658.
I also don't mind if this is merged after Taproot (feel free to put off review).
This refactors `Script::bytes_to_asm_fmt`` function to use an iterator
instead of index. Such change makes it easier to reason about overflows
or out-of-bounds accesses. As a result this also fixes three unlikely
overflows and happens to improve formatting to not output space at the
beginning in some weird cases.
To improve robustness even better it also moves
read_uintimplementation to internal function which returns a more specific error
type which can be exhaustively matched on to guarantee correct error
handling. Probably because of lack of this the code was previously
checking the same condition twice, the second time being unreachable and
attempting to behave differently than the first one.
Finally this uses macro to deduplicate code which differs only in single
number, ensuring the code stays in sync across all branches.
I also attempted to get this work with
no_paniccrate to prove absence of panic but lookslike the compiler can not do that. (Yes, I did change writer to be trivial and a bunch of other things.)
If someone wants to give it a try I can publish that change.
Unresolved question
Since
Errorseems to be related to script execution maybeUintErrorshould be public?At least
NumericOverflowvariant is strange.