encode: Replace VarInt type with ReadExt and WriteExt functions#2133
encode: Replace VarInt type with ReadExt and WriteExt functions#2133stevenroose wants to merge 3 commits intorust-bitcoin:masterfrom
Conversation
|
Looks reasonable to me. Review would have been loads easier if your patches said why they should be considered other than just "this makes more sense" - I had to work out the why for each patch myself. Easy reviews make for easy merges :) |
|
Minimal push being a hard error kind of irks me... (This is semi-unrelated, and more of a general musing) Thoughts anyone? |
bitcoin/src/consensus/encode.rs
Outdated
There was a problem hiding this comment.
Is there like a BIP or something that can be referenced here?
There was a problem hiding this comment.
Nah, this is all super old stuff. Might be somewhere on wiki.bitcoin.it, but mostly just Core's ref impl.
0391e54 to
ac87127
Compare
|
Damn, there's also clippy making me do stupid changes just because? They are not the same thing. Delegation vs method call + error conversion. The code I wrote semantically means, "we call this method, want the returned value and upstream the error, with conversion if required", while pure method delegation means "[the remainder of] the execution is delegated to this other method". Apart from semantic difference, if ever either of the function signatures were to change, we would have to be dealing with the error types being different. |
|
If I personally had hit this clippy lint I would have changed the code to let n_bytes = w.emit_slice(&self[..])?;
Ok(n_bytes)I personally think the lint is valid but I won't hold #2134 up. |
Kixunil
left a comment
There was a problem hiding this comment.
Concept ACK, this is implementing #1016 (comment)
While we're at it could we also rename it to compact_size? (See the issue I mentioned.)
bitcoin/src/consensus/encode.rs
Outdated
There was a problem hiding this comment.
Should contain the explanation that this is because of usize (rightfully) not implementing Into<u64> but we don't support architectures other than 32 and 64 bits.
There was a problem hiding this comment.
On it. Yeah it's pretty unfortunate. Also not sure what to do with usize at all, see todo.
bitcoin/src/consensus/encode.rs
Outdated
There was a problem hiding this comment.
I believe we already do have the check but we were discussing elsewhere that these checks should be tied with the functions they provide for clarity.
There was a problem hiding this comment.
I added a NB instead, do you think that's enough at this point? Or should I add a unit tests in this module specifically to check that usize doesn't exceed 64 bits?
There was a problem hiding this comment.
It should be more along the lines of:
#[cfg(not(any(target_pointer_width = "32", target_pointer_width = "64")))]
compile_error!("Only 32 and 64 bit architectures are supported");Which I think we have somewhere but it's far away. Ideally we would have a module (or extension trait) with conversion functions that contains that check and then we would call into those everywhere and get rid of all as or try_into().unwrap() conversions in the whole codebase.
But for now NB is fine since the code is probably littered with these things anyway.
bitcoin/src/consensus/encode.rs
Outdated
There was a problem hiding this comment.
FTR this makes the trait no longer object safe. Could be a problem for downstream users, so might be worth adding where Self: Sized.
There was a problem hiding this comment.
I'm a bit confused about what impl IntoU64 is supposed to mean. I have never seen this used in an argument to a trait method before and this style isn't used anywhere else in this codebase.
It would be clearer to just use a generic. The result will still not be object-safe. To get object safety you'd need to use a dyn type.
There was a problem hiding this comment.
That use of impl is just a shorthand to create a generic, these two signatures are equivalent (except the ability of the caller to explicitly specify the generic type using the turbofish):
fn emit_varint(&mut self, v: impl IntoU64) -> Result<usize, io::Error>; fn emit_varint<T: IntoU64>(&mut self, v: T) -> Result<usize, io::Error>;I can add Self: Sized, but not sure that is sufficient to make the WriteExt types boxable. Let me try.
There was a problem hiding this comment.
Hmm, that's pretty annoying as then we don't allow reading from and writing to byte slices anymore as they are ?Sized. Is the boxability really an issue? I would really prefer to not get a dynamic reference in this method.
There was a problem hiding this comment.
Thanks for explaining. And no, I don't think Sized is sufficient to make WriteExt boxable. A generic method is inherently not boxable because the vtable needs to hold pointers to actual functions, and the compiler doesn't know upfront how many different functions the generic will expand into.
I think we should sacrifice object-safety. But if we want to keep it, our options are:
- Return to the old code which has a type rather than methods
- Move the method out of the trait definition into a standalone method
- Move the method into
IntoU64rather thanWriteExt(and thenIntoU64will not be object-safe but that seems less bad)
There was a problem hiding this comment.
- change the signature to take u64 and convert at call-time
But I agree object-safety here shouldn't be important.
There was a problem hiding this comment.
I don't see how it's related to slices, can you explain? This method previously wasn't present and the trait was object safe, now the method was added which makes it no longer object safe which is easily fixable by doing where Self: Sized, so that any existing code using &mut dyn WriteExt will continue to work without us losing anything.
Also I suspect we should impl WriteExt for &mut dyn WriteExt and impl WriteExt for Box<dyn WriteExt> but that's off topic.
There was a problem hiding this comment.
easily fixable by doing
where Self: Sized,
I don't understand this. The problem with object-safety is that we are adding a generic method to the trait. Sized won't help with this.
There was a problem hiding this comment.
It will, the offending method will be unavailable in unsized context but all others will be fine so any code using dyn WriteExt today will continue to work.
There was a problem hiding this comment.
the offending method will be unavailable in unsized context
ooo TIL. Ok, sorry for causing confusion.
bitcoin/src/consensus/encode.rs
Outdated
There was a problem hiding this comment.
This is cute but does it really make sense to complicate emit_slice by returning usize just because of this? This also repeats so why not have fn emit_varint_prefixed_slice?
There was a problem hiding this comment.
That actually exists as consensus_encode_with_size. I should probably use it there.
pub(crate) fn consensus_encode_with_size<W: io::Write>(
data: &[u8],
mut w: W,
) -> Result<usize, io::Error> {
Ok(w.emit_varint(data.len())? + w.emit_slice(data)?)
}There was a problem hiding this comment.
Changed the two other occurrences to use this method.
I'd be ok with changing that, but our Encodable trait also does it and now emit_varint has to do it, so it's not the only remaining method breaking with the trait. Tbh the int ones can also do it to make all the len += pieces more concise, but I didn't do it because it would probably be a ton of changes. I can do it if you prefer.
There was a problem hiding this comment.
On second thought, I wouldn't be opposed to moving this consensus_encode_with_size method into the WriteExt trade as something like emit_slice_with_length like you said.
There was a problem hiding this comment.
Yeah, moving it here sounds good. But my point is we shouldn't return usize from emit_slice - the length of the slice is already known. Also I wonder if we really want it rather than just using write_all.
|
I agree with @junderw IIRC there are cases in Bitcoin where minimal encodings are not used, so this could break things. We should have |
It's not really a type any user interfaces with, right? So it's a bit of a bikeshed. Also, it's used for other things than sizes, so I think Core's usage of that term is also somewhat legacy. |
Why? We don't currently have it.. Core always performs minimality check as well. Any |
f68ed57 to
b6b65ef
Compare
|
TIL that almost none of the EDIT: Ok this is not true, It seems like we almost have a pattern to never use the |
My concern is "Will it fail to parse a transaction or block that is valid through consensus at any point in the Bitcoin blockchain's history?" (Note: Nuance is important. I do not mean "Will it fail to parse existing confirmed txes and blocks?" but rather "Will it fail a tx or block that is or was valid consensus wise at any point?"... Bitcoin Core might rely on the fact that it only outputs minimal Varint and doesn't actually validate it. (I think there was some discussion around this area when discussing BIP62 for tx malleability way back a long time ago)) This includes data that was not generated by rust-bitcoin, which is why I think it deserves re-consideration. But like I said, it's just a general musing and since this PR is just converting existing code which had the same error check, it's not a blocker IMO. |
vincenzopalazzo
left a comment
There was a problem hiding this comment.
Concept ACK for this change looks really better and more natural.
junderw
left a comment
There was a problem hiding this comment.
ACK b6b65ef560a80ce048c6f13198f1a25132bcea1e
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
In e9123d1d25bce9be4d1702ae9da6b99bd2e9674e:
These can use .into_u64. (Similar below and in several other places.)
bitcoin/src/consensus/encode.rs
Outdated
There was a problem hiding this comment.
In e9123d1d25bce9be4d1702ae9da6b99bd2e9674e:
If we are going to make this pub it should probably live in the crate root (or maybe even in bitcoin-internals).
As @Kixunil suggests we should tailor our compile-gate by e.g. gating the usize impl specifically on usize fitting into u64. Then only code that tries to use this trait on usize will fail to compile on 128-bit arches.
bitcoin/src/consensus/encode.rs
Outdated
There was a problem hiding this comment.
In e9123d1d25bce9be4d1702ae9da6b99bd2e9674e:
It doesn't really matter but I kinda prefer using .into rather than casting for these cases, since it's possible.
|
Done reviewing b6b65ef560a80ce048c6f13198f1a25132bcea1e. everything looks good except some comments about u64 conversions. FYI since #2135 you no longer need to format your own PRs. |
|
Needs rebase man. |
b6b65ef to
8845cf8
Compare
Pull Request Test Coverage Report for Build 7919318263Details
💛 - Coveralls |
|
Rebased. |
|
@apoelstra if this gets another ack can we put it in before any other conflicting PRs. |
| fn emit_slice(&mut self, v: &[u8]) -> Result<(), io::Error> { self.write_all(v) } | ||
| fn emit_slice(&mut self, v: &[u8]) -> Result<usize, io::Error> { | ||
| self.write_all(v)?; | ||
| Ok(v.len()) |
There was a problem hiding this comment.
Why? This looks completely redundant since the caller can just call .len() on the argument. You've complained about breaking changes and this is a breaking change...
There was a problem hiding this comment.
Why? This looks completely redundant since the caller can just call .len() on the argument.
I think it's worthwhile to have a uniform API. You can see at the call sites that it makes the code easier to read.
You've complained about breaking changes and this is a breaking change...
I'd prefer we take the opposite tack and say "we let you make a breaking change so please don't complain about our breaking changes" ;).
More seriously, while this is a breaking change, it's in an obscure method of a trait that basically nobody ever uses directly. So I think the breakage is worth the API cleanup.
There was a problem hiding this comment.
"we let you make a breaking change so please don't complain about our breaking changes"
😆 OK.
| } | ||
| /// Returns 1 for 0..=0xFC, 3 for 0xFD..=(2^16-1), 5 for 0x10000..=(2^32-1), and 9 otherwise. | ||
| #[inline] | ||
| pub const fn varint_size_const(v: u64) -> usize { |
There was a problem hiding this comment.
I think it'd be better to name this varint_size_from_u64 and not mention const. (I know I've made the same mistake in hashes in the past but I learned since then.) Also maybe just not make it public for now.
c717f7f Improve docs on private Witness fields (Tobin C. Harding) Pull request description: The `Witness` type is a reasonable complex data structure, make an effort to clarify its structure in the docs on the private fields. Private docs only. (Original idea pulled out of #2133.) ACKs for top commit: Kixunil: ACK c717f7f apoelstra: ACK c717f7f much clearer, thanks! Tree-SHA512: 9d54b7eeefec97e584fb5f275049dbac0473c949fae8ab05c6961d6fc424c17a058af7037c2220ef1446af294d78c68bfee741cfeca1b18ecc402935d8069dab
579b76b Introduce ToU64 conversion trait (Tobin C. Harding) Pull request description: The idea for this was pulled out of Steven's work in #2133 We already explicitly do not support 16 bit machines. Also, because Rust supports `u182`s one cannot infallibly convert from a `usize` to a `u64`. This is unergonomic and results in a ton of casts. We can instead limit our code to running only on machines where `usize` is less that or equal to 64 bits then the infallible conversion is possible. Since 128 bit machines are not a thing yet this does not in reality introduce any limitations on the library. Add a "private" trait to the `internals` crate to do infallible conversion to a `u64` from `usize`. Implement it for all unsigned integers smaller than `u64` as well so we have the option to use the trait instead of `u32::from(foo)`. ACKs for top commit: Kixunil: ACK 579b76b apoelstra: ACK 579b76b successfully ran local tests Tree-SHA512: 2eaddfff995987a346e052386c6dfef3510e4732e674e3a2cfab60ee391b4cce1bf7ba4fb2dfd4926f8203d7251eea2198ccb61f0b40332e624c88fda4fa7f48
…ethods instead 18d8b0e Replace VarInt type with ReadExt and WriteExt functions (Steven Roose) 003db02 Return encoded length from WriteExt::emit_slice (Steven Roose) Pull request description: This the meat and potatoes out of Steven's work in #2133 and also closes #1016 ACKs for top commit: apoelstra: ACK 18d8b0e successfully ran local tests Tree-SHA512: 2df96c91e0fbfdc87158bde9bbdd9565f67e3f66601697d0e22341416c0cd45dd69d09637393993f350354a44031bead99fd0d2f006b4fc6e7613aedc4b0a832
|
Closing in favor of #2931. |
I've been meaning to try do this for a while, I kinda think it makes more sense like this, but wouldn't mind if it doesn't make it in.