Skip to content

encode: Replace VarInt type with ReadExt and WriteExt functions#2133

Closed
stevenroose wants to merge 3 commits intorust-bitcoin:masterfrom
stevenroose:encode-varint
Closed

encode: Replace VarInt type with ReadExt and WriteExt functions#2133
stevenroose wants to merge 3 commits intorust-bitcoin:masterfrom
stevenroose:encode-varint

Conversation

@stevenroose
Copy link
Copy Markdown
Collaborator

@stevenroose stevenroose commented Oct 20, 2023

I've been meaning to try do this for a while, I kinda think it makes more sense like this, but wouldn't mind if it doesn't make it in.

@tcharding
Copy link
Copy Markdown
Member

Looks reasonable to me. Review would have been loads easier if your patches said why they should be considered other than just "this makes more sense" - I had to work out the why for each patch myself. Easy reviews make for easy merges :)

@junderw
Copy link
Copy Markdown
Contributor

junderw commented Oct 20, 2023

Minimal push being a hard error kind of irks me... (This is semi-unrelated, and more of a general musing)

Thoughts anyone?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there like a BIP or something that can be referenced here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, this is all super old stuff. Might be somewhere on wiki.bitcoin.it, but mostly just Core's ref impl.

@stevenroose
Copy link
Copy Markdown
Collaborator Author

Damn, there's also clippy making me do stupid changes just because?


error: question mark operator is useless here
   --> bitcoin/src/consensus/encode.rs:542:17
    |
542 |                 Ok(w.emit_slice(&self[..])?)
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try removing question mark and `Ok()`: `w.emit_slice(&self[..])`
...

They are not the same thing. Delegation vs method call + error conversion. The code I wrote semantically means, "we call this method, want the returned value and upstream the error, with conversion if required", while pure method delegation means "[the remainder of] the execution is delegated to this other method". Apart from semantic difference, if ever either of the function signatures were to change, we would have to be dealing with the error types being different.

@tcharding
Copy link
Copy Markdown
Member

If I personally had hit this clippy lint I would have changed the code to

                let n_bytes = w.emit_slice(&self[..])?;
                Ok(n_bytes)

I personally think the lint is valid but I won't hold #2134 up.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, this is implementing #1016 (comment)

While we're at it could we also rename it to compact_size? (See the issue I mentioned.)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should contain the explanation that this is because of usize (rightfully) not implementing Into<u64> but we don't support architectures other than 32 and 64 bits.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On it. Yeah it's pretty unfortunate. Also not sure what to do with usize at all, see todo.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we already do have the check but we were discussing elsewhere that these checks should be tied with the functions they provide for clarity.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a NB instead, do you think that's enough at this point? Or should I add a unit tests in this module specifically to check that usize doesn't exceed 64 bits?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be more along the lines of:

#[cfg(not(any(target_pointer_width = "32", target_pointer_width = "64")))]
compile_error!("Only 32 and 64 bit architectures are supported");

Which I think we have somewhere but it's far away. Ideally we would have a module (or extension trait) with conversion functions that contains that check and then we would call into those everywhere and get rid of all as or try_into().unwrap() conversions in the whole codebase.

But for now NB is fine since the code is probably littered with these things anyway.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR this makes the trait no longer object safe. Could be a problem for downstream users, so might be worth adding where Self: Sized.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about what impl IntoU64 is supposed to mean. I have never seen this used in an argument to a trait method before and this style isn't used anywhere else in this codebase.

It would be clearer to just use a generic. The result will still not be object-safe. To get object safety you'd need to use a dyn type.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That use of impl is just a shorthand to create a generic, these two signatures are equivalent (except the ability of the caller to explicitly specify the generic type using the turbofish):

    fn emit_varint(&mut self, v: impl IntoU64) -> Result<usize, io::Error>;
    fn emit_varint<T: IntoU64>(&mut self, v: T) -> Result<usize, io::Error>;

I can add Self: Sized, but not sure that is sufficient to make the WriteExt types boxable. Let me try.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's pretty annoying as then we don't allow reading from and writing to byte slices anymore as they are ?Sized. Is the boxability really an issue? I would really prefer to not get a dynamic reference in this method.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. And no, I don't think Sized is sufficient to make WriteExt boxable. A generic method is inherently not boxable because the vtable needs to hold pointers to actual functions, and the compiler doesn't know upfront how many different functions the generic will expand into.

I think we should sacrifice object-safety. But if we want to keep it, our options are:

  • Return to the old code which has a type rather than methods
  • Move the method out of the trait definition into a standalone method
  • Move the method into IntoU64 rather than WriteExt (and then IntoU64 will not be object-safe but that seems less bad)

Copy link
Copy Markdown
Collaborator Author

@stevenroose stevenroose Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • change the signature to take u64 and convert at call-time

But I agree object-safety here shouldn't be important.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how it's related to slices, can you explain? This method previously wasn't present and the trait was object safe, now the method was added which makes it no longer object safe which is easily fixable by doing where Self: Sized, so that any existing code using &mut dyn WriteExt will continue to work without us losing anything.

Also I suspect we should impl WriteExt for &mut dyn WriteExt and impl WriteExt for Box<dyn WriteExt> but that's off topic.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easily fixable by doing where Self: Sized,

I don't understand this. The problem with object-safety is that we are adding a generic method to the trait. Sized won't help with this.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will, the offending method will be unavailable in unsized context but all others will be fine so any code using dyn WriteExt today will continue to work.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the offending method will be unavailable in unsized context

ooo TIL. Ok, sorry for causing confusion.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cute but does it really make sense to complicate emit_slice by returning usize just because of this? This also repeats so why not have fn emit_varint_prefixed_slice?

Copy link
Copy Markdown
Collaborator Author

@stevenroose stevenroose Oct 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That actually exists as consensus_encode_with_size. I should probably use it there.

pub(crate) fn consensus_encode_with_size<W: io::Write>(
    data: &[u8],
    mut w: W,
) -> Result<usize, io::Error> {
    Ok(w.emit_varint(data.len())? + w.emit_slice(data)?)
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the two other occurrences to use this method.

I'd be ok with changing that, but our Encodable trait also does it and now emit_varint has to do it, so it's not the only remaining method breaking with the trait. Tbh the int ones can also do it to make all the len += pieces more concise, but I didn't do it because it would probably be a ton of changes. I can do it if you prefer.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I wouldn't be opposed to moving this consensus_encode_with_size method into the WriteExt trade as something like emit_slice_with_length like you said.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, moving it here sounds good. But my point is we shouldn't return usize from emit_slice - the length of the slice is already known. Also I wonder if we really want it rather than just using write_all.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Oct 22, 2023

I agree with @junderw IIRC there are cases in Bitcoin where minimal encodings are not used, so this could break things. We should have read_varint_minimal and read_varint_non_minimal instead.

@stevenroose
Copy link
Copy Markdown
Collaborator Author

While we're at it could we also rename it to compact_size? (See the issue I mentioned.)

It's not really a type any user interfaces with, right? So it's a bit of a bikeshed. Also, it's used for other things than sizes, so I think Core's usage of that term is also somewhat legacy.

@stevenroose
Copy link
Copy Markdown
Collaborator Author

I agree with @junderw IIRC there are cases in Bitcoin where minimal encodings are not used, so this could break things. We should have read_varint_minimal and read_varint_non_minimal instead.

Why? We don't currently have it.. Core always performs minimality check as well. Any read_varint should be created with an equivalent emit_varint that will always be minimal.

@stevenroose stevenroose force-pushed the encode-varint branch 2 times, most recently from f68ed57 to b6b65ef Compare October 22, 2023 18:48
@stevenroose
Copy link
Copy Markdown
Collaborator Author

stevenroose commented Oct 22, 2023

TIL that almost none of the emit_ functions are actually used. emit_bool is used once and emit_u8 is used once and all the others are never used.

EDIT: Ok this is not true, emit_u16, emit_u32 and emit_u64 are used for for varint impl. And the other ones are used in a macro for the implementation of Encodable and Decodable on the same type that is used in places.

It seems like we almost have a pattern to never use the WriteExt functions in non-encode.rs files, except for 3 occurrences, which we will now break with this commit. I do think it makes sense to leverage these extension trades more, though. If all the emits would return nb_bytes, I think we could actually get rid of Encodable implementations for these primitive types that might not even make too much sense in the first place. I'm quite indifferent when it comes to that though.

@junderw
Copy link
Copy Markdown
Contributor

junderw commented Oct 23, 2023

Why? We don't currently have it.. Core always performs minimality check as well. Any read_varint should be created with an equivalent emit_varint that will always be minimal.

My concern is "Will it fail to parse a transaction or block that is valid through consensus at any point in the Bitcoin blockchain's history?"

(Note: Nuance is important. I do not mean "Will it fail to parse existing confirmed txes and blocks?" but rather "Will it fail a tx or block that is or was valid consensus wise at any point?"... Bitcoin Core might rely on the fact that it only outputs minimal Varint and doesn't actually validate it. (I think there was some discussion around this area when discussing BIP62 for tx malleability way back a long time ago))

This includes data that was not generated by rust-bitcoin, which is why I think it deserves re-consideration.

But like I said, it's just a general musing and since this PR is just converting existing code which had the same error check, it's not a blocker IMO.

Copy link
Copy Markdown
Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK for this change looks really better and more natural.

Copy link
Copy Markdown
Member

@clarkmoody clarkmoody left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Copy link
Copy Markdown
Contributor

@junderw junderw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK b6b65ef560a80ce048c6f13198f1a25132bcea1e

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In e9123d1d25bce9be4d1702ae9da6b99bd2e9674e:

These can use .into_u64. (Similar below and in several other places.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In e9123d1d25bce9be4d1702ae9da6b99bd2e9674e:

If we are going to make this pub it should probably live in the crate root (or maybe even in bitcoin-internals).

As @Kixunil suggests we should tailor our compile-gate by e.g. gating the usize impl specifically on usize fitting into u64. Then only code that tries to use this trait on usize will fail to compile on 128-bit arches.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In e9123d1d25bce9be4d1702ae9da6b99bd2e9674e:

It doesn't really matter but I kinda prefer using .into rather than casting for these cases, since it's possible.

@apoelstra
Copy link
Copy Markdown
Member

Done reviewing b6b65ef560a80ce048c6f13198f1a25132bcea1e. everything looks good except some comments about u64 conversions. FYI since #2135 you no longer need to format your own PRs.

@tcharding
Copy link
Copy Markdown
Member

Needs rebase man.

@github-actions github-actions bot added the C-bitcoin PRs modifying the bitcoin crate label Feb 15, 2024
@coveralls
Copy link
Copy Markdown

Pull Request Test Coverage Report for Build 7919318263

Details

  • -7 of 167 (95.81%) changed or added relevant lines in 12 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.02%) to 83.975%

Changes Missing Coverage Covered Lines Changed/Added Lines %
bitcoin/src/consensus/encode.rs 96 98 97.96%
bitcoin/src/blockdata/transaction.rs 12 17 70.59%
Totals Coverage Status
Change from base Build 7917967798: -0.02%
Covered Lines: 19415
Relevant Lines: 23120

💛 - Coveralls

@stevenroose
Copy link
Copy Markdown
Collaborator Author

Rebased.

Copy link
Copy Markdown
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8845cf8

@tcharding
Copy link
Copy Markdown
Member

@apoelstra if this gets another ack can we put it in before any other conflicting PRs.

Copy link
Copy Markdown
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8845cf8 though I wish there were some way we could do a deprecation cycle here

fn emit_slice(&mut self, v: &[u8]) -> Result<(), io::Error> { self.write_all(v) }
fn emit_slice(&mut self, v: &[u8]) -> Result<usize, io::Error> {
self.write_all(v)?;
Ok(v.len())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This looks completely redundant since the caller can just call .len() on the argument. You've complained about breaking changes and this is a breaking change...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This looks completely redundant since the caller can just call .len() on the argument.

I think it's worthwhile to have a uniform API. You can see at the call sites that it makes the code easier to read.

You've complained about breaking changes and this is a breaking change...

I'd prefer we take the opposite tack and say "we let you make a breaking change so please don't complain about our breaking changes" ;).

More seriously, while this is a breaking change, it's in an obscure method of a trait that basically nobody ever uses directly. So I think the breakage is worth the API cleanup.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"we let you make a breaking change so please don't complain about our breaking changes"

😆 OK.

}
/// Returns 1 for 0..=0xFC, 3 for 0xFD..=(2^16-1), 5 for 0x10000..=(2^32-1), and 9 otherwise.
#[inline]
pub const fn varint_size_const(v: u64) -> usize {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be better to name this varint_size_from_u64 and not mention const. (I know I've made the same mistake in hashes in the past but I learned since then.) Also maybe just not make it public for now.

@tcharding
Copy link
Copy Markdown
Member

I picked this up and turned it into #2929, #2930, and #2931

apoelstra added a commit that referenced this pull request Jun 30, 2024
c717f7f Improve docs on private Witness fields (Tobin C. Harding)

Pull request description:

  The `Witness` type is a reasonable complex data structure, make an effort to clarify its structure in the docs on the private fields.

  Private docs only.

  (Original idea pulled out of #2133.)

ACKs for top commit:
  Kixunil:
    ACK c717f7f
  apoelstra:
    ACK c717f7f much clearer, thanks!

Tree-SHA512: 9d54b7eeefec97e584fb5f275049dbac0473c949fae8ab05c6961d6fc424c17a058af7037c2220ef1446af294d78c68bfee741cfeca1b18ecc402935d8069dab
apoelstra added a commit that referenced this pull request Aug 8, 2024
579b76b Introduce ToU64 conversion trait (Tobin C. Harding)

Pull request description:

  The idea for this was pulled out of Steven's work in #2133

  We already explicitly do not support 16 bit machines.

  Also, because Rust supports `u182`s one cannot infallibly convert from a `usize` to a `u64`. This is unergonomic and results in a ton of casts.

  We can instead limit our code to running only on machines where `usize` is less that or equal to 64 bits then the infallible conversion is possible.

  Since 128 bit machines are not a thing yet this does not in reality introduce any limitations on the library.

  Add a "private" trait to the `internals` crate to do infallible conversion to a `u64` from `usize`.

  Implement it for all unsigned integers smaller than `u64` as well so we have the option to use the trait instead of `u32::from(foo)`.

ACKs for top commit:
  Kixunil:
    ACK 579b76b
  apoelstra:
    ACK 579b76b successfully ran local tests

Tree-SHA512: 2eaddfff995987a346e052386c6dfef3510e4732e674e3a2cfab60ee391b4cce1bf7ba4fb2dfd4926f8203d7251eea2198ccb61f0b40332e624c88fda4fa7f48
apoelstra added a commit that referenced this pull request Sep 30, 2024
…ethods instead

18d8b0e Replace VarInt type with ReadExt and WriteExt functions (Steven Roose)
003db02 Return encoded length from WriteExt::emit_slice (Steven Roose)

Pull request description:

  This the meat and potatoes out of Steven's work in #2133 and also closes #1016

ACKs for top commit:
  apoelstra:
    ACK 18d8b0e successfully ran local tests

Tree-SHA512: 2df96c91e0fbfdc87158bde9bbdd9565f67e3f66601697d0e22341416c0cd45dd69d09637393993f350354a44031bead99fd0d2f006b4fc6e7613aedc4b0a832
@stevenroose
Copy link
Copy Markdown
Collaborator Author

Closing in favor of #2931.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

API break This PR requires a version bump for the next release C-bitcoin PRs modifying the bitcoin crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants