Re-write the weight/size API#2076
Conversation
|
FTR I feel I may have been responsible for some of the confusion introduced recently because of my review suggestions being either incorrect or misleading. |
4cf8b56 to
3d69a9b
Compare
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
The comment would be simpler as size is equivalent to virtual size since all bytes of a TxOut are non-witness bytes.
As for the claims of overflow, strictly speaking no. When convirting from vsize to weight there is a *4 multiplier, which could cause an overflow even if size does not overflow.
IMO we shoud panic here and document that if the script size exceeds 2^62 then it'll panic.
There was a problem hiding this comment.
Cool thanks, will work in the suggestions.
There was a problem hiding this comment.
Would it not be better to have all the weight functions return Option<Weight> and add weight_unchecked functions?
There was a problem hiding this comment.
I dunno, this feels like a crappy API given that under normal usage weights will never come within a factor of a million of overflowing.
We could have weight functions that panic or truncate and weight_checked ones that don't.
There was a problem hiding this comment.
I solved this by just using saturating add/mull and commenting that its for defensive reasons.
There was a problem hiding this comment.
I think panicking would be better because if someone hits it it means something is very wrong somewhere else. Finding why the value is u64::MAX is much harder.
There was a problem hiding this comment.
Agreed, after discussion I have also come around to "better to panic than to saturate" viewpoint. In theory it's a DoS vector. In practice if a user manages to hit this panic, they'll also hit panics using + on normal int types, and there's nothing we can do to prevent that.
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
In 3d69a9bf832663bf50dc25a748bed9407a774ad3:
Same story: this will panic if the script size is within a few bytes of 2^64. We should just document it and move on.
There was a problem hiding this comment.
Is it worth introducing an invariant on the ScriptBuf and Script types to limit the size?
A quick websearch brought up this https://bitcoin.stackexchange.com/questions/117594/what-are-bitcoins-transaction-and-script-limits#117595
In case anyone is not using it this search site is awesome: https://bitcoinsearch.xyz/
There was a problem hiding this comment.
Yeah, I think we should restrict Script to have a maximum 2^31 size.
There was a problem hiding this comment.
Leaving this for another day because I don't want to hackishly add an invariant and I don't feel like thinking hard about it right now. I used saturating add/mul to resolve this issue.
There was a problem hiding this comment.
Same comment as I made elsewhere, but just saturating can be bad IMO since it leads to a silent failure. I agree with @apoelstra here to just panic and mark as panic and move on if wrapping in an error or none is too heavy to do now.
There was a problem hiding this comment.
@apoelstra why limit the script? I don't remember this limit in consensus rules. Also I suspect it could make using scripts much more annoying.
There was a problem hiding this comment.
@Kixunil in practice, scripts have to fit into blocks, which are limited to 4M.
Currently we have limited script to "whatever the maximum number of elements a Vec can hold" which is at most 2^32 on 32-bit systems, but which is likely to be 2^31 in practice because 32-bit allocators want to be able to hold allocation-sized signed diffs. These system-specific limits require us to have overflow checks all over the place when manipulating scripts, such as in this case where we are trying to add 12 to a script size and are worried that we'll overflow usize.
If we had a 2^31 limit this would:
- Never be encountered in practice.
- Not affect the API for any of the "template constructors" like p2wsh, or conversions from addresses, or parsing from transactions (which I believe already have a "maximum vector size" check which has a much lower limit).
- In the case of
script::Builderwe could defer the API impact tointo_scriptso that users could continue to usepush_*whatever without needing to check errors constantly, and only check for overflow in the end. (And we could provide a panicking variant for users who were sure they weren't going to overflow.)
There was a problem hiding this comment.
We would also have to limit input and output counts and transaction counts, since you could technically have so many of them it overflows. I'm not convinced putting all these checks everywhere is better than overflow checks everywhere. Overflow checks are at least pretty easy to understand and get right.
There was a problem hiding this comment.
This is true -- trying to bound all types to stay with in a non-overflowing window would be really hard (and probably impossible in some cases; e.g. you can have a taptree with 2^128 leaves in theory which would still be legal to use on-chain).
apoelstra
left a comment
There was a problem hiding this comment.
ACK 3d69a9bf832663bf50dc25a748bed9407a774ad3
3d69a9b to
07ad0bc
Compare
apoelstra
left a comment
There was a problem hiding this comment.
ACK 07ad0bcd6f9350c55b5edd4329fa9c05c56261ec
07ad0bc to
14e796e
Compare
|
Used saturating add/mul. |
|
So my effort yesterday did not adhere to the epic list of PR requirements #843, I'm not sure how we are supposed to enforce that other than everyone internalising it. |
|
Works for me. I wonder if we should change the |
apoelstra
left a comment
There was a problem hiding this comment.
ACK 14e796e4f2b0ef6b6d288695e88ec4c500673ee6
I had a play with it but it requires a bit of thinking and feels like there are more important things to do right now, I added #2086 to flag it. |
bitcoin/src/blockdata/block.rs
Outdated
There was a problem hiding this comment.
I'm still not convinced that using saturated_mul is better than checked_mul. As a downstream consumer, I'd prefer to know when the bounds are exceeded and get an error or panic instead of silently succeeding at the saturation point. There are a number of checked helper methods in weighthttps://github.com/rust-bitcoin/rust-bitcoin/blob/master/bitcoin/src/blockdata/weight.rs#L97).
There was a problem hiding this comment.
No sane use case should ever hit this limit, if a user really wants to know they can use base_size() and total_size() themselves and use the checked versions.
There was a problem hiding this comment.
As a downstream consumer, I'd prefer to know when the bounds are exceeded
I can tell you right now, without my reading or your writing the code, that the bounds are not exceeded. Would you really like to type a ton of untested panicking extra code every time you use the API, as a reminder?
There was a problem hiding this comment.
Don't have any strong opinions on usage here, but I don't like this method returning a result/option. Any other way (including how it is done) of doing this is fine is me.
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
If this is the serialized length of a u32, then why not just make this a u32 instead of a usize?
There was a problem hiding this comment.
Lengths are type usize in Rust by convention.
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
It doesn't look like this ever Panics now.
There was a problem hiding this comment.
Ah my bad, thanks. Will fix.
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
It doesn't seem any better to return MAX silently instead of overflow silently. Could None be returned instead of Max if the bounds are exceeded so that the consumer knows.
There was a problem hiding this comment.
In my opinion this code shows that the Weight::from_vb API is wrong and that it should do saturating multiplication, please see #2086.
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
Is it possible that this could overflow?
There was a problem hiding this comment.
Wow sloppy work by me when I re-did this version of the PR, all math operations in the PR should be handled the same way. Will fix, thanks.
|
This seems to conflict with a earlier PR #2069. I'll close 2069 when/if this one is merged. |
One of our stated aims is to make it possible to learn bitcoin by using our library. To help with this aim add to private consts for the segwit transaction marker and flag serialization fields.
In an attempt to help super new devs add code comments about transaction serialization formats pre and post segwit.
14e796e to
7d1f04e
Compare
|
I've documented the saturating add/mull on all the size functions. I have not documented the saturating behaviour on Also fixed my mistakes highlighted in review, sorry for the Friday afternoon sloppy AF push yesterday. |
|
To be explicit I have gone ahead with the saturating add/mull here, can we leave it like that and argue the pros/cons of that over in #2086, the point of this PR is just to clear up the size/weight stuff before the imminent release. The saturating behaviour can and should be argued through more thoroughly before v1.0.0 |
bitcoin/src/blockdata/transaction.rs
Outdated
There was a problem hiding this comment.
I was using this. Can you leave this in? There's a comment by @stevenroose which is correct in saying this should be multiplied by 4.
There was a problem hiding this comment.
No, "base [size]" is well defined term in the bip and this is not what it means. I should never have acked this const, its badly named, badly documented, and wrong. If you want this you can use Weight::from_vb(Sequence::SIZE + OutPoint::SIZE) (or from_vb_uncheked if you want const).
There was a problem hiding this comment.
I should never have acked this const
You do a lot of code review which is a thankless job most of the time and helps the project a lot. And when one of the many you reviewed has a mistake then people notice your review. There are a lot of maintainers on this project that never bother to code review as far as I can tell. Really I'm not sure what the point is of having lots of maintainers that never code review.
IMO it would be better to just fix the semantics of the API and leave all of the saturation changes for a different PR. Although I can understand wanting to make sure there is no hidden overflows. |
There was a problem hiding this comment.
I have a small preference for serialized_size instead of size in the API names. Overall, I am okay with any decision in the PR as long as:
- We don't silently overflow.
- We don't have option/result types.
Any implementation using panics/saturation is good with me.
bitcoin/src/blockdata/block.rs
Outdated
There was a problem hiding this comment.
Don't have any strong opinions on usage here, but I don't like this method returning a result/option. Any other way (including how it is done) of doing this is fine is me.
bitcoin/src/blockdata/block.rs
Outdated
There was a problem hiding this comment.
nit: slightly prefer more rust idiomatic
self.txdata.iter().map(|tx| tx.base_size())
.fold(size, |acc, elem| acc.saturating_add(elem))There was a problem hiding this comment.
Some part of me does not like the inefficiency of having bounds check everytime in the loop when we are sure that we are not exceeding it. But I can live with it :)
There was a problem hiding this comment.
Same comment suggesting to convert loop to rust iterators where possible. More reading if you are interested: https://ipthomas.com/blog/2023/07/n-times-faster-than-c-where-n-128/
There was a problem hiding this comment.
Nice article. Besides a performance increase, using bitwise operations also helps prevent against timing attacks by avoiding branching. It would be possible to write a C version using bitwise operations as well I think.
|
Ok, this whole saturating thing is starting to piss me off. I'm going to change this whole PR back to use normal operators |
In (2) did you mean to write "We don't return option/result types"? |
Recently we introduced a bug in the weight/size code, while investigating I found that our `Transaction`/`Block` weight/size APIs were in a total mess because: - The docs were stale - The concept of weight (weight units) and size (bytes) were mixed up I audited all the API functions, read some bips (141, 144) and re-wrote the API with the following goals: - Use terminology from the bips - Use abstractions that mirror the bips where possible
7d1f04e to
c34e3cc
Compare
|
This now introduces silent overflows, the whole saturating/panic thing turned out to be an epic bikeshedding event and totally out of scope for this PR which is explicitly "fix the weight/size mess we recently introduced" - leaving all arithmetic stuff for #2086 @sanket1729 this is against your review, requires your concept ack/nack to proceed please. |
There was a problem hiding this comment.
ACK c34e3cc.
Sorry @tcharding, I forgot my "don't" in the comment. I meant we don't have Option/Result types.
I have this really bad habit of sometimes omitting negation while writing. I might need some professional diagnosis :P
|
LOLZ |
|
Thanks for fixing this @tcharding. I'd have preferred to fix my own f* ups but I understand with imminent 1.0 release it's important to fix this quickly. |
LOL, not really that imminent. |
|
I think OP meant v0.31.0 being imminent. |
| .input | ||
| .iter() | ||
| .map(|input| { | ||
| if self.use_segwit_serialization() { |
There was a problem hiding this comment.
Nice quadratic complexity we made here. How TF I didn't see this before?!
There was a problem hiding this comment.
How TF I didn't see this before?!
The name for this function is bad. If the core developers make mistakes like that, they will be happening downstream a lot, and there will be no one to notice them for years.
There was a problem hiding this comment.
Could be it, I'm not sure. I vaguely remember looking at it during my review but seeing that I didn't post a proper review I think I didn't actually do a proper review. Looking at my calendar there was a conference around that time so clearly I was preparing for that. :D
There was a problem hiding this comment.
I agree that the name is bad and even after seeing this code it does not look quadratic.

Audit and re-write the weight/size API for
BlockandTransaction. First two patches are trivial, patch 3 contains justification and explanation for this work, copied here:Please note, this PR introduces panics if a sciptPubkey overflows the calculation
weight = spk.size() * 4.Fix #2049