Add `consensus_decode_from_finite_reader` optimization #1023

apoelstra merged 1 commit into rust-bitcoin:master from dpc:issue-997-2

Conversation
src/consensus/encode.rs
Not true, because the network message is using it to decode the payload.
MAX_VEC_SIZE it is then.
OK. I have to benchmark this on a laptop, with a lot of noise, so I ran the bench multiple times and picked some of the best results for both.
src/blockdata/script.rs
In ae28a501e0e41a78afb6b01dcf1fba8a689262dd:
Curious why this isn't just a default impl on the trait? This code looks like it's repeated for every individual type.
Ah, you already have a default impl going the other way.
FWIW I think it would be OK to have both methods default-impl'd to call each other, with doc comments instructing trait implementors to implement at least one of them. I believe that std does this somewhere, though I don't remember where…
I think I like it. I'll give it a try.
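The mutually-defaulted pattern discussed above can be sketched like this. This is a hypothetical shape, not the actual rust-bitcoin trait: both methods get default impls that call each other, a doc comment tells implementors to override at least one, and the cap below is an assumed stand-in for `MAX_VEC_SIZE`.

```rust
use std::io::{self, Read};

// Hypothetical sketch (not the real rust-bitcoin API): two methods
// default-impl'd in terms of each other, so implementors provide one.
trait Decodable: Sized {
    /// Decode from any reader. Default: bound the reader, then delegate.
    /// Implementors must override at least one of these two methods.
    fn consensus_decode<R: Read>(r: &mut R) -> io::Result<Self> {
        // 4_000_000 is an assumed stand-in for MAX_VEC_SIZE.
        Self::consensus_decode_from_finite_reader(&mut r.take(4_000_000))
    }

    /// Decode from a reader already known to be size-limited.
    /// Default: delegate back to the general method.
    fn consensus_decode_from_finite_reader<R: Read>(r: &mut R) -> io::Result<Self> {
        Self::consensus_decode(r)
    }
}

// A toy implementor that overrides only the finite-reader method.
impl Decodable for u8 {
    fn consensus_decode_from_finite_reader<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; 1];
        r.read_exact(&mut buf)?;
        Ok(buf[0])
    }
}

fn main() {
    let data = [42u8];
    // The default consensus_decode wraps the reader and delegates.
    let v = u8::consensus_decode(&mut &data[..]).unwrap();
    println!("{}", v); // prints 42
}
```

The danger the doc comment guards against: if an implementor overrides neither method, the two defaults recurse into each other forever.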
src/consensus/encode.rs
In 3a87775a22f2957f8b71ca8a68dfaadd5615a3f0:
I'm a liiitle tempted to change the above line to Cursor::new(&data[..MAX_VEC_SIZE]) but maybe that's a little too weird/surprising.
I was thinking about it. It would give a little bit of protection if someone somehow passed a huge vector in here. But then, if the caller allows malicious input to create huge vectors, they have kind of already lost anyway. And the ability to decode/encode huge data might be useful for someone who can trust it, e.g. a `Vec<Block>` stored in the file system or something. Arbitrary limits might get in the way.
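The trade-off above can be illustrated with a small sketch. The helper names are hypothetical and `MAX_VEC_SIZE` here is an assumed stand-in constant: capping the slice silently truncates legitimately huge input, while passing the full slice trusts the caller.

```rust
use std::io::Cursor;

// Assumed stand-in for consensus::encode::MAX_VEC_SIZE.
const MAX_VEC_SIZE: usize = 4_000_000;

// The "weird/surprising" variant discussed above: silently cap the input.
fn cursor_capped(data: &[u8]) -> Cursor<&[u8]> {
    let end = data.len().min(MAX_VEC_SIZE);
    Cursor::new(&data[..end])
}

// The behaviour kept in the PR: decode the whole slice, so a trusted,
// legitimately huge payload (e.g. a Vec<Block> read from disk) still works.
fn cursor_full(data: &[u8]) -> Cursor<&[u8]> {
    Cursor::new(data)
}

fn main() {
    let big = vec![0u8; MAX_VEC_SIZE + 1];
    // Capping would silently drop the trailing bytes...
    assert_eq!(cursor_capped(&big).get_ref().len(), MAX_VEC_SIZE);
    // ...while the full cursor preserves everything for a trusting caller.
    assert_eq!(cursor_full(&big).get_ref().len(), MAX_VEC_SIZE + 1);
    println!("ok");
}
```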
apoelstra left a comment
ACK 888a1ec5c4e830f905288b6a824cb23fc1375989
I did a bunch of docs fixes (mainly grammar); if you'd like to use them, I pushed a patch to a branch. I had one question, just out of interest: is there a reason you use such a short line length in the comments? While cleaning up I've been favouring a line length of 100 for comments (I try not to touch them if they are at least 80, though, if changing them does not make them noticeably nicer).
I typically let
Noted. Will do. Thanks a lot!
Ah, no sweat. We should have that fixed up soon, hopefully :)
tcharding left a comment
All my comments are nits only.
ACK 888a1ec5c4e830f905288b6a824cb23fc1375989
src/consensus/encode.rs
This comment is stale; there is no `len` argument now. This PR was a little confusing to review because the last patch modifies code from the first patch.
src/consensus/encode.rs
A question, please, since we have been discussing debug asserts a bit and I'm still not totally understanding when to use what. This debug statement is defensive programming against insane input (a chunk size of 0) causing an infinite loop, right? What made you choose a debug assert instead of either:
A. a regular assert
B. returning an error
I wasn't really sure myself what to do. `chunk_size` should always be a constant, and this whole function is kind of internal, so I thought checking with a `debug_assert` is good enough to spot a silly mistake. I'm still not sure about it. I wish I could make `chunk_size` a const of some kind.
I just turned it into `assert_ne!`. The compiler can probably optimize it out anyway.
src/consensus/encode.rs
Why references here? This works too
assert_eq!(
    read_bytes_from_finite_reader(
        io::Cursor::new(&data),
        ReadBytesFromFiniteReaderOpts { len: data.len(), chunk_size }
    ).unwrap(),
    data
);
Natural possessiveness, I guess. 🤷
LGTM, can you squash this all down into a single commit, please?
As things are right now, memory exhaustion protection in `Decodable` is based on checking input-decoded lengths against arbitrary limits, and on ad-hoc wrapping of collection deserialization in `Take`. The problems with that are two-fold:

* Potential consensus bugs due to incorrect limits.
* Performance degradation when decoding nested structures, due to recursive `Take<Take<..>>` readers.

This change introduces a systematic approach to the problem. A concept of a "size-limited reader" is introduced: we rely on the input data to finish at the enforced limit, failing deserialization otherwise. Memory exhaustion protection is now achieved by capping allocations to reasonable values while still allowing the underlying collections to grow to accommodate rare yet legitimately oversized data (at a tiny performance cost), combined with reliance on the input data size limit. A set of simple rules allows avoiding recursive `Take` wrappers.

Fix #997
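The chunked-allocation idea in the description can be sketched roughly like this. The names `read_bytes_from_finite_reader` and `ReadBytesFromFiniteReaderOpts` appear in the review above, but this body is an assumed simplification, not the merged code: allocation grows one bounded chunk at a time, so a maliciously huge decoded length cannot force a huge up-front allocation; the finite reader simply runs dry and `read_exact` fails.

```rust
use std::io::{self, Read};

// Options struct named in the review thread; fields assumed.
struct ReadBytesFromFiniteReaderOpts {
    len: usize,
    chunk_size: usize,
}

// Read `opts.len` bytes, never allocating more than one chunk ahead of
// what has actually been read from the (size-limited) reader.
fn read_bytes_from_finite_reader<R: Read>(
    mut r: R,
    opts: ReadBytesFromFiniteReaderOpts,
) -> io::Result<Vec<u8>> {
    assert_ne!(opts.chunk_size, 0); // guard against an infinite loop
    let mut ret = Vec::new();
    let mut remaining = opts.len;
    while remaining > 0 {
        let n = remaining.min(opts.chunk_size);
        let old_len = ret.len();
        // Grow by one chunk, then fill it; a lying `len` just hits EOF here.
        ret.resize(old_len + n, 0);
        r.read_exact(&mut ret[old_len..])?;
        remaining -= n;
    }
    Ok(ret)
}

fn main() {
    let data: Vec<u8> = (0u8..32).collect();
    let out = read_bytes_from_finite_reader(
        io::Cursor::new(&data),
        ReadBytesFromFiniteReaderOpts { len: data.len(), chunk_size: 8 },
    )
    .unwrap();
    assert_eq!(out, data);
    println!("ok");
}
```

A claimed length far beyond the input errors out after allocating only a few chunks, instead of reserving the full claimed size up front.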
No comment on choice of
sanket1729 left a comment
Post-merge ACK. Thanks for doing this cleanly.
Bad news. Somehow, somewhere, I've lost e6a3ec41bb9b3f072453665338059879d982eada. No idea when or why. Oh, it doesn't even show up. I'll introduce it in another PR. You might want to take a second look in case I messed up something else. I'm really sorry, no idea RN how it happened. *Edit: I pushed it again; it should show up as 80d8cc2f6bc4e4ccbdc57ec5fff9d283c1e5e263. I intended for it to be in there; it would make this PR even smaller.
@RCasatta since I know you're using these traits in
082e185 Add `consensus_decode_from_finite_reader` optimization (Dawid Ciężarkiewicz)

Pull request description:

As things are right now, memory exhaustion protection in `Decodable` is based on checking input-decoded lengths against arbitrary limits, and on ad-hoc wrapping of collection deserialization in `Take`. The problems with that are two-fold:

* Potential consensus bugs due to incorrect limits.
* Performance degradation when decoding nested structures, due to recursive `Take<Take<..>>` readers.

This change introduces a systematic approach to the problem. A concept of a "size-limited reader" is introduced: we rely on the input data to finish at the enforced limit, failing deserialization otherwise. Memory exhaustion protection is now achieved by capping allocations to reasonable values while still allowing the underlying collections to grow to accommodate rare yet legitimately oversized data (at a tiny performance cost), combined with reliance on the input data size limit. A set of simple rules allows avoiding recursive `Take` wrappers.

Fix #997

ACKs for top commit:
apoelstra: ACK 082e185
tcharding: ACK 082e185

Tree-SHA512: fa04b62a4799c9a11c5f85ec78a18fa9c2cd4819c83a0d6148fbb203c6fa15c2689cb0847e612b35b8c285a756d81690b31a9bede4486b845f0c16b9fcc6d097
a0489d4 fuzz: use travis-fuzz.sh in CI (Andrew Poelstra)
4c6f9b3 fuzz: remove mysteriously-not-necessary quotes from gh action script (Andrew Poelstra)
7baa21c fuzz: disable features in honggfuzz (Andrew Poelstra)
e003e57 Add `consensus_decode_from_finite_reader` optimization (Dawid Ciężarkiewicz)

Pull request description: Backport for #1023. Required for #997. Addresses issues like #1359

ACKs for top commit:
tcharding: ACK a0489d4
TheBlueMatt: ACK a0489d4

Tree-SHA512: 9145d9666e35ae77598aaecf89222c7234637b57ded39b69fbb93ee9ce01c6d7c938b36a2d86319ba84155f2424e524386593d6c0d7af449be6bd118f729fd64