consensus_encoding: implement batched allocation for vector decoders #5177
apoelstra merged 1 commit into rust-bitcoin:master
Conversation
Keep in mind that this only checks the minimum possible size. I think fixing this would require tracking decoded bytes during the decoding process.
(force-pushed 315d324 → 0d835b3)
I'm hot and cold on this. The DoS protection seems valid but the solution adds to the public API in a non-trivial way. I'm not sure it's worth it. The other DoS protection was a no-brainer because it was all internal and easy to implement.
Have similar feelings as Tobin. I think the new method would be worth it if it was the single place that protected against large allocations. But even with this new requirement, we still have the witness decoder ones as well, and even with both of those it is possible to force some massive allocations.
I see four scenarios we can go with:
I think "allocate memory in batches" is probably the best solution. (Actually, I like this solution, but it's getting a lot of pushback and it affects the API, so I'll concede.)
I'm down with the 'allocate in batches' thing since I expect it to be an internal change. Going forward, perhaps we should have an issue investigating and/or deciding on how much DoS protection we aim, as a project, to guarantee (or guard against). FTR I have not personally had this in mind over the last few years. We probably need a full audit of all malloc call sites. Or is this going too far? Might be a good use of the new ADR idea or a doc in
Yes, we should. It's not too hard. There are very few. Basically only the deserialization code (which I have always read with DoS vectors in mind).
Yeah, not a bad idea.
(force-pushed 213be36 → 3b372b3)
(force-pushed 3b372b3 → eb889ba)
Updated the PR title/description to reflect the new changes. The batched allocation is applied to `VecDecoder` and `ByteVecDecoder`.
Yeah, let's get Nyonson's PR merged first.
#5224 is now closed. I think nyonson is planning to do a replacement one. |
(force-pushed eb889ba → d86e918)
Sorry for the delay on this. Added batched allocation. The current approach automatically addresses the concerns raised in #5258 and #5239 (comment).
The maximum free DoS allocation an attacker can force with the current approach is around ~2 MB (from the cases I can think of, and as shown in the unit test). With the old doubling strategy, it's ~32 MB.
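To make the ~2 MB versus ~32 MB claim concrete, here is a small self-contained simulation, not the crate's actual code: `batched_worst_case` and its constants are hypothetical stand-ins that mimic the batched `reserve_exact` strategy, growing capacity only when real bytes arrive.

```rust
// Hypothetical simulation of the batched allocation strategy: capacity is
// grown by at most one ~1 MB batch, and only when actual input bytes are
// being consumed, so a claimed-but-unbacked length cannot force a huge
// allocation.
const MAX_VECTOR_ALLOCATE: usize = 1_000_000; // ~1 MB batch size

fn batched_worst_case(claimed_len: usize, bytes_provided: usize) -> usize {
    let mut buf: Vec<u8> = Vec::new();
    for _ in 0..bytes_provided {
        if buf.len() == buf.capacity() {
            let remaining = claimed_len - buf.len();
            buf.reserve_exact(remaining.min(MAX_VECTOR_ALLOCATE));
        }
        buf.push(0);
    }
    buf.capacity()
}

fn main() {
    // Attacker claims 4,000,000 bytes but provides only 1 real byte:
    // only a single ~1 MB batch gets allocated, not 4 MB.
    assert!(batched_worst_case(4_000_000, 1) <= MAX_VECTOR_ALLOCATE);
    // Providing 1.5 MB of real data can force roughly ~2 MB of capacity,
    // i.e. an attacker must pay X MB to make us allocate ~X+1 MB.
    assert!(batched_worst_case(4_000_000, 1_500_000) <= 2 * MAX_VECTOR_ALLOCATE);
    println!("ok");
}
```

With the old doubling strategy, the same 4,000,000-byte claim could be reserved up front, which is where the much larger worst case comes from.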
```rust
        buffer.reserve_exact(elements_to_reserve);
    }
}
```
Perhaps this would be better as a method on the VecDecoder? Also could be private I believe.
```rust
/// Reserves capacity for typed vectors in batches.
///
/// Calculates how many elements of type `T` fit within `MAX_VECTOR_ALLOCATE` bytes and reserves
/// up to that amount when the buffer reaches capacity.
///
/// Documentation adapted from Bitcoin Core:
///
/// > For `DoS` prevention, do not blindly allocate as much as the stream claims to contain.
/// > Instead, allocate in ~1 MB batches, so that an attacker actually needs to provide X MB of
/// > data to make us allocate X+1 MB of memory.
///
/// ref: <https://github.com/bitcoin/bitcoin/blob/72511fd02e72b74be11273e97bd7911786a82e54/src/serialize.h#L669C2-L672C1>
fn reserve(&mut self) {
    if self.buffer.len() == self.buffer.capacity() {
        let elements_remaining = self.length - self.buffer.len();
        let element_size = mem::size_of::<T>().max(1);
        let batch_elements = MAX_VECTOR_ALLOCATE / element_size;
        let elements_to_reserve = elements_remaining.min(batch_elements);
        self.buffer.reserve_exact(elements_to_reserve);
    }
}
```

(Tested locally.)
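The element-size arithmetic in the suggested method can be tried in isolation. The sketch below is a hypothetical free-function version (`reserve_batched` is not a name from the PR); it assumes the same ~1 MB `MAX_VECTOR_ALLOCATE` constant discussed above.

```rust
use std::mem;

// ~1 MB batch cap, mirroring the constant discussed in this PR.
const MAX_VECTOR_ALLOCATE: usize = 1_000_000;

// Hypothetical free-function version of the suggested `reserve` method,
// reserving at most ~1 MB worth of `T` per call.
fn reserve_batched<T>(buffer: &mut Vec<T>, total_len: usize) {
    if buffer.len() == buffer.capacity() {
        let elements_remaining = total_len - buffer.len();
        // `max(1)` guards against division by zero for zero-sized types.
        let element_size = mem::size_of::<T>().max(1);
        let batch_elements = MAX_VECTOR_ALLOCATE / element_size;
        buffer.reserve_exact(elements_remaining.min(batch_elements));
    }
}

fn main() {
    let mut v: Vec<u64> = Vec::new();
    reserve_batched(&mut v, 4_000_000);
    // 1,000,000 bytes / 8-byte elements = 125,000 elements per batch,
    // far below the 4,000,000 elements the stream claims.
    assert!(v.capacity() >= 125_000);
    assert!(v.capacity() < 4_000_000);
    println!("ok");
}
```

Note that the batch is counted in elements of `T`, not bytes, which is why `size_of::<T>()` appears here even though encoded size and Rust-type size generally differ.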
Yep, that makes sense.
```rust
let available_capacity = buffer.capacity() - buffer.len();
if available_capacity == 0 {
```
Perhaps use the same style in both functions (either with the local var `available_capacity` or without).
primitives/src/witness.rs (outdated)
```rust
/// Allocates buffer space in ~1MB batches
/// Returns buffer length (may be less than `required_len` !!)
fn reserve_batch(&mut self, required_len: usize, index_position: usize) -> usize {
    let max_required = required_len.max(index_position);
```
I can't work out why we are comparing the required_len (space between read position (cursor) and index area) with the index_position (an index into the index area)?
I couldn't come up with a test that fails.
Good catch, this was left over from an earlier iteration. `required_len` is always > `index_position`.
I've removed it in the Witness PR: #5298
primitives/src/witness.rs (outdated)
```rust
fn reserve_batch(&mut self, required_len: usize, index_position: usize) -> usize {
    let max_required = required_len.max(index_position);

    if max_required <= self.content.len() {
```
Why are we comparing max_required (amount of space required to read into) with total length including the index area?
The index area and the element bytes live in the same buffer; `max_required` represents the total space we need in that single `Vec`. So we compare `max_required` to `content.len()` to check whether that whole buffer is large enough.
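The shared-buffer growth described here can be sketched as follows. This is a hypothetical simplification, not the crate's `Witness` code: `WitnessBuf` and the single-argument `reserve_batch` are illustrative names, and the real decoder tracks a cursor and index area this sketch omits.

```rust
// Hypothetical sketch: witness element bytes and the index area share one
// `Vec<u8>`, and growth toward a claimed size is capped at ~1 MB per call,
// so the caller must consume real input before the buffer grows again.
struct WitnessBuf {
    content: Vec<u8>, // element bytes ++ index area, one allocation
}

impl WitnessBuf {
    /// Grow `content` toward `required_len`, at most one batch per call.
    /// Returns the new length, which may still be less than `required_len`.
    fn reserve_batch(&mut self, required_len: usize) -> usize {
        const BATCH: usize = 1_000_000; // ~1 MB
        if required_len > self.content.len() {
            let new_len = required_len.min(self.content.len() + BATCH);
            self.content.resize(new_len, 0);
        }
        self.content.len()
    }
}

fn main() {
    let mut w = WitnessBuf { content: Vec::new() };
    // A claimed 3 MB requirement is satisfied one ~1 MB batch at a time.
    assert_eq!(w.reserve_batch(3_000_000), 1_000_000);
    assert_eq!(w.reserve_batch(3_000_000), 2_000_000);
    assert_eq!(w.reserve_batch(3_000_000), 3_000_000);
    println!("ok");
}
```

The point of returning the (possibly short) length is that the decode loop can only ask for the next batch after it has actually filled the current one with real data.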
```rust
self.cursor = witness_index_space;
self.content = alloc::vec![0u8; self.cursor + 128];
self.content.resize(initial_index_space + 128, 0);
}
```
I think the 128 in the docs is meaningful if we are keeping the 128 in code.
tcharding left a comment:
I'm a bit confused about all this. Totally willing to accept that the fault is my own.
Since the witness struct is so much more complicated (because of the index area), perhaps it would be better to split this PR into two?
…ByteVecDecoder`

`VecDecoder` and `ByteVecDecoder` only check that the element count/byte count is < 4,000,000. This still allows an attacker to claim a large size without providing the data. In this patch we allocate in 1 MB batches, so an attacker now needs to provide X MB of data to make us allocate X+1 MB of memory.
(force-pushed d86e918 → 0452a93)
Agree.
Nice and clean, thanks for the continued effort.
CI fails are unrelated.
Obligatory reminder that "size of the Rust type" has very little correlation with "size of an encoded object". But in this case we do actually want the size of the Rust type.
EDITED:
Right now the vector decoders (`VecDecoder` and `ByteVecDecoder`) only check that the element count/byte count is < 4,000,000. This still allows an attacker to claim a large size without providing the corresponding data.
In this patch we allocate in 1 MB batches, so an attacker now needs to provide X MB of data to make us allocate X+1 MB of memory.
The allocation is applied to `VecDecoder` and `ByteVecDecoder`.
Fixes #5157