consensus_encoding: implement batched allocation for WitnessDecoder #5298
Conversation
cc #5177
In ea980a5: Should be a separate PR but at some point we should also stop 0-allocating here and use
primitives/src/witness.rs
```rust
// Allocate space for the index and buffer. For DoS prevention,
// we allocate index space incrementally (not all upfront).
```
I think I might be missing something, the docs here mention that the index space is allocated incrementally, but where does that happen?
EDIT: But I guess it doesn't matter since the cursor is initialized to the "end" of the index space.
We are actually allocating the index space incrementally too: if we claim to have 16 MB of index space, we initially allocate only 1 MB. As we try to write data at the cursor position (16 MB), `reserve_batch()` grows the buffer toward the cursor in 1 MB batches.
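A minimal sketch of that growth pattern (hypothetical helper and numbers assumed from this discussion, not the actual rust-bitcoin implementation):

```rust
// Assumed ~1 MB growth step, matching the batches described above.
const BATCH: usize = 1_000_000;

/// Grow `buf` toward `required` by at most one batch; return the new length.
/// The caller keeps calling until the returned length reaches `required`.
fn reserve_batch(buf: &mut Vec<u8>, required: usize) -> usize {
    if buf.len() < required {
        let new_len = (buf.len() + BATCH).min(required);
        buf.resize(new_len, 0);
    }
    buf.len()
}

fn main() {
    // Claimed 16 MB index space, but only 1 MB allocated upfront.
    let mut buf = vec![0u8; 1_000_000];
    let cursor = 16_000_000;
    let mut calls = 0;
    while buf.len() < cursor {
        reserve_batch(&mut buf, cursor);
        calls += 1;
    }
    // Growing from 1 MB to 16 MB takes 15 one-MB steps.
    assert_eq!(calls, 15);
    assert_eq!(buf.len(), 16_000_000);
}
```

The point is that an attacker who claims a huge index space only costs us one batch per decode step, rather than the whole claimed size at once.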
```rust
let initial_elements = witness_elements.min(max_initial_elements);
let initial_index_space = initial_elements * 4;

self.cursor = witness_index_space;
```
Why is the cursor, the write position, set to `witness_index_space`? (Also this can be greater than `content.len()`, which is semantically odd for a 'write position'.)
(I'm aware that it's not part of this PR but it affects it because this PR uses `self.cursor` below.)
Can we have this please:

```rust
// Initially the index space is at the front of the buffer then we rotate left in `end`.
self.cursor = witness_index_space;
```
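A hypothetical miniature of the layout trick being described (names and the one-byte "index entry" are assumptions for illustration, not the real `Witness` code): while decoding, the index area sits at the front of `content` and the cursor starts just past it, then a `rotate_left` at the end moves the index to the back.

```rust
/// Toy decode of a single element into a buffer with `index_space`
/// reserved index bytes at the front.
fn decode_one_element(index_space: usize, data: &[u8]) -> Vec<u8> {
    let mut content = vec![0u8; index_space];
    // The cursor starts at the end of the index area, i.e. where
    // element data begins -- this is the `self.cursor` init above.
    let cursor = content.len();
    content.extend_from_slice(data);
    // Record the element's start relative to the *post-rotation* layout.
    let position_after_rotation = cursor - index_space; // here: 0
    content[0] = position_after_rotation as u8; // toy 1-byte index entry
    // At the end, rotate so data comes first and the index sits at the back.
    content.rotate_left(index_space);
    content
}

fn main() {
    let out = decode_one_element(4, b"data");
    assert_eq!(&out[..4], b"data"); // element data now at the front
    assert_eq!(out[4], 0); // index entry (position 0) now at the back
}
```

This is why a cursor beyond `content.len()` is only "odd" transiently: it points at where data will live once the buffer has grown to meet it.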
primitives/src/witness.rs
```rust
// Store the element position in the index.
let encoded_size = compact_size::encoded_size(element_length);
let required_len = self.cursor + encoded_size;
```
Here `self.cursor` could be outside of the content buffer; again, even if the math works it's odd and hard to reason about.
primitives/src/witness.rs
```rust
let actual_len = self.reserve_batch(required_len);

if actual_len < required_len {
    return Ok(true);
```
If this returns I believe we have read the element size but not written it into content.
I think you can ignore this ...
```rust
    return Ok(true);
}
let position_after_rotation = self.cursor - witness_index_space;
encode_cursor(&mut self.content, 0, self.element_idx, position_after_rotation);
```
Either I'm blind or this encodes the `element_idx` at the start of `content`, not at the start of the index section.
Oh shit, I only just worked out that we rotate left in end - face palm.
Oh shit, I only just worked out that we rotate left in `end` - face palm.

Phew, I thought I could not read code but turns out I can. I believe this is buggy, as this output shows. I added a bunch of println statements and commented out the [buggy] length check:

```rust
// Among other things if this returns we lose the data read so far.
// if actual_len < required_len {
//     return Ok(true);
// }
```

```text
running 1 test
input bytes: [254, 0, 9, 61, 0, 254, 0, 9, 61, 0]
cursor: 16000000
content len: 1000128
before reserve content len: 1000128
after reserve content len: 2000128
actual len 2000128 required_len 16000005
decoded content [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
input bytes: []
cursor: 16000000
content len: 2000128
thread 'witness::test::test_dos_attack' (612268) panicked at primitives/src/witness.rs:495:29:
range start index 16000000 out of range for slice of length 2000128
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test witness::test::test_dos_attack ... FAILED

failures:

failures:
    witness::test::test_dos_attack

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 113 filtered out; finished in 0.02s
```
So I guess one of two things:
And yes, if git blaming you'll see that I introduced this code originally into primitives. When I copied/moved the original code I did not grok it, I just copied it as it was - 100% my bad. In my defence the original code is only 60 lines; it has grown considerably in complexity since we added the new encoding traits. EDIT: I went away and came back fresh. Looked at the code on
Can you write a unit test that triggers this?

I knew someone was going to say that ... I'll hack it up.
I agree the DoS protection we added for index space allocation makes the code a bit confusing to reason about, but I don't think the code is buggy. The confusing part is that we don't consume/write anything until the buffer has grown to reach the cursor position.

```rust
if actual_len < required_len {
    return Ok(true);
}
```

is important! It triggers the incremental allocation for index space and prevents us from writing beyond the buffer. For example in the test case: without the check, we'd try to write at position cursor (16 MB), but `content.len()` is only 2 MB (which triggers the issue you mentioned before).
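To illustrate why that early return is load-bearing, here is a hypothetical toy version of the write path (assumed names and a 1 MB batch size; a sketch of the pattern, not the real decoder):

```rust
const BATCH: usize = 1_000_000; // assumed ~1 MB batch size

fn reserve_batch(content: &mut Vec<u8>, required: usize) -> usize {
    if content.len() < required {
        let new_len = (content.len() + BATCH).min(required);
        content.resize(new_len, 0);
    }
    content.len()
}

/// Returns true ("yield, call again") while the buffer is still catching
/// up to the cursor; nothing is consumed or written until it has.
fn try_write_at_cursor(content: &mut Vec<u8>, cursor: usize, data: &[u8]) -> bool {
    let required_len = cursor + data.len();
    let actual_len = reserve_batch(content, required_len);
    if actual_len < required_len {
        return true; // buffer not grown far enough yet -- the early return
    }
    // Only now is it safe to write at the cursor position.
    content[cursor..cursor + data.len()].copy_from_slice(data);
    false
}

fn main() {
    // Mirror the test case: 2 MB buffer, cursor at the 16 MB index boundary.
    let mut content = vec![0u8; 2_000_000];
    let cursor = 16_000_000;
    let mut yields = 0;
    while try_write_at_cursor(&mut content, cursor, &[0xAB; 5]) {
        yields += 1;
    }
    assert_eq!(yields, 14); // 2 MB -> 16 MB in 1 MB steps before the write lands
    assert_eq!(content[cursor], 0xAB);
}
```

Without the `actual_len < required_len` check, the slice at `cursor..` would panic exactly as in the test output above.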
If this is too confusing, I think we can remove the batched allocation for the index space and only keep it for the data area. An attacker can still force around 20 MB of free allocation though, but maybe that's fine!
Yeah, 20MB is a reduction from the current "attack", which is already below the threshold that I'd consider to be a security concern. We can always try again to improve it later.
Just tossing in my 2 cents, I'd prefer the simpler approach for now because I too got tripped up on the cursor-position-index-allocation interaction.
Makes sense, I'll remove the index space allocation once I'm back on my laptop...
(force-pushed b265867 to c15d5ae)
done
```diff
 let encoded_size = compact_size::encoded_size(element_length);
 let required_len = self.cursor + encoded_size + element_length;
-self.resize_if_needed(required_len);
+self.reserve_batch(required_len);
```
I found it a little confusing how the return value is ignored here, while it is checked in the other branch. Makes sense that 9 bytes will always be available at this point, but I wonder if `reserve_batch` needs to return a value if it isn't going to be consistently used, because it made me feel like I was missing something.
We need the return value in the first branch because `reserve_batch` may allocate less than what we request. For the second branch, it's guaranteed we already have enough capacity for `encoded_size` (<= 9 bytes).
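For reference, a Bitcoin CompactSize encoding never exceeds 9 bytes, which is why the second reservation cannot come up short. A sketch of the size rule (standalone reimplementation, not the `compact_size` module itself):

```rust
/// Size in bytes of a Bitcoin CompactSize encoding of `n`:
/// 1 byte up to 0xFC, then a 1-byte marker plus 2, 4, or 8 bytes.
fn encoded_size(n: u64) -> usize {
    match n {
        0..=0xFC => 1,
        0xFD..=0xFFFF => 3,
        0x1_0000..=0xFFFF_FFFF => 5,
        _ => 9,
    }
}

fn main() {
    // The 4_000_000 element length from the attack input needs 5 bytes.
    assert_eq!(encoded_size(4_000_000), 5);
    // The worst case is 9 bytes, the bound relied on above.
    assert_eq!(encoded_size(u64::MAX), 9);
}
```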
True, but I think I'd prefer reserve_batch to return nothing and for the first branch to just check content length directly. But don't feel too strongly about it.
primitives/src/witness.rs
```rust
// Ensure we have enough space.
let required_len = self.cursor + copy_len;
self.resize_if_needed(required_len);
if can_copy == 0 {
```
Is this short-circuit necessary? Seems like it could be removed and just flow through. I'm also having a tough time thinking up the scenario where it is triggered.
(force-pushed c15d5ae to b9f9098)
@jrakibi I appreciate your effort but I'm hesitant on this PR. To see if I can be convinced, can we have the description improved to justify this please. Something like "currently an attacker can cause memory to be allocated like ... with this applied the attack surface is reduced to ...". To justify my hesitation: I already spent more time than I want to trying to understand the additional complexity. I don't like the idea of adding complexity that we have to carry forever, along with the risk of any bugs, unless there is justification. And also I don't feel like putting in the time/effort to review deeply again without the justification. Thanks mate. FTR my position can be veto'd by @apoelstra
Thanks, my fault, I should have added a detailed description to explain this better.
I've updated the description now.
If this still introduces too much complexity, I'm happy to keep the 32 MB free allocation. I think @apoelstra is also okay with that from what I understand here: #5239 (comment) and #5298 (comment) (maybe correct me if I'm wrong)
I'd prefer to reduce it and I don't mind the complexity (which is not API-visible so we don't have to "carry it forever"). But this might take a while to review, and I do want two ACKs on it before merging.

Maybe I was being hyperbolic, I meant readers of the code have to read it forever (while it exists). But fair point about it being an internal complexity only.

Sweet, if there is no rush I can definitely review when I have the bandwidth available.
Yeah, I'd say no rush. @jrakibi's summary of my view -- that the 32MB allocation is small enough that it shouldn't count as an attack that we urgently need to deal with -- is accurate. In the "10 bytes to get 32 MB" example the allocation is dropped because parsing fails, so it is very hard to amplify this.
Needs rebase please
(force-pushed b9f9098 to bd68f30)
rebased
|
Cargo mutants check is failing, will take a look at this later today |
Add batched allocation for witness decoder (data area): we allocate incrementally in ~1 MB batches. The current approach automatically addresses the concerns raised in rust-bitcoin#5258 and rust-bitcoin#5239 (comment). We also add `reserve_batch` to the Mutants exclusion list: mutation testing in `WitnessDecoder::reserve_batch` changes the buffer allocation logic, causing an infinite loop during decoding.
(force-pushed 6bd8885 to 9bf1522)
Fixed, 9bf1522 adds
Currently an attacker can force a 32 MB memory allocation by providing only 10 bytes of input (see the test case proving this in #5239 (comment)).
With the current approach we reduce this attack surface to ~17 MB.
Attack vector (test case inputs):

- `[0xFE, 0x00, 0x09, 0x3D, 0x00]` = 4_000_000 (witness count)
- `[0xFE, 0x00, 0x09, 0x3D, 0x00]` = 4_000_000 (1st element length)

10 bytes of input will trigger the allocation below.

Old approach with `resize_if_needed` (32 MB allocation):

- `resize_if_needed` doubles the buffer
- with `required_len = 16 MB + 128 + 4 MB = 20 MB + 128 bytes`, it doubles from 16 MB to 32 MB

New approach with `reserve_batch` (~17 MB allocation):

- `reserve_batch` allocates in 1 MB batches

The current approach automatically addresses the concerns raised in #5258 and #5239 (comment)
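The rough arithmetic behind those two figures can be sketched as follows (a simplification assumed from the description above; exact numbers depend on internal details like the 128 bytes of slack):

```rust
const MB: usize = 1_000_000;

/// Old resize_if_needed-style growth: keep doubling until `required` is covered.
fn doubled(mut cap: usize, required: usize) -> usize {
    while cap < required {
        cap *= 2;
    }
    cap
}

/// New reserve_batch-style growth: one more 1 MB batch, capped at `required`.
fn one_batch(cap: usize, required: usize) -> usize {
    (cap + MB).min(required)
}

fn main() {
    let required_len = 16 * MB + 128 + 4 * MB; // 20 MB + 128 bytes
    // Old: doubling from the 16 MB index space jumps straight to 32 MB.
    assert_eq!(doubled(16 * MB, required_len), 32 * MB);
    // New: with 10 bytes of input, parsing fails after roughly one batch
    // past the 16 MB index space -- hence the ~17 MB figure (assumed).
    assert_eq!(one_batch(16 * MB, required_len), 17 * MB);
}
```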