
consensus_encoding: implement batched allocation for vector decoders#5177

Merged
apoelstra merged 1 commit into rust-bitcoin:master from jrakibi:20-10-memory-bound on Nov 20, 2025

Conversation

@jrakibi
Contributor

@jrakibi jrakibi commented Oct 21, 2025

EDITED:

Right now the vector decoders (VecDecoder and ByteVecDecoder) only check that the element/byte count is < 4,000,000. This still allows an attacker to claim a large size without providing the corresponding data.



In this patch we allocate in 1 MB batches, so an attacker now needs to provide X MB of data to make us allocate X+1 MB of memory.

Batched allocation is applied to both VecDecoder and ByteVecDecoder.

Fixes #5157
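As a rough illustration of the batched-allocation idea described above, here is a minimal, self-contained sketch; the function name, constant, and signature are illustrative assumptions, not the crate's actual API:

```rust
// Hypothetical sketch of batched allocation (illustrative names, not the
// crate's real API): never trust the claimed total beyond a single batch.
const BATCH_BYTES: usize = 1_000_000; // allocate in ~1 MB batches

/// Grow `buffer` by at most one batch, regardless of how large the stream
/// claims the final vector will be.
fn batch_reserve(buffer: &mut Vec<u8>, claimed_total: usize) {
    if buffer.len() == buffer.capacity() {
        let remaining = claimed_total.saturating_sub(buffer.len());
        buffer.reserve_exact(remaining.min(BATCH_BYTES));
    }
}

fn main() {
    let mut buf: Vec<u8> = Vec::new();
    // Attacker claims 4,000,000 bytes; we only commit one ~1 MB batch.
    batch_reserve(&mut buf, 4_000_000);
    assert!(buf.capacity() >= BATCH_BYTES);
    assert!(buf.capacity() < 4_000_000);
    println!("capacity after first batch: {}", buf.capacity());
}
```

The attacker must actually send a batch's worth of bytes before the next batch is reserved, which is what bounds the free allocation.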

@github-actions github-actions bot added the C-consensus_encoding (PRs modifying the consensus-encoding crate) and C-primitives labels on Oct 21, 2025
@jrakibi
Contributor Author

jrakibi commented Oct 21, 2025

Keep in mind that this only checks the minimum possible size (minimum_encoded_size() × length), not the actual size of the elements.

I think fixing this would require tracking decoded bytes during the decoding process.
Not sure if that's something we're interested in doing?
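To make the "minimum possible size" check above concrete, here is a hypothetical illustration (the function name and cap constant are assumptions for the example, not the crate's API): a claimed length passes as long as length times the minimum encoded element size stays under the cap, even though real elements may be much larger.

```rust
// Illustrative only: a length check based on the *minimum* encoded size,
// as described in the comment above. Names are hypothetical.
const MAX_VEC_BYTES: usize = 4_000_000; // illustrative cap

fn passes_minimum_size_check(claimed_len: usize, min_encoded_size: usize) -> bool {
    claimed_len
        .checked_mul(min_encoded_size.max(1))
        .map(|bytes| bytes <= MAX_VEC_BYTES)
        .unwrap_or(false) // treat multiplication overflow as failure
}

fn main() {
    assert!(passes_minimum_size_check(4_000_000, 1)); // at the cap: passes
    assert!(!passes_minimum_size_check(4_000_001, 1)); // over the cap: rejected
    assert!(!passes_minimum_size_check(usize::MAX, 2)); // overflow: rejected
}
```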

@jrakibi jrakibi force-pushed the 20-10-memory-bound branch from 315d324 to 0d835b3 on October 21, 2025 15:45
@tcharding
Member

I'm hot and cold on this. The DoS protection seems valid but the solution adds to the public API in a non-trivial way. I'm not sure it's worth it? The other DoS protection was a no-brainer because it was all internal and easy to implement.

@nyonson
Contributor

nyonson commented Oct 23, 2025

Have similar feelings as Tobin. I think the new method would be worth it if it were the single place that protected against large allocations. But even with this new requirement, we still have the witness decoder ones as well, and even with both of those it is possible to have some massive allocations.

@jrakibi
Contributor Author

jrakibi commented Oct 25, 2025

I see four scenarios we can go with:

  • Implement what I already mentioned in a previous review: allocate memory in batches (consensus_encoding: Implement additional decoders #5057 (comment)).
  • Don't allocate any memory in advance, and instead keep track of bytes consumed during decoding; once it exceeds 4 MB, return an error.
  • Accept the current solution even though it adds to the public API (and add the missing WitnessDecoder).
  • Just keep things as they are.

@apoelstra
Member

I think "allocate memory in batches" is probably the best solution. (Actually, I like this solution, but it's getting a lot of pushback and it affects the API so I'll concede).

@tcharding
Member

tcharding commented Oct 27, 2025

I'm down with the 'allocate in batches' thing since I expect it to be an internal change.

Going forward, perhaps we should open an issue to investigate and/or decide how much DoS protection we aim, as a project, to guarantee (or guard against). FTR I have not personally had this in mind over the last few years. We probably need a full audit of all malloc call sites. Or is this going too far? Might be a good use of the new ADR idea, or a doc in docs?

@apoelstra
Member

We probably need a full audit of all malloc call sites. Or is this going too far?

Yes, we should. It's not too hard. There are very few. Basically only the deserialization code (which I have always read with DoS vectors in mind).

Might be a good use of the new ADR idea or a doc in docs?

Yeah, not a bad idea.

@jrakibi jrakibi force-pushed the 20-10-memory-bound branch 2 times, most recently from 213be36 to 3b372b3 on October 30, 2025 11:07
@jrakibi jrakibi changed the title from "consensus_encoding: tighten VecDecoder allocation bound" to "consensus_encoding: implement batched allocation for vector decoders" on Oct 30, 2025
@jrakibi jrakibi force-pushed the 20-10-memory-bound branch from 3b372b3 to eb889ba on October 30, 2025 11:17
@jrakibi
Contributor Author

jrakibi commented Oct 30, 2025

Updated the PR title/description to reflect the new changes.

Batched allocation is now applied to WitnessDecoder, VecDecoder, and ByteVecDecoder.
We allocate in 1 MB batches, so an attacker needs to provide X MB of data to make us allocate X+1 MB of memory.

@tcharding
Member

Do we want to merge #5224 and redo this one? I'll wait to hear before reviewing. Sorry for my part in making you do extra work @jrakibi.

@jrakibi
Contributor Author

jrakibi commented Oct 30, 2025

Yeah, let's get Nyonson's PR merged first.
I'll redo this one in the meantime.

@apoelstra
Member

#5224 is now closed. I think nyonson is planning to do a replacement one.

@jrakibi
Contributor Author

jrakibi commented Nov 15, 2025

Sorry for the delay on this.

Added batched allocation for WitnessDecoder.
We also apply batched allocation to the index space, to avoid allocating the full 16 MB upfront when an attacker claims 4 million witness_elements.

The current approach automatically addresses the concerns raised in #5258 and #5239 (comment)

@jrakibi
Contributor Author

jrakibi commented Nov 15, 2025

The maximum free DoS allocation an attacker can force with the current approach is around ~2 MB (from the cases I can think of, and as shown in the unit test).

with the old doubling strategy, it's ~32 MB
without batched allocation for the index space, it's ~20 MB
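The gap between the doubling and batching numbers above can be sketched with plain Vec behavior (illustrative only, not the crate's code): push-driven doubling transiently holds roughly 2x the received data, while batched reservation stays within one ~1 MB batch of what was actually received.

```rust
// Illustrative comparison (not the crate's code) of capacity growth under
// amortized doubling vs. explicit ~1 MB batched reservation.
const BATCH: usize = 1_000_000;

fn main() {
    // Doubling: pushing 1.5M bytes leaves capacity at the next power of two.
    let mut doubling: Vec<u8> = Vec::new();
    for _ in 0..1_500_000 {
        doubling.push(0); // capacity doubles: ..., 1_048_576, 2_097_152
    }

    // Batching: grow one ~1 MB batch at a time, only when full.
    let mut batched: Vec<u8> = Vec::new();
    for _ in 0..1_500_000 {
        if batched.len() == batched.capacity() {
            batched.reserve_exact(BATCH);
        }
        batched.push(0);
    }

    assert!(doubling.capacity() >= 2_000_000); // overshoot from doubling
    assert!(batched.capacity() < doubling.capacity()); // batching stays tighter
}
```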

            buffer.reserve_exact(elements_to_reserve);
        }
    }

Member

Perhaps this would be better as a method on the VecDecoder? It could also be private, I believe.

    /// Reserves capacity for typed vectors in batches.
    ///
    /// Calculates how many elements of type `T` fit within `MAX_VECTOR_ALLOCATE` bytes and reserves
    /// up to that amount when the buffer reaches capacity.
    ///
    /// Documentation adapted from Bitcoin Core:
    ///
    /// > For `DoS` prevention, do not blindly allocate as much as the stream claims to contain.
    /// > Instead, allocate in ~1 MB batches, so that an attacker actually needs to provide X MB of
    /// > data to make us allocate X+1 MB of memory.
    ///
    /// ref: <https://github.com/bitcoin/bitcoin/blob/72511fd02e72b74be11273e97bd7911786a82e54/src/serialize.h#L669C2-L672C1>
    fn reserve(&mut self) {
        if self.buffer.len() == self.buffer.capacity() {
            let elements_remaining = self.length - self.buffer.len();
            let element_size = mem::size_of::<T>().max(1);
            let batch_elements = MAX_VECTOR_ALLOCATE / element_size;
            let elements_to_reserve = elements_remaining.min(batch_elements);
            self.buffer.reserve_exact(elements_to_reserve);
        }
    }

(Tested locally.)

Contributor Author

Yep, that makes sense.

Comment on lines +57 to +58
    let available_capacity = buffer.capacity() - buffer.len();
    if available_capacity == 0 {
Member

Perhaps use the same style in both functions (either with local var available_capacity or without).

Contributor Author

done

    /// Allocates buffer space in ~1 MB batches.
    /// Returns the buffer length (which may be less than `required_len`!).
    fn reserve_batch(&mut self, required_len: usize, index_position: usize) -> usize {
        let max_required = required_len.max(index_position);
Member

I can't work out why we are comparing the required_len (space between read position (cursor) and index area) with the index_position (an index into the index area)?

I couldn't come up with a test that fails.

Contributor Author

Good catch, this was left over from an earlier iteration. required_len is always > index_position.

I’ve removed it in the Witness PR: #5298

    fn reserve_batch(&mut self, required_len: usize, index_position: usize) -> usize {
        let max_required = required_len.max(index_position);

        if max_required <= self.content.len() {
Member

Why are we comparing max_required (amount of space required to read into) with total length including the index area?

Contributor Author

The index area and the element bytes live in the same buffer; max_required represents the total space we need in that single Vec. So we compare max_required to content.len() to check whether that whole buffer is large enough.

        self.cursor = witness_index_space;
        self.content = alloc::vec![0u8; self.cursor + 128];
        self.content.resize(initial_index_space + 128, 0);
    }
Member

I think the 128 in the docs is meaningful if we are keeping the 128 in code.

Contributor Author

good point, addressed in #5298

Member

@tcharding tcharding left a comment

I'm a bit confused about all this. Totally willing to accept that the fault is my own.

@tcharding
Member

Since the witness struct is so much more complicated (because of the index area), perhaps it would be better to split this PR into two?

…ByteVecDecoder`

`VecDecoder` and `ByteVecDecoder` only check that the element/byte count is < 4,000,000.
This still allows an attacker to claim a large size without providing the data.

In this patch we allocate in 1 MB batches, so an attacker now needs to provide
X MB of data to make us allocate X+1 MB of memory.
@jrakibi
Contributor Author

jrakibi commented Nov 17, 2025

Since the witness struct is so much more complicated (because of the index area), perhaps it would be better to split this PR into two?

Agree.
I updated this PR to cover only VecDecoder and ByteVecDecoder.
WitnessDecoder has a different implementation because of the index area, so I opened a separate PR for it #5298

Member

@tcharding tcharding left a comment

ACK 0452a93

@tcharding
Member

Nice and clean, thanks for the continued effort.

@tcharding
Member

CI fails are unrelated.

@apoelstra
Member

Obligatory reminder that "size of the Rust type" has very little correlation with "size of an encoded object". But in this case we do actually want the size of the Rust type.
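The point above can be illustrated with Bitcoin's CompactSize length prefix (the helper below is a hypothetical example, not the crate's API): a u64 always occupies 8 bytes in memory, while its wire encoding takes 1, 3, 5, or 9 bytes depending on the value.

```rust
// Illustrative only: in-memory size vs. consensus-encoded size.
use std::mem;

/// Byte length of Bitcoin's CompactSize encoding of `n` (hypothetical helper).
fn compact_size_len(n: u64) -> usize {
    match n {
        0..=0xFC => 1,               // single byte
        0xFD..=0xFFFF => 3,          // 0xFD marker + 2 bytes
        0x1_0000..=0xFFFF_FFFF => 5, // 0xFE marker + 4 bytes
        _ => 9,                      // 0xFF marker + 8 bytes
    }
}

fn main() {
    assert_eq!(mem::size_of::<u64>(), 8);
    assert_eq!(compact_size_len(5), 1);        // encoded smaller than in-memory
    assert_eq!(compact_size_len(u64::MAX), 9); // encoded larger than in-memory
}
```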

Member

@apoelstra apoelstra left a comment

ACK 0452a93; successfully ran local tests

@apoelstra apoelstra merged commit 28a89e4 into rust-bitcoin:master Nov 20, 2025
24 of 26 checks passed

Labels

C-consensus_encoding PRs modifying the consensus-encoding crate

Successfully merging this pull request may close these issues.

consensus_encoding: tighten memory bound in VecDecoder?

4 participants