primitives: single allocation witness decoder#5239

Merged
apoelstra merged 1 commit into rust-bitcoin:master from nyonson:single-allocation-decoder
Nov 7, 2025

Conversation

@nyonson
Contributor

@nyonson nyonson commented Nov 3, 2025

When attempting to simplify the WitnessDecoder in #5224, we noticed that we were double allocating, which erased the performance gains of the old decoder from #672. This updates WitnessDecoder to use a single allocation like the old decoder.

While the intention of this patch was purely for performance, three of the new tests fail against the current version of the WitnessDecoder: decode_empty_element, decode_multiple_empty_elements, and decode_incomplete_witness_count.

For the first two tests, there is a functional change in how empty witness elements are handled. This patch introduces a short-circuit where the old version could get stuck in an infinite loop requesting zero bytes.

The third test, decode_incomplete_witness_count, shows that the implementation now fails explicitly if end is called before the witness element count is complete, instead of just returning an empty witness.

}

/// Resizes the content buffer if needed, doubling the size each time.
fn resize_if_needed(&mut self, required_len: usize) {
Contributor Author


This resize logic is copied from the old decoder, but should maybe delegate to some consensus_encoding defined thing like @jrakibi is working on?

// be large enough to cover most witnesses without reallocating.
let witness_index_space = witness_elements * 4;
self.cursor = witness_index_space;
self.content = alloc::vec![0u8; self.cursor + 128];
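As a rough sketch of what this single-allocation layout amounts to (the function name and return shape here are illustrative, not the actual decoder fields): the front of one buffer reserves 4 bytes per element for offsets, and element data is written starting at the cursor, so only one Vec is ever allocated.

```rust
// Illustrative sketch of the single-allocation layout, not the real decoder:
// the first `witness_elements * 4` bytes are reserved for u32 offsets, and
// element data starts at `cursor`, with 128 spare bytes as a starting size.
fn initial_buffer(witness_elements: usize) -> (Vec<u8>, usize) {
    let witness_index_space = witness_elements * 4;
    let cursor = witness_index_space;
    (vec![0u8; cursor + 128], cursor)
}
```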
Contributor

why did we choose 128 here ?

Contributor Author

I copied this over from the old decoding logic, and if I had to guess, it probably covers the majority of witness sizes (e.g. P2WPKH witness is ~100 bytes). But I don't have the data and it might be worth making it a documented const.

Member

Yeah, I was just swagging "a pubkey and a signature and some overhead". We can add a comment if you'd like. I don't think it's worth a named constant.

Contributor Author

Beefed up the comment.

@nyonson nyonson force-pushed the single-allocation-decoder branch 2 times, most recently from f513e42 to f4269c0 Compare November 4, 2025 19:42
@tcharding
Member

tcharding commented Nov 5, 2025

I wanted to convince myself that this was an optimisation-only PR, so I moved the tests into a separate patch at the front. Three of them fail (at a minimum I would have expected decode_empty_element to pass), so this is either also a bug-fix PR or something else is going on. So I don't have to use my brain if you already know: did you expect that? I.e., do you think the code before this patch is applied is buggy? I'm not asking you to debug if you don't know.

@tcharding
Member

Looks good man!

@nyonson
Contributor Author

nyonson commented Nov 5, 2025

did you expect that? I.e do you think the code before this patch is applied is buggy?

Nope I did not...which makes me feel a bit uneasy! I can dig more into that tomorrow though.

@tcharding
Member

tcharding commented Nov 5, 2025

$10 says my impl is buggy and yours is correct - you can pay me later. I wouldn't put too much thought into it, the tests you add look correct. I'd just mention in the PR description that these three tests don't pass before and that, at least for two of them, it's surprising.

This converts the new witness decoder to use the optimized single
allocation internal memory structure which was shown back in
commit 2fd0125 with the old decoder to be a lot more performant,
justifying this increased complexity.
@nyonson nyonson force-pushed the single-allocation-decoder branch from f4269c0 to df64705 Compare November 5, 2025 13:47
@nyonson
Contributor Author

nyonson commented Nov 5, 2025

@tcharding I added to the description reasoning for each test.

Member

@tcharding tcharding left a comment


ACK df64705

@tcharding
Member

Thanks man, good effort - as usual!

Comment on lines +337 to 344
fn resize_if_needed(&mut self, required_len: usize) {
if required_len >= self.content.len() {
let mut new_len = self.content.len().max(1);
while new_len <= required_len {
new_len *= 2;
}
self.content.resize(new_len, 0);
}
Contributor

do you think we can use a ~1 MB batch allocation here instead of doubling?

something like:

    fn resize_if_needed(&mut self, required_len: usize) {
        if required_len > self.content.len() {
            let bytes_remaining = required_len - self.content.len();
            let batch_size = bytes_remaining.min(Self::MAX_VECTOR_ALLOCATE);
            self.content.resize(self.content.len() + batch_size, 0);
        }
    }

If this makes sense, I'm happy to ACK the PR as it is, and I can address this in #5177

Contributor Author

@nyonson nyonson Nov 6, 2025

Yea, I copied this allocation logic over from the old decoder. I think it would be better to address the DoS stuff more holistically in your work. One thing I am wondering is if the witness decoder should have its own strategy, which would allow the consensus_encoding stuff to be private to that crate, or if witness should delegate to some exposed logic.

Contributor

@jrakibi jrakibi Nov 6, 2025

I think with the new refactoring you did, we can keep the consensus_encoding stuff private

Member

@tcharding tcharding Nov 6, 2025

FWIW exponential growth is a pretty standard algorithm when allocating memory, so I found it easy to review and understand why it was chosen. If we choose a more domain-specific algo it would need, IMO, more justification.

Contributor

I think the current doubling strategy is still vulnerable to a DoS attack.

Here's a test that demonstrates it:

#[test]
fn test_dos_attack() {
    let mut encoded = Vec::new();
    encoded.extend_from_slice(&[0xFE, 0x00, 0x09, 0x3D, 0x00]); // 4_000_000 (witness count)
    encoded.extend_from_slice(&[0xFE, 0x00, 0x09, 0x3D, 0x00]); // 4_000_000 (1st element length)
    let mut slice = encoded.as_slice();
    let mut dec = WitnessDecoder::new();
    assert!(dec.push_bytes(&mut slice).unwrap());
    assert!(dec.content.len() >= 32_000_000);
}

With only 10 bytes of input an attacker can force a 32 MB allocation (16 MB for the index space, which doubles when we call resize_if_needed)

Member

I guess, if we were allocating 1 MB at a time, this example would take 20 MB instead of 32?

It seems to me the way this works is:

  • The attacker declares that there are 4 million witness elements (the maximum we allow, on the theory that maybe an entire block could be filled with empty witnesses, okay), so we allocate 16 MB to store these 4 million indices.
  • Then the attacker declares that the first witness element has size 4 MB (though in this example he could say any positive value); we then need to store 16 MB + 1, so we allocate 32 MB.

But if we were allocating 1 MB at a time, we'd still be allocating 16 MB of index space and still be allocating 4 MB for the first witness.

And of course, because you aren't actually providing the claimed data, we return an error and deallocate everything. So I don't see any way to make this attack use more than 32 MB, nor any way to amplify it into many 32 MB allocations.
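The arithmetic above can be checked with a small stand-in for the doubling rule (final_buffer_len is an illustrative sketch of what resize_if_needed computes, not the actual code):

```rust
// Stand-in for the doubling rule: double until the buffer exceeds
// the required length.
fn final_buffer_len(initial: usize, required: usize) -> usize {
    let mut len = initial.max(1);
    while len <= required {
        len *= 2;
    }
    len
}
```

With initial = 4_000_000 * 4 + 128 = 16_000_128 and required = initial + 4_000_000 = 20_000_128, this lands at 32_000_256 bytes, matching the ~32 MB figure in the test above.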

Member

I really like this unit test though. I wonder if there's a way we can turn it into a fuzz test so we can test my "you can't make it allocate more than 32 MB without providing at least 16 MB of input" belief.

Member

BTW if we really wanted to reduce this memory (which I do not think we should do in this PR) we could be cleverer about how to store the index. At the very least we could use 3-byte lengths instead of 4-byte ones.
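For what 3-byte lengths could look like, here is a hypothetical sketch of little-endian 24-bit offsets (write_u24/read_u24 are invented helpers, not anything in rust-bitcoin):

```rust
// Hypothetical helpers: store an offset in 3 bytes instead of 4,
// shrinking the index area by 25%. Values must fit in 24 bits
// (16_777_216), which comfortably covers the ~4 MB witness sizes
// discussed in this thread.
fn write_u24(buf: &mut [u8], pos: usize, value: u32) {
    debug_assert!(value < 1 << 24);
    buf[pos..pos + 3].copy_from_slice(&value.to_le_bytes()[..3]);
}

fn read_u24(buf: &[u8], pos: usize) -> u32 {
    u32::from(buf[pos]) | (u32::from(buf[pos + 1]) << 8) | (u32::from(buf[pos + 2]) << 16)
}
```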

Contributor

@jrakibi jrakibi left a comment


ACK df64705

if required_len >= self.content.len() {
let mut new_len = self.content.len().max(1);
while new_len <= required_len {
new_len *= 2;
Member

In df64705:

This can overflow on a 16-bit machine. We should clamp new_len to usize::MAX. (Fine to address in a followup.)
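An overflow-safe variant of the doubling loop could saturate at usize::MAX instead of multiplying unchecked (a sketch of the clamping suggested here, not the merged code):

```rust
// Sketch: double until new_len exceeds required_len, saturating at
// usize::MAX instead of overflowing on small-pointer targets.
fn grow_len(current: usize, required_len: usize) -> usize {
    let mut new_len = current.max(1);
    while new_len <= required_len {
        match new_len.checked_mul(2) {
            Some(doubled) => new_len = doubled,
            None => return usize::MAX,
        }
    }
    new_len
}
```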

@apoelstra
Member

In df64705:

I also think that somewhere we should be capping required_len at 20 MB, which I think is the maximum it can be for a valid witness (one with 4 million entries, 3,999,999 of which are 0-length and the other one 4 million bytes).

Again, can wait for a followup, especially because I'm unsure of the exact number and I think we want to review this separately from the rest of the logic.

Member

@apoelstra apoelstra left a comment


ACK df64705; successfully ran local tests

@apoelstra apoelstra merged commit b19f238 into rust-bitcoin:master Nov 7, 2025
26 checks passed
@nyonson
Contributor Author

nyonson commented Nov 7, 2025

Follow up issue for the allocation improvements: #5258

jrakibi added a commit to jrakibi/rust-bitcoin that referenced this pull request Nov 15, 2025
add batched allocation for witness decoder.
we also apply batched allocation for the index space to avoid allocating
the full 16 MB upfront when an attacker claims 4 million witness_elements.

instead, we allocate gradually in ~1 MB batches

The current approach automatically addresses the concerns raised in
rust-bitcoin#5258 and
rust-bitcoin#5239 (comment)
jrakibi added a commit to jrakibi/rust-bitcoin that referenced this pull request Nov 24, 2025
Add batched allocation for witness decoder (data area).
we allocate incrementally in ~1 MB batches

The current approach automatically addresses the concerns raised in
rust-bitcoin#5258 and
rust-bitcoin#5239 (comment)
jrakibi added a commit to jrakibi/rust-bitcoin that referenced this pull request Jan 17, 2026
Add batched allocation for witness decoder (data area).
we allocate incrementally in ~1 MB batches

The current approach automatically addresses the concerns raised in
rust-bitcoin#5258 and
rust-bitcoin#5239 (comment)

we also add `reserve_batch` to the Mutants exclusion list:
mutation testing in WitnessDecoder::reserve_batch changes the buffer
allocation logic, and the mutants cause an infinite loop during decoding
apoelstra added a commit that referenced this pull request Jan 24, 2026
…nessDecoder

9bf1522 consensus_encoding: implement batched allocation for WitnessDecoder (jrakibi)

Pull request description:

  Currently an attacker can force a 32 MB memory allocation by providing only 10 bytes of input (see the test case demonstrating this in #5239 (comment))
  
  With the current approach we reduce this attack surface to ~17 MB.
  
  ### Attack vector (test case inputs):
  
  - `[0xFE, 0x00, 0x09, 0x3D, 0x00]` = 4_000_000 (witness count)
  - `[0xFE, 0x00, 0x09, 0x3D, 0x00]` = 4_000_000 (1st element length)
  
  10 bytes of input will trigger the allocation below
  
  ---
  
  Old approach with `resize_if_needed` (32 MB allocation):
  * Index space = 16 MB (4_000_000 witness elements × 4 bytes)
  * Initial buffer = 16 MB + 128 bytes
  * When the 4 MB element length is decoded, `resize_if_needed` doubles the buffer
  * Because `required_len = 16 MB + 128 + 4 MB = 20 MB + 128 bytes`, it doubles from 16 MB to 32 MB
  * Total allocation = 32 MB
  
  
  New approach with `reserve_batch` (~17 MB allocation):
  * Index space = 16 MB
  * Initial buffer = 16 MB + 128 bytes
  * When a 4 MB element length is decoded, `reserve_batch` allocates in 1 MB batches
  * Total allocation = 16 MB + 128 bytes + 1 MB = 17_000_128 bytes
  
  The current approach automatically addresses the concerns raised in  #5258 and  #5239 (comment)
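  Batched growth of this shape can be sketched as follows (the BATCH constant and the free-function signature are assumptions for illustration; the merged reserve_batch may differ):

```rust
// Sketch of batched allocation: grow by at most ~1 MB per call rather
// than doubling, so a claimed-but-unsent 4 MB element only costs 1 MB
// up front. BATCH is an assumed constant for illustration.
const BATCH: usize = 1_000_000;

fn reserve_batch(content: &mut Vec<u8>, required_len: usize) {
    if required_len > content.len() {
        let bytes_remaining = required_len - content.len();
        let batch_size = bytes_remaining.min(BATCH);
        content.resize(content.len() + batch_size, 0);
    }
}
```

  Each call grows the buffer by at most one batch, so the decoder only pays for claimed-but-undelivered data one batch at a time as bytes actually arrive.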


ACKs for top commit:
  tcharding:
    ACK 9bf1522
  apoelstra:
    ACK 9bf1522; successfully ran local tests


Tree-SHA512: 81ad652f6158d333fc473846e0bb56d0726d4bf1b68a0ba82d4e673915224be306593a685b948d3e71bb114577033cb0d9f935ad69c39ba07b9ffe971a69a9d7