This repository was archived by the owner on Aug 15, 2025. It is now read-only.

Make read_vec more robust#134

Merged
TyOverby merged 3 commits into bincode-org:master from slyrz:oom
Mar 9, 2017

Conversation

@slyrz slyrz (Contributor) commented Mar 8, 2017

This PR makes bincode survive garbage input when decoding types that invoke
the read_vec function. Currently read_vec allocates a vector of the given
length without any sanity checks, which makes it easy to crash the decoder
with extreme values.

Since the Read trait does not provide a way to check the size of the
remaining data, this PR splits read_vec into two functions:

  1. read_vec_small, which reads vectors the way bincode currently does,
  2. read_vec_large, which reads larger vectors in chunks.

On encountering a large vector size, read_vec calls read_vec_large
to load the vector in reasonably sized chunks. read_vec_large avoids allocating
the whole vector at once and grows it instead. If the length turns out to be garbage,
bincode reaches the end of the data before it runs out of memory.

Of course, growing a buffer like this comes with a few extra allocations and
copies, but with a reasonably large chunk size (I picked 64 KB out of thin air),
read_vec_large will rarely be invoked.
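The chunked strategy described above can be sketched roughly like this. The names (read_vec_large, BLOCK_SIZE) mirror the PR's description, but this is an illustrative reconstruction over a plain std::io::Read, not the actual patch:

```rust
use std::cmp;
use std::io::{self, Read};

// 64 KB chunk size, as mentioned in the PR description.
const BLOCK_SIZE: usize = 64 * 1024;

// Grow the buffer one chunk at a time instead of trusting `len` up front.
// A bogus length hits end-of-input long before it can exhaust memory.
fn read_vec_large<R: Read>(reader: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut result = Vec::new();
    let mut remaining = len;
    while remaining > 0 {
        let chunk = cmp::min(remaining, BLOCK_SIZE);
        let old_len = result.len();
        result.resize(old_len + chunk, 0);
        reader.read_exact(&mut result[old_len..])?;
        remaining -= chunk;
    }
    Ok(result)
}

fn main() -> io::Result<()> {
    let data = vec![7u8; 100];

    // A claimed length far larger than the actual input fails with an I/O
    // error after at most one 64 KB allocation, instead of a huge reserve.
    let mut cursor = io::Cursor::new(&data[..]);
    assert!(read_vec_large(&mut cursor, usize::MAX / 2).is_err());

    // A truthful length round-trips.
    let mut cursor = io::Cursor::new(&data[..]);
    assert_eq!(read_vec_large(&mut cursor, 100)?, data);
    Ok(())
}
```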

This PR is a mere suggestion. I don't know your stance on broken inputs and
how bincode should handle them. Some might say "Just add a checksum to your data"
and that's fine, so no feelings hurt if you don't want to merge this.


fn read_bytes(&mut self, count: u64) -> Result<()> {
    self.read += count;
    if let (read, false) = self.read.overflowing_add(count) {
Collaborator

I'm not super worried about this case right now. If the "number of bytes read into the struct" exceeds 2.3 exabytes, then catching SizeLimit errors is somewhat moot.

As it is, read_bytes is on the hot-path, so I'd prefer to avoid the extra computation.

Contributor Author

My worry was a maliciously crafted length value causing an overflow panic. But you're right: people will have overflow checks disabled in their deployed release builds, so an overflow won't terminate their programs with a panic. I'll revert this. But be warned: once you add fuzzer tests to bincode, you'll find yourself adding this again. ;)

Collaborator

Ah, good point, I see where this is happening. Fortunately, I think there's a pretty easy solution:

Right now we are using read_bytes in two locations:

  • read_type
  • read_vec

(I should probably rename one of those two methods, because despite the names, they are not similar at all.)

read_type is safe, because it is only being used to increment the read counter for statically known types.
read_vec is not currently safe because it trusts the length that was read out of the byte-stream.

This second case would be pretty easy to fix: stick the calls to read_bytes inside the read_vec_small and read_vec_large functions, immediately after any calls to read_exact.
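A minimal sketch of that placement, assuming a hypothetical Decoder type standing in for bincode's reader (read_bytes here is a simplified stand-in for the size-limit accounting). Counting bytes only after a successful read_exact means the counter is never incremented past what was actually consumed:

```rust
use std::io::{self, Read};

// Hypothetical decoder: tracks bytes read against a size limit.
struct Decoder<R> {
    reader: R,
    read: u64,
    limit: u64,
}

impl<R: Read> Decoder<R> {
    // Simplified stand-in for bincode's size-limit accounting.
    fn read_bytes(&mut self, count: u64) -> io::Result<()> {
        self.read += count;
        if self.read > self.limit {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "size limit exceeded"));
        }
        Ok(())
    }

    // The suggested ordering: account for the bytes immediately after the
    // read_exact call succeeded, not before.
    fn read_chunk(&mut self, buf: &mut [u8]) -> io::Result<()> {
        self.reader.read_exact(buf)?;
        self.read_bytes(buf.len() as u64)
    }
}

fn main() -> io::Result<()> {
    let data = [1u8, 2, 3, 4];
    let mut dec = Decoder { reader: io::Cursor::new(&data[..]), read: 0, limit: 3 };
    let mut buf = [0u8; 2];
    dec.read_chunk(&mut buf)?;                  // 2 bytes read, within the limit
    assert!(dec.read_chunk(&mut buf).is_err()); // would exceed the 3-byte limit
    Ok(())
}
```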

    Ok(result)
}

fn read_vec_large(&mut self, mut len: usize, block: usize) -> Result<Vec<u8>> {
@TyOverby TyOverby (Collaborator) Mar 8, 2017

This looks really good!

Shouldn't this fix be possible while only maintaining one (result) buffer?

Here's my quick attempt: https://is.gd/Gq8N3C

Contributor Author

Yes, will change it. Your version needs another cmp::min in

let mut result = Vec::with_capacity(BLOCK_SIZE);

or else even the smallest vectors reserve space for at least BLOCK_SIZE bytes. I'll try to get around this.
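For illustration, the capped reservation being discussed could look like this (initial_buffer is a hypothetical helper name, and BLOCK_SIZE is the 64 KB chunk size from the PR; this is a sketch, not the merged code):

```rust
use std::cmp;

const BLOCK_SIZE: usize = 64 * 1024;

// Reserve the smaller of the claimed length and one block, so a tiny vector
// doesn't pre-allocate a full 64 KB, and a bogus huge length can't reserve
// more than one block up front.
fn initial_buffer(len: usize) -> Vec<u8> {
    Vec::with_capacity(cmp::min(len, BLOCK_SIZE))
}

fn main() {
    // Small claimed length: capacity tracks the request, not the block size.
    assert!(initial_buffer(16).capacity() >= 16);
    // Huge (possibly malicious) claimed length: only one block is reserved,
    // so this runs without attempting a terabyte-scale allocation.
    assert!(initial_buffer(1 << 40).capacity() >= BLOCK_SIZE);
}
```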

Collaborator

Oh wait, I think it's worse than that actually. I think that it should read Vec::new(), because the very next thing that gets executed is a call to reserve.

In my implementation I think I'm always over-allocating.

@TyOverby TyOverby (Collaborator) commented Mar 9, 2017

Looks good! Thanks for the changes!

@TyOverby TyOverby merged commit cff1ec6 into bincode-org:master Mar 9, 2017
@slyrz slyrz mentioned this pull request Mar 9, 2017
