This repository was archived by the owner on Aug 15, 2025. It is now read-only.

Feature/config limit #439

Merged
VictorKoenders merged 6 commits into trunk from feature/config-limit
Dec 11, 2021

Conversation

@VictorKoenders (Contributor)

Adds a limit option to the configuration and keeps track of how many bytes a decode operation claims it will read, returning a DecodeError::LimitExceeded when the claimed total exceeds that limit.

The reason it is set up this way is that we use a lot of Container::with_capacity when decoding containers. The limit exists primarily to avoid allocating too much memory when the data is garbage; for example, we don't want to accidentally allocate 10 GB of memory when the input is invalid.

I've added an unclaim system as well. As per the Decoder comments:

When decoding container types, a typical implementation would claim to read len * size_of::<T>() bytes.
This is to ensure that bincode won't allocate several GB of memory while constructing the container.

Because the implementation claims len * size_of::<T>() but then has to decode each T, those bytes would be counted double. This function allows us to un-claim each T that gets decoded.

We cannot check if len * size_of::<T>() is valid without claiming it, because this would mean that if you have
a nested container (e.g. Vec<Vec<T>>), it does not know how much memory is already claimed, and could easily
allocate much more than the user intends.

@VictorKoenders (Contributor, Author) commented Nov 29, 2021

TODO(self):

  • Add tests to verify that the limit actually gets hit
  • Maybe add a check to SliceReader: if the slice is larger than the current limit, error immediately?
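That second TODO could look something like the sketch below. The function name and error type are illustrative only, not bincode's API:

```rust
// Hypothetical upfront check for a slice-backed reader: if the input slice
// is already longer than the configured limit, fail before decoding anything.
fn check_slice_within_limit(input: &[u8], limit: Option<usize>) -> Result<(), String> {
    match limit {
        Some(max) if input.len() > max => Err(format!(
            "LimitExceeded: input is {} bytes, limit is {}",
            input.len(),
            max
        )),
        _ => Ok(()), // no limit configured, or the slice fits
    }
}

fn main() {
    assert!(check_slice_within_limit(&[0u8; 10], Some(5)).is_err());
    assert!(check_slice_within_limit(&[0u8; 10], Some(20)).is_ok());
    assert!(check_slice_within_limit(&[0u8; 10], None).is_ok());
}
```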

@codecov-commenter commented Nov 29, 2021

Codecov Report

Merging #439 (afc6719) into trunk (882e227) will increase coverage by 0.68%.
The diff coverage is 92.30%.


```
@@            Coverage Diff             @@
##            trunk     #439      +/-   ##
==========================================
+ Coverage   62.19%   62.87%   +0.68%
==========================================
  Files          46       46
  Lines        3383     3456      +73
==========================================
+ Hits         2104     2173      +69
- Misses       1279     1283       +4
```
| Impacted Files | Coverage Δ |
|---|---|
| src/error.rs | 0.00% <ø> (ø) |
| src/config.rs | 80.95% <50.00%> (-7.29%) ⬇️ |
| src/de/decoder.rs | 65.00% <76.92%> (+22.14%) ⬆️ |
| src/de/impls.rs | 79.07% <100.00%> (+1.80%) ⬆️ |
| src/de/mod.rs | 79.31% <100.00%> (+10.88%) ⬆️ |
| src/features/impl_alloc.rs | 98.24% <100.00%> (+0.16%) ⬆️ |
| src/features/impl_std.rs | 75.86% <100.00%> (+0.33%) ⬆️ |
| tests/alloc.rs | 100.00% <100.00%> (ø) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 882e227...afc6719. Read the comment docs.

@VictorKoenders VictorKoenders changed the title [WIP] Feature/config limit Feature/config limit Dec 9, 2021
@VictorKoenders VictorKoenders requested a review from a user December 9, 2021 08:28
```rust
// before
super::impl_core::collect_into_array(&mut (0..N).map(|_| T::decode(&mut decoder)));

// after
let result = super::impl_core::collect_into_array(&mut (0..N).map(|_| {
    // See the documentation on `unclaim_bytes_read` as to why we're doing this here
    decoder.unclaim_bytes_read(core::mem::size_of::<T>());
```
Rather than unclaiming in a loop, we could just unclaim right after claiming. That should reduce the per-iteration arithmetic.

@VictorKoenders (Contributor, Author):

The reason I unclaim per item is because of nested containers.

Consider a data type Vec<Vec<u8>> where the outer Vec has 100 entries and each inner Vec<u8> holds 100 bytes. Decoding the entire container would allocate about 10 kB of memory.

If our limit is set to 2500, the current impl would:

  • Allocate the outer Vec<Vec<_>> (100 * 24), count = 2400
  • Deallocate 1 inner Vec<u8> (24 bytes), count = 2376
  • Allocate the inner Vec<u8> (100 bytes), count = 2476
  • Deallocate 1 inner Vec<u8> (24 bytes), count = 2452
  • Allocate the inner Vec<u8> (100 bytes), count = 2552

This would fail at the 3rd allocation.

Whereas if we unclaim the entire vec immediately, we would get:

  • Allocate the outer Vec<Vec<_>> (100 * 24), count = 2400
  • Deallocate this, count = 0
  • 25x:
    • Allocate the inner Vec<u8> (100 bytes), count = 100, 200, etc

This would fail at the 25th allocation.
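The two strategies can be compared with plain arithmetic. The helper below is a stand-in for the decoder's byte counter, not bincode's real API, and it assumes a strict greater-than check against the limit (the exact step where the limit trips depends on that boundary choice):

```rust
// Simulation of the two unclaim strategies, using the comment's numbers:
// limit 2500, outer Vec claims 100 * 24 = 2400 bytes (24 assumed as
// size_of::<Vec<u8>>()), and each of the 100 inner Vec<u8>s holds 100 bytes.
// Returns how many inner vecs decode before the limit trips.
fn count_inner_vecs(limit: usize, unclaim_per_item: bool) -> usize {
    let mut claimed = 0usize;
    claimed += 100 * 24; // outer Vec<Vec<u8>> claims 100 elements up front
    if !unclaim_per_item {
        claimed -= 100 * 24; // eager strategy: unclaim the whole claim at once
    }
    let mut decoded = 0;
    for _ in 0..100 {
        if unclaim_per_item {
            claimed -= 24; // per-item strategy: give back one inner Vec header
        }
        claimed += 100; // the inner Vec's own 100 bytes
        if claimed > limit {
            break; // DecodeError::LimitExceeded in the real decoder
        }
        decoded += 1;
    }
    decoded
}

fn main() {
    assert_eq!(count_inner_vecs(2500, true), 1);   // per-item: trips almost at once
    assert_eq!(count_inner_vecs(2500, false), 25); // eager: 25 inner vecs fit
}
```

Under this strict check, the per-item strategy decodes only one inner Vec before tripping (the comment's 3rd allocation), while the eager strategy gets through 25, illustrating why the author kept the per-item accounting: it stays conservative for nested containers.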

```rust
{
    fn decode<D: Decoder>(mut decoder: D) -> Result<Self, DecodeError> {
        let len = crate::de::decode_slice_len(&mut decoder)?;
        decoder.claim_container_read::<T>(len)?;
```

Why is this used sometimes, while other times (like the array impl) it's done manually? Also, same comment here about immediately unclaiming.

@VictorKoenders (Contributor, Author):

This is specifically for containers (e.g. Vec, HashMap, etc.). An array [T; N] is, in my eyes, not a container.

@VictorKoenders VictorKoenders merged commit 240558d into trunk Dec 11, 2021
@VictorKoenders VictorKoenders deleted the feature/config-limit branch December 11, 2021 14:44