Use `MAX_PREALLOCATION` consistently (#605)
Conversation
Use `MAX_PREALLOCATION` both when reading a vec from bytes and when decoding each element.
Increase `MAX_PREALLOCATION` in order to avoid calling `realloc` too often.
```rust
// If there is input len and it cannot be pre-allocated then return directly.
if input_len.map(|l| l < byte_len).unwrap_or(false) {
    return Err("Not enough data to decode vector".into());
}
```
```rust
num_undecoded_items = num_undecoded_items.saturating_sub(chunk_len);
```
This `saturating_sub` is completely unnecessary here, since it is impossible to have `chunk_len > num_undecoded_items` due to the `min`.
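For illustration, the loop shape under discussion is roughly the following (a standalone sketch with made-up names, not the crate's actual code):

```rust
// Sketch of a chunked-decode loop: because chunk_len is clamped with min(),
// chunk_len <= num_undecoded_items always holds, so a plain subtraction can
// never underflow and saturating_sub is redundant.
// Assumes max_chunk > 0; returns the chunk sizes for inspection.
fn decode_in_chunks(total_items: usize, max_chunk: usize) -> Vec<usize> {
    let mut num_undecoded_items = total_items;
    let mut chunk_sizes = Vec::new();
    while num_undecoded_items > 0 {
        let chunk_len = max_chunk.min(num_undecoded_items);
        chunk_sizes.push(chunk_len);
        num_undecoded_items -= chunk_len; // cannot underflow
    }
    chunk_sizes
}
```

E.g. 10 items with chunks of at most 4 items decode as chunks of 4, 4, and 2.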
```rust
if let Some(input_len) = input.remaining_len()? {
    if input_len < len {
        return Err("Not enough data to decode vector".into());
    }
}
```
This isn't correct, as deserializing `T` might take any number of bytes (including even zero bytes, e.g. `()`).
What we should do here is to have a `serialized_size_hint()` method (or, more specifically, probably an associated const so that it can be checked statically to fit within `MAX_PREALLOCATION`) or something like that on `T` which would return a value that could allow this check. (We already have `encoded_fixed_size` there, but that returns an exact number of bytes; it could be used here, but technically that's too strict, and we can do better here by using the minimum.)
Yeah, alternatively we can drop it. Although having it here does have one benefit: if we end up not having enough data, this will return an early error instead of wasting time trying to deserialize. Nice to have, but not strictly necessary.
To implement this, we would need to write quite a lot of code. For example, for an enum we would need to know the variant that requires the least amount of bytes. However, decoding could still fail afterwards, because the data might actually contain a variant that uses many more bytes, etc.
Hm, well, would it be that much code? I implemented this in my serialization crate and it's mostly fine; with enums you essentially just autogenerate a `(min(variant1, variant2, ..), max(variant1, variant2, ..))` in your impl. Of course this is just an optimization (in some cases it would make incomplete deserializations fail early, and in some cases it would allow the compiler to remove per-element size checks), and as you've said it can still fail at decoding depending on what you're decoding.
Anyway, I'm fine with going with your suggestion to just delete the check.
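To make the idea concrete, here is a rough sketch of what such a minimum-size hint could look like; `MinEncodedSize`, `MIN_ENCODED_SIZE`, and `enough_data` are hypothetical names for illustration, not existing APIs of this crate:

```rust
// Hypothetical sketch (none of these names exist in parity-scale-codec):
// an associated const giving a lower bound on the encoded size of a type,
// which allows an early "not enough data" bail-out before decoding a Vec<T>.
trait MinEncodedSize {
    const MIN_ENCODED_SIZE: usize;
}

impl MinEncodedSize for u32 {
    const MIN_ENCODED_SIZE: usize = 4;
}

impl MinEncodedSize for () {
    const MIN_ENCODED_SIZE: usize = 0; // decoding () consumes no bytes
}

// For an enum, a derive would take the minimum over all variants
// (here: one byte of tag plus the smallest variant payload).
#[allow(dead_code)]
enum Example {
    A(u32),
    B(u32, u32),
}

impl MinEncodedSize for Example {
    const MIN_ENCODED_SIZE: usize = 1 + {
        let a = u32::MIN_ENCODED_SIZE;
        let b = u32::MIN_ENCODED_SIZE * 2;
        if a < b { a } else { b }
    };
}

// The early check: with `len` items still to decode and `input_len` bytes
// remaining, there cannot possibly be enough data if
// len * MIN_ENCODED_SIZE exceeds input_len.
fn enough_data<T: MinEncodedSize>(len: usize, input_len: usize) -> bool {
    len.saturating_mul(T::MIN_ENCODED_SIZE) <= input_len
}
```

As discussed, this is only an early-exit optimization: decoding can still fail later if the actual variants need more bytes than the minimum, and for zero-sized types like `()` the check never rejects anything.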
```rust
    T: ToMutByteSlice + Default + Clone,
    F: FnMut(&mut Vec<T>, usize) -> Result<(), Error>,
{
    debug_assert!(MAX_PREALLOCATION >= mem::size_of::<T>(), "Invalid precondition");
```
We should make this into a static assert and check it at compile time.
I couldn't manage to do this so far. I tried something like

```rust
const _: () = {
    assert!(MAX_PREALLOCATION >= mem::size_of::<T>())
};
```

inside `decode_vec_chunked()`, but I'm getting an error: `can't use generic parameters from outer item`. Any suggestion would be helpful.
You don't need to define a constant; since Rust 1.79 you can use a `const {}` block to force const evaluation of an expression.
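Concretely, the pattern looks like this (a standalone sketch; the `MAX_PREALLOCATION` value here is illustrative, not the crate's real one):

```rust
// Illustrative constant standing in for the crate's MAX_PREALLOCATION.
const MAX_PREALLOCATION: usize = 4 * 1024;

// Since Rust 1.79, an inline `const { ... }` block is evaluated at compile
// time and, unlike a named `const _: ()` item, may use the enclosing
// function's generic parameters.
fn decode_vec_chunked<T>() -> usize {
    const { assert!(MAX_PREALLOCATION >= core::mem::size_of::<T>()) };
    // ... chunked decoding would go here; we just return the element size.
    core::mem::size_of::<T>()
}
```

Instantiating `decode_vec_chunked` with an oversized `T` (e.g. `[u8; 8192]`) then fails at compile time instead of hitting a `debug_assert!` at runtime.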
Yes, this works, thanks! PTAL at #615.
But the CI fails, because the CI image uses Rust 1.73.0. We can try to release a `paritytech/ci-unified:bullseye-1.79.0` image. Will check how this can be done.
Thanks for the review! Will address the comments in a new PR today. LE: Here is the PR: #615
```rust
unsafe {
    decoded_vec.set_len(decoded_vec_len + chunk_len);
}
```

```rust
while items_remains > 0 {
    let items_len_read = max_preallocated_items.min(items_remains);
    let bytes_slice = decoded_vec.as_mut_byte_slice();
    input.read(&mut bytes_slice[decoded_vec_size..])
```
This is not the right way of doing it and actually unsound.
`Input::read()` is a safe method of a safe trait; it doesn't guarantee an invariant of not reading its argument. So it is possible to have a perfectly safe implementation of `Input` that reads some bytes before writing to them, but since they are uninitialized (you just blindly called `Vec::set_len()`), that is instant undefined behavior!
The right thing to do is to have a separate `unsafe` method that takes a pointer, or to add a method that takes something like `&mut [MaybeUninit<T>]` instead.
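A rough sketch of the second option; `UninitInput` and `read_uninit` are hypothetical names for illustration, not part of the actual `Input` trait:

```rust
use std::mem::MaybeUninit;

// Hypothetical variant of the Input trait: because the buffer is
// &mut [MaybeUninit<u8>], safe code cannot read its (possibly
// uninitialized) contents, only write to them, so the soundness hole
// described above is closed by construction.
trait UninitInput {
    /// Fills the front of `buf` and returns how many bytes were initialized.
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> Result<usize, &'static str>;
}

// Toy implementation reading from an in-memory slice.
struct SliceInput<'a>(&'a [u8]);

impl<'a> UninitInput for SliceInput<'a> {
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> Result<usize, &'static str> {
        let n = self.0.len().min(buf.len());
        for (dst, src) in buf.iter_mut().zip(&self.0[..n]) {
            dst.write(*src); // MaybeUninit::write initializes the byte
        }
        self.0 = &self.0[n..];
        Ok(n)
    }
}
```

The caller is then responsible (in `unsafe` code) for only treating the first `n` bytes as initialized, e.g. via `Vec::set_len`.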
Yes, you're right. A few points though:

- A lot of this code traces back to a time when the rules about UB weren't entirely clear, and it was still an open question whether reading uninitialized memory of a type which doesn't have a niche is UB or not.
- The "proper" blessed way of doing this (that is, `Vec::spare_capacity_mut`) was introduced long after some of the code in this crate was written.
- Changing this means we need to break backwards compatibility (which isn't technically a problem per se; it's just painful because of all of the extra churn, as now everyone needs to update).
- Unless the `read` actually reads the memory it's not actually a problem, both in practice and in theory (but it's technically a landmine because there's no `unsafe` and reading the memory is technically unsound).
- Even if `read` actually reads the uninitialized memory, currently nothing bad actually happens (it's only UB in theory AFAIK), and it has been like this for years.
I'm not trying to excuse this, just trying to be pragmatic, as this is essentially an issue of "this code was designed when the rules about UB weren't clear, and now we have a function which should be marked `unsafe` but isn't, because it can trigger UB when misused".
Anyway, feel free to create an issue about this. If/when we'll be bumping the major version we'd definitely want to fix this.
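For reference, the `Vec::spare_capacity_mut` pattern mentioned above looks roughly like this (a sketch with illustrative names, not the crate's code):

```rust
use std::mem::MaybeUninit;

// Write into the vector's uninitialized tail first, and only call set_len
// once those bytes are known to be initialized. This inverts the unsound
// order (set_len before the data is written) discussed above.
fn extend_from_filler(
    vec: &mut Vec<u8>,
    chunk_len: usize,
    mut fill: impl FnMut(&mut [MaybeUninit<u8>]),
) {
    vec.reserve(chunk_len);
    let spare = &mut vec.spare_capacity_mut()[..chunk_len];
    fill(spare); // `fill` must initialize every byte of `spare`
    // SAFETY: the first `chunk_len` spare bytes were just initialized,
    // and reserve() guaranteed the capacity.
    unsafe { vec.set_len(vec.len() + chunk_len) };
}
```

With this shape, no safe caller can ever observe uninitialized memory, at the cost of requiring `fill` to initialize the whole slice.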
Related to #609