Use `MAX_PREALLOCATION` consistently (#605)
Conversation
Use `MAX_PREALLOCATION` both when reading a vec from bytes and when decoding each element.
Increase `MAX_PREALLOCATION` in order to avoid calling `realloc` too often.
```rust
// If there is input len and it cannot be pre-allocated then return directly.
if input_len.map(|l| l < byte_len).unwrap_or(false) {
    return Err("Not enough data to decode vector".into());
}
```
```rust
num_undecoded_items = num_undecoded_items.saturating_sub(chunk_len);
```
This `saturating_sub` is completely unnecessary here, since it is impossible to have `chunk_len > num_undecoded_items` due to the `min`.
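For illustration, the loop shape under discussion is roughly the following (a standalone sketch with made-up names, not the crate's actual code):

```rust
// Sketch of a chunked-decode loop: because chunk_len is clamped with min(),
// chunk_len <= num_undecoded_items always holds, so a plain subtraction can
// never underflow and saturating_sub is redundant.
// Assumes max_chunk > 0; returns the chunk sizes for inspection.
fn decode_in_chunks(total_items: usize, max_chunk: usize) -> Vec<usize> {
    let mut num_undecoded_items = total_items;
    let mut chunk_sizes = Vec::new();
    while num_undecoded_items > 0 {
        let chunk_len = max_chunk.min(num_undecoded_items);
        chunk_sizes.push(chunk_len);
        num_undecoded_items -= chunk_len; // cannot underflow
    }
    chunk_sizes
}
```

E.g. 10 items with chunks of at most 4 items decode as chunks of 4, 4, and 2.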
```rust
if let Some(input_len) = input.remaining_len()? {
    if input_len < len {
        return Err("Not enough data to decode vector".into());
    }
}
```
This isn't correct, as deserializing `T` might take any number of bytes (including even zero bytes, e.g. `()`).
What we should do here is to have a `serialized_size_hint()` method (or, more specifically, probably an associated const so that it can be checked statically to fit within `MAX_PREALLOCATION`) or something like that on `T` which would return a value that could allow this check. (We already have `encoded_fixed_size` there, but that returns an exact number of bytes; it could be used here, but technically that's too strict, and we can do better here by using the minimum.)
Yeah, alternatively we can drop it. Although having it here does have one benefit: if we end up not having enough data, this will return an early error instead of wasting time trying to deserialize. Nice to have, but not strictly necessary.
To implement this, we would need to write quite a lot of code. For example, for an enum we would need to know the variant that requires the least amount of bytes. However, decoding could still fail afterwards, because the data might actually contain a variant that uses many more bytes, etc.
Hm, well, would it be that much code? I implemented this in my serialization crate and it's mostly fine; with enums you essentially just autogenerate a `(min(variant1, variant2, ..), max(variant1, variant2, ..))` in your impl. Of course this is just an optimization (in some cases it would make incomplete deserializations fail early, and in some cases it would allow the compiler to remove per-element size checks), and as you've said it can still fail at decoding depending on what you're decoding.
Anyway, I'm fine with going with your suggestion to just delete the check.
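To make the idea concrete, here is a rough sketch of what such a minimum-size hint could look like; `MinEncodedSize`, `MIN_ENCODED_SIZE`, and `enough_data` are hypothetical names for illustration, not existing APIs of this crate:

```rust
// Hypothetical sketch (none of these names exist in parity-scale-codec):
// an associated const giving a lower bound on the encoded size of a type,
// which allows an early "not enough data" bail-out before decoding a Vec<T>.
trait MinEncodedSize {
    const MIN_ENCODED_SIZE: usize;
}

impl MinEncodedSize for u32 {
    const MIN_ENCODED_SIZE: usize = 4;
}

impl MinEncodedSize for () {
    const MIN_ENCODED_SIZE: usize = 0; // decoding () consumes no bytes
}

// For an enum, a derive would take the minimum over all variants
// (here: one byte of tag plus the smallest variant payload).
#[allow(dead_code)]
enum Example {
    A(u32),
    B(u32, u32),
}

impl MinEncodedSize for Example {
    const MIN_ENCODED_SIZE: usize = 1 + {
        let a = u32::MIN_ENCODED_SIZE;
        let b = u32::MIN_ENCODED_SIZE * 2;
        if a < b { a } else { b }
    };
}

// The early check: with `len` items still to decode and `input_len` bytes
// remaining, there cannot possibly be enough data if
// len * MIN_ENCODED_SIZE exceeds input_len.
fn enough_data<T: MinEncodedSize>(len: usize, input_len: usize) -> bool {
    len.saturating_mul(T::MIN_ENCODED_SIZE) <= input_len
}
```

As discussed, this is only an early-exit optimization: decoding can still fail later if the actual variants need more bytes than the minimum, and for zero-sized types like `()` the check never rejects anything.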
```rust
    T: ToMutByteSlice + Default + Clone,
    F: FnMut(&mut Vec<T>, usize) -> Result<(), Error>,
{
    debug_assert!(MAX_PREALLOCATION >= mem::size_of::<T>(), "Invalid precondition");
```
We should make this into a static assert and check it at compile time.
I couldn't manage to do this so far. I tried something like

```rust
const _: () = {
    assert!(MAX_PREALLOCATION >= mem::size_of::<T>())
};
```

inside `decode_vec_chunked()`, but I'm getting an error: `can't use generic parameters from outer item`. Any suggestion would be helpful.
You don't need to define a constant; since Rust 1.79 you can use a `const {}` block to force const evaluation of an expression.
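Concretely, the pattern looks like this (a standalone sketch; the `MAX_PREALLOCATION` value here is illustrative, not the crate's real one):

```rust
// Illustrative constant standing in for the crate's MAX_PREALLOCATION.
const MAX_PREALLOCATION: usize = 4 * 1024;

// Since Rust 1.79, an inline `const { ... }` block is evaluated at compile
// time and, unlike a named `const _: ()` item, may use the enclosing
// function's generic parameters.
fn decode_vec_chunked<T>() -> usize {
    const { assert!(MAX_PREALLOCATION >= core::mem::size_of::<T>()) };
    // ... chunked decoding would go here; we just return the element size.
    core::mem::size_of::<T>()
}
```

Instantiating `decode_vec_chunked` with an oversized `T` (e.g. `[u8; 8192]`) then fails at compile time instead of hitting a `debug_assert!` at runtime.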
Yes, this works, thanks! PTAL at #615.
But the CI fails, because the CI image uses Rust 1.73.0. We can try to release a `paritytech/ci-unified:bullseye-1.79.0` image. Will check how this can be done.
Thanks for the review! Will address the comments in a new PR today. LE: Here is the PR: #615
```rust
unsafe {
    decoded_vec.set_len(decoded_vec_len + chunk_len);
}
```

```rust
while items_remains > 0 {
    let items_len_read = max_preallocated_items.min(items_remains);
    let bytes_slice = decoded_vec.as_mut_byte_slice();
    input.read(&mut bytes_slice[decoded_vec_size..])
```
This is not the right way of doing it and actually unsound.
`Input::read()` is a safe method of a safe trait; it doesn't guarantee an invariant of not reading its argument. So it is possible to have a perfectly safe implementation of `Input` that reads some bytes before writing to them, but since they are uninitialized (you just blindly called `Vec::set_len()`), that is instant undefined behavior!
The right thing to do is to have a separate `unsafe` method that takes a pointer, or to add a method that takes something like `&mut [MaybeUninit<T>]` instead.
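A rough sketch of the second option; `UninitInput` and `read_uninit` are hypothetical names for illustration, not part of the actual `Input` trait:

```rust
use std::mem::MaybeUninit;

// Hypothetical variant of the Input trait: because the buffer is
// &mut [MaybeUninit<u8>], safe code cannot read its (possibly
// uninitialized) contents, only write to them, so the soundness hole
// described above is closed by construction.
trait UninitInput {
    /// Fills the front of `buf` and returns how many bytes were initialized.
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> Result<usize, &'static str>;
}

// Toy implementation reading from an in-memory slice.
struct SliceInput<'a>(&'a [u8]);

impl<'a> UninitInput for SliceInput<'a> {
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> Result<usize, &'static str> {
        let n = self.0.len().min(buf.len());
        for (dst, src) in buf.iter_mut().zip(&self.0[..n]) {
            dst.write(*src); // MaybeUninit::write initializes the byte
        }
        self.0 = &self.0[n..];
        Ok(n)
    }
}
```

The caller is then responsible (in `unsafe` code) for only treating the first `n` bytes as initialized, e.g. via `Vec::set_len`.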
Yes, you're right. A few points though:

- A lot of this code traces back to a time when the rules about UB weren't entirely clear, and it was still an open question whether reading uninitialized memory of a type which doesn't have a niche is UB or not.
- The "proper" blessed way of doing this (that is, `Vec::spare_capacity_mut`) was introduced long after some of the code in this crate was written.
- Changing this means we need to break backwards compatibility (which isn't technically a problem per se; it's just painful because of all of the extra churn, as now everyone needs to update).
- Unless the `read` actually reads the memory it's not actually a problem, both in practice and in theory (but it's technically a landmine because there's no `unsafe` and reading the memory is technically unsound).
- Even if `read` actually reads the uninitialized memory, currently nothing bad actually happens (it's only UB in theory AFAIK), and it has been like this for years.
I'm not trying to excuse this, just trying to be pragmatic, as this is essentially an issue of "this code was designed when the rules about UB weren't clear, and now we have a function which should be marked `unsafe` but isn't, because it can trigger UB when misused".
Anyway, feel free to create an issue about this. If/when we'll be bumping the major version we'd definitely want to fix this.
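For reference, the `Vec::spare_capacity_mut` pattern mentioned above looks roughly like this (a sketch with illustrative names, not the crate's code):

```rust
use std::mem::MaybeUninit;

// Write into the vector's uninitialized tail first, and only call set_len
// once those bytes are known to be initialized. This inverts the unsound
// order (set_len before the data is written) discussed above.
fn extend_from_filler(
    vec: &mut Vec<u8>,
    chunk_len: usize,
    mut fill: impl FnMut(&mut [MaybeUninit<u8>]),
) {
    vec.reserve(chunk_len);
    let spare = &mut vec.spare_capacity_mut()[..chunk_len];
    fill(spare); // `fill` must initialize every byte of `spare`
    // SAFETY: the first `chunk_len` spare bytes were just initialized,
    // and reserve() guaranteed the capacity.
    unsafe { vec.set_len(vec.len() + chunk_len) };
}
```

With this shape, no safe caller can ever observe uninitialized memory, at the cost of requiring `fill` to initialize the whole slice.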
Related to #609