Revamp deserialize_char by slyrz · Pull Request #133 · bincode-org/bincode

slyrz · 2017-03-08T10:12:55Z

This PR does the following things in deserialize_char:

- Removes a call to unwrap(). It probably could never panic, since str::from_utf8 checks the encoding, but it is gone anyway.
- Uses a single buffer to read the char into.
- Replaces the while loop with read_exact, because the while loop may never return on malformed input:

let err = deserialize::<char>(&[0b_1110_0000]).is_err();
println!("{:?}", err);
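For illustration, here is a minimal sketch of the read_exact approach using only the standard library. It is not the code from this PR: `read_char_bytes` is a hypothetical helper, and the width calculation borrows the leading_zeros idea mentioned further down rather than bincode's lookup table. The point is that a truncated sequence produces an error instead of a loop that keeps waiting for bytes.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not bincode's API): reads the bytes of one
/// UTF-8 encoded char into a single 4-byte buffer and returns the
/// buffer together with the number of bytes that were read.
fn read_char_bytes<R: Read>(reader: &mut R) -> io::Result<([u8; 4], usize)> {
    let mut buf = [0u8; 4];

    // Read the leading byte first; it announces the sequence length.
    reader.read_exact(&mut buf[..1])?;

    // Assumption: width derived from the leading bits, not from a table.
    let width = match (!buf[0]).leading_zeros() {
        0 => 1,
        2 => 2,
        3 => 3,
        4 => 4,
        _ => {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "invalid leading byte",
            ))
        }
    };

    // read_exact either fills the whole slice or fails with
    // UnexpectedEof, so a truncated input cannot loop forever.
    reader.read_exact(&mut buf[1..width])?;
    Ok((buf, width))
}
```

Calling this on the single byte `0b1110_0000` (which announces a 3-byte sequence) returns an `UnexpectedEof` error, because the two continuation bytes never arrive.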

Things I didn't change:

- The error always gets created if the compiler isn't smart enough to move it into the right places. It is boxed, so does this mean one heap allocation/deallocation on every call to deserialize_char?
The big UTF8_CHAR_WIDTH table and the utf8_char_width function could be replaced with something like

match (!buf[0]).leading_zeros() {
    0 => 1,
    2 => 2,
    3 => 3,
    4 => 4,
    _ => error
}
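The snippet above is pseudocode (`error` is a placeholder). A self-contained version might look like the sketch below, where the function name `utf8_width_from_leading_byte` and the `Option` return type are assumptions made for illustration, not part of bincode.

```rust
/// Returns the sequence length announced by a UTF-8 leading byte,
/// or `None` if the byte cannot start a sequence.
/// Hypothetical name; a sketch of the idea, not bincode's code.
fn utf8_width_from_leading_byte(first: u8) -> Option<usize> {
    // Inverting the byte turns the run of leading 1 bits into leading
    // 0 bits, which leading_zeros() can count.
    match (!first).leading_zeros() {
        0 => Some(1), // 0xxxxxxx: ASCII
        2 => Some(2), // 110xxxxx
        3 => Some(3), // 1110xxxx
        4 => Some(4), // 11110xxx
        _ => None,    // 10xxxxxx (continuation byte) or 11111xxx
    }
}
```

A single leading 1 bit (count of 1) marks a continuation byte, which is why that case falls through to the error arm.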

TyOverby · 2017-03-08T18:48:38Z

Great cleanup, thanks!

> The error always gets created if the compiler isn't smart enough to move it into the right places. It is boxed, so does this mean one heap allocation/deallocation on every call to deserialize_char?

Ah yeah. I think this code was written prior to error-boxing, and I must have missed it in code review. This allocation should probably be avoided, as I'm pretty sure the compiler isn't smart enough to remove it. If you want to take this on in your PR, I'd put error creation into a closure and call it when an error occurs. If not, I'll address it in a follow-up PR.

> The big UTF8_CHAR_WIDTH table and the utf8_char_width function could be replaced with something like (leading_zeros)

I'm pretty sure that I benchmarked this earlier and found that the lookup table was faster. That might have been specific to the architecture that I was on though? It looks like most CPUs have a pretty fast LZCNT instruction.
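Apart from speed, the two approaches may not classify every byte the same way. The sketch below compares them over all 256 byte values. It assumes the table uses the same ranges as the UTF8_CHAR_WIDTH table in Rust's standard library (0 for bytes that can never start a valid sequence); bincode's actual table is not shown in this thread, so treat the result as illustrative. All function names here are hypothetical.

```rust
/// Builds a 256-entry width table.
/// ASSUMPTION: same ranges as the table in Rust's standard library.
fn build_width_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    for (byte, width) in table.iter_mut().enumerate() {
        *width = match byte {
            0x00..=0x7F => 1,
            0xC2..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF4 => 4,
            _ => 0, // continuation bytes and invalid leading bytes
        };
    }
    table
}

/// Width from the leading bits only; 0 signals an invalid leading byte.
fn width_by_leading_zeros(first: u8) -> u8 {
    match (!first).leading_zeros() {
        0 => 1,
        2 => 2,
        3 => 3,
        4 => 4,
        _ => 0,
    }
}

/// Lists every byte for which the two methods report different widths.
fn disagreements() -> Vec<u8> {
    let table = build_width_table();
    (0u8..=255)
        .filter(|&b| table[b as usize] != width_by_leading_zeros(b))
        .collect()
}
```

Under that assumption the methods differ only for 0xC0, 0xC1, and 0xF5 through 0xF7: the bit pattern looks like a valid leading byte, but no valid UTF-8 sequence starts with it. A later str::from_utf8 check would still reject such input, so the difference affects where the error is detected rather than whether it is.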

slyrz · 2017-03-08T20:38:25Z

> I'd put error creation into a closure, and call it when an error occurs. If not, I'll address it in a follow-up PR.

let error = || {
    ErrorKind::InvalidEncoding {
        desc: "Invalid char encoding",
        detail: None,
    }.into()
};
...
if width == 0 { return Err(error()); }

Like this?
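As a self-contained illustration of why the closure helps, the sketch below uses a stand-in error type instead of bincode's ErrorKind. `InvalidEncoding`, `check_width`, and the `Result` alias are hypothetical names for this example only. The closure body runs only when it is called, so the `Box` is allocated on the error path and not on every call.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in error type for this sketch (not bincode's ErrorKind).
#[derive(Debug)]
struct InvalidEncoding {
    desc: &'static str,
}

impl fmt::Display for InvalidEncoding {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.desc)
    }
}

impl Error for InvalidEncoding {}

type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Hypothetical check: rejects a width of 0, passes anything else through.
fn check_width(width: usize) -> Result<usize> {
    // Defining the closure allocates nothing; the Box is only created
    // if error() is actually invoked.
    let error = || -> Box<dyn Error> {
        Box::new(InvalidEncoding {
            desc: "Invalid char encoding",
        })
    };

    if width == 0 {
        return Err(error());
    }
    Ok(width)
}
```

Building the boxed error eagerly in a `let` binding before the checks would instead allocate and free it even when the input is valid.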

> I'm pretty sure that I benchmarked this earlier and found that the lookup table was faster. That might have been specific to the architecture that I was on though? It looks like most CPUs have a pretty fast LZCNT instruction.

I must admit I had no performance benefits in mind. I guess the bit twiddling + match might be slower than the lookup table.

TyOverby · 2017-03-08T20:49:54Z

> Like this?

Yep!

> I must admit I had no performance benefits in mind. I guess the bit twiddling + match might be slower than the lookup table.

Yeah. I'll file an issue to do more benchmarking at a later date.

TyOverby · 2017-03-09T19:02:22Z

👍

slyrz added 4 commits March 8, 2017 09:45

Remove unneccesary unwrap

9bb716f

Use a single buffer for reading a char

c2f5542

Replace while loop with read_exact

fe740c3

Remove first_byte variable

7f1e680

TyOverby mentioned this pull request Mar 8, 2017

Look into using LZCNT for char-size calculation #135

Closed

slyrz added 2 commits March 9, 2017 17:31

Use read_exact to avoid waiting for data after EOF

acdf8d5

Create error in a closure

0608e5d

TyOverby merged commit 7210865 into bincode-org:master Mar 9, 2017

slyrz mentioned this pull request Mar 9, 2017

Fuzzing #136

Closed

TyOverby merged 6 commits into bincode-org:master from slyrz:master
