Skip to content
This repository was archived by the owner on Aug 15, 2025. It is now read-only.

Revamp deserialize_char#133

Merged
TyOverby merged 6 commits intobincode-org:masterfrom
slyrz:master
Mar 9, 2017
Merged

Revamp deserialize_char#133
TyOverby merged 6 commits intobincode-org:masterfrom
slyrz:master

Conversation

@slyrz
Copy link
Copy Markdown
Contributor

@slyrz slyrz commented Mar 8, 2017

This PR does the following things in deserialize_char:

  1. Removes a call to unwrap() that was probably impossible to panic
    since str::from_utf8 checks the encoding, but anyway.
  2. Uses a single buffer to read the char into.
  3. Replaces the while loop with read_exact because the while loop may not
    return on malformed input:
let err = deserialize::<char>(&[0b_1110_0000]).is_err();
println!("{:?}", err);

Things I didn't change:

  1. The error gets always created if the compiler isn't smart enough
    to move it into the right places. It is boxed, so this means one heap allocation/deallocation
    on every call to deserialize_char?
  2. The big UTF8_CHAR_WIDTH table and the utf8_char_width function could be replaced with something like
match (!buf[0]).leading_zeros() {
    0 => 1,
    2 => 2,
    3 => 3,
    4 => 4,
    _ => error
}

@TyOverby
Copy link
Copy Markdown
Collaborator

TyOverby commented Mar 8, 2017

Great cleanup, thanks!

The error gets always created if the compiler isn't smart enough
to move it into the right places. It is boxed, so this means one heap allocation/deallocation
on every call to deserialize_char?

Ah yeah. I think this code was written prior to error-boxing, and I must have missed it in code review. This allocation should probably be avoided as I'm pretty sure the compiler isn't smart enough to remove. If you want to take this on in your PR, I'd put error creation into a closure, and call it when an error occurs. If not, I'll address it in a follow-up PR.

The big UTF8_CHAR_WIDTH table and the utf8_char_width function could be replaced with something like (leading_zeros)

I'm pretty sure that I benchmarked this earlier and found that the lookup table was faster. That might have been specific to the architecture that I was on though? It looks like most CPUs have a pretty fast LZCNT instruction.

@slyrz
Copy link
Copy Markdown
Contributor Author

slyrz commented Mar 8, 2017

I'd put error creation into a closure, and call it when an error occurs. If not, I'll address it in a follow-up PR.

let error = || { 
    ErrorKind::InvalidEncoding{
        desc: "Invalid char encoding",
        detail: None
    }.into()
};
...
if width == 0 { return Err(error())}

Like this?

I'm pretty sure that I benchmarked this earlier and found that the lookup table was faster. That might have been specific to the architecture that I was on though? It looks like most CPUs have a pretty fast LZCNT instruction.

I must admit I had no performance benefits in mind. I guess the bit twiddling + match might be slower than the lookup table.

@TyOverby
Copy link
Copy Markdown
Collaborator

TyOverby commented Mar 8, 2017

Like this?

Yep!

I must admit I had no performance benefits in mind. I guess the bit twiddling + match might be slower than the lookup table.

Yeah. I'll file an issue to do more benchmarking at a later date.

@TyOverby
Copy link
Copy Markdown
Collaborator

TyOverby commented Mar 9, 2017

👍

@TyOverby TyOverby merged commit 7210865 into bincode-org:master Mar 9, 2017
@slyrz slyrz mentioned this pull request Mar 9, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants