-
Notifications
You must be signed in to change notification settings - Fork 102
perf: parse headers in blocks and scan for magic numbers with memchr #93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9437681 to
0c2def2
Compare
867915c to
00acf23
Compare
00acf23 to
f940c2b
Compare
|
Thanks for the update; but I see a number of my comments from earlier are still unresolved. Any clue how soon you'll be able to address them? |
7819dd7 to
f70aef4
Compare
|
I've since refactored this to cover your comments. The current implementation is not quite complete, but please let me know if this is easier to follow. |
2f74396 to
180bcdb
Compare
|
Please feel free to revert cc4fe7f if it helps, as long as if you delete changes then you add them back in a merge PR. |
cc4fe7f to
fa2c8e1
Compare
I just rebased against master and force pushed, which deleted your merge history. I believe the merge commit was just a bump against master, and not any additional work you added on top of my changes, so I'm not missing anything at fa2c8e1? Just making sure! |
9f579f4 to
93a7bf0
Compare
020a308 to
3e04004
Compare
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Fix errors Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Motivation
Parsing in Blocks
Aesthetics
We currently parse the zip binary format with a series of e.g.
.read_u32_le()calls. While this does work, I believe this approach obscures some of the control flow and makes it more difficult to correlate to the zip specification.Performance
Additionally, when reading a zip file from a non-buffered stream, I believe a series of individual reads may take longer than a single bulk read (I have not verified this).
Scanning with
memchrWe currently perform an extremely inefficient byte-by-byte loop to locate central directory records, e.g.:
zip2/src/spec.rs
Lines 66 to 80 in 103f1ec
We can lean on string scanning utilities like
memchrin order to speed up this loop that occurs whenever we read a zip file!Implementation
*Blockstructs declared with#[repr(packed)]so we can extract metadata using bulk reads.Blocktrait with several utility methods for the different types of "block" objects introduced in this PR.memchrto scan for central directory magic numbers.Result
We get a very noticeable speedup in the new benchmarks!
Note that the benchmarks have been renamed upon review, so the output looks more like this: