ARM-specific optimisations for inflate. #256
Change-Id: Id4cda552b39bfb39ab35ec499dbe122b43b6d1a1
In inflate_fast() the output pointer always has plenty of room to write. This means that, so long as the target is capable, wide unaligned loads and stores can be used to transfer several bytes at once. When the reference distance is too short, simply unroll the data a little to increase the distance.

Change-Id: I59854eb25d2b1e43561c8a2afaf9175bf10cf674
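The commit message describes the technique only in prose. What follows is a minimal portable C sketch of the idea, under stated assumptions: `CHUNK`, its 8-byte width and `chunk_copy_match()` are hypothetical names, plain `memcpy()` stands in for the ARM wide loads and stores, and the caller is assumed to guarantee `len + CHUNK - 1` writable bytes at `out`. It is not the code from this pull request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk width. A NEON build would more likely use 16 bytes;
   8 keeps the sketch portable. A constant-size memcpy() typically compiles
   to one unaligned load/store pair on targets that allow it. */
#define CHUNK 8

/* Copy an LZ77 match of `len` bytes starting `dist` bytes back from `out`.
   Assumes at least len + CHUNK - 1 bytes are writable at `out`, because the
   final chunk may overshoot the end of the match. Returns the new `out`. */
static unsigned char *chunk_copy_match(unsigned char *out, size_t dist,
                                       size_t len) {
    assert(dist > 0);

    /* Short distance: a CHUNK-wide load would read bytes not yet written.
       Copy one period of the pattern at a time; a sequence with period d
       also has period 2d, so each full period copied doubles `dist`. */
    while (dist < CHUNK && len > 0) {
        size_t n = len < dist ? len : dist;
        memcpy(out, out - dist, n);   /* source ends at or before out */
        out += n;
        len -= n;
        dist += n;
    }

    /* Now dist >= CHUNK, so every CHUNK-wide read covers only bytes that
       are already final; writes past the match end are overwritten later. */
    while (len > 0) {
        memcpy(out, out - dist, CHUNK);
        size_t n = len < CHUNK ? len : CHUNK;
        out += n;
        len -= n;
    }
    return out;
}
```

Copying one period at a time while the distance is short is one way to "unroll the data a little": a distance of 3 behaves like a distance of 6 after the first pass, then 12, at which point every 8-byte load reads only finished output.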
@ProgramMax, more of the PNG optimisation here, FYI. Corresponding Chromium patch (now with green bots!) is here.
Thank you for pinging me. :)
```c
  exploited.
 */
static inline unsigned char FAR *chunkcopy_safe(unsigned char FAR *out,
                                                const unsigned char FAR * Z_RESTRICT from,
```
I'm not sure if you can use Z_RESTRICT here. Maybe that's true if you came in via inflate.c, but maybe not if you came in via infback.c.
There's a longer discussion of that at https://chromium-review.googlesource.com/c/chromium/src/+/641575/4/third_party/zlib/contrib/arm/chunkcopy.h#230
My inclination is to give infback.c and inflate.c different implementations, but it could still be argued that the assumption is too dangerous for something, somewhere out there, written in the last twenty-something years.
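To make the aliasing concern concrete: `restrict` (which `Z_RESTRICT` presumably expands to, an assumption here) promises that memory reached through `from` is not also reached through `out`. A chunked copy touches more than `len` bytes, since reads and writes are rounded up to whole chunks. In infback.c, `inflateBack()` writes its output directly into the sliding window, so `out` and a window-sourced `from` can sit in the same buffer; in inflate.c the window is allocated separately from the caller's output buffer. The hypothetical helper below, a sketch rather than anything in zlib, checks whether the rounded-up ranges are disjoint and could serve as a debug assertion before a restrict-qualified copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to a multiple of chunk (chunk > 0). */
static size_t round_up(size_t n, size_t chunk) {
    return (n + chunk - 1) / chunk * chunk;
}

/* Hypothetical debug check: would a chunked copy of `len` bytes from `from`
   to `out` touch disjoint memory, as a restrict-qualified `from` promises?
   Both the reads and the writes are rounded up to whole chunks. Pointers are
   compared as uintptr_t, which is implementation-defined but behaves as
   expected on flat address spaces such as ARM and x86. */
static int chunked_copy_is_disjoint(const unsigned char *out,
                                    const unsigned char *from,
                                    size_t len, size_t chunk) {
    assert(chunk > 0);
    size_t span = round_up(len, chunk);
    uintptr_t o = (uintptr_t)out;
    uintptr_t f = (uintptr_t)from;
    return o + span <= f || f + span <= o;
}
```

In a single 64-byte buffer, copying 20 bytes from offset 20 to offset 0 is disjoint byte for byte, but once both ranges are rounded up to 16-byte chunks they overlap and the restrict promise no longer holds. That is the situation a shared window/output buffer makes possible.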
This is work from a past life now, and I'm not sure how I'm supposed to reconcile that with needing to fix it. So I won't.
I can fix it and add it to the Adler-32 + CRC32 merge request in #251.
This adds two optimizations for ARM:

- NEON optimized Adler(-)32 checksum algorithm (ARMv7 and newer NEON CPUs)
- ARM(v7+) specific optimization for inflate

I've also connected inflate optimization to the build using the following source as template: mirror/chromium@0397489#diff-a62ad2db6c83dbc205d34bb9a8884f16

Additional info:
https://codereview.chromium.org/2676493007/
https://codereview.chromium.org/2722063002/

Sources:
madler/zlib#251 (only the first commit)
madler/zlib#256

Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
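For reference, Adler-32 keeps two running sums modulo 65521: `a` is 1 plus the sum of the bytes, and `b` is the sum of the successive values of `a`. The scalar sketch below is not the NEON code from #251; it shows the computation any vectorized version must reproduce, including zlib's practice of deferring the modulo for up to 5552 bytes, the longest run for which the 32-bit sums cannot overflow. `adler32_ref()` and the macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_BASE 65521u  /* largest prime below 2^16 */
#define ADLER_NMAX 5552u   /* longest run before the 32-bit sums could overflow */

/* Scalar Adler-32 of buf[0..len), starting from the initial value 1.
   Reduction modulo ADLER_BASE is deferred for up to ADLER_NMAX bytes. */
static uint32_t adler32_ref(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    uint32_t a = 1, b = 0;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;   /* running byte sum */
            b += a;        /* sum of running sums */
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}
```

The inner loop contains only additions, which is what leaves room for a vectorized version to process many bytes per step before reducing.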
In `inflate_fast()` the output pointer always has plenty of room to write. This means that, so long as the target is capable, wide unaligned loads and stores can be used to transfer several bytes at once. When the reference distance is shorter than one wide load, the copy would read bytes it has not yet written, so simply unroll the data a little to increase the distance.

For PNG decode this comes out at about 33% faster overall across a wide set of files. Small PNGs tend to benefit the least because they don't ever enter `inflate_fast()`, where the most straightforward assumptions can be made.
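The small-PNG observation follows from the entry condition for the fast path. In stock zlib, `inflate()` calls `inflate_fast()` only when at least 6 input bytes are available and at least 258 bytes of output space remain (258 being the longest match deflate can emit), so short streams, or output supplied in small pieces, never reach it. The sketch below mirrors that test; the extra `chunk - 1` bytes of output slack for a chunked copy, and the function name, are assumptions of this sketch rather than details taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Thresholds from zlib's inflate(): inflate_fast() is only entered when at
   least 6 input bytes and 258 output bytes (the longest possible match)
   are available. */
#define FAST_MIN_INPUT  6u
#define FAST_MIN_OUTPUT 258u

/* Hypothetical gate for a chunked fast path: `chunk` is the copy width.
   A chunked copy may write up to chunk - 1 bytes past the end of a match,
   so this sketch asks for that much extra output room. */
static int fast_path_allowed(unsigned avail_in, unsigned avail_out,
                             unsigned chunk) {
    assert(chunk > 0);
    return avail_in >= FAST_MIN_INPUT &&
           avail_out >= FAST_MIN_OUTPUT + (chunk - 1);
}
```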