ARM-specific optimisations for inflate. #256

Closed
ghost wants to merge 2 commits into master from unknown repository

ghost commented Apr 27, 2017

In inflate_fast() the output pointer always has plenty of room to write. This means that, so long as the target is capable, wide unaligned loads and stores can be used to transfer several bytes at once. When the reference distance is too short, simply unroll the data a little to increase the distance.
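As a rough illustration of the idea (not the code from this patch), the copy could be sketched as below. The names `CHUNK`, `chunk_copy` and `lz_copy` are hypothetical, the chunk width of 16 bytes is an assumption, and the sketch relies on the caller guaranteeing at least `CHUNK - 1` bytes of writable slack past the end of each copy. Fixed-size `memcpy` calls stand in for the unaligned wide loads and stores, which compilers commonly lower to single instructions on capable targets.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 16 /* hypothetical chunk width, chosen for illustration */

/* Copy len bytes (len >= 1) in CHUNK-sized pieces.
   Requires out - from >= CHUNK, and may write up to CHUNK - 1 bytes
   past out + len, so the caller must provide that much slack. */
static unsigned char *chunk_copy(unsigned char *out,
                                 const unsigned char *from, size_t len)
{
    unsigned char *end = out + len;
    do {
        unsigned char tmp[CHUNK];
        memcpy(tmp, from, CHUNK); /* stands in for one wide unaligned load */
        memcpy(out, tmp, CHUNK);  /* stands in for one wide unaligned store */
        out += CHUNK;
        from += CHUNK;
    } while (out < end);
    return end;
}

/* LZ77-style back-reference copy: append len bytes taken from dist
   bytes behind out. When dist < CHUNK the source is a repeating
   pattern, so it is first "unrolled" until the gap is wide enough. */
static unsigned char *lz_copy(unsigned char *out, size_t dist, size_t len)
{
    const unsigned char *from = out - dist;
    unsigned char *end = out + len;
    size_t gap = dist; /* invariant: out - from == gap */

    while (gap < CHUNK && out < end) {
        memcpy(out, from, gap); /* adjacent ranges, no overlap */
        out += gap;
        gap *= 2;               /* the pattern period still divides gap */
    }
    if (out < end)
        chunk_copy(out, from, (size_t)(end - out));
    return end;
}
```

With `"abc"` at the start of a sufficiently large buffer, calling `lz_copy(buf + 3, 3, 10)` leaves `"abcabcabcabca"` in the first 13 bytes; anything written beyond that lies in the slack region and would be overwritten by later output.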

For PNG decode this comes out at about 33% faster overall across a wide set of files. Small PNGs tend to benefit the least because they never enter inflate_fast(), where the most straightforward assumptions can be made.

Simon Hosie added 2 commits April 26, 2017 17:19
Change-Id: Id4cda552b39bfb39ab35ec499dbe122b43b6d1a1
In inflate_fast() the output pointer always has plenty of room to write. This
means that so long as the target is capable, wide un-aligned loads and stores
can be used to transfer several bytes at once. When the reference distance is
too short simply unroll the data a little to increase the distance.

Change-Id: I59854eb25d2b1e43561c8a2afaf9175bf10cf674

ghost commented May 4, 2017

@ProgramMax, more of the PNG optimisation here, FYI. Corresponding Chromium patch (now with green bots!) is here.

@ProgramMax

Thank you for pinging me. :)

exploited.
*/
static inline unsigned char FAR *chunkcopy_safe(unsigned char FAR *out,
const unsigned char FAR * Z_RESTRICT from,

I'm not sure if you can use Z_RESTRICT here. Maybe that's true if you came in via inflate.c, but maybe not if you came in via infback.c.

There's a longer discussion of that at https://chromium-review.googlesource.com/c/chromium/src/+/641575/4/third_party/zlib/contrib/arm/chunkcopy.h#230
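To make the aliasing concern concrete, here is a small sketch (an illustration only, not code from the patch or from zlib; `copy_forward` and `copy_snapshot` are hypothetical names). In an LZ77 copy the source and destination may overlap, and the expected result depends on each stored byte being visible to later loads. A `restrict`-qualified pointer promises the compiler that no such overlap exists, so the forward-replicating behaviour would no longer be guaranteed. The `memmove` variant shows one different result that a copy treating the source as a snapshot produces.

```c
#include <stddef.h>
#include <string.h>

/* Forward byte-by-byte copy with no restrict qualifier: every store is
   visible to later loads, so an overlapping source repeats the pattern. */
static void copy_forward(unsigned char *out, const unsigned char *from,
                         size_t len)
{
    while (len--)
        *out++ = *from++;
}

/* Snapshot-style copy: behaves as if the source were read in full
   before anything is written, so the pattern is not repeated. */
static void copy_snapshot(unsigned char *out, const unsigned char *from,
                          size_t len)
{
    memmove(out, from, len);
}
```

Starting from a zeroed buffer holding `"ab"`, `copy_forward(buf + 2, buf, 6)` produces `"abababab"`, while `copy_snapshot(buf + 2, buf, 6)` produces `"abab"` followed by four zero bytes. Only the first matches what a decompressor expects from a length-6, distance-2 reference.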

Author

My inclination is to give infback.c and inflate.c different implementations; but it could still be argued that the assumption is too dangerous for something, somewhere out there, written in the last twenty-something years.


ghost commented Sep 28, 2017

This is past-life work, now, and I'm not sure how I'm supposed to reconcile that now that I need to fix it. So I won't.

ghost closed this Sep 28, 2017
@Adenilson

I can fix it and add it to the Adler-32 + CRC32 merge request in: #251

jow- pushed a commit to lede-project/source that referenced this pull request Jan 2, 2018
This adds two optimizations for ARM:
- NEON optimized Adler(-)32 checksum algorithm (ARMv7 and newer NEON CPUs)
- ARM(v7+) specific optimization for inflate
I've also connected inflate optimization to the build using the following
source as template.
mirror/chromium@0397489#diff-a62ad2db6c83dbc205d34bb9a8884f16

Additional info:
https://codereview.chromium.org/2676493007/
https://codereview.chromium.org/2722063002/

Sources:
madler/zlib#251 (only the first commit)
madler/zlib#256

Signed-off-by: Daniel Engberg <daniel.engberg.lists@pyret.net>
SpiralP pushed a commit to SpiralP/lede-source that referenced this pull request Jan 2, 2018
jollaman999 pushed a commit to jollaman999/openwrt that referenced this pull request Jan 13, 2018
This pull request was closed.