Add optimized slide_hash for Power processors #457
Open
mscastanho wants to merge 2 commits into madler:develop from
This was referenced Dec 10, 2019
Force-pushed from 5a93654 to d62e658
Author
Force push to add changes to feature detection on |
Optimized functions for Power will use GNU indirect functions (ifunc), an extension that supports multiple implementations of the same function, with one selected at runtime. This will be used to provide optimized functions for different processor versions. Since this is a GNU extension, we placed the definition of the Z_IFUNC macro under `contrib/gcc`. It can be reused by other architectures as well.

Author: Matheus Castanho <msc@linux.ibm.com>
Author: Rogerio Alves <rcardoso@linux.ibm.com>
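For readers unfamiliar with ifunc, here is a minimal sketch of the mechanism, not the code from this PR. It assumes GCC (or a compatible compiler) on an ELF/glibc target. The names `sum_bytes`, `sum_bytes_default`, `sum_bytes_optimized`, and `sum_bytes_resolver` are hypothetical, and the "optimized" variant is only a stand-in for a tuned implementation. The actual Z_IFUNC macro is not reproduced here.

```c
#include <stddef.h>

/* Portable fallback implementation. */
static unsigned long sum_bytes_default(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Hypothetical stand-in for a tuned variant; it must return the same results. */
static unsigned long sum_bytes_optimized(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;
    size_t i = 0;
    for (; i + 4 <= len; i += 4)
        total += (unsigned long)buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
    for (; i < len; i++)
        total += buf[i];
    return total;
}

typedef unsigned long (*sum_bytes_fn)(const unsigned char *, size_t);

/* Resolver: runs once when the symbol is resolved (at load time or on the
 * first call, depending on the binding mode) and returns the implementation
 * that every call to sum_bytes() will use afterwards. */
static sum_bytes_fn sum_bytes_resolver(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    /* GCC built-in on PowerPC; assumed here as the feature check. */
    if (__builtin_cpu_supports("vsx"))
        return sum_bytes_optimized;
#endif
    return sum_bytes_default;
}

/* Public symbol: an indirect function bound to whatever the resolver returns. */
unsigned long sum_bytes(const unsigned char *buf, size_t len)
    __attribute__((ifunc("sum_bytes_resolver")));
```

Callers simply call `sum_bytes(data, n)`. The selection is invisible to them, which is why the top-level files need few changes.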
Force-pushed from 751c961 to 862cc11
Considerable time is spent in deflate.c:slide_hash() during deflate. This commit introduces a new slide_hash function that uses VSX vector instructions to slide 8 hash elements at a time, instead of one at a time as the standard code does. The choice between the optimized and default versions is made only on the first call to the function, which allows a fallback to the standard behavior if the host processor does not support VSX instructions. The same binary can therefore be used on multiple Power processor versions.

Author: Matheus Castanho <msc@linux.ibm.com>
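The following sketch illustrates the "8 elements at a time" idea in portable C. It is not the VSX code from this PR. It assumes the usual zlib slide rule, where each 16-bit table entry is reduced by the window size and entries that would go negative become 0. The names `slide_table_scalar` and `slide_table_by8` are hypothetical. The inner 8-lane loop is where a vector implementation would presumably use one saturating 16-bit subtract per block of eight entries.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Pos; /* hash table entries are 16-bit positions */

/* One entry per iteration: entries below wsize fall out of the window
 * and become 0; all others move down by wsize. */
static void slide_table_scalar(Pos *table, size_t n, uint16_t wsize)
{
    for (size_t i = 0; i < n; i++)
        table[i] = (Pos)(table[i] >= wsize ? table[i] - wsize : 0);
}

/* Eight entries per outer iteration. The fixed-size inner loop is a
 * saturating subtract over 8 lanes, written in plain C for illustration. */
static void slide_table_by8(Pos *table, size_t n, uint16_t wsize)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        for (int lane = 0; lane < 8; lane++) {
            Pos m = table[i + lane];
            table[i + lane] = (Pos)(m >= wsize ? m - wsize : 0);
        }
    }
    /* Tail: handle any leftover entries one at a time. */
    for (; i < n; i++)
        table[i] = (Pos)(table[i] >= wsize ? table[i] - wsize : 0);
}
```

Both functions produce identical tables. The blocked form only changes how many entries are handled per loop iteration.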
Force-pushed from 862cc11 to 505b2fd
Closed
A long time ago, I did this ticket:
Hi,
During performance tests, we noticed that slide_hash consumes considerable CPU time during compression on Power processors. This PR introduces an optimized version that uses VSX vector instructions to make it faster. The main difference is that it slides 8 elements at a time, instead of one at a time as the standard code does.
The implementation uses the GNU indirect function (ifunc) feature to choose the correct function version on the first call at runtime. Later calls all go directly to the selected function. This way, the same binary can be used for all Power processor versions. The ifunc helper code, however, is not limited to Power and can be reused by other architectures if wanted, so it was placed under `contrib/gcc`.
I tried to make as few changes as possible to top-level files (`deflate.c`) and instead placed most Power-specific code under `contrib/power`.
To measure the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.
The results below show compression throughput in MB/s using RAW deflate, for all compression levels:
[Result tables for the input files jpeg, pngpixels, executable, and html]