Add optimized longest_match for Power processors #459
Open
mscastanho wants to merge 2 commits into madler:develop from
Force-pushed from 57b7495 to 290ce53
Author
Force push to add changes to feature detection on
Optimized functions for Power will make use of GNU indirect functions, an extension to support different implementations of the same function, which can be selected during runtime. This will be used to provide optimized functions for different processor versions.

Since this is a GNU extension, we placed the definition of the Z_IFUNC macro under `contrib/gcc`. This can be reused by other archs as well.

Author: Matheus Castanho <msc@linux.ibm.com>
Author: Rogerio Alves <rcardoso@linux.ibm.com>
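To illustrate the idea of selecting an implementation at run time, here is a minimal portable sketch. It is not the `Z_IFUNC` macro from this PR and does not use the GNU `ifunc` attribute; it approximates the same behaviour with a plain function pointer that is resolved on the first call. All names (`match_length`, `match_default`, `match_optimized`, `cpu_has_fast_path`) and the simplified signature are hypothetical.

```c
#include <stddef.h>

/* Hypothetical signature: zlib's real longest_match works on the deflate state. */
typedef size_t (*match_fn)(const unsigned char *a, const unsigned char *b, size_t max);

/* Baseline implementation: compare one byte at a time. */
static size_t match_default(const unsigned char *a, const unsigned char *b, size_t max)
{
    size_t n = 0;
    while (n < max && a[n] == b[n])
        n++;
    return n;
}

/* Stand-in for a CPU-specific version; in this sketch it reuses the baseline. */
static size_t match_optimized(const unsigned char *a, const unsigned char *b, size_t max)
{
    return match_default(a, b, max);
}

/* Hypothetical feature probe; a real one would query the hardware capabilities. */
static int cpu_has_fast_path(void)
{
    return 0;
}

static size_t match_resolve(const unsigned char *a, const unsigned char *b, size_t max);

/* Starts out pointing at the resolver, which replaces it on the first call. */
static match_fn match_impl = match_resolve;

static size_t match_resolve(const unsigned char *a, const unsigned char *b, size_t max)
{
    match_impl = cpu_has_fast_path() ? match_optimized : match_default;
    return match_impl(a, b, max);
}

/* Public entry point: callers never see which version was selected. */
size_t match_length(const unsigned char *a, const unsigned char *b, size_t max)
{
    return match_impl(a, b, max);
}
```

With GNU indirect functions the selection is performed by the dynamic loader rather than by hand-written pointer swapping, so this sketch only conveys the general shape of the mechanism. It also makes no attempt at thread safety.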
Force-pushed from 290ce53 to 5490ed4
nmoinvaz reviewed Apr 19, 2022
     * bytes where LSB == 0 is the same as counting the length of the match.
     */
    #ifdef __LITTLE_ENDIAN__
        asm volatile("vctzlsbb %0, %1\n\t" : "=r" (len) : "v" (vc));
Contributor
The assembly in both versions is identical. Is this intended?
Contributor
Actually I was wrong. One letter off.
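The comment in the reviewed hunk says that counting the bytes whose least-significant bit is 0 gives the length of the match. A scalar sketch of that idea follows. It assumes the comparison vector holds 0xFF for bytes that differ and 0x00 for bytes that are equal, which the excerpt does not show, and it walks the mask in memory order instead of using the `vctzlsbb` instruction. The function names are hypothetical.

```c
#include <stddef.h>

/* Build a 16-byte mask: 0xFF where the inputs differ, 0x00 where they match.
 * (Assumed layout; the PR excerpt does not show how `vc` is produced.) */
static void mismatch_mask16(const unsigned char *a, const unsigned char *b,
                            unsigned char mask[16])
{
    for (size_t i = 0; i < 16; i++)
        mask[i] = (a[i] != b[i]) ? 0xFF : 0x00;
}

/* Count the run of mask bytes, in memory order, whose least-significant bit
 * is 0. That run length is the number of bytes that matched before the first
 * difference, or 16 if the whole block matched. */
static unsigned count_matching_prefix16(const unsigned char mask[16])
{
    unsigned n = 0;
    while (n < 16 && (mask[n] & 1u) == 0)
        n++;
    return n;
}
```

For two 16-byte blocks that first differ at offset 5, `count_matching_prefix16` returns 5. The vector instruction produces the same kind of count in a single step, and the one-letter difference between the two versions noted above corresponds to counting from the other end of the register on the other byte order.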
This commit introduces an optimized version of the longest_match function for Power processors. It uses VSX instructions to match 16 bytes at a time on each comparison, instead of one by one.

Author: Matheus Castanho <msc@linux.ibm.com>
Force-pushed from 5490ed4 to 44d19e3
Closed
A long time ago, I filed this ticket:
Hello again,
This optimization uses VSX vector (SIMD) instructions to match multiple bytes at once while searching for the longest match. A 16-byte vector load and comparison adds only a small overhead compared with the scalar equivalents, so the optimized longest_match tries to match as many bytes as possible on every comparison.
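As a rough, portable illustration of matching in 16-byte steps, the sketch below uses `memcmp` on 16-byte blocks in place of VSX loads and compares. It is not the code from this PR; `match_length_chunked` and `CHUNK` are hypothetical names, and the real longest_match operates on zlib's sliding window rather than on two plain buffers.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 16  /* bytes compared per step, mirroring one vector register */

/* Return how many leading bytes of a and b are equal, up to max_len. */
size_t match_length_chunked(const unsigned char *a, const unsigned char *b,
                            size_t max_len)
{
    size_t len = 0;

    /* Skip whole 16-byte blocks while they are identical. */
    while (max_len - len >= CHUNK && memcmp(a + len, b + len, CHUNK) == 0)
        len += CHUNK;

    /* Finish byte by byte inside the first block that differs, or in the tail. */
    while (len < max_len && a[len] == b[len])
        len++;

    return len;
}
```

In a vectorized version the second loop is where a count instruction on the comparison result would replace the byte-by-byte scan.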
This PR shares 1 commit with #457 and #458; that commit can be dropped from this PR if either of them gets merged first. It also uses GNU indirect functions to choose, on the first call to longest_match at run time, which version of the function (optimized or default) to run.
To test the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.
The results below show compression throughput in MB/s using RAW deflate, for all compression levels:
[Result tables for the pngpixels, jpeg, executable and html inputs are not reproduced here.]