Skip to content

Port Intel optimizations (adler32, chunkcopy) to cloudflare#23

Merged
vkrasnov merged 2 commits intocloudflare:gcc.amd64from
janaknat:intel-optimizations
Sep 28, 2020
Merged

Port Intel optimizations (adler32, chunkcopy) to cloudflare#23
vkrasnov merged 2 commits intocloudflare:gcc.amd64from
janaknat:intel-optimizations

Conversation

@janaknat
Copy link
Copy Markdown

This series ports the x86 inflate performance improvements from the chromium fork of zlib.

Based on 3 patches:

With these changes, the inflate performance improvement is 15-35% when tested with a modified zpipe.c and the Silesia corpus.

Based on the adler32-simd patch from Noel Gordon for the chromium fork of zlib.
17bbb3d73c84 ("zlib adler_simd.c")

Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
Based on 2 patches from zlib chromium fork:

* Adenilson Cavalcanti (adenilson.cavalcanti@arm.com)
  3060dcb - "zlib: inflate using wider loads and stores"

* Noel Gordon (noel@chromium.org)
  64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy

The improvement in inflate performance is around 15-35%, based
on the workload, when checked with a modified zpipe.c and the
Silesia corpus.

Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>
@janaknat
Copy link
Copy Markdown
Author

Performance numbers using the Silesia Corpus and a modified zpipe.c

Dataset File Throughput before (kb/ms) Throughput after (kb/ms) % change
Silesia xml 65.21 84.29 +29%
x-ray 138.18 159.91 +15%
webster 75.03 95.93 +26%
sao 155.98 189.81 +21%
samba 88.70 111.74 +26%
reymont 73.50 98.45 +34%
osdb 99.36 120.32 +21%
ooffice 98.12 118.96 +20%
nci 55.70 69.41 +25%
mr 92.76 115.98 +25%
mozilla 101.37 121.30 +19%
dickens 81.98 109.82 +34%

@janaknat
Copy link
Copy Markdown
Author

@vkrasnov Any feedback on this series?

@vkrasnov
Copy link
Copy Markdown

This is perfect!

@vkrasnov vkrasnov merged commit 82035d0 into cloudflare:gcc.amd64 Sep 28, 2020
@janaknat
Copy link
Copy Markdown
Author

@vkrasnov Thanks for merging the PR!

@RJVB
Copy link
Copy Markdown

RJVB commented Sep 29, 2020 via email

@janaknat
Copy link
Copy Markdown
Author

@RJVB Can you provide a link to the source tarball? How did you measure the time taken?

Also, the changes made in this PR are for de-compression(inflate). No changes were made to the compression code path.

@vkrasnov
Copy link
Copy Markdown

This is very weird, but he is right, in fact inflate is slower too, because somehow the -O3 flags goes missing after running configure.

@vkrasnov
Copy link
Copy Markdown

Fixed now in #24

@vkrasnov
Copy link
Copy Markdown

Thanks @RJVB !

@janaknat
Copy link
Copy Markdown
Author

@vkrasnov @RJVB Apologies for the typo.

@RJVB
Copy link
Copy Markdown

RJVB commented Sep 29, 2020 via email

@janaknat
Copy link
Copy Markdown
Author

janaknat commented Oct 2, 2020

@RJVB Do you see the same compression slow-down without the changes in this PR?

@neurolabusc
Copy link
Copy Markdown

neurolabusc commented Oct 2, 2020

@janaknat I do not see any benefits for compressing Silesia files to/from gzip format. I compared the latest version of this repository (zlibCFX) to a fork from June (zlibCF) on my MacBook with in-built SSD. Same performance regardless of whether I use make or cmake to compile the application. I suspect this is because Gzip uses crc32 not Adler-32.

On the other hand, decompression is faster, but only appreciably if compiled with ./configure --static; make which reports the DINFLATE_CHUNK_SIMD_SSE2 option. @janaknat, since you are familiar with these new features, I would be grateful if you would consider updating cmake to support them.

CompressMethod Level Min Mean Max mb/s %
zlibCF 1 2350 2405 2483 90 35.80
zlibCF 2 2472 2482 2498 86 35.00
zlibCF 3 2682 2736 2802 79 34.37
zlibCF 4 2907 3072 3248 73 33.46
zlibCF 5 3585 4071 4497 59 32.61
zlibCF 6 4460 4860 5542 48 32.35
zlibCF 7 5008 5459 5806 42 32.25
zlibCF 8 7400 7551 7856 29 32.17
zlibCF 9 10099 10678 11507 21 32.15
zlibCFX 1 2524 2584 2678 84 35.80
zlibCFX 2 2542 2624 2761 83 35.00
zlibCFX 3 2772 2899 3000 76 34.37
zlibCFX 4 2891 3141 3443 73 33.46
zlibCFX 5 3688 3758 3928 57 32.61
zlibCFX 6 4475 4678 5103 47 32.35
zlibCFX 7 5193 5403 5980 41 32.25
zlibCFX 8 7498 7671 7996 28 32.17
zlibCFX 9 10093 10274 10558 21 32.15
DecompressMethod Min Mean Max mb/s
zlibCF 42683 43312 43883 134.07
zlibCFX(cmake) 41457 42038 42833 138.03
zlibCFX(make) 36631 36814 37127 156.22

@RJVB
Copy link
Copy Markdown

RJVB commented Oct 2, 2020 via email

@janaknat
Copy link
Copy Markdown
Author

janaknat commented Oct 5, 2020

@neurolabusc These are de-compression optimization changes, so not surprising there are any changes in compression performance.

I can take a look at the work that needs to be done for cmake.

fhanau pushed a commit to fhanau/zlib that referenced this pull request Feb 27, 2023
…flare#23)

It's unnecessary to await a promise returned from an async function
(except if, as not needed here, the exception is to be caught in this
function instead of by a caller).

It's slightly verbose and advised against:
https://github.com/eslint/eslint/blob/master/docs/rules/no-return-await.md
https://jakearchibald.com/2017/await-vs-return-vs-return-await/

Since this is the only use of await, this function is not actually
async. Finally, it should be named with the same casing convention as
the previous example.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants