Port ARM inflate performance improvement patches (chunk SIMD, read64le) by janaknat · Pull Request #22 · cloudflare/zlib

janaknat · 2020-08-26T02:16:59Z

This series ports ARM inflate performance improvement patches. With this, the performance improvement during inflate, tested using a modified zpipe.c and the Silesia corpus, is around 17-34%.

Patches 1-3 include a bug-fix and some code improvements taken from madler/zlib.

Patch 4 is a code readability port from zlib chromium.

* Chris Blume (cblume@chromium):
  8888511 - "Zlib: Use defines for inffast"
  b9c1566 - "Share inffast names in zlib"

Patch 5 introduces 3 new files: inffast_chunk.c, inffast_chunk.h and chunkcopy.h.
These incorporate the changes from 2 patches in zlib chromium.

* Adenilson Cavalcanti (adenilson.cavalcanti@arm.com)
  3060dcb - "zlib: inflate using wider loads and stores"

* Noel Gordon (noel@chromium.org)
  64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"

Patch 6 is a port of a performance improvement patch from zlib chromium.

* Noel Gordon (noel@chromium.org)
  8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t"

the zlib header. The allowed values of the four-bit field are 0..7, but when windowBits is zero, values greater than 7 are permitted and acted upon, resulting in large, mostly unused memory allocations. This fix rejects such invalid zlib headers.

The undocumented (except in these commit comments) function inflateValidate(strm, check) can be called after an inflateInit(), inflateInit2(), or inflateReset2() with check equal to zero to turn off the check value (CRC-32 or Adler-32) computation and comparison. Calling with check not equal to zero turns checking back on. This should only be called immediately after the init or reset function. inflateReset() does not change the state, so a previous inflateValidate() setting will remain in effect. This also turns off validation of the gzip header CRC when present. This should only be used when a zlib or gzip stream has already been checked, and repeated decompressions of the same stream no longer need to be validated.

expected type of state, deflate or inflate, and that at least the first several bytes of the internal state have not been clobbered.

This combines two patches which help in improving the readability and maintainability of the code by making magic numbers into #defines. Based on Chris Blume's (cblume@chromium) patches for zlib chromium: 8888511 - "Zlib: Use defines for inffast" b9c1566 - "Share inffast names in zlib" These patches are needed when introducing chunk SIMD NEON enchancements. Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>

Based on 2 patches from zlib chromium fork: * Adenilson Cavalcanti (adenilson.cavalcanti@arm.com) 3060dcb - "zlib: inflate using wider loads and stores" * Noel Gordon (noel@chromium.org) 64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy The two patches combined provide around 5-25% increase in inflate performance, based on the workload, when checked with a modified zpipe.c and the Silesia corpus. Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>

Update the chunk-copy code with a wide input data reader, which consumes input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the wide reader. Based on Noel Gordon's (noel@chromium.org) patch for the zlib chromium fork 8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t" This patch provides 7-10% inflate performance improvement when tested with a modified zpipe.c and the Silesia corpus. Signed-off-by: Janakarajan Natarajan <janakan@amazon.com>

janaknat · 2020-08-26T02:26:57Z

Some inflate performance number breakdown (average of 5 runs):

Dataset	File	Throughput before (kb/ms)	Throughput after (kb/ms)	% change
Silesia	xml	70.86	94.81	+34%
	x-ray	138.68	155.01	+12%
	webster	83.47	108.71	+30%
	sao	154.70	180.67	+17%
	samba	94.49	122.20	+30%
	reymont	82.65	108.34	+31%
	osdb	108.96	128.85	+18%
	ooffice	107.34	126.89	+17%
	nci	61.99	82.12	+34%
	mr	94.72	122.25	+29%
	mozilla	107.81	131.39	+22%
	dickens	88.84	118.96	+34%

vkrasnov · 2020-09-04T18:02:59Z

Nice, what CPU was used for the benchmarks? It looks like Intel should be faster as well?

janaknat · 2020-09-08T17:53:28Z

@vkrasnov This was tested with a Graviton2 CPU. With this patch series only ARM enhancements are ported.

janaknat · 2020-09-14T14:20:29Z

Any feedback on this patch series?

vkrasnov · 2020-09-17T20:51:33Z

I am sorry, I was a bit busy, will review tomorrow.

vkrasnov · 2020-09-23T17:32:34Z

This is great. I actually managed to apply some of the optimizations to Intel too for a 16% inflate speedup. I will apply those too.

janaknat · 2020-09-23T18:07:29Z

@vkrasnov Thanks. I had the Intel optimizations lined up next. Was waiting for this PR to land. If you've got that covered, that's great.

vkrasnov · 2020-09-23T18:09:04Z

I am happy to accept another PR, I just have a PoC

janaknat · 2020-09-24T18:48:45Z

@vkrasnov I've created a PR with the Intel optimizations.

madler and others added 6 commits August 25, 2020 19:26

This verifies that the state has been initialized, that it is the

da63292

expected type of state, deflate or inflate, and that at least the first several bytes of the internal state have not been clobbered.

vkrasnov self-requested a review September 23, 2020 17:06

vkrasnov merged commit e76d32d into cloudflare:gcc.amd64 Sep 23, 2020

simonis mentioned this pull request Nov 7, 2022

Export 'inflateValidate' as ZLIB_1.2.9 symbol #33

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Port ARM inflate performance improvement patches (chunk SIMD, read64le)#22

Port ARM inflate performance improvement patches (chunk SIMD, read64le)#22
vkrasnov merged 6 commits intocloudflare:gcc.amd64from
janaknat:chunk-simd-neon

janaknat commented Aug 26, 2020

Uh oh!

janaknat commented Aug 26, 2020

Uh oh!

vkrasnov commented Sep 4, 2020

Uh oh!

janaknat commented Sep 8, 2020

Uh oh!

janaknat commented Sep 14, 2020

Uh oh!

vkrasnov commented Sep 17, 2020

Uh oh!

vkrasnov commented Sep 23, 2020

Uh oh!

janaknat commented Sep 23, 2020

Uh oh!

vkrasnov commented Sep 23, 2020

Uh oh!

janaknat commented Sep 24, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

janaknat commented Aug 26, 2020

Uh oh!

janaknat commented Aug 26, 2020

Uh oh!

vkrasnov commented Sep 4, 2020

Uh oh!

janaknat commented Sep 8, 2020

Uh oh!

janaknat commented Sep 14, 2020

Uh oh!

vkrasnov commented Sep 17, 2020

Uh oh!

vkrasnov commented Sep 23, 2020

Uh oh!

janaknat commented Sep 23, 2020

Uh oh!

vkrasnov commented Sep 23, 2020

Uh oh!

janaknat commented Sep 24, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants