Add new crc32 tests and optimized crc32 for POWER#637
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #637 +/- ##
===========================================
+ Coverage 77.72% 78.32% +0.59%
===========================================
Files 82 84 +2
Lines 8463 8723 +260
Branches 1381 1392 +11
===========================================
+ Hits 6578 6832 +254
- Misses 1354 1356 +2
- Partials 531 535 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
test/crc32_test.c
Outdated
|
|
||
| typedef struct { | ||
| int line; | ||
| uLong crc; |
There was a problem hiding this comment.
Use uint32_t instead of uLong.
| char* buf; | ||
| int len; | ||
| uLong expect; | ||
| } crc32_test; |
There was a problem hiding this comment.
Use standard fixed int types in this struct.
There was a problem hiding this comment.
I actually ifdef'd this struct's definition to match the different crc32 declarations for the compat and zlib-ng APIs. When building the default API, it will use fixed int types.
This way we was use the proper argument types and avoid potential problems with "hidden" type casting.
functable.c
Outdated
|
|
||
| #ifdef __ARM_FEATURE_CRC32 | ||
| extern uint32_t crc32_acle(uint32_t, const unsigned char *, uint64_t); | ||
| #elif defined(POWER8) && (defined(__powerpc64__) || defined(__PPC64__)) |
There was a problem hiding this comment.
Use #elif defined(POWER8_CRC32) instead.
| "crc32_z takes size_t but internally we have a uint64_t len"); | ||
| /* return a function pointer for optimized arches here after a capability test */ | ||
|
|
||
| functable.crc32 = &crc32_generic; |
|
This code is GPL or Apache? Can you get it relicensed under zlib license? |
arch/power/crc32_power8.c
Outdated
| * This program is free software; you can redistribute it and/or | ||
| * modify it under the terms of either: | ||
| * | ||
| * a) the GNU General Public License as published by the Free Software |
|
@nmoinvaz today the code is dual licensed (GPL / Apache). I'm checking to see what we can do in that regard. |
|
All code should be licensed using zlib license, this avoids any viral effects of licenses like GPL. |
|
Incorporated suggestions from @nmoinvaz and rebased against latest |
|
@mscastanho were you able to resolve the licensing issues? |
|
@nmoinvaz No updates on this so far =/ I believe this won't make to the first release. |
|
This needs a rebase. |
|
@Dead2 Agreed. I'll split the tests into a separate PR and submit it. |
I believe everything is now ready for review. |
test/crc32_test.c
Outdated
| typedef unsigned long crc32_type; | ||
| typedef unsigned char buf_type; | ||
| typedef unsigned int len_type; | ||
| # define CRC_FMT "%08lX" |
There was a problem hiding this comment.
All optimized crc32 functions should use only uint32_t as CRC32 variable type, the conversion for ZLIB_COMPAT API is done in crc32_z() and crc32() inside crc32.c. All length parameters should always be size_t. Conversion from unsigned int is also done in crc32.c.
There was a problem hiding this comment.
get_crc_table may have produced unsigned long return type in zlib.
There was a problem hiding this comment.
If we change the test to just call crc32_z as @mtl1979 suggested, these typedefs become irrelevant, as most types are the same between compat and zlib-ng. But the crc still has different types on each.
There was a problem hiding this comment.
If the test is statically linked to zlib-ng, it can access functable directly, and such there will be no differences... For dynamic linking, simple cast when calling crc32_z is enough. I pointed out to how it is done in crc32.c, because essentially what is needed is same backwards. It would be a lot easier, if there would be "undocumented API" to access functable when building against shared version of zlib-ng.
arch/power/clang_workaround.h
Outdated
| @@ -0,0 +1,98 @@ | |||
| /* Helper functions to work around issues with clang builtins | |||
There was a problem hiding this comment.
There is a fallback_builtins.h in the main directory. I think it should be moved to arch/x86/fallback_builtins.h. And then this file should be renamed arch/power/fallback_builtins.h.
There was a problem hiding this comment.
I renamed clang_workarounds.h to fallback_builtins.h. I haven't moved the x86 one though.
nmoinvaz
left a comment
There was a problem hiding this comment.
I only have style changes as I don't have the hardware. It is good that there are crc32 tests to prove it out.
arch/power/crc32_power8.c
Outdated
|
|
||
| unsigned long len = (unsigned long) _len; | ||
|
|
||
| if (p == (const unsigned char *) 0x0) return 0; |
arch/power/crc32_power8.c
Outdated
| offset += 16; | ||
|
|
||
| v0 = vec_xor(v0, va0); | ||
| va0 = __builtin_crypto_vpmsumd ((__vector unsigned long |
There was a problem hiding this comment.
Random functions with space after function name above and below this line.
There was a problem hiding this comment.
Ok, I think I fixed all of those now.
arch/power/fallback_builtins.h
Outdated
| } | ||
|
|
||
| #if BYTE_ORDER == BIG_ENDIAN | ||
| #define __builtin_unpack_vector_0(a) __builtin_unpack_vector ((a), 0) |
There was a problem hiding this comment.
Extra space after function name here too.
arch/power/crc32_power8.c
Outdated
| offset += 128; | ||
|
|
||
| v0 = (__vector unsigned long long)__builtin_crypto_vpmsumw ( | ||
| (__vector unsigned int)vdata0,(__vector unsigned int)v0); |
There was a problem hiding this comment.
No space between function arguments.
| * left 32 bits so it occupies the least significant bits in the | ||
| * bit reflected domain. | ||
| */ | ||
| v0 = (__vector unsigned long long)vec_sld((__vector unsigned char)v0, |
There was a problem hiding this comment.
@mscastanho what do you think about using typedef?
_vector unsigned long long = vector_uint64_t?
_vector unsigned char = vector_uint8_t?
Or just using fixed types if possible?
Anyways, just an idea because I noticed there is a lot of casting with long types.
There was a problem hiding this comment.
@nmoinvaz I don't think there are vector types with "fixed type names" available for Power (something like vector uint64_t). But the types available do have fixed sizes (e.g. vector unsigned long long is a vector with 2 doublewords).
I think I'm neutral regarding using typedefs. It would certainly save some chars and make the code neater, but would add an extra indirection for the "reader" to find out which underlying type is actually being used.
If you think using typedefs would be better, I can make the change.
|
@nmoinvaz I'm so sorry. I completely missed your last comments 🤦🏼 I think I addressed all of them, and also rebased against current develop to fix some merge conflicts. |
|
LGTM. |
|
Gentle ping =) |
arch/power/crc32_power8.c
Outdated
| #if defined (__clang__) | ||
| #include "fallback_builtins.h" | ||
| #else | ||
| #define __builtin_pack_vector(a, b) __builtin_pack_vector_int128 ((a), (b)) |
There was a problem hiding this comment.
This seems unnecessary... parameters are passed in same order without any additional cast.
There was a problem hiding this comment.
This was needed because an alternative __builtin_pack_vector was defined in fallback_builtins for clang, so we needed a definition for the GCC case. But clang now has __builtin_pack_vector_int128, so we can use that directly instead, so this was removed.
da528e5 to
4ac05e1
Compare
mtl1979
left a comment
There was a problem hiding this comment.
Just small style fixes ;)
4ac05e1 to
dc5e9c7
Compare
|
This needs a minor rebase now, sorry 😉 |
Reorganize statements inside crc32_stub() to match more closely the format used for other function stubs in functable.c.
dc5e9c7 to
7b15e92
Compare
|
@Dead2 Done =) |
mtl1979
left a comment
There was a problem hiding this comment.
There is still quite a lot of casts that make the code hard to read, but I don't think keeping them blocks merging as trying to avoid casts by using inline utility functions makes debug build larger.
arch/power/crc32_power8.c
Outdated
| p = (char *)p + 128; | ||
|
|
||
| /* | ||
| * main loop. We modulo schedule it such that it takes three |
There was a problem hiding this comment.
This line is hard to read. Possibly missing punctuation.
There was a problem hiding this comment.
This comment was inherited from previous iterations of this code. I replaced it with simpler high-level comment.
This commit adds an optimized version of the crc32 function based on crc32-vpmsum from https://github.com/antonblanchard/crc32-vpmsum/ . The code has been relicensed to the zlib license. This is the C implementation created by Rogerio Alves <rogealve@br.ibm.com> It makes use of vector instructions to speed up CRC32 algorithm. Decompression times were improved by +30% on tests. Based on Daniel Black's work for the original zlib (madler/zlib#478).
7b15e92 to
d082606
Compare
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1071 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1071 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1071 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
- Fix hangs on macOS #1031 - Fix minideflate write buffers being overwritten #1060 - Fix deflateBound and compressBound returning too small size estimates #1049 #1071 - Fix incorrect function declaration warning #1080 - Fix build problems when building outside of source dir #1049 - Fix build problems on arm2-7 #1030 - Fixed some compile warnings #1020 #1036 #1037 #1048 - Improved posix memalign support #888 - Improvements to testing #637 #1026 #1032 #1035 #1049 #1051 #1056 #1063 #1067 #1079 - Improvements for integration into other projects #1022 #1042 - Code style fixes #637 #1040 #1050 #1075
Hi,
This PR adds an optimized version of the crc32 function for POWER processors. This is a port of madler/zlib#478 to zlib-ng. The algorithm is mostly intact, as originally developed by @racardoso and added to the original zlib by @grooverdan.
While preparing this PR, I also found some issues in the way
$ARCHwas detected on the CI using CMake, so I added a fix.This PR also adds new crc32 tests that can be used by all targets.
I'm marking this as a Draft PR for a few reasons:
crc32_power8onfunctable.cas it's not quite clear to me whatif (sizeof(void *) == sizeof(ptrdiff_t))is checking.To measure the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.
The results below show compression and decompression throughput in MB/s using gzip wrapper, for all compression levels: