Disable SLP vectorizer for FIPS shared library builds on GCC 14+#2977
Disable SLP vectorizer for FIPS shared library builds on GCC 14+#2977
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2977 +/- ##
=======================================
Coverage 78.30% 78.31%
=======================================
Files 689 689
Lines 120959 120959
Branches 16985 16984 -1
=======================================
+ Hits 94717 94728 +11
+ Misses 25345 25336 -9
+ Partials 897 895 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Since FIPS-shared builds seem to be inadequately covered in our CI, perhaps we could update this job to include a FIPS-shared variant? |
I posed the question about shared build coverage in the call-outs. Adding it as a dimension would add 32 more AWS-LC compilations to the CI. Is this additional coverage something the team wants? Or should we pare new coverage down to something more targeted? Aside from observing the red failing builds on every PR, I'm not sure how much strain the CI is under already. Options in this space:
|
GCC 14 introduced changes to the Superword Level Parallelism (SLP) vectorizer that cause it to combine adjacent static function pointer stores into vector operations, which places the source addresses in .data.rel.ro.local sections. The FIPS linker script (gcc_fips_shared.lds) discards these sections by design to ensure hashability. This results in undefined reference errors at link time. This change disables the SLP vectorizer (-fno-tree-slp-vectorize) for the bcm_library target on Linux x86_64 GCC 14+ builds.
### Description of changes: * Bump urllib3 from 2.6.0 to 2.6.3 in /tests/ci by @dependabot[bot] in #2932 * Add weekly automated check for outdated third-party test vectors by @sgmenda in #2933 * Enable Hybrid PQ KeyShares by default by @alexw91 in #2531 * Remove AVX conditional from cmake script by @torben-hansen in #2958 * openssl-ca command implementation for self-sign certificates by @skmcgrail in #2937 * Initial Framework for Using Doxygen to Document Public Header Files by @m271828 in #2908 * Move md4 out of FIPS module by @torben-hansen in #2956 * Fix image-build-windows workflow to only push on workflow_call and workflow_dispatch by @skmcgrail in #2961 * Remove FIPS counter framework and other tidying up by @torben-hansen in #2947 * Model Device Farm CI Resources in CDK by @skmcgrail in #2965 * Adds a new randomness generation API by @torben-hansen in #2963 * Migrate Android Testing to GitHub Actions by @skmcgrail in #2969 * Ensure pkcs7 checks ASN1_TYPE->type by @skmcgrail in #2968 * Fix checkout logic for android-omnibus by @skmcgrail in #2970 * Add missing env vars to check-vectors workflow step by @sgmenda in #2962 * Shorten Windows Build Directory Path by @skmcgrail in #2974 * Bump mysql cluster version by @WillChilds-Klein in #2967 * Integrate Wycheproof ML-DSA test vectors by @sgmenda in #2973 * Simplify FIPS conditional in top-level build script by @torben-hansen in #2976 * Fix aws-lc-rs CI job by @justsmth in #2966 * Add method to get type of ML-DSA instance configured under EVP PKEY by @torben-hansen in #2980 * Nmap build needs liblinear by @justsmth in #2985 * Disable SLP vectorizer for FIPS shared library builds on GCC 14+ by @geedo0 in #2977 * Update Wycheproof ECDSA test vectors and fix workflow typo by @sgmenda in #2972 * Address some CMake findings by @skmcgrail in #2979 * Bump bytes from 1.7.1 to 1.11.1 in /tests/ci/lambda by @dependabot[bot] in #2983 * Support GCC 4.8 for aarch64 by @justsmth in #2964 * Free potential memory before assigning new pointer by @torben-hansen in #2989 * Add PyOpenSSL integration test by @WillChilds-Klein in #2992 * Ensure index argument is not negative in ASN1_BIT_STRING_set_bit by @torben-hansen in #2987 * Ensure no overflow in signed output length in do_buf by @torben-hansen in #2988 * Remove redundant CPython 3.9 integration test by @WillChilds-Klein in #2996 * Ensure public key is set before verifying through ML-DSA verify by @torben-hansen in #2990 * Correct CCM nids in object definition by @torben-hansen in #2991 * Address Reported Bug Findings by @skmcgrail in #3000 * Fix CI: gcc-4.8 by @justsmth in #3011 * Fix Windows CI: use `cd /d` in run_windows_tests.bat to handle cross-drive paths by @justsmth in #3012 * Fix OPENSSL_memchr per C23 by @justsmth in #3008 * Fix argument order in `hmac_copy` by @justsmth in #3014 * Miscellaneous CI improvements by @skmcgrail in #2978 * Fix CI: mariadb by @justsmth in #3015 * Update Ubuntu 24:04 image compiler verification by @skmcgrail in #3017 * Support WASM/Emscripten by @justsmth in #2959 * Generate Rust Bindings by @justsmth in #2999 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.
Issues:
Addresses: CryptoAlg-3376, AWS-LC-995
Description of changes:
GCC 14 introduced changes to the Superword Level Parallelism (SLP) vectorizer which encourages the compiler to combine adjacent static function pointer stores into vector operations. This also induces the compiler to store some of these pointer values in the
.data.rel.ro.localsection in order to load them from memory rather than just computing them usingleainstructions. This is problematic because the FIPS linker script (gcc_fips_shared.lds) discards these sections by design to ensure hashability of the FIPS module. This results in undefined reference errors at link time. This change disables the SLP vectorizer (-fno-tree-slp-vectorize) for the bcm_library target on Linux x86_64 GCC 14+ builds.Analysis
The problemative code which triggers this issue with the SLP vectorizer can be found here. As a macro, all invocations of this macro and the associated function pointers are liable to be stored in
.data.rel.ro.localso we opt to apply this mitigation across the entire compilation unit. To illustrate, the below snippet consistently produces code which stores the _Init pointer in memory and computes the _Update pointer with lea. This results in a garbage pointer for the former and a working pointer for the latter if you useld.goldas your linker which stubs undefined references with NULL pointers.Here is the disassembly with
tree-slp-vectorizeleft on. This initializes two pointers withinAWSLC_hmac_in_place_methods_storageby performing a single 64-bit load, lea/punpcklqdq operations, and a single 128-bit store.Here is the dissassembly with
tree-slp-vectorizedisabled. The same two pointers are now initialized using anleainstruction and a single 64-bit store. The difference between these two is that the vectorized version is arguably worse performance wise (in this particular instance, can't generalize to all impacted code blocks).Call-outs:
Should we add a new dimension to our testing for SHARED builds? This would effectively double the entire test matrix if done blindly.
Right now, this change is tightly scoped to the impacted shared library build. We have the option to expand this to the static library build which uses a different mitigation if we wanted to. That mitigation is likely more performant as it uses assembly re-writes to get the desired effect, but this could simplify the build.
Performance Impact
This impacts the compilation of the entire
bcm.c.ounit, but notably not any of the assembly optimized cryptographic algorithm code. The performance impact, if any, would only be felt in the API layer. Using @torben-hansen's benchmarking framework I ran a couple tests of this flag against GCC13 (Since GCC14 wouldn't work and could not be controlled for). Given the results, I'm inclined to believe that disabling this optimization flag on the FIPS module has a negligible performance impact.shared-fips-slp-on vs static-fips-slp-offcompares the shared build with the assembly re-write mitigation versus the static build with this PR's mitigation. We see negligible impact with the exception of -11% for RNG16. This could be merely a difference between shared/static. Reviewers, feel free to press on this.static-noasm-slp-off vs static-noasm-slp-oncompares the NO_ASM build with and without the SLP vectorizer. Funny enough, it suggests a marginal performance improvement (Up to 2%) when disabling the SLP vectorizer.Testing:
How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?
I setup an Ubuntu 24.04 container on an x86_64 host to test the compilation, run tests, and perform benchmarking.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.