Skip to content

Disable SLP vectorizer for FIPS shared library builds on GCC 14+#2977

Merged
geedo0 merged 2 commits intoaws:mainfrom
geedo0:gcc14
Feb 5, 2026
Merged

Disable SLP vectorizer for FIPS shared library builds on GCC 14+#2977
geedo0 merged 2 commits intoaws:mainfrom
geedo0:gcc14

Conversation

@geedo0
Copy link
Copy Markdown
Contributor

@geedo0 geedo0 commented Feb 2, 2026

Issues:

Addresses: CryptoAlg-3376, AWS-LC-995

Description of changes:

GCC 14 introduced changes to the Superword Level Parallelism (SLP) vectorizer which encourages the compiler to combine adjacent static function pointer stores into vector operations. This also induces the compiler to store some of these pointer values in the .data.rel.ro.localsection in order to load them from memory rather than just computing them using lea instructions. This is problematic because the FIPS linker script (gcc_fips_shared.lds) discards these sections by design to ensure hashability of the FIPS module. This results in undefined reference errors at link time. This change disables the SLP vectorizer (-fno-tree-slp-vectorize) for the bcm_library target on Linux x86_64 GCC 14+ builds.

Analysis

The problemative code which triggers this issue with the SLP vectorizer can be found here. As a macro, all invocations of this macro and the associated function pointers are liable to be stored in .data.rel.ro.local so we opt to apply this mitigation across the entire compilation unit. To illustrate, the below snippet consistently produces code which stores the _Init pointer in memory and computes the _Update pointer with lea. This results in a garbage pointer for the former and a working pointer for the latter if you use ld.gold as your linker which stubs undefined references with NULL pointers.

    out->methods[idx].init = AWS_LC_TRAMPOLINE_##HASH_NAME##_Init;           \
    out->methods[idx].update = AWS_LC_TRAMPOLINE_##HASH_NAME##_Update;       \

Here is the disassembly with tree-slp-vectorize left on. This initializes two pointers within AWSLC_hmac_in_place_methods_storage by performing a single 64-bit load, lea/punpcklqdq operations, and a single 128-bit store.

movq   0x0(%rip),%xmm0  # Load garbage address for SHA256_Init into xmm0
lea    -0x1ff42(%rip),%rax # Compute the correct address for SHA256_Update
movq   %rax,%xmm2 # Store SHA256_Update into xmm2
punpcklqdq %xmm2,%xmm0 # Get both SHA256_Init (Garbage) and SHA256_Update into xmm0
movaps %xmm0,0x27b5eb(%rip) # Store xmm0 into AWSLC_hmac_in_place_methods_storage

Here is the dissassembly with tree-slp-vectorize disabled. The same two pointers are now initialized using an lea instruction and a single 64-bit store. The difference between these two is that the vectorized version is arguably worse performance wise (in this particular instance, can't generalize to all impacted code blocks).

lea    0x26c56(%rip),%rax # Compute address of SHA256_Init
mov    %rax,0x27ab1f(%rip) # Store SHA256_Init
lea    -0x20178(%rip),%rax # Compute address of SHA256_Update
mov    %rax,0x27ab19(%rip) # Store SHA256_Update

Call-outs:

Should we add a new dimension to our testing for SHARED builds? This would effectively double the entire test matrix if done blindly.

Right now, this change is tightly scoped to the impacted shared library build. We have the option to expand this to the static library build which uses a different mitigation if we wanted to. That mitigation is likely more performant as it uses assembly re-writes to get the desired effect, but this could simplify the build.

Performance Impact

This impacts the compilation of the entire bcm.c.o unit, but notably not any of the assembly optimized cryptographic algorithm code. The performance impact, if any, would only be felt in the API layer. Using @torben-hansen's benchmarking framework I ran a couple tests of this flag against GCC13 (Since GCC14 wouldn't work and could not be controlled for). Given the results, I'm inclined to believe that disabling this optimization flag on the FIPS module has a negligible performance impact.

shared-fips-slp-on vs static-fips-slp-off compares the shared build with the assembly re-write mitigation versus the static build with this PR's mitigation. We see negligible impact with the exception of -11% for RNG16. This could be merely a difference between shared/static. Reviewers, feel free to press on this.

static-noasm-slp-off vs static-noasm-slp-on compares the NO_ASM build with and without the SLP vectorizer. Funny enough, it suggests a marginal performance improvement (Up to 2%) when disabling the SLP vectorizer.

Screenshot 2026-02-03 at 3 24 19 PM

Testing:

How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?

I setup an Ubuntu 24.04 container on an x86_64 host to test the compilation, run tests, and perform benchmarking.

# Cmake invocation
CC=gcc-14 CXX=g++-14 cmake -GNinja -B build -DFIPS=1 -DBUILD_SHARED_LIBS=1 -DCMAKE_BUILD_TYPE=RelWithDebInfo

# Built and ran tests successfully
ninja
ninja run_tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Feb 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.31%. Comparing base (478ca60) to head (05a6baf).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2977   +/-   ##
=======================================
  Coverage   78.30%   78.31%           
=======================================
  Files         689      689           
  Lines      120959   120959           
  Branches    16985    16984    -1     
=======================================
+ Hits        94717    94728   +11     
+ Misses      25345    25336    -9     
+ Partials      897      895    -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@justsmth
Copy link
Copy Markdown
Contributor

justsmth commented Feb 3, 2026

Since FIPS-shared builds seem to be inadequately covered in our CI, perhaps we could update this job to include a FIPS-shared variant?

@geedo0
Copy link
Copy Markdown
Contributor Author

geedo0 commented Feb 3, 2026

Since FIPS-shared builds seem to be inadequately covered in our CI, perhaps we could update this job to include a FIPS-shared variant?

I posed the question about shared build coverage in the call-outs. Adding it as a dimension would add 32 more AWS-LC compilations to the CI. Is this additional coverage something the team wants? Or should we pare new coverage down to something more targeted? Aside from observing the red failing builds on every PR, I'm not sure how much strain the CI is under already.

Options in this space:

  • Testing shared only for the FIPS dimension
  • Adding a check for this specific problem of having symbols inside .data.rel.ro.local
  • Testing it only for GCC14+

GCC 14 introduced changes to the Superword Level Parallelism (SLP) vectorizer
that cause it to combine adjacent static function pointer stores into vector
operations, which places the source addresses in .data.rel.ro.local sections.
The FIPS linker script (gcc_fips_shared.lds) discards these sections by design
to ensure hashability. This results in undefined reference errors at link time.
This change disables the SLP vectorizer (-fno-tree-slp-vectorize) for the
bcm_library target on Linux x86_64 GCC 14+ builds.
@geedo0 geedo0 marked this pull request as ready for review February 4, 2026 16:55
@geedo0 geedo0 requested a review from a team as a code owner February 4, 2026 16:55
@geedo0 geedo0 merged commit a605df4 into aws:main Feb 5, 2026
433 of 439 checks passed
@justsmth justsmth mentioned this pull request Feb 20, 2026
justsmth added a commit that referenced this pull request Feb 20, 2026
### Description of changes: 
* Bump urllib3 from 2.6.0 to 2.6.3 in /tests/ci by @dependabot[bot] in
#2932
* Add weekly automated check for outdated third-party test vectors by
@sgmenda in #2933
* Enable Hybrid PQ KeyShares by default by @alexw91 in
#2531
* Remove AVX conditional from cmake script by @torben-hansen in
#2958
* openssl-ca command implementation for self-sign certificates by
@skmcgrail in #2937
* Initial Framework for Using Doxygen to Document Public Header Files by
@m271828 in #2908
* Move md4 out of FIPS module by @torben-hansen in
#2956
* Fix image-build-windows workflow to only push on workflow_call and
workflow_dispatch by @skmcgrail in
#2961
* Remove FIPS counter framework and other tidying up by @torben-hansen
in #2947
* Model Device Farm CI Resources in CDK by @skmcgrail in
#2965
* Adds a new randomness generation API by @torben-hansen in
#2963
* Migrate Android Testing to GitHub Actions by @skmcgrail in
#2969
* Ensure pkcs7 checks ASN1_TYPE->type by @skmcgrail in
#2968
* Fix checkout logic for android-omnibus by @skmcgrail in
#2970
* Add missing env vars to check-vectors workflow step by @sgmenda in
#2962
* Shorten Windows Build Directory Path by @skmcgrail in
#2974
* Bump mysql cluster version by @WillChilds-Klein in
#2967
* Integrate Wycheproof ML-DSA test vectors by @sgmenda in
#2973
* Simplify FIPS conditional in top-level build script by @torben-hansen
in #2976
* Fix aws-lc-rs CI job by @justsmth in
#2966
* Add method to get type of ML-DSA instance configured under EVP PKEY by
@torben-hansen in #2980
* Nmap build needs liblinear by @justsmth in
#2985
* Disable SLP vectorizer for FIPS shared library builds on GCC 14+ by
@geedo0 in #2977
* Update Wycheproof ECDSA test vectors and fix workflow typo by @sgmenda
in #2972
* Address some CMake findings by @skmcgrail in
#2979
* Bump bytes from 1.7.1 to 1.11.1 in /tests/ci/lambda by
@dependabot[bot] in #2983
* Support GCC 4.8 for aarch64 by @justsmth in
#2964
* Free potential memory before assigning new pointer by @torben-hansen
in #2989
* Add PyOpenSSL integration test by @WillChilds-Klein in
#2992
* Ensure index argument is not negative in ASN1_BIT_STRING_set_bit by
@torben-hansen in #2987
* Ensure no overflow in signed output length in do_buf by @torben-hansen
in #2988
* Remove redundant CPython 3.9 integration test by @WillChilds-Klein in
#2996
* Ensure public key is set before verifying through ML-DSA verify by
@torben-hansen in #2990
* Correct CCM nids in object definition by @torben-hansen in
#2991
* Address Reported Bug Findings by @skmcgrail in
#3000
* Fix CI: gcc-4.8 by @justsmth in
#3011
* Fix Windows CI: use `cd /d` in run_windows_tests.bat to handle
cross-drive paths by @justsmth in
#3012
* Fix OPENSSL_memchr per C23 by @justsmth in
#3008
* Fix argument order in `hmac_copy` by @justsmth in
#3014
* Miscellaneous CI improvements by @skmcgrail in
#2978
* Fix CI: mariadb by @justsmth in
#3015
* Update Ubuntu 24:04 image compiler verification by @skmcgrail in
#3017
* Support WASM/Emscripten by @justsmth in
#2959
* Generate Rust Bindings by @justsmth in
#2999

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
@geedo0 geedo0 deleted the gcc14 branch March 27, 2026 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants