AES-GCM: Add function pointer trampolines to avoid delocator issue #2919

Merged
jakemas merged 3 commits into aws:main from jakemas:delocate-aes-gcm-wrappers
Jan 14, 2026

Conversation

@jakemas
Contributor

@jakemas jakemas commented Dec 19, 2025

Delocate AES, GCM, and cipher wrapper functions

On AArch64, the delocator can patch up the computation of function pointers only if the pointers can be computed with a PC-relative offset in the range (-1MB, 1MB).

For the function pointer computations in crypto/fipsmodule/aes/mode_wrappers.c, crypto/fipsmodule/cipher/e_aes.c, and crypto/fipsmodule/modes/gcm.c, this bounds condition is about to be violated by further code additions to AWS-LC, as witnessed in AES-unrelated PRs.
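As a rough illustration of the bounds condition, the sketch below checks whether a target address is reachable from a given instruction address with a single PC-relative offset. It assumes the limit corresponds to a signed 21-bit byte offset, as used by the AArch64 `adr` instruction, so the reachable range is taken as [-2^20, 2^20). The helper name `fits_pc_relative` is hypothetical, and the delocator's own check may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `target` is reachable from `pc` with a
 * signed 21-bit byte offset, i.e. an offset within [-2^20, 2^20). */
static bool fits_pc_relative(uint64_t pc, uint64_t target) {
  int64_t offset = (int64_t)(target - pc);
  return offset >= -(INT64_C(1) << 20) && offset < (INT64_C(1) << 20);
}
```

Under this assumption, a function pointer whose target drifts 1 MiB or more away from the code computing it no longer passes the check, which is the situation the description anticipates.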

This commit preemptively fixes the issue by adding function pointer trampolines to these files. These are stub functions that branch immediately into the desired assembly routines but sit close enough to the C code computing their address that the address can be computed using a PC-relative offset.
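A minimal sketch of the trampoline idea, not the code from this PR: all names (`demo_far_asm_routine`, `demo_trampoline`, `demo_select_block_fn`) are hypothetical, and the "assembly routine" is a plain C stand-in so the example compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*demo_block_fn)(const uint8_t in[16], uint8_t out[16]);

/* Stand-in for an assembly routine that may end up far away in the final
 * binary. In real code this would be an external assembly symbol. */
static void demo_far_asm_routine(const uint8_t in[16], uint8_t out[16]) {
  for (size_t i = 0; i < 16; i++) {
    out[i] = (uint8_t)(in[i] ^ 0xAA);
  }
}

/* Trampoline: lives in the same file as the code that takes its address and
 * does nothing except forward to the real routine. */
static void demo_trampoline(const uint8_t in[16], uint8_t out[16]) {
  demo_far_asm_routine(in, out);
}

/* The address computed here is that of the nearby trampoline, not that of
 * the distant routine. */
static demo_block_fn demo_select_block_fn(void) {
  return demo_trampoline;
}
```

Callers keep using the returned pointer as before; the only difference is one extra branch per call. A real trampoline would presumably also need to keep the compiler from inlining or folding the stub, which this sketch does not address.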

This fix is similar to previous delocator fixes addressing the same AArch64 PC-relative offset limitation; see #2165 and #2294 for examples.

AWS-LC-Verification

As there are SAW proofs for AES GCM, these changes affect the proofs (formal-verification / fv-saw-x86_64-aes-gcm (pull_request)) and require changes in aws-lc-verification to continue proof support -- this has been added in awslabs/aws-lc-verification#180.

Testing:

Stability of the fix was tested in #2903, which added ~10,000 lines of additional AVX2 backend code.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@jakemas jakemas mentioned this pull request Dec 19, 2025
@jakemas jakemas marked this pull request as ready for review December 19, 2025 20:16
@jakemas jakemas requested a review from a team as a code owner December 19, 2025 20:16
@codecov-commenter

codecov-commenter commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.21%. Comparing base (406a018) to head (363ba1b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crypto/fipsmodule/cipher/e_aes.c | 66.66% | 3 Missing ⚠️ |
| crypto/fipsmodule/aes/mode_wrappers.c | 77.77% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2919      +/-   ##
==========================================
- Coverage   78.22%   78.21%   -0.01%     
==========================================
  Files         690      690              
  Lines      118745   118750       +5     
  Branches    16680    16679       -1     
==========================================
- Hits        92890    92885       -5     
- Misses      24968    24976       +8     
- Partials      887      889       +2     

@nebeid
Contributor

nebeid commented Dec 24, 2025

Can we collect benchmarks on c6i, c7i, c6g, c7g and r8g for GCM init and encrypt/decrypt, just to make sure the trampoline overhead is not noticeable?

@jakemas
Contributor Author

jakemas commented Jan 13, 2026

OK, I benchmarked c6i, c7i, c6g, c7g and r8g on main (Main) vs. delocate-aes-gcm-wrappers (Deloc). The delocate-aes-gcm-wrappers branch shows no significant performance impact from the trampoline wrappers on any of the tested instance types. A summary of `./tool/bssl speed -filter GCM` is shown below.

Operation                                   c6i Main   c6i Deloc   Diff%     c7i Main   c7i Deloc   Diff%     c6g Main   c6g Deloc   Diff%     c7g Main   c7g Deloc   Diff%     r8g Main   r8g Deloc   Diff%
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AEAD-AES-128-GCM open init                 8,170,730   8,170,804   0.00%   13,395,710  13,585,416   1.42%    7,123,343   7,244,199   1.70%   10,453,156  10,462,520   0.09%   12,575,887  12,557,374  -0.15%
AEAD-AES-128-GCM seal init                 8,166,288   8,168,338   0.03%   13,459,502  13,569,256   0.82%    7,131,615   7,207,899   1.07%   10,371,448  10,475,864   1.01%   12,596,574  12,581,987  -0.12%
AEAD-AES-256-GCM open init                 7,333,375   7,486,888   2.09%   11,988,317  12,240,780   2.11%    6,940,688   6,948,431   0.11%   10,107,828  10,053,089  -0.54%   12,269,350  12,105,237  -1.34%
AEAD-AES-256-GCM seal init                 7,332,478   7,485,334   2.08%   11,960,019  12,200,018   2.01%    6,928,806   6,940,292   0.17%   10,083,450  10,049,319  -0.34%   12,256,091  12,209,329  -0.38%
EVP-AES-128-GCM decrypt init               6,096,378   6,108,719   0.20%    7,298,546   7,250,246  -0.66%    4,741,711   4,726,429  -0.32%    6,806,648   6,800,803  -0.09%    7,612,583   7,579,651  -0.43%
EVP-AES-128-GCM encrypt init               6,092,044   6,108,585   0.27%    7,315,210   7,236,030  -1.08%    4,739,616   4,726,508  -0.28%    6,823,481   6,810,251  -0.19%    7,620,627   7,580,265  -0.53%
EVP-AES-192-GCM decrypt init               6,014,404   6,021,807   0.12%    7,442,668   7,465,660   0.31%    4,664,587   4,669,503   0.11%    6,595,230   6,631,137   0.54%    7,440,433   7,486,820   0.62%
EVP-AES-192-GCM encrypt init               6,014,283   6,031,934   0.29%    7,487,858   7,445,176  -0.57%    4,665,629   4,671,708   0.13%    6,599,618   6,640,084   0.61%    7,441,743   7,485,235   0.58%
EVP-AES-256-GCM decrypt init               5,765,264   5,772,448   0.12%    7,590,227   7,567,705  -0.30%    4,556,790   4,593,302   0.80%    6,480,760   6,493,042   0.19%    7,386,919   7,426,747   0.54%
EVP-AES-256-GCM encrypt init               5,761,816   5,781,717   0.35%    7,620,520   7,605,424  -0.20%    4,561,653   4,577,995   0.36%    6,491,072   6,495,153   0.06%    7,388,514   7,430,324   0.57%
Summary (Diff% range per instance type, Deloc relative to Main):
- c6i: 0.00% to +2.09% (avg: +0.55%)
- c7i: -1.08% to +2.11% (avg: +0.39%)
- c6g (Graviton2): -0.32% to +1.70% (avg: +0.38%)
- c7g (Graviton3): -0.54% to +1.01% (avg: +0.13%)
- r8g (Graviton4): -1.34% to +0.62% (avg: -0.06%)
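
The Diff% column appears to be the relative change of the Deloc figure against the Main figure. A minimal sketch of that calculation, assuming that definition (the function name `diff_percent` is illustrative):

```c
/* Relative change of `deloc` versus `main_val`, in percent.
 * Positive values mean the Deloc figure is higher than the Main figure. */
static double diff_percent(double main_val, double deloc) {
  return (deloc - main_val) / main_val * 100.0;
}
```

For example, the c6i figures for "AEAD-AES-256-GCM open init" (7,333,375 and 7,486,888) give roughly 2.09, which matches the table entry.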

@jakemas jakemas enabled auto-merge (squash) January 14, 2026 20:56
@jakemas jakemas merged commit 1494e78 into aws:main Jan 14, 2026
397 of 400 checks passed
jakemas added a commit that referenced this pull request Jan 20, 2026
### Issues:
Related PRs:
- Import-mldsa-native-NTT stress test on delocator
[#2903](#2903)
- AES-GCM: Add function pointer trampolines to avoid delocator issue
[#2919](#2919)
- Service Indicator: Add error call trampoline to avoid delocator issue
[#2920](#2920)

### Import mldsa-native

This imports mldsa-native
(https://github.com/pq-code-package/mldsa-native) into AWS-LC.

This PR focuses on the minimal configuration of mldsa-native: No
assembly and no FIPS-202 code are imported.

mldsa-native is a high-performance, high-assurance C90 implementation of
ML-DSA developed under the Post-Quantum Cryptography Alliance (PQCA) and
the Linux Foundation. It is a fork of the Dilithium reference
implementation.

### Import Mechanism

The mldsa-native source code is unmodified and imported using the
importer script `crypto/fipsmodule/ml_dsa/importer.sh`; the details of
the import are in META.yml.

A custom config is provided for mldsa-native which in particular
includes a small 'compatibility layer' between AWS-LC/OpenSSL and
mldsa-native -- see below.

### Future imports (C-only)

Future updates of the C-only mldsa-native source tree should happen
through a re-import of mldsa-native: That is, (a) delete
`crypto/fipsmodule/ml_dsa/mldsa` and (b) re-run `importer.sh`. This will
re-import `mldsa-native/main`, though you can set the `GITHUB_SHA` and
`GITHUB_REPOSITORY` environment variables to point to any other
mldsa-native repository/fork.

### Future imports (native code)

Once we have verified meaningful parts of the mldsa-native assembly
backends, PRs will be filed to integrate those. The details for this
integration are TBD and not necessary to finalize for this PR. The
options are (a) extending `importer.sh` to import larger parts of the
mldsa-native upstream source tree, including native backends, or (b)
writing custom backends, backed by sources living in the s2n-bignum
source tree. Both are possible and compatible with this PR.

### Import Scope

mldsa-native has a C-only version as well as native 'backends' in AVX2
and Neon for high performance. This commit only imports the C-only
version. Integration of native backends will be done separately.

mldsa-native offers its own FIPS-202 implementation, including fast
versions of batched FIPS-202. However, this commit does not import
those, but instead provides glue-code around AWS-LC's own FIPS-202
implementation. The path to leveraging the FIPS-202 performance
improvements in mldsa-native would be to integrate them directly into
`crypto/fipsmodule/sha`.

### Impact on build

None. No build-files are modified. The multilevel build process remains
unchanged.

### Internal API changes
3 Removed functions:
```
  [D] 'function void ml_dsa_44_params_init(ml_dsa_params*)'    {ml_dsa_44_params_init}
  [D] 'function void ml_dsa_65_params_init(ml_dsa_params*)'    {ml_dsa_65_params_init}
  [D] 'function void ml_dsa_87_params_init(ml_dsa_params*)'    {ml_dsa_87_params_init}
```

### Compatibility layer

The configuration file `mldsa_native_config.h` includes a compatibility
layer between AWS-LC/OpenSSL and mldsa-native, covering:

* FIPS/PCT: If `AWSLC_FIPS` is set, `MLD_CONFIG_KEYGEN_PCT` is enabled to
include a PCT.
* FIPS/PCT: If `BORINGSSL_FIPS_BREAK_TESTS` is set,
`MLD_CONFIG_KEYGEN_PCT_BREAKAGE_TEST` is set and `mld_break_pct` defined
via `boringssl_fips_break_test("MLDSA_PWCT")`, to include
runtime-breakage of the PCT for testing purposes.
* CT: If `BORINGSSL_CONSTANT_TIME_VALIDATION` is set, then
`MLD_CONFIG_CT_TESTING_ENABLED` is set to enable valgrind testing.
* Zeroization: `MLD_CONFIG_CUSTOM_ZEROIZE` is set and `mld_zeroize`
mapped to `OPENSSL_cleanse` to use OpenSSL's zeroization function.
* Randombytes: `MLD_CONFIG_CUSTOM_RANDOMBYTES` is set and
`mld_randombytes` mapped to `RAND_bytes` to use AWS-LC's randombytes
function.
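
A rough sketch of what part of such a compatibility layer could look like, not the actual contents of `mldsa_native_config.h`. The macro names come from the list above; `demo_cleanse` is a hypothetical stand-in for `OPENSSL_cleanse` so the example compiles on its own, and the `mld_zeroize` signature shown is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OPENSSL_cleanse; the real function is provided
 * by AWS-LC. */
static void demo_cleanse(void *ptr, size_t len) {
  volatile uint8_t *p = (volatile uint8_t *)ptr;
  while (len--) {
    *p++ = 0;
  }
}

/* FIPS builds enable the pairwise consistency test. */
#if defined(AWSLC_FIPS)
#define MLD_CONFIG_KEYGEN_PCT
#endif

/* Constant-time validation builds enable the valgrind-based checks. */
#if defined(BORINGSSL_CONSTANT_TIME_VALIDATION)
#define MLD_CONFIG_CT_TESTING_ENABLED
#endif

/* Route zeroization through the library's cleanse function.
 * Assumption: mld_zeroize takes a pointer and a length. */
#define MLD_CONFIG_CUSTOM_ZEROIZE
static void mld_zeroize(void *ptr, size_t len) {
  demo_cleanse(ptr, len);
}
```

The randombytes and PCT-breakage mappings would follow the same pattern; they are left out here because their exact signatures are not given in this description.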

### Side-channels

mldsa-native's CI uses a patched version of valgrind to check, across
various compilers and compile flags, that there are no secret-dependent
memory accesses, branches, or divisions. The relevant assertions have
been kept but are unused unless `MLD_CONFIG_CT_TESTING_ENABLED` is set,
which is the case if and only if `BORINGSSL_CONSTANT_TIME_VALIDATION` is
set.

mldsa-native uses value barriers to block potentially harmful compiler
reasoning and optimization. Where standard gcc/clang inline assembly is
not available, mldsa-native falls back to a slower 'opt blocker' based
on a volatile global -- both are described in ct.h.
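
The two barrier flavours can be sketched roughly as follows. This is a generic illustration of the technique, not the code in ct.h; all `demo_` names are hypothetical.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Empty inline assembly that claims to read and write `b`: the compiler can
 * no longer reason about the value that comes out. */
static inline uint32_t demo_value_barrier_u32(uint32_t b) {
  __asm__("" : "+r"(b));
  return b;
}
#else
/* Fallback 'opt blocker': XOR with a volatile global that is always zero,
 * which the compiler must reload on every use. */
static volatile uint32_t demo_opt_blocker_u32 = 0;
static inline uint32_t demo_value_barrier_u32(uint32_t b) {
  return b ^ demo_opt_blocker_u32;
}
#endif

/* Branch-free select: returns `a` if mask is all ones, `b` if mask is zero.
 * The barrier keeps the compiler from turning the mask back into a branch. */
static inline uint32_t demo_ct_select_u32(uint32_t a, uint32_t b,
                                          uint32_t mask) {
  return b ^ (demo_value_barrier_u32(mask) & (a ^ b));
}
```

The fallback costs a memory load per barrier, which is consistent with it being described as the slower option.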

### Formal Verification

All C-code imported in this commit is formally verified using the C
Bounded Model Checker (CBMC) to be free of various classes of undefined
behaviour, including out-of-bounds memory accesses and arithmetic
overflow; the latter is of particular interest for ML-DSA because of the
use of lazy modular reduction for improved performance.

The heart of the CBMC proofs is the set of function contract and loop
annotations in the C code. Function contracts are written as
`__contract__(...)` clauses and attached to the function declaration,
while loop contracts are written as `__loop__` clauses and follow the
`for` statement.

The function contract and loop statements are kept in the source, but
removed by the preprocessor so long as the CBMC macro is undefined.
Keeping them simplifies the import, and care has been taken to make them
readable to the non-expert, and thereby serve as precise documentation
of assumptions and guarantees upheld by the code.
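
A small sketch of how such annotations can vanish in a normal build. The macro names `__contract__` and `__loop__` are from the text above; the clause names inside them (`requires`, `assigns`, `invariant`, `memory_no_alias`, `memory_slice`) and the function `demo_poly_add` are illustrative assumptions, and the CBMC-side definitions are omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_N 4

/* Outside of CBMC runs, the annotations expand to nothing, so their
 * arguments are discarded by the preprocessor and never compiled. */
#if !defined(CBMC)
#define __contract__(x)
#define __loop__(x)
#endif

/* Declaration carrying the (illustrative) function contract. */
static void demo_poly_add(int32_t r[DEMO_N], const int32_t b[DEMO_N])
__contract__(
  requires(memory_no_alias(r, sizeof(int32_t) * DEMO_N))
  requires(memory_no_alias(b, sizeof(int32_t) * DEMO_N))
  assigns(memory_slice(r, sizeof(int32_t) * DEMO_N))
);

static void demo_poly_add(int32_t r[DEMO_N], const int32_t b[DEMO_N]) {
  size_t i;
  for (i = 0; i < DEMO_N; i++)
  __loop__(invariant(i <= DEMO_N))
  {
    r[i] += b[i];
  }
}
```

In a regular build the compiler sees an ordinary prototype and an ordinary `for` loop, while the annotations remain in the source as documentation.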

### FIPS Compliance

mldsa-native unconditionally includes stack zeroization. mldsa-native's
default secure memset is replaced by `OPENSSL_cleanse`.

mldsa-native conditionally includes a PCT, guarded by
`MLD_CONFIG_KEYGEN_PCT`. This is set in the config if and only if
`AWSLC_FIPS` is set.

While not part of the FIPS standard, the `pk_from_sk` function includes
validation of both t0 (low-order bits) and tr (hash of public key) using
constant-time comparison functions (`mld_ct_memcmp`), providing strong
assurance of key consistency.
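
For readers unfamiliar with constant-time comparison, a generic sketch of the idea follows. It is not the implementation of `mld_ct_memcmp`; the name `demo_ct_memcmp` is hypothetical, and hardening such as value barriers is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the buffers are equal and a nonzero value otherwise.
 * Every byte is always inspected, so the running time does not depend on
 * the position of the first difference. */
static uint8_t demo_ct_memcmp(const uint8_t *a, const uint8_t *b, size_t len) {
  uint8_t acc = 0;
  for (size_t i = 0; i < len; i++) {
    acc |= (uint8_t)(a[i] ^ b[i]);
  }
  return acc;
}
```

Unlike `memcmp`, this version never returns early, which is the property that matters when the compared data is derived from a secret key.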

### Testing

We KAT ML-DSA with test vectors obtained from
https://github.com/post-quantum-cryptography/KAT within
`PQDSAParameterTest.KAT`. We select the KATs for the signing mode
`hedged`, which derives the signing private random seed (rho)
pseudorandomly from the signer's private key, the message to be signed,
and a 256-bit string `rnd` which is generated at random. The `pure`
variant of these KATs was used, as it provides test vector inputs for
"pure", i.e., non-pre-hashed, messages.

We also run the ACVP test vectors obtained from
https://github.com/usnistgov/ACVP-Server within the three functions
`PerMLDSATest.ACVPKeyGen`, `PerMLDSATest.ACVPSigGen` and
`PerMLDSATest.ACVPSigVer`. These correspond to the tests found at
ML-DSA-keyGen-FIPS204, ML-DSA-sigGen-FIPS204, and ML-DSA-sigVer-FIPS204.
To test ML-DSA pure, non-deterministic mode, we use `tgId = 19, 21, 23`
of sigGen and `tgId = 7, 9, 11` of sigVer.
To test ML-DSA ExternalMu, non-deterministic mode, we use `tgId = 20,
22, 24` of sigGen and `tgId = 8, 10, 12` of sigVer.

**Test Results**:
- ML-DSA Tests: 100% passing (61/61 tests)

### Formatting

Code in `crypto/fipsmodule/ml_dsa/mldsa` is directly imported from
mldsa-native and comes with its own
`crypto/fipsmodule/ml_dsa/mldsa/.clang-format`.

### Prefix build

The prefix build should not be affected by the import, since no
definitions of external linkage are imported (everything is tagged
either static directly, or `MLD_EXTERNAL_API` or `MLD_INTERNAL_API`,
both of which are set to static in the context of the import, too).

### Performance

Performance should be comparable to the previous integration as both are
based on C-only code with AWS-LC's FIPS-202 implementation. The fast
mldsa-native backends are not yet imported.

### Multilevel build

At the core, mldsa-native is currently a 'single-level' implementation
of ML-DSA: A build of the main source tree provides an implementation of
exactly one of ML-DSA-44/65/87, depending on the
`MLD_CONFIG_PARAMETER_SET` parameter.

To build all security levels, level-specific sources are built 3 times,
once per security level, and linked with a single build of the
level-independent code. The single-compilation-unit approach pursued by
AWS-LC makes this process fairly simple since one merely needs to
include the single-compilation-unit file provided by mldsa-native three
times, and configure it so that the level-independent code is included
only once. The final include moreover `#undef`s all macros defined by
mldsa-native, reducing the risk of name clashes with other parts of
`crypto/fipsmodule/bcm.c`.

Note that this process is entirely internal to ml_dsa.c, and does not
affect the AWS-LC build.

HashML-DSA: mldsa-native includes a lot of HashML-DSA functionality that
we don't need in AWS-LC. Perhaps we should add a config option upstream in
mldsa-native to choose which of the pure/ExternalMu/hash modes are
imported, to reduce unused code.

### Main differences from reference implementation

mldsa-native is a fork of the ML-DSA reference implementation
(Dilithium).

The following gives an overview of the major changes:

- CBMC and debug annotations, and minor code restructurings or signature
changes to facilitate the CBMC proofs. For example, functions are
structured to make loop bounds and memory access patterns explicit for
formal verification.
- Introduction of 4x-batched versions of some functions from the
reference implementation. This is to leverage 4x-batched Keccak-f1600
implementations if present. The batching happens at the C level even if
no native backend for FIPS 202 is present.
- FIPS 204 compliance: Introduced optional PCT (FIPS 204, Section 4.4,
Pairwise Consistency) and zeroization of stack buffers as required by
(FIPS 204, Section 3.6.3, Destruction of intermediate values).
- Introduction of native backend implementations for AVX2. Those are
drop-in replacements for the corresponding C functions and dispatched at
compile-time. (Not in this PR, but the C code prep is in place).
- Restructuring of files to separate level-specific from level-generic
functionality. This is needed to enable a multi-level build of
mldsa-native where level-generic code is shared between levels.
- More pervasive use of value barriers to harden constant-time
primitives, even when Link-Time-Optimization (LTO) is enabled. The use
of LTO can lead to insecure compilation in case of the reference
implementation.

### License
mldsa-native (everything under `crypto/fipsmodule/ml_dsa/mldsa/**`) is
imported under the Apache 2.0 license and the ISC license. The LICENSE
file remains unchanged.

Integration-specific code (everything with direct parent
`crypto/fipsmodule/ml_dsa/*`) is made under the terms of the Apache 2.0
license and the ISC license.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
@justsmth justsmth mentioned this pull request Jan 21, 2026
justsmth added a commit that referenced this pull request Jan 22, 2026
### Description of changes: 
Prepare AWS-LC v1.67.0

#### What's Changed
* Migrate Wycheproof test vectors for ECDSA, RSA PKCS#1, and some more
by @sgmenda in #2887
* increase timeout for SDE tests by @sgmenda in
#2936
* Rename volatile state/memory to unique state/memory by @torben-hansen
in #2935
* Fix failing Windows Docker image build by @nhatnghiho in
#2931
* Service Indicator: Add error call trampoline to avoid delocator issue
by @jakemas in #2920
* Add support for Big Endian in ACVP tool by @samuel40791765 in
#2938
* AES-GCM: Add function pointer trampolines to avoid delocator issue by
@jakemas in #2919
* Use already defined macro for no inline by @torben-hansen in
#2942
* Remove Kyber completely by @torben-hansen in
#2941
* Windows 7 support by @justsmth in
#2940
* Import mldsa-native by @jakemas in
#2902
* Use existing session context if new is actually NULL by @torben-hansen
in #2946
* Integrate Wycheproof ML-KEM test vectors by @sgmenda in
#2891
* Avoid cross-compilation build failure by @justsmth in
#2944


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.