ML-KEM refactor to support all three variants and some code cleanup#26054
ML-KEM refactor to support all three variants and some code cleanup#26054vdukhovni wants to merge 6 commits intoopenssl:feature/ml-kemfrom
Conversation
067b124 to
bbf67b8
Compare
bbf67b8 to
0db998e
Compare
0db998e to
16a9fa1
Compare
16a9fa1 to
04b3879
Compare
04b3879 to
83b27f5
Compare
vdukhovni
left a comment
There was a problem hiding this comment.
Thanks for the feedback. Any idea why the CI bot keeps wanting the PR as affecting FIPS. I don't see any changes that go into the FIPS module.
|
The KAT tests for ML-KEM-512 and ML-KEM-1024 are I think on @baentsch's plate. There should soon be another PR that is based on this one, and integrates the algorithms into the provider and adds KAT tests. Michael and I are on the hook for docs, I think I'm doing reviews and polish, but I'll check whether I'm also authoring any. |
6bffadd to
3ab8817
Compare
|
The style guide section 5 says upper case for type names. Many such here are not. |
| * First precomputed array from Appendix A of FIPS 203, or else Python: | ||
| * kNTTRoots = [pow(17, bitreverse(i), p) for i in range(128)] | ||
| */ | ||
| static const uint16_t kNTTRoots[128] = { |
There was a problem hiding this comment.
The style guide section 4 advises against mixed case names and type based naming.
The names here are readable enough so I guess this isn't a mandatory change.
There was a problem hiding this comment.
This is inherited from Google's C/C++ internal style guide, and so not surprising in code contributed from BoringSSL. If this is important to change, that's certainly an option. The names in question are file-local.
| * SHA3-256, then |outlen| must be the correct length for that function. | ||
| */ | ||
| static __owur | ||
| int single_keccak(uint8_t *out, size_t outlen, const uint8_t *in, size_t inlen, |
There was a problem hiding this comment.
The style guide section 8 recommends DOxygen comments for functions. This was specifically about internal functions -- public ones are documented in POD files.
There was a problem hiding this comment.
How internal did you have in mind here? These are static internal, so never seen outside the file in question.
| * Rank mismatch should not happen, distinct ML-KEM variants have separate | ||
| * dispatch methods. | ||
| */ | ||
| if (v1->rank != v2->rank) |
There was a problem hiding this comment.
I am now comparing public key SHA3-256 hashes, which are the same size across all variants. So any rank mismatch will also be detected, barring catastrophic failure of collision resistance in SHA3-256.
The more interesting question here is whether two "bare" keys (with just the "vinfo" and hash handles, no private or public key material as-yet generated or imported) are to be considered "equal". The ML-KEM implementation returns no, which makes sense to me, like SQL NULL values, we simply don't know whether the keys are really intended to be the same, and "no" seems like the safer answer.
No tests seem to expect equality of "bare" keys...
| * https://www.openssl.org/source/license.html | ||
| */ | ||
|
|
||
| /* Copyright (c) 2024, Google Inc. */ |
There was a problem hiding this comment.
Whatever we decide to do with attributing for google, it probably needs to be consistent across files.
slontis
left a comment
There was a problem hiding this comment.
It seems that the main advantage of templating (macroing) the code is that it avoids having to do a few mallocs. The cost is reduced readability, in what could be argued is already complex enough (considering that there is not really that much base code).
I see little reason to do this for the ML_DSA case which has quite a few more functions.
Storing a matrix and vector as just pointers to a scalar and then having init functions that sets the scalar ptr is much simpler.
When a key is allocated it mallocs enough space for all its vectors and matrix scalars and then just sets the scalar pointer to point into the malloced blob.
The same thing can be done for temporary stack vectors and matricies by either allocating a scalar blob and/or having a stack based scalar blob and then setting the temp scalar pointers to point to the blob.
Done in #26172 |
In #26172, none remain that are not file-local. Just two are exported by
So progress has been made along the lines you suggest, but I am reluctant to avoid short names that private to a given module. |
|
Superseded by #26172 |
Multi-variant ML-KEM
This introduces support for ML-KEM-512 and ML-KEM-1024 using the same
underlying implementation parameterised by a few macros for the
associated types and constants.
KAT tests are added for ML-KEM 512 and 1024, to complement the previous
tests for ML-KEM-768.
MLKEM TLS group codepoints are updated to make the additional KEMs
known to the TLS layer (ML-KEM is not enabled by default).
Most of the underlying code is shared, but the external
generate/encap/decap API is variant-specific, and uses
some implicit buffer sizes and dedicated key types.
Variant-specific) code is compiled separately for each of ML-KEM-512,
ML-KEM-768 and ML-KEM-1024 from a common source file, which contains
short wrappers for the shared code.
There is an additional API that takes an "const ossl_ml_kem_vinfo"
pointer, (obtained from ossl_ml_kem_*_get_vinfo()). This checks the
otherwise implicit sizes and uses (void *) key pointers, converting them
to the appropriate type before calling the right variant-specific API.
This is used to simplify the code in the provider, which other than
initialising the right variant parameters is then share across all the
variants.
The original BoringSSL code was refactored to eliminate the opaque
public/private key structure wrappers, since these structures will be
private to the provider implementation, and are not part of the public
API.
Support for ML-KEM-512 required adding a centered binomial distribution
helper function to deal with η_1 == 3 in that algorithm.
There were some unrelated literal 32's and 64's in the code, these were
replaced by the appropriate macros.
Some additional comments were added to highlight how the code relates to
the ML-KEM specification in FIPS 203.
Some tests were added to excercise also the non-derandomised ML-KEM
APIs, but so far the "vinfo" driven functions above the provider level,
the provider-layer (end-to-end) non-derandomised tests are yet to be
written, but should be fairly tractable.