Skip to content

Update s2n to latest revision of PQ Hybrid TLS 1.3 Draft RFC#3800

Merged
maddeleine merged 10 commits intoaws:mainfrom
alexw91:pq_latest_draft_revision_wip
Mar 2, 2023
Merged

Update s2n to latest revision of PQ Hybrid TLS 1.3 Draft RFC#3800
maddeleine merged 10 commits intoaws:mainfrom
alexw91:pq_latest_draft_revision_wip

Conversation

@alexw91
Copy link
Copy Markdown
Contributor

@alexw91 alexw91 commented Feb 1, 2023

Resolved issues:

N/A

Description of changes:

s2n currently implements Draft 0 of Hybrid key exchange in TLS 1.3 Draft RFC. More recent drafts of this RFC have made minor tweaks to the wire format of PQ Hybrid KeyShares that are sent during the TLS Handshake, notably whether internal KeyShare values should be length prefixed. The original format for a PQ Hybrid KeyShare was (Total Length, ECC Length, ECC Share, PQ Length, PQ Share), while in later drafts the format has become (Total Length, ECC Share, PQ Share). The removal of PQ Length and ECC Length shortens the Total Length of the PQ Hybrid KeyShare by 4 bytes, since each length prefix was 2 bytes each.

s2n also implements a different Draft RFC for Hybrid PQ TLS 1.2, which used the original length-prefixed format for TLS 1.2 Hybrid KeyShares, and has not changed. So to keep s2n interoperable with both draft RFC's, a new len_prefixed boolean variable has been added and tracked within the KEM Parameters struct in every connection to determine whether to use the old PQ Hybrid KeyShare format or the new one.

Previously existing s2n TLS Security Policies with PQ KEMs will not be changed, and will continue to send Hybrid KeyShares using the old format so as to not break compatibility with existing AWS TLS server endpoints that are only compatible with the old format. Once existing TLS service endpoints have been updated to support receiving ClientHellos in both the old and the new format, support for s2n client sending PQ Hybrid KeyShares using the old format can be removed from s2n.

Call-outs:

None.

Testing:

  1. New s2n unit tests that test all combinations of old and new Hybrid KeyShare formats.
  2. New integration tests with LibOQS's Openssl fork
  3. Manual integration testing with LibOQS's Openssl fork, with s2n as both client and server.
  4. New Fuzz testing with len_prefixed enabled and disabled.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from 2117f0a to 1ce0504 Compare February 1, 2023 00:35
@alexw91 alexw91 marked this pull request as ready for review February 1, 2023 05:37
@alexw91 alexw91 requested a review from dougch as a code owner February 1, 2023 05:37
@alexw91 alexw91 requested a review from maddeleine February 2, 2023 22:39
Copy link
Copy Markdown
Contributor

@maddeleine maddeleine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with either of these PQ RFCs. Why are we able to remove the length prefixes? Are the key shares always the same size?

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from 991dea3 to 1f54352 Compare February 4, 2023 00:35
Copy link
Copy Markdown
Contributor

@lrstewart lrstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary because some implementations aren't supporting both formats / maintaining backwards compatibility? Since that would be simpler, is that an option for us too?

{
/* Carefully consider any increases to this number. */
const uint16_t max_connection_size = 4250;
const uint16_t max_connection_size = 4264;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this bump coming from? I see one new bool on s2n_kem, is there any other new memory?

Copy link
Copy Markdown
Contributor Author

@alexw91 alexw91 Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct s2n_connection; contains one member value of struct s2n_kex_parameters, which contains two seperate struct s2n_kem_group_params;, one for the server-side and a 2nd for the client-side:

s2n-tls/tls/s2n_crypto.h

Lines 37 to 38 in ecedd2b

struct s2n_kem_group_params server_kem_group_params;
struct s2n_kem_group_params client_kem_group_params;

So theoretically, my change should increase the size of an s2n_connection by at least 2 bits. Since there aren't any other bitfield members available in the structs that I added that bit to, each new bit is rounded up to 1 whole byte. Then on top of that, each struct member is usually required to be native-word aligned so that it begins on a (64-bit/8-byte) boundary, so if that 1-byte of extra memory in server_kem_group_params and client_kem_group_params crosses over into the next word, each struct gets padded out to 8 bytes, resulting in a ~16-byte increase to the size of s2n_connection (assuming that the actual hard limit of max_connection_size was 4248 instead of 4250 before this change).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a (very small, granted) argument for tracking len_prefixed once on the connection, where there's already a bitfield.

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch 2 times, most recently from 8ba3b77 to a252a12 Compare February 7, 2023 00:35
Copy link
Copy Markdown
Contributor

@bbutch bbutch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took one pass at it; posted just a few high level comments. The overall implementation of the latest draft spec looks correct. I'm still digesting and thinking about some things so I will take another pass tomorrow and probably have more feedback.

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch 8 times, most recently from 4985ddf to bc872c5 Compare February 16, 2023 17:32
@alexw91
Copy link
Copy Markdown
Contributor Author

alexw91 commented Feb 16, 2023

Is this necessary because some implementations aren't supporting both formats / maintaining backwards compatibility?

Yes, LibOQS maintains a fork of Openssl with Hybrid KEMs added, but it's meant for interoperability testing, and there's no backwards compatibility guarantees. They updated their fork to the new format without keeping support for the old format.

Since that would be simpler, is that an option for us too?

We could do the same, but the client SDK's and AWS services consume s2n updates at different cadences. If we did a hard migration in s2n, without temporarily supporting both options, then the likely result would be that the SDK's would pick up this s2n change first, and the PQ TLS feature would be broken until the AWS services deployed the corresponding updates, which might be a few months. We'd strongly prefer not to un-launch this feature, even if it's temporary.

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from bc872c5 to bf4111a Compare February 17, 2023 03:41
Copy link
Copy Markdown
Contributor

@lrstewart lrstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of changes here, so I'll probably need to take another pass later.

Comment on lines +71 to +74
/* Do not GUARD this call to s2n_kem_generate_keypair(), as it will fail when attempting to generate the KEM
* key pair if !s2n_pq_is_enabled(). However, even if it does fail, it will have allocated zeroed memory for the
* public and private keys needed for setup_connection() to complete. */
s2n_result result = s2n_kem_generate_keypair(param);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wasn't this true before? Is this a behavior change?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
If PQ is disabled, why run this test at all?

Copy link
Copy Markdown
Contributor Author

@alexw91 alexw91 Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These code changes are keeping the previously existing behavior of the fuzz test, but fixes the initialization function to successfully initialize both Kyber parameters instead of only the first one.

Previously when running this fuzz test with Openssl-1.0.2-fips (where PQ is automatically disabled), s2n_kem_generate_keypair() would allocate zeroed memory for the Kyber key pairs but eventually fail with S2N_ERR_PQ_DISABLED and return S2N_FAILURE from s2n_fuzz_init(). This non-zero return code from the fuzz init function does NOT cause libfuzzer to skip running s2n_fuzz_test(), and libfuzzer will still fuzz the s2n_client_key_recv() function. s2n_client_key_recv() will call into both the ECDHE and PQ code, but with a zeroed Kyber key-pair. Since s2n_pq_is_enabled() will return false in this mode, if s2n_client_key_recv() gets far enough to call into Kyber's decapsulate method, it will also immediately fail with S2N_ERR_PQ_DISABLED, and the zeroed Kyber key-pair will never actually be used by s2n_client_key_recv().

So when PQ is disabled this fuzz test is still fuzzing the ECDHE half of the hybrid key share, and if the ECDHE half is successful, the PQ portion is guaranteed to eventually fail with S2N_ERR_PQ_DISABLED.

If we attempt to skip this test altogether by adding if (!s2n_pq_is_enabled()) { return S2N_SUCCESS; } to the first line of s2n_fuzz_test(), it will trigger our fuzz test safety checks that ensures each fuzz test covers a minimum number of branches, and will actually cause the test to start failing instead of being skipped. Ideally, long term, we should have a better way to skip running fuzz tests based on runtime checks, but for now, keeping this fuzz tests behavior as-is still provides some value due to it still fuzzing the ECDHE half when PQ is disabled, and fuzzing both halves in all other configurations.

Copy link
Copy Markdown
Contributor

@bbutch bbutch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks pretty good to me; left a few minor comments.

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from 7ac3949 to 3001271 Compare February 24, 2023 00:41
@alexw91 alexw91 requested a review from lrstewart February 24, 2023 21:48
@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from e86868d to 635c44a Compare February 24, 2023 21:48
@bbutch
Copy link
Copy Markdown
Contributor

bbutch commented Feb 26, 2023

On latest pass, just looking for clarification on the question lrstewart asked. Other than that, we should expect a new draft revision in the next day or two :)

Edit to add: see latest review below

Copy link
Copy Markdown
Contributor

@bbutch bbutch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thought...

@bbutch
Copy link
Copy Markdown
Contributor

bbutch commented Feb 27, 2023

Other than that, we should expect a new draft revision in the next day or two

https://datatracker.ietf.org/doc/html/draft-ietf-tls-hybrid-design-06 The diff between drafts 5 and 6 is negligible, and doesn't affect this PR at all.

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch 2 times, most recently from 6276c17 to 8eeb324 Compare February 28, 2023 19:48
{
/* Carefully consider any increases to this number. */
const uint16_t max_connection_size = 4250;
const uint16_t max_connection_size = 4264;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not a fan of the unnecessary memory increase. Small increases have big impacts for customers with a lot of connections, and this change shouldn't actually need any extra memory.

@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from 8eeb324 to a42ccb2 Compare March 1, 2023 18:32
@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from a42ccb2 to b7e324f Compare March 2, 2023 00:23
@alexw91 alexw91 force-pushed the pq_latest_draft_revision_wip branch from b7e324f to 80ac4e3 Compare March 2, 2023 05:46
@maddeleine maddeleine merged commit 7640009 into aws:main Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants