Skip to content

refactor(aws-kms-tls-auth): psk provider using HMAC psks#5530

Merged
jmayclin merged 5 commits intoaws:mainfrom
jmayclin:2025-09-25-psk-provider
Oct 10, 2025
Merged

refactor(aws-kms-tls-auth): psk provider using HMAC psks#5530
jmayclin merged 5 commits intoaws:mainfrom
jmayclin:2025-09-25-psk-provider

Conversation

@jmayclin
Copy link
Copy Markdown
Contributor

Description of changes:

This refactors the PSK Provider to use the HMAC based PSK derivation.

Testing:

Let me know folks thoughts on testing 🤔

Testing these sorts of timed rotation things is always a little bit unpleasant.

I ended up going for the "shallow" approach of refactoring all of the business logic into a "poll" method that I can call in tests. The more involved method would be to include a generic Clock parameter/trait for the PskProvider, but that is gonna make the struct a bit uglier.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

.find(|secret| secret.key_epoch == current_epoch)
.cloned();

if needs_rotation && rotation_key.is_some() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If needs_rotation is true, and rotation_key.is_some() is false, this would be bad, right? Do we want to log anything when this happens or something?

Though I guess you'd already know this was going to happen since your failure notification would be invoked?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, the failure notification handles that, but might as well be helpful and log something.

.await
.unwrap_err();

assert_eq!(error.to_string(), "service error");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does "service error" come from? It doesn't seem very helpful.

jmayclin and others added 3 commits October 1, 2025 18:35
Co-authored-by: Sam Clark <3758302+goatgoose@users.noreply.github.com>
Co-authored-by: Lindsay Stewart <stewart.r.lindsay@gmail.com>
* add log statement for failed rotation
* add test for best effort key usage
* document that the server is responsible for lifetime enforcement
* move smoothing factor into ProviderSecrets
@jmayclin jmayclin requested a review from lrstewart October 2, 2025 18:30
Comment on lines +143 to +147
None => {
// the next epoch has already started. This might be the case if
// the fetch happened late in the epoch and had high latency.
ONE_HOUR
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just set the key for current_epoch, not (current_epoch + 1). If we wait an hour, won't we keep using current_epoch's key for (current_epoch + 1)? Shouldn't we rotate again immediately instead? Or am I missing something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really missing anything.

This isn't really a problem because we actually allow 48 hours of key lifetime on the server side.

You are correct that we could immediately rotate. I was just being paranoid because I didn't like the "hot loop" aspect of it, but I think you're right that I was being a bit too paranoid. Fixed in the next revision.

Comment on lines +158 to +159
.retain(|secret| secret.key_epoch > current_epoch);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be >= instead? Is it safe to delete the current key, or could that interact in unexpected ways with other logic?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, it's safe to delete the "current" key because it's stored in self.current_key.

* immediately poll if the next epoch has started
@jmayclin jmayclin requested a review from lrstewart October 7, 2025 21:36
.build();
let kms_client = mock_client!(aws_sdk_kms, [&rule]);

// we successfully initialize, resulting in 3 KMS calls
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't chain KMS calls together? Like each request has to be a separate network call? Seems like kind of a shame.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally, they are "chained" together. iirc, the SDK will maintain a connection pool, so there should only be one TLS handshake/TCP connection for all of these requests.

connection: &mut s2n_tls::connection::Connection,
) -> Result<Option<Pin<Box<dyn ConnectionFuture>>>, s2n_tls::error::Error> {
let psk = self.secret_state.current_secret().new_connection_psk()?;
connection.append_psk(&psk)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the design was that the client was going to send multiple PSKs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, but we'll need some additional s2n-tls features to support the "closed-ish loop" rotation.

@jmayclin jmayclin requested a review from maddeleine October 10, 2025 00:15
@jmayclin jmayclin added this pull request to the merge queue Oct 10, 2025
Merged via the queue into aws:main with commit 1774b46 Oct 10, 2025
50 checks passed
@jmayclin jmayclin deleted the 2025-09-25-psk-provider branch October 10, 2025 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants