refactor(aws-kms-tls-auth): psk provider using HMAC psks#5530
refactor(aws-kms-tls-auth): psk provider using HMAC psks#5530
Conversation
| .find(|secret| secret.key_epoch == current_epoch) | ||
| .cloned(); | ||
|
|
||
| if needs_rotation && rotation_key.is_some() { |
There was a problem hiding this comment.
If needs_rotation is true, and rotation_key.is_some() is false, this would be bad, right? Do we want to log anything when this happens or something?
Though I guess you'd already know this was going to happen since your failure notification would be invoked?
There was a problem hiding this comment.
Ya, the failure notification handles that, but might as well be helpful and log something.
| .await | ||
| .unwrap_err(); | ||
|
|
||
| assert_eq!(error.to_string(), "service error"); |
There was a problem hiding this comment.
Where does "service error" come from? It doesn't seem very helpful.
Co-authored-by: Sam Clark <3758302+goatgoose@users.noreply.github.com>
Co-authored-by: Lindsay Stewart <stewart.r.lindsay@gmail.com>
* add log statement for failed rotation * add test for best effort key usage * document that the server is responsible for lifetime enforcement * move smoothing factor into ProviderSecrets
| None => { | ||
| // the next epoch has already started. This might be the case if | ||
| // the fetch happened late in the epoch and had high latency. | ||
| ONE_HOUR | ||
| } |
There was a problem hiding this comment.
We just set the key for current_epoch, not (current_epoch + 1). If we wait an hour, won't we keep using current_epoch's key for (current_epoch + 1)? Shouldn't we rotate again immediately instead? Or am I missing something?
There was a problem hiding this comment.
Not really missing anything.
This isn't really a problem because we actually allow 48 hours of key lifetime on the server side.
You are correct that we could immediately rotate. I was just being paranoid because I didn't like the "hot loop" aspect of it, but I think you're right that I was being a bit too paranoid. Fixed in the next revision.
| .retain(|secret| secret.key_epoch > current_epoch); | ||
| } |
There was a problem hiding this comment.
Should this be >= instead? Is it safe to delete the current key, or could that interact in unexpected ways with other logic?
There was a problem hiding this comment.
Yup, it's safe to delete the "current" key because it's stored in self.current_key.
* immediately poll if the next epoch has started
| .build(); | ||
| let kms_client = mock_client!(aws_sdk_kms, [&rule]); | ||
|
|
||
| // we successfully initialize, resulting in 3 KMS calls |
There was a problem hiding this comment.
You can't chain KMS calls together? Like each request has to be a separate network call? Seems like kind of a shame.
There was a problem hiding this comment.
Internally, they are "chained" together. iirc, the SDK will maintain a connection pool, so there should only be one TLS handshake/TCP connection for all of these requests.
| connection: &mut s2n_tls::connection::Connection, | ||
| ) -> Result<Option<Pin<Box<dyn ConnectionFuture>>>, s2n_tls::error::Error> { | ||
| let psk = self.secret_state.current_secret().new_connection_psk()?; | ||
| connection.append_psk(&psk)?; |
There was a problem hiding this comment.
I thought the design was that the client was going to send multiple PSKs?
There was a problem hiding this comment.
Discussed offline, but we'll need some additional s2n-tls features to support the "closed-ish loop" rotation.
Description of changes:
This refactors the PSK Provider to use the HMAC based PSK derivation.
Testing:
Let me know folks thoughts on testing 🤔
Testing these sorts of timed rotation things is always a little bit unpleasant.
I ended up going for the "shallow" approach of refactoring all of the business logic into a "poll" method that I can call in tests. The more involved method would be to include a generic
Clockparameter/trait for the PskProvider, but that is gonna make the struct a bit uglier.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.