Reuse vault client/token when possible#1537
Conversation
There was a problem hiding this comment.
Greetings!
Thank you for contributing to this project!
If this is your first time contributing, please make
sure to read the Developer and Contributing Process guides.
Please also mind and follow our Code of Conduct.
Useful commands:
make fmt: Formats the codemake check-diff: Ensures the branch is cleanmake reviewable: Ensures a PR is ready for review
👇 Click on the image for a new way to code review
Legend |
1e63e95 to
9b05163
Compare
moolen
left a comment
There was a problem hiding this comment.
Hey, thanks for bringing this up 🙇!
I think this is a sensible approach! There are a couple of edge cases that i haven't see in the aws cache implementation that i just noticed reviewing your PR, see comments.
Fixed a couple of mis-named functions
Thank you, that bugged me in the past as well 😄!
Is there any documentation that needs to be updated to reflect this change?
We don't have a docs page yet that shows the different cli args available; In fact i'm in the process of restructuring the docs to add that page soon ™️. From my POV no need to add that.
e0d6abf to
f8b534b
Compare
|
/ok-to-test sha=7fdfa37 |
|
Okay, I've added the LRU cache. Right now the size is hard-coded at 100. What's the preferred way to allow users to set the value? A CLI parameter, or does the Vault schema need a new field? |
|
Awesome, thanks! Yeah, CLI would make sense. |
|
I think you need to run |
|
Ah, thanks. Will do. One other question that just occurred to me. Does the cache need to be protected with a mutex? If I understand correctly, we could conceivably have multiple secrets that reference the same store being reconciled concurrently. |
Yes, it absolutely should be! Good catch! It depends on the |
|
Alright. Things were getting a little messy, so I refactored the caching stuff into a struct in a separate file. The cache should be protected by a mutex now. |
|
/ok-to-test sha=d6d0ae4 |
|
some e2e tests are failing (this job). You can run them locally (given you have a kind cluster running and your kubectl context points to it) I've also fixed an issue with Publish Image job failing, please rebase 🙏 |
d6d0ae4 to
148eb39
Compare
|
Rebased. I'm getting those same timeout errors when I run the e2e tests on |
pkg/provider/vault/vault.go
Outdated
| tokenExists, err := checkToken(ctx, v.token) | ||
| if tokenExists { | ||
| v.log.V(1).Info("Re-using existing token") | ||
| return err | ||
| } |
There was a problem hiding this comment.
The e2e tests on main seem to be fine. only a particular auth method (cert) is failing, everything is is good.
I assume this checkToken causes the cert-based authentication to fail. It seems that when using the token from the cert/auth/login requires the use of TLS with a client certificate. Tho i couldn't find that detail in the documentation 🤨.
Also, calling .lock() before .initialize() gives a hard error due c.cache and c.mu being nil.
There was a problem hiding this comment.
Good catch on the uninitialized pointer. Should be fixed by getting rid of the pointer. Now the field will initialize to an unlocked mutex.
I think I've fixed the cert tests. The e2e tests are a little flaky on my machine.
ef7e151 to
0cbff26
Compare
|
@moolen This should be ready to go. |
|
Great, i'll take a look at it asap |
moolen
left a comment
There was a problem hiding this comment.
Found one issue, looks very good otherwise 🚀
| if cachedClient.ResourceVersion == store.GetObjectMeta().ResourceVersion { | ||
| return cachedClient.Client, true, nil | ||
| } | ||
| // revoke token and clear old item from cache if resource has been updated |
There was a problem hiding this comment.
I found one issue: if we use a static token we must not revoke it.
cachedClient := value.(clientCacheValue)
if cachedClient.ResourceVersion == store.GetObjectMeta().ResourceVersion {
return cachedClient.Client, true, nil
}
defer c.cache.Remove(key)
// do not revoke if we're having a static token from a secretRef
if store.GetSpec().Provider.Vault.Auth.TokenSecretRef != nil {
return nil, false, nil
}
// revoke token and clear old item from cache if resource has been updated
err := revokeTokenIfValid(ctx, cachedClient.Client)
if err != nil {
return nil, false, err
}There was a problem hiding this comment.
Went for a slightly different approach. Stores using a static token bypass the caching mechanism entirely. No reason to use up memory on these.
|
LGTM 👍! Could you please squash the commits into one and sign it? |
The new functionality is controlled using the newly-introduced --experimental-enable-vault-token-cache and --experimental-vault-token-cache-size command-line flags. Signed-off-by: NicEggert <nicholas.eggert@target.com>
d568b3d to
a9db19a
Compare
|
Kudos, SonarCloud Quality Gate passed!
|
|
/ok-to-test sha=a9db19a |
moolen
left a comment
There was a problem hiding this comment.
LGTM 👍
Thank you so much for your contribution ❤️









This resolves #1273 by caching Vault clients and re-using existing tokens as long as they're valid. The approach is very similar to #1244, which implements something similar for the AWS provider. As in that PR, this new feature is gated behind a CLI flag:
--experimental-enable-vault-token-cache. The only significant difference from the AWS implementation is that the cache key uses the SecretStore namespace rather than the secret namespace.Also note that the token revocation implemented in #381 is disabled when token caching is enabled.I think this is the right thing to do. We're now solving the underlying "lease leak" problem by not creating multiple token leases in the first place, which removes the need to clean them up.
Other minor changes:
setSecretKeyTokenandsetAppRoleTokenfunction names were swapped from what they actually did)Is there any documentation that needs to be updated to reflect this change?