Skip to content

Reuse vault client/token when possible#1537

Merged
moolen merged 1 commit intoexternal-secrets:mainfrom
neggert:reuse-vault-token
Sep 30, 2022
Merged

Reuse vault client/token when possible#1537
moolen merged 1 commit intoexternal-secrets:mainfrom
neggert:reuse-vault-token

Conversation

@neggert
Copy link
Copy Markdown
Contributor

@neggert neggert commented Sep 8, 2022

This resolves #1273 by caching Vault clients and re-using existing tokens as long as they're valid. The approach is very similar to #1244, which implements something similar for the AWS provider. As in that PR, this new feature is gated behind a CLI flag: --experimental-enable-vault-token-cache. The only significant difference from the AWS implementation is that the cache key uses the SecretStore namespace rather than the secret namespace.

Also note that the token revocation implemented in #381 is disabled when token caching is enabled.I think this is the right thing to do. We're now solving the underlying "lease leak" problem by not creating multiple token leases in the first place, which removes the need to clean them up.

Other minor changes:

  • Fixed a couple of mis-named functions (setSecretKeyToken and setAppRoleToken function names were swapped from what they actually did)
  • Added some more mocking that was needed to get tests passing
  • Added logging when tokens are requested.

Is there any documentation that needs to be updated to reflect this change?

Copy link
Copy Markdown
Contributor

@paul-the-alien paul-the-alien bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greetings!
Thank you for contributing to this project!
If this is your first time contributing, please make
sure to read the Developer and Contributing Process guides.
Please also mind and follow our Code of Conduct.

Useful commands:

  • make fmt: Formats the code
  • make check-diff: Ensures the branch is clean
  • make reviewable: Ensures a PR is ready for review

@ghost
Copy link
Copy Markdown

ghost commented Sep 8, 2022

👇 Click on the image for a new way to code review
  • Make big changes easier — review code in small groups of related files

  • Know where to start — see the whole change at a glance

  • Take a code tour — explore the change with an interactive tour

  • Make comments and review — all fully sync’ed with github

    Try it now!

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map Legend

Copy link
Copy Markdown
Member

@moolen moolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for bringing this up 🙇!
I think this is a sensible approach! There are a couple of edge cases that i haven't see in the aws cache implementation that i just noticed reviewing your PR, see comments.

Fixed a couple of mis-named functions

Thank you, that bugged me in the past as well 😄!

Is there any documentation that needs to be updated to reflect this change?

We don't have a docs page yet that shows the different cli args available; In fact i'm in the process of restructuring the docs to add that page soon ™️. From my POV no need to add that.

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 14, 2022

/ok-to-test sha=7fdfa37

@neggert
Copy link
Copy Markdown
Contributor Author

neggert commented Sep 14, 2022

Okay, I've added the LRU cache. Right now the size is hard-coded at 100. What's the preferred way to allow users to set the value? A CLI parameter, or does the Vault schema need a new field?

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 14, 2022

Awesome, thanks! Yeah, CLI would make sense.

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 15, 2022

I think you need to run make reviewable to clean up go.mod / go.sum, see failing CI job

@neggert
Copy link
Copy Markdown
Contributor Author

neggert commented Sep 15, 2022

Ah, thanks. Will do.

One other question that just occurred to me. Does the cache need to be protected with a mutex? If I understand correctly, we could conceivably have multiple secrets that reference the same store being reconciled concurrently.

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 15, 2022

Does the cache need to be protected with a mutex?

Yes, it absolutely should be! Good catch! It depends on the --concurrent flag which defaults to 1 but it certainly must be protected :D (another thing to fix on the aws/auth package 👍 )

@neggert
Copy link
Copy Markdown
Contributor Author

neggert commented Sep 15, 2022

Alright. Things were getting a little messy, so I refactored the caching stuff into a struct in a separate file. The cache should be protected by a mutex now.

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 15, 2022

/ok-to-test sha=d6d0ae4

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 16, 2022

some e2e tests are failing (this job).

Summarizing 6 Failures:
  [FAIL] [vault] sync secrets [It] [common] should find secrets by name using .DataFrom[] with cert auth [vault]
  /home/runner/go/pkg/mod/github.com/external-secrets/external-secrets@v0.5.8/e2e/framework/testcase.go:96
  [FAIL] [vault] sync secrets [It] [common] should sync json secrets with template with cert auth [vault]
  /home/runner/go/pkg/mod/github.com/external-secrets/external-secrets@v0.5.8/e2e/framework/testcase.go:96
  [FAIL] [vault] sync secrets [It] [common] should sync secrets with dataFrom with cert auth [vault]
  /home/runner/go/pkg/mod/github.com/external-secrets/external-secrets@v0.5.8/e2e/framework/testcase.go:96
  [FAIL] [vault] sync secrets [It] [common] should sync multiple secrets from .Data[] with cert auth [vault]
  /home/runner/go/pkg/mod/github.com/external-secrets/external-secrets@v0.5.8/e2e/framework/testcase.go:96
  [FAIL] [vault] sync secrets [It] [common] should sync docker configurated json secrets with template with cert auth [vault]
  /home/runner/go/pkg/mod/github.com/external-secrets/external-secrets@v0.5.8/e2e/framework/testcase.go:96
  [FAIL] [vault] sync secrets [It] [common] should sync with empty target name, using json. with cert auth [vault]
  /home/runner/go/pkg/mod/github.com/external-secrets/external-secrets@v0.5.8/e2e/framework/testcase.go:96

Ran 124 of 217 Specs in 1071.694 seconds
FAIL! -- 118 Passed | 6 Failed | 1 Flaked | 0 Pending | 93 Skipped

You can run them locally (given you have a kind cluster running and your kubectl context points to it)

$ make test TEST_SUITES="provider" VERSION=e2e IMAGE_REGISTRY=local GINKGO_LABELS='vault'

I've also fixed an issue with Publish Image job failing, please rebase 🙏

@neggert
Copy link
Copy Markdown
Contributor Author

neggert commented Sep 16, 2022

Rebased.

I'm getting those same timeout errors when I run the e2e tests on main, so I'm not sure if they're related to this PR.

Comment on lines +863 to +874
tokenExists, err := checkToken(ctx, v.token)
if tokenExists {
v.log.V(1).Info("Re-using existing token")
return err
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The e2e tests on main seem to be fine. only a particular auth method (cert) is failing, everything is is good.

I assume this checkToken causes the cert-based authentication to fail. It seems that when using the token from the cert/auth/login requires the use of TLS with a client certificate. Tho i couldn't find that detail in the documentation 🤨.

Also, calling .lock() before .initialize() gives a hard error due c.cache and c.mu being nil.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the uninitialized pointer. Should be fixed by getting rid of the pointer. Now the field will initialize to an unlocked mutex.

I think I've fixed the cert tests. The e2e tests are a little flaky on my machine.

@neggert
Copy link
Copy Markdown
Contributor Author

neggert commented Sep 28, 2022

@moolen This should be ready to go.

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 28, 2022

Great, i'll take a look at it asap

Copy link
Copy Markdown
Member

@moolen moolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one issue, looks very good otherwise 🚀

if cachedClient.ResourceVersion == store.GetObjectMeta().ResourceVersion {
return cachedClient.Client, true, nil
}
// revoke token and clear old item from cache if resource has been updated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one issue: if we use a static token we must not revoke it.

cachedClient := value.(clientCacheValue)
if cachedClient.ResourceVersion == store.GetObjectMeta().ResourceVersion {
    return cachedClient.Client, true, nil
}
defer c.cache.Remove(key)
// do not revoke if we're having a static token from a secretRef
if store.GetSpec().Provider.Vault.Auth.TokenSecretRef != nil {
    return nil, false, nil
}
// revoke token and clear old item from cache if resource has been updated
err := revokeTokenIfValid(ctx, cachedClient.Client)
if err != nil {
    return nil, false, err
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went for a slightly different approach. Stores using a static token bypass the caching mechanism entirely. No reason to use up memory on these.

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 30, 2022

LGTM 👍! Could you please squash the commits into one and sign it?

The new functionality is controlled using the newly-introduced
--experimental-enable-vault-token-cache and
--experimental-vault-token-cache-size command-line flags.

Signed-off-by: NicEggert <nicholas.eggert@target.com>
@sonarqubecloud
Copy link
Copy Markdown

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@moolen
Copy link
Copy Markdown
Member

moolen commented Sep 30, 2022

/ok-to-test sha=a9db19a

Copy link
Copy Markdown
Member

@moolen moolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
Thank you so much for your contribution ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[HashiVault] - Reuse provider clients when possible

2 participants