Skip to content

Cache separate vault clients for each namespace if necessary#4706

Merged
gusfcarvalho merged 5 commits intoexternal-secrets:mainfrom
ChristianCiach:fix-vault-session-cache
May 13, 2025
Merged

Cache separate vault clients for each namespace if necessary#4706
gusfcarvalho merged 5 commits intoexternal-secrets:mainfrom
ChristianCiach:fix-vault-session-cache

Conversation

@ChristianCiach
Copy link
Copy Markdown
Contributor

@ChristianCiach ChristianCiach commented Apr 25, 2025

Problem Statement

When enabling --experimental-enable-vault-token-cache I encountered persisting 403 status errors when connecting to our Vault server. Please read the thread starting at https://github.com/external-secrets/external-secrets/discussions/3132#discussioncomment-12605322 to catch up with the discussion around this issue.

While debugging this, I realized that this issue only appears when using a ClusterSecretStore that references a namespaced auth-token where the namespace is not fixed, like this:

      auth:
        jwt:
          path: jwt-kube
          role: eso-default
          kubernetesServiceAccountToken:
            serviceAccountRef:
              name: default  # Every namespace has its own default service-account

^ Note that the serviceAccountRef does not state a fixed namespace. The namespace of the default service account is decided later, based on the namespace of the ExternalSecret that uses this ClusterSecretStore.

When configuring it like this, ESO will currently generate the same cache key for every namespace, because it currently uses the namespace of the ClusterSecretStore (which is obviously always the empty string), even though the auth-token needs to be different for every namespace that contains ExternalSecrets.

Thanks to @gusfcarvalho who encouraged me to create this PR.

Related Issue

No issue yet. See the linked discussion above. Please let be know if you absolutely need an issue for tracking this PR.

Proposed Changes

In pkg/provider/vault/provider.go I found the existing function isReferentSpec that I think does exactly what I needed to fix this: This function seems to return "true" in cases where the configured vault-auth is referencing an auth-token that is based on the namespace of the referencing ExternalSecret.

If this function returns true, I just use namespace of the requesting ExternalSecret object as part of the cache-key.

Checklist

  • I have read the contribution guidelines
  • All commits are signed with git commit --signoff
  • My changes have reasonable test coverage
  • All tests pass with make test
  • I ensured my PR is ready for review with make reviewable

@ChristianCiach ChristianCiach force-pushed the fix-vault-session-cache branch from e602ae4 to aa1f091 Compare April 25, 2025 15:11
@gusfcarvalho
Copy link
Copy Markdown
Member

this approach looks solid to me - plus all the checks are already passing 😄

@gusfcarvalho
Copy link
Copy Markdown
Member

/ok-to-test sha=aa1f0914de8fe8e828250d7ff82449a12e18b93d

@eso-service-account-app
Copy link
Copy Markdown
Contributor

@gusfcarvalho
Copy link
Copy Markdown
Member

I think this looks good @ChristianCiach . We just need to maybe improve the test coverage by adding a test case with a different namespace on ExternalSecret manifest and see if the method will actually cache that other info over the "default" one on the test you've refactored.

Then, this will look very good.

@ChristianCiach
Copy link
Copy Markdown
Contributor Author

We just need to maybe improve the test coverage

I will see what I can do. I still struggle with unit testing in Go, since it works very differently than the testing frameworks of Java and Python that I am familiar with. It may take a few days, but I will come back to you.

If you don't want to wait, feel free to add a test by yourself :)

@ChristianCiach ChristianCiach force-pushed the fix-vault-session-cache branch from 4f1497e to 25307c3 Compare April 29, 2025 14:12
@ChristianCiach
Copy link
Copy Markdown
Contributor Author

ChristianCiach commented Apr 29, 2025

Alright, this took a while and I struggled quite a bit.

At first I tried to integrate the the new tests into the existing test cases, but this was not really possible:

  • None of the existing unit tests are executed with caching enabled. In fact, there was no function available to enable caching for unit tests. So I extracted some cache initialization code from lateInit into separate functions.
  • The existing test cases are designed to to retrieve a vault-client exactly once per test case and then inspect the result. This is obviously not sufficient for testing a cache, so I added completely independent unit test functions.

Instead of configuring a Provider.Vault.Auth.Kubernetes.ServiceAccountRef I tried to configure a Provider.Vault.Auth.TokenSecretRef at first and I got confused when the caching did not work in this case. As it turned out, caching was explicitly disabled by the existing implementation when using a static TokenSecretRef. I've removed this exception, because it is confusing and I see absolutely no valid reason to disable caching in this case. Even if the token does never change, caching of the vault-client can still be beneficial for re-using http connections, avoiding rate limiting and saving performance on client initialization.

There was also a "condition is always false" warning when comparing an err to nil, so I've removed these lines.

@ChristianCiach ChristianCiach marked this pull request as ready for review April 30, 2025 13:42
@ChristianCiach ChristianCiach requested a review from a team as a code owner April 30, 2025 13:42
@gusfcarvalho
Copy link
Copy Markdown
Member

gusfcarvalho commented May 8, 2025

Instead of configuring a Provider.Vault.Auth.Kubernetes.ServiceAccountRef I tried to configure a Provider.Vault.Auth.TokenSecretRef at first and I got confused when the caching did not work in this case. As it turned out, caching was explicitly disabled by the existing implementation when using a static TokenSecretRef. I've removed this exception, because it is confusing and I see absolutely no valid reason to disable caching in this case. Even if the token does never change, caching of the vault-client can still be beneficial for re-using http connections, avoiding rate limiting and saving performance on client initialization.

this skip adding tokens to cache was done for a reason. When users are sending out tokens to external-secrets, we are considering them to be possible batch tokens (subject to being revoked), and we shouldn't absolutely do that.

See #1537 (comment) for more context.

@ChristianCiach ChristianCiach force-pushed the fix-vault-session-cache branch from 1c96dfa to c762836 Compare May 13, 2025 08:13
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
Signed-off-by: Christian Ciach <christian.ciach@gmail.com>
@ChristianCiach ChristianCiach force-pushed the fix-vault-session-cache branch from c762836 to 9b874e6 Compare May 13, 2025 08:14
@ChristianCiach
Copy link
Copy Markdown
Contributor Author

ChristianCiach commented May 13, 2025

we are considering them to be possible batch tokens (subject to being revoked)

I think you've got this backwards. Batch-tokens cannot be revoked (they don't have an associated session in Vault, which is the point). You are probably talking about service-tokens. I agree that these shouldn't be revoked, but IMHO that's not a very good argument to not cache the vault client.

Anyway, I don't really care either way, so I've removed this commit from this PR :)

@gusfcarvalho
Copy link
Copy Markdown
Member

/ok-to-test sha=9b874e6ed80fa5a42b638f58431d44a42939587c

@gusfcarvalho
Copy link
Copy Markdown
Member

Looks good to me :) just waiting to see if CI/CD & e2e tests pass and this is good to go IMO :)

I like the small tweaks you did to make the codebase better! much more maintainable now, thanks!

@eso-service-account-app
Copy link
Copy Markdown
Contributor

@gusfcarvalho
Copy link
Copy Markdown
Member

thanks for your contribution @ChristianCiach ! 🥳

@gusfcarvalho gusfcarvalho enabled auto-merge (squash) May 13, 2025 13:54
@gusfcarvalho gusfcarvalho merged commit e89497f into external-secrets:main May 13, 2025
1 check passed
@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants