fix(oracle): prevent TCP connections leakage#5491
fix(oracle): prevent TCP connections leakage#5491moolen merged 8 commits intoexternal-secrets:mainfrom
Conversation
Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org>
8382ccd to
d4a3707
Compare
Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org>
Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org>
pkg/provider/oracle/oracle.go
Outdated
| // Caching by store name as a key is sufficient, as each VaultManagementService instance is tied to a single store. | ||
| _, ok := vms.authConfigurationsCache[store.GetName()] | ||
| if !ok { | ||
| vms.authConfigurationsCache[store.GetName()], err = auth.OkeWorkloadIdentityConfigurationProviderWithServiceAccountTokenProvider(tokenProvider) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| } | ||
|
|
||
| return vms.authConfigurationsCache[store.GetName()], nil |
There was a problem hiding this comment.
I might be wrong, but if a SecretStore /ClusterSecretStore changes configuration (e.g. due to user typo), this would never be reflected and updated in this cache, right?
There was a problem hiding this comment.
Also, if SecretStores/ClusterSecretStores have the same name, you'll have even a potentially worse situation 🤔 (cache conflicts)
There was a problem hiding this comment.
You're right, the store is esv1.GenericStore. I was focused only on ClusterSecretStore which is cluster-level object. I'll use a different key for that, probably UID.
There was a problem hiding this comment.
Used UID as caching key instead. Thank you for flagging.
There was a problem hiding this comment.
Sorry, I double checked and realized that UID's are not changed when users make any changes to the manifests. ResourceVersion - does. So caching by store.GetResourceVersion() key is safer and addresses the first question you raised.
Changed the key to store.GetResourceVersion() still.
Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org>
97662f7 to
9fdc926
Compare
Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org>
Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org>
|
The way this bug is described, it seems like a bug in the OCI Go SDK where the client is not properly cleaned up by GC. @adutchak have you raised a similar issue on that project? |
| compartment string | ||
| encryptionKey string | ||
| workloadIdentityMutex sync.Mutex | ||
| authConfigurationsCache map[string]auth.ConfigurationProviderWithClaimAccess |
There was a problem hiding this comment.
It appears this map will grow in size over the lifetime of the operator. Perhaps not enough to make a difference, but "old" store resource versions will accumulate clients over time.
There was a problem hiding this comment.
Yes, the map size will be (SecretStores_N * ResourceVersions_N) + (ClusterSecretStores_N * ResourceVersions_N).
I can implement a cache reset as soon as the map’s key-value count reaches, let’s say, 50 items.
There was a problem hiding this comment.
9aafe56
50 is still some sort of magic number for me, but it sets at least some boundaries. For more bigger environments, it won't let it over-consuming the memory.
No, but this is good point. I will open an issue at the SDK repository too. |
|
Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org>
|
* fix(oracle): cache ServiceAccountTokenProvider per clustersecretstore Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> * rename OKETokenProvider property Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> * rename tokenProviders property Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> * using UID for caching key Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> * Using GetResourceVersion() as caching key Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> * Removing unnecessary string conversion Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> * Introducing cache reset with authConfigurationsCachePoolSize Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> --------- Signed-off-by: Anatoliy Dutchak <anatoliy.dutchak@avalabs.org> Signed-off-by: Samuel Molling <samuelmolling@gmail.com>



Problem Statement
Prevents creating new TCP connection for the each secret reconciliation in Oracle OCI\OKE environment when using Workload principal type of authentication.
Related Issue
Fixes #5490
Proposed Changes
How do you like to solve the issue and why?
It turned out that each call to
auth.OkeWorkloadIdentityConfigurationProviderWithServiceAccountTokenProvider()(invoked byNewClient()) opens a new TCP connection in the OKE environment and never closes it. The SDK also does not expose any way to close the connection, making it impossible to close it later.The issue is resolved by saving an
auth.ConfigurationProviderWithClaimAccessinstance for eachClusterSecretStorein theVaultManagementServicestructure, so that whenever a new call to the NewClient() function occurs upstream, we either create the configuration provider if it’s missing or return the previously saved one.Format
Please ensure that your PR follows the following format for the title:
Where
scopeis optionally one of:Checklist
git commit --signoffmake testmake reviewable