Fix memory leak due to misuse of K8s clients#198
Merged
Conversation
Contributor
Author
|
@joelanford sorry, I forgot to explicitly assign a reviewer at PR creation time. |
Pull Request Test Coverage Report for Build 3049010052
💛 - Coveralls |
misberner
pushed a commit
to misberner/operator-sdk
that referenced
this pull request
Sep 8, 2022
…ions. See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes. Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
2 tasks
misberner
added a commit
to misberner/operator-sdk
that referenced
this pull request
Sep 8, 2022
…ions. See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes. Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
Member
|
Beautiful! Thanks for finding and fixing this! /approve |
a4ff5a6 to
6e4d7e6
Compare
Contributor
Author
|
@joelanford thanks for approving, it looks like I cannot merge this. Also, is the go-apidiff failure of any concern? If you get a chance, could you please take a look at the equivalent fix in |
Member
|
I don't think the go-apidiff change is a concern. This repo is not yet 1.0, and the change seems minimal enough that the few number of projects depending on this library should be able to cope. I'll go ahead and merge. Thanks again! |
jberkhahn
pushed a commit
to operator-framework/operator-sdk
that referenced
this pull request
Jan 17, 2023
…ions. (#6026) See operator-framework/helm-operator-plugins#198 for a detailed description of the issue. This commit ports over the relevant changes. Signed-off-by: Malte Isberner <malte.isberner@gmail.com> Signed-off-by: Malte Isberner <malte.isberner@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When investigating high memory usage in the RHACS operator, profiling showed that most of the memory consumed by the discovery client's processing of the OpenAPI schema:

The profile seemed odd because an ever-growing amount of memory is allocated in a
sync.Once, suggesting repeated execution.After a debugging run, I found that the
ActionClientGetteressentially creates:restClientGetter, with a per-instance lazily created, cached discovery client (seems minor)k8s.io/kubectl/pkg/cmd/util.Factoryinstance via thekube.New(rcg)callFactoryvia thecmdutil.NewFactory(rcg)call (only used for the secrets clients)kubernetes.ClientSet(seems minor)for each reconciliation of each CR managed by the operator. The major impact is that each factory has its own discovery client, which is initialized again and again, because it is guarded by different instances of the
sync.Once.When I provoked a reconciliation failure, and thereby having the operator perform one reconciliation after the other, I could get the process memory usage to over 2GB in less than a minute.
I haven't fully figured out why garbage collection was of so little use, but I suspect it has something to do with goroutines holding references to some of the objects.
Because all that the various clients need to differ on per CR instance is the namespace, the following PR changes the code to use a single rest client getter, factory, and clientset for everything, injecting the namespace in a lightweight manner when obtaining a per-CR instance action client. With this change, the memory usage hovered around 80MB, even in the above-described reconciliation failure situation.
I believe that what I found and fixed is also the culprit behind several reports of high memory usage: