policy: fix ResourceID leak in policy importer during policy churn#44724
Merged
fristonio merged 2 commits intocilium:mainfrom Mar 12, 2026
Merged
policy: fix ResourceID leak in policy importer during policy churn#44724fristonio merged 2 commits intocilium:mainfrom
fristonio merged 2 commits intocilium:mainfrom
Conversation
This fixes a slow leak where the ResourceID of a given policy is leaked in the prefixesByResource map when a policy without a single CIDR rule is removed. This is a rather slow leak, where our testing shows that churning ~10 policies that all leak, results in ~75MiB of leaked memory after 24 hours. For testing purposes, we did this with both namespaces and names of policies being UUIDs to get consistent results. This affects all agents in the cluster, no matter if the policies select endpoints on the node or not. This affects cilium v1.17, and given the code has not changed, it most likely also affects all stable versions. A following test update shows this is indeed the case. This code was introduced in the first mentioned commit below, but it was initially unused. Then, the last referenced commit is the one that made policy imports use this code. This code does deserve some rewrite to make it simpler to avoid this, but keeping this as a simple fix to ease backports. Fixes: 783465b ("policy: add policy import cell") Fixes: b762c3f ("policy/k8s: switch to policy importer cell") Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Odin Ugedal <ougedal@palantir.com>
This adds some more cases we want to test to ensure we don't leak old resourceIDs when a policy is removed. Signed-off-by: Odin Ugedal <odin@uged.al> Signed-off-by: Odin Ugedal <ougedal@palantir.com>
24799f9 to
aef9fbe
Compare
Member
Author
|
/test |
1 similar comment
Member
Author
|
/test |
Member
Author
|
/ci-clustermesh |
Member
Author
|
/ci-l3-l4 |
Member
Author
|
cc @squeed |
8 tasks
Member
Author
|
Thanks @fristonio! Can you also add the backport labels for all supported releases now? Given the severity of this leak, and the simple fix, I think it deserves being backported to v1.17, v1.18 and v1.19. |
Member
Author
|
Thanks @fristonio 🙏 |
4 tasks
4 tasks
10 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This fixes a slow leak where the ResourceID of a given policy is leaked
in the prefixesByResource map when a policy without a single CIDR rule
is removed.
This is a rather slow leak, where our testing shows that churning ~10
policies that all leak, results in ~75MiB of leaked memory after 24
hours. For testing purposes, we did this with both namespaces and names
of policies being UUIDs to get consistent results. This affects all
agents in the cluster, no matter if the policies select endpoints on the
node or not.
This affects cilium v1.17, and given the code has not changed, it most
likely also affects all stable versions. A following test update shows
this is indeed the case. This code was introduced in the first mentioned
commit below, but it was initially unused. Then, the last referenced
commit is the one that made policy imports use this code.
This code does deserve some rewrite to make it simpler to avoid this,
but keeping this as a simple fix to ease backports.
Fixes: 783465b ("policy: add policy import cell")
Fixes: b762c3f ("policy/k8s: switch to policy importer cell")
Fixes: #issue-number