Cilium v1.19.0 excessive memory usage #44310
Closed
Labels
area/agent (Cilium agent related), kind/bug (This is a bug in the Cilium logic), kind/community-report (This was reported by a user in the Cilium community, e.g. via Slack)
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
equal or higher than v1.19.0 and lower than v1.20.0
What happened?
After upgrading from v1.18.6 to v1.19.0, we noticed that cilium-agent was using an excessive amount of memory, sometimes so much that the kubelet would be OOM-killed, destroying the node's ability to recover.
Datadog metrics from Cilium:
Reverting back to v1.18.6 resolved the memory pressure.
I followed the debugging guide at https://docs.cilium.io/en/latest/contributing/development/debugging/#cpu-profiling-and-memory-leaks and took some pprof snapshots.
Here is the pprof output after running `kubectl exec -it -n kube-system cilium-tvtg5 -- cilium-bugtool --get-pprof --pprof-trace-seconds 10`:
cilium-bugtool-20260211-221005.093+0000-UTC-4259347162.zip
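For anyone reproducing the analysis, the heap profile in the bugtool archive can be inspected with `go tool pprof`. This is a sketch; the pprof file name inside the zip is an assumption, so list the extracted files to find the actual profile path:

```shell
# Extract the bugtool archive (attached above) and list its contents to
# locate the heap profile; the path below is an assumption.
unzip cilium-bugtool-20260211-221005.093+0000-UTC-4259347162.zip -d bugtool
ls bugtool

# Show the top allocators by in-use memory (replace the path as needed):
go tool pprof -top -inuse_space bugtool/pprof-heap

# Or browse the profile interactively in a web UI:
go tool pprof -http=:8080 bugtool/pprof-heap
```

`-inuse_space` focuses on memory currently held rather than cumulative allocations, which is usually the right view when chasing a leak like this one.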
How can we reproduce the issue?
Cilium helm values:
```yaml
cilium:
  socketLB:
    hostNamespaceOnly: true
  extraEnv:
    - name: MALLOC_ARENA_MAX
      value: "1"
  k8sServicePort: 443
  localRedirectPolicy: true
  enableIPv4Masquerade: true
  egressMasqueradeInterfaces: "eth0 primary"
  extraArgs:
    - "--api-rate-limit"
    - "endpoint-create=rate-limit:5/s,rate-burst:40,parallel-requests:40,log:true,endpoint-delete=rate-burst:40,parallel-requests:40,log:true" # 10x base rate limits, see https://docs.cilium.io/en/stable/configuration/api-rate-limiting/
    - "--enable-stale-cilium-endpoint-cleanup=false" # Disable stale Cilium endpoint cleanup to solve pods with missing IPs or CiliumEndpoints after cilium restarts
  resources:
    requests:
      cpu: 250m
      memory: 128Mi
  tls:
    # Disable secretsNamespace & secretSync
    readSecretsOnlyFromSecretsNamespace: false
    secretsNamespace:
      create: false
    secretSync:
      enabled: false
  scheduling:
    mode: kube-scheduler
  cni:
    # Defaults to true. Setting to false avoids removal of CNI configuration
    # files during upgrades in order to ensure nodes do not become unmanageable.
    uninstall: false
  logOptions:
    format: json-ts
  operator:
    nodeSelector:
      role.node.kubernetes.io/critical-addon: "true"
    tolerations:
      - operator: Exists
    priorityClassName: system-cluster-critical
    podDisruptionBudget:
      enabled: true
    prometheus:
      enabled: true
  kubeProxyReplacement: true
  # enable metrics
  prometheus:
    enabled: true
  # workaround for cilium mem leak in l2 neighbor discovery error handling in 1.16.0-1.16.1+
  l2NeighDiscovery:
    enabled: false
  # perf settings for high pod-churn clusters
  ipam:
    ciliumNodeUpdateRate: 5s # Ref https://github.com/cilium/cilium/pull/23017
    operator:
      externalAPILimitBurstSize: 100
      externalAPILimitQPS: 10.0
  k8sClientRateLimit:
    burst: 100
    qps: 20
  ciliumEndpointSlice:
    enabled: true
  bpf:
    policyMapMax: 65536 # limited to 16 bits https://github.com/cilium/cilium/issues/27866
    mapDynamicSizeRatio: 0.005 # Ref https://docs.cilium.io/en/stable/network/ebpf/maps/
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 10%
  envoy:
    enabled: true
    resources:
      requests:
        cpu: 250m
        memory: 128Mi
  loadBalancer:
    l7:
      backend: envoy
```
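Not part of the original report, but a quick way to watch agent memory while testing the upgrade or rollback. This assumes metrics-server is installed and the default `k8s-app=cilium` label on the agent DaemonSet pods:

```shell
# Per-container memory usage of the Cilium agent pods, highest first
# (requires metrics-server; k8s-app=cilium is the chart's default label):
kubectl -n kube-system top pod -l k8s-app=cilium --containers --sort-by=memory
```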
### Cilium Version
v1.19.0
### Kernel Version
`6.12.58-82.121.amzn2023.x86_64` (AWS AL2023)
### Kubernetes Version
`v1.34.2-eks-ecaa3a6`
### Regression
v1.18.6
### Sysdump
_No response_
### Relevant log output
_No response_
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct