Bug description
We are running Istio v1.3.3 (this also happened on v1.3.0). Most of our applications are running fine, but in one of our deployments the sidecar CPU is pegged at 1.0 core for the life of the pod. This affects the majority of the pods in that deployment, but not all of them. Each pod contains nginx, istio-proxy, and an application container, and all of them show normal CPU usage except istio-proxy. I turned on debug logging on the sidecar, but the logs are not particularly interesting.
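For context, debug logging was enabled through the Envoy admin endpoint, roughly like this (a sketch, assuming the default admin port 15000 and that curl is available in the proxy container):

kubectl -n $namespace exec $pod -c istio-proxy -- \
  curl -s -X POST 'localhost:15000/logging?level=debug'

A representative slice of the debug output: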
istio-proxy [2019-09-25 09:45:06.322][27][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C1145026] response complete
istio-proxy [2019-09-25 09:45:06.322][27][debug][filter] [src/envoy/http/mixer/filter.cc:214] Called Mixer::Filter : onDestroy state: 2
istio-proxy [2019-09-25 09:45:06.322][27][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:199] [C1145026] response complete
istio-proxy [2019-09-25 09:45:06.322][27][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:237] [C1145026] moving to ready
istio-proxy [2019-09-25 09:45:06.322][27][debug][filter] [src/envoy/http/mixer/filter.cc:228] Called Mixer::Filter : log
istio-proxy [2019-09-25 09:45:06.322][27][debug][filter] [./src/envoy/http/mixer/report_data.h:144] No dynamic_metadata found for filter envoy.filters.http.rbac
istio-proxy [2019-09-25 09:45:06.323][27][debug][filter] [src/istio/control/client_context_base.cc:138] Report attributes: attributes {
istio-proxy key: "connection.mtls"
istio-proxy value {
istio-proxy bool_value: false
istio-proxy }
istio-proxy }
istio-proxy attributes {
istio-proxy key: "context.protocol"
istio-proxy value {
istio-proxy string_value: "http"
istio-proxy }
istio-proxy }
istio-proxy attributes {
istio-proxy key: "context.prox
istio-proxy [2019-09-25 09:45:06.324][27][debug][connection] [external/envoy/source/common/network/connection_impl.cc:520] [C1145208] remote close
istio-proxy [2019-09-25 09:45:06.324][27][debug][connection] [external/envoy/source/common/network/connection_impl.cc:190] [C1145208] closing socket: 0
istio-proxy [2019-09-25 09:45:06.324][27][debug][main] [external/envoy/source/server/connection_handler_impl.cc:80] [C1145208] adding to cleanup list
istio-proxy [2019-09-25 09:45:06.324][27][debug][filter] [src/envoy/tcp/mixer/filter.cc:100] [C1138233] Called tcp filter onRead bytes: 17
istio-proxy [2019-09-25 09:45:06.324][28][debug][router] [external/envoy/source/common/router/router.cc:1002] [C637330][S2924433164096281138] upstream headers complete: end_stream=false
istio-proxy [2019-09-25 09:45:06.324][28][debug][filter] [src/envoy/http/mixer/filter.cc:141] Called Mixer::Filter : encodeHeaders 2
istio-proxy [2019-09-25 09:45:06.324][28][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1435] [C637330][S2924433164096281138] encoding headers via codec (end_stream=false):
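Since the debug log is dominated by Mixer filter activity (every request triggers a report), one variable worth isolating is telemetry itself. A hedged test, assuming the stock 1.3 Helm values ($release and the chart path are placeholders):

helm upgrade $release <path-to-istio-chart> \
  --namespace istio-system \
  --set mixer.telemetry.enabled=false

If the sidecar CPU drops once the proxies pick up the new config, that narrows the problem to the Mixer filter rather than the data path.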
I was also able to collect an strace summary from the Envoy PID of a problematic pod (run on the node):
sudo strace -c -f -p 5296
strace: Process 5296 attached with 15 threads
^Cstrace: Process 5296 detached
strace: Process 5329 detached
strace: Process 5330 detached
strace: Process 5331 detached
strace: Process 5332 detached
strace: Process 5338 detached
strace: Process 5340 detached
strace: Process 5341 detached
strace: Process 5342 detached
strace: Process 5343 detached
strace: Process 5344 detached
strace: Process 5345 detached
strace: Process 5346 detached
strace: Process 5347 detached
strace: Process 6356 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.39    0.176384         256       687        44 futex
  9.04    0.021728          49       436           nanosleep
  8.82    0.021201         122       173           epoll_pwait
  6.45    0.015504        3100         5         2 restart_syscall
  0.64    0.001548          30        50           write
  0.50    0.001208          11       109        50 read
  0.31    0.000753          75        10           sched_yield
  0.30    0.000715          14        50        25 accept4
  0.24    0.000567          22        25           close
  0.16    0.000385           7        50           epoll_ctl
  0.08    0.000181           7        25           getsockname
  0.07    0.000178           7        25           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.240352                  1645       121 total
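The syscall profile is dominated by futex (687 calls, 44 errors), which points more at lock contention or a spinning worker thread than at traffic volume. To pinpoint the hot thread, something like the following could be run on the node (5296 is the Envoy PID from the strace above; $tid is a placeholder for whichever thread ID shows up hot):

# per-thread CPU breakdown for the envoy process
top -H -b -n 1 -p 5296
# sample the kernel stack of the busy thread (requires root)
sudo cat /proc/5296/task/$tid/stack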
Here is the output of kubectl -n $namespace exec -it $pod -c istio-proxy -- top on a pod with a busy Envoy:
top - 15:33:18 up 18:25,  0 users,  load average: 1.58, 1.56, 1.53
Tasks:   3 total,   1 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.5 us, 15.0 sy,  0.0 ni, 62.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem :  7807704 total,  1493152 free,  3275896 used,  3038656 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  4147820 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   21 istio-p+  20   0  228232 112156  19180 S 104.7  1.4  51:18.67 envoy
    1 istio-p+  20   0  138296  25400  20892 S   0.0  0.3   0:07.84 pilot-agent
   33 istio-p+  20   0   36836   3016   2584 R   0.0  0.0   0:00.00 top
There is a forum thread about this issue, but it has been ongoing without any resolution. So far the only workaround has been to downgrade the sidecar to 1.2.6 (a sketch of how to scope that follows below):
https://discuss.istio.io/t/istio-sidecar-consuming-high-cpu/3894
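If it helps others, the downgrade can be scoped to just the affected deployment with the sidecar.istio.io/proxyImage pod annotation rather than rolling back the whole mesh. A sketch, assuming the stock proxyv2 image ($namespace and $deployment are placeholders):

kubectl -n $namespace patch deployment $deployment --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/proxyImage":"docker.io/istio/proxyv2:1.2.6"}}}}}'

Patching the pod template triggers a rollout, so new pods come up with the 1.2.6 sidecar.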
Any ideas on how I can troubleshoot further?
Affected product area (please put an X in all that apply)
[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ ] Networking
[x] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure
Expected behavior
istio-proxy CPU usage stays in line with the other containers in the pod instead of being pegged at a full core.
Steps to reproduce the bug
Unable to reproduce
Version (include the output of istioctl version --remote and kubectl version)
citadel version: 1.3.0-1.3.3
galley version: 1.3.0-1.3.3
ingressgateway version: 1.3.0-1.3.3
pilot version: 1.3.0-1.3.3
policy version: 1.3.0-1.3.3
sidecar-injector version: 1.3.0-1.3.3
telemetry version: 1.3.0-1.3.3
How was Istio installed?
helm chart
Environment where bug was observed (cloud vendor, OS, etc)
amazon eks