
Istio Sidecar consuming high CPU Istio 1.3.0-1.3.3 #18229

@crhuber

Bug description
We are running Istio v1.3.3 (this also happened on v1.3.0). Most of our applications are running fine, but in one of our deployments the sidecar's CPU is pegged at 1.0 core for the life of the pod. This happens for the majority of the pods in that deployment, but not all of them. The pod contains nginx, istio-proxy, and an application container, all of which show normal CPU use except istio-proxy. I turned on debug logging on the sidecar, but the logs are not particularly interesting.
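
For reference, this is roughly how the log level can be raised, a sketch using Envoy's standard admin API on localhost:15000 (the default Istio sidecar admin port; adjust if your mesh overrides it):

# Raise the Envoy log level via the admin API inside the sidecar container
kubectl -n $namespace exec -it $pod -c istio-proxy -- \
  curl -s -X POST "localhost:15000/logging?level=debug"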

 istio-proxy [2019-09-25 09:45:06.322][27][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C1145026] response complete
 istio-proxy [2019-09-25 09:45:06.322][27][debug][filter] [src/envoy/http/mixer/filter.cc:214] Called Mixer::Filter : onDestroy state: 2
 istio-proxy [2019-09-25 09:45:06.322][27][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:199] [C1145026] response complete
 istio-proxy [2019-09-25 09:45:06.322][27][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:237] [C1145026] moving to ready
 istio-proxy [2019-09-25 09:45:06.322][27][debug][filter] [src/envoy/http/mixer/filter.cc:228] Called Mixer::Filter : log
 istio-proxy [2019-09-25 09:45:06.322][27][debug][filter] [./src/envoy/http/mixer/report_data.h:144] No dynamic_metadata found for filter envoy.filters.http.rbac
 istio-proxy [2019-09-25 09:45:06.323][27][debug][filter] [src/istio/control/client_context_base.cc:138] Report attributes: attributes {
 istio-proxy   key: "connection.mtls"
 istio-proxy   value {
 istio-proxy     bool_value: false
 istio-proxy   }
 istio-proxy }
 istio-proxy attributes {
 istio-proxy   key: "context.protocol"
 istio-proxy   value {
 istio-proxy     string_value: "http"
 istio-proxy   }
 istio-proxy }
 istio-proxy attributes {
 istio-proxy   key: "context.prox
 istio-proxy [2019-09-25 09:45:06.324][27][debug][connection] [external/envoy/source/common/network/connection_impl.cc:520] [C1145208] remote close
 istio-proxy [2019-09-25 09:45:06.324][27][debug][connection] [external/envoy/source/common/network/connection_impl.cc:190] [C1145208] closing socket: 0
 istio-proxy [2019-09-25 09:45:06.324][27][debug][main] [external/envoy/source/server/connection_handler_impl.cc:80] [C1145208] adding to cleanup list
 istio-proxy [2019-09-25 09:45:06.324][27][debug][filter] [src/envoy/tcp/mixer/filter.cc:100] [C1138233] Called tcp filter onRead bytes: 17
 istio-proxy [2019-09-25 09:45:06.324][28][debug][router] [external/envoy/source/common/router/router.cc:1002] [C637330][S2924433164096281138] upstream headers complete: end_stream=false
 istio-proxy [2019-09-25 09:45:06.324][28][debug][filter] [src/envoy/http/mixer/filter.cc:141] Called Mixer::Filter : encodeHeaders 2
 istio-proxy [2019-09-25 09:45:06.324][28][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1435] [C637330][S2924433164096281138] encoding headers via codec (end_stream=false):

I was able to collect an strace syscall summary from the envoy process in the problematic pod:

sudo strace -c -f -p 5296
strace: Process 5296 attached with 15 threads
^Cstrace: Process 5296 detached
strace: Process 5329 detached
strace: Process 5330 detached
strace: Process 5331 detached
strace: Process 5332 detached
strace: Process 5338 detached
strace: Process 5340 detached
strace: Process 5341 detached
strace: Process 5342 detached
strace: Process 5343 detached
strace: Process 5344 detached
strace: Process 5345 detached
strace: Process 5346 detached
strace: Process 5347 detached
strace: Process 6356 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.39    0.176384         256       687        44 futex
  9.04    0.021728          49       436           nanosleep
  8.82    0.021201         122       173           epoll_pwait
  6.45    0.015504        3100         5         2 restart_syscall
  0.64    0.001548          30        50           write
  0.50    0.001208          11       109        50 read
  0.31    0.000753          75        10           sched_yield
  0.30    0.000715          14        50        25 accept4
  0.24    0.000567          22        25           close
  0.16    0.000385           7        50           epoll_ctl
  0.08    0.000181           7        25           getsockname
  0.07    0.000178           7        25           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.240352                  1645       121 total
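
The summary is dominated by futex and nanosleep rather than real I/O, which suggests the worker threads are spinning on a lock or timer instead of serving traffic. A sketch of how this could be narrowed down further (assumes node access; 5296 is the envoy PID from above, and perf may need to be installed on the node):

# Attribute futex calls to individual threads (TIDs) for later inspection
sudo strace -f -e trace=futex -p 5296 -o /tmp/futex.log
# Show hot userspace symbols in the envoy process, if perf is available
sudo perf top -p 5296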

Here is the output of kubectl -n $namespace exec -it $pod -c istio-proxy -- top on a pod with a busy envoy:

top - 15:33:18 up 18:25,  0 users,  load average: 1.58, 1.56, 1.53
Tasks:   3 total,   1 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 22.5 us, 15.0 sy,  0.0 ni, 62.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st
KiB Mem :  7807704 total,  1493152 free,  3275896 used,  3038656 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  4147820 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                                                                                                                                                              
   21 istio-p+  20   0  228232 112156  19180 S 104.7  1.4  51:18.67 envoy                                                                                                                                                                                                                                                                                                
    1 istio-p+  20   0  138296  25400  20892 S   0.0  0.3   0:07.84 pilot-agent                                                                                                                                                                                                                                                                                          
   33 istio-p+  20   0   36836   3016   2584 R   0.0  0.0   0:00.00 top
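
Since only some pods in the deployment are affected, diffing Envoy's stats between a busy and a healthy sidecar might point at the hot path. A sketch using the standard /stats admin endpoint ($busy_pod and $healthy_pod are placeholders for two pods of the same deployment):

# Dump Envoy stats from a busy and a healthy sidecar, then compare
kubectl -n $namespace exec $busy_pod -c istio-proxy -- curl -s localhost:15000/stats > busy.txt
kubectl -n $namespace exec $healthy_pod -c istio-proxy -- curl -s localhost:15000/stats > healthy.txt
diff busy.txt healthy.txt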

There is a forum thread about this, but it is still ongoing without any resolution. So far the only workaround has been to downgrade the sidecar to 1.2.6:

https://discuss.istio.io/t/istio-sidecar-consuming-high-cpu/3894
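
One way to pin just this deployment's sidecar back to 1.2.6 is the sidecar.istio.io/proxyImage annotation; a sketch (the image path assumes the stock docker.io/istio images, so adjust for a private registry):

# Override the injected proxy image for one deployment; the patch to the pod
# template triggers a rolling restart, and the re-injected pods come up on 1.2.6
kubectl -n $namespace patch deployment $deployment --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/proxyImage":"docker.io/istio/proxyv2:1.2.6"}}}}}'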

Any ideas on how I can troubleshoot further?

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[ ] Networking
[x] Performance and Scalability
[ ] Policies and Telemetry
[ ] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Expected behavior
Sidecar CPU usage stays proportional to the pod's traffic, as it does on v1.2.6, instead of pegging a full core.

Steps to reproduce the bug
Unable to reproduce on demand; the problem shows up on most, but not all, pods of a single deployment.

Version (include the output of istioctl version --remote and kubectl version)

citadel version: 1.3.0-1.3.3
galley version: 1.3.0-1.3.3
ingressgateway version: 1.3.0-1.3.3
pilot version: 1.3.0-1.3.3
policy version: 1.3.0-1.3.3
sidecar-injector version: 1.3.0-1.3.3
telemetry version: 1.3.0-1.3.3

How was Istio installed?
Helm chart

Environment where bug was observed (cloud vendor, OS, etc)
Amazon EKS
