high CPU in envoy 1.16.0 #14784
Description
After creating some listeners, we saw envoy's CPU usage spike to 600% (on an 8-core host), even though there was no traffic.
The listeners we created were configured to serve ports that were held by another process (which eventually released them). Although we created 6 listeners on 6 different ports, envoy ended up listening on only 3 of them.
We didn't have CPU profiling enabled and the issue is not reproducible, but we do have strace logs showing envoy's worker threads running the following sequence in a loop, which we believe causes the CPU spike:
```
epoll_wait(41, [{EPOLLHUP, {u32=57, u64=57}}, {EPOLLHUP, {u32=58, u64=58}}, {EPOLLHUP, {u32=59, u64=59}}], 32, 44) = 3
accept4(57, 0x7f458aad5070, 0x7f458aad5008, SOCK_NONBLOCK) = -1 EINVAL (Invalid argument)
accept(57, 0x7f458aad5070, 0x7f458aad5008) = -1 EINVAL (Invalid argument)
accept4(58, 0x7f458aad5070, 0x7f458aad5008, SOCK_NONBLOCK) = -1 EINVAL (Invalid argument)
accept(58, 0x7f458aad5070, 0x7f458aad5008) = -1 EINVAL (Invalid argument)
accept4(59, 0x7f458aad5070, 0x7f458aad5008, SOCK_NONBLOCK) = -1 EINVAL (Invalid argument)
accept(59, 0x7f458aad5070, 0x7f458aad5008) = -1 EINVAL (Invalid argument)
epoll_wait(41, [{EPOLLHUP, {u32=57, u64=57}}, {EPOLLHUP, {u32=58, u64=58}}, {EPOLLHUP, {u32=59, u64=59}}], 32, 44) = 3
accept4(57, 0x7f458aad5070, 0x7f458aad5008, SOCK_NONBLOCK) = -1 EINVAL (Invalid argument)
accept(57, 0x7f458aad5070, 0x7f458aad5008) = -1 EINVAL (Invalid argument)
accept4(58, 0x7f458aad5070, 0x7f458aad5008, SOCK_NONBLOCK) = -1 EINVAL (Invalid argument)
accept(58, 0x7f458aad5070, 0x7f458aad5008) = -1 EINVAL (Invalid argument)
accept4(59, 0x7f458aad5070, 0x7f458aad5008, SOCK_NONBLOCK) = -1 EINVAL (Invalid argument)
accept(59, 0x7f458aad5070, 0x7f458aad5008) = -1 EINVAL (Invalid argument)
```
We think the 3 missing ports belong to the sockets in this trace (file descriptors 57, 58 and 59). According to envoy's logs, these listeners were created successfully.
In our scenario, the other process stops serving these ports and envoy starts serving them, but the two events are asynchronous, so for several seconds envoy tries to bind the ports and is refused with "address already in use". However, we only saw this error for the ports that ended up working; for the ports envoy never served, the only log we saw was the successful creation of the listeners.
Our question is: could envoy create a listener and believe it has bound the port even though the bind failed, which could lead to the loop above?
We are running envoy 1.16.0 on Ubuntu 16.04.