Native Epoll consuming 100% of one CPU normal? #16236
-
We are seeing an issue with the native Epoll transport where CPU time is higher than with NIO, which interferes with some scaling logic we had, since the CPU readings are now higher than usual. To measure CPU we are using the MBeanServer APIs to read the relevant attributes. On average, without Epoll, the CPU was normally below 10% and the scaling logic worked as expected. With Epoll, the CPU numbers are above 10%, and after some investigation the cause appears to be that the Epoll thread is consuming 100% of a single CPU with no traffic happening at all, so on an 8-CPU machine the average shows around 12.5% (one CPU fully used out of eight available). Is this normal? There is essentially no traffic in the server since this is at startup, so I wouldn't expect the CPU usage it is reporting.

From what I could see, the main issue seems to be the epollBusyWait case here, where a busy wait escalates the CPU numbers; the KQueue and NIO transports don't have that busy-wait logic. Any comments would be appreciated. Similar issue: #5896, but I didn't really see a resolution there.

Edit: I was corrected with more info; the 10% CPU figure is really a function of the number of CPUs. A more precise way to describe the behavior is that the Netty Epoll thread is using 100% of a single CPU when there is no traffic happening. In a container workload, this shows up as the containers reporting massive CPU usage, which doesn't seem right. I reworded the discussion to match this.
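For context, here is a minimal sketch of how CPU load can be sampled through the platform MBeanServer, similar to what the scaling logic above might do. The original post does not say which attribute was read, so the use of `ProcessCpuLoad` on the `java.lang:type=OperatingSystem` MBean is an assumption.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class CpuProbe {
    // Read the process-wide CPU load via the platform MBeanServer.
    // "ProcessCpuLoad" is exposed by the com.sun.management
    // OperatingSystemMXBean on HotSpot; it returns a fraction in
    // [0, 1], or -1.0 when no sample is available yet.
    public static double processCpuLoad() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName os = new ObjectName("java.lang:type=OperatingSystem");
        return (double) server.getAttribute(os, "ProcessCpuLoad");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("process CPU load = " + processCpuLoad());
    }
}
```

Note that `ProcessCpuLoad` is normalized over all available CPUs, which is exactly why one fully spinning thread on an 8-CPU host reads as ~0.125.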
-
AFAIK no, and I don't remember the default select strategy ending up there (BUSY_WAIT). Reading the other issue, it ended with no actionable data points for us (the user switched to a new stack without using a proper profiler). If you can provide some flamegraph/profiling information, let's see if we can do something about it 🙏
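As a side note on select strategies: if the loop really were hitting BUSY_WAIT, one hedged workaround sketch would be to install a custom `SelectStrategyFactory` that always falls back to the blocking SELECT path when there are no tasks, so the epoll thread parks in epoll_wait instead of spinning. This is an illustrative fragment, not confirmed as the fix for this report.

```java
import io.netty.channel.SelectStrategy;
import io.netty.channel.SelectStrategyFactory;
import io.netty.channel.epoll.EpollEventLoopGroup;

// Sketch: a strategy that never busy-waits. When there are pending
// tasks it does a non-blocking poll (via the supplier); otherwise it
// asks the loop to block in the selector/epoll_wait.
public final class BlockingSelectStrategyFactory implements SelectStrategyFactory {
    @Override
    public SelectStrategy newSelectStrategy() {
        return (selectSupplier, hasTasks) ->
                hasTasks ? selectSupplier.get() : SelectStrategy.SELECT;
    }
}

// Usage (assumption: the two-arg EpollEventLoopGroup constructor taking
// a SelectStrategyFactory, available in Netty 4.1):
//   EpollEventLoopGroup group =
//           new EpollEventLoopGroup(1, new BlockingSelectStrategyFactory());
```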
-
When I originally opened that thread, I didn't have much experience and couldn't collect useful diagnostic data. At the time, we worked around the issue by switching from Netty to basic AIO sockets on CentOS 7 with Oracle JRE 8. Later, after moving to a new hosting provider (Ubuntu + Oracle JDK 11), we switched back to Netty and the issue no longer occurred. I was never able to pinpoint the exact cause of the epoll thread spinning. From what I recall, it mainly happened while idle; CPU usage under load was similar to NIO. As a temporary workaround, I added a small sleep in the event loop when there were no active connections.
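The "small sleep when idle" workaround above can be sketched as a generic idle-backoff loop. This is not Netty internals; `runIdleLoop` and its always-idle body are placeholders to show the shape of the technique.

```java
public class IdleBackoff {
    // Run a poll loop for roughly `millis` ms. Whenever an iteration
    // finds no work, back off with a 1 ms sleep instead of spinning,
    // which keeps idle CPU near zero at the cost of ~1 ms extra latency.
    public static long runIdleLoop(long millis) throws InterruptedException {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long iterations = 0;
        while (System.nanoTime() < deadline) {
            iterations++;
            boolean didWork = false; // placeholder: no traffic in this sketch
            if (!didWork) {
                Thread.sleep(1); // idle backoff instead of busy wait
            }
        }
        return iterations;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("iterations in 50 ms: " + runIdleLoop(50));
    }
}
```

With the sleep in place the loop does on the order of tens of iterations per 50 ms instead of millions, which is the difference the original poster observed between idle CPU with and without the workaround.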
-
I opened #16240 since this seems like an issue in the source. Will close this and continue the discussion there.