Integration tests failing after 5 hours on corretto 25 #3188
Description
Keith noticed that main has been failing its Corretto 25 Maven build for months https://github.com/kroxylicious/kroxylicious/actions/runs/21229674435/job/61085219104
We tracked it down to the introduction of Authz https://github.com/kroxylicious/kroxylicious/actions/runs/19648936372
Running locally, I observed that the test io.kroxylicious.it.filter.authorization.MetadataAuthzEquivalenceIT was slowing down over time, heading towards one of these timeouts. Inspection with jstack showed that the number of active threads was growing over time, with >400 threads sitting in a state like:
"multiThreadIoEventLoopGroup-749-1" #6070 [39247] prio=10 os_prio=0 cpu=22.83ms elapsed=3.62s tid=0x00007fe48dec6e40 nid=39247 runnable [0x00007fe4bfdfe000]
java.lang.Thread.State: RUNNABLE
at jdk.internal.misc.ScopedMemoryAccess.closeScope0(java.base@25.0.1/Native Method)
at jdk.internal.misc.ScopedMemoryAccess.closeScope(java.base@25.0.1/ScopedMemoryAccess.java:88)
at jdk.internal.foreign.SharedSession.justClose(java.base@25.0.1/SharedSession.java:92)
at jdk.internal.foreign.MemorySessionImpl.close(java.base@25.0.1/MemorySessionImpl.java:240)
at jdk.internal.foreign.ArenaImpl.close(java.base@25.0.1/ArenaImpl.java:47)
at io.netty.util.internal.CleanerJava25$CleanableDirectBufferImpl.clean(CleanerJava25.java:206)
at io.netty.channel.uring.MsgHdrMemory.release(MsgHdrMemory.java:163)
at io.netty.channel.uring.MsgHdrMemoryArray.release(MsgHdrMemoryArray.java:81)
at io.netty.channel.uring.IoUringIoHandler.completeRingClose(IoUringIoHandler.java:463)
at io.netty.channel.uring.IoUringIoHandler.destroy(IoUringIoHandler.java:398)
at io.netty.channel.SingleThreadIoEventLoop.cleanup(SingleThreadIoEventLoop.java:271)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1269)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.runWith(java.base@25.0.1/Thread.java:1487)
at java.lang.Thread.run(java.base@25.0.1/Thread.java:1474)
I tracked this down to the io.kroxylicious.test.client.KafkaClient simple test client. It was configured to prefer io_uring if available, just a copy-paste job of our production code at the time.
So I suspect some issue with the combination of Netty, io_uring and Corretto 25, since we can see a Java 25 specific cleaner in there. My debugging-fu hits a wall here since we are jamming up in some native code.
We have talked before about removing io_uring from the test client, as we aren't really interested in it for these integration testing purposes. My quick local experiment noticeably dropped the test runtime, so I think this is going to fix it up.
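For anyone wanting to watch the thread growth without repeatedly running jstack, here is a minimal sketch that counts live platform threads grouped by pool name from inside the JVM. The class and method names (`ThreadGroupCounter`, `poolName`, `countLiveThreadsByPool`) are hypothetical and not part of kroxylicious, and the assumption that thread names end in numeric suffixes is taken only from the `multiThreadIoEventLoopGroup-749-1` name in the dump above.

```java
import java.util.Map;
import java.util.TreeMap;

final class ThreadGroupCounter {

    // Strips trailing "-<digits>" groups, so that
    // "multiThreadIoEventLoopGroup-749-1" becomes "multiThreadIoEventLoopGroup".
    // Assumption: pool threads are named "<pool>-<n>-<m>" as in the dump above.
    static String poolName(String threadName) {
        return threadName.replaceAll("(-\\d+)+$", "");
    }

    // Counts the live platform threads of this JVM, grouped by pool name.
    static Map<String, Integer> countLiveThreadsByPool() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(poolName(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per pool: "<count>\t<pool name>".
        countLiveThreadsByPool()
                .forEach((pool, n) -> System.out.println(n + "\t" + pool));
    }
}
```

Logging the result of `countLiveThreadsByPool()` between test methods should show whether the `multiThreadIoEventLoopGroup` count keeps climbing or stays flat once the test client stops using io_uring.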