Integration tests failing after 5 hours on Corretto 25 #3188

@robobario

Description

Keith noticed that main has been failing its Corretto 25 Maven build for months: https://github.com/kroxylicious/kroxylicious/actions/runs/21229674435/job/61085219104

We tracked it down to the introduction of Authz https://github.com/kroxylicious/kroxylicious/actions/runs/19648936372

Running the suite locally, I observed that the test io.kroxylicious.it.filter.authorization.MetadataAuthzEquivalenceIT slowed down over time, heading towards the same kind of timeout. Inspecting it with jstack showed the number of live threads growing steadily, with more than 400 of them stuck in a state like:

```
"multiThreadIoEventLoopGroup-749-1" #6070 [39247] prio=10 os_prio=0 cpu=22.83ms elapsed=3.62s tid=0x00007fe48dec6e40 nid=39247 runnable  [0x00007fe4bfdfe000]
   java.lang.Thread.State: RUNNABLE
	at jdk.internal.misc.ScopedMemoryAccess.closeScope0(java.base@25.0.1/Native Method)
	at jdk.internal.misc.ScopedMemoryAccess.closeScope(java.base@25.0.1/ScopedMemoryAccess.java:88)
	at jdk.internal.foreign.SharedSession.justClose(java.base@25.0.1/SharedSession.java:92)
	at jdk.internal.foreign.MemorySessionImpl.close(java.base@25.0.1/MemorySessionImpl.java:240)
	at jdk.internal.foreign.ArenaImpl.close(java.base@25.0.1/ArenaImpl.java:47)
	at io.netty.util.internal.CleanerJava25$CleanableDirectBufferImpl.clean(CleanerJava25.java:206)
	at io.netty.channel.uring.MsgHdrMemory.release(MsgHdrMemory.java:163)
	at io.netty.channel.uring.MsgHdrMemoryArray.release(MsgHdrMemoryArray.java:81)
	at io.netty.channel.uring.IoUringIoHandler.completeRingClose(IoUringIoHandler.java:463)
	at io.netty.channel.uring.IoUringIoHandler.destroy(IoUringIoHandler.java:398)
	at io.netty.channel.SingleThreadIoEventLoop.cleanup(SingleThreadIoEventLoop.java:271)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:1269)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.runWith(java.base@25.0.1/Thread.java:1487)
	at java.lang.Thread.run(java.base@25.0.1/Thread.java:1474)
```
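
jstack is enough to spot this, but the same trend can also be watched from inside the test JVM, e.g. after each test. A minimal sketch, assuming Netty's default thread naming of `<poolName>-<poolId>-<threadId>` (as in `multiThreadIoEventLoopGroup-749-1`); `EventLoopThreadCensus` and its methods are hypothetical names, not existing Kroxylicious code:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper (not part of Kroxylicious): counts live threads
// whose names match a Netty event loop prefix, from inside the test JVM.
class EventLoopThreadCensus {

    // Prefix seen in the jstack output for the accumulating threads.
    static final String EVENT_LOOP_PREFIX = "multiThreadIoEventLoopGroup-";

    // Number of live threads whose name starts with the prefix.
    static long countThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    // Live matching threads grouped by pool id, assuming names of the form
    // "<prefix><poolId>-<threadId>", e.g. "multiThreadIoEventLoopGroup-749-1".
    static Map<String, Long> countByPool(String prefix) {
        Map<String, Long> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String name = t.getName();
            if (!t.isAlive() || !name.startsWith(prefix)) {
                continue;
            }
            String rest = name.substring(prefix.length()); // e.g. "749-1"
            int dash = rest.indexOf('-');
            String poolId = dash >= 0 ? rest.substring(0, dash) : rest;
            counts.merge(poolId, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Run periodically to see whether the number of event loop threads keeps growing.
        System.out.println("live event loop threads: " + countThreadsWithPrefix(EVENT_LOOP_PREFIX));
        System.out.println("threads per pool id: " + countByPool(EVENT_LOOP_PREFIX));
    }
}
```

A steadily rising count, spread across many distinct pool ids, would match what jstack showed here.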

I traced these threads back to the simple test client io.kroxylicious.test.client.KafkaClient. It was configured to prefer io_uring when available, a straight copy-paste of our production code at the time.

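So I suspect an issue with the combination of Netty, io_uring and Corretto 25: the stack shows Netty's Java 25-specific cleaner (CleanerJava25) closing a foreign-memory Arena while the event loop shuts down (IoUringIoHandler.destroy). My debugging fu hits a wall here, because the threads are stuck in native code (ScopedMemoryAccess.closeScope0).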
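
For anyone unfamiliar with the bottom of that stack: `ArenaImpl.close` and `SharedSession` belong to the JDK's foreign memory API (`java.lang.foreign`, final since Java 22). Going by the trace, Netty's `CleanerJava25` frees a buffer by closing a shared `Arena`. A minimal sketch of that lifecycle in plain JDK code, assuming one arena per buffer (not confirmed from Netty's source); `SharedArenaClose` is a hypothetical name:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration (not Netty code) of allocating off-heap memory in a
// shared arena and releasing it by closing the arena.
class SharedArenaClose {

    // Allocates a segment in its own shared arena, writes to it, then closes the arena.
    // arena.close() is the call that, in the trace, ends up in SharedSession.justClose.
    static MemorySegment allocateAndRelease(long bytes) {
        Arena arena = Arena.ofShared();          // shared: usable from any thread
        MemorySegment segment = arena.allocate(bytes);
        segment.set(ValueLayout.JAVA_BYTE, 0, (byte) 42);
        arena.close();                           // frees the memory; the segment is now invalid
        return segment;
    }

    public static void main(String[] args) {
        MemorySegment segment = allocateAndRelease(64);
        System.out.println("segment alive after close: " + segment.scope().isAlive());
        try {
            segment.get(ValueLayout.JAVA_BYTE, 0); // access after close is rejected
        } catch (IllegalStateException e) {
            System.out.println("access after close rejected: " + e.getMessage());
        }
    }
}
```

This only shows where the stuck frame sits in the API; it does not reproduce the hang.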

We have talked before about removing io_uring from the test client, since we aren't interested in exercising it in these integration tests. In a quick local experiment without io_uring, the test runtime dropped noticeably, so I think this change will fix the build.
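
The change boils down to the test client no longer following the production transport preference. A sketch of the two policies as a plain function; `TransportSelection` and `select` are hypothetical, and the Netty 4.2 classes named in the comments (`MultiThreadIoEventLoopGroup`, `IoUringIoHandler`, `NioIoHandler`) only indicate how each choice would typically be wired up:

```java
// Hypothetical model of the transport choice (not Kroxylicious code). In Netty 4.2
// terms, IO_URING would mean new MultiThreadIoEventLoopGroup(IoUringIoHandler.newFactory())
// with IoUringSocketChannel, and NIO would mean NioIoHandler.newFactory() with NioSocketChannel.
class TransportSelection {

    enum Transport { IO_URING, NIO }

    // Uses io_uring only when native transports are allowed AND the platform supports it
    // (in Netty, io.netty.channel.uring.IoUring.isAvailable()). Falling straight back to
    // NIO is a simplification; real production code may try other native transports first.
    static Transport select(boolean allowNative, boolean ioUringAvailable) {
        if (allowNative && ioUringAvailable) {
            return Transport.IO_URING;
        }
        return Transport.NIO;
    }

    public static void main(String[] args) {
        boolean ioUringAvailable = true; // pretend we are on a Linux host with io_uring

        // Copy-pasted production behaviour: the test client picks io_uring.
        System.out.println("production-style client: " + select(true, ioUringAvailable));
        // Proposed test-client behaviour: never use native transports.
        System.out.println("test client: " + select(false, ioUringAvailable));
    }
}
```

Pinning the test client to NIO keeps it off the io_uring shutdown path seen in the trace, at the cost of not matching production I/O behaviour, which these tests don't rely on.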

Metadata

Labels: kind/bug (Something isn't working), test (Relates to testing)
