Hi Netty team,
First of all, thank you for the excellent io_uring integration. In my benchmarks it consistently outperforms epoll in nearly every aspect — except zero-copy file transfer.
While testing (via Reactor Netty) on Linux 6.14 on ARM (QEMU on a MacBook M1), I measured throughput and latency for 1 KB, 4 KB, 32 KB, 64 KB, 1 MB, and 16 MB file responses using wrk -c400 -t1 -d10s.
Epoll with sendfile was my baseline; io_uring used IoUringFileRegion. Across all sizes, io_uring was consistently 10–25 % behind the baseline. For example, for a 1 MB file:
sendfile ≈ 7.2 k RPS
io_uring (splice-based) ≈ 6 k RPS.
After digging into it, the main overhead appears to come from the real pipe that IORING_OP_SPLICE needs under the hood: the data is spliced from the file into a pipe and then from the pipe into the socket. This is especially noticeable when the file exceeds the default 64 KB pipe buffer.
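To make the pipe overhead concrete, here is a back-of-the-envelope sketch in Java. It assumes that every fill/drain cycle moves exactly one full pipe buffer and costs two splice operations (file to pipe, then pipe to socket). The class and method names are hypothetical and not part of Netty, and real transfers may need more operations because of short splices.

```java
// Hypothetical helper for illustration only; not part of Netty.
final class SpliceCostEstimate {
    // Default Linux pipe capacity quoted above: 64 KB (16 pages of 4 KiB).
    static final long DEFAULT_PIPE_CAPACITY = 64 * 1024;

    // Minimum number of fill/drain cycles needed to move fileLength bytes
    // through a pipe of the given capacity (ceiling division).
    static long cycles(long fileLength, long pipeCapacity) {
        if (fileLength < 0 || pipeCapacity <= 0) {
            throw new IllegalArgumentException("invalid length or capacity");
        }
        return (fileLength + pipeCapacity - 1) / pipeCapacity;
    }

    // Each cycle needs one splice file->pipe and one splice pipe->socket.
    static long minSpliceOps(long fileLength, long pipeCapacity) {
        return 2 * cycles(fileLength, pipeCapacity);
    }
}
```

Under these assumptions, a 1 MB file sent through the default 64 KB pipe needs at least 16 cycles (32 splice operations), while a 1 MB pipe would need a single cycle (2 operations).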
Would it be possible to consider:
1. An option to use the classic sendfile path on the io_uring transport (either per-channel or per-IoUringFileRegion), until io_uring gains a direct file→socket operation or native sendfile opcode.
2. A way to configure the pipe size, either manually or automatically based on file length, by calling fcntl(F_SETPIPE_SZ) and reading the resulting size back with F_GETPIPE_SZ.
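For the pipe-size option, here is a minimal sketch of how a size could be derived from the file length. It assumes 4 KiB pages and the usual default limit of 1 MiB from /proc/sys/fs/pipe-max-size for unprivileged processes, and it mirrors the rounding documented for F_SETPIPE_SZ (next power-of-two multiple of the page size). PipeSizing and its methods are hypothetical names, not Netty API; the returned value is what would be passed to fcntl(F_SETPIPE_SZ).

```java
// Hypothetical helper for illustration only; not part of Netty.
final class PipeSizing {
    static final long PAGE_SIZE = 4096;                    // assumption: 4 KiB pages
    static final long DEFAULT_PIPE_SIZE = 16 * PAGE_SIZE;  // 64 KiB default capacity
    static final long DEFAULT_PIPE_MAX = 1L << 20;         // assumption: default pipe-max-size

    // Picks a pipe capacity for a file: never below the default, never above
    // maxPipeSize, and always a power-of-two number of bytes.
    static long pipeSizeFor(long fileLength, long maxPipeSize) {
        if (fileLength < 0) {
            throw new IllegalArgumentException("fileLength must be >= 0");
        }
        if (maxPipeSize < PAGE_SIZE || maxPipeSize > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("maxPipeSize out of range");
        }
        long wanted = Math.max(fileLength, DEFAULT_PIPE_SIZE);
        long capped = Math.min(wanted, maxPipeSize);

        // Round up to the next power of two.
        long rounded = Long.highestOneBit(capped);
        if (rounded < capped) {
            rounded <<= 1;
        }
        // Rounding up may overshoot the limit; step back down if it does.
        while (rounded > maxPipeSize) {
            rounded >>= 1;
        }
        return rounded;
    }
}
```

With the assumed 1 MiB limit, a 1 MB file gets a 1 MiB pipe, a 16 MB file is clamped to 1 MiB, and files up to 64 KB keep the default size. Linux also enforces per-user limits on total pipe memory (see pipe(7)), so enlarging pipes across many connections may fail; a real implementation would need to check the fcntl result and fall back to the current size.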
I understand my environment (QEMU VM) may skew absolute numbers, but this behavior aligns with known io_uring splice characteristics.
These two knobs would make io_uring truly competitive for large static file transfers.
Thanks a lot for your time and for maintaining such a performant stack!