Migrate event loops from file events to buffer events #17922
Description
Title: Migrate event loops from file events to buffer events
Description:
This could save at least 3 syscalls per request (two writev()s and one readv()).
Currently Envoy relies on Libevent as the implementation of its event loops and uses it in the "readiness" paradigm. That is, Libevent (on behalf of the kernel) notifies the application (Envoy) that a certain file descriptor is ready to be read or written; the application then makes a syscall to read from or write to the file descriptor. Envoy spends ~20% of a request's time span waiting for these syscalls to return.
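To make the two-step cost concrete, here is a minimal sketch of the "readiness" pattern using plain POSIX `poll()` and `readv()` rather than Libevent or Envoy code. The function name `readWhenReady` and the buffer sizes are assumptions made up for illustration.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// Hypothetical helper, not part of Envoy or Libevent.
// Waits until `fd` is readable, then reads it with an explicit readv() call.
// Returns the bytes read, or an empty string on timeout, error, or EOF.
std::string readWhenReady(int fd, int timeout_ms) {
  pollfd pfd{};
  pfd.fd = fd;
  pfd.events = POLLIN;

  // Step 1: the kernel only reports readiness; no data is transferred yet.
  if (poll(&pfd, 1, timeout_ms) <= 0 || !(pfd.revents & POLLIN)) {
    return {};
  }

  // Step 2: the application pays for a second syscall to fetch the data.
  char first[8];
  char second[56];
  iovec iov[2] = {{first, sizeof(first)}, {second, sizeof(second)}};
  const ssize_t n = readv(fd, iov, 2);
  if (n <= 0) {
    return {};
  }

  // Reassemble the scattered slices into one contiguous string.
  const size_t total = static_cast<size_t>(n);
  const size_t head = total < sizeof(first) ? total : sizeof(first);
  std::string out(first, head);
  out.append(second, total - head);
  return out;
}
```

Every read costs one wait for the notification plus one `readv()`; the write path pays the same way with `writev()`.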
Libevent also provides an API called "bufferevents" for event loops working in the "completeness" paradigm (or buffered/ringed IO). That is, an application registers read and write buffers, then Libevent notifies the application that there is data available in the read buffer or that the chunk of data put in the write buffer has been consumed. In this case, if the underlying OS supports such a paradigm (as with Microsoft's IOCP), fewer computing resources are spent on syscalls like writev() or readv(), since there is no need for context switches. Otherwise it's still file events hidden under the hood.
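The contract described above can be sketched without Libevent. The class below is a hypothetical stand-in, not the bufferevent API: the application only queues output and receives a callback when input has arrived, while the wrapper performs the I/O itself. It models the fallback case where file events are still hidden under the hood; an IOCP or io_uring backend could keep the same interface and drop the per-operation syscalls. All names (`BufferedConnection`, `queueWrite`, `runOnce`, `pendingOutput`) are assumptions.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a buffered connection; not Libevent or Envoy code.
class BufferedConnection {
public:
  // Called with the accumulated input buffer; the callee may consume from it.
  using ReadCallback = std::function<void(std::string& input)>;

  BufferedConnection(int fd, ReadCallback on_read)
      : fd_(fd), on_read_(std::move(on_read)) {}

  // Queues data in the output buffer; no syscall is made here.
  void queueWrite(const std::string& data) { output_ += data; }

  size_t pendingOutput() const { return output_.size(); }

  // Runs one loop iteration. The wrapper, not the application, does the I/O.
  // Returns false on timeout, EOF, or error.
  bool runOnce(int timeout_ms) {
    pollfd pfd{};
    pfd.fd = fd_;
    pfd.events = static_cast<short>(POLLIN | (output_.empty() ? 0 : POLLOUT));
    if (poll(&pfd, 1, timeout_ms) <= 0) {
      return false;
    }
    if (pfd.revents & POLLOUT) {
      const ssize_t n = ::write(fd_, output_.data(), output_.size());
      if (n < 0) {
        return false;
      }
      // Drop the bytes the kernel accepted; the rest stays queued.
      output_.erase(0, static_cast<size_t>(n));
    }
    if (pfd.revents & POLLIN) {
      char chunk[4096];
      const ssize_t n = ::read(fd_, chunk, sizeof(chunk));
      if (n <= 0) {
        return false;
      }
      input_.append(chunk, static_cast<size_t>(n));
      on_read_(input_);  // Notify: data is already in the read buffer.
    }
    return true;
  }

private:
  int fd_;
  ReadCallback on_read_;
  std::string input_;
  std::string output_;
};
```

With this shape the application code never calls `readv()` or `writev()` directly, which is what would let a completion-based backend be swapped in underneath.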
Linux supports the "completeness" paradigm natively too, through its io_uring interface, but Libevent hasn't been updated to support it yet. There is an issue for that though.
Perhaps we could modify Envoy's event loop to work in the "completeness" paradigm by relying on "bufferevents" for streaming connections, in the hope that io_uring/ioring support is added soon. I don't yet know whether "bufferevents" incur additional overhead compared with the traditional "readiness" approach. With my quick hack Envoy even runs ~5% faster (14500 rps vs 15500 rps), but I presume that's because I have a few things broken, e.g. flow control.
Alternatively, we could abstract Libevent away and resort to home-grown event loops that use io_uring/ioring directly when available, falling back to Libevent's "readiness" API otherwise. This approach would probably also make it easier to implement, as an extension, a hardware-accelerated event loop that bypasses the kernel completely for network transfers.
/cc @antoniovicente