Performance analysis shows that UNIX sockets and semaphores add very high overhead on every context switch between Shadow and the plugins it runs. Most of that time is spent waiting for the other side of the channel to signal a wakeup.
Some initial rough tests show that shared memory and spinlocks can improve performance by as much as 100x.
Of course, we don't want to spin forever and burn CPU cycles. We want an intelligent algorithm in which each side of the IPC channel first spins for a bounded amount of time and, if the channel isn't ready by then, falls back to using semaphores.
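As a rough illustration of the spin-then-fall-back idea, here is a minimal sketch in C. It is not Shadow's implementation: the `HybridChannel` type, the `hybrid_*` function names, and the three-state flag are all hypothetical, and the sketch assumes Linux, exactly one signaller and one waiter per channel, and at most one outstanding signal at a time. In real use the struct would be placed in shared memory; the spin limit is passed in by the caller so it could later be computed dynamically.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical channel states (names are illustrative only). */
enum { CH_EMPTY = 0, CH_READY = 1, CH_SLEEPING = 2 };

/* Hypothetical channel; assumed to live in memory shared by both sides. */
typedef struct {
    atomic_int state; /* CH_EMPTY, CH_READY, or CH_SLEEPING */
    sem_t wakeup;     /* slow path: only used once the waiter gave up spinning */
} HybridChannel;

/* Returns 0 on success, -1 on error (as sem_init does). */
int hybrid_init(HybridChannel *ch) {
    atomic_init(&ch->state, CH_EMPTY);
    return sem_init(&ch->wakeup, /*pshared=*/1, /*value=*/0);
}

/* Signalling side: publish readiness, and post the semaphore only if
 * the waiter has already announced that it is going to sleep. */
void hybrid_signal(HybridChannel *ch) {
    int prev = atomic_exchange(&ch->state, CH_READY);
    if (prev == CH_SLEEPING) {
        sem_post(&ch->wakeup);
    }
}

/* Waiting side: spin up to spin_limit times, then block on the semaphore.
 * Returns true if the signal was consumed without blocking. */
bool hybrid_wait(HybridChannel *ch, unsigned spin_limit) {
    for (unsigned i = 0; i < spin_limit; i++) {
        int expected = CH_READY;
        if (atomic_compare_exchange_strong(&ch->state, &expected, CH_EMPTY)) {
            return true; /* fast path: no system call needed */
        }
    }

    /* Announce that we are about to sleep. If this fails, the signal
     * arrived after the last spin iteration, so consume it directly. */
    int expected = CH_EMPTY;
    if (!atomic_compare_exchange_strong(&ch->state, &expected, CH_SLEEPING)) {
        atomic_store(&ch->state, CH_EMPTY);
        return true;
    }

    /* Slow path: block until the signaller posts; retry if interrupted. */
    while (sem_wait(&ch->wakeup) == -1 && errno == EINTR) {
    }
    atomic_store(&ch->state, CH_EMPTY);
    return false;
}
```

The signaller only calls `sem_post` when it observes `CH_SLEEPING`, so a waiter that wins during the spin phase never touches the semaphore at all. That is where the savings over a pure semaphore design would be expected to come from.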
It probably makes sense to do this in multiple steps:
- Support dynamic computation of the number of spins before falling back to a semaphore

Useful links: