Parker interface & std.Mutex performance improvements#3585
Parker interface & std.Mutex performance improvements#3585andrewrk merged 17 commits intoziglang:masterfrom
Conversation
|
Theres currently a bug on this where on arm & i386 windows, the address being passed into ThreadParker mysteriously changes to another address and I can't figure out why. Could this be related to #3433 ? |
Because you've declared The AArch64 failure you see in the CI is a bug in Qemu that's been fixed ~4years ago ( |
|
@LemonBoy No wonder! Tried doing |
- uses std.ThreadParker - supports static initialization (deprecates StaticallyInitializedMutex)
5a6de50 to
4dbfc48
Compare
|
I'll look into using a newer Ubuntu for the CI in order to get a newer qemu |
9a087a0 to
4dbfc48
Compare
andrewrk
left a comment
There was a problem hiding this comment.
Thank you for this significant contribution!
This is close to being mergeable; I just have a few things for you to consider.
| pub fn sched_yield() void { | ||
| switch (builtin.os) { | ||
| .windows => _ = windows.kernel32.SwitchToThread(), | ||
| else => assert(system.sched_yield() == 0), |
There was a problem hiding this comment.
Maybe instead of asserting just ignore the result?
There was a problem hiding this comment.
If sched_yield is defined to always succeed, which it is on Linux, then assert is correct here. If it can possibly fail, then we need to have that reflected in the error set.
There was a problem hiding this comment.
Is this because sched_yield error'ing out means it will always error out like in FreeBSD's case?
There was a problem hiding this comment.
Nice, OK so we have to treat this as a fallible API now. That's fine. I propose to make the error set:
pub const SchedYieldError = error{
/// The system is not configured to allow yielding
SystemCannotYield,
};
pub fn sched_yield() SchedYieldError!void {
if (builtin.os == .windows) {
// The return value has to do with how many other threads there are; it is not
// an error condition on Windows.
_ = windows.kernel32.SwitchToThread();
return;
}
switch (errno(system.sched_yield()) {
0 => return,
ENOSYS => return error.SystemCannotYield,
else => return error.SystemCannotYield,
}
}Normally we have Unexpected as another possibility, but in this case it's unclear why this distinguishing would be beneficial.
Having the ENOSYS and else on different lines here is useful when looking at an error return trace.
The callsite of os.sched_yield can now choose how to deal with this possible error. Probably by ignoring it. But that's why we have these 2 layers of abstraction; the os.zig file is pretty low level, and exposes all the functionality of the underlying syscall, including errors.
std.Mutexuses a simple locking scheme for linux, relies on CriticalSection for windows and falls back to spinlocking on other platforms. This PR acts as two parts towards improving it:1) Adaptive Locking
For high contention cases, eager blocking mutexes incur a penalty of a syscall when they may not need to. In order to address this, the mutex can spin for a little bit trying to acquire the lock similar to a spinlock before deciding to block. This improves performance when the time spent in the critical section is minimal and acquiring/releasing is done frequently. The implementation chosen for this was that of lock_futex.go from Golang 1.13 as it provides a nice balance between spinning and deciding to block (another possibility could be rust/webkit
parking_lot). Because this implementation only needs a futex interface, it can be reused:2) Parker API
Most synchronization primitives such as Mutexes, RwLocks, Condvars, Events and Semaphores can be built upon atomic instructions and futexes for handling blocking. Another point of this PR was to setup a cross-platform futex (Parker) interface in which other primitives as listed above could be built off of. The default one provided is
ThreadParkerwhich differs from the current blocking implementation scheme:pthread_cond_tfor synchronization which also supports static initialization. This fairs better for longer blocking critical sections compared to the spinlock default of the current std.Mutex implementation.linux_futexso not many improvements besides adaptive spinning thereResults & Future Implications
Because the Parker now has a standardized interface, one could replace
ThreadParkerwith something likeAsyncParkerand reuse the synchronization primitive code forstd.eventsynchronization objects. In order to demonstrate this, I've provided some example code forAsyncParkeras well as a naive benchmark to test the performance ofstd.Mutexin comparison to this new adaptive mutex: https://github.com/kprotty/zig-adaptive-lock/ The results for high contended, small critical section cases are promising: