Parker interface & std.Mutex performance improvements by kprotty · Pull Request #3585 · ziglang/zig

kprotty · 2019-11-04T03:52:04Z

std.Mutex uses a simple locking scheme for linux, relies on CriticalSection for windows and falls back to spinlocking on other platforms. This PR acts as two parts towards improving it:

1) Adaptive Locking

For high contention cases, eager blocking mutexes incur a penalty of a syscall when they may not need to. In order to address this, the mutex can spin for a little bit trying to acquire the lock similar to a spinlock before deciding to block. This improves performance when the time spent in the critical section is minimal and acquiring/releasing is done frequently. The implementation chosen for this was that of lock_futex.go from Golang 1.13 as it provides a nice balance between spinning and deciding to block (another possibility could be rust/webkit parking_lot). Because this implementation only needs a futex interface, it can be reused:

2) Parker API

Most synchronization primitives such as Mutexes, RwLocks, Condvars, Events and Semaphores can be built upon atomic instructions and futexes for handling blocking. Another point of this PR was to setup a cross-platform futex (Parker) interface in which other primitives as listed above could be built off of. The default one provided is ThreadParker which differs from the current blocking implementation scheme:

On windows, it detects at runtime whether to use WaitOnAddress (supported since Win8+ and most similar to linux futex) or NT Keyed Events (supported since WinXP+ and is the inner backing of CriticalSection). This allows the distinction between std.Mutex and std.StaticallyInitializedMutex to dissapear as it can now be initialized statically
On posix platforms, it uses pthread_cond_t for synchronization which also supports static initialization. This fairs better for longer blocking critical sections compared to the spinlock default of the current std.Mutex implementation.
On linux, it still uses linux_futex so not many improvements besides adaptive spinning there

Results & Future Implications

Because the Parker now has a standardized interface, one could replace ThreadParker with something like AsyncParker and reuse the synchronization primitive code for std.event synchronization objects. In order to demonstrate this, I've provided some example code for AsyncParker as well as a naive benchmark to test the performance of std.Mutex in comparison to this new adaptive mutex: https://github.com/kprotty/zig-adaptive-lock/ The results for high contended, small critical section cases are promising:

Windows 10: 7-8x speedup
MacOSX: 19-22x speedup
Linux (Pthread): 2x speedup
Linux (Futex): 2x speedup

kprotty · 2019-11-04T18:33:41Z

Theres currently a bug on this where on arm & i386 windows, the address being passed into ThreadParker mysteriously changes to another address and I can't figure out why. Could this be related to #3433 ?

LemonBoy · 2019-11-05T18:44:43Z

Theres currently a bug on this where on arm & i386 windows, the address being passed into ThreadParker mysteriously changes to another address and I can't figure out why.

Because you've declared WakeByAddressSingle (and WaitOnAddress, together with the declarations in KeyedEvent) to be extern meaning that means that the C calling convention is required. But it's Win32 API we're talking about here and those use the stdcall calling convention! Replacing extern with stdcallcc makes the test pass :)

The AArch64 failure you see in the CI is a bug in Qemu that's been fixed ~4years ago (EXCP_YIELD is indeed defined as 0x10004 in the source code).

kprotty · 2019-11-05T19:24:15Z

@LemonBoy No wonder! Tried doing extern stdcallcc for the fields but it was a syntax error. The arm bug was really throwing me off too. Thanks

- uses std.ThreadParker - supports static initialization (deprecates StaticallyInitializedMutex)

andrewrk · 2019-11-05T22:24:07Z

I'll look into using a newer Ubuntu for the CI in order to get a newer qemu

andrewrk

Thank you for this significant contribution!

This is close to being mergeable; I just have a few things for you to consider.

lib/std/mutex.zig

lib/std/spinlock.zig

lib/std/parker.zig

lib/std/os/windows/kernel32.zig

lib/std/parker.zig

lib/std/spinlock.zig

lib/std/os.zig

Rocknest · 2019-11-08T00:15:55Z

lib/std/os.zig

+pub fn sched_yield() void {
+    switch (builtin.os) {
+        .windows => _ = windows.kernel32.SwitchToThread(),
+        else => assert(system.sched_yield() == 0),


Maybe instead of asserting just ignore the result?

If sched_yield is defined to always succeed, which it is on Linux, then assert is correct here. If it can possibly fail, then we need to have that reflected in the error set.

Is this because sched_yield error'ing out means it will always error out like in FreeBSD's case?

Nice, OK so we have to treat this as a fallible API now. That's fine. I propose to make the error set:

pub const SchedYieldError = error{ /// The system is not configured to allow yielding SystemCannotYield, }; pub fn sched_yield() SchedYieldError!void { if (builtin.os == .windows) { // The return value has to do with how many other threads there are; it is not // an error condition on Windows. _ = windows.kernel32.SwitchToThread(); return; } switch (errno(system.sched_yield()) { 0 => return, ENOSYS => return error.SystemCannotYield, else => return error.SystemCannotYield, } }

Normally we have Unexpected as another possibility, but in this case it's unclear why this distinguishing would be beneficial.

Having the ENOSYS and else on different lines here is useful when looking at an error return trace.

The callsite of os.sched_yield can now choose how to deal with this possible error. Probably by ignoring it. But that's why we have these 2 layers of abstraction; the os.zig file is pretty low level, and exposes all the functionality of the underlying syscall, including errors.

lib/std/parker.zig

Remove StaticallyInitializedMutex

bb6ad1a

kprotty added 2 commits November 5, 2019 08:16

adaptive SpinLock

0d4f4aa

Move hybrid yielding into its own struct

84e370d

kprotty added 4 commits November 5, 2019 13:43

make SpinLock.Backoff pub

14209cf

ThreadParker implementation

465ebf4

Adaptive Mutex:

c9db420

- uses std.ThreadParker - supports static initialization (deprecates StaticallyInitializedMutex)

Disable asm("yield") for arm in SpinLock.yieldCpu() because of CI

4dbfc48

kprotty force-pushed the adaptive_lock branch from 5a6de50 to 4dbfc48 Compare November 5, 2019 22:16

kprotty force-pushed the adaptive_lock branch from 9a087a0 to 4dbfc48 Compare November 6, 2019 20:00

skip std.ThreadParker test on single-threaded

7045f1e

kprotty marked this pull request as ready for review November 7, 2019 13:54

lock the mutex on pthread_cond_signal()

92dac89

andrewrk mentioned this pull request Nov 7, 2019

ci: bump ubuntu from 16.04 to 18.04 #3626

Merged

andrewrk requested changes Nov 7, 2019

View reviewed changes

andrewrk reviewed Nov 7, 2019

View reviewed changes

lib/std/spinlock.zig Outdated Show resolved Hide resolved

kprotty added 5 commits November 7, 2019 15:32

move SpinLock definitions around

b535e86

remove WaitOnAddress backend in std.ThreadParker

b5d8463

fix SpinLock.yield for pull/3626

f41e58d

pthread_sched_yield -> sched_yield

12e68cb

Use system instead of builtin.link_libc

bb4abfd

andrewrk reviewed Nov 7, 2019

View reviewed changes

lib/std/os.zig Outdated Show resolved Hide resolved

lib/std/os.zig Outdated Show resolved Hide resolved

missing os declaration

fe8c1cf

Rocknest mentioned this pull request Nov 7, 2019

Proposal: for on ranges #358

Closed

Rocknest reviewed Nov 8, 2019

View reviewed changes

handle NtCreateKeyedEvent failure by spinning to ensure correctness

4390001

andrewrk reviewed Nov 8, 2019

View reviewed changes

lib/std/parker.zig Outdated Show resolved Hide resolved

cleanup getEventHandle

e2189b6

andrewrk merged commit e2189b6 into ziglang:master Nov 8, 2019

kprotty mentioned this pull request Nov 24, 2019

ResetEvent replacing ThreadParker + std.Mutex improvements #3751

Closed

Uh oh!

Conversation

kprotty commented Nov 4, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

1) Adaptive Locking

2) Parker API

Results & Future Implications

Uh oh!

kprotty commented Nov 4, 2019

Uh oh!

LemonBoy commented Nov 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kprotty commented Nov 5, 2019

Uh oh!

andrewrk commented Nov 5, 2019

Uh oh!

andrewrk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Rocknest Nov 8, 2019

Choose a reason for hiding this comment

Uh oh!

andrewrk Nov 8, 2019

Choose a reason for hiding this comment

Uh oh!

kprotty Nov 8, 2019

Choose a reason for hiding this comment

Uh oh!

andrewrk Nov 8, 2019

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kprotty commented Nov 4, 2019 •

edited

Loading

LemonBoy commented Nov 5, 2019 •

edited

Loading