Merged
380ca4f to b5b31d9
Lukasa reviewed Jul 14, 2025
44847a4 to 51094db
Lukasa approved these changes Jul 14, 2025
zaneenders pushed a commit to zaneenders/swift-nio that referenced this pull request Jul 23, 2025
### Motivation:

`inEventLoop` is very much in the performance path of SwiftNIO, especially these days with Concurrency, `NIOLoopBound` and friends. Previously, we relied on `pthread_equal(pthread_self(), myPthread)`; however, this could cause a number of issues:

1. Holding onto a `pthread_t` after `.join` is actually illegal (fixed in apple#3297)
2. Potential ABA issues when `pthread_t` values are reused for new pthreads
3. A fix would require a lock around `myPthread`, which makes things 2x slower, even without contention

### Modifications:

- New type `SelectableEventLoopUniqueID`, which can be packed into a `UInt`
- Store them in a C thread-local

### Result:

- Even faster than the old, incorrect version
  - old: `measuring: el_in_eventloop_100M: 0.257395375, 0.241049208, 0.243188792, 0.259125916, 0.24843225, 0.229690125, 0.244281541, 0.225078834, 0.236395, 0.233305167`
  - new: `measuring: el_in_eventloop_100M: 0.175561125, 0.187225625, 0.199269375, 0.19740975, 0.1922695, 0.179850958, 0.177612458, 0.17665125, 0.17897475, 0.18038775`
- More correct
- Groundwork to make apple#3297 not make things slower
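The core idea can be sketched in plain Swift: give every loop an ID from a counter that only ever increases, record the running loop's ID in per-thread storage, and make `inEventLoop` a single integer comparison. This is a minimal sketch under stated assumptions, not NIO's implementation: `LoopIDAllocator`, `Loop`, `loopIDKey` and `bindToCurrentThread()` are hypothetical names, the plain counter stands in for however `SelectableEventLoopUniqueID` is actually laid out, and POSIX thread-specific data (`pthread_key_create`/`pthread_setspecific`/`pthread_getspecific`) stands in for the C thread-local, which pure Swift cannot declare.

```swift
import Foundation
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical allocator: hands out pointer-sized IDs from a counter that only
/// ever increases, so an ID is never reused (unlike a `pthread_t`).
/// 0 is reserved to mean "no loop on this thread".
final class LoopIDAllocator: @unchecked Sendable {
    private let lock = NSLock()
    private var next: UInt = 1

    func allocate() -> UInt {
        lock.lock()
        defer { lock.unlock() }
        let id = next
        next += 1  // traps on overflow instead of wrapping around to a used ID
        return id
    }
}

let allocator = LoopIDAllocator()

/// Per-thread slot holding the ID of the loop that runs on this thread.
/// POSIX thread-specific data is only a portable stand-in for a C thread-local.
let loopIDKey: pthread_key_t = {
    var key = pthread_key_t()
    let rc = pthread_key_create(&key, nil)
    precondition(rc == 0, "pthread_key_create failed")
    return key
}()

/// Hypothetical event loop that only tracks its identity.
final class Loop {
    let id: UInt = allocator.allocate()

    /// Called once on the loop's own thread before it starts running.
    func bindToCurrentThread() {
        let rc = pthread_setspecific(loopIDKey, UnsafeRawPointer(bitPattern: id))
        precondition(rc == 0, "pthread_setspecific failed")
    }

    /// One integer comparison: no `pthread_t` kept alive, no lock taken.
    var inEventLoop: Bool {
        UInt(bitPattern: pthread_getspecific(loopIDKey)) == id
    }
}
```

Because IDs are never handed out twice, a thread that once ran a loop which has since shut down cannot be mistaken for the thread of a newer loop, which is exactly the ABA case that comparing recycled `pthread_t` values can hit.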
zaneenders pushed a commit to zaneenders/swift-nio that referenced this pull request Jul 23, 2025
### Motivation:

Structured Concurrency is helpful, but so far MTELG (`MultiThreadedEventLoopGroup`) was very difficult to use with it, mostly because Swift is still missing `async` in `defer`s.

This builds on

- apple#3297
- apple#3302
- apple#3304

### Modifications:

- Add `MTELG.withELG { ... }`

### Result:

MTELG + SC = <3
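Because `defer` cannot `await`, a scoped `with...` function has to run the asynchronous shutdown itself on both the success and the error path. The sketch below shows that general shape in plain Swift. `FakeGroup` and `withFakeGroup` are hypothetical stand-ins; this is not the actual signature or behaviour of `MTELG.withELG`.

```swift
/// Hypothetical stand-in for an event loop group that needs async shutdown.
actor FakeGroup {
    private(set) var isShutDown = false

    func shutdownGracefully() async {
        isShutDown = true
    }
}

/// Creates a group, lends it to `body`, and always shuts it down afterwards.
/// A `defer` block cannot contain `await`, so both exits are handled explicitly.
func withFakeGroup<T>(
    _ body: (FakeGroup) async throws -> T
) async throws -> T {
    let group = FakeGroup()
    let result: T
    do {
        result = try await body(group)
    } catch {
        await group.shutdownGracefully()  // error path
        throw error
    }
    await group.shutdownGracefully()      // success path
    return result
}
```

The caller never holds an un-shut-down group outside the closure, which is what makes the group compose with structured concurrency.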
Lukasa added a commit to Lukasa/swift-nio that referenced this pull request Feb 27, 2026
Motivation:
Right now we store our selectable event loop reference in a NIO
ThreadSpecificVariable. That's nice enough, but this has become a
hot code path in cases where users are using the complete
concurrency override. In those cases, the ThreadSpecificVariable
forces us to both call `pthread_getspecific` and to end up needing
to perform a dynamic cast to our expected type.
These dynamic casts are costly: in one sample project where we
spent fully 7.84% of the runtime in dispatching tasks onto the EL,
the cost of that dynamic cast _alone_ was 0.5% of the runtime, with
pthread_getspecific being another 0.2%. This is 10% of the cost of
dispatching a job, entirely unnecessarily.
Modifications:
- Use some fancy C thread-locals, which have much better
performance: see apple#3302 for more on that.
- Use `Unmanaged` to pass the self reference into and out of the
thread-local, avoiding dynamic casts.
Result:
Further improved performance for concurrency takeover.
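The `Unmanaged` approach can be sketched as follows. Storing an opaque, unretained pointer and converting it back with `Unmanaged<Loop>.fromOpaque(_:)` recovers the reference with its static type already known, whereas storing `AnyObject` and reading it back with `as?` requires a runtime cast. `Loop`, `currentLoopKey`, `setCurrentLoop(_:)` and `currentLoop()` are hypothetical names, and POSIX thread-specific data stands in for the C thread-local the commit actually uses. Because the pointer is unretained, whoever sets the slot must clear it before the loop is deallocated.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Hypothetical stand-in for an event loop class.
final class Loop {
    let name: String
    init(name: String) { self.name = name }
}

/// Per-thread slot for the current loop (POSIX TSD as a stand-in for a C thread-local).
let currentLoopKey: pthread_key_t = {
    var key = pthread_key_t()
    let rc = pthread_key_create(&key, nil)
    precondition(rc == 0, "pthread_key_create failed")
    return key
}()

/// Stores `loop` for the calling thread *without* retaining it; pass nil to clear.
/// The caller must clear the slot before `loop` is deallocated.
func setCurrentLoop(_ loop: Loop?) {
    let raw = loop.map { UnsafeRawPointer(Unmanaged.passUnretained($0).toOpaque()) }
    let rc = pthread_setspecific(currentLoopKey, raw)
    precondition(rc == 0, "pthread_setspecific failed")
}

/// Reads the loop back. `Unmanaged<Loop>` fixes the static type,
/// so no `as?` dynamic cast is needed.
func currentLoop() -> Loop? {
    guard let raw = pthread_getspecific(currentLoopKey) else { return nil }
    return Unmanaged<Loop>.fromOpaque(raw).takeUnretainedValue()
}
```

The saving is entirely on the read side: the lookup result is turned straight back into a typed reference, with no runtime type check.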
Lukasa added a commit that referenced this pull request Feb 27, 2026
Motivation:

Right now we store our selectable event loop reference in a NIO ThreadSpecificVariable. That's nice enough, but this has become a hot code path in cases where users are using the complete concurrency override. In those cases, the ThreadSpecificVariable forces us to both call `pthread_getspecific` and to end up needing to perform a dynamic cast to our expected type.

These dynamic casts are costly: in one sample project where we spent fully 7.84% of the runtime in dispatching tasks onto the EL, the cost of that dynamic cast _alone_ was 0.5% of the runtime, with pthread_getspecific being another 0.2%. This is 10% of the cost of dispatching a job, entirely unnecessarily.

Modifications:

- Use some fancy C thread-locals, which have much better performance: see #3302 for more on that.
- Use `Unmanaged` to pass the self reference into and out of the thread-local, avoiding dynamic casts.

Result:

Further improved performance for concurrency takeover.

Before this change:

```
any() from event loop thread (fast path): 30.654833 ns/iteration
any() from outside (fallback path): 17.881084 ns/iteration
```

After this change:

```
any() from event loop thread (fast path): 11.571958 ns/iteration
any() from outside (fallback path): 17.52175 ns/iteration
```

Nearly a 3x speedup on that particular operation. Not so bad.

Note the irony that before this change it was actually more expensive to call `any()` _on_ an EL thread than off it. Both calls had to call pthread_getspecific, so the whole difference came from the runtime type conversion.
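The quoted ratios can be re-derived from the figures in the message; this small Swift snippet just does the arithmetic (the variable names are ours, the numbers are the ones quoted above). It gives a fast-path speedup of about 2.65x, an essentially unchanged fallback path (about 2%), an on-EL penalty of about 12.8 ns per call before the change, and a lookup overhead of just under 9% of the dispatch cost, which the message rounds to 10%.

```swift
// Benchmark figures from the commit message, in ns per iteration.
let fastBefore = 30.654833, fastAfter = 11.571958  // any() on an EL thread
let slowBefore = 17.881084, slowAfter = 17.52175   // any() from outside

/// How many times faster the `after` measurement is than `before`.
func speedup(before: Double, after: Double) -> Double {
    before / after
}

let fastSpeedup = speedup(before: fastBefore, after: fastAfter)
let slowSpeedup = speedup(before: slowBefore, after: slowAfter)

// Before the change, calling any() on an EL thread cost this much *more*
// than calling it from outside (ns per iteration).
let onLoopPenaltyBefore = fastBefore - slowBefore

// Profile shares from the sample project, in percent of total runtime.
let dispatchShare = 7.84, castShare = 0.5, getspecificShare = 0.2
let lookupFractionOfDispatch = (castShare + getspecificShare) / dispatchShare

print("fast path speedup: \(fastSpeedup)x, fallback speedup: \(slowSpeedup)x")
print("on-EL penalty before: \(onLoopPenaltyBefore) ns")
print("lookup share of dispatch: \(lookupFractionOfDispatch * 100)%")
```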
Motivation:
`inEventLoop` is very much in the performance path of SwiftNIO, especially these days with Concurrency, `NIOLoopBound` and friends. Previously, we relied on `pthread_equal(pthread_self(), myPthread)`; however, this could cause a number of issues:

1. Holding onto a `pthread_t` after `.join` is actually illegal (fixed in NIOThread refactor: pthread_t lifetimes #3297)
2. Potential ABA issues when `pthread_t` values are reused for new pthreads
3. A fix would require a lock around `myPthread`, which makes things 2x slower, even without contention

Modifications:
- New type `SelectableEventLoopUniqueID`, which can be packed into a `UInt`
- `testInEventLoopABAProblem()`, which does some basic testing (passes now, failed before this PR)

Result:
- old: `measuring: el_in_eventloop_100M: 0.257395375, 0.241049208, 0.243188792, 0.259125916, 0.24843225, 0.229690125, 0.244281541, 0.225078834, 0.236395, 0.233305167`
- new: `measuring: el_in_eventloop_100M: 0.175561125, 0.187225625, 0.199269375, 0.19740975, 0.1922695, 0.179850958, 0.177612458, 0.17665125, 0.17897475, 0.18038775`