improve inEventLoop checks by weissi · Pull Request #3302 · apple/swift-nio

weissi · 2025-07-14T13:52:27Z

Motivation:

`inEventLoop` is very much in the performance path of SwiftNIO, especially these days with Concurrency, `NIOLoopBound` and friends. Previously, we relied on `pthread_equal(pthread_self(), myPthread)`, but this could cause a number of issues:

1. Holding onto a `pthread_t` after `.join` is actually illegal (fixed in "NIOThread refactor: pthread_t lifetimes", #3297).
2. Potential ABA issues when `pthread_t` pointers are reused for new pthreads.
3. A fix would require a lock around `myPthread`, which makes things 2x slower, even without contention.

Modifications:

- New type `SelectableEventLoopUniqueID`, which can be packed into a `UInt`
- Store these IDs in a C thread-local
- Added `testInEventLoopABAProblem()`, which does some basic testing (passes now, failed before this PR)
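
To illustrate the idea behind these modifications, here is a minimal sketch in C of an `inEventLoop`-style check built on a never-reused ID stored in a thread-local. All names (`event_loop`, `current_loop_id`, `run_probe`, and so on) are hypothetical, and the actual layout of `SelectableEventLoopUniqueID` is not shown in this description. The sketch only assumes that each loop takes its ID from a monotonic counter and that the loop's thread records that ID in a `_Thread_local` variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not SwiftNIO's actual code. */

/* ID of the loop running on the current thread; 0 means "not a loop thread". */
static _Thread_local uintptr_t current_loop_id = 0;

/* Monotonic counter: an ID is never handed out twice, unlike pthread_t values. */
static atomic_uintptr_t next_loop_id = 1;

typedef struct {
    uintptr_t unique_id; /* immutable after init, so reads need no lock */
} event_loop;

static void event_loop_init(event_loop *loop) {
    loop->unique_id = atomic_fetch_add(&next_loop_id, 1);
}

/* Called once by the thread that is going to run the loop. */
static void event_loop_attach_current_thread(const event_loop *loop) {
    current_loop_id = loop->unique_id;
}

/* The hot-path check: one thread-local load and one integer compare. */
static int in_event_loop(const event_loop *loop) {
    return current_loop_id == loop->unique_id;
}

typedef struct {
    const event_loop *attach; /* loop this thread pretends to run */
    const event_loop *probe;  /* loop we ask "am I on you?" about */
    int result;
} probe_args;

static void *probe_thread_main(void *arg) {
    probe_args *args = arg;
    event_loop_attach_current_thread(args->attach);
    args->result = in_event_loop(args->probe);
    return NULL;
}

/* Spawns a thread attached to `attach` and reports in_event_loop(probe) from it. */
static int run_probe(const event_loop *attach, const event_loop *probe) {
    probe_args args = { attach, probe, -1 };
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, probe_thread_main, &args);
    assert(rc == 0);
    (void)rc;
    pthread_join(thread, NULL);
    return args.result;
}
```

With two initialized loops `a` and `b`, `run_probe(&a, &a)` yields 1 and `run_probe(&b, &a)` yields 0, even if both threads happen to receive the same `pthread_t` value, because the comparison never looks at the `pthread_t` at all.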

Result:

- Even faster than the old, incorrect version
  - old: `measuring: el_in_eventloop_100M: 0.257395375, 0.241049208, 0.243188792, 0.259125916, 0.24843225, 0.229690125, 0.244281541, 0.225078834, 0.236395, 0.233305167`
  - new: `measuring: el_in_eventloop_100M: 0.175561125, 0.187225625, 0.199269375, 0.19740975, 0.1922695, 0.179850958, 0.177612458, 0.17665125, 0.17897475, 0.18038775`
- More correct
- Groundwork so that "NIOThread refactor: pthread_t lifetimes" (#3297) does not make things slower
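
To put a number on "even faster", the two benchmark lines can be averaged and compared. The following is a small sketch in C that does only that arithmetic on the ten old and ten new measurements quoted above; the helper names (`mean`, `speedup`) are made up for illustration, and the units of the measurements are not stated in this description.

```c
#include <assert.h>
#include <stddef.h>

/* Measurements copied from the "old" and "new" benchmark lines above. */
static const double old_runs[10] = {
    0.257395375, 0.241049208, 0.243188792, 0.259125916, 0.24843225,
    0.229690125, 0.244281541, 0.225078834, 0.236395,    0.233305167
};
static const double new_runs[10] = {
    0.175561125, 0.187225625, 0.199269375, 0.19740975, 0.1922695,
    0.179850958, 0.177612458, 0.17665125,  0.17897475, 0.18038775
};

/* Arithmetic mean of n samples. */
static double mean(const double *xs, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += xs[i];
    }
    return sum / (double)n;
}

/* Ratio of the old mean to the new mean; values above 1 mean "new is faster". */
static double speedup(void) {
    return mean(old_runs, 10) / mean(new_runs, 10);
}
```

The means come out at roughly 0.242 (old) and 0.185 (new), so `speedup()` is about 1.31: the new check takes roughly 24% less time on this benchmark.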

Sources/NIOCore/ByteBuffer-int.swift

Sources/NIOPosix/SelectableEventLoop.swift

Description of a later pull request that builds on this one:

### Motivation:

Structured Concurrency is helpful, but so far MTELG was very difficult to use with it, mostly because Swift is still missing `async` in `defer`s.

This builds on

- #3297
- #3302
- #3304

### Modifications:

- Add `MTELG.withELG { ... }`

### Result:

MTELG + SC = <3


Description of a later pull request that references this one:

Motivation:

Right now we store our selectable event loop reference in a NIO `ThreadSpecificVariable`. That's nice enough, but this has become a hot code path in cases where users are using the complete concurrency override. In those cases, the `ThreadSpecificVariable` forces us both to call `pthread_getspecific` and to perform a dynamic cast to our expected type.

These dynamic casts are costly: in one sample project where we spent fully 7.84% of the runtime in dispatching tasks onto the EL, the cost of that dynamic cast _alone_ was 0.5% of the runtime, with `pthread_getspecific` being another 0.2%. This is roughly 10% of the cost of dispatching a job, entirely unnecessarily.

Modifications:

- Use some fancy C thread-locals, which have much better performance: see #3302 for more on that.
- Use `Unmanaged` to pass the self reference into and out of the thread-local, avoiding dynamic casts.

Result:

Further improved performance for concurrency takeover.

Before this change:

```
any() from event loop thread (fast path): 30.654833 ns/iteration
any() from outside (fallback path): 17.881084 ns/iteration
```

After this change:

```
any() from event loop thread (fast path): 11.571958 ns/iteration
any() from outside (fallback path): 17.52175 ns/iteration
```

A roughly 2.6x speedup on that particular operation. Not so bad.

Note the irony that before this change it was actually more expensive to call `any` from _on_ an EL thread than off it. Both calls had to call `pthread_getspecific`, which means all of this extra cost was the runtime type conversion.

weissi requested a review from Lukasa July 14, 2025 13:52

weissi force-pushed the jw-el-id branch 3 times, most recently from 380ca4f to b5b31d9, July 14, 2025 14:14

weissi added the 🔨 semver/patch label (no public API change) Jul 14, 2025

weissi enabled auto-merge (squash) July 14, 2025 14:15

Lukasa reviewed Jul 14, 2025

Review threads (all resolved):

- Sources/NIOCore/ByteBuffer-int.swift (1 thread)
- Sources/NIOPosix/SelectableEventLoop.swift (3 threads)

weissi force-pushed the jw-el-id branch 2 times, most recently from 44847a4 to 51094db, July 14, 2025 14:38

weissi requested a review from Lukasa July 14, 2025 14:39

weissi force-pushed the jw-el-id branch from 51094db to d80e125, July 14, 2025 14:43

improve inEventLoop checks

ee698ed

weissi force-pushed the jw-el-id branch from d80e125 to ee698ed, July 14, 2025 15:34

Lukasa approved these changes Jul 14, 2025


weissi merged commit 0b65385 into apple:main Jul 14, 2025
41 checks passed

weissi deleted the jw-el-id branch July 14, 2025 16:22

weissi mentioned this pull request Jul 15, 2025

Structured Concurrency compliant MTELG create/shutdown #3296

Merged

Lukasa mentioned this pull request Feb 27, 2026

Improve the performance of looking up the current SEL #3530

Merged
