improve inEventLoop checks #3302

Merged
weissi merged 1 commit into apple:main from weissi:jw-el-id on Jul 14, 2025

@weissi weissi commented Jul 14, 2025

Motivation:

inEventLoop is very much in the performance path of SwiftNIO, especially these days with Concurrency, NIOLoopBound and friends. Previously, we relied on pthread_equal(pthread_self(), myPthread); however, this could cause a number of issues:

  1. Holding onto a pthread_t after .join is actually illegal (fixed in NIOThread refactor: pthread_t lifetimes #3297)
  2. ABA issues when pthread_t pointers are reused for new pthreads
  3. A fix would require a lock around myPthread, which makes things slower (2x, even without contention)
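
The ABA hazard in item 2 can be illustrated without real threads. The following is only a simulation under stated assumptions: `HandlePool` and `UniqueIDGenerator` are hypothetical names that do not appear in SwiftNIO, and the pool merely stands in for a system that may hand a freed `pthread_t` value to a later thread.

```swift
// Hypothetical simulation, not SwiftNIO code: a pool that recycles handle
// values, standing in for a system that may reuse a pthread_t after join.
struct HandlePool {
    var freed: [Int] = []
    var nextFresh = 0

    // Returns a recycled handle if one is available, otherwise a fresh one.
    mutating func acquire() -> Int {
        if let handle = freed.popLast() { return handle }
        defer { nextFresh += 1 }
        return nextFresh
    }

    // Marks a handle as reusable, comparable to joining a thread.
    mutating func release(_ handle: Int) {
        freed.append(handle)
    }
}

// A counter that never hands out the same value twice.
struct UniqueIDGenerator {
    var next: UInt = 1

    mutating func make() -> UInt {
        defer { next += 1 }
        return next
    }
}

// Scenario: loop A's thread exits, then an unrelated thread B starts.
// Reports whether each identity scheme mistakes B for loop A's thread.
func simulateABA() -> (handleConfused: Bool, uniqueIDConfused: Bool) {
    var pool = HandlePool()
    var ids = UniqueIDGenerator()

    let loopAHandle = pool.acquire()    // loop A remembers its thread handle
    let loopAID = ids.make()            // ... or, alternatively, a unique ID
    pool.release(loopAHandle)           // loop A's thread is joined

    let threadBHandle = pool.acquire()  // the new thread gets the recycled handle
    let threadBID = ids.make()          // but a fresh unique ID

    return (threadBHandle == loopAHandle, threadBID == loopAID)
}
```

In this simulation the recycled handle compares equal to the stale one, while the unique IDs never collide. Whether and when a real system reuses a `pthread_t` value is platform-dependent.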

Modifications:

  • New type SelectableEventLoopUniqueID which can be packed into a UInt
  • Attach these IDs to their threads via a C thread-local
  • Added testInEventLoopABAProblem() which does some basic testing (passes now, failed before this PR)
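
A minimal sketch of the idea behind these modifications, not the PR's implementation. Assumptions: `LoopIDAllocator`, `SketchLoop`, `markCurrentThread(asLoop:)` and `currentThreadLoopID()` are hypothetical names, and Foundation's per-thread dictionary stands in for the C thread-local (it is slower, but needs no C target).

```swift
import Foundation

// Hypothetical stand-in for SelectableEventLoopUniqueID: a never-reused
// value that fits in a UInt. 0 is reserved here for "no event loop".
final class LoopIDAllocator: @unchecked Sendable {
    private let lock = NSLock()
    private var next: UInt = 1

    func makeID() -> UInt {
        lock.lock()
        defer { lock.unlock() }
        let id = next
        next += 1
        return id
    }
}

// Assumption: the per-thread dictionary replaces the C thread-local.
let loopIDKey = "sketch.loopID"

func markCurrentThread(asLoop id: UInt) {
    Thread.current.threadDictionary[loopIDKey] = id
}

func currentThreadLoopID() -> UInt {
    (Thread.current.threadDictionary[loopIDKey] as? UInt) ?? 0
}

final class SketchLoop: Sendable {
    let id: UInt

    init(allocator: LoopIDAllocator) {
        self.id = allocator.makeID()
    }

    // The check compares two integers; no pthread_t is stored or compared.
    var inEventLoop: Bool { currentThreadLoopID() == id }

    // Runs `body` on a new thread marked as this loop's thread, then waits.
    func runOnOwnThread(_ body: @escaping @Sendable () -> Void) {
        let done = DispatchSemaphore(value: 0)
        let thread = Thread {
            markCurrentThread(asLoop: self.id)
            body()
            done.signal()
        }
        thread.start()
        done.wait()
    }
}
```

Because each ID is handed out once, a thread created later can never be mistaken for an earlier loop's thread, and the loop object does not need to keep a thread handle around for the check.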

Result:

  • Even faster than the old, incorrect version
    • old: measuring: el_in_eventloop_100M: 0.257395375, 0.241049208, 0.243188792, 0.259125916, 0.24843225, 0.229690125, 0.244281541, 0.225078834, 0.236395, 0.233305167
    • new: measuring: el_in_eventloop_100M: 0.175561125, 0.187225625, 0.199269375, 0.19740975, 0.1922695, 0.179850958, 0.177612458, 0.17665125, 0.17897475, 0.18038775
  • More correct
  • Groundwork so that NIOThread refactor: pthread_t lifetimes #3297 does not make things slower
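
For a rough sense of scale, the two measurement series can be averaged. The snippet below only does arithmetic on the ten values quoted above; the PR does not state their units, so none are assumed. The means come out at roughly 0.242 (old) and 0.185 (new), a ratio of about 1.3.

```swift
// The timings quoted above, copied verbatim (units are not stated in the PR).
let oldRuns: [Double] = [
    0.257395375, 0.241049208, 0.243188792, 0.259125916, 0.24843225,
    0.229690125, 0.244281541, 0.225078834, 0.236395, 0.233305167,
]
let newRuns: [Double] = [
    0.175561125, 0.187225625, 0.199269375, 0.19740975, 0.1922695,
    0.179850958, 0.177612458, 0.17665125, 0.17897475, 0.18038775,
]

// Arithmetic mean of a non-empty series.
func mean(_ values: [Double]) -> Double {
    values.reduce(0, +) / Double(values.count)
}

let oldMean = mean(oldRuns)
let newMean = mean(newRuns)

print("old mean:", oldMean)
print("new mean:", newMean)
print("old/new ratio:", oldMean / newMean)
```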

@weissi weissi requested a review from Lukasa July 14, 2025 13:52
@weissi weissi force-pushed the jw-el-id branch 3 times, most recently from 380ca4f to b5b31d9 Compare July 14, 2025 14:14
@weissi weissi added the 🔨 semver/patch No public API change. label Jul 14, 2025
@weissi weissi enabled auto-merge (squash) July 14, 2025 14:15
@weissi weissi force-pushed the jw-el-id branch 2 times, most recently from 44847a4 to 51094db Compare July 14, 2025 14:38
@weissi weissi requested a review from Lukasa July 14, 2025 14:39
@weissi weissi merged commit 0b65385 into apple:main Jul 14, 2025
41 checks passed
@weissi weissi deleted the jw-el-id branch July 14, 2025 16:22
weissi added a commit that referenced this pull request Jul 15, 2025
### Motivation:

Structured Concurrency is helpful, but so far MTELG has been very difficult
to use with it, mostly because Swift is still missing `async` in
`defer`s.

This builds on
- #3297 
- #3302
- #3304

### Modifications:

- Add MTELG.withELG { ... }

### Result:

MTELG + SC = <3
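
The motivation above (no `async` in `defer`) is why a scoped `with...` helper is useful. The sketch below shows the general pattern only; `SketchGroup` and `withSketchGroup` are hypothetical names and do not reproduce the actual SwiftNIO API or its signature.

```swift
// Hypothetical resource with an async shutdown, standing in for an
// event loop group.
actor SketchGroup {
    private(set) var isShutDown = false

    func shutdown() async {
        isShutDown = true
    }
}

// Scoped helper: shuts the group down on both the success and the error
// path. A `defer` block could not do this because it cannot `await`.
func withSketchGroup<T>(
    _ body: (SketchGroup) async throws -> T
) async throws -> T {
    let group = SketchGroup()
    do {
        let result = try await body(group)
        await group.shutdown()
        return result
    } catch {
        await group.shutdown()
        throw error
    }
}
```

The caller only supplies the body; the shutdown is tied to the scope, which is what makes such a helper fit Structured Concurrency.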
Lukasa added a commit to Lukasa/swift-nio that referenced this pull request Feb 27, 2026
Motivation:

Right now we store our selectable event loop reference in a NIO
ThreadSpecificVariable. That's nice enough, but this has become a
hot code path in cases where users are using the complete
concurrency override. In those cases, the ThreadSpecificVariable
forces us to both call `pthread_getspecific` and to end up needing
to perform a dynamic cast to our expected type.

These dynamic casts are costly: in one sample project where we
spent fully 7.84% of the runtime in dispatching tasks onto the EL,
the cost of that dynamic cast _alone_ was 0.5% of the runtime, with
pthread_getspecific being another 0.2%. This is 10% of the cost of
dispatching a job, entirely unnecessarily.

Modifications:

- Use some fancy C thread-locals, which have much better
    performance: see apple#3302 for more on that.
- Use Unmanaged to pass the self reference into and out of the
    thread-local, avoiding dynamic casts.

Result:

Further improved performance for concurrency takeover.
Lukasa added a commit that referenced this pull request Feb 27, 2026
Motivation:

Right now we store our selectable event loop reference in a NIO
ThreadSpecificVariable. That's nice enough, but this has become a hot
code path in cases where users are using the complete concurrency
override. In those cases, the ThreadSpecificVariable forces us to both
call `pthread_getspecific` and to end up needing to perform a dynamic
cast to our expected type.

These dynamic casts are costly: in one sample project where we spent
fully 7.84% of the runtime in dispatching tasks onto the EL, the cost of
that dynamic cast _alone_ was 0.5% of the runtime, with
pthread_getspecific being another 0.2%. This is 10% of the cost of
dispatching a job, entirely unnecessarily.

Modifications:

- Use some fancy C thread-locals, which have much better performance:
see #3302 for more on that.
- Use Unmanaged to pass the self reference into and out of the
thread-local, avoiding dynamic casts.

Result:

Further improved performance for concurrency takeover.

Before this change:

```
any() from event loop thread (fast path): 30.654833 ns/iteration
any() from outside (fallback path): 17.881084 ns/iteration
```

After this change:

```
any() from event loop thread (fast path): 11.571958 ns/iteration
any() from outside (fallback path): 17.52175 ns/iteration
```

A roughly 2.6x speedup on that particular operation (30.65 ns to 11.57 ns per iteration). Not so bad.

Note the irony that before this change it was actually more expensive to
call `any` from _on_ an EL thread than off it. Both calls had to call
pthread_getspecific, which means all of this cost was the runtime type
conversion.
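
The difference between the two ways of reading a stored reference can be sketched as follows. This is an illustration, not the code from the commit: `SketchEventLoop` and the three helper functions are hypothetical, and the use of `passUnretained` is an assumption (the commit message does not say whether the reference is retained).

```swift
// Hypothetical class standing in for the event loop type.
final class SketchEventLoop {
    let name: String
    init(name: String) { self.name = name }
}

// Type-erased storage, as with a thread-specific variable holding
// AnyObject: reading it back requires a dynamic cast.
func loadWithDynamicCast(_ stored: AnyObject?) -> SketchEventLoop? {
    stored as? SketchEventLoop
}

// Raw-pointer storage, as with a C thread-local holding a void pointer.
// passUnretained: the caller must keep `loop` alive while the pointer is used.
func storeAsOpaque(_ loop: SketchEventLoop) -> UnsafeMutableRawPointer {
    Unmanaged.passUnretained(loop).toOpaque()
}

// Reads the reference back. No runtime type check happens here; the type
// is asserted statically, so the pointer must come from storeAsOpaque.
func loadFromOpaque(_ raw: UnsafeMutableRawPointer?) -> SketchEventLoop? {
    guard let raw = raw else { return nil }
    return Unmanaged<SketchEventLoop>.fromOpaque(raw).takeUnretainedValue()
}
```

The trade-off is that the `Unmanaged` path gives up the safety net of the cast: storing a pointer to any other type, or reading it after the object has been deallocated, is undefined behaviour.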