Improve the performance of looking up the current SEL #3530
Merged
Lukasa merged 2 commits into apple:main on Feb 27, 2026
Motivation:
Right now we store our selectable event loop reference in a NIO
ThreadSpecificVariable. That's nice enough, but this has become a
hot code path in cases where users are using the complete
concurrency override. In those cases, the ThreadSpecificVariable
forces us to both call `pthread_getspecific` and to end up needing
to perform a dynamic cast to our expected type.
These dynamic casts are costly: in one sample project where we
spent fully 7.84% of the runtime in dispatching tasks onto the EL,
the cost of that dynamic cast _alone_ was 0.5% of the runtime, with
`pthread_getspecific` being another 0.2%. Together that is roughly 9%
of the cost of dispatching a job (0.7 of 7.84 percentage points),
spent entirely unnecessarily.
Modifications:
- Use some fancy C thread-locals, which have much better
performance: see apple#3302 for more on that.
- Use Unmanaged to pass the self reference into and out of the
thread-local, avoiding dynamic casts.
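
As a rough illustration of the second point, here is a minimal sketch of how a typed reference can round-trip through an opaque pointer with `Unmanaged`, compared with a lookup that needs a dynamic cast. This is not the code from this PR: `FakeEventLoop`, `CastingSlot`, and `UnmanagedSlot` are hypothetical names, and a plain stored property stands in for the C thread-local so that the example stays self-contained. Whether the real change uses retained or unretained references is not stated here; the sketch assumes unretained.

```swift
// Hypothetical stand-in for the event loop type; not the real NIO class.
final class FakeEventLoop {
    let name: String
    init(name: String) { self.name = name }
}

// Roughly the old shape: the slot holds an untyped value, so every
// lookup pays for a runtime type check via `as?`.
final class CastingSlot {
    private var value: Any? = nil

    func store(_ loop: FakeEventLoop) { self.value = loop }

    func current() -> FakeEventLoop? {
        self.value as? FakeEventLoop  // dynamic cast on each read
    }
}

// Sketch of the new shape: the slot holds a raw pointer, as a C
// thread-local would. A stored property is used here only as a
// stand-in; it is neither per-thread nor thread-safe.
final class UnmanagedSlot {
    private var raw: UnsafeMutableRawPointer? = nil

    // Assumption: the caller keeps `loop` alive while it is stored,
    // because passUnretained does not take a strong reference.
    func store(_ loop: FakeEventLoop) {
        self.raw = Unmanaged.passUnretained(loop).toOpaque()
    }

    func clear() { self.raw = nil }

    func current() -> FakeEventLoop? {
        guard let raw = self.raw else { return nil }
        // No dynamic cast: the static type is supplied by the caller.
        return Unmanaged<FakeEventLoop>.fromOpaque(raw).takeUnretainedValue()
    }
}

let loop = FakeEventLoop(name: "el-0")

let casting = CastingSlot()
casting.store(loop)

let unmanaged = UnmanagedSlot()
unmanaged.store(loop)

// Both lookups should hand back the same object instance.
print(casting.current() === loop, unmanaged.current() === loop)
```

The trade-off in this sketch is that `fromOpaque` trusts the caller about the pointee type, so it is only safe when the slot is written and read by code that agrees on that type.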
Result:
Further improved performance for concurrency takeover.
de3e50c to 0b477e1
weissi (Member) approved these changes on Feb 27, 2026 and left a comment:
:chefs-kiss: perfect, thank you!
Result:
Further improved performance for concurrency takeover.
Before this change:
After this change:
Nearly a 3x speedup on that particular operation. Not so bad.
Note the irony that before this change it was actually more expensive to call `any` from on an EL thread than from off it. Both calls had to call `pthread_getspecific`, which means the entire difference was the cost of the runtime type conversion.