Improve the performance of looking up the current SEL #3530

Merged
Lukasa merged 2 commits into apple:main from Lukasa:cb-cheaper-lookup-for-current-mtelg on Feb 27, 2026

Conversation

Contributor

@Lukasa Lukasa commented Feb 27, 2026

Motivation:

Right now we store our selectable event loop reference in a NIO ThreadSpecificVariable. That's nice enough, but this has become a hot code path in cases where users are using the complete concurrency override. In those cases, the ThreadSpecificVariable forces us both to call pthread_getspecific and to perform a dynamic cast to our expected type.

These dynamic casts are costly: in one sample project where we spent fully 7.84% of the runtime in dispatching tasks onto the EL, the cost of that dynamic cast alone was 0.5% of the runtime, with pthread_getspecific being another 0.2%. Together that is roughly 9% of the cost of dispatching a job, spent entirely unnecessarily.

Modifications:

  • Use some fancy C thread-locals, which have much better performance: see "improve inEventLoop checks" (#3302) for more on that.
  • Use Unmanaged to pass the self reference into and out of the thread-local, avoiding dynamic casts.
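
As a rough illustration of the second bullet, the sketch below round-trips a class reference through an untyped pointer with Unmanaged, so the load side never performs an `as?` cast. It is not the code from this PR: `DemoLoop` and `ThreadLocalSlot` are hypothetical names, and the slot is an ordinary Swift struct standing in for the C thread-local.

```swift
// Hypothetical stand-in for NIO's selectable event loop type.
final class DemoLoop {
    let name: String
    init(name: String) { self.name = name }
}

// Hypothetical stand-in for a C thread-local: it can only hold an untyped pointer.
struct ThreadLocalSlot {
    var raw: UnsafeMutableRawPointer? = nil

    // Store: turn the reference into an opaque pointer without retaining it.
    // The caller must keep the object alive while the pointer is in the slot.
    mutating func store(_ loop: DemoLoop?) {
        raw = loop.map { Unmanaged.passUnretained($0).toOpaque() }
    }

    // Load: reinterpret the pointer as DemoLoop. No dynamic cast is involved.
    func load() -> DemoLoop? {
        guard let raw = raw else { return nil }
        return Unmanaged<DemoLoop>.fromOpaque(raw).takeUnretainedValue()
    }
}

var slot = ThreadLocalSlot()
let loop = DemoLoop(name: "el-1")

slot.store(loop)          // what the loop's thread would do on startup
let found = slot.load()   // the hot-path lookup
slot.store(nil)           // what the loop's thread would do on shutdown
let afterClear = slot.load()
```

Because the pointer is stored unretained, this pattern is only safe when the object is guaranteed to outlive the slot's contents, which is why the sketch clears the slot before `loop` goes away.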

Result:

Further improved performance for concurrency takeover.

Before this change:

any() from event loop thread (fast path): 30.654833 ns/iteration
any() from outside (fallback path): 17.881084 ns/iteration

After this change:

any() from event loop thread (fast path): 11.571958 ns/iteration
any() from outside (fallback path): 17.52175 ns/iteration

Roughly a 2.6x speedup on that particular operation (30.65 ns down to 11.57 ns on the fast path). Not so bad.

Note the irony that before this change it was actually more expensive to call any() on an EL thread than off it. Both paths had to call pthread_getspecific, so the extra cost on the EL thread was the runtime type conversion.
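
One plausible reading of that asymmetry, sketched under assumptions: if the stored value is type-erased, a lookup off the loop finds nothing and returns early, while a lookup on the loop has to pay for the cast. `ErasedSlot` and `DemoLoop` are hypothetical and do not reproduce ThreadSpecificVariable's actual implementation.

```swift
// Hypothetical stand-in for the event loop type.
final class DemoLoop {}

// Hypothetical type-erased slot, standing in for a pthread_getspecific-backed store.
struct ErasedSlot {
    var stored: AnyObject?

    // Returns the loop, plus whether a dynamic cast was needed to produce it.
    func lookup() -> (loop: DemoLoop?, didCast: Bool) {
        // Off-loop thread: the slot is empty, so there is nothing to cast.
        guard let object = stored else { return (nil, false) }
        // On-loop thread: the value exists, so the runtime type check runs.
        return (object as? DemoLoop, true)
    }
}

let onLoop = ErasedSlot(stored: DemoLoop())
let offLoop = ErasedSlot(stored: nil)

let hit = onLoop.lookup()
let miss = offLoop.lookup()
```

In this model the "fast path" does strictly more work than the fallback, which matches the shape of the before numbers above.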

@Lukasa Lukasa added the 🔨 semver/patch No public API change. label Feb 27, 2026
@Lukasa Lukasa force-pushed the cb-cheaper-lookup-for-current-mtelg branch from de3e50c to 0b477e1 Compare February 27, 2026 16:36
Member

@weissi weissi left a comment

:chefs-kiss: perfect, thank you!

@Lukasa Lukasa enabled auto-merge (squash) February 27, 2026 16:45
@Lukasa Lukasa merged commit d38344d into apple:main Feb 27, 2026
55 checks passed
@Lukasa Lukasa deleted the cb-cheaper-lookup-for-current-mtelg branch February 27, 2026 18:03