Added local queue scheduling and "next_task" optimization by nullchinchilla · Pull Request #22 · smol-rs/async-executor

nullchinchilla · 2021-03-20T01:25:44Z

Two major changes significantly improve performance:

1. When Executor::run() is called, handles to the local queue and ticker are cached in TLS. This lets tasks schedule to a thread-local queue rather than always to the global queue.
2. Within the local queue, we implement a next_task optimization (see https://tokio.rs/blog/2019-10-scheduler) to greatly reduce context-switch costs in message-passing patterns. To prevent starvation, we never put the same task into next_task twice.
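The two changes above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `Task`, `LocalQueue`, `LOCAL`, `enter_run`, `schedule`, and `pop_local` are hypothetical names, tasks are plain integers, and the starvation rule is interpreted here as "the most recent occupant of the slot may not take it again".

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::sync::Mutex;

/// Stand-in for a runnable task (hypothetical; a real executor stores task handles).
type Task = u32;

/// Hypothetical per-thread queue with a one-element `next_task` slot.
#[derive(Default)]
struct LocalQueue {
    next_task: Option<Task>,
    queue: VecDeque<Task>,
    /// The task most recently placed in `next_task`.
    last_in_slot: Option<Task>,
}

impl LocalQueue {
    fn push(&mut self, task: Task) {
        if self.last_in_slot == Some(task) {
            // Assumed starvation rule: the most recent occupant of the slot
            // goes to the back of the queue instead of taking the slot again.
            self.queue.push_back(task);
            return;
        }
        // The newly scheduled task runs next; any previous occupant is queued.
        if let Some(prev) = self.next_task.replace(task) {
            self.queue.push_back(prev);
        }
        self.last_in_slot = Some(task);
    }

    fn pop(&mut self) -> Option<Task> {
        // The slot is always drained before the FIFO queue.
        self.next_task.take().or_else(|| self.queue.pop_front())
    }
}

thread_local! {
    /// Per-thread cache; `None` means no executor is running on this thread.
    static LOCAL: RefCell<Option<LocalQueue>> = RefCell::new(None);
}

/// Simulates what entering `run()` would do: install the TLS cache.
fn enter_run() {
    LOCAL.with(|l| *l.borrow_mut() = Some(LocalQueue::default()));
}

/// Schedules locally when a cache is installed, otherwise on the global queue.
fn schedule(global: &Mutex<VecDeque<Task>>, task: Task) {
    let scheduled_locally = LOCAL.with(|l| {
        let mut cached = l.borrow_mut();
        let hit = if let Some(local) = cached.as_mut() {
            local.push(task);
            true
        } else {
            false
        };
        hit
    });
    if !scheduled_locally {
        global.lock().unwrap().push_back(task);
    }
}

/// Takes the next task from this thread's local queue, if any.
fn pop_local() -> Option<Task> {
    LOCAL.with(|l| l.borrow_mut().as_mut().and_then(|q| q.pop()))
}
```

In this sketch, calling `schedule` before `enter_run` lands the task on the global queue, which mirrors the fallback behaviour described for an invalidated TLS cache.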

In both unit tests and production deployment in https://github.com/geph-official/geph4, whose QUIC-like sosistab protocol is structured in an actor-like fashion that greatly stresses the scheduler, I see significant improvements in real-world throughput (up to 30%, and this is on a server whose CPU usage is dominated by cryptography) and massive improvements in microbenchmarks (up to 10x faster in the yield_now benchmark and similar context-switch benchmarks). I see no downsides: the code should gracefully fall back to pushing to the global queue if, for example, nesting Executors invalidates the TLS cache.

I also added criterion benchmarks.

notgull · 2023-10-17T02:29:58Z

@nullchinchilla Would you be open to rebasing/cutting down on this PR? These optimizations are important and I would be open to reviewing it.

nullchinchilla · 2023-10-17T13:21:28Z

I've actually decided on a different course, since I've realized that local scheduling and an unstealable next_task cell can cause issues (such as deadlocks if we nest smol::block_on).
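The stealing problem can be illustrated with a small sketch. It is an assumption-laden model rather than the executor's real data structures: `Worker`, `Shared`, `push_next`, `push_shared`, and `steal` are hypothetical names. The point is only that a task parked in a private `next_task` cell is invisible to other workers, so it makes no progress unless its owner is polled again.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Stand-in for a runnable task (hypothetical).
type Task = u32;

/// A queue that other workers are allowed to steal from.
type Shared = Arc<Mutex<VecDeque<Task>>>;

/// Hypothetical worker: a private `next_task` cell plus a stealable queue.
struct Worker {
    next_task: Option<Task>,
    shared: Shared,
}

impl Worker {
    fn new() -> Self {
        Worker {
            next_task: None,
            shared: Arc::new(Mutex::new(VecDeque::new())),
        }
    }

    /// Parks a task in the private cell; a displaced task becomes stealable.
    fn push_next(&mut self, task: Task) {
        if let Some(prev) = self.next_task.replace(task) {
            self.shared.lock().unwrap().push_back(prev);
        }
    }

    /// Puts a task where any worker can reach it.
    fn push_shared(&self, task: Task) {
        self.shared.lock().unwrap().push_back(task);
    }

    /// Handle that another worker would hold in order to steal.
    fn stealer(&self) -> Shared {
        Arc::clone(&self.shared)
    }
}

/// What a sibling worker can do: it only ever sees the shared queue.
fn steal(from: &Shared) -> Option<Task> {
    from.lock().unwrap().pop_front()
}
```

If the owning worker's `run()` future is not polled again after `push_next`, nothing in this model can reach the parked task, which is the kind of stall described above.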

You can check my latest executor work in the "smolscale" crate, which also uses smol::Task and is fully compatible with the smol-rs ecosystem, but forces a global executor so that it is easier to optimize.

notgull · 2023-10-18T04:18:03Z

Thanks for letting me know! I think that this crate should act more as a "reference" executor that aims to implement features rather than be as optimal as possible. I'll close this for now.

nullchinchilla added 30 commits March 13, 2021 14:15

remove lifetime bound on State to simplify

af99c34

remove

26bc361

hack to make TLS work

faf2ff7

make run_pinned a future

bb3d168

should be working now

96e1d21

remove concurrentqueue

a6edf1b

new benches

b99dd29

fix

99f067d

actually fast

d27d960

remove weak hack

16781cf

fix stuff

5d98fc4

don't use local queues for nested execution

4edee8c

tokio benches

0bfc883

fixed

d378ddc

attempt at next_task

1c3204c

fix

f0b426f

fix call to num_cpus in tight loop

ddca550

avoid starvation due to next_task by preventing two tasks in a row from being put there

dc8ef55

remove debug statement

96f78a4

revert

ae542a1

fix

cfeba8d

crossbeam-deque

1a04003

fix nested

6fd4b5c

fix

88b8582

CORRECT yield detection

a449656

further optimize performance

f8a7515

reduce notify frequency even more

66d1c50

test

3070a1d

sibling notification is now doubleplusgood

c0d33c6

fix a bug...

e613c06

nullchinchilla added 11 commits March 27, 2021 18:29

artificially cripple sibling notification, yet again

38f5454

somewhat final version

579ef5f

small opt

987eb42

back to old sibling notification strategy to avoid certain starvation issues

9319c4e

spsc

b4f78c0

spsc

297a001

1.4.1

b1b70f9

haha

c86c95e

fix stuff getting lost

af94ae9

unfortunately we cannot use a next_task optimization, because the next_task field cannot be stolen. this can lead to deadlocks because run() is a future that may not be constantly polled, and thus there's no guarantee that local queues will make progress.

acb5ba2

revert

0d6f0aa

notgull mentioned this pull request Jan 23, 2023

feat: Push tasks directly to the local runner #36

Closed

notgull closed this Oct 18, 2023

nullchinchilla wants to merge 41 commits into smol-rs:master from geph-official:master
