This repository was archived by the owner on Feb 28, 2023. It is now read-only.

Prefer using Tokio as the main task runtime#33

Merged
Xanewok merged 9 commits into `bernhard-futures-migration` from `igor-futures-fixes` on Mar 27, 2021

Conversation

Contributor

@Xanewok Xanewok commented Mar 25, 2021

Most notably this:

  • fixes a hang in sccache_cargo integration test (edce5ab)
  • marks `ClientToolchains::put_toolchain` as synchronous (77828ad)
  • prunes `futures::executor::ThreadPool` entirely, replacing it throughout with (blocking calls into) the Tokio executor

I tried to put some context/rationale into corresponding commits; let me know if I should squash the last threadpool/Tokio executor commits.

This still fails the tests (but on a new one now 🙃):

```
thread 'test_sccache_command' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', tests/system.rs:119:9
```

Interestingly enough, when testing locally, the new expected value seems to match the one from before ac6baf0. Maybe that's a difference in environment? I'm on Ubuntu 20.10 with the relevant `gcc -v`:

```
xanewok@faerun-dev:~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 10.2.0-13ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-JvwpWM/gcc-10-10.2.0/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.0 (Ubuntu 10.2.0-13ubuntu1)
```

Closes #34

Xanewok added 3 commits March 25, 2021 03:17
Since Tokio 0.2, spawning a `tokio::process::Command` must be done inside the Tokio context, so make sure to spawn the compilation root task on the Tokio runtime that's now available in the `SccacheService`.

This fixes a hang in the `sccache_cargo` integration test.
It seems entirely synchronous, as it doesn't hit any await points and additionally seems trigger-happy with mutex locks in inner calls, so better be safe and prevent a future reader from thinking that this function is async-safe.
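To illustrate the hazard the commit message describes, here is a minimal stdlib-only sketch with hypothetical fields and signatures (the real `ClientToolchains::put_toolchain` differs): a function that takes mutex locks internally and hits no await points is safer declared synchronous, so callers route it through a blocking-task API instead of holding a lock across an `.await`.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for the real type; field and return type are
// assumptions for illustration only.
struct ClientToolchains {
    cache: Mutex<Vec<String>>,
}

impl ClientToolchains {
    // Synchronous on purpose: it locks `cache` internally and never awaits.
    // If this were `async`, a caller could end up holding the lock across an
    // await point, which risks deadlock; a sync signature keeps that out of
    // reach and signals "wrap me in spawn_blocking" to async callers.
    fn put_toolchain(&self, name: &str) -> usize {
        let mut cache = self.cache.lock().unwrap();
        cache.push(name.to_string());
        cache.len()
    }
}

fn main() {
    let tc = ClientToolchains { cache: Mutex::new(Vec::new()) };
    assert_eq!(tc.put_toolchain("gcc-10"), 1);
    assert_eq!(tc.put_toolchain("clang-11"), 2);
    println!("ok");
}
```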
@drahnr
Contributor

drahnr commented Mar 25, 2021

Hmm, I am not entirely sure this is a good idea. In my mental model we want a thread pool for certain blocking tasks, which might take "long" and could otherwise stall the entire system by blocking all executor threads - so a separate thread pool for those would actually make sense, imho. But I think you are correct that the thread-pool usage was bleeding into a few places where it was not needed.

This is up for discussion though, so don't take this as the ground truth, but rather a data point.

@Xanewok
Contributor Author

Xanewok commented Mar 25, 2021

In principle I agree, but in this specific case I think it's better to use a single solution rather than trying to combine two.

Most of the intermediate tasks need access to the Tokio runtime anyway, because of how pervasive the `run_input_output` helper is. It internally spawns an async-enabled child process, which must be spawned in the Tokio context. This requirement for Tokio child processes was introduced with Tokio 0.2, hence why it pops up now.

Currently, the blocking thread pool is mostly used to perform disk I/O or blocking HTTP (for simplicity?). In the context of our usage, `Handle::spawn_blocking` does not differ much from a `futures::executor::ThreadPool` here. As the docs mention, threads are spawned as necessary and the upper limit is high by default, since this API is mostly designed for I/O-bound blocking operations rather than CPU-bound ones. This seems to be our use case; the few CPU-bound tasks we seem to be doing are:

  • BLAKE3 digest (currently we do that on a single dedicated thread from the pool and it's intertwined with disk I/O)
  • zstd (intertwined with disk I/O; zstd also exposes an async version we could migrate to)
  • bincode serde in blocking HTTP requests (if I'm not missing anything, the requests/responses are small compared to the blocking HTTP time).

The majority of other use cases are just regular `std::fs` disk I/O. I suspect we could adapt those to use `tokio::fs` instead and let the runtime schedule disk I/O accordingly (which might be even more performant, depending on the backend used).

Until (if ever) we get executor-agnostic child-process support, I think simplifying the implementation here gives us more than we'd gain by nominally separating concerns with a `futures::executor::ThreadPool` that we'd have to combine with the Tokio runtime anyway.

An alternative is to create our own executor abstraction that'd wrap Tokio-enabled I/O futures that we'd spawn on the separate thread pool but poll inside the Tokio context.

I propose to (in no specific order):

  • try to land the first two commits to work towards green CI (and possibly leave remaining work until we have green CI?)
  • see if we can use `tokio::fs` in the places mentioned above to reduce the need for work in an explicit blocking pool
  • keep `ThreadPool` for now but use it only in the HTTP-related module, to help migrate things piecewise and because it seems more cleanly separated from the rest of the code
  • at some point try to rewrite HTTP client code to use async

@drahnr WDYT?

@drahnr
Contributor

drahnr commented Mar 25, 2021

> In principle I agree, but in this specific case I think it's better to use a single solution rather than trying to combine two.
>
> Most of the intermediate tasks need access to the Tokio runtime anyway, because of how pervasive the `run_input_output` helper is. It internally spawns an async-enabled child process, which must be spawned in the Tokio context. This requirement for Tokio child processes was introduced with Tokio 0.2, hence why it pops up now.

Ah shit, I forgot about this Tokio-specific thingy.

> Currently, the blocking thread pool is mostly used to perform disk I/O or blocking HTTP (for simplicity?). In the context of our usage, `Handle::spawn_blocking` does not differ much from a `futures::executor::ThreadPool` here. As the docs mention, threads are spawned as necessary and the upper limit is high by default, since this API is mostly designed for I/O-bound blocking operations rather than CPU-bound ones. This seems to be our use case; the few CPU-bound tasks we seem to be doing are:
>
> * BLAKE3 digest (currently we do that on a single dedicated thread from the pool and it's intertwined with disk I/O)
> * zstd (intertwined with disk I/O; zstd also exposes an async version we could migrate to)
> * bincode serde in blocking HTTP requests (if I'm not missing anything, the requests/responses are small compared to the blocking HTTP time).
>
> The majority of other use cases are just regular `std::fs` disk I/O. I suspect we could adapt those to use `tokio::fs` instead and let the runtime schedule disk I/O accordingly (which might be even more performant, depending on the backend used).
>
> Until (if ever) we get executor-agnostic child-process support, I think simplifying the implementation here gives us more than we'd gain by nominally separating concerns with a `futures::executor::ThreadPool` that we'd have to combine with the Tokio runtime anyway.
>
> An alternative is to create our own executor abstraction that'd wrap Tokio-enabled I/O futures that we'd spawn on the separate thread pool but poll inside the Tokio context.

That sounds like quite a bit of work for not much gain right now.

> * try to land the first two commits to work towards green CI (and possibly leave remaining work until we have green CI?)

Given the above, I think I want to land this as is.

> * see if we can use `tokio::fs` in the places mentioned above to reduce the need for work in an explicit blocking pool

I'd defer this one until we see whether we practically run into the upper thread-count limit.

> * keep `ThreadPool` for now but use it only in the HTTP-related module, to help migrate things piecewise and because it seems more cleanly separated from the rest of the code
>
> * at some point try to rewrite HTTP client code to use async

I think this is a cluster that should be migrated to tokio/hyper in one go, and it's on the agenda to get rid of rouille.

Xanewok added 5 commits March 27, 2021 18:44
The reason being not to mix the Tokio runtime and the `futures` executor. While it's reasonable and possible to do so, I believe there's a bigger benefit for now in sticking to a single executor.
It would be great if we could keep the code using the 'vanilla' `futures` crate, but Tokio support is so baked in (e.g. the child process harness) that it'd require a lot more change for questionable benefit in this case.
Doing so may yield another benefit: currently, a lot of disk I/O seems to be done on a dedicated, blocking thread pool. Now that we use a single executor, it's possible to use Tokio facilities for async I/O, if that proves faster once properly supported by the native environment at some point (e.g. with `io_uring` on Linux).
This lets the tests pass locally on Ubuntu 20.10 with:
gcc (Ubuntu 10.2.0-13ubuntu1) 10.2.0
Ubuntu clang version 11.0.0-2
@Xanewok Xanewok force-pushed the igor-futures-fixes branch from 78038d4 to 8fbab2f Compare March 27, 2021 17:44
@Xanewok
Contributor Author

Xanewok commented Mar 27, 2021

Fixed up 78fc70f with

```diff
diff --git a/src/compiler/compiler.rs b/src/compiler/compiler.rs
index 93f995d..269ec02 100644
--- a/src/compiler/compiler.rs
+++ b/src/compiler/compiler.rs
@@ -1599,7 +1599,7 @@ LLVM version: 6.0",
                     cwd.to_path_buf(),
                     vec![],
                     CacheControl::Default,
-                    pool.clone(),
+                    pool,
                 ).await
             })
             .unwrap();
```

to minimize the diff (I added that mistakenly), and added the last expected-test-change commit 788832b with more descriptive info as 8fbab2f.

@Xanewok Xanewok force-pushed the igor-futures-fixes branch from afe17c0 to 12f21d6 Compare March 27, 2021 18:18
@Xanewok
Contributor Author

Xanewok commented Mar 27, 2021

I'm merging this as per #34 (comment); I verified locally that things work, as well as in the CI docker container. I left a #23 (comment) and would be happy to resolve that separately, as it doesn't block the future work of splitting the branch into digestible commits in preparation for merging it either into our main branch or upstream in some form.

@drahnr thanks for taking a look!

@Xanewok Xanewok merged commit 6837633 into bernhard-futures-migration Mar 27, 2021
@Xanewok Xanewok deleted the igor-futures-fixes branch March 27, 2021 18:54