Creating a LocalCluster on a multi-core machine can take a while #2450

@mrocklin

I'm playing with a machine that has 80 logical cores. In this case creating a LocalCluster or raw Client can take a while. I believe that this is because it tries creating 80 workers at once with forkserver.

In [1]: from dask.distributed import Client

In [2]: %time client = Client()
distributed.nanny - WARNING - Worker process still alive after 47 seconds, killing
distributed.nanny - WARNING - Worker process 15200 was killed by unknown signal
<things crash>

There are a few potential approaches to solve this:

  1. We can try to reduce the cost of creating a new process running a worker. My guess is that we're spending a long time importing libraries, but I'm not sure. Someone could investigate this.
  2. We can use the multiprocessing fork approach rather than forkserver (though dask breaks for other reasons here).
  3. We can change the mixture of threads and processes as we get to high numbers of cores. For example, choosing fewer processes with more threads each starts much faster:

In [1]: from dask.distributed import Client

In [2]: %time client = Client(n_workers=8, threads_per_worker=10)
CPU times: user 984 ms, sys: 252 ms, total: 1.24 s
Wall time: 6.43 s
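As a rough way to investigate approach 1, one could time a cold import in a fresh interpreter and multiply by the worker count. This is just a sketch; `import_cost` is a hypothetical helper, not part of distributed:

```python
import subprocess
import sys
import time

def import_cost(module: str) -> float:
    """Time a cold import of `module` in a brand-new interpreter (seconds).

    Running in a subprocess avoids the warm sys.modules cache of the
    current process, so this approximates what each forkserver-spawned
    worker pays on startup.
    """
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start
```

If `import_cost("distributed")` comes out at a second or more, then 80 workers importing concurrently could easily saturate the machine and explain the nanny timeouts above.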

Probably we should do some combination of 1 and 3. Any thoughts or suggestions?
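For approach 3, the selection could look something like the sketch below. The `suggest_workers` name and the cap of 8 processes are assumptions for illustration, not an existing API; the cap reproduces the `n_workers=8, threads_per_worker=10` split that worked well above:

```python
import os

def suggest_workers(n_cores=None, max_procs=8):
    """Hypothetical heuristic: cap the number of worker processes and
    put the remaining parallelism into threads per worker."""
    n_cores = n_cores or os.cpu_count()
    n_workers = min(max_procs, n_cores)
    threads_per_worker = max(1, n_cores // n_workers)
    return n_workers, threads_per_worker

# On the 80-core machine from this issue this yields (8, 10),
# i.e. Client(n_workers=8, threads_per_worker=10).
```

A real heuristic would probably also want to consider whether the workload is GIL-bound, since pure-Python tasks benefit from more processes rather than more threads.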
