Distributed client/scheduler detection without dask.config #9807

Currently `dask.compute` infers whether or not a `dask.distributed.Client` exists by reading the config value `scheduler`. If this value is set, it tries to access the `default_client` and uses it to submit the futures.

See 

https://github.com/dask/dask/blob/f463fcf28ff6404cc04f478b73704e35664bb5f0/dask/base.py#L1394-L1395

https://github.com/dask/dask/blob/f463fcf28ff6404cc04f478b73704e35664bb5f0/dask/base.py#L1372-L1375

This config value is set by the distributed `Client` **only if** the newly initialized client is meant to act as the "default" (global) client; see https://github.com/dask/distributed/blob/3619923a1aec2fa51bf9dcd099560742b61121cc/distributed/client.py#L950-L953

This mechanism is quite brittle and requires relatively complex code in `dask.distributed` to reset this value properly when the client closes, especially when multiple clients exist (whether or not they are default clients).
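To see why resetting is delicate with multiple clients, consider a hypothetical save/restore scheme in which each client remembers the value it overwrote and restores it on close. This is similar in spirit to using `dask.config.set` as a context manager, but it is a stand-in, not distributed's actual code. Any close order other than strict LIFO leaves the global value pointing at the wrong client.

```python
# Hypothetical model of a save/restore scheme on a process-global value.
config = {"scheduler": None}


class Client:
    def __init__(self, name):
        self.name = name
        self._previous = config["scheduler"]  # remember what we overwrote
        config["scheduler"] = name

    def close(self):
        config["scheduler"] = self._previous  # restore what we overwrote


a = Client("a")
b = Client("b")
a.close()                      # "a" closes first, while "b" is still open
after_a = config["scheduler"]  # None: "b" is open but no longer registered
b.close()
after_b = config["scheduler"]  # "a": points at a client that is already closed
```

When clients are created and closed from different threads, these orderings are no longer under the caller's control.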

Apart from code complexity, this introduces thread-safety problems that are extremely challenging to address. An attempt to make this setting/resetting thread safe was made in https://github.com/dask/distributed/pull/5901, but it appears that even the proposed fix is not sufficient.

I propose changing the mechanism that infers whether a distributed client should be used so that it no longer relies on the dask config system and instead uses a proper thread-safe API.
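One possible shape for such an API, sketched under assumptions: a lock-protected registry of live clients in which the most recently registered open client is the current one. `ClientRegistry`, `register`, `unregister`, and `current` are hypothetical names, not part of dask or distributed. Because a closing client removes only itself, close order no longer matters.

```python
import threading


class ClientRegistry:
    """Hypothetical thread-safe registry of live clients (not a dask/distributed API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._clients = []  # registration order; the last entry is "current"

    def register(self, client):
        with self._lock:
            self._clients.append(client)

    def unregister(self, client):
        with self._lock:
            # Remove only this client, regardless of the order clients close in.
            self._clients = [c for c in self._clients if c is not client]

    def current(self):
        with self._lock:
            if not self._clients:
                raise ValueError("No global client found")
            return self._clients[-1]
```

A real implementation would probably also hold weak references so that a client garbage-collected without being closed does not linger; that is omitted here.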

Assuming that an existing global client always takes precedence when no explicit scheduler is provided, we could simply do something like

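```python
def get_scheduler(get=None, scheduler=None, collections=None, cls=None):
    # An active global client takes precedence only if no scheduler was passed explicitly
    if scheduler is None:
        try:
            from distributed.worker import get_client

            # get_client() raises ValueError when no client can be found;
            # return its `get` method, matching what get_scheduler returns elsewhere
            return get_client().get
        except (ImportError, ValueError):
            # distributed is not installed, or no client is active
            pass
        ...
```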
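The intended precedence can be checked without `distributed` installed. In this hypothetical sketch, `find_client` stands in for `distributed.worker.get_client` (raising `ValueError` when no client exists) and `config_default` for whatever the config would otherwise select; `resolve_scheduler` is an illustrative name, not part of dask.

```python
from typing import Callable, Optional

Scheduler = Callable[[dict, str], str]


def local_get(graph: dict, key: str) -> str:
    """Stand-in for a local scheduler's get function."""
    return f"local:{key}"


def resolve_scheduler(
    scheduler: Optional[Scheduler],
    find_client: Callable[[], object],
    config_default: Scheduler = local_get,
) -> Scheduler:
    # 1. An explicitly passed scheduler always wins.
    if scheduler is not None:
        return scheduler
    # 2. Otherwise an active client takes precedence, found via a lookup
    #    rather than by reading a config value.
    try:
        return find_client().get
    except ValueError:
        pass
    # 3. Fall back to the configured/local default.
    return config_default
```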

I would suggest using similar logic to handle the `shuffle=tasks` setting.
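Applied to shuffling, the same idea could look like the following hypothetical helper: an explicit `shuffle=` argument wins, an active client implies `"tasks"`, and otherwise the configured default applies. `resolve_shuffle`, `client_is_active`, and `configured_default` are illustrative names, not dask APIs.

```python
from typing import Callable, Optional


def resolve_shuffle(
    shuffle: Optional[str],
    client_is_active: Callable[[], bool],
    configured_default: str,
) -> str:
    # An explicit argument takes precedence.
    if shuffle is not None:
        return shuffle
    # An active distributed client implies task-based shuffling,
    # decided by a lookup instead of a config value the client wrote.
    if client_is_active():
        return "tasks"
    return configured_default
```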



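For reference, the code at the two `dask/base.py` permalinks above:

Lines 1394-1395:

```python
if config.get("scheduler", None):
    return get_scheduler(scheduler=config.get("scheduler", None))
```

Lines 1372-1375:

```python
elif scheduler in ("dask.distributed", "distributed"):
    from distributed.worker import get_client

    return get_client().get
```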
