Add automatic imports for external backends #653
Description
Currently, when using dask with joblib, you need to know to import `distributed.joblib` first, which is not particularly intuitive.
```python
from joblib import parallel_backend

with parallel_backend("dask"):
    pass
```

raises

```
KeyError: 'dask'
```

I suggest baking something like the following into joblib, in a way that does not require an explicit dependency and that raises an informative error when dask.distributed is not installed.
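```python
# Sketch of the change inside joblib itself; the built-in backend classes
# live in joblib._parallel_backends.
from joblib._parallel_backends import (
    LokyBackend,
    MultiprocessingBackend,
    SequentialBackend,
    ThreadingBackend,
)

BACKENDS = {
    'multiprocessing': MultiprocessingBackend,
    'threading': ThreadingBackend,
    'sequential': SequentialBackend,
    'loky': LokyBackend,
}

# Backends provided by other packages, mapped to a callback that registers them.
EXTERNAL_BACKENDS = {}


def register_dask():
    try:
        # Importing distributed.joblib registers the 'dask' backend as a side effect.
        import distributed.joblib  # noqa: F401
    except ImportError as err:
        raise ImportError(
            "To use the dask backend to joblib you first need to install the "
            "dask.distributed scheduler. See "
            "http://dask.pydata.org/en/latest/install.html for details"
        ) from err


EXTERNAL_BACKENDS['dask'] = register_dask


def parallel_backend(backend, *args, **kwargs):  # remaining parameters elided
    # Register an external backend lazily, the first time it is requested.
    if backend not in BACKENDS and backend in EXTERNAL_BACKENDS:
        register = EXTERNAL_BACKENDS[backend]
        register()
    if backend in BACKENDS:
        backend = BACKENDS[backend]
    ...
```

This does leak knowledge of external packages into joblib, which may raise concerns. However, it is well contained, and it removes a setup step that is fairly likely to trip up new users.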
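The lazy-registration idea can be exercised without dask installed. The sketch below is a minimal, self-contained stand-in, assuming a string-valued registry instead of real backend classes; `make_register`, `resolve_backend`, the `"demo"`/`"missing"` entries, and the module `no_such_backend_pkg` are all hypothetical names chosen for illustration, not joblib API. The stdlib `json` module plays the role of an installed external package.

```python
import importlib

# Hypothetical stand-in for joblib's built-in registry. The real BACKENDS dict
# maps names to backend classes; plain strings keep this sketch self-contained.
BACKENDS = {"threading": "ThreadingBackend", "sequential": "SequentialBackend"}


def make_register(name, module_name):
    """Build a callback that imports `module_name` the first time `name` is used."""

    def register():
        try:
            importlib.import_module(module_name)
        except ImportError as err:
            # Turn a bare import failure into an actionable message.
            raise ImportError(
                f"To use the {name!r} backend you first need to install "
                f"{module_name!r}."
            ) from err
        # A real external package would register itself on import (as
        # distributed.joblib does); this placeholder simulates that side effect.
        BACKENDS.setdefault(name, f"<backend from {module_name}>")

    return register


# Hypothetical entries: "json" is a stdlib module used only so the success path
# runs; "no_such_backend_pkg" is assumed not to be installed.
EXTERNAL_BACKENDS = {
    "demo": make_register("demo", "json"),
    "missing": make_register("missing", "no_such_backend_pkg"),
}


def resolve_backend(name):
    """Return the backend for `name`, registering known external ones lazily."""
    if name not in BACKENDS and name in EXTERNAL_BACKENDS:
        EXTERNAL_BACKENDS[name]()
    if name not in BACKENDS:
        raise ValueError(
            f"Invalid backend {name!r}; expected one of {sorted(BACKENDS)}"
        )
    return BACKENDS[name]
```

Calling `resolve_backend("demo")` imports `json` and returns the placeholder, while `resolve_backend("missing")` raises an `ImportError` naming the package to install instead of a bare `KeyError`. Unknown names that are not listed as external backends still fail, but with a message listing the valid choices.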
FWIW, we've been doing similar things in Dask for data access (we don't keep support for every cloud storage system in the main library), and it has been very effective. We used to receive many error reports about the HDFS or S3 libraries not being installed; now we don't. The informative error messages seem to handle everything for new users.
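The same pattern generalizes to any optional dependency: import the module on demand and re-raise a failed import with a hint about how to install it. The sketch below is a hypothetical `import_optional` helper, assumed for illustration; it is not Dask's actual code, and its name, signature, and message format are placeholders.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, feature: str, hint: str) -> ModuleType:
    """Import an optional dependency, or fail with a message saying how to get it.

    Hypothetical helper illustrating the pattern; not an existing Dask API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{feature} requires the optional package {module_name!r}. {hint}"
        ) from err
```

Chaining with `from err` keeps the original `ModuleNotFoundError` in the traceback, so the friendlier message does not hide the underlying cause.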