RandomForests Performance Bug #5993
Open
Description
Running RandomForestClassifier with the dask joblib backend does not scale well: the code below takes 4.8 minutes to run. Multiprocessing performs about 20x better. The machine used is an m5.8xlarge instance. Code to reproduce:
from dask.distributed import Client
from joblib import parallel_backend
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

digits = load_digits()
# Start a local Dask cluster; the 'dask' joblib backend uses this client
client = Client(processes=None)
clf = RandomForestClassifier(n_estimators=45000, verbose=1)
# Duplicate the digits dataset to double the number of training samples
X = np.concatenate((digits.data, digits.data), axis=0)
y = np.concatenate((digits.target, digits.target))
# Dispatch the forest's joblib tasks to the Dask cluster
with parallel_backend('dask'):
    clf.fit(X, y)
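To make the comparison with multiprocessing reproducible, a timing harness along the following lines might help. It is only a sketch: the helper names `build_data` and `time_fit` are hypothetical, the reduced `n_estimators` is an arbitrary choice to keep the run short, and using joblib's process-based `loky` backend as the stand-in for "multiprocessing" is an assumption, since the report does not say how that baseline was run.

```python
import time

import numpy as np
from joblib import parallel_backend
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier


def build_data():
    """Return the digits dataset stacked on itself, as in the report."""
    digits = load_digits()
    X = np.concatenate((digits.data, digits.data), axis=0)
    y = np.concatenate((digits.target, digits.target))
    return X, y


def time_fit(backend, X, y, n_estimators=500):
    """Fit a random forest under the given joblib backend; return seconds elapsed.

    n_estimators defaults to a value far below the 45000 used in the report
    (an assumption, chosen only to keep the sketch quick to run).
    """
    clf = RandomForestClassifier(n_estimators=n_estimators)
    start = time.perf_counter()
    with parallel_backend(backend):
        clf.fit(X, y)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Imported here so the helpers above can be used without Dask installed.
    from dask.distributed import Client

    client = Client(processes=None)  # local cluster, as in the report
    X, y = build_data()
    # "loky" is assumed here to be the process-based baseline.
    for backend in ("loky", "dask"):
        print(f"{backend}: {time_fit(backend, X, y):.1f} s")
    client.close()
```

Running both backends on the same machine with the same `n_estimators` gives a like-for-like ratio; raising `n_estimators` toward 45000 should show whether the gap grows with the number of tasks.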