Increased speed by adding cv and n_jobs params plot_multi_metric_evaluation.py #21626
jeremiedbb merged 6 commits into main from unknown repository
    param_grid={"min_samples_split": range(2, 403, 10)},
    scoring=scoring,
    refit="AUC",
    cv=3,
I don't think we want to reduce CV to 3, especially since people tend to copy/paste code.
Did you check the effect of reducing the number of samples?
@adrinjalali Agreed, we want to keep it generic :)
Removed the cv param and decreased the sample number from 8000 to 6000: result is slightly worse but still 2X faster, so a good compromise I'd say


Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
adrinjalali left a comment
LGTM, you may also check if reducing the number of samples makes much of a difference, my guess would be that n_jobs=2 is giving most of the speed up here.
@adrinjalali You are right, reducing the sample number below 6000 makes the example plot worse. Reducing from 8000 to 6000 however was good. Indeed, the
@ogrisel wanna have a second look at this one?
jeremiedbb left a comment
Instead of reducing the number of samples, I suggest reducing the number of min_samples_split values. That way the plot will be the same, with just slightly fewer points.
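To make the cost argument behind this suggestion concrete, here is a rough sketch that counts the model fits a grid search performs for a given grid. The step of 20 is a hypothetical value chosen for illustration (the thread does not state the step that was finally used), `n_fits` is a helper invented here, and 5 folds is assumed because it is the default `cv` of `GridSearchCV` in recent scikit-learn releases.

```python
def n_fits(grid, n_folds=5, refit=True):
    """Count model fits: one per (candidate, fold), plus one final refit."""
    return len(grid) * n_folds + (1 if refit else 0)


original = range(2, 403, 10)  # grid shown in the diff above
coarser = range(2, 403, 20)   # hypothetical coarser grid (assumption)

print(len(original), n_fits(original))  # 41 candidates, 206 fits
print(len(coarser), n_fits(coarser))    # 21 candidates, 106 fits

# Both grids start at 2 and end at 402, so the plotted x-range is unchanged.
print(original[0], original[-1], coarser[0], coarser[-1])
```

Halving the number of candidates roughly halves the number of fits while keeping the same endpoints, which is why the curve keeps its shape and only loses resolution.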
jeremiedbb left a comment
Time is now 7 s instead of 30 s. LGTM. Thanks @sveneschlbeck!
…uation.py (scikit-learn#21626) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
#21598 @adrinjalali @sply88
Increased speed significantly by adding the parameters cv and n_jobs. I set cv=3 and n_jobs=-1. With n_jobs=-1, all available CPU cores are used automatically to run the cross-validation fits in parallel.
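For readers who want to try the kind of setup discussed in this PR, the following is a minimal sketch of a multi-metric grid search run in parallel. It is not the merged example: the dataset size (2000 samples) and the coarse grid step (40) are assumptions picked so the snippet runs in a few seconds, and `n_jobs=2` follows the reviewer's remark rather than the `n_jobs=-1` from the description. `cv` is left at its default, as requested in the review.

```python
from sklearn.datasets import make_hastie_10_2
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small synthetic binary classification problem (size is an assumption).
X, y = make_hastie_10_2(n_samples=2000, random_state=42)

# Two metrics are evaluated for every candidate; "AUC" drives the refit.
scoring = {"AUC": "roc_auc", "Accuracy": make_scorer(accuracy_score)}

gs = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"min_samples_split": range(2, 403, 40)},  # coarse grid (assumption)
    scoring=scoring,
    refit="AUC",
    n_jobs=2,  # run fits in two parallel workers; -1 would use all cores
    return_train_score=True,
)
gs.fit(X, y)

results = gs.cv_results_
print(gs.best_params_)
print(sorted(k for k in results if k.startswith("mean_test_")))
```

A fixed `n_jobs=2` keeps the example's behaviour predictable on shared documentation builders, whereas `n_jobs=-1` scales with whatever machine runs it.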