Skip to content

Increased speed by adding cv and n_jobs params plot_multi_metric_evaluation.py#21626

Merged
jeremiedbb merged 6 commits intomainfrom
unknown repository
Feb 23, 2022
Merged

Increased speed by adding cv and n_jobs params plot_multi_metric_evaluation.py#21626
jeremiedbb merged 6 commits intomainfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Nov 10, 2021

#21598 @adrinjalali @sply88

Increased speed significantly by adding the parameters cv and n_jobs. I set cv=3 and n_jobs=-1. By setting n_jobs=-1 the available number of cpu cores is picked automatically to optimize calculations.

multimetric_before
multimetric_after

Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sveneschlbeck

param_grid={"min_samples_split": range(2, 403, 10)},
scoring=scoring,
refit="AUC",
cv=3,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to reduce CV to 3, especially since people tend to copy/paste code.

Did you check the effect of reducing the number of samples?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali Agreed, we want to keep it generic :)

Removed the cv param and decreased the sample number from 8000 to 6000: result is slightly worse but still 2X faster, so a good compromise I'd say
mm_before
mm_after

@adrinjalali adrinjalali changed the title Increased speed by adding cv and n_jobs params Increased speed by adding cv and n_jobs params plot_multi_metric_evaluation.py Nov 12, 2021
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@adrinjalali adrinjalali mentioned this pull request Nov 12, 2021
41 tasks
Copy link
Copy Markdown
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, you may also check if reducing the number of samples makes much of a difference, my guess would be that n_jobs=2 is giving most of the speed up here.

@ghost
Copy link
Copy Markdown
Author

ghost commented Nov 23, 2021

@adrinjalali You are right, reducing the sample number below 6000 makes the example plot worse. Reducing from 8000 to 6000 however was good. Indeed, the n_jobs param is doing the heavy lifting here.

@adrinjalali
Copy link
Copy Markdown
Member

@ogrisel wanna have a second look at this one?

Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of reducing the number of samples, I suggest to reduce the number of min_samples_split values. That way the plot will be the same, with just a little less points.

Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time is now 7sec instead of 30sec. LGTM. Thanks @sveneschlbeck !

@jeremiedbb jeremiedbb merged commit 5137abf into scikit-learn:main Feb 23, 2022
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022
…uation.py (scikit-learn#21626)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants