Changed n_jobs parameter to increase speed in plot_validation_curve.py#21638
adrinjalali merged 3 commits into main from unknown repository
|
Running with -1 by default is problematic because on machines with a large number of CPUs (e.g. 64 or more), spawning the workers can dominate the runtime, with concurrent access to the hard disk just to start the Python interpreters and import the modules. Furthermore, it can also use too much memory and cause crashes. This is why we would rather use a small number of workers (e.g. 2 instead of -1) when we want to use parallelism in examples or tests in scikit-learn. |
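As a rough sketch of the point above: one way to avoid passing -1 straight through is to resolve the requested value against the CPU count and then cap it at a small number. The helper name `capped_n_jobs` and the default cap of 2 are hypothetical and only for illustration; this is not part of scikit-learn or joblib. It assumes the joblib convention that -1 means "all CPUs" and -2 means "all but one".

```python
import os


def capped_n_jobs(requested: int, cap: int = 2) -> int:
    """Resolve a joblib-style n_jobs value and cap the resulting worker count.

    Hypothetical helper for illustration only.
    """
    if requested == 0:
        # joblib also rejects n_jobs == 0
        raise ValueError("n_jobs == 0 has no meaning")
    n_cpus = os.cpu_count() or 1
    if requested < 0:
        # joblib convention: -1 -> all CPUs, -2 -> all CPUs but one, ...
        requested = max(1, n_cpus + 1 + requested)
    # Never hand out more workers than the cap, whatever the machine size.
    return min(requested, cap)
```

With this, `capped_n_jobs(-1)` stays at most 2 even on a 64-core machine, while an explicit `n_jobs=1` is left unchanged.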
|
I agree with @ogrisel, and I think the alternative is to find other ways to speed up the example. You can set n_jobs to 2 and look for other ways to make the example faster. |
|
@ogrisel @adrinjalali Okay, that makes sense, thanks for the explanation :) Will set |
ogrisel
left a comment
LGTM, let's hope it runs faster on CircleCI :)
|
This example uses the digits dataset, and I think that's the main source of it being slow. It'd be nice if you could try either iris or a synthetic dataset to see if you can get similar plots while making it significantly faster (I've seen a 100x speedup in some examples by getting rid of the digits dataset) |
|
I believe the combo of Gaussian RBF + digits is important to get such characteristic validation curves for gamma. But maybe it would be possible to get similar results with a random sub-sample, or by considering a binary classification subproblem such as 1 vs 2 (to make it non-trivial):
import numpy as np
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
subset_mask = np.isin(y, [1, 2])  # binary classification: 1 vs 2
X, y = X[subset_mask], y[subset_mask]
Since SVC is a One vs Rest classifier, that should greatly help ;) Edit: changed to 1 vs 2, which is slightly harder than 1 vs 7 |
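For reference, a minimal sketch of how the suggested subset could be plugged into a validation curve over gamma. The function name `binary_digits_validation_curve`, the gamma grid `np.logspace(-6, -1, 5)`, `cv=5`, and the accuracy scoring are assumptions made for illustration, not necessarily what the merged example uses.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import validation_curve
from sklearn.svm import SVC


def binary_digits_validation_curve(n_jobs=2):
    """Validation curve for SVC gamma on the digits 1-vs-2 subproblem."""
    X, y = load_digits(return_X_y=True)
    subset_mask = np.isin(y, [1, 2])  # keep only digits 1 and 2
    X, y = X[subset_mask], y[subset_mask]

    param_range = np.logspace(-6, -1, 5)  # assumed gamma grid
    train_scores, test_scores = validation_curve(
        SVC(),
        X,
        y,
        param_name="gamma",
        param_range=param_range,
        cv=5,
        scoring="accuracy",
        n_jobs=n_jobs,  # small worker count rather than -1
    )
    return param_range, train_scores, test_scores
```

Each returned score array has one row per gamma value and one column per cross-validation fold, so `test_scores.mean(axis=1)` gives the curve that would be plotted.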
|
@sveneschlbeck could you please apply Olivier's suggestion? |
|
@adrinjalali Yes, am on it! |
|
@adrinjalali @ogrisel The result makes a big difference in exec time (18 sec vs. 3 sec) but the "C" isn't as big and clearly shaped as before. What do you think? Should I change the code after this result?: |
|
To me it still shows the effect the same way, I'd be happy with it. |
* Changed n_jobs parameter to increase speed * Update plot_validation_curve.py * Update plot_validation_curve.py




#21598 @adrinjalali
Adapted the n_jobs parameter from 1 to -1 (auto-detect mode), which halved the time needed to run the module.