MNT accelerate plot_iterative_imputer_variants_comparison.py #21748
jeremiedbb merged 20 commits into scikit-learn:main
…raping to ETrees and changed folds to 3
MSE using 3 folds instead of 5 [figures omitted: 5 fold, 3 fold]. The MSE is worse for all estimators, but the differences between them are similar.
We have a lot of …
I'll try to find a way to avoid …
The only other example of iterative imputation also uses the California housing dataset. If using other datasets for this one is an option, I can try to find the one with minimum …
Even after 250 iterations, DecisionTreeRegressor does not converge for 5 variables.
@adrinjalali xref: #14338. I was planning to have a look at this.
There are no warnings with the new implementation, but I had to change the tree-based estimator and set the tolerance for each estimator.
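For readers following along, here is a minimal sketch of what setting a tolerance per estimator could look like. The synthetic data, the choice of estimators, and the `tol` values are illustrative assumptions, not the values used in this PR.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.neighbors import KNeighborsRegressor

# Small synthetic dataset with about 10% of entries missing (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)
X_missing = X.copy()
X_missing[rng.rand(*X.shape) < 0.1] = np.nan

# Hypothetical (estimator, tolerance) pairs: a looser tolerance lets the
# imputer stop earlier for estimators that converge slowly.
candidates = [
    (BayesianRidge(), 1e-3),
    (KNeighborsRegressor(n_neighbors=5), 1e-1),
]

results = {}
for estimator, tol in candidates:
    imputer = IterativeImputer(
        estimator=estimator, tol=tol, max_iter=10, random_state=0
    )
    # Record warnings so we can count ConvergenceWarning per estimator.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        X_imputed = imputer.fit_transform(X_missing)
    n_convergence_warnings = sum(
        issubclass(w.category, ConvergenceWarning) for w in caught
    )
    results[type(estimator).__name__] = (X_imputed.shape, n_convergence_warnings)

for name, (shape, n_warn) in results.items():
    print(f"{name}: imputed shape={shape}, convergence warnings={n_warn}")
```

Whether a given tolerance silences the warning depends on the data and the estimator, so the values would need tuning on the actual dataset.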
Iterative imputation without scaling: Original Full Data 0.631302
With robust scaling: Original Full Data 0.630870
adrinjalali left a comment
otherwise, plus @ogrisel's suggestion, LGTM.
The code looks good and the speed-up is nice, but the top-level docstring still needs to be adapted to reflect the content of the code.
- ExtraTreesRegressor needs to be replaced by RandomForestRegressor in several places;
- mentions of DecisionTreeRegressor need to be removed;
- the pipeline with the expansion of a degree 2 polynomial kernel needs to be introduced.
And while we are at it, we could add a final comment emphasizing that while some methods are seemingly better than others on average, the error bars observed on the cross-validated scores are still very wide in all cases.
We could finally emphasize that some estimators such as HistGradientBoostingRegressor can natively deal with missing features and are often recommended over building pipelines with complex and costly missing values imputation strategies.
…ity to deal with missing values
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jeremiedbb left a comment
Time is now 4 s instead of 16 s. LGTM. Thanks @siavrez!
…cikit-learn#21748) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

Adding bootstrapping to ExtraTrees with 0.75 sample_fraction improves runtime by 4.8 seconds with 5 folds and by 3 seconds with 3 folds. Also changed the number of folds to 3. Total runtime is now 10.1 +/- 1.3 seconds, down from 24 +/- 3.3 seconds.
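A minimal sketch of the bootstrapping change described above, assuming that "sample_fraction" corresponds to scikit-learn's `max_samples` parameter (which only takes effect when `bootstrap=True`). The synthetic data, `n_estimators`, and `max_iter` are illustrative choices, not the settings of the example script.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Small synthetic dataset with about 10% of entries missing (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)
X_missing = X.copy()
X_missing[rng.rand(*X.shape) < 0.1] = np.nan

# Each tree is fit on a bootstrap sample of 75% of the rows, which
# reduces the cost of every imputation round.
estimator = ExtraTreesRegressor(
    n_estimators=10, bootstrap=True, max_samples=0.75, random_state=0
)
imputer = IterativeImputer(estimator=estimator, max_iter=5, random_state=0)
X_imputed = imputer.fit_transform(X_missing)

print(X_imputed.shape, np.isnan(X_imputed).any())
```

With so few iterations the imputer may emit a ConvergenceWarning; that is expected in a sketch like this and does not affect the shape of the output.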
Reference Issues/PRs
#21598
What does this implement/fix? Explain your changes.
Any other comments?