Skip to content

Fixing warnings in examples/impute/plot_missing_values#14148

Merged
glemaitre merged 5 commits intoscikit-learn:masterfrom
rodneyosodo:warning_examples/impute/plot_missing_values
Jul 2, 2019
Merged

Fixing warnings in examples/impute/plot_missing_values#14148
glemaitre merged 5 commits intoscikit-learn:masterfrom
rodneyosodo:warning_examples/impute/plot_missing_values

Conversation

@rodneyosodo
Copy link
Copy Markdown
Contributor

@rodneyosodo rodneyosodo commented Jun 22, 2019

I was able to fix the following warning

  • ensemble warning from random forest class
    I included the n_estimators=100 argument
  • Runtime warning
{parent_dir}\sk-venv\lib\site-packages\numpy\lib\function_base.py:2530: RuntimeWarning: invalid value encountered in true_divide
  c /= stddev[:, None]

I fixed it by ignoring the numpy warning

  • Convergence warning
{parent_dir}\SK\sk-venv\lib\site-packages\sklearn\impute\_iterative.py:599: ConvergenceWarning: [IterativeImputer] Early stopping criterion not reached.
  " reached.", ConvergenceWarning)

I fixed it by ignoring all covergence warning

related to #14117
#WiDSML
@adrinjalali

@adrinjalali
Copy link
Copy Markdown
Member

Have you tried fixing the warnings before ignoring them? They can probably be avoided, which is a much better solution to ignoring the warnings.

@rodneyosodo
Copy link
Copy Markdown
Contributor Author

I have tried fixing the convergence warnings but it was a dead end
The other Runtime error was due to numpy library

@adrinjalali
Copy link
Copy Markdown
Member

It can be tricky to fix those warnings sometimes, but going deeper in the library and finding why you get the warning, you get a better idea of which parameters to change so that the warnings are gone :)

@NicolasHug
Copy link
Copy Markdown
Member

Fixing the ConvergenceWarning can be done by either setting sample_posterior to True, or setting a much higher tol.

@glemaitre
Copy link
Copy Markdown
Member

glemaitre commented Jun 24, 2019

You cannot put the tol higher since it will modify the figure inappropriately (and make the take-home message different). However, sample_posterior=True seems to be reasonable.

import warnings
from sklearn.exceptions import ConvergenceWarning

warnings.filterwarnings("ignore",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid having to do that inside the example. Users might not understand what this is doing. It is better to fix the parameters to fix the convergence warning instead.

@glemaitre glemaitre self-requested a review July 2, 2019 08:24
@glemaitre
Copy link
Copy Markdown
Member

glemaitre commented Jul 2, 2019

Looking more in depth. This example reveals a bug in the IterativeImputer. The division by zero should not happen or should be taken handled internally. The fourth feature in boston as an std. dev. of zero leading to an error when computing the correlation coefficient between feature.

This issue is hidden by the cross_val_score which ignore the error.

Edit:
This is a warning in fact. So we should probably catch it.

@glemaitre
Copy link
Copy Markdown
Member

FYI, the fourth feature is a categorical feature:

- CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

@glemaitre
Copy link
Copy Markdown
Member

For this PR, I would only apply the fix propose by @NicolasHug (sample_posterior=True).
The other warning should be caught within the code itself.

@glemaitre glemaitre merged commit 06f6626 into scikit-learn:master Jul 2, 2019
@glemaitre
Copy link
Copy Markdown
Member

@0x6f736f646f Thanks for your contribution

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants