[MRG+1] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation)#9131

Merged
jmschrei merged 17 commits into scikit-learn:master from herilalaina:fix_NB_5814
Jun 19, 2017

Conversation

@herilalaina
Contributor

Reference Issue

Fixes #5814, continuation of #7477

What does this implement/fix? Explain your changes.

Move alpha setting into fit and partial_fit

@herilalaina herilalaina changed the title [WIP] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) [MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) Jun 15, 2017
Member

@jnothman jnothman left a comment

Thanks. Please wrap those fit calls in assert_warns in tests to ensure the warning is triggered.

We should also have a small test for the ValueError case, using assert_raise_message or similar.

Thanks for taking this on.
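What "wrap those fit calls in assert_warns" means in practice, sketched with only the stdlib (`fit` here is a hypothetical stand-in for the estimator's method, and `assert_warns` is a minimal re-implementation of the scikit-learn testing helper):

```python
import warnings


def fit(alpha):
    # stand-in for nb.fit(X, y): warns when alpha is clipped
    if alpha < 1e-10:
        warnings.warn('alpha too small, clipping')
    return max(alpha, 1e-10)


def assert_warns(warning_class, func, *args, **kwargs):
    """Minimal re-implementation of the helper: fail unless func warns."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    if not any(issubclass(w.category, warning_class) for w in caught):
        raise AssertionError('expected %s to be raised' % warning_class.__name__)
    return result


# the fit call wrapped so the warning is asserted, not merely emitted
assert_warns(UserWarning, fit, 0.0)
```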

@herilalaina
Contributor Author

Thanks for reviewing! Tests are done.

Member

@jnothman jnothman left a comment

Thanks for the quick work

# Test sparse X
X = scipy.sparse.csr_matrix(X)
nb = BernoulliNB(alpha=0.)
nb.fit(X, y)
Member

This will also raise the warning which we would rather not see in tests. Either assert or ignore.
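The "ignore" option can be done with a `warnings` filter around the call; a sketch, where `noisy_fit` is a stand-in for the sparse-input `nb.fit(X, y)`:

```python
import warnings


def noisy_fit():
    # stand-in for nb.fit(X, y) with alpha=0., which emits the clipping warning
    warnings.warn('alpha too small, clipping')
    return 'fitted'


with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # suppress the expected alpha warning in the test
    result = noisy_fit()
```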

y = np.array([0, 1])
b_nb = BernoulliNB(alpha=-0.1)
m_nb = MultinomialNB(alpha=-0.1)
assert_raises(ValueError, b_nb.fit, X, y)
Member

Better if this tests the error message with the assert_raise_* variants
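Asserting on the message, not just the exception type, could look like this; `assert_raise_message` here is a minimal stand-in for scikit-learn's helper of the same name, and `fit_with_negative_alpha` is a hypothetical function that fails the way `b_nb.fit` would:

```python
def assert_raise_message(exc_type, message, func, *args):
    """Fail unless func raises exc_type with `message` in its text."""
    try:
        func(*args)
    except exc_type as e:
        if message not in str(e):
            raise AssertionError('message %r not found in %r' % (message, str(e)))
    else:
        raise AssertionError('%s not raised' % exc_type.__name__)


def fit_with_negative_alpha():
    # stand-in for b_nb.fit(X, y) with alpha=-0.1
    raise ValueError('Smoothing parameter alpha = -1.0e-01. alpha should be > 0.')


assert_raise_message(ValueError, 'alpha should be > 0', fit_with_negative_alpha)
```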

@jnothman
Member

Please add an entry to the change log at doc/whats_new.rst.


def _check_alpha(self):
if self.alpha < 0:
raise ValueError('Smoothing parameter alpha = %e. '
Member

We can now see that %e is clearly a bad pick. %.1e would be better and just as informative
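The difference is easy to demonstrate: `%e` pads the mantissa to six digits, while `%.1e` keeps one digit and conveys the same information:

```python
alpha = -0.1
# '%e' pads to six fractional digits with no extra information
assert '%e' % alpha == '-1.000000e-01'
# '%.1e' keeps one fractional digit and is just as informative
assert '%.1e' % alpha == '-1.0e-01'
```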

Contributor Author

Very good point! done

Member

@jnothman jnothman left a comment

Otherwise, and assuming tests pass, LGTM

def _check_alpha(self):
if self.alpha < 0:
raise ValueError('Smoothing parameter alpha = %.1e. '
'alpha must be >= 0!' % self.alpha)
Member

Just noticed this says >=, which is a little awkward given that we warn when alpha is 0 and change the value.

Member

Perhaps alternate language is "alpha should be > 0", as opposed to 'must be'?

Contributor Author

thanks, it's already fixed

else:
self.class_log_prior_ = np.zeros(n_classes) - np.log(n_classes)

def _check_alpha(self):
Member

Maybe you went through this before, but why isn't alpha just initially set to an appropriate value and then used as before, instead of changing the code a lot as below?

@jnothman jnothman changed the title [MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) [MRG+1] Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) Jun 16, 2017
@jnothman
Member

jnothman commented Jun 16, 2017 via email

@jmschrei
Member

It wouldn't be modifying attributes, it would be checking alpha when passed into init and setting it appropriately initially.

@herilalaina
Contributor Author

Not sure I understand your point, but if we move the check into __init__, the following script won't work:

clf = BernoulliNB(alpha=0.1)
clf.fit(X, y)
params = {'alpha': 0.0}
clf.set_params(**params)
clf.fit(X, y) # Will crash
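This follows the scikit-learn convention that __init__ only stores parameters and validation happens in fit, so set_params can change them freely. A toy sketch of the pattern (not the library's actual class; `effective_alpha_` is a made-up attribute for illustration):

```python
import warnings


class ToyNB:
    _ALPHA_MIN = 1e-10  # assumed floor, mirroring the PR's clipping

    def __init__(self, alpha=1.0):
        # store only; no validation here, per scikit-learn convention
        self.alpha = alpha

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y):
        # validation happens here, so it also covers values set via set_params
        alpha = self.alpha
        if alpha < 0:
            raise ValueError('Smoothing parameter alpha = %.1e. '
                             'alpha should be > 0.' % alpha)
        if alpha < self._ALPHA_MIN:
            warnings.warn('alpha too small, clipping to %.1e' % self._ALPHA_MIN)
            alpha = self._ALPHA_MIN
        self.effective_alpha_ = alpha  # hypothetical attribute for illustration
        return self


clf = ToyNB(alpha=0.1).fit(None, None)
clf.set_params(alpha=0.0)
clf.fit(None, None)  # no crash: alpha is re-checked and clipped here
```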

@jnothman jnothman added this to the 0.19 milestone Jun 17, 2017
@jmschrei
Member

This looks good to me. Thanks for the contribution!

@jmschrei jmschrei merged commit b4b5de8 into scikit-learn:master Jun 19, 2017
@herilalaina herilalaina deleted the fix_NB_5814 branch June 19, 2017 19:36
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
Fix MultinomialNB and BernoulliNB alpha=0 bug (continuation) (scikit-learn#9131)

* Fix scikit-learn#5814

* Fix pep8 in naive_bayes.py:716

* Fix sparse matrix incompatibility

* Fix python 2.7 problem in test_naive_bayes

* Make sure the values are probabilities before log transform

* Improve docstring of `_safe_logprob`

* Clip alpha solution

* Clip alpha solution

* Clip alpha in fit and partial_fit

* Add what's new entry

* Add test

* Remove .project

* Replace assert method

* Update what's new

* Format float into %.1e

* Update ValueError msg
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

Development

Successfully merging this pull request may close these issues.

Multinomial Bayes issue

4 participants