[MRG] Fix MultinomialNB and BernoulliNB alpha=0 bug #7477
yl565 wants to merge 8 commits into scikit-learn:master
Conversation
sklearn/naive_bayes.py (Outdated)

```python
th = 1e30
p = np.asarray(p)
if (p > 1).any() or (p < 0).any():
    raise ValueError('Input `p` must be within [0, 1] range!')
```
If this happens for numerical reasons, the error message does not explain how to make it work, other than by changing the data. Why not clip systematically?
@agramfort Thanks for your comment. It happens when some elements contain inf (or values close enough to overflow into inf): np.dot or np.ndarray.sum may then produce nan. Some simple examples:

```python
In [3]: np.inf - np.inf
Out[3]: nan

In [7]: 1e+308 + 1e+308
Out[7]: inf

In [8]: (1e+308 + 1e+308) - (1e+308 + 1e+308)
Out[8]: nan
```

Since we are calculating log probability, I think it is reasonable to use a large negative number like -1e30 to replace log(0) = -inf when the probability is 0.
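A minimal sketch (not from the PR itself) of how a log(0) = -inf log probability propagates to nan through np.dot, which is the failure mode being described:

```python
import numpy as np

# log(0) is -inf; suppress only the expected divide-by-zero warning.
with np.errstate(divide='ignore'):
    log_prob = np.log(np.array([0.0, 1.0]))  # array([-inf,  0.])

counts = np.array([0.0, 3.0])
# IEEE arithmetic: 0 * -inf is nan, so the whole dot product
# (the joint log likelihood of the sample) becomes nan.
print(counts * log_prob)         # [nan  0.]
print(np.dot(counts, log_prob))  # nan
```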
> why not clipping systematically?

I'm not sure I understand; do you mean creating wrappers for numpy to clip all values?
I think he means clipping p to be between 0 and 1.
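A minimal sketch of that systematic-clipping idea; the function name `safe_log` and the `tiny` bound are illustrative, not from this PR:

```python
import numpy as np

def safe_log(p, tiny=1e-30):
    """Clip probabilities into [tiny, 1] before taking the log.

    Clipping avoids log(0) = -inf (and the downstream nan shown above)
    without raising an error on out-of-range inputs.
    """
    p = np.asarray(p, dtype=float)
    return np.log(np.clip(p, tiny, 1.0))
```

With this, `safe_log(0.0)` returns about -69.1 (i.e. log(1e-30)) instead of -inf.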
something went wrong with your rebase...
@amueller I fixed the rebase problem. Also, I now adopted a simple solution: clip alpha so it is never smaller than a small positive threshold.
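A minimal sketch of that clipping approach, assuming a hypothetical `_ALPHA_MIN` constant; the exact threshold used in the PR is not quoted in this thread:

```python
import warnings
import numpy as np

_ALPHA_MIN = 1e-10  # hypothetical floor for illustration

def _clipped_alpha(alpha):
    # Clip alpha at the point of use so log(theta) cannot become -inf
    # when a user passes alpha=0; warn so the user knows clipping happened.
    if np.min(alpha) < _ALPHA_MIN:
        warnings.warn('alpha too small, clipping to %.1e' % _ALPHA_MIN)
        return np.maximum(alpha, _ALPHA_MIN)
    return alpha
```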
```python
def __init__(self, alpha=1.0, fit_prior=True, class_prior=None):
    self.alpha = alpha
    self._alpha = alpha
```
This violates the scikit-learn API. Please handle the setting of alpha in fit. Also, only clip it where it is used, and don't change the value of self.alpha.
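A minimal sketch of the convention the reviewer is pointing to; the class and attribute names here are illustrative, not from the PR:

```python
class EstimatorSketch:
    def __init__(self, alpha=1.0):
        # scikit-learn convention: __init__ stores parameters verbatim,
        # with no validation and no derived attributes like _alpha.
        self.alpha = alpha

    def fit(self, X, y):
        # Clip only at use time; self.alpha itself is left untouched,
        # so get_params()/set_params() round-trip correctly.
        alpha = max(self.alpha, 1e-10)  # hypothetical floor for illustration
        self.effective_alpha_ = alpha   # fitted attributes take a trailing underscore
        return self
```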
@yl565 Are you still working on this?
Go ahead if you want to work on it.
I think we need another contributor, or can you make the minor fixes and add a what's new entry, @yl565?
Fixed via #9131 |
PR addressing #5814