partial_fit does not account for unobserved target values when fitting priors to data #13232
Closed
Description
My understanding is that priors should be fitted to the data using observed target frequencies and a variant of Laplace smoothing to avoid assigning 0 probability to targets not yet observed.
It seems the implementation of `partial_fit` does not account for targets that are still unobserved at the time of the first training batch when computing the class priors.
```python
import numpy as np
import sklearn
from sklearn.naive_bayes import MultinomialNB

print('scikit-learn version:', sklearn.__version__)

# Create toy training data
X = np.random.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])

# All possible targets; class 7 is absent from the first batch
classes = np.append(y, 7)

clf = MultinomialNB()
clf.partial_fit(X, y, classes=classes)
```

Output:

```
scikit-learn version: 0.20.2
/home/skojoian/.local/lib/python3.6/site-packages/sklearn/naive_bayes.py:465: RuntimeWarning: divide by zero encountered in log
  self.class_log_prior_ = (np.log(self.class_count_) -
```
This behavior is not intuitive to me. `partial_fit` requires `classes` up front, which seems to be for exactly this reason, yet it does not smooth the observed target frequencies to give unobserved classes nonzero prior probability.
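One way to work around this, until the priors are smoothed internally, is to compute Laplace-smoothed priors yourself and pass them in via the `class_prior` constructor parameter, which bypasses the prior estimation from class counts. A minimal sketch (the smoothing constant of 1 is my choice here, not something scikit-learn applies to priors):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = rng.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
classes = np.append(y, 7)  # class 7 never appears in y

# Laplace-smoothed prior: (count_c + 1) / (n_samples + n_classes),
# so the unseen class 7 gets a small but nonzero probability.
counts = np.array([(y == c).sum() for c in classes])
smoothed_prior = (counts + 1) / (len(y) + len(classes))

clf = MultinomialNB(class_prior=smoothed_prior)
clf.partial_fit(X, y, classes=classes)

# class_log_prior_ is now finite for every class, including 7,
# so no divide-by-zero warning is raised.
print(clf.class_log_prior_)
```

With `class_prior` supplied, `class_log_prior_` is simply the log of the given prior, so later batches that do contain class 7 are scored against a fixed, nonzero prior rather than a `-inf` one.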