My understanding is that class priors should be fitted from the observed target frequencies with a variant of Laplace smoothing, so that targets not yet observed are never assigned a probability of 0.
However, the implementation of partial_fit does not appear to account for unobserved targets when computing the priors on the first training batch.
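For concreteness, here is a minimal sketch of the prior computation I would expect, assuming an add-one variant of the smoothing (the alpha parameter here mirrors the smoothing MultinomialNB already applies to feature counts):

import numpy as np

def smoothed_log_prior(class_count, alpha=1.0):
    # Add alpha to every class count so that a class with a count of
    # zero still receives a small, nonzero prior instead of log(0) = -inf.
    smoothed = class_count + alpha
    return np.log(smoothed) - np.log(smoothed.sum())

The following reproduces the warning: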
import numpy as np
import sklearn
from sklearn.naive_bayes import MultinomialNB
print('scikit-learn version:', sklearn.__version__)
# Toy training batch: six samples of small count features
X = np.random.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
# All possible targets, including class 7, which never appears in this batch
classes = np.append(y, 7)
clf = MultinomialNB()
clf.partial_fit(X, y, classes=classes)
/home/skojoian/.local/lib/python3.6/site-packages/sklearn/naive_bayes.py:465: RuntimeWarning: divide by zero encountered in log
self.class_log_prior_ = (np.log(self.class_count_) -
scikit-learn version: 0.20.2
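Inspecting the fitted attributes shows where the -inf comes from; given one sample per observed class, I would expect the following (the commented output is my expectation, not a captured log):

print(clf.class_count_)      # [1. 1. 1. 1. 1. 1. 0.] -- class 7 was never observed
print(clf.class_log_prior_)  # log(1/6) for observed classes, -inf for class 7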
This behavior is not intuitive to me. partial_fit requires the classes argument for exactly this reason, yet it does not offset the target frequencies to account for classes that have not been observed yet.
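As a workaround, the prior can be supplied explicitly instead of being fitted, which sidesteps the log(0) entirely (a sketch; the uniform prior is my own choice, not a recommendation):

# Supplying class_prior up front means partial_fit never takes log(0)
n_classes = len(classes)
uniform_prior = np.full(n_classes, 1.0 / n_classes)
clf = MultinomialNB(class_prior=uniform_prior)
clf.partial_fit(X, y, classes=classes)  # no divide-by-zero warning

Setting fit_prior=False gives the same uniform log prior without passing an array, but neither option produces the smoothed, data-driven prior described above.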