My understanding is that class priors should be fitted from the observed target frequencies with a variant of Laplace smoothing, so that targets not yet observed are never assigned a probability of 0.
However, the implementation of partial_fit does not appear to account for unobserved targets when computing the priors on the first training batch.
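For concreteness, here is a minimal sketch of the prior computation I would expect, assuming an add-one variant of the smoothing (the alpha parameter here mirrors the smoothing MultinomialNB already applies to feature counts):

import numpy as np

def smoothed_log_prior(class_count, alpha=1.0):
    # Add alpha to every class count so that a class with a count of
    # zero still receives a small, nonzero prior instead of log(0) = -inf.
    smoothed = class_count + alpha
    return np.log(smoothed) - np.log(smoothed.sum())

The following reproduces the warning: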
import numpy as np
import sklearn
from sklearn.naive_bayes import MultinomialNB
print('scikit-learn version:', sklearn.__version__)
# Toy training batch: six samples of small count features
X = np.random.randint(5, size=(6, 100))
y = np.array([1, 2, 3, 4, 5, 6])
# All possible targets, including class 7, which never appears in this batch
classes = np.append(y, 7)
clf = MultinomialNB()
clf.partial_fit(X, y, classes=classes)
/home/skojoian/.local/lib/python3.6/site-packages/sklearn/naive_bayes.py:465: RuntimeWarning: divide by zero encountered in log
self.class_log_prior_ = (np.log(self.class_count_) -
scikit-learn version: 0.20.2
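Inspecting the fitted attributes shows where the -inf comes from; given one sample per observed class, I would expect the following (the commented output is my expectation, not a captured log):

print(clf.class_count_)      # [1. 1. 1. 1. 1. 1. 0.] -- class 7 was never observed
print(clf.class_log_prior_)  # log(1/6) for observed classes, -inf for class 7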
This behavior is not intuitive to me. partial_fit requires the classes argument for exactly this reason, yet it does not offset the target frequencies to account for classes that have not been observed yet.
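As a workaround, the prior can be supplied explicitly instead of being fitted, which sidesteps the log(0) entirely (a sketch; the uniform prior is my own choice, not a recommendation):

# Supplying class_prior up front means partial_fit never takes log(0)
n_classes = len(classes)
uniform_prior = np.full(n_classes, 1.0 / n_classes)
clf = MultinomialNB(class_prior=uniform_prior)
clf.partial_fit(X, y, classes=classes)  # no divide-by-zero warning

Setting fit_prior=False gives the same uniform log prior without passing an array, but neither option produces the smoothed, data-driven prior described above.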