Describe the bug
The documentation for sklearn.feature_selection.mutual_info_classif states that "True mutual information can’t be negative. If its estimate turns out to be negative, it is replaced by zero." (See link.) However, I've recently started seeing negative values for the estimated mutual information that mutual_info_classif returns.
This started happening within roughly the past week; I had not seen this function return a negative estimated MI before then.
Steps/Code to Reproduce
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
data = [[1, 2.2, 0.4], [1, 1.3, 0.2], [0, 1.3, 1.7], [0, 0.5, 0.4], [1, 0.3, 4.4], [1, 2.3, 44.0]]
df = pd.DataFrame(data, columns = ['fa', 'fb', 'fd'])
labels = ['Label', 'Label', 'Label', 'Label', 'Label', 'Label']
discrete_feature_mask = [True, False, False]
seed = 10
result = mutual_info_classif(df.values, labels, discrete_features=discrete_feature_mask, copy=True, random_state=seed)
print(result)
Expected Results
A non-negative value for the estimated mutual information between each feature and the target.
Actual Results
result = [-1.11022302e-16 0.00000000e+00 0.00000000e+00]
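The negative entry has a magnitude of about 2^-53, which is half of float64 machine epsilon. That suggests floating-point rounding rather than a genuinely negative estimate, though I have not confirmed where in the library it arises. As a caller-side workaround (not a fix for the library itself), the estimates can be clipped at zero. This is a minimal sketch using the values printed above; `clip_nonnegative` is a hypothetical helper and not part of scikit-learn.

```python
import sys

# Values copied from the "Actual Results" output in this report.
reported = [-1.11022302e-16, 0.0, 0.0]

# float64 machine epsilon (about 2.22e-16), used here only as a rough
# threshold for "this is probably rounding noise".
eps = sys.float_info.epsilon


def clip_nonnegative(estimates):
    """Hypothetical helper: replace negative MI estimates with zero."""
    return [max(0.0, v) for v in estimates]


# Every negative entry is within one epsilon of zero.
is_rounding_noise = all(abs(v) <= eps for v in reported if v < 0)

clipped = clip_nonnegative(reported)
print(is_rounding_noise, clipped)
```

With a NumPy array, `np.maximum(result, 0)` would do the same clipping in one call.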
Versions
Python 3.7
Linux 5.2.17
I've also seen this with Python 2.7 and on macOS.
Python deps:
sklearn: 0.20.4
numpy: 1.16.6
scipy: 1.4.1
pandas: 0.24.2
The setup.py for the code where I've been seeing this sets the relevant dependency versions in the following ranges. These ranges have not recently changed.
scikit-learn>=0.18,<0.22
numpy>=1.16,<2
pandas>=0.24,<1