mutual_info_classif returns a negative estimated mutual information #16355
Description
Describe the bug
The documentation for sklearn.feature_selection.mutual_info_classif states that "True mutual information can’t be negative. If its estimate turns out to be negative, it is replaced by zero." (See link.) However, I've recently started seeing negative values for the estimated mutual information that mutual_info_classif returns.
This started happening within roughly the past week; I had not seen this function return a negative estimated MI before then.
Steps/Code to Reproduce
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
data = [[1, 2.2, 0.4], [1, 1.3, 0.2], [0, 1.3, 1.7], [0, 0.5, 0.4], [1, 0.3, 4.4], [1, 2.3, 44.0]]
df = pd.DataFrame(data, columns = ['fa', 'fb', 'fd'])
labels = ['Label', 'Label', 'Label', 'Label', 'Label', 'Label']
discrete_feature_mask = [True, False, False]
seed = 10
result = mutual_info_classif(df.values, labels, discrete_features=discrete_feature_mask, copy=True, random_state=seed)
print(result)
Expected Results
A non-negative value for the estimated mutual information between each feature and the target.
Actual Results
result = [-1.11022302e-16 0.00000000e+00 0.00000000e+00]
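The negative value is smaller in magnitude than float64 machine epsilon (about 2.2e-16), so it may be floating-point rounding residue rather than a meaningfully negative estimate. Under that assumption, a minimal workaround sketch is to clamp the returned array at zero on the caller's side. The helper name `clip_negative_mi` is hypothetical and not part of scikit-learn; the input array is copied from the output above.

```python
import numpy as np

# Values copied from the "Actual Results" output above.
estimated_mi = np.array([-1.11022302e-16, 0.0, 0.0])


def clip_negative_mi(mi):
    """Replace negative MI estimates with zero.

    Hypothetical helper for illustration; it is not a scikit-learn function.
    """
    # Element-wise maximum against 0.0 leaves non-negative values unchanged.
    return np.maximum(mi, 0.0)


clipped = clip_negative_mi(estimated_mi)
print(clipped)
```

This only hides the symptom in calling code; it does not explain why the function's own replacement of negative estimates by zero did not apply here.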
Versions
Python 3.7
Linux 5.2.17
I've also seen this with Python 2.7 and on macOS.
Python deps:
sklearn: 0.20.4
numpy: 1.16.6
scipy: 1.4.1
pandas: 0.24.2
The setup.py for the code where I've been seeing this sets the relevant dependency versions in the following ranges. These ranges have not recently changed.
scikit-learn>=0.18,<0.22
numpy>=1.16,<2
pandas>=0.24,<1