mutual_info_classif returns a negative estimated mutual information #16355

@caveness

Describe the bug

The documentation for sklearn.feature_selection.mutual_info_classif states that "True mutual information can’t be negative. If its estimate turns out to be negative, it is replaced by zero." (See link.) However, I've recently started seeing negative values for the estimated mutual information that mutual_info_classif returns.

This started happening within approximately the past week; I had not seen this function return a negative estimated MI before then.

Steps/Code to Reproduce

import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Three features: 'fa' is discrete, 'fb' and 'fd' are continuous.
data = [[1, 2.2, 0.4], [1, 1.3, 0.2], [0, 1.3, 1.7], [0, 0.5, 0.4], [1, 0.3, 4.4], [1, 2.3, 44.0]]
df = pd.DataFrame(data, columns=['fa', 'fb', 'fd'])
# Every sample has the same label, so the true MI with any feature is zero.
labels = ['Label', 'Label', 'Label', 'Label', 'Label', 'Label']
discrete_feature_mask = [True, False, False]
seed = 10

result = mutual_info_classif(df.values, labels, discrete_features=discrete_feature_mask, copy=True, random_state=seed)
print(result)

Expected Results

A non-negative value for the estimated mutual information between each feature and the target.

Actual Results

result = [-1.11022302e-16 0.00000000e+00 0.00000000e+00]
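For context, the negative entry is on the order of double-precision rounding error, not a meaningfully negative estimate. The sketch below works only on the values printed above (it does not call scikit-learn). It checks that the magnitude is below machine epsilon and shows a caller-side workaround that clamps negative estimates to zero. The names `reported`, `is_rounding_noise`, and `clipped` are introduced here for illustration.

```python
import numpy as np

# Values copied from the "Actual Results" output above.
reported = np.array([-1.11022302e-16, 0.0, 0.0])

# Double-precision machine epsilon (about 2.22e-16), used as a yardstick.
eps = np.finfo(np.float64).eps

# True when every entry is smaller in magnitude than machine epsilon,
# i.e. indistinguishable from rounding noise around zero.
is_rounding_noise = bool(np.abs(reported).max() < eps)

# Caller-side workaround: replace negative estimates with zero,
# which is the behavior the documentation describes.
clipped = np.maximum(reported, 0.0)

print(is_rounding_noise)
print(clipped)
```

Clamping on the caller's side only hides the symptom; the function itself would still need to apply the replacement its documentation promises.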

Versions

Python 3.7
Linux 5.2.17
I've also seen this with Python 2.7 and macOS.

Python deps:
sklearn: 0.20.4
numpy: 1.16.6
scipy: 1.4.1
pandas: 0.24.2

The setup.py for the code where I've been seeing this sets the relevant dependency versions in the following ranges. These ranges have not recently changed.
scikit-learn>=0.18,<0.22
numpy>=1.16,<2
pandas>=0.24,<1
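Since the ranges above did not change but the behavior did, it may help to record exactly which versions are installed in an affected environment, including packages such as scipy that are not pinned above. The following is a minimal sketch; the helper name `installed_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib


def installed_versions(names=("sklearn", "numpy", "scipy", "pandas")):
    """Return a dict mapping each module name to its version string.

    A module that cannot be imported maps to None; a module without
    a __version__ attribute maps to "unknown".
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(name, version)
```

Comparing this output between an environment that shows the negative value and one that does not could narrow down which dependency changed.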
