
Positive values from mixture.GaussianMixture._estimate_log_prob() #8453

@btdai

Hi folks,

The method _estimate_log_prob() in mixture.GaussianMixture should return only negative values, since the logarithm of a probability is never positive. However, I find positive values when the covariance matrix of a component becomes singular. The problem carries over to score(), which is defined as the average log-likelihood per sample under the model and which also comes out positive. Positive log-likelihoods do not make sense at all.
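Here is a minimal sketch of where the positive values can come from. It assumes that each column returned by _estimate_log_prob() holds the log of a multivariate Gaussian density for one component. The helper gaussian_log_density is hypothetical: it is written with plain NumPy and is not scikit-learn's code. In that formula the log-determinant term grows without bound as the covariance shrinks. Once the covariance is small enough, the value at the component mean therefore becomes positive. For reference, a covariance of 1e-6 times the identity in 10 dimensions has determinant (1e-6)**10 = 1e-60.

```python
import numpy as np


def gaussian_log_density(x, mean, cov):
    """Hypothetical helper: log N(x | mean, cov) for a full covariance matrix."""
    d = mean.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    diff = x - mean
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


d = 10  # same dimensionality as X in the example
mean = np.zeros(d)
values = {}
for scale in [1.0, 1e-2, 1e-6]:
    # Evaluate at the mean, so the Mahalanobis term is zero and only
    # the normalizing constant (including -0.5 * log det) remains.
    values[scale] = gaussian_log_density(mean, mean, scale * np.eye(d))
    print(scale, values[scale])  # roughly -9.19, 13.84, 59.89
```

With an identity covariance the value is negative. With a covariance of 1e-6 * I, it is close to +60, because a continuous density can exceed 1.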

I wonder whether it is possible to make log_prob all negative, or at least to set the weights of the singular components to zero so that the per-sample log-likelihood is never positive.
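The second suggestion can be sketched as follows. The sketch assumes that score() is the mean over samples of the log-sum-exp of log_prob plus the log weights. The helpers mean_log_likelihood and score_without_singular and the min_eig threshold are hypothetical and not part of scikit-learn. Components whose smallest covariance eigenvalue falls below min_eig are dropped, and the remaining weights are renormalized. The toy log_prob matrix is made up for illustration.

```python
import numpy as np


def mean_log_likelihood(log_prob, weights):
    """Average over samples of log(sum_k weights[k] * exp(log_prob[:, k]))."""
    weighted = log_prob + np.log(weights)
    # Subtract the per-row maximum before exponentiating (log-sum-exp trick).
    m = weighted.max(axis=1, keepdims=True)
    per_sample = m[:, 0] + np.log(np.exp(weighted - m).sum(axis=1))
    return per_sample.mean()


def score_without_singular(log_prob, weights, covariances, min_eig=1e-3):
    """Hypothetical workaround: drop near-singular components, renormalize weights.

    A component counts as near-singular when the smallest eigenvalue of its
    covariance is below min_eig. The threshold is an assumption; it should sit
    above the regularization added to the covariance diagonal (reg_covar,
    1e-6 by default in scikit-learn), otherwise a component fitted to a
    single point is not flagged.
    """
    # eigvalsh handles a stack of symmetric matrices; eigenvalues are ascending.
    smallest = np.linalg.eigvalsh(covariances)[:, 0]
    keep = smallest >= min_eig
    if not keep.any():
        raise ValueError("every component is near-singular")
    w = weights[keep] / weights[keep].sum()
    return mean_log_likelihood(log_prob[:, keep], w)


# Toy input: 2 samples, 2 components in 2-D; component 1 is near-singular.
log_prob = np.array([[-2.0, 10.0],
                     [-3.0, -50.0]])
weights = np.array([0.5, 0.5])
covariances = np.stack([np.eye(2), 1e-6 * np.eye(2)])

print(mean_log_likelihood(log_prob, weights))  # positive: component 1 dominates sample 0
print(score_without_singular(log_prob, weights, covariances))  # -2.5
```

With the model from the example, this would be called as score_without_singular(em5._estimate_log_prob(X), em5.weights_, em5.covariances_). That call relies on a private method and on covariance_type='full'. The sketch only rescores the data and does not refit the model. Samples that a dropped component explained are therefore scored under the remaining components, which may push the result well below zero.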

Special attention to @xuewei4d: please take a look at this issue. Thank you.

Here is an example:

import numpy as np
from sklearn import mixture

X = np.array([
       [ 348.0642649 ,  220.1027477 ,  333.79187528,  228.47500621,
         330.64101372,  309.92065431,  383.79846045,  479.36595699,
         315.91982394,  355.16442079],
       [ 370.92252763,  355.10249628,  362.38038361,  340.09326647,
         125.05759436,  318.98325082,  199.57411207,  336.80376233,
         314.40947255,  256.83259141],
       [ 292.61493816,  215.41451049,  247.95610029,  307.25120769,
         207.14801633,  460.98070355,  154.01541947,  392.21319156,
         173.93891545,  323.82024746],
       [ 313.47509949,  246.62863512,  372.88314549,  282.57001562,
         217.85191012,  383.98272381,  413.92074607,  193.61450763,
         276.39384523,  541.01804129],
       [ 158.29051437,  416.98426038,  335.68175024,  222.09570898,
         294.9502239 ,  366.75418966,  318.2023477 ,  252.93975865,
         223.03301035,  223.49127131],
       [ 232.23888428,  383.64540264,  261.04276067,  291.01982185,
         302.46507724,  322.7884817 ,  332.68525914,  292.27426413,
         260.82602398,  223.84892024],
       [ 249.77454314,  358.4167613 ,  276.32856834,  166.51658052,
         348.97041915,  302.51389298,  331.16005286,  323.82459729,
         428.77477548,  367.91104809],
       [ 373.69345074,  391.9082168 ,  192.89929037,  179.60985104,
         318.57457191,  213.07208918,  127.93848811,  355.7125317 ,
         357.20454541,  109.67371864],
       [ 320.72635994,  337.91542448,  283.00357297,  205.39331321,
         328.55159544,  338.94462864,  326.20799907,  134.23157654,
         268.76168444,  210.75170314],
       [ 239.63355923,  308.25366027,  322.46319776,  204.21007037,
         309.18231665,  346.09646464,  312.11479783,  290.12891857,
         271.88081982,  339.96771277],
       [ 243.47673151,  346.03100086,  339.27909943,  329.41826235,
         369.00910727,  338.86972674,  196.2446163 ,  352.81830183,
         316.5503773 ,  366.81393057],
       [ 303.55780983,  243.61519329,  287.42944658,  193.72671097,
         323.80031523,  290.6998636 ,  220.99155862,  367.26269069,
         279.54737979,  356.71505227],
       [ 405.88508496,  249.2460709 ,  179.20834632,  280.71236085,
         231.75903523,  302.91279933,  267.8826511 ,  238.1188701 ,
         348.90432629,  306.12335786],
       [ 301.2512729 ,  390.10667737,  388.61923838,  315.29740674,
         370.81820198,  254.01441364,  390.29189365,  315.77735005,
         237.34237878,  387.86482286],
       [ 158.54651108,  309.57114144,  517.79372693,  217.62415063,
         215.29627178,  429.02937068,  119.76147713,  208.18586194,
         396.6687827 ,  278.3061514 ],
       [ 349.64607588,  304.39326556,  356.60601877,  318.06959968,
         333.14291211,  355.3571366 ,  340.66890652,  364.60796912,
         267.40784502,  342.39369939],
       [ 358.76272576,  188.56257198,  295.72427105,  342.90890361,
         238.99073868,  423.22103744,  140.48553174,  360.04930588,
         257.70284   ,  308.20900963],
       [ 217.22759962,  286.632076  ,  324.84560171,  330.93361409,
         216.85652808,  224.29006216,  315.73247997,  282.16223869,
         275.86470308,  377.50193125],
       [ 206.17728025,  437.11531943,  287.25175417,  405.92215691,
         269.38645757,  368.61085476,  291.16638721,  248.20716896,
         244.99130166,  299.39876376],
       [ 336.83792164,  292.299402  ,  307.26730888,  334.76327564,
         397.54796606,  320.41272813,  339.03609954,  273.9001094 ,
         253.9578611 ,  376.36502833]])

em10 = mixture.GaussianMixture(n_components=10, random_state=2017, covariance_type='full')
em10.fit(X)
em10.score(X)
# 34.565479815055916
# next, reduce to 5 components

em5 = mixture.GaussianMixture(n_components=5, random_state=2017, covariance_type='full')
em5.fit(X)
em5.score(X)
# -26.815211196956305
# the score is negative, but some components are singular

for i in range(em5.n_components):
    print(np.linalg.det(em5.covariances_[i]))

# 4.82015366386e+33
# 1e-60
# 1e-60
# 1e-60
# 4.8918147762e-51

log_prob = em5._estimate_log_prob(X)

for i in range(log_prob.shape[0]):
    for j in range(log_prob.shape[1]):
        if log_prob[i,j] >= 0:
            print(i,j)
# 1 2
# 2 4
# 3 3
# 7 1
# 16 4
# only component 0 never gives a positive log prob

em5.weights_
# array([ 0.75,  0.05,  0.05,  0.05,  0.1 ])
# indeed, with 20 samples these weights mean 15 data points belong to component 0,
# 1 each to components 1-3, and 2 to component 4
# this tallies with the positive values in log_prob printed earlier
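The snippet below is a quick arithmetic check of the tally. It uses only the numbers printed above: 20 samples, the printed weights, and the (sample, component) pairs where log_prob was non-negative. It follows the issue's reasoning that every sample without a positive entry belongs to component 0. The weights give 0.75 * 20 = 15, 0.05 * 20 = 1 and 0.1 * 20 = 2. The last lines check that 1e-60 is the determinant of a 10-dimensional covariance equal to 1e-6 times the identity. As an assumption, that value may correspond to the reg_covar regularization (default 1e-6) added to an otherwise zero covariance.

```python
from collections import Counter

import numpy as np

n_samples = 20  # rows of X
weights = np.array([0.75, 0.05, 0.05, 0.05, 0.1])  # em5.weights_ as printed above
# (sample, component) pairs where log_prob was >= 0, as printed above.
positive = [(1, 2), (2, 4), (3, 3), (7, 1), (16, 4)]

# Expected number of samples per component implied by the weights.
expected = np.rint(weights * n_samples).astype(int)
print(expected.tolist())  # [15, 1, 1, 1, 2]

# Count samples per component from the positive entries; the remaining
# samples are attributed to component 0, following the issue's reasoning.
per_component = Counter(j for _, j in positive)
per_component[0] = n_samples - len({i for i, _ in positive})
observed = [per_component[k] for k in range(len(weights))]
print(observed)  # [15, 1, 1, 1, 2]

# Determinant of 1e-6 * I in 10 dimensions is (1e-6)**10 = 1e-60.
det = np.linalg.det(1e-6 * np.eye(10))
print(np.isclose(np.log10(det), -60.0))  # True
```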
