positive values from mixture.GaussianMixture._estimate_log_prob() #8453
Description
Hi folks,
The method _estimate_log_prob() in mixture.GaussianMixture should return all negative values, since the logarithms of probabilities are all negative, but I do find positive values when the covariance matrix of a component becomes singular. The problem even yields positive scores, and score() is defined as the log-likelihood per sample under the model. Positive log-likelihoods do not make sense at all.
I wonder if it is possible to make log_prob all negative, or at least to zero out the weights of the singular components, so that the per-sample log-likelihood does not end up positive.
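As an aside (my own understanding, not something stated in the scikit-learn docs): _estimate_log_prob() returns log-densities rather than log-probabilities, and the density of a continuous distribution can exceed 1, so its logarithm can legitimately be positive once a component's covariance collapses. A minimal 1-D sketch:

```python
import numpy as np

# For a 1-D Gaussian, the log-density at the mean is -0.5 * log(2*pi*sigma^2).
# This becomes positive as soon as sigma^2 < 1/(2*pi) ~ 0.159, i.e. the
# narrower the component, the larger (and eventually positive) the log-density.
def gaussian_logpdf_at_mean(sigma2):
    return -0.5 * np.log(2.0 * np.pi * sigma2)

print(gaussian_logpdf_at_mean(1.0))   # ~ -0.919 (negative, as expected)
print(gaussian_logpdf_at_mean(1e-6))  # ~ +5.989 (positive: density > 1)
```

So a positive entry in _estimate_log_prob is mathematically possible whenever a component shrinks onto very few points, which matches the near-zero determinants below.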
cc @xuewei4d, could you please take a look at this issue? Thank you.
Here is an example:
import numpy as np
from sklearn import mixture
X = np.array([
[ 348.0642649 , 220.1027477 , 333.79187528, 228.47500621,
330.64101372, 309.92065431, 383.79846045, 479.36595699,
315.91982394, 355.16442079],
[ 370.92252763, 355.10249628, 362.38038361, 340.09326647,
125.05759436, 318.98325082, 199.57411207, 336.80376233,
314.40947255, 256.83259141],
[ 292.61493816, 215.41451049, 247.95610029, 307.25120769,
207.14801633, 460.98070355, 154.01541947, 392.21319156,
173.93891545, 323.82024746],
[ 313.47509949, 246.62863512, 372.88314549, 282.57001562,
217.85191012, 383.98272381, 413.92074607, 193.61450763,
276.39384523, 541.01804129],
[ 158.29051437, 416.98426038, 335.68175024, 222.09570898,
294.9502239 , 366.75418966, 318.2023477 , 252.93975865,
223.03301035, 223.49127131],
[ 232.23888428, 383.64540264, 261.04276067, 291.01982185,
302.46507724, 322.7884817 , 332.68525914, 292.27426413,
260.82602398, 223.84892024],
[ 249.77454314, 358.4167613 , 276.32856834, 166.51658052,
348.97041915, 302.51389298, 331.16005286, 323.82459729,
428.77477548, 367.91104809],
[ 373.69345074, 391.9082168 , 192.89929037, 179.60985104,
318.57457191, 213.07208918, 127.93848811, 355.7125317 ,
357.20454541, 109.67371864],
[ 320.72635994, 337.91542448, 283.00357297, 205.39331321,
328.55159544, 338.94462864, 326.20799907, 134.23157654,
268.76168444, 210.75170314],
[ 239.63355923, 308.25366027, 322.46319776, 204.21007037,
309.18231665, 346.09646464, 312.11479783, 290.12891857,
271.88081982, 339.96771277],
[ 243.47673151, 346.03100086, 339.27909943, 329.41826235,
369.00910727, 338.86972674, 196.2446163 , 352.81830183,
316.5503773 , 366.81393057],
[ 303.55780983, 243.61519329, 287.42944658, 193.72671097,
323.80031523, 290.6998636 , 220.99155862, 367.26269069,
279.54737979, 356.71505227],
[ 405.88508496, 249.2460709 , 179.20834632, 280.71236085,
231.75903523, 302.91279933, 267.8826511 , 238.1188701 ,
348.90432629, 306.12335786],
[ 301.2512729 , 390.10667737, 388.61923838, 315.29740674,
370.81820198, 254.01441364, 390.29189365, 315.77735005,
237.34237878, 387.86482286],
[ 158.54651108, 309.57114144, 517.79372693, 217.62415063,
215.29627178, 429.02937068, 119.76147713, 208.18586194,
396.6687827 , 278.3061514 ],
[ 349.64607588, 304.39326556, 356.60601877, 318.06959968,
333.14291211, 355.3571366 , 340.66890652, 364.60796912,
267.40784502, 342.39369939],
[ 358.76272576, 188.56257198, 295.72427105, 342.90890361,
238.99073868, 423.22103744, 140.48553174, 360.04930588,
257.70284 , 308.20900963],
[ 217.22759962, 286.632076 , 324.84560171, 330.93361409,
216.85652808, 224.29006216, 315.73247997, 282.16223869,
275.86470308, 377.50193125],
[ 206.17728025, 437.11531943, 287.25175417, 405.92215691,
269.38645757, 368.61085476, 291.16638721, 248.20716896,
244.99130166, 299.39876376],
[ 336.83792164, 292.299402 , 307.26730888, 334.76327564,
397.54796606, 320.41272813, 339.03609954, 273.9001094 ,
253.9578611 , 376.36502833]])
em10 = mixture.GaussianMixture(n_components=10, random_state=2017, covariance_type='full')
em10.fit(X)
em10.score(X)
# 34.565479815055916
# next, reduce to 5 mixtures
em5 = mixture.GaussianMixture(n_components=5, random_state=2017, covariance_type='full')
em5.fit(X)
em5.score(X)
# -26.815211196956305
# score negative, but there are singular components
for i in range(em5.n_components):
    print(np.linalg.det(em5.covariances_[i]))
# 4.82015366386e+33
# 1e-60
# 1e-60
# 1e-60
# 4.8918147762e-51
log_prob = em5._estimate_log_prob(X)
for i in range(log_prob.shape[0]):
    for j in range(log_prob.shape[1]):
        if log_prob[i, j] >= 0:
            print(i, j)
# 1 2
# 2 4
# 3 3
# 7 1
# 16 4
# only mixture 0 does not give positive log prob
em5.weights_
# array([ 0.75, 0.05, 0.05, 0.05, 0.1 ])
# indeed, 15 data points contribute to mixture 0,
# 1 each to mixtures 1-3, and 2 points to mixture 4
# this tallies with the positive log_prob entries printed above
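For what it's worth, one workaround that seems to avoid the collapse (a sketch, not an official fix): raise reg_covar, the constant GaussianMixture adds to the diagonal of every estimated covariance matrix (default 1e-6), so no component can become singular. Here I use stand-in random data of the same shape as X above rather than the exact values:

```python
import numpy as np
from sklearn import mixture

# stand-in data with the same shape and rough scale as the example above
rng = np.random.RandomState(2017)
X = rng.uniform(100.0, 500.0, size=(20, 10))

# reg_covar is added to the diagonal of each covariance estimate, so every
# eigenvalue is at least reg_covar and no determinant underflows to ~1e-60
em5 = mixture.GaussianMixture(n_components=5, covariance_type='full',
                              reg_covar=1e-1, random_state=2017)
em5.fit(X)

for i in range(em5.n_components):
    print(np.linalg.det(em5.covariances_[i]))  # all bounded away from zero
```

This does not change the fact that a very peaked component can still have a positive log-density, but it keeps the determinants away from the 1e-60 floor seen above.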