
[MRG+2] Modification of GaussianMixture class. #7123

Merged: agramfort merged 5 commits into scikit-learn:master from tguillemot:gmm-modif on Aug 10, 2016

Conversation

@tguillemot
Contributor

This PR is to simplify the integration of the BayesianGaussianMixture class (#6651).
I've simplified the GaussianMixture class by integrating a function which computes the determinant of the Cholesky decomposition of the precision matrix (which will be useful for BayesianGaussianMixture).
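
For context, the Cholesky shortcut can be sketched as follows. This is an illustrative helper, not the actual scikit-learn function: for a symmetric positive-definite precision matrix P with Cholesky factor L (P = L @ L.T), det(P) = prod(diag(L))**2, so the log-determinant can be read directly off the factor's diagonal.

```python
import numpy as np

def log_det_cholesky(precision):
    # Hypothetical helper (not the scikit-learn implementation):
    # for P = L @ L.T, log det(P) = 2 * sum(log(diag(L))).
    chol = np.linalg.cholesky(precision)
    return 2.0 * np.sum(np.log(np.diag(chol)))

P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
assert np.isclose(log_det_cholesky(P), np.log(np.linalg.det(P)))
```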

I've also corrected a bug in the EM process: normally, the lower bound is computed after the M-step, not after the E-step. This is not a problem for GMM, but it creates problems for BayesianGaussianMixture.
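
An illustrative sketch of that ordering (not the actual scikit-learn code; `e_step`, `m_step` and `compute_lower_bound` are hypothetical callables standing in for the estimator's methods): the lower bound for an iteration is stored after the M-step has updated the parameters.

```python
import numpy as np

def em_fit(X, e_step, m_step, compute_lower_bound, max_iter=100, tol=1e-3):
    lower_bound = -np.inf  # the bound is maximized, so start at -inf
    for _ in range(max_iter):
        prev_lower_bound = lower_bound
        log_prob_norm, resp = e_step(X)   # E-step
        m_step(X, resp)                   # M-step
        # The bound for this iteration is recorded after the M-step.
        lower_bound = compute_lower_bound(resp, log_prob_norm)
        if abs(lower_bound - prev_lower_bound) < tol:
            break
    return lower_bound
```

With a trivial one-parameter model (fitting a mean), the loop converges in a few iterations and `prev_lower_bound` always refers to the previous iteration's post-M-step value.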

@agramfort @ogrisel @amueller Can you have a look?

The purpose here is to prepare the integration of BayesianGaussianMixture.
if do_init:
    self._initialize_parameters(X)
-    current_log_likelihood, resp = self._e_step(X)
+    self.lower_bound_ = np.infty
Member

shouldn't you document the new attribute?

Contributor Author

Indeed, sorry for that mistake.

Member

I didn't know np.infty also returns np.inf... Interesting...

Contributor

Shouldn't it be self.lower_bound_ = -np.infty ?

Contributor

It has been merged but I re-ask the question: shouldn't it be -np.infty ?

Contributor Author

@ngoix Sorry, I forgot that; indeed. I don't know why I missed your comment.
Sorry for that. I've fixed it in #7180.
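
To illustrate why the starting value matters for a quantity that is maximized (an illustrative sketch, not the scikit-learn code; the per-run values are made up): the running "best" across `n_init` runs must start at `-inf`, otherwise no run can ever beat the initial value.

```python
import numpy as np

# Keeping the best lower bound across several initializations:
# starting at +inf would make every comparison fail, so the
# "best run" bookkeeping only works with a -inf starting value.
best_lower_bound = -np.inf
for run_lower_bound in [-150.2, -120.7, -130.1]:  # made-up per-run values
    if run_lower_bound > best_lower_bound:
        best_lower_bound = run_lower_bound
assert best_lower_bound == -120.7
```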

@agramfort
Member

that's it for me

@tguillemot tguillemot changed the title Modification of GaussianMixture class. [MRG] Modification of GaussianMixture class. Aug 1, 2016
@tguillemot tguillemot changed the title [MRG] Modification of GaussianMixture class. [MRG+1] Modification of GaussianMixture class. Aug 2, 2016
@tguillemot
Contributor Author

@amueller @ogrisel @raghavrv If you have time to do another review :)

X : array-like, shape (n_samples, n_features)
"""
-    n_samples = X.shape[0]
+    n_samples, _ = X.shape
Member

Why this change?

Member

Sorry if this was suggested before...

Contributor Author
Contributor Author

tguillemot commented Aug 2, 2016


It's just that I prefer it like this, and it also checks that the shape of X has exactly two elements.
Sorry for these little modifications.

Member

Okay. Thx!

Member

It also checks implicitly that X.ndim == 2. I do the same in my code.
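
A quick illustration of why the tuple-unpacking spelling is stricter: `X.shape[0]` works for any array, while `n_samples, _ = X.shape` raises a `ValueError` unless `X` is exactly 2-D.

```python
import numpy as np

X2 = np.zeros((10, 3))
n_samples, _ = X2.shape        # fine: X2 is 2-D, n_samples == 10

X1 = np.zeros(10)
try:
    n_samples, _ = X1.shape    # shape is (10,): unpacking raises
except ValueError:
    pass                       # 1-D input is rejected implicitly
```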

n_iter_ : int
    Number of steps used by the best fit of EM to reach convergence.

lower_bound_ : float
Member

lower_bound_ -> best_log_likelihood_ ?

Contributor Author

In fact, for GMM this is the best log-likelihood, but for VBGMM it is a lower bound.
I chose lower_bound_ because it is the most understandable name.

Contributor Author

Done.

@tguillemot
Contributor Author

@amueller @ogrisel I really need this to be merged before you review #6651.

@tguillemot
Contributor Author

@ngoix This PR is related to the Bayesian Gaussian mixture; could you review it if you have some time?
Thx in advance :)

@agramfort
Member

one more +1 and we're good here...


for n_iter in range(self.max_iter):
-    prev_log_likelihood = current_log_likelihood
+    prev_log_likelihood = self.lower_bound_
Contributor

prev_log_likelihood -> prev_lower_bound ?

@ngoix
Contributor

ngoix commented Aug 8, 2016

By testing the code, I realized that there is a problem which existed before this PR:
in gaussian_mixture._set_parameters(), self.covariances_ is not updated. As a result, with n_init > 1, the covariances that are output do not correspond to the means.


def _compute_lower_bound(self, _, log_prob_norm):
    return log_prob_norm

Contributor

In _set_parameters below, compute self.covariances_ from self.precisions_.
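
A minimal sketch of that suggestion, assuming the 'full' covariance type (one symmetric positive-definite matrix per component). This is an illustrative helper, not the actual `_set_parameters` code, which also has to handle the 'tied', 'diag' and 'spherical' cases:

```python
import numpy as np

def covariances_from_precisions(precisions):
    # Hypothetical helper: each covariance matrix is the inverse of
    # the corresponding precision matrix.
    return np.array([np.linalg.inv(p) for p in precisions])

precisions = np.array([[[4.0, 0.0],
                        [0.0, 2.0]]])
covariances = covariances_from_precisions(precisions)
# Inverting a diagonal precision yields a diagonal covariance.
```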

@ngoix
Contributor

ngoix commented Aug 8, 2016

That's all from me!

@tguillemot
Contributor Author

@ngoix Indeed, you're right; I missed that. Thanks.
I've corrected it and added a test to check it.

@tguillemot
Contributor Author

@agramfort merge ?
