[MRG] Integration of GSoC2015 Gaussian Mixture (first step) #6407
tguillemot wants to merge 3 commits into scikit-learn:master from tguillemot:GSoC-integration-step1
Conversation
force-pushed from 4476dc3 to 2f1200e
sklearn/mixture/gmm.py
Outdated
    -------
    X : array, (n_samples, n_features)
    """
    X = check_array(X, dtype=[np.float64, np.float32], ensure_2d=False)
why not have ensure_2d=True, allow_nd=False?
also if you check X here, you can remove the check in GMM._fit
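As an illustration of what the suggested flags enforce: `check_array(X, ensure_2d=True, allow_nd=False)` rejects anything that is not a 2-D array. A minimal numpy-only sketch of that behaviour (the helper name `check_X` and its messages are made up here; the real implementation is `sklearn.utils.check_array`):

```python
import numpy as np

def check_X(X, ensure_2d=True, allow_nd=False):
    # Simplified stand-in for sklearn.utils.check_array: convert the input
    # to a float array and enforce the requested dimensionality.
    X = np.asarray(X, dtype=np.float64)
    if ensure_2d and X.ndim != 2:
        raise ValueError("Expected a 2D array, got a %dD array instead." % X.ndim)
    if not allow_nd and X.ndim > 2:
        raise ValueError("Found array with %d dimensions." % X.ndim)
    return X
```

With `ensure_2d=False` (as in the diff above), a 1-D sample would silently pass through, which is why the reviewer suggests the stricter flags.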
I don't understand why you work on […].
I think the goal is to make the diff readable, and then to do a separate PR with a simple file renaming.
Well, the new classes in my PR have huge differences. That's not a small diff.
@TomDLT Thanks for your comments, I will correct all those things quickly.

Indeed. The only thing I've done for now is to add the class _MixtureBase and modify the old GMM class to derive from it. I plan next to do another PR to move _MixtureBase into base.py.

@xuewei4d Maybe I didn't explain my goal correctly, and I'm sorry for that. Of course your PR has huge differences, and obviously I have noticed it. My purpose is to merge all your work. For me, the best solution is to merge it step by step, which is what I've proposed with this PR. Of course, it's just a small part of your work, but it's easier to analyse it that way.
The problem is to deprecate the old […], so I would suggest to: […]

This […]
And include […].
In fact, I thought to add […].
force-pushed from e7841e7 to a1cea27
sklearn/mixture/gmm.py
Outdated
    - class GMM(BaseEstimator):
    + class _GMMBase(_MixtureBase):
This should not inherit from the new _MixtureBase class. Just BaseEstimator as previously. We do not want to change the implementation and behavior of the old GMM class in any way besides issuing a deprecation warning in the init.
In that case, I have to send all the code of _MixtureBase and GaussianMixture in the same PR.
To simplify the review, I can propose to split the code into separate commits. I don't see a better way to avoid pushing 1000 lines at once.
yes, but just the code for GaussianMixture (not the variational Bayes subclasses).
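The change the reviewer asks for can be sketched like this (a hypothetical minimal version: only a warning is added in `__init__`, everything else about the old class stays untouched):

```python
import warnings

class GMM:
    """Sketch of the requested approach: the old class keeps its original
    base class and behaviour; the only change is a deprecation warning
    issued at construction time (class body heavily simplified)."""

    def __init__(self, n_components=1, covariance_type='diag'):
        warnings.warn("The GMM class is deprecated; use GaussianMixture "
                      "instead.", DeprecationWarning)
        self.n_components = n_components
        self.covariance_type = covariance_type
```

This keeps the deprecation PR small and reviewable, since no behaviour of the legacy estimator changes.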
@tguillemot any news on this PR?
Sorry for being late. I'll send everything before the end of the day.
force-pushed from a7922ff to a878b60
sklearn/mixture/base.py
Outdated
        return logsumexp(self._estimate_weighted_log_prob(X), axis=1)

    def score(self, X, y=None):
        """Compute the log-likelihood of the model on the given data X.
The docstring is not correct: this is the per-sample average log-likelihood, as we use p.mean() instead of p.sum().
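In other words, `score` should document (and compute) the mean of the per-sample log-likelihoods. A self-contained sketch of the two quantities under that convention, operating directly on a precomputed `weighted_log_prob` matrix for illustration (the real methods take `X` and compute this matrix internally):

```python
import numpy as np

def score_samples(weighted_log_prob):
    # Per-sample log-likelihood: log p(x_i) = logsumexp_k of
    # (log w_k + log p(x_i | component k)), computed stably.
    # weighted_log_prob has shape (n_samples, n_components).
    m = weighted_log_prob.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(weighted_log_prob - m)
                       .sum(axis=1, keepdims=True))).ravel()

def score(weighted_log_prob):
    # Per-sample *average* log-likelihood: the mean over samples,
    # not the sum -- which is what the docstring should say.
    return score_samples(weighted_log_prob).mean()
```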
Alright, but I don't understand why we divide tol by […]. Also, shouldn't we divide the log-likelihood change by […]?
My mistake: on the old class it's not divided by […].
OK great, that makes sense to me then. It would be good to check how other packages check convergence for GMMs trained with EM: http://www.mixmod.org
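Under that convention, the convergence test compares the change in the per-sample average log-likelihood against `tol`, so `tol` does not need rescaling with `n_samples`. A minimal sketch of the check discussed above (names are illustrative):

```python
def em_converged(prev_avg_ll, curr_avg_ll, tol=1e-3):
    # Compare the change in per-sample *average* log-likelihood to tol.
    # Because the average is used, the same tol is meaningful whether the
    # data set has 100 samples or 100,000.
    return abs(curr_avg_ll - prev_avg_ll) < tol
```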
    for cov_type in COVARIANCE_TYPE:
        gmm = GaussianMixture(n_components=n_samples, covariance_type=cov_type,
                              reg_covar=0)
        assert_raise_message(ValueError,
I wonder if we should not introduce a dedicated exception such as DivergenceError in sklearn.exceptions but maybe we can stick to ValueError for now.
I will do another PR for that once we have finished with the GMM, OK?
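A sketch of what such a dedicated exception could look like. Both `DivergenceError` and `check_covariance` are hypothetical names (neither exists in `sklearn.exceptions`); subclassing `ValueError` would keep existing `except ValueError` handlers working, matching the "stick to ValueError for now" fallback:

```python
import numpy as np

class DivergenceError(ValueError):
    """Hypothetical exception for EM divergence; subclassing ValueError
    keeps backward compatibility with code that catches ValueError."""

def check_covariance(covariance, reg_covar=1e-6):
    # Sketch: with reg_covar=0, a component collapsing onto a single
    # sample yields a singular covariance matrix. Raising explicitly is
    # preferable to silently continuing with NaNs in the next E-step.
    cov = covariance + reg_covar * np.eye(covariance.shape[0])
    if np.linalg.eigvalsh(cov).min() <= 0.0:
        raise DivergenceError(
            "Fitting the mixture model failed: a component has a singular "
            "covariance. Try increasing reg_covar or decreasing "
            "n_components.")
    return cov
```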
force-pushed from 2930805 to 13a8ccf
OK, I need help to solve the Travis problem. Do you have any explanation?

This is weird. Maybe try to squash and rebase on top of master.
force-pushed from b0eed8a to 628e6a5
Another solution could be to propose a new PR from a new branch.
or:

    # back up the current branch under another name
    git checkout GSoC-integration-step1
    git checkout -b GSoC-integration-step1_backup
    git branch -D GSoC-integration-step1
    # create a new branch with the same name from master
    git checkout master
    git checkout -b GSoC-integration-step1
    # cherry-pick the commit and force-push
    git cherry-pick 628e6a56c861aa582cf7f6d4032c9bff24bd07de
    git push -f origin GSoC-integration-step1
    git branch -D GSoC-integration-step1_backup

Maybe do some […]
force-pushed from 9f7ece7 to 2201ef7
Deprecation of the GMM class. Modification of the GaussianMixture class:
- Some functions from the original GSoC code have been removed, renamed or simplified.
- Some new functions have been introduced (such as the check_parameters function).
- Some parameter names have been changed: covars_ -> covariances_, to be consistent with sklearn/covariance.
- Addition of the warm_start parameter, allowing to fit data using the previous computation.

The old examples have been modified to replace the deprecated GMM class with the new GaussianMixture class. Every example uses the eigenvector norm to solve the ellipse scale problem (issue #6548).

Corrections for all comments from the PR review:
- Rename MixtureBase -> BaseMixture; remove n_features_.
- Fix some problems and add some tests.
- Correct the bic/aic test; fix test_check_means and test_check_covariances.
- Remove all references to the deprecated GMM class; remove initialized_.
- Add and correct docstrings; correct the order of random_state; fix a small typo.
- Some fixes in preparation of the integration of the new BayesianGaussianMixture class.
- Add the best_n_iter attribute, later renamed to n_iter_.
- Fix some bugs and tests; change the parameter order in the documentation.
- Fix the warm_start problem; fix the divergence error message.
- Correct the random state init in the test file; fix the testing problems.
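The warm_start behaviour described in the commit message can be sketched as follows (a hypothetical minimal class to illustrate the semantics, not the real GaussianMixture API):

```python
class MixtureSketch:
    """Illustration of warm_start: when True, a second call to fit reuses
    the parameters of the previous fit as initialisation instead of
    re-initialising from scratch."""

    def __init__(self, warm_start=False):
        self.warm_start = warm_start
        self._fitted = False
        self.n_init_calls = 0  # counts how often we re-initialise

    def fit(self, X):
        # Skip re-initialisation only when warm-starting from a prior fit.
        if not (self.warm_start and self._fitted):
            self._initialize(X)
        self._run_em(X)
        self._fitted = True
        return self

    def _initialize(self, X):
        self.n_init_calls += 1  # placeholder for parameter initialisation

    def _run_em(self, X):
        pass  # placeholder for the EM iterations
```

This is useful when fitting the same model repeatedly on growing or streaming data, where the previous solution is a good starting point.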
force-pushed from 2201ef7 to 12d66fc
OK, I've tried your solution and also tried to do it manually, and it still doesn't work.

Interestingly, you solved the problem by adding the file, but it came back when you removed the file (only on py35, though).

To work around the strange Travis bug, I've created another PR: #6666.
@xuewei4d has done a huge amount of work for the GMM classes.
Nevertheless, it seems that #4802 is not integrated yet. I thought it was because of the length of the contribution and the difficulty of measuring the efficiency of the new classes with respect to the previous ones.
I propose to integrate this work step by step in several pull requests (400 lines max each).
This pull request introduces the new abstract class _MixtureBase with some common GMM attributes and methods. Now, GMM derives from _MixtureBase.
@ogrisel @amueller @agramfort