
[WIP] Sparse PCA online in the direction of samples#4873

Closed
arthurmensch wants to merge 10 commits into scikit-learn:master from arthurmensch:spca_online_sample

Conversation

@arthurmensch
Contributor

Following #4856, I implemented a dictionary learning algorithm that outputs a dense code and a sparse dictionary, which corresponds to the description of SPCA given in [1]. It alternates between ridge regression for learning the code of a batch of samples, and projected block coordinate descent on the elastic net ball.

For each feature (i.e. each pixel in the faces example) we project the n_component vector that contains the value of every dictionary component for this feature, which explicitly enforces non-overlapping dictionary components (this differs from [1], where the projection occurs in the feature space and not in the component space).

For the moment there are no unit tests, but the output of the method can be seen by running example.decomposition.plot_decomposition_faces. As we use a large outer loop of n_feature iterations and an inner loop for the elastic net projection, I wrote a Cython extension for speed considerations.
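A minimal NumPy sketch of the alternating scheme described above (helper names are made up, not the PR code; for simplicity a plain l1-ball projection stands in for the elastic-net-ball projection, and a least-squares solve stands in for the block coordinate descent dictionary update):

```python
import numpy as np

def ridge_code(X, D, alpha):
    # Closed-form ridge regression: min_C ||X - C D||^2 + alpha ||C||^2,
    # giving the dense code for a batch of samples.
    G = D @ D.T + alpha * np.eye(D.shape[0])
    return np.linalg.solve(G, D @ X.T).T

def project_l1(v, radius=1.0):
    # Euclidean projection of v onto the l1 ball of the given radius
    # (sort-based algorithm); this induces sparsity in the components.
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0)

rng = np.random.RandomState(0)
X = rng.randn(50, 20)   # n_samples x n_features
D = rng.randn(5, 20)    # n_components x n_features
for _ in range(10):
    C = ridge_code(X, D, alpha=1.0)            # dense code
    D = np.linalg.lstsq(C, X, rcond=None)[0]   # dictionary update
    D = np.array([project_l1(d, radius=2.0) for d in D])  # sparse dictionary
```

After a few iterations the rows of D sit on (or inside) the l1 ball, so the learned components are sparse while the code C stays dense.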

Does this method seem of interest for the project?

Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2010). Online learning for matrix factorization and sparse coding. The Journal of Machine Learning Research, 11, 19-60.

Member


ols or least_squares would be a better (more explicit) name in my opinion.

Member


Actually this will not yield a sparse code, so I wonder whether adding this option here does not reflect a design problem.

@ogrisel
Member

ogrisel commented Jun 18, 2015

I think this method is interesting but I think the class organisation needs a rework:

  • the current MiniBatchSparsePCA does minibatches in the feature space, which I think is an abuse: at least it's not consistent with other uses of the MiniBatch prefix in scikit-learn. I would rather have called it BlockedCoordinateSparsePCA or something similar.
  • the new method would implement sparse PCA with mini-batches in the sample space (via ENet projections of the components after each minibatch update, if I understand correctly) but is currently implemented as part of the online dictionary learning code base for code reuse purposes.

However sparse PCA is not dictionary learning, as the activations / code are not sparse. From a user point of view I find this really confusing. I think we should find a way to refactor the code so that the estimator name reflects the model name (the class of objective function, with either sparse components or sparse code) and not the kind of solver used internally.
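For reference, the two objective families being discussed can be written side by side (standard formulations in the spirit of the cited Mairal et al. reference, with $X \approx U D$, $U$ the code and $D$ the dictionary / components; this is my paraphrase, not taken from the PR):

```latex
% Dictionary learning: sparse code, constrained (dense) dictionary
\min_{U, D}\ \tfrac{1}{2}\lVert X - U D \rVert_F^2
  + \alpha \lVert U \rVert_1
  \quad \text{s.t. } \lVert d_k \rVert_2 \le 1

% Sparse PCA: sparse components, constrained (dense) code
\min_{U, D}\ \tfrac{1}{2}\lVert X - U D \rVert_F^2
  + \alpha \lVert D \rVert_1
  \quad \text{s.t. } \lVert u_k \rVert_2 \le 1
```

The penalty switches from the code to the components; the solver (minibatch, block coordinate, etc.) is orthogonal to this choice, which is the distinction the naming should reflect.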

Member


I think this should be called ~~projected_gradient~~ sparse_projection. Elastic net by itself is not a solver like lars and coordinate descent, but rather a kind of penalty.

Edit: it's not the gradient that is projected, but the weights after an update, so projected_gradient is a bad name.

Member


Also, is the current implementation well defined and equivalent to an l1-ball projection when l1_ratio=1.0?

Contributor Author


It should. Do we have an existing l1-ball projection to compare with? It is actually a block coordinate descent with projection, so cd_proj could be used, but that would be confusing as it describes the dictionary update algorithm and not the code learning algorithm.

Contributor Author


It does

@ogrisel
Member

ogrisel commented Jun 18, 2015

Out of curiosity, what is the speed impact of using the Python version of the elastic net projection? If the Cython version is critical for speed, I think we should just write tests for that version and completely drop the Python version from the code base. There is no point in keeping two versions if one is always suboptimal.

@arthurmensch
Contributor Author

We gain a factor of ~40 in performance on the faces dataset. I kept the Python version in the code base for testing, but it should be moved into a test file.

@arthurmensch
Contributor Author

Actually the online SPCA method presented in this PR can also enforce code sparsity; however, it is often the case that you do not want both a sparse code and a sparse dictionary.

I think we could refactor as follows:

  • Allow user selection of the code computation algorithm, i.e. {'cd', 'lars', 'ols'}, thus allowing the user to compute a non-sparse code. This would involve a parameter alpha controlling code sparsity.
  • Allow user selection of the dictionary constraint, i.e. {'ordinary', 'sparse'}, with a parameter l1_ratio controlling dictionary sparsity.

MiniBatchDictionaryLearning would then default to a sparse code and a dense dictionary, while MiniBatchSparsePCA would default to a sparse dictionary and a sparse code, which is its expected behaviour as of version 0.16.

We could keep a class MiniBatchFeatureSparsePCA (or what not...) that would keep computing a sparse code and a dense dictionary by transposing X.

I still need to do some benchmarking to check that we do not lose considerable performance depending on the shape of X.

One of the problems with this approach is that the output of fit using the new MiniBatchDictionaryLearning would be different with the same parameters, as there is no obvious correspondence between the regularization parameters l1_ratio and alpha (the latter is used today to control code sparsity, and thus components_ sparsity, in MiniBatchSparsePCA).

@arthurmensch
Contributor Author

Sorry for the multiple commits. To sum up, I added tolerance-based early stopping using the surrogate function $\hat f (t)$ described in the reference above, and I changed the API so we can control the code learning algorithm ('lars', 'cd' (not working on master, but working with PR #4775), 'ols') and the dictionary learning constraint (either euclidean or enet, which is structure inducing).
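For reference, the surrogate from the Mairal et al. reference is the running average of past sample losses, with each code $\alpha_i$ computed against the dictionary available at step $i$ (my paraphrase of the paper, not the PR code):

```latex
\hat f_t(D) = \frac{1}{t} \sum_{i=1}^{t}
  \left( \frac{1}{2} \lVert x_i - D \alpha_i \rVert_2^2
         + \lambda \lVert \alpha_i \rVert_1 \right)
```

The early stopping presumably monitors the decrease of this quantity between iterations and stops once it falls below the tolerance.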

@arthurmensch
Contributor Author

I tweaked the Cython code so that it can release the GIL. The for loop in _update_dict_enet is embarrassingly parallel; are we allowed to use prange, or should we always stick to joblib?

@ogrisel
Member

ogrisel commented Jun 23, 2015

prange cannot currently be used in scikit-learn because:

  • not all supported compilers (e.g. clang on OSX or MSVC 2008 on Windows) implement OpenMP
  • the OpenMP implementation of gcc (libgomp) still crashes when used in a program that uses multiprocessing somewhere else: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

However you can use joblib with backend="threading" if you do not make many calls to Parallel in a row (otherwise the thread pool creation and destruction will kill the performance). Alternatively you can maintain your own pool (contextlib is necessary to use the ThreadPool as a context manager in Python 2.6):

from contextlib import closing
from multiprocessing.pool import ThreadPool

with closing(ThreadPool(n_jobs)) as p:
    for X_batch in minibatch_iterator:
        # use the pool to apply a function in parallel, for instance with p.map
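The sketch above fleshed out with made-up data so it actually runs; the per-item function is a stand-in for real work that releases the GIL (e.g. a Cython nogil routine), which is what makes the threading approach pay off:

```python
from contextlib import closing
from multiprocessing.pool import ThreadPool

def square(x):
    # placeholder for the real per-item update
    return x * x

n_jobs = 2
minibatch_iterator = iter([[1, 2, 3], [4, 5, 6]])

results = []
with closing(ThreadPool(n_jobs)) as p:
    for X_batch in minibatch_iterator:
        # p.map distributes the per-item calls over the thread pool
        # and preserves input order in its output
        results.append(p.map(square, X_batch))
```

The closing() wrapper guarantees the pool is terminated when the block exits, which is the Python 2.6 workaround mentioned above.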

@ogrisel
Member

ogrisel commented Jun 23, 2015

Note that we could also make joblib.Parallel a context manager with __enter__ and __exit__ methods to keep a managed pool internally for successive jobs. This requires a lot more work and would not bring any benefit for the threading backend case. It could be useful for the multiprocessing backend though.

@arthurmensch
Contributor Author

Since we are only using a ThreadPool for the enet version, I explicitly close it at the end of dict_learning_online. I also tried to use the constructed pool in sparse_encode, without any gain in performance (I guess we need a multiprocessing Pool for that).

@arthurmensch arthurmensch changed the title [WIP] Sparse PCA online in the direction of samples [MRG] Sparse PCA online in the direction of samples Jun 24, 2015
@arthurmensch arthurmensch force-pushed the spca_online_sample branch 2 times, most recently from 871d2f0 to 0fbc54b Compare June 24, 2015 14:34
@ogrisel ogrisel changed the title [MRG] Sparse PCA online in the direction of samples [WIP] Sparse PCA online in the direction of samples Jun 25, 2015
@ogrisel ogrisel added the API label Jun 25, 2015
@ogrisel
Member

ogrisel commented Jun 25, 2015

I changed this PR to [WIP] as we need to resolve the API issue. I suggest we introduce classes named IncrementalDictionaryLearning and IncrementalSparsePCA to host the implementation for the cases where we have a partial_fit method, and keep DictionaryLearning and SparsePCA for the cases where we cannot implement a partial_fit method (even if the fit method does some kind of internal 'online' optimization along the feature axis).

We could then deprecate the current MiniBatchDictionaryLearning and MiniBatchSparsePCA, as they introduce confusion by not being consistent.

@ogrisel
Member

ogrisel commented Jun 25, 2015

@arthurmensch when you rebase and squash commits, please clean up the full commit message: there are old phony messages included in the commit message for 0fbc54b.

@arthurmensch
Contributor Author

The major issue here is now a naming problem: this PR's code could be used to turn MiniBatchSparsePCA into a real online estimator.

However, we would have to keep the present MiniBatchSparsePCA for the next few versions and deprecate it, while creating another estimator (IncrementalSparsePCA?).

Contributor Author


This scaling is the one recommended in the Lasso literature. I can write a test showing that the sparsity level stays roughly the same no matter what input feature size is provided.

@arthurmensch
Contributor Author

Output from the example:

Dataset consists of 400 faces
Extracting the top 30 Online Dictionary learning...
Iteration   0 (elapsed time:   0s,  0.0mn)
Iteration  10 (elapsed time:   0s,  0.0mn)
Iteration  20 (elapsed time:   0s,  0.0mn)
Iteration  30 (elapsed time:   0s,  0.0mn)
Iteration  40 (elapsed time:   0s,  0.0mn)
Iteration  50 (elapsed time:   1s,  0.0mn)
Iteration  60 (elapsed time:   1s,  0.0mn)
Iteration  70 (elapsed time:   1s,  0.0mn)
Iteration  80 (elapsed time:   1s,  0.0mn)
Iteration  90 (elapsed time:   2s,  0.0mn)
done in 2.416s
Online Dictionary learning - Component density
0.0601806640625
Code density
1.0

[images: components, faces, reconstruction]

@arthurmensch
Contributor Author

I am closing this PR for the moment as it needs serious rework on my side.

@amueller
Member

ok thanks for working on this :)
