
[MRG] Ridge can use sample weights in feature space (X.T.dot(X) gram matrix)#3034

Merged
ogrisel merged 2 commits into scikit-learn:master from eickenberg:ridge_sample_weights_in_feature_space
Apr 15, 2014

Conversation

@eickenberg
Contributor

Hello everybody,

here is an enhancement to the ridge_regression function: up until now, if sample_weight is given, the solver-selection logic imposes a treatment in sample space, which can lead to MemoryErrors if the number of samples is large, even though the number of features is small. See http://stackoverflow.com/questions/22766978/sklearn-ridge-and-sample-weight-gives-memory-error for an external report of this issue.

All I added was the possibility to treat sample weights in _solve_dense_cholesky and not only in _solve_dense_cholesky_kernel.
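To make the memory argument concrete, here is a minimal sketch (not the actual scikit-learn code; the function name and the sqrt-of-weights rescaling are illustrative assumptions) of weighted ridge solved in feature space, where the gram matrix is only (n_features, n_features) regardless of n_samples:

```python
import numpy as np

def ridge_feature_space(X, y, sample_weight, alpha):
    # One standard way to fold in sample weights: rescale rows by sqrt(w).
    # The gram matrix Xw.T.dot(Xw) then stays (n_features, n_features),
    # so memory use does not grow with the number of samples.
    sw = np.sqrt(sample_weight)
    Xw = X * sw[:, np.newaxis]
    yw = y * sw
    A = Xw.T.dot(Xw)                   # X.T.dot(X)-style gram matrix
    A.flat[::A.shape[0] + 1] += alpha  # ridge penalty on the diagonal
    return np.linalg.solve(A, Xw.T.dot(yw))
```

With all weights equal to 1 this reduces to ordinary ridge, and giving a sample weight of 2 is equivalent to duplicating that sample.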

I provide a batch of tests checking that the results of these two functions are the same given sample weights.

Further, I provide a test to ensure that the correct branch is chosen in ridge_regression (i.e. use _solve_dense_cholesky if n_samples > n_features even if there are sample weights). I implemented it so that it will cause a MemoryError if the wrong branch is taken. By TDD I made sure it fails first; now it passes. Please comment on this practice: evidently, in the current, corrected case, the MemoryError will not be raised, but if ever the branching is wrong and the computer running the tests has 2TB of memory available, this test will no longer raise a MemoryError. If there are better ways of checking the right branching, please let me know.

PS: There are some inconsistencies in function signature default values: _solve_dense_cholesky_kernel has sample_weight=None as default, which will, if the function is called directly, be interpreted as though sample weights are actually present. This has never caused an error, because the public function ridge_regression always calls it with sample_weight=1. if no sample weights are given. Should I correct this in the present PR or elsewhere?

@coveralls

Coverage Status

Coverage remained the same when pulling fd81c72 on eickenberg:ridge_sample_weights_in_feature_space into 42663e4 on scikit-learn:master.

Member

I think this should raise a ValueError if sample_weight.ndim > 1 instead of calling ravel. I do not see any valid case for a higher than 1-d sample_weight.

Member

I think we should raise a ValueError if sample_weight.ndim > 1 instead of calling ravel.

👍

Contributor Author

The sample weight could have been 0-d, i.e. just a scalar; this was my way of making it an array. But it is true that the subsequent multiplications work with scalars as well as 1-d arrays. The crucial point is that the array must not be higher-dimensional, so yes, I will add this verification.

Contributor Author

Is using np.atleast_1d, in order to be sure to obtain something on which I can call .shape, justifiable, or do you prefer that I check isinstance(sample_weight, numbers.Number) vs isinstance(sample_weight, np.ndarray)?

@ogrisel
Member

ogrisel commented Apr 3, 2014

About the test that checks the right branching in Ridge by giving data that would cause a memory error if Ridge decides to pick the wrong solver, I think it's fine.

If other people think that this is too dangerous (for instance if someone introduces a regression while refactoring the Ridge class on a developer box with 2TB+ of RAM), then we would need to find a way to instrument the ridge_regression function but that sounds too complicated.

@ogrisel
Member

ogrisel commented Apr 3, 2014

The docstrings for the "solver" and "sample_weight" params need to be updated to document the new behavior in all the Ridge* classes and in the ridge_regression function.

@GaelVaroquaux
Member

If other people think that this is too dangerous (for instance if someone introduces a regression while refactoring the Ridge class on a developer box with 2TB+ of RAM),

Yes, on a shared box (like drago), which would then screw up everybody on the box. I am not too excited about it :$

@ogrisel
Member

ogrisel commented Apr 3, 2014

About the sample_weight=1. default, I think we should switch to sample_weight=None by default to be consistent with other models such as SGDClassifier for instance, while still handling the sample_weight=1 case internally to keep backward compat...

@GaelVaroquaux
Member

About the sample_weight=1. default, I think we should switch to sample_weight=None by default to be consistent with other models

+1

@ogrisel
Member

ogrisel commented Apr 3, 2014

I think the checks at line:

https://github.com/eickenberg/scikit-learn/blob/ridge_sample_weights_in_feature_space/sklearn/linear_model/ridge.py#L271

are not consistent: if the user passes a sparse matrix as input along with a sample_weight array, the dense_cholesky solver will be used with a warning and will then probably crash. I think this should raise a ValueError with an explicit error message (and tests).

@ogrisel
Member

ogrisel commented Apr 3, 2014

Yes, on a shared box (like drago), which would then screw up everybody on the box. I am not too excited about it :$

We can just remove that test then. Monkey-patching is too cumbersome just for a test I think.

@eickenberg
Contributor Author

Both branches in solver="dense_cholesky" use safe_sparse_dot, but it is true that I added a non-sparse-safe multiplication by sample weights in there. I will try to make it safe as elegantly as possible (it amounts to multiplication by a diagonal matrix, which necessarily needs to be sparse, because a dense one would be enormous).
Independently of this, I can also change the warning to a ValueError.

@ogrisel
Member

ogrisel commented Apr 3, 2014

I think I prefer a ValueError to a warning when the solver is explicitly passed by the user (i.e. set to a non-default value). But @mblondel might prefer to keep the current behavior.

+1 for a test for the sample_weight + sparse data case if you can make it work.

Contributor Author

The sample weight error is already raised in fit. Otherwise np.average, which is called by _center_data, will raise an equivalent error message.

Contributor Author

This may be useful on a more abstract level

@coveralls

Coverage Status

Coverage remained the same when pulling 8a5e2a9 on eickenberg:ridge_sample_weights_in_feature_space into 42663e4 on scikit-learn:master.

@ogrisel
Member

ogrisel commented Apr 4, 2014

This PR seems to fix #1190 but maybe not as @mblondel intended.

@ogrisel
Member

ogrisel commented Apr 4, 2014

@eickenberg if you think you are done with the changes you planned on your side please prefix the title of the PR with the "[MRG]" tag to let reviewers know that this PR is ready for final review. Otherwise please put the "[WIP]" tag instead.

@eickenberg eickenberg changed the title [ENH] Ridge can use sample weights in feature space (X.T.dot(X) gram matrix) [MRG] Ridge can use sample weights in feature space (X.T.dot(X) gram matrix) Apr 4, 2014
@eickenberg
Contributor Author

Switched to MRG. Happy to discuss further extensions / refactorings if they make the ridge story more consistent.
WRT issue #1190: what I did was implement the same loss function that is presented there, but I do not distribute square roots of the sample weights, since they multiply back together at a later stage (or can in fact be kept together). I address sparse-safe multiplication by using a scipy.sparse.dia_matrix instead of working directly on the data. This should not cause overhead (except for a copy of the data, of course; if that is an issue, we are in gradient-descent territory anyway).
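The sparse-safe multiplication described above can be sketched as follows (illustrative helper name, not the PR's exact code); scipy.sparse.diags builds the sparse diagonal (dia_matrix) of weights:

```python
import numpy as np
import scipy.sparse as sp

def scale_rows(X, sample_weight):
    # Row i of X is scaled by sample_weight[i] via a sparse diagonal
    # matrix. This works for dense and sparse X alike, whereas a dense
    # (n_samples, n_samples) diagonal matrix would be enormous.
    sw_diag = sp.diags(sample_weight)  # sparse diagonal of the weights
    return sw_diag.dot(X)
```

Sparse input stays sparse through this operation, which is exactly why the dia_matrix trick avoids both the memory blow-up and the non-sparse-safe elementwise multiplication.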

@eickenberg
Contributor Author

I don't completely understand the (non-)failure of the Travis CI builds on some of my commits: I wrote a failing test for sparse matrices, and it definitely fails on my machine, but Travis says it passes: eickenberg@8c9410d

Conversely, after fixing the failing test, Travis says that one of its builds fails, but two pass: eickenberg@ecb8a99

The same goes for the commit after the PEP 8 / PEP 257 corrections.

Member

Why are those 2 commented out? Please include them (the code should internally convert them to one of csr / csc).

Contributor Author

Should this conversion be done within ridge? Should I warn that the format is not appropriate?

Certain array-checking functions (safe_asarray, I think) seem to choke on these because they don't expose .data properly.

Member

I think in other estimators we do not issue a warning when a sparse matrix conversion is required. The fact that safe_asarray chokes might be a bug, but it should probably be fixed in a separate PR. Please add an inline comment such as "XXX: add support for other sparse matrix formats without data copy if possible".

Contributor Author

safe_asarray is already called in _BaseRidge.fit, so the conversion needs to go before this. I'd prefer to check this within ridge_regression, but that would call for placing the safe_asarray logic within ridge_regression as well.

Member

safe_asarray is already called in _BaseRidge.fit, so the conversion needs to go before this. I'd prefer to check this within ridge_regression, but that would call for placing the safe_asarray logic within ridge_regression as well.

Sounds good to me.

@mblondel
Member

mblondel commented Apr 4, 2014

The solution suggested in #1190 would still be useful for the "lsqr" solver (which is very fast on sparse data).

@ogrisel
Member

ogrisel commented Apr 4, 2014

Indeed, the test history looks weird. I checked that in the last build the sparse + sample_weight test is actually running. However, it raises a warning:

Sample weights must work with sparse matrices ... /home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/ridge.py:283: UserWarning: sample_weight and class_weight not supported in sparse_cg, fall back to dense_cholesky.
"dense_cholesky." % solver)
ok

I think using the model with the default parameters should not raise a warning. If the input is sparse, sample_weight is given, and the solver is left to "auto", the model should automatically select the cholesky solver without a warning, and this should be documented in the docstring.

@mblondel
Member

mblondel commented Apr 4, 2014

I'd suggest renaming dense_cholesky to just cholesky. dense_cholesky is misleading in that one may think it cannot handle sparse data, but it can.

@GaelVaroquaux
Member

I'd suggest renaming dense_cholesky to just cholesky. dense_cholesky is misleading in that one may think that it cannot handle sparse data but it can.

+1

@larsmans
Member

I think you should rebase again. git pull --rebase upstream master (where upstream is the main repo), then git push -f origin ridge_sample_weights_in_feature_space.

@eickenberg
Contributor Author

Did another rebase, which led to an interesting telescopic sum of conflicting diffs :)

But I don't know if it now looks as it is supposed to.

@larsmans
Member

Pretty much, but before merging I do prefer a further squash (though that's just my opinion). Ideally there should at least be no commits that leave the code base in a broken state, i.e. where the tests don't pass: it should be possible to run git bisect to fix bugs without being bothered by stuff not related to the bug you're chasing.

Member

Please restrict lines to 80 columns; my editor doesn't like longer lines. Also, the stuff about #1190 is an implementation detail, and this log is primarily for users to see whether they need to upgrade anything.

@coveralls

Coverage Status

Coverage remained the same when pulling 878d264 on eickenberg:ridge_sample_weights_in_feature_space into fb795eb on scikit-learn:master.

@ogrisel
Member

ogrisel commented Apr 11, 2014

Apart from #3034 (comment), I think @larsmans would like most commits of this PR to be squashed, and then we are good to go for merging.

@eickenberg
Contributor Author

Simplified has_sw to has_sw = sample_weight is not None for ridge_regression, Ridge, RidgeClassifier and _BaseRidge.

I didn't touch it in _RidgeGCV etc., because there sample_weight is implicitly assumed to be 1 in the code if nothing was set (there are multiplications by sample_weight). As indicated before, I am very willing to undertake a full refactoring of this code, but right now my understanding of the analytical ridge leave-one-out calculations is purely theoretical (and the coding style in that part of the file is somewhat different from what I have edited).

I can tackle it in this PR if it adds sufficiently to consistency.

@eickenberg
Contributor Author

Ok, it ended up being easy to make this work for RidgeGCV etc with sample_weight=None as default, but I had to set it to 1.0 if initialized with None in the body of RidgeClassifierCV.fit.

@ogrisel
Member

ogrisel commented Apr 14, 2014

@eickenberg do you need help for the squashing? Here is some doc:

http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html

… commit over around 20 commits to avoid failing tests
@eickenberg
Contributor Author

Thanks for that. I think it worked. @fabianp encouraged me through the sticky parts :)

@ogrisel
Member

ogrisel commented Apr 15, 2014

Looks good; for some reason Travis did not pick up your new commit. I will run the tests on my box. If they pass I will merge.

@eickenberg
Contributor Author

Yes, something is weird; I am not getting update emails anymore either (I saw your comment 20 hours later), so probably some hooks aren't running?

@ogrisel
Member

ogrisel commented Apr 15, 2014

Hm, the cause is a conflict in the whats_new file, as usual. I am on it.

@eickenberg
Contributor Author

Ok, thanks!

@ogrisel ogrisel merged commit 8307eee into scikit-learn:master Apr 15, 2014
@ogrisel
Member

ogrisel commented Apr 17, 2014

I forgot to say: thanks @eickenberg for the non-trivial fix!

@eickenberg
Contributor Author

Thanks for seeing it through with me :) @ogrisel @mblondel @larsmans @GaelVaroquaux @agramfort @fabianp


@mblondel
Member

You should add your name to the header next time.

@ogrisel
Member

ogrisel commented Apr 17, 2014

Indeed, please @eickenberg feel free to open a new PR for this and I will merge it right away.

@eickenberg
Contributor Author

Haha OK I'll do that, because I love making pull requests! :)
