[MRG] Add jitter to LassoLars #15179
Conversation
Addressing MR comments (thanks, @agramfort):
Still pending: 3 tests now fail due to floating-point precision issues. Should I just loosen the precision on those to let things pass? Also: I've moved the
@angelaambroz please do not use a jitter by default. It's working for many people as it is now. Don't affect the behavior of many users just for your use case. Thanks.
@agramfort Not actually my use case: I was addressing what was discussed in #2746. A default of 10e-5 was discussed there, though I didn't get specific confirmation that it was the agreed-on default value. I can default
yes that would be fine with me. thanks
@angelaambroz there are unrelated changes in the diff (in Cython files). This needs to be cleaned up. Thanks.
Ack, unintended! Will fix shortly.
The error message I'm seeing in the Azure pipeline is: I expect this is orthogonal to my changes; I don't see how my diff could have affected anything NumPy-related. Help?
sklearn/linear_model/_least_angle.py
Outdated
      precompute='auto', n_nonzero_coefs=500,
-     eps=np.finfo(np.float).eps, copy_X=True, fit_path=True):
+     eps=np.finfo(np.float).eps, copy_X=True, fit_path=True,
+     jitter=DEFAULT_JITTER):
Suggested change:
-     jitter=DEFAULT_JITTER):
+     jitter=None):
sklearn/linear_model/_least_angle.py
Outdated
      from ..exceptions import ConvergenceWarning

      SOLVE_TRIANGULAR_ARGS = {'check_finite': False}
+     DEFAULT_JITTER = None
sklearn/linear_model/_least_angle.py
Outdated
      with a small alpha.

+     jitter : float, default=None
+         Uniform noise parameter, added to the y values, to satisfy \
sklearn/linear_model/_least_angle.py
Outdated
      LassoLars(alpha=0.01)
      >>> print(reg.coef_)
-     [ 0. -0.963257...]
+     [ 0. -0.9632...]
Please revert this, the default shouldn't have changed.
sklearn/linear_model/_least_angle.py
Outdated
      normalize=True, precompute='auto', max_iter=500,
      eps=np.finfo(np.float).eps, copy_X=True, fit_path=True,
-     positive=False):
+     positive=False, jitter=DEFAULT_JITTER):
Suggested change:
-     positive=False, jitter=DEFAULT_JITTER):
+     positive=False, jitter=None):
Yes, merging master in might help.
sklearn/linear_model/_least_angle.py
Outdated
      else:
          max_iter = self.max_iter

+     if self.jitter:
Suggested change:
-     if self.jitter:
+     if self.jitter is not None:
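The rationale for the suggested change can be sketched with a tiny, hypothetical guard (`should_add_jitter` is illustrative only, not part of the PR):

```python
# jitter=0.0 is falsy in Python, so a plain truthiness check would
# silently skip the noise branch; an explicit None comparison keeps
# 0.0 (and other falsy floats) meaningful.
def should_add_jitter(jitter):
    return jitter is not None

assert bool(0.0) is False           # why `if self.jitter:` misbehaves
assert should_add_jitter(0.0)       # 0.0 should still enable jitter
assert not should_add_jitter(None)  # only None disables it
```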
    y = np.array(y_list)
    expected_output = np.array(expected_y)
    alpha = 0.001
    fit_intercept = False
Why force fit_intercept = False?
This is at the edge of my stats/linear algebra understanding, but I think we need to force it to be False, since the error only occurs for exactly aligned values (e.g. this comment).
From your comment I understand that the test would not be a non-regression test with fit_intercept=True, as you only see an error with fit_intercept=False. However, the X and y you chose below do work with jitter=None too. Did you check that the test still passes with fit_intercept=True?
Since the target is constant (-2.5, -2.5), fitting with the intercept would mean the coefficients are all 0 (and the intercept is -2.5).
So I guess it makes sense to leave fit_intercept=False; it properly reproduces the original example from the issue.
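The constant-target point can be checked concretely; a minimal sketch (the X, y, and alpha here are illustrative, not copied from the PR's test):

```python
import numpy as np
from sklearn.linear_model import LassoLars

# With a constant target, the intercept alone can explain y, so every
# coefficient ends up at zero and the degenerate LARS path is never hit.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([-2.5, -2.5, -2.5])

reg = LassoLars(alpha=0.001, fit_intercept=True).fit(X, y)
assert np.allclose(reg.coef_, 0.0)
assert np.isclose(reg.intercept_, -2.5)
```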
NicolasHug left a comment:
Thanks for the PR and for your patience so far @angelaambroz.
Made a few comments. Are you available to make the changes? Otherwise we'll do it.
This also needs an entry in doc/whats_new/v0.23.rst
Thanks!
    w_nojitter = lars.coef_
    w_jitter = lars_with_jitter.coef_

    assert not np.array_equal(w_jitter, w_nojitter)
This is too easy to pass, so maybe check the mean squared difference instead:
assert np.mean((w_jitter - w_nojitter)**2) > .1
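As a sketch, a non-regression test with this stricter bound might look like the following (the data mirrors the degenerate example from #2746; the exact jitter value and alpha are my assumptions, not necessarily what was merged):

```python
import numpy as np
from sklearn.linear_model import LassoLars

# Two perfectly aligned samples and a constant target: the degenerate
# case where LARS is unstable without jitter.
X = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
y = [-2.5, -2.5]

lars = LassoLars(alpha=0.001, fit_intercept=False).fit(X, y)
lars_with_jitter = LassoLars(alpha=0.001, fit_intercept=False,
                             jitter=1e-7, random_state=0).fit(X, y)

# Mean squared difference is harder to satisfy by accident than a
# bare `not np.array_equal(...)` check.
msd = np.mean((lars_with_jitter.coef_ - lars.coef_) ** 2)
assert msd > 0.1
```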
@NicolasHug seems like @angelaambroz may not be available for this one. Would you like to take it over? (trying to clean up the milestone and prepare for release)
yup I'll do it.
sklearn/linear_model/_least_angle.py
Outdated
      normalize=True, precompute='auto', max_iter=500,
-     eps=np.finfo(np.float).eps, copy_X=True, positive=False):
+     eps=np.finfo(np.float).eps, copy_X=True, positive=False,
+     random_state=None):
Why add a random_state param here and not jitter?
thx @angelaambroz and @NicolasHug!
* Adding jitter to LassoLars fit
* CircleCI fail
* MR comments
* Jitter becomes default, added test based on issue description
* flake8 fixes
* Removing unexpected cython files
* Better coverage
* PR comments
* PR comments
* PR comments
* PR comments
* PR comments
* Linting
* Apply suggestions from code review
* addressed comments
* added whatnew entry
* test both estimators
* update whatsnew
* removed random_state for lassolarsIC

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Reference Issues/PRs
See #2746.
What does this implement/fix? Explain your changes.
Following the discussion in the above issue, this PR adds a `jitter` keyword argument to the `Lars` and `LassoLars` classes. The argument, defaulted to 0.0001, applies uniformly distributed noise, bounded by `jitter`, to the `y` variable when fitting.

Any other comments?
Main questions:

* 3 tests currently fail in `test_least_angle.py` from differences in floating point estimates for model coefficients. One way to address this would be loosening the precision of the `assert_array_almost_equal()` statements. Is this how we'd like to proceed?
* Should I use `np.random.seed()` to "lock in" the noise so that the model gives predictable results once it's been imported, even if the user keeps instantiating and fitting it? I think no, but... I'm not sure.

Thanks to all the maintainers of sklearn! It's a great project. Thanks for the contributing guidelines also, those were very helpful.
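The fit-time change described above can be sketched as follows (a standalone illustration; the name `add_jitter` is mine, and scikit-learn itself uses `check_random_state` rather than `default_rng`):

```python
import numpy as np

def add_jitter(y, jitter=0.0001, random_state=None):
    # Uniformly distributed noise in [0, jitter), added to the target
    # before fitting, as the PR describes.
    rng = np.random.default_rng(random_state)
    noise = rng.uniform(low=0.0, high=jitter, size=len(y))
    return np.asarray(y, dtype=float) + noise

y = np.array([-2.5, -2.5, -2.5])
y_jittered = add_jitter(y, jitter=0.0001, random_state=0)

assert np.all(y_jittered >= y)            # noise is non-negative
assert np.all(y_jittered - y < 0.0001)    # bounded by jitter
assert not np.array_equal(y_jittered, y)  # exact ties are broken
```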