
[MRG] MNT Clean some deprecation stuff for 0.21 (#13443)

Merged
ogrisel merged 23 commits into scikit-learn:master from jeremiedbb:clean-deprecation-0.21
Mar 21, 2019

Conversation

@jeremiedbb (Member):

Remove some parameters deprecated since 0.19.
Change the max_iter and tol defaults of SGD (and everything depending on it).


@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def test_tol_and_max_iter_default_values():
# Test that the default values are correctly changed
Member Author:

I'm not sure this whole test is relevant anymore

Member:

Yeah, doesn't look necessary, certainly not if we remove those private attributes

@jnothman (Member):

jnothman commented Mar 13, 2019 via email

@jnothman added this to the 0.21 milestone Mar 13, 2019
assert Xt.data.min() < 0 and Xt.data.max() > 0
Xt = FeatureHasher(alternate_sign=True, non_negative=True,
                   input_type="pair").fit_transform(X)
assert Xt.data.min() > 0
Member:

I think it doesn't hurt to keep this test (but remove the non_negative=False parts), unless you think it's redundant?

Member Author:

This test without the non_negative argument is exactly the same test as test_hasher_alter_sign.

.. deprecated:: 0.19
Passing 'None' to parameter ``accept_sparse`` in methods is
deprecated in version 0.19 and will be removed in 0.21. Use
``accept_sparse=False`` instead.
Member:

I wonder if we should keep this notice; the parameter is still there, after all. The wording is not great either: one cannot "remove passing 'None' to parameter accept_sparse".

What happens when one passes accept_sparse=None with this change?

Member Author:

If X is dense, accept_sparse is ignored, as before.
If X is sparse, accept_sparse=None raises an informative error:

    Parameter 'accept_sparse' should be a string, boolean or list of strings. You provided 'accept_sparse=None'

I think the behavior is good.
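The behavior described above can be mimicked with a small standalone sketch. This is a hypothetical helper for illustration, not scikit-learn's actual check_array code; the function name and signature are invented:

```python
# Hypothetical sketch of the reported validation behavior: dense input
# ignores accept_sparse entirely, while sparse input with an invalid
# value (such as None) raises an informative TypeError.

def validate_accept_sparse(X_is_sparse, accept_sparse):
    """Illustrative stand-in for the accept_sparse check discussed above."""
    if not X_is_sparse:
        return None  # accept_sparse is ignored for dense input
    if not isinstance(accept_sparse, (str, bool, list)):
        raise TypeError(
            "Parameter 'accept_sparse' should be a string, boolean or "
            "list of strings. You provided 'accept_sparse=%r'." % (accept_sparse,)
        )
    return None

# Dense input: no error even with accept_sparse=None.
validate_accept_sparse(X_is_sparse=False, accept_sparse=None)

# Sparse input with None: informative error.
try:
    validate_accept_sparse(X_is_sparse=True, accept_sparse=None)
except TypeError as exc:
    print(exc)
```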

@jeremiedbb (Member, Author):

jeremiedbb commented Mar 14, 2019

The failing tests come from test_estimators with name=PassiveAggressiveClassifier and check=check_class_weight_classifiers. It's due to the change of default for the tol parameter from None to 1e-3. Now it converges too quickly (only ~20 iterations, although max_iter=1000) and the score is not high enough.

Do I have to add a special case for this estimator and set a smaller tol?

@jnothman (Member):

Is this a bad default tol for PassiveAggressiveClassifier?

We could just add the poor_score tag to PassiveAggressiveClassifier and perhaps note that this is due to too large a tolerance for now? @amueller, wdyt?

@jeremiedbb (Member, Author):

jeremiedbb commented Mar 19, 2019

OK, so it turns out it's not a tol issue. The issue is that due to very noisy data and a very small number of samples, the loss is very noisy in the first iterations.

For now I fixed it by setting the n_iter_no_change parameter to 20.
(I also tried increasing the number of samples, but I had to go up to 500, and since that affects all the estimators, the test would take significantly longer.)

@jeremiedbb (Member, Author):

jeremiedbb commented Mar 19, 2019

After discussing with @ogrisel, we thought that the default params of SGD might not be best for very small datasets. We could change n_iter_no_change to "auto", which would be higher for very small datasets. We could instead add another parameter, min_iter. What do you think?
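For illustration only, one entirely hypothetical shape such an "auto" heuristic could take (nothing like this is implemented; the threshold and formula are made up for the sake of discussion):

```python
import math

# Hypothetical resolution of n_iter_no_change='auto': keep the usual
# default of 5 on large datasets, be more patient on very small ones
# where the per-epoch loss is noisy. Both the 1000-sample cutoff and
# the 100/sqrt(n) formula are invented placeholders.

def resolve_n_iter_no_change(n_iter_no_change, n_samples):
    if n_iter_no_change == "auto":
        if n_samples >= 1000:
            return 5
        return max(5, math.ceil(100 / math.sqrt(n_samples)))
    return n_iter_no_change

print(resolve_n_iter_no_change("auto", 10000))  # large dataset
print(resolve_n_iter_no_change("auto", 25))     # tiny dataset
```

A min_iter parameter would instead guarantee a floor on the number of epochs regardless of the stopping criterion; the trade-off raised above is that a good fixed value is hard to pick independently of dataset size.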

@jeremiedbb (Member, Author):

I forgot some things in sgd (the deprecated loss_function) and in _split (the change of the test_size default value).
I should have everything now.

@jnothman (Member) left a comment:

min_iter in SGD might be reasonable except that it still might be hard to set a good value that is independent of dataset size. Proposed heuristic for n_iter_no_change='auto'?

The stopping criterion. If it is not None, the iterations will stop
when (loss > previous_loss - tol). Defaults to 1e-3 from 0.21.
Member:

This should probably be updated to reflect n_iter_no_change

Member:

Not essential to this PR

return
# n_iter deprecation, set self._max_iter, self._tol

self._tol = self.tol
Member:

I'm not sure we need these private attributes anymore. I think they were invented for the deprecation

Member Author:

Right, I removed them.



The default will change in version 0.21. It will remain 0.2 only
if ``train_size`` is unspecified, otherwise it will complement
the specified ``train_size``.
Member:

I'm not sure how you can implement this by setting test_size=0.2 by default. We still need a placeholder value that acts as the complement (or 0.2) when unspecified, and we still want to support the case where both train_size and test_size are set explicitly.

Member Author:

There's a validation function for train_size and test_size: _validate_shuffle_split.
It does what's described. I just added an error that is raised when both are None in train_test_split.

Member:

I could be mistaken, but I don't think you're right. We agree on:

ShuffleSplit() => (train=90%, test=10%)
ShuffleSplit(test_size=.2) => (train=80%, test=20%)
ShuffleSplit(test_size=.2, train_size=.5) => (train=50%, test=20%)

But while I think this PR (and IIRC the code before 0.19) does:

ShuffleSplit(train_size=.8) => (train=80%, test=10%)

the documented (and desired) behaviour is:

ShuffleSplit(train_size=.8) => (train=80%, test=20%)

Similarly

ShuffleSplit(train_size=.99) => (train=99%, test=1%)

not an error.

Member Author:

Ahh, OK, I misunderstood the desired behavior.

Member Author:

I don't see how we can have a default test_size=0.2 and both

ShuffleSplit(train_size=.5) => (train=50%, test=50%)
ShuffleSplit(train_size=.5, test_size=.2) => (train=50%, test=20%)

There's no way to know whether test_size=0.2 has been explicitly set by the user.
What we could do is set the default to None and:

  • set it to 0.2 if train_size is None
  • complement train_size otherwise
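That resolution rule could be sketched as follows. The function name is illustrative, default_test stands for the unspecified-case default (0.2 here, per the train_test_split discussion; ShuffleSplit's documented default is 0.1), and integer sizes, which the real _validate_shuffle_split also supports, are ignored:

```python
from math import ceil, floor

# Sketch of the proposed test_size=None resolution: an unspecified
# test_size becomes the complement of train_size, or a fixed default
# when train_size is also unspecified. Fractional sizes only.

def resolve_split(n_samples, train_size=None, test_size=None,
                  default_test=0.2):
    """Return (n_train, n_test) for the given fractions."""
    if test_size is None:
        if train_size is not None:
            test_size = 1.0 - train_size   # complement train_size
        else:
            test_size = default_test       # both unspecified
    n_test = ceil(test_size * n_samples)
    if train_size is None:
        n_train = n_samples - n_test       # complement test_size
    else:
        n_train = floor(train_size * n_samples)
    return n_train, n_test

# Mirrors the examples in the thread (n_samples=100):
print(resolve_split(100, train_size=0.5))                 # (50, 50)
print(resolve_split(100, train_size=0.5, test_size=0.2))  # (50, 20)
```

Under this rule both cases from the comment above hold, because test_size=None is distinguishable from an explicitly passed test_size=0.2, which a 0.2 default value could never be.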

Member Author:

I moved this part to #13483 because it needs more work

@ogrisel (Member) left a comment:

Thank you very much, this looks good. Merging.

@ogrisel ogrisel merged commit e574990 into scikit-learn:master Mar 21, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
@jeremiedbb deleted the clean-deprecation-0.21 branch July 20, 2020