[MRG] MNT Clean some deprecation stuff for 0.21#13443
ogrisel merged 23 commits into scikit-learn:master
Conversation
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def test_tol_and_max_iter_default_values():
    # Test that the default values are correctly changed
I'm not sure this whole test is relevant anymore
Yeah, doesn't look necessary, certainly not if we remove those private attributes
Yes, this is a good idea ;)
assert Xt.data.min() < 0 and Xt.data.max() > 0
Xt = FeatureHasher(alternate_sign=True, non_negative=True,
                   input_type="pair").fit_transform(X)
assert Xt.data.min() > 0
I think it doesn't hurt to keep this test (but remove the non_negative=False parts), unless you think it's redundant?
This test without the non-negative argument is the exact same test as test_hasher_alter_sign.
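For reference, the effect of `alternate_sign` under discussion can be sketched with a minimal, hypothetical hashing-trick helper (the real `FeatureHasher` uses MurmurHash3 and returns a `scipy.sparse` matrix; `hash_features` and its CRC32 hash are illustrative assumptions only):

```python
import zlib

# Hypothetical sketch of the hashing trick with an alternate_sign option.
def hash_features(pairs, n_features=8, alternate_sign=True):
    out = [0.0] * n_features
    for token, value in pairs:
        h = zlib.crc32(token.encode("utf-8"))
        idx = h % n_features
        # With alternate_sign=True roughly half the tokens get a negative
        # sign, which cancels hash collisions in expectation -- so the
        # output can contain both negative and positive entries.
        sign = -1.0 if (alternate_sign and h & 1) else 1.0
        out[idx] += sign * value
    return out

pairs = [("foo", 2.0), ("bar", 3.0), ("baz", 1.0)]
X = hash_features(pairs)                         # may contain both signs
X_pos = hash_features(pairs, alternate_sign=False)  # all entries >= 0
```

This is why a test with `alternate_sign=True` alone only needs to assert the presence of both signs, which `test_hasher_alter_sign` already does.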
.. deprecated:: 0.19
   Passing 'None' to parameter ``accept_sparse`` in methods is
   deprecated in version 0.19 and will be removed in 0.21. Use
   ``accept_sparse=False`` instead.
I wonder if we should keep this notice, the parameter is still there after all? The wording is not great, one cannot "remove passing 'None' to parameter accept_sparse"
What happens when one passes accept_sparse=None with this change?
If X is dense, accept_sparse is ignored as it was before.
If X is sparse, accept_sparse=None raises an informative error:
Parameter 'accept_sparse' should be a string, boolean or list of strings. You provided 'accept_sparse=None'
I think the behavior is good.
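A minimal sketch of the validation behavior described above (the helper name `check_accept_sparse` is hypothetical; the real check lives in scikit-learn's `check_array`):

```python
# Hypothetical sketch of the described accept_sparse handling: ignored
# for dense X, informative TypeError for sparse X with an invalid value.
def check_accept_sparse(accept_sparse, X_is_sparse):
    if not X_is_sparse:
        return  # accept_sparse is ignored for dense input, as before
    if not isinstance(accept_sparse, (str, bool, list, tuple)):
        raise TypeError(
            "Parameter 'accept_sparse' should be a string, boolean or "
            "list of strings. You provided 'accept_sparse=%r'."
            % (accept_sparse,)
        )

check_accept_sparse(None, X_is_sparse=False)  # dense: no error
try:
    check_accept_sparse(None, X_is_sparse=True)
except TypeError as exc:
    print(exc)  # informative error for sparse input
```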
The failing tests come from
Do I have to add a special case for this estimator and set a smaller tol?
Is this a bad default tol for
We could just add the
Ok so it turns out it's not a
For now I fixed it by setting the
after discussing with @ogrisel, we thought that the default params of SGD might not be best for very small datasets. We could change
I forgot some stuff in sgd (deprecated |
jnothman left a comment
min_iter in SGD might be reasonable except that it still might be hard to set a good value that is independent of dataset size. Proposed heuristic for n_iter_no_change='auto'?
The stopping criterion. If it is not None, the iterations will stop
when (loss > previous_loss - tol). Defaults to None.
Defaults to 1e-3 from 0.21.
when (loss > previous_loss - tol).
This should probably be updated to reflect n_iter_no_change
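The criterion the updated docstring should describe can be sketched as follows (parameter names follow the SGD estimators' `tol` and `n_iter_no_change`; the `run_epochs` helper and its precomputed loss trace are illustrative assumptions):

```python
# Sketch of the n_iter_no_change stopping criterion: stop when the loss
# fails to improve by more than tol for n_iter_no_change consecutive
# epochs (hypothetical helper over a precomputed loss trace).
def run_epochs(losses, tol=1e-3, n_iter_no_change=5):
    """Return the number of epochs actually run."""
    best_loss = float("inf")
    no_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss > best_loss - tol:
            no_improvement += 1   # not enough improvement this epoch
        else:
            no_improvement = 0
        if loss < best_loss:
            best_loss = loss
        if no_improvement >= n_iter_no_change:
            return epoch          # early stop
    return len(losses)

# Loss plateaus after epoch 3: training stops 5 stalled epochs later.
print(run_epochs([1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]))  # prints 8
```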
return
# n_iter deprecation, set self._max_iter, self._tol

self._tol = self.tol
I'm not sure we need these private attributes anymore. I think they were invented for the deprecation
right, I removed them
@ignore_warnings(category=(DeprecationWarning, FutureWarning))
def test_tol_and_max_iter_default_values():
    # Test that the default values are correctly changed
Yeah, doesn't look necessary, certainly not if we remove those private attributes
sklearn/model_selection/_split.py
The default will change in version 0.21. It will remain 0.2 only
if ``train_size`` is unspecified, otherwise it will complement
the specified ``train_size``.
complement of the train size.
I'm not sure how you implement this by setting test_size=0.2 by default. We still need a placeholder value that acts like complement or 0.2 when unspecified. We still want to support the case that both train_size and test_size are set explicitly
There's a validation function for train_size and test_size: _validate_shuffle_split.
It does what's described. I just added an error raise when both are None for train_test_split.
I could be mistaken, but I don't think you're right. We agree on:

ShuffleSplit() => (train=90%, test=10%)
ShuffleSplit(test_size=.2) => (train=80%, test=20%)
ShuffleSplit(test_size=.2, train_size=.5) => (train=50%, test=20%)

But while I think this PR (and IIRC the code before 0.19) does:

ShuffleSplit(train_size=.8) => (train=80%, test=10%)

the documented (and desired) behaviour is:

ShuffleSplit(train_size=.8) => (train=80%, test=20%)

Similarly

ShuffleSplit(train_size=.99) => (train=99%, test=1%)

not an error.
ahhh ok I misunderstood the desired behavior.
I don't see how we can have a default test_size=0.2 and both

ShuffleSplit(train_size=.5) => (train=50%, test=50%)
ShuffleSplit(train_size=.5, test_size=.2) => (train=50%, test=20%)

there's no way to know that test_size=0.2 has been explicitly set by a user.
What we could do is set the default to None and:
- set it to 0.2 if train_size is None
- complement train_size otherwise
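The proposed resolution logic can be sketched like this (the `resolve_sizes` helper is hypothetical, loosely modeled on `_validate_shuffle_split`; the 0.2 default follows the proposal above):

```python
# Hypothetical sketch of the proposed default resolution: test_size=None
# means "0.2 if train_size is unspecified, otherwise the complement of
# train_size"; an explicitly passed test_size always wins.
def resolve_sizes(train_size=None, test_size=None):
    if test_size is None:
        test_size = 0.2 if train_size is None else 1.0 - train_size
    if train_size is None:
        train_size = 1.0 - test_size
    return train_size, test_size

print(resolve_sizes())                               # (0.8, 0.2)
print(resolve_sizes(train_size=0.5))                 # (0.5, 0.5)
print(resolve_sizes(train_size=0.5, test_size=0.2))  # (0.5, 0.2)
```

Keeping `None` as the sentinel is what lets the code distinguish "user passed 0.2" from "0.2 by default".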
I moved this part to #13483 because it needs more work
ogrisel left a comment
Thank you very much, this looks good. Merging.
Remove some deprecated params from 0.19
Change the max_iter and tol defaults of sgd (and everything depending on it).