[MRG] Ignore and pass-through NaN values in MaxAbsScaler and maxabs_scale #11011
Conversation
jnothman
left a comment
Travis is reporting test failures. You might need to use `.A` or equivalently `.toarray()` to assert things about sparse matrices in your tests.
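For illustration, a minimal sketch of the pattern being suggested (the data and names here are made up, not taken from the PR):

```python
import numpy as np
from scipy import sparse

# Dense comparisons and ufuncs such as np.isnan do not work directly on
# sparse matrices, so densify with .toarray() (or the .A shorthand) first.
X_sp = sparse.csr_matrix(np.array([[0.0, 1.0], [np.nan, 2.0]]))
assert np.isnan(X_sp.toarray()).sum() == 1
```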
| X = sparse_format_func(X)
| X_test[:, 0] = np.nan  # make sure this boundary case is tested
You should do this before converting to sparse
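A quick sketch of the suggested ordering (made-up data; `scipy.sparse.csr_matrix` stands in for the test's `sparse_format_func`):

```python
import numpy as np
from scipy import sparse

X_test = np.random.RandomState(0).uniform(size=(5, 3))
X_test[:, 0] = np.nan               # set the boundary case on the dense array first...
X_test = sparse.csr_matrix(X_test)  # ...then convert to sparse
```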
| assert np.any(np.isnan(X_train), axis=0).all()
| assert np.any(np.isnan(X_test), axis=0).all()
| X = sparse_format_func(X)
You're only converting `X` to sparse, not `X_train` or `X_test`.
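That is, something along these lines (a sketch with assumed names and data; `sparse_format_func` is the test's parametrized constructor, here stood in by `scipy.sparse.csc_matrix`):

```python
import numpy as np
from scipy import sparse

sparse_format_func = sparse.csc_matrix  # stands in for the parametrized format

rng = np.random.RandomState(42)
X_train = rng.uniform(size=(10, 3))
X_test = rng.uniform(size=(5, 3))

# Convert both splits, not only a single X array.
X_train = sparse_format_func(X_train)
X_test = sparse_format_func(X_test)
```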
Thanks for this PR @LucijaGregov! @ reviewers: there is an updated version of the …
@LucijaGregov yes, I think that would be simpler. Also please add a link to the general issue about this under the "Reference Issues/PRs" section of the issue description.
@rth I am not sure if I understood you correctly. This is not mentioned on the 'Issues' list. Do you mean that I should add this as an issue?
I meant to link to #10404 in the description to provide some context for reviewers, as was done in your previous PR. No worries, I added it here.
@rth Right, I missed that bit. Thank you.
#11012 was merged; could you please click on the "Resolve conflicts" button to merge with master, and also remove any redundant changes from this PR for …
@rth Done. I will continue working on this.
I was checking how we could implement the min/max detection for sparse and it does not seem that easy: we rely on the scipy implementation, which uses … The problem of those …
@jnothman did I miss a functionality in scikit-learn which already provides such functions?
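For context, here is a minimal sketch of what a NaN-ignoring column-wise max-abs for sparse input could look like (`nan_max_abs_csc` is a hypothetical helper written for this discussion, not an existing scikit-learn or SciPy function):

```python
import numpy as np
import scipy.sparse as sp

def nan_max_abs_csc(X):
    """Column-wise max of |x| ignoring NaN (hypothetical helper)."""
    X = sp.csc_matrix(X)
    max_abs = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        col = X.data[X.indptr[j]:X.indptr[j + 1]]  # stored values of column j
        col = col[~np.isnan(col)]                  # drop NaN before reducing
        if col.size:
            # Implicit zeros are covered by the zero initialisation,
            # since |x| >= 0 for every stored value.
            max_abs[j] = np.abs(col).max()
    return max_abs

# e.g. nan_max_abs_csc(sp.csr_matrix(np.array([[1., np.nan], [-3., 2.]])))
# -> array([3., 2.])
```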
@glemaitre @jnothman Is it safe to continue working on this as it is, or do you have something else in mind to do first?
Why would it be unsafe?
@jnothman I meant whether I need to wait further for things to be merged because of the comments above, but I guess it is good to go.
Basically, #11011 (comment) is something that you might need to go through to make things work :) Then, whenever you think you have a potential solution, you can implement it or we can discuss it here.
pep8 failing
@amueller I know, it is work in progress.
Note that you will need to change …
Why do we have …
I think it makes sense to deprecate it. Sent from my phone - sorry to be brief and potential misspell.
I opened the issue #11300 to have a consensus on what to do. I just want to be sure that …
@LucijaGregov I made a quick push of what was missing to make the PR work.
| (MinMaxScaler(), minmax_scale, False),
| (QuantileTransformer(n_quantiles=10), quantile_transform, True)]
| )
| def test_missing_value_handling(est, func, support_sparse):
I've realised we should have `assert_no_warnings` in here when fitting and transforming.
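For illustration, this is roughly what the suggestion amounts to (a sketch assuming the NaN pass-through from this PR is in place; `assert_no_warnings` comes from `sklearn.utils.testing` at the time of this PR):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler
from sklearn.utils.testing import assert_no_warnings

X_train = np.array([[1.0, np.nan], [-2.0, 3.0]])
X_test = np.array([[np.nan, 0.5]])

est = MaxAbsScaler()
# assert_no_warnings(func, *args) calls func(*args), fails if any warning
# was raised during the call, and returns the call's result.
assert_no_warnings(est.fit, X_train)
Xt = assert_no_warnings(est.transform, X_test)
```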
It is a good point. I just caught that the QuantileTransformer was still raising a warning at inverse_transform. Since it is a single line, I would include it here.
@ogrisel @jorisvandenbossche @jeremiedbb @lesteve @rth @amueller @qinhanmin2014
| X_test[:, 0] = np.nan  # make sure this boundary case is tested
| Xt = est.fit(X_train).transform(X_test)
| with pytest.warns(None) as records:
Why not use `sklearn.utils.testing.assert_no_warnings`?
I wanted to stick to pytest, awaiting this feature: pytest-dev/pytest#1830
The second point is that I find

    with pytest.warns(None):
        X_t = est.whatever(X)

more readable than

    X_t = assert_no_warnings(est.whatever, X)

@ogrisel are you also in favor of assert_no_warnings? If yes, it is 2 vs 1 and I will make the change :)
@glemaitre This PR needs a merge from the current master.
nice syntactic magic would be `if not pytest.warns(...)`
@glemaitre can you please merge master to check that the tests still pass?
Actually wrong mention. I meant @LucijaGregov instead of @glemaitre, sorry.
I think Guillaume will not be online most of the day, so if you want to merge this, it might be easier to quickly do the merge of master yourself.
Actually, let me push the merge commit myself.
I'll let you do the merging. I will not be at the keyboard this morning. Sent from my phone - sorry to be brief and potential misspell.
| Xt_col = est.transform(X_test[:, [i]])
| with pytest.warns(None) as records:
|     Xt_col = est.transform(X_test[:, [i]])
| assert len(records) == 0
This new test_common.py assertion breaks with the PowerTransformer, which complains about all-NaN columns.
I am not sure if we should raise this warning or not (maybe not at transform time), but this should be consistent across all the transformers.
I made a push. The reason was the np.nanmin(X) check used to verify that the matrix is strictly positive. This case will return a full-NaN matrix as well, so everything will be fine (or at least the problem is forwarded to the next step in the pipeline).
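A small illustration of the behaviour being described (assumed NumPy semantics, not the actual transformer code):

```python
import numpy as np
import warnings

X = np.full((3, 1), np.nan)  # an all-NaN column
with warnings.catch_warnings():
    # np.nanmin warns "All-NaN slice encountered" and returns nan here.
    warnings.simplefilter("ignore", RuntimeWarning)
    col_min = np.nanmin(X, axis=0)

# nan <= 0 evaluates to False, so a strict-positivity check based on
# np.nanmin does not reject an all-NaN column; it passes through as NaN.
print(col_min)  # [nan]
```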
Up to now, none of the transformers raise a warning. We should use an np.errstate. Is it a warning or an error? Sent from my phone - sorry to be brief and potential misspell.
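For reference, a minimal sketch of the np.errstate idea (made-up data; whether the comparison warns depends on the NumPy version):

```python
import numpy as np

X = np.array([np.nan, -1.0, 2.0])
# Comparisons involving NaN can emit "RuntimeWarning: invalid value
# encountered"; np.errstate(invalid='ignore') silences the floating-point
# warning locally instead of letting it leak to the user.
with np.errstate(invalid='ignore'):
    positive_mask = X > 0
```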
Reference Issues/PRs
Related to #10404
What does this implement/fix? Explain your changes.
MaxAbsScaler and maxabs_scale ignore NaN values and pass them through instead of raising an error, as described in the PR title.
Any other comments?