[MRG+1] Add add_indicator_features option to show missing values in the output#6607
Conversation
|
Option to |
sklearn/preprocessing/imputation.py
Outdated
| else: | ||
| self.imputed_features = features_with_missing_values | ||
| if self.return_imputed: | ||
| X = np.hstack((X,imputed_mask)) |
ebb9384 to
3507580
Compare
|
@jnothman Sorry I didn't make it clear in the tests regarding the case when a feature has all missing values. I hope it is clearer now. In that case with axis=0, the indicator doesn't give the information about the discarded feature hence the masking on |
sklearn/preprocessing/imputation.py
Outdated
| imputed_features_ : array of shape (n_features_with_missing, ) | ||
| The input features which have been imputed during transform | ||
| The size of this attribute will be the number of features with | ||
| at least one missing value. |
There was a problem hiding this comment.
(at least one and fewer than all in the axis=0 case, if we want to be pedantic)
There was a problem hiding this comment.
Yeah sure, that makes it clearer.
|
You're not currently testing the correctness of the concatenated indicator matrix. Please do. I'm not sure what it'd be useful for, but this change also presents the possibility of implementing |
|
And I'd really like to see an example added to this PR, and perhaps an extension to the narrative docs. |
|
Regarding checking the indicator matrix, sorry I couldn't decide whether it is sufficient to only check the indicator matrix or the whole output. Would something like calling fit_transform with And regarding an example, should I try something on lines of this example ? Thanks ! |
|
I have added the check for the values in the indicator matrix as explained above. There is definitely a lot of code redundancy in it( except for checking the |
|
I don't think there's any question that an example which shows this improves the value of imputation would be a real boon to this PR. |
| >>> imp.fit([[1, 2], [np.nan, 3], [7, 6]]) | ||
| Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean', verbose=0) | ||
| Imputer(add_missing_indicator=False, axis=0, copy=True, missing_values='NaN', | ||
| strategy='mean', verbose=0) |
There was a problem hiding this comment.
Please indent appropriately for PEP8.
There was a problem hiding this comment.
I have changed this after the previous commit failed on doctest and updated based on the actual output when I run the command. I understand pep8 warrants more indentation, but this is documentation and we should be able to give the output as it would if we run the code right ? I have never changed anything related to doctests, so let me know if pep8 guidelines need to be followed here also. Thanks.
There was a problem hiding this comment.
Use the NORMALIZE_WHITESPACE flag
There was a problem hiding this comment.
Oh okay didn't get that.. Thanks will do it.
|
The tests look alright. I'd like to see the narrative doc changed to mention this feature and its motivation. |
sklearn/preprocessing/imputation.py
Outdated
| - If `axis=1` and X is encoded as a CSC matrix. | ||
|
|
||
| add_missing_indicator : boolean, optional (default=False) | ||
| If True, the returned X will have an appended indicator matrix |
There was a problem hiding this comment.
"returned" is inappropriate in the class docstring. Try "transformed"
|
LGTM, after that name change. Thanks @maniteja123! |
|
Thanks. Shall I will squash all the commits ? |
|
After you've changed the test name, you can squash the commits, I guess. |
bb94859 to
74108e3
Compare
|
Squashed and rebased with master. Thank you so much @jnothman for patiently answering my queries and guiding all along ! |
|
Just one more minor thing, can you add a test to show that I can merge after that. |
|
Sorry if I am understanding it wrong, but irrespective of |
|
Oh yes, my bad :/ Sorry. |
|
No problem, I actually changed it in the last version. And any suggestion regarding the name ? Also I am on mobile now. Sorry don't have laptop now to do the changes now. Would it be possible for you to add what's new if it is not a problem. Thanks for the help ! |
74108e3 to
18396be
Compare
|
I have added an entry to what's new. As a side note, I also did add entry for the MultiOutputClassifier here. But i suppose it will now have a merge conflict. Please do let me know if you need to push a commit again for that or if possible for you to resolve conflict and add. Thanks. |
|
LGTM! Thanks @maniteja123! |
|
Thanks a lot @maniteja123 🍷 🍷 !! |
|
Thanks @maniteja123 🍻 |
|
With apologies, @maniteja123, I think we should revert this change so as not to include it in the upcoming release. The error was mine: I was not thorough enough in my review, and mistakenly thought someone else had given it a +1 (we usually require two full reviews to merge). I think it cannot be released in its current form and we should revert this change and produce a PR with a fully functional |
|
Sorry for not responding to your pings lately (or did you ping?). My github notifications are overloaded. I'm astonished that I did not object to the attribute
I know there are a few other issues that you had summarized somewhere but I'm unable to find them. However at this point of time, if you feel that it is too much of stress before the release +1 for reverting. I would be happy to start afresh as well. Again apologies, I feel quite bad for doing this. |
Indeed, I feel that good practice is not to release this in a |
|
Hi everyone, I am sorry for proceeding in the wrong direction and this issue becoming a blocker for the release. Please don't apologize for reverting this change or reviewing, all of you have patiently guided me with the questions. IMO I think it is definitely essential to revert the change. Also if possible, I would happily work on this once the consensus of functionality for fit and transform is reached. And I suppose partial fit is not supported for the Imputer as of now. Please let me know if I can revert the changes or if the commits can be reverted. Thanks. |
|
It's not blocking the release. It's just distressing me a little that I merged it and can't fix it quickly enough :) |
)" This reverts commit 18396be as it was merged when incomplete.
|
Should we reopen? |
|
I think rebasing and renaming #6782 is more appropriate. |
)" (scikit-learn#7292) This reverts commit 18396be as it was merged when incomplete.
)" (scikit-learn#7292) This reverts commit 18396be as it was merged when incomplete.
Initial attempt to provide an option to indicate imputed features in the transformed output as suggested in #6556 . I have added an option to
fit_transformto provide backward compatibility and not break the code.At present, the indicator feature is only for dense matrices and it is a simple approach. Any feedback is appreciated here. Thanks.