[MRG] Raise exception on providing complex data to estimators#9551
[MRG] Raise exception on providing complex data to estimators#9551ogrisel merged 16 commits intoscikit-learn:masterfrom
Conversation
- Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported
…into complex_data merging changes from the master branch
…into complex_data Merging with upstream
sklearn/utils/validation.py
Outdated
|
|
||
| if dtype_orig is not None and dtype_orig.kind == "c": | ||
| raise ValueError("Complex data is not supported\n{}\n".format(array)) | ||
| elif isinstance(array, list) or isinstance(array, tuple): |
There was a problem hiding this comment.
I'm not sure this is the best way. We might also not catch everything with it (how about dataframes that contain complex data?). I'm not sure this is easy to catch later if dtype is not None, though. With your addition we're converting into an array twice: here and in 411. It would be great if we could use the conversion at 411, but as I said, it's not entirely clear to me how to do that.
|
I just realized that we are not doing anything with complex data, as we pass anything that's not
Looking into the
I guess that's in accordance with the documentation, though. If dtype was numeric, we should check after the array creation, as we do for "O" data in 425. Can you please also add a test to |
|
Thank you @amueller . Let me make changes as you have suggested. |
|
@amueller |
Did you mean |
|
We have common tests for all estimators, implemented in
check_estimators.py. That those tests work correctly is tested in
test_estimator_checks.py
…On 16 August 2017 at 07:35, pravarmahajan ***@***.***> wrote:
Can you please also add a test to check_estimators.py?
Did you mean test_estimator_checks.py in sklearn/utils/tests?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#9551 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6_3kdz0XqyPjQE8mgFTSj7OZDcTHks5sYg81gaJpZM4O2t8W>
.
|
|
Why do we even allow dtype to be list when we are discarding everything
other than the first element?
You missed the part where we *accepted* the data as is if its dtype is in
that list. Otherwise we *try* to convert the data with the first dtype.
|
|
@jnothman |
|
Answering 2): |
…into complex_data
…into complex_data
|
Thanks @amueller |
|
|
||
|
|
||
| def test_check_array_complex_data_error(): | ||
| X = np.array([[1 + 2j, 3 + 4j, 5 + 7j], [2 + 3j, 4 + 5j, 6 + 7j]]) |
There was a problem hiding this comment.
maybe instead of repeating the content every time create a list of lists in the beginning and then create everything else from that, so it's more obvious what is tested? Don't have a strong opinion though.
There was a problem hiding this comment.
I had thought of that, but it becomes very messy for some of the cases. For example converting list of list to tuple of tuple, or to tuple of np-arrays. Maybe I can add one line comment for what case is being tested.
| warnings.simplefilter('error', ComplexWarning) | ||
| array = np.array(array, dtype=dtype, order=order, copy=copy) | ||
| except ComplexWarning: | ||
| raise ValueError("Complex data not supported\n" |
There was a problem hiding this comment.
I see that this branch is covered, but I don't see where. We're never passing dtype right?
There was a problem hiding this comment.
Never passing dtype to check_array? Even if we don't pass it to check_array, the value of dtype changes within this function. For example lines 381-386. The ComplexWarning comes only if we are setting dtype to some one of the real types. However, if dtype is "None", no conversion takes place and no warning is produced, so we need to check that case again in subsequentl lines
There was a problem hiding this comment.
One example where dtype is explicitly passed to the function is that of svm decision function
|
LGTM, I think unfortunately this is the easiest we can get away with. I'm a bit confused about when we hit the ComplexWarning branch, though. |
…into complex_data
| array = np.array(array, dtype=dtype, order=order, copy=copy) | ||
| except ComplexWarning: | ||
| raise ValueError("Complex data not supported\n" | ||
| "{}\n".format(array)) |
There was a problem hiding this comment.
Unfortunately, at this point the array has been converted to float or int and the error message will be confusing. We need to keep a reference to the original input with complex values to report it in this error message (and del original_array otherwise to let the gabage collector free the memory as soon as possible otherwise).
There was a problem hiding this comment.
Or maybe:
with warnings.catch_warnings():
try:
warnings.simplefilter('error', ComplexWarning)
new_array = np.array(array, dtype=dtype, order=order, copy=copy)
except ComplexWarning:
raise ValueError("Complex data not supported\n"
"{}\n".format(array))
array = new_array
del new_arrayThere was a problem hiding this comment.
Hum sorry my previous comments are wrong. Your code is fine.
ogrisel
left a comment
There was a problem hiding this comment.
+1 once my comment is addressed
…-learn#9551) * Modifies model_selection.cross_validate docstring (scikit-learn#9534) - Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported * raise error on complex data input to estimators * Raise exception on providing complex data to estimators * adding checks to check_estimator for complex data * removing some unnecessary parts * autopep8 changes * removing ipdb, restoring some autopep8 fixes * removing ipdb, restoring some autopep8 fixes * adding documentation for complex data handling * adding one line explanation for each test case
…-learn#9551) * Modifies model_selection.cross_validate docstring (scikit-learn#9534) - Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported * raise error on complex data input to estimators * Raise exception on providing complex data to estimators * adding checks to check_estimator for complex data * removing some unnecessary parts * autopep8 changes * removing ipdb, restoring some autopep8 fixes * removing ipdb, restoring some autopep8 fixes * adding documentation for complex data handling * adding one line explanation for each test case
…-learn#9551) * Modifies model_selection.cross_validate docstring (scikit-learn#9534) - Fixes rendering of docstring examples - Instead of importing cross_val_score in example, cross_validate is imported * raise error on complex data input to estimators * Raise exception on providing complex data to estimators * adding checks to check_estimator for complex data * removing some unnecessary parts * autopep8 changes * removing ipdb, restoring some autopep8 fixes * removing ipdb, restoring some autopep8 fixes * adding documentation for complex data handling * adding one line explanation for each test case
|
Stupid question, why did we not use |
|
@lesteve The |
|
OK fair enough ... I feel like this could be simplified but it will require a bit more thoughts. |
Reference Issue
Fixes #9528
What does this implement/fix? Explain your changes.
An exception should be raised on giving complex data as input to the estimators. The changed code
does the same, via the function check_array. A new test function has been added to test_validation.py
corresponding to the code changes.
Any other comments?