[MRG] BUG ensure object array are properly casted when dtype=object#16076
[MRG] BUG ensure object array are properly casted when dtype=object#16076TomDLT merged 32 commits intoscikit-learn:masterfrom
Conversation
…X, but not in 0.21.3 (scikit-learn#16036)
thomasjpfan
left a comment
There was a problem hiding this comment.
Thank you for the PR @alexshacked !
glemaitre
left a comment
There was a problem hiding this comment.
Looks good. A couple of changes.
Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:
|
I see a similar idiom three times in the radius_neighbors method in this
file
|
|
Oh I see, I was seeking for |
|
So we need to call NB: I searched for the patter |
I see similar here: but otherwise agree it's all in radius_neighbors |
|
@glemaitre change log is in v0.20.rst? I thought v0.23.rst |
|
v0.23.rst |
|
Ups my automatic answering is broken :) |
…in regression testing also distances
|
ok. v0.23 then. Thanks @glemaitre |
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
There was a problem hiding this comment.
We will need to apply the _to_object_array function on the following line:
sklearn/preprocessing/tests/test_label.py=435=def test_multilabel_binarizer_non_integer_labels():
sklearn/preprocessing/tests/test_label.py:436: tuple_classes = np.empty(3, dtype=object)
sklearn/preprocessing/tests/test_label.py-437- tuple_classes[:] = [(1,), (2,), (3,)]
I propose to add a docstring (as you did earlier) and move the _to_object_array function in sklearn/utils/__init__.py. Then, we can import it in neighbors and preprocessing.
We just need to add a small test in sklearn/utils/tests/test_utils.py to check the expected behavior:
@pytest.mark.parametrize(
"sequence",
[[np.array(1), np.array(2)], [[1, 2], [3, 4]]]
)
test_to_object_array(sequence):
out = _to_object_array(sequence)
assert isinstance(out, ndarray)
assert out.dtype.kind == 'O'
assert out.ndim == 1Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…moved to_object_array() to sklearn.utils
… fix PEP8 errors
|
Hi @glemaitre. Moved function to_object_array() to sklearn.utils and changed the message in the change log of v0.23.rst |
glemaitre
left a comment
There was a problem hiding this comment.
Apart of making the function private LGTM. @alexshacked you can accept my suggestion and this would be enough.
|
Sorry about this @glemaitre . I thought one underscore means private inside the class, not private inside the package. Will restore the underscore |
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…removed unnecessary comment
…restored underscore
|
LGTM. @jnothman @thomasjpfan Could you have a look. I added the regression tag and tag it as a candidate for 0.22.2 |
|
Thanks for your comments @TomDLT. Will apply them all. |
…improved _to_object_array() documentation
|
Thanks @alexshacked ! |
* FIX ensure object array are properly casted when dtype=object (#16076) * DOC Docstring example of classifier should import classifier (#16430) * MNT Update nightly build URL and release staging config (#16435) * BUG ensure that estimator_name is properly stored in the ROC display (#16500) * BUG ensure that name is properly stored in the precision/recall display (#16505) * ENH Perform KNN imputation without O(n^2) memory cost (#16397) * bump scikit-learn version for binder * bump version to 0.22.2 * MNT Skips failing SpectralCoclustering doctest (#16232) * TST Updates test for deprecation in pandas.SparseArray (#16040) * move 0.22.2 what's new entries (#16586) * add 0.22.2 in the news of the web site frontpage * skip test_ard_accuracy_on_easy_problem Co-authored-by: alexshacked <al.shacked@gmail.com> Co-authored-by: Oleksandr Pavlyk <oleksandr-pavlyk@users.noreply.github.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Joel Nothman <joel.nothman@gmail.com> Co-authored-by: Thomas J Fan <thomasjpfan@gmail.com>
closes #16036
Fix a bug where calling
np.array(..., dtype=object)will create a N-D array while algorithms are expecting a 1-D array with objects inside (similar to a list).