[MRG] EHN add parameter axis in safe_indexing to slice rows and columns#14035
[MRG] EHN add parameter axis in safe_indexing to slice rows and columns#14035jnothman merged 18 commits intoscikit-learn:masterfrom
Conversation
|
Thank you for the PR! This would be useful for permutation importance too! #13146 |
|
@NicolasHug This PR would need to be in for the PDP (and the permutation importance then ;) |
NicolasHug
left a comment
There was a problem hiding this comment.
A few comments but mostly LGTM.
_check_key_type isn't tested anywhere?
This is tested through |
Only very indirectly then. I won't push it more because it wasn't even tested before but I don't think that's great. Unit tests can also be used to illustrate the behavior of a function. |
Good point |
|
Looks like the pandas tests are not being reported again to codecov again 🤔 |
|
Is there any sense in just using safe_indexing(X.T, mask)?
|
Do you mean to use the same indexing but with a transposed matrix in all cases? My intent was to merge the code from the |
|
Sorry, I think I confused safe_mask with safe_indexing. This is good.
… |
jnothman
left a comment
There was a problem hiding this comment.
I thought in ColumnTransformer we only needed to handle callables at fit time and not actually when indexing. I am surprised we need callable support here.
NicolasHug
left a comment
There was a problem hiding this comment.
I would propose the following docstring which IMHO is clearer. LGTM anyway.
Parameters
----------
X : array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series
Data from which to sample rows, items or columns.
indices : array-like
- When ``axis=0``, indices need to be an array of integer.
- When ``axis=1``, indices can be one of:
- integer: output is 1D, unless `X` is sparse.
- container: lists, slices, boolean masks: output is 2D.
Supported data types for containers:
- integer or boolean (positional): supported for
arrays, sparse matrices and dataframes
- string (key-based): only supported for dataframes. No keys
other than strings are allowed.
axis : int, default=0
The axis along which `X` will be subsampled. ``axis=0`` will select
rows while ``axis=1`` will select columns.
jnothman
left a comment
There was a problem hiding this comment.
With regards to your improved docstring, @NicolasHug, what about the scalar string?
|
I understand that scalar is just another name for integer, in this context? |
|
I updated the documentation with @NicolasHug proposal. |
|
Can we confirm the behaviour with a single string? |
|
scalar string will return a pandas series |
|
Considering that the documentation is the only point of contention here, and not the specification, I'll merge. Thanks @glemaitre |
|
I should add that I'm okay with the most recent docstring update. |
Allows to index columns as well as rows using the code from
ColumnTransformer.Added some tests for the supported type.
This PR should simplify the review of #14028
NB: the functions have been moved and renamed but not modified.