MAINT validate parameter in sklearn.preprocessing._encoders#23579
MAINT validate parameter in sklearn.preprocessing._encoders#23579glemaitre merged 29 commits intoscikit-learn:mainfrom
Conversation
… successfully run tests
…tic/scikit-learn into ohe_validate_params
|
I'm currently stuck, because _InstancesOf doesn't support numpy data types, see #23599 |
…ass numpy dtypes as string and internally converts them
|
I think we leave off validation from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder(dtype="i4")
X = [['dog'], ['cat'], ['snake']]
enc.fit_transform(X)
# array([[1],
# [0],
# [2]], dtype=int32)We can let NumPy handle bad dtypes. For example, this will raise already: from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder(dtype="i423e12")
X = [['dog'], ['cat'], ['snake']]
enc.fit_transform(X)
# TypeError: data type 'i423e12' not understoodFor reference, what NumPy considers a valid "dtype" is quite complex according to NumPy's typing system |
|
Since the parameter validation requires at least one constraint for each parameter. What is the best way to approach the |
I think we can use |
Unfortunately this collides with Maybe this check should be disabled for |
|
The current design of |
@thomasjpfan I guess we can add a special case |
Okay, then I'll resume working on this, when the PR is done :) |
ea27732 to
e3f230b
Compare
|
@Diadochokinetic I directly pushed some changes to take into account the recent improvements in the validation mechanism. |
|
Thanks @Diadochokinetic |
…learn#23579) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
Reference Issues/PRs
Make all estimators use _validate_params #23462
What does this implement/fix? Explain your changes.
Implements _validate_params for sklear.preprocessing._encoders. Shared parameters will be implemented in the _BaseEncoder.
Any other comments?
This is my first contribution. Tips and Feedback are highly appreciated.