[SPRINT] Add warning notes in preprocessing functions #17402
Closed
Labels
Easy (well-defined and straightforward way to resolve), Sprint
Description
The goal here is to add a warning note to the docstring of each of the preprocessing functions below (a follow-up to #17387). The note should warn about potential issues when using these functions and recommend using a pipeline instead:
- maxabs_scale
- minmax_scale
- normalize
- quantile_transform
- robust_scale
- scale
- power_transform
All of these are in sklearn/preprocessing/_data.py. Here is a warning template:
.. warning:: Risk of data leak

    Do not use :func:`~sklearn.preprocessing.scale` unless you know what
    you are doing. A common mistake is to apply it to the entire data
    *before* splitting into training and test sets. This will bias the
    model evaluation because information would have leaked from the test
    set to the training set.
    In general, we recommend using
    :class:`~sklearn.preprocessing.StandardScaler` within a
    :ref:`Pipeline <pipeline>` in order to prevent most risks of data
    leaking: `pipe = make_pipeline(StandardScaler(), LogisticRegression())`.
You should of course adapt `scale` and `StandardScaler` to the function you are documenting and its estimator counterpart.
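For anyone unsure what the warning guards against, here is a minimal sketch of the recommended pattern. The synthetic dataset, the split, and the variable names are assumptions chosen for illustration and are not part of the template. The point it shows is that a scaler placed inside a pipeline is fitted on the training rows only, so no test-set statistics reach the model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration (an assumption, not from the issue).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Split first; scaling happens later, inside the pipeline.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The scaler is fitted as part of pipe.fit, so it only ever sees X_train.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# The learned means come from the training rows, not from the full dataset.
scaler = pipe.named_steps["standardscaler"]
uses_train_stats_only = np.allclose(scaler.mean_, X_train.mean(axis=0))

# At prediction time the test rows are transformed with the training statistics.
test_accuracy = pipe.score(X_test, y_test)
```

By contrast, calling `scale(X)` on the full array before `train_test_split` would compute the mean and standard deviation over rows that later end up in the test set, which is the leak the warning describes.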
Please indicate below which function(s) you want to work on, e.g. "I'm working on scale and robust_scale", so that others don't pick the same ones.
@scikit-learn/core-devs feel free to directly edit the warning message