The goal here is to add a warning note to the docstrings of the preprocessing functions (follow-up to #17387) that warns about potential issues when using these functions and recommends using a pipeline instead. The functions concerned include normalize.
All of these are in sklearn/preprocessing/_data.py. Here is a warning template:
.. warning:: Risk of data leak

    Do not use :func:`~sklearn.preprocessing.scale` unless you know what
    you are doing. A common mistake is to apply it to the entire data
    *before* splitting into training and test sets. This will bias the
    model evaluation because information would have leaked from the test
    set to the training set.
    In general, we recommend using
    :class:`~sklearn.preprocessing.StandardScaler` within a
    :ref:`Pipeline <pipeline>` in order to prevent most risks of data
    leaking: `pipe = make_pipeline(StandardScaler(), LogisticRegression())`.
You should of course adapt scale and StandardScaler to the function you are working on and its estimator counterpart.
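For anyone adapting the template, here is a minimal sketch of the pattern the warning recommends. The synthetic dataset, the variable names, and the choice of LogisticRegression as the final estimator are illustrative assumptions, not part of the template itself:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Risky pattern: scale() applied to the full data computes the mean and
# standard deviation from every row, including rows that later end up
# in the test set.
X_leaky = scale(X)

# Recommended pattern: the scaler lives inside the pipeline, so its
# statistics are estimated from the training rows only during fit().
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name.
scaler = pipe.named_steps["standardscaler"]
score = pipe.score(X_test, y_test)
```

After fitting, `scaler.mean_` equals the column means of `X_train` rather than those of `X`, which is exactly the property that keeps test-set information out of the preprocessing step.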
Please indicate below which function(s) you want to work on, e.g. "I'm working on scale and robust_scale", so that others don't pick the same ones.
@scikit-learn/core-devs feel free to edit the warning message directly.