IterativeImputer does not behave as expected when using fit() and transform() on a train/test split versus using fit_transform() on an entire dataset #14383
Closed
Description
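IterativeImputer gives poor imputed values when it is fit on a training split with fit() and then applied to a test split with transform(), compared with simply calling fit_transform() on the entire dataset. In the example below, the two-step approach imputes the same value for every test row. Issue #14338 may be related.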
Steps/Code to Reproduce
Here's a fairly minimal example:
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
import numpy as np
# x_0 = x_1 + x_2 + x_3
train = np.array([
    [ 5, 2, 2, 1],
    [10, 1, 2, 7],
    [ 3, 1, 1, 1],
    [ 8, 4, 2, 2]
])
test = np.array([
    [np.nan, 2, 4, 5],
    [np.nan, 4, 1, 2],
    [np.nan, 1, 10, 1]
])
y_actual = np.array([11, 7, 12])
## Prediction using IterativeImputer and the 1-step fit_transform()
imputer = IterativeImputer(estimator = LinearRegression())
fullData = np.concatenate([train,test])
imputedData = imputer.fit_transform(fullData)
y_1imputed = imputedData[-3:,0]
## Prediction using IterativeImputer and the 2-step fit() / transform()
imputer2 = IterativeImputer(estimator = LinearRegression())
imputer2 = imputer2.fit(train)
imputedTest = imputer2.transform(test)
y_2imputed = imputedTest[:,0]
print('Actual y: {}'.format(y_actual))
print('1-step Imputed y: {}'.format(y_1imputed))
print('2-step Imputed y: {}'.format(y_2imputed))
Expected Results
Actual y: [11 7 12]
1-step Imputed y: [11. 7. 12.]
2-step Imputed y: [11. 7. 12.]
Actual Results
Actual y: [11 7 12]
1-step Imputed y: [11. 7. 12.]
2-step Imputed y: [6.5 6.5 6.5]
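One observation on the output above: 6.5 is exactly the mean of the first training column (5, 10, 3, 8). That would be consistent with transform() falling back to a plain mean fill for a column that had no missing values during fit(), but that explanation is a guess on my part and not something confirmed here. The NumPy-only sketch below checks the arithmetic and, separately, shows that an ordinary least-squares fit on the training rows alone is enough to recover the expected values. The names `test_features`, `design`, `coef` and `pred` are illustrative only; this does not reproduce IterativeImputer's internals.

```python
import numpy as np

# Same training data as in the report (x_0 = x_1 + x_2 + x_3).
train = np.array([
    [ 5, 2, 2, 1],
    [10, 1, 2, 7],
    [ 3, 1, 1, 1],
    [ 8, 4, 2, 2],
], dtype=float)

# Observed columns x_1..x_3 of the test rows (x_0 is the missing one).
test_features = np.array([
    [2,  4, 5],
    [4,  1, 2],
    [1, 10, 1],
], dtype=float)

# Mean of the first training column: the value the 2-step run returned.
col0_mean = train[:, 0].mean()

# Ordinary least squares of x_0 on x_1..x_3 plus an intercept,
# fitted on the training rows only.
design = np.column_stack([train[:, 1:], np.ones(len(train))])
coef, *_ = np.linalg.lstsq(design, train[:, 0], rcond=None)

# Predict x_0 for the test rows with the fitted coefficients.
pred = np.column_stack([test_features, np.ones(len(test_features))]) @ coef

print('Mean of train[:, 0]: {}'.format(col0_mean))
print('Least-squares prediction: {}'.format(np.round(pred, 6)))
```

The least-squares predictions agree with the actual y values up to floating-point tolerance, so the training split does carry enough information for the 2-step result to match the 1-step one.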
Versions
System:
    python: 3.7.2 (default, Dec 29 2018, 06:19:36) [GCC 7.3.0]
    executable: /home/jack/miniconda3/bin/python
    machine: Linux-4.15.0-54-generic-x86_64-with-debian-buster-sid
BLAS:
    macros: NO_ATLAS_INFO=1, HAVE_CBLAS=None
    lib_dirs: /usr/lib/x86_64-linux-gnu
    cblas_libs: cblas
Python deps:
    pip: 19.1.1
    setuptools: 40.2.0
    sklearn: 0.22.dev0
    numpy: 1.15.3
    scipy: 1.1.0
    Cython: 0.29.12
    pandas: 0.24.2
    matplotlib: 3.0.2
    joblib: 0.13.2