IterativeImputer cannot accept PLSRegression() as an estimator due to "shape mismatch" #19352
Describe the bug
When `PLSRegression()` is passed as the `estimator` of `IterativeImputer`, a `ValueError` ("shape mismatch") is raised at line 348 of `sklearn/impute/_iterative.py`.
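The failing assignment can be reproduced with NumPy alone. This is a minimal sketch, assuming (as the traceback suggests) that the estimator's `predict` returns a column of shape `(n, 1)` while the indexed slice `X_filled[missing_row_mask, feat_idx]` is 1-D; the array sizes are made up to mirror the `(27,)` shape in the error message.

```python
import numpy as np

# Toy stand-ins for the variables in _impute_one_feature (sizes are illustrative).
X_filled = np.zeros((30, 3))
missing_row_mask = np.zeros(30, dtype=bool)
missing_row_mask[:27] = True
feat_idx = 0

# A 2-D prediction of shape (27, 1), like the one the traceback reports.
imputed_values = np.ones((27, 1))

caught = None
try:
    # Boolean-mask + integer indexing selects a 1-D slice of shape (27,),
    # and a (27, 1) array cannot be broadcast to (27,).
    X_filled[missing_row_mask, feat_idx] = imputed_values
except ValueError as exc:
    caught = str(exc)
    print("ValueError:", caught)

# Flattening the prediction to shape (27,) makes the same assignment succeed.
X_filled[missing_row_mask, feat_idx] = imputed_values.ravel()
print(X_filled[missing_row_mask, feat_idx].shape)
```

This is the same idea as the reshape-based workaround at the end of this report: any 1-D view of the `(n, 1)` prediction fits the 1-D slice.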
Steps/Code to Reproduce
Example:
```python
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.cross_decomposition import PLSRegression
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(42)

X_california, y_california = fetch_california_housing(return_X_y=True)
X_california = X_california[:400]
y_california = y_california[:400]


def add_missing_values(X_full, y_full):
    n_samples, n_features = X_full.shape

    # Add missing values in 75% of the lines
    missing_rate = 0.75
    n_missing_samples = int(n_samples * missing_rate)

    missing_samples = np.zeros(n_samples, dtype=bool)
    missing_samples[: n_missing_samples] = True

    rng.shuffle(missing_samples)
    missing_features = rng.randint(0, n_features, n_missing_samples)
    X_missing = X_full.copy()
    X_missing[missing_samples, missing_features] = np.nan
    y_missing = y_full.copy()

    return X_missing, y_missing


X_miss_california, y_miss_california = add_missing_values(
    X_california, y_california)

imputer = IterativeImputer(estimator=PLSRegression(n_components=2))
X_imputed = imputer.fit_transform(X_miss_california)
print(X_imputed)
```

Expected Results

After applying the workaround below:
```
[[   8.3252       41.            6.98412698 ...    2.55555556
    37.88       -122.25930206]
 [   8.3014       21.            6.23813708 ...    2.10984183
    37.86       -122.22      ]
 [   7.2574       52.            8.28813559 ...    2.80225989
    37.85       -122.24      ]
 ...
 [   3.60438721   50.            5.33480176 ...    2.30396476
    37.88       -122.29      ]
 [   5.1675       52.            6.39869281 ...    2.44444444
    37.89       -122.29      ]
 [   5.1696       52.            6.11590296 ...    2.70619946
    37.8709526  -122.29      ]]
```

Actual Results
```
  File "/home/hushsh/py3/lib/python3.6/site-packages/sklearn/impute/_iterative.py", line 348, in _impute_one_feature
    X_filled[missing_row_mask, feat_idx] = imputed_values
ValueError: shape mismatch: value array of shape (27,1) could not be broadcast to indexing result of shape (27,)
```

Versions
```
System:
    python: 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
executable: /home/hushsh/raid_data/py3/bin/python
   machine: Linux-5.4.0-60-generic-x86_64-with-LinuxMint-19.3-tricia

Python dependencies:
          pip: 21.0.1
   setuptools: 47.3.1
      sklearn: 0.24.1
        numpy: 1.18.1
        scipy: 1.4.1
       Cython: 0.29.15
       pandas: 1.1.3
   matplotlib: 3.1.2
       joblib: 0.14.1
threadpoolctl: 2.1.0

Built with OpenMP: True
```
My workaround that fixed the bug: insert the following lines before line 348 of `_iterative.py`, so that a 2-D prediction of shape (n, 1) is flattened to shape (n,):
```python
shape_imputed_values = imputed_values.shape
if len(shape_imputed_values) > 1:
    # convert 2D array to 1D array fixes the bug:
    imputed_values = imputed_values.reshape(shape_imputed_values[0])
```