IterativeImputer cannot accept PLSRegression() as an estimator due to "shape mismatch" #19352

@firshu

Describe the bug

When the estimator is set to PLSRegression(), IterativeImputer raises a ValueError ("shape mismatch") at line 348 of '_iterative.py'.

Steps/Code to Reproduce

Example:

import numpy as np

from sklearn.datasets import fetch_california_housing
from sklearn.cross_decomposition import PLSRegression
from sklearn.experimental import enable_iterative_imputer  # noqa
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(42)

X_california, y_california = fetch_california_housing(return_X_y=True)
X_california = X_california[:400]
y_california = y_california[:400]

def add_missing_values(X_full, y_full):
    n_samples, n_features = X_full.shape

    # Add missing values in 75% of the lines
    missing_rate = 0.75
    n_missing_samples = int(n_samples * missing_rate)

    missing_samples = np.zeros(n_samples, dtype=bool)
    missing_samples[: n_missing_samples] = True

    rng.shuffle(missing_samples)
    missing_features = rng.randint(0, n_features, n_missing_samples)
    X_missing = X_full.copy()
    X_missing[missing_samples, missing_features] = np.nan
    y_missing = y_full.copy()

    return X_missing, y_missing

X_miss_california, y_miss_california = add_missing_values(
    X_california, y_california)

imputer = IterativeImputer(estimator=PLSRegression(n_components=2))

X_imputed = imputer.fit_transform(X_miss_california)
print(X_imputed)

Expected Results (output obtained after applying the workaround below):

[[   8.3252       41.            6.98412698 ...    2.55555556
    37.88       -122.25930206]
 [   8.3014       21.            6.23813708 ...    2.10984183
    37.86       -122.22      ]
 [   7.2574       52.            8.28813559 ...    2.80225989
    37.85       -122.24      ]
 ...
 [   3.60438721   50.            5.33480176 ...    2.30396476
    37.88       -122.29      ]
 [   5.1675       52.            6.39869281 ...    2.44444444
    37.89       -122.29      ]
 [   5.1696       52.            6.11590296 ...    2.70619946
    37.8709526  -122.29      ]]

Actual Results

File "/home/hushsh/py3/lib/python3.6/site-packages/sklearn/impute/_iterative.py", line 348, in _impute_one_feature
    X_filled[missing_row_mask, feat_idx] = imputed_values
ValueError: shape mismatch: value array of shape (27,1) could not be broadcast to indexing result of shape (27,) 
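
The traceback suggests that PLSRegression.predict returns a column array of shape (n, 1) in this version, while the masked assignment expects shape (n,). The following is a minimal sketch that reproduces the same NumPy error in isolation; the array sizes and values are made up for illustration and are not taken from the imputer.

```python
import numpy as np

# Stand-ins for the variables named in the traceback (sizes are illustrative).
X_filled = np.zeros((5, 3))
missing_row_mask = np.array([True, False, True, False, True])
feat_idx = 1

# A 2D column of predictions, analogous to the (27, 1) array in the traceback.
imputed_2d = np.arange(3, dtype=float).reshape(-1, 1)

try:
    # The indexing result has shape (3,), so a (3, 1) value cannot be broadcast.
    X_filled[missing_row_mask, feat_idx] = imputed_2d
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Flattening the predictions to 1D makes the assignment valid.
X_filled[missing_row_mask, feat_idx] = imputed_2d.ravel()
```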

Versions

System:
python: 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
executable: /home/hushsh/raid_data/py3/bin/python
machine: Linux-5.4.0-60-generic-x86_64-with-LinuxMint-19.3-tricia

Python dependencies:
pip: 21.0.1
setuptools: 47.3.1
sklearn: 0.24.1
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.29.15
pandas: 1.1.3
matplotlib: 3.1.2
joblib: 0.14.1
threadpoolctl: 2.1.0

Built with OpenMP: True

My workaround that fixed the bug: insert the following lines before line 348 of '_iterative.py':

if imputed_values.ndim > 1:
    # Flatten the (n, 1) array to shape (n,) so the assignment broadcasts
    imputed_values = imputed_values.reshape(imputed_values.shape[0])
