Describe the bug
A TypeError ("A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.") is thrown when passing a sparse matrix to the fit method of OutputCodeClassifier.
Steps/Code to Reproduce
import scipy.sparse as sparse
import numpy as np
from xgboost import XGBClassifier
from sklearn.multiclass import OutputCodeClassifier
xdemo = sparse.random(100, 200, random_state=10)
ydemo = np.random.choice((0, 1, 3, 4), size=100)
xgb = XGBClassifier(random_state=10)
OutputCodeClassifier(xgb, n_jobs=-1, random_state=10, code_size=2).fit(xdemo, ydemo)
Expected Results
No error thrown, successful fitting
Actual Results
~/.pyenv/versions/3.7.3/envs/metro/lib/python3.7/site-packages/sklearn/multiclass.py in fit(self, X, y)
763 self
764 """
--> 765 X, y = check_X_y(X, y)
766 if self.code_size <= 0:
767 raise ValueError("code_size should be greater than 0, got {0}"
~/.pyenv/versions/3.7.3/envs/metro/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
737 ensure_min_features=ensure_min_features,
738 warn_on_dtype=warn_on_dtype,
--> 739 estimator=estimator)
740 if multi_output:
741 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,
~/.pyenv/versions/3.7.3/envs/metro/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
493 dtype=dtype, copy=copy,
494 force_all_finite=force_all_finite,
--> 495 accept_large_sparse=accept_large_sparse)
496 else:
497 # If np.array(..) gives ComplexWarning, then we convert the warning
~/.pyenv/versions/3.7.3/envs/metro/lib/python3.7/site-packages/sklearn/utils/validation.py in _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, force_all_finite, accept_large_sparse)
293
294 if accept_sparse is False:
--> 295 raise TypeError('A sparse matrix was passed, but dense '
296 'data is required. Use X.toarray() to '
297 'convert to a dense numpy array.')
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
It appears that the check_X_y function causes the exception and is not set to allow sparse matrices.
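To isolate the behaviour from OutputCodeClassifier, here is a minimal sketch that calls check_X_y directly. It assumes check_X_y is importable from sklearn.utils (as in 0.22); the matrix shape and density are arbitrary illustration values, not taken from my real data.

```python
import numpy as np
import scipy.sparse as sparse
from sklearn.utils import check_X_y

# Small sparse input, similar in spirit to the reproduction above.
X = sparse.random(100, 200, density=0.01, format="csr", random_state=10)
y = np.random.RandomState(10).choice((0, 1, 3, 4), size=100)

# With the default accept_sparse=False, sparse input is rejected.
try:
    check_X_y(X, y)
    rejected = False
except TypeError:
    rejected = True

# With accept_sparse=True, validation passes and X stays sparse.
X_checked, y_checked = check_X_y(X, y, accept_sparse=True)
print(rejected, sparse.issparse(X_checked), X_checked.shape)
```

If this matches what happens inside fit, the failure comes only from the validation call, not from the wrapped estimator.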
The error is especially problematic when the classifier is used in a pipeline whose previous step outputs a sparse matrix. The easy workaround in this case was to add an intermediate transformer that converts the sparse matrix to a dense array:
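from sklearn.base import BaseEstimator, TransformerMixin

class TurnToDense(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        return X.toarray()  # densify; same result as the X.A shorthand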
Unfortunately, this workaround crashes everything because the huge dense matrix fills up the RAM. Simply passing the keyword argument accept_sparse=True to the check_X_y call in OutputCodeClassifier.fit fixes this bug.
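For a rough sense of why densifying exhausts memory, here is a back-of-the-envelope sketch. The shape and the number of non-zeros per row are hypothetical (not my actual data), and it assumes float64 values and 32-bit CSR indices.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed to store every entry of a dense float64 array."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_itemsize=4):
    """Approximate bytes for CSR storage: values, column indices, row pointers."""
    return nnz * (itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


# Hypothetical problem size: 1M samples, 50k features, 50 non-zeros per row.
n_rows, n_cols, nnz_per_row = 1_000_000, 50_000, 50
nnz = n_rows * nnz_per_row

dense = dense_bytes(n_rows, n_cols)
csr = csr_bytes(n_rows, nnz)
print(f"dense: {dense / 1e9:.1f} GB, CSR: {csr / 1e9:.2f} GB")
```

Under these assumptions the dense copy needs several hundred times more memory than the sparse input, which is why keeping the data sparse through validation matters.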
Versions
System:
python: 3.7.3 (default, Apr 8 2020, 16:07:18) [GCC 6.5.0 20181026]
executable: /home/.pyenv/versions/3.7.3/envs/metro/bin/python3.7
machine: Linux-4.15.0-76-generic-x86_64-with-debian-buster-sid
Python dependencies:
pip: 19.0.3
setuptools: 46.1.3
sklearn: 0.22
numpy: 1.18.1
scipy: 1.4.1
Cython: None
pandas: 1.0.1
matplotlib: 3.2.1
joblib: 0.14.1
Built with OpenMP: True