Skip to content

TST Extend tests for scipy.sparse.*array in model_selection/tests/test_validation.py#27366

Merged
OmarManzoor merged 12 commits intoscikit-learn:mainfrom
brendanlu:sparse2
Sep 18, 2023
Merged

TST Extend tests for scipy.sparse.*array in model_selection/tests/test_validation.py#27366
OmarManzoor merged 12 commits intoscikit-learn:mainfrom
brendanlu:sparse2

Conversation

@brendanlu
Copy link
Copy Markdown
Contributor

@brendanlu brendanlu commented Sep 14, 2023

Reference Issues/PRs

Towards #27090.

Any other comments?

Happy to make any changes.

EDIT: Construct P_sparse depending on the type of input it is being checked against in the dummy class...

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 14, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 980560d. Link to the linter CI: here

sparse_sample_weight.shape[0], X.shape[0]
)
if sparse_param is not None:
P_sparse = type(sparse_param)(P)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really need P_sparse here. We only want to check the shape.
So you can remove this line and change P_sparse by P below.

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM.

@brendanlu
Copy link
Copy Markdown
Contributor Author

Should the cross_val_predict function in sklearn/model_selection/_validation.py preserve the type of sparse container? If one adds assert isinstance(preds_sparse, type(y_sparse)) after line 1765 in this testing file, pytest fails seeming to indicate that this is not the case.

It appears when parameterizing with scipy.sparse._arrays.csr_array the function returns the matrix format even when taking sparse array format as input.

@glemaitre
Copy link
Copy Markdown
Member

Should the cross_val_predict function in sklearn/model_selection/_validation.py preserve the type of sparse container?

cross_validate are not in charge of the output. It pack the output provided by the classifier. So in this case, the classifier does not return a sparse array. However, we did not modify our API for the moment. In the future, we could stipulate that any scikit-learn estimator should return a sparse array if a sparse array was given during training.

# for StratifiedKFold(n_splits=3)
y2 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])
P_sparse = coo_matrix(np.eye(5))
P_sparse = COO_CONTAINERS[-1](np.eye(5))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not what I meant. What I meant was to change the usage of P_sparse and use P in the Mock classifier since we are only interested about the shape of the data. The sparsity in the Mock does not matter so much.

@brendanlu
Copy link
Copy Markdown
Contributor Author

cross_validate are not in charge of the output. It pack the output provided by the classifier.

I completely overlooked this.

Have just changed Mock Classifier to use just P for shape now.

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @brendanlu

Copy link
Copy Markdown
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @brendanlu

@OmarManzoor OmarManzoor merged commit 2800ed5 into scikit-learn:main Sep 18, 2023
@brendanlu brendanlu deleted the sparse2 branch September 18, 2023 10:52
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants