Skip to content

TST Extend tests for scipy.sparse.*array in sklearn/utils/tests/test_estimator_checks.py#27203

Merged
jjerphan merged 5 commits intoscikit-learn:mainfrom
Tialo:tests/test_estimator_checks
Nov 1, 2023
Merged

TST Extend tests for scipy.sparse.*array in sklearn/utils/tests/test_estimator_checks.py#27203
jjerphan merged 5 commits intoscikit-learn:mainfrom
Tialo:tests/test_estimator_checks

Conversation

@Tialo
Copy link
Copy Markdown
Contributor

@Tialo Tialo commented Aug 29, 2023

Reference Issues/PRs

Towards #27090.

What does this implement/fix? Explain your changes.

Any other comments?

Added __init__ to SparseTransformer so it can transform both into a matrix and into an array.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 29, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6b5dfbf. Link to the linter CI: here

@Tialo
Copy link
Copy Markdown
Contributor Author

Tialo commented Sep 29, 2023

Also, I wonder why pytest can't be used here? @glemaitre

@glemaitre
Copy link
Copy Markdown
Member

From the top of the head, it comes to not request pytest as a required dependency for third-party library that want to use the checks available in estimator_checks.py. Because of this, we want to run some test in test_estimator_checks.py that does not expect pytest to be installed and it seems that in the past, we needed to not have any pytest import in this file.

This is the meaning of the comment on the top of the file:

# We can not use pytest here, because we run
# build_tools/azure/test_pytest_soft_dependency.sh on these
# tests to make sure estimator_checks works without pytest.


if sparse.issparse(result_full):
result_full = result_full.A
result_full = result_full.toarray()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was the last error due to the deprecation.

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks @Tialo

@jjerphan
Copy link
Copy Markdown
Member

jjerphan commented Oct 1, 2023

@StefanieSenger: I do not know if you have been reviewing Pull Requests, but I think you should be able to review PR for #27090 starting with this Pull Request for instance. :)

This is just a suggestion and you might already focus on something: feel free to review those PRs or not.

@StefanieSenger
Copy link
Copy Markdown
Member

StefanieSenger commented Oct 2, 2023

Thank you @jjerphan for you trust. Though, I‘m not sure if I can be of use here.

This is the first review that I‘m trying, the changes in the code make sense to me, but I don‘t have anything to add except for a question. This is not really a review, I‘m afraid.

Edit: I've just seen that I'm not done yet. I need some time to understand check_estimator_sparse_data. I cannot work on it for the next 1,5 days, but I will get back to it as soon as I can manage.

@StefanieSenger
Copy link
Copy Markdown
Member

StefanieSenger commented Oct 3, 2023

So, the idea behind check_estimator_sparse_data, if I understand correctly, is to check whether a bad configured estimator would trigger one of the error messages „doesn't seem to fail gracefully“. When LargeSparseNotSupportedClassifier is tested in test_estimator_checks.py this is supposed to happen (because it claims to accept large sparse data (accept_large_sparse=True), but doesn‘t).

But the check_estimator_sparse_data is only done on sparse matrices, not sparse arrays, and thus the test is. I have no idea if this means something for this PR. (Why is #27090 only concerned with the tests and not with the rest of the codebase?) Should check_estimator_sparse_data be made to iterate through both types?

@glemaitre
Copy link
Copy Markdown
Member

@StefanieSenger Thanks for noting this issue. I think we should address it in another PR because as you mentioned, we try first to fix the test files.

However, we want the scikit-learn estimators to be accepting sparse arrays and this should be also true for estimator that are compatible with scikit-learn. So we will need to create/duplicate the check_sparse_* from estimator_checks to test for sparse arrays. I assume that we want a mechanism for people to deactivate the test so we would need a new tag. Up to know, I think that "sparse" was never used in the common test. So we could have a "sparse matrix" and "sparse array" for the X_types. I will open a new issue to track this problem.

@glemaitre glemaitre added the Waiting for Second Reviewer First reviewer is done, need a second one! label Oct 31, 2023
Copy link
Copy Markdown
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @Tialo.

@jjerphan jjerphan merged commit 9621539 into scikit-learn:main Nov 1, 2023
@Tialo Tialo deleted the tests/test_estimator_checks branch November 1, 2023 22:24
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
…st_estimator_checks.py` (scikit-learn#27203)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants