TST Extend tests for scipy.sparse.*array in test_glm.py#27107

Closed
ivirshup wants to merge 1 commit into scikit-learn:main from ivirshup:glm-sparse-array-naive-tests

Conversation

@ivirshup
Contributor

Reference Issues/PRs

Towards #27090

What does this implement/fix? Explain your changes.

Adds tests for csr_matrix and csr_array to test_glm.py

Any other comments?

Warning
I think this probably shouldn't be merged.

This more than triples the run time of test_glm.py and is probably more extensive than is actually needed for this case.

But I would appreciate guidance on the correct way to implement this. AFAICT there are no existing tests for sparse input to GLMs, even though they seem to work.

@github-actions

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f681bc4.

@ivirshup
Contributor Author

@jjerphan what's a reasonable time increase for these tests? Also, am I missing any existing tests for sparse matrices here?

@jjerphan
Member

what's a reasonable time increase for these tests?

Thanks for pointing this out. I would suggest not worrying much about this: we can set up the tests so that the *_CONTAINERS lists contain only the sparse matrix classes for most CI runs.

As for your questions:

AFAICT there are no existing tests for sparse input to GLMs, even though they seem to work.

Also, am I missing any existing tests for sparse matrices here?

Code coverage confirms the absence of test cases for sparse data with GLMs. Still, adding more tests for GLMs on sparse data would be appreciated.

I do not have the bandwidth to look at that for now, but I might later.
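The *_CONTAINERS setup described above might be sketched as follows (a hypothetical helper; the `CSR_CONTAINERS` name mirrors scikit-learn's test fixtures, and trimming the list for most CI runs is only the idea proposed in this thread):

```python
import numpy as np
from scipy import sparse

# Hypothetical trimmed container list: full runs could use
# [sparse.csr_matrix, sparse.csr_array], while most CI runs keep
# only the matrix class to limit runtime.
CSR_CONTAINERS = [sparse.csr_matrix]

def iter_containers(X, containers=CSR_CONTAINERS):
    """Yield X converted to each sparse container under test."""
    for container in containers:
        yield container(X)

X = np.eye(3)
for X_sp in iter_containers(X):
    assert sparse.issparse(X_sp)
    assert np.allclose(X_sp.toarray(), X)
```

A test would then be parametrized over `CSR_CONTAINERS`, so the matrix/array distinction lives in one place.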

@ogrisel (Member) left a comment


Maybe we could introduce safe_sparse_hstack / safe_sparse_vstack next to our safe_sparse_dot.

I am not particularly fond of the safe_ prefix, but we could reuse it for the sake of consistency.

X = X[:, :-1] # remove intercept
X = 0.5 * np.concatenate((X, X), axis=1)
if sparse.issparse(X):
X = np.multiply(sparse.hstack((X, X)), 0.5)
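A helper along the lines suggested above might look like this (a sketch only; `safe_sparse_hstack` is the reviewer's proposed name, not an existing scikit-learn function):

```python
import numpy as np
from scipy import sparse

def safe_sparse_hstack(blocks, dense_output=False):
    """Hypothetical counterpart to sklearn's safe_sparse_dot:
    horizontally stack blocks, dispatching to scipy.sparse.hstack
    when any block is sparse, and to np.hstack otherwise."""
    if any(sparse.issparse(b) for b in blocks):
        out = sparse.hstack(blocks, format="csr")
        return out.toarray() if dense_output else out
    return np.hstack(blocks)
```

With such a helper, the test code above would not need an `issparse` branch at the call site.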

The following looks simpler:

Suggested change
X = np.multiply(sparse.hstack((X, X)), 0.5)
X = 0.5 * sparse.hstack((X, X))
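As a quick check (hypothetical snippet), the simpler scalar form keeps the result sparse:

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.eye(3))
Xc = 0.5 * sparse.hstack((X, X))  # suggested form; result stays sparse
assert sparse.issparse(Xc)
assert Xc.shape == (3, 6)
assert np.allclose(Xc.toarray(), 0.5 * np.hstack([np.eye(3), np.eye(3)]))
```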

@ogrisel
Member

ogrisel commented Aug 21, 2023

This more than triples the run time of test_glm.py and is probably more extensive than is actually needed for this case.

I am not sure whether this is worth it. For most solvers, the numerical code is probably independent of the original data's sparsity pattern. But some solvers (e.g. SAG / SAGA) are actually specialized for sparse input data structures (they take different solver branches).

I have no strong opinion one way or another myself.

Any opinion @lorentzenchr?

@lorentzenchr
Member

This more than triples the run time of test_glm.py and is probably more extensive than is actually needed for this case.

I would definitely NOT extend glm_dataset with sparse arrays in the proposed way. It would be enough to have one single test (maybe with one glm_dataset), parametrized over a few things like sample weights, that checks for the same coefficients with sparse and dense fitting.
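Such a single test might be sketched as follows (hypothetical names and data; in test_glm.py this would be wrapped in `pytest.mark.parametrize` over containers and sample weights, and the choice of `PoissonRegressor` is illustrative only):

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import PoissonRegressor

def check_glm_sparse_equals_dense(csr_container, sample_weight=None):
    """Fit one GLM on dense X and one on its sparse counterpart,
    then require (nearly) identical coefficients."""
    rng = np.random.RandomState(0)
    X = rng.rand(30, 3)
    y = rng.poisson(np.exp(X @ np.array([0.1, 0.2, 0.3])))
    dense = PoissonRegressor(tol=1e-8).fit(X, y, sample_weight=sample_weight)
    sparse_fit = PoissonRegressor(tol=1e-8).fit(
        csr_container(X), y, sample_weight=sample_weight
    )
    np.testing.assert_allclose(sparse_fit.coef_, dense.coef_, rtol=1e-4)
    np.testing.assert_allclose(
        sparse_fit.intercept_, dense.intercept_, rtol=1e-4, atol=1e-6
    )

check_glm_sparse_equals_dense(sparse.csr_matrix)
```

This keeps the sparse coverage to one focused equivalence check instead of multiplying the whole glm_dataset grid.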

@ogrisel
Member

ogrisel commented Aug 24, 2023

Alright. Let's keep test_glm.py unchanged then and instead make sure that scipy sparse array support works by updating test_logistic.py, test_ridge.py and co.

@ogrisel ogrisel closed this Aug 24, 2023


4 participants