Skip to content

Add SVD support for short-and-fat arrays#6591

Merged
TomAugspurger merged 1 commit intodask:masterfrom
eric-czech:short_fat_svd
Sep 3, 2020
Merged

Add SVD support for short-and-fat arrays#6591
TomAugspurger merged 1 commit intodask:masterfrom
eric-czech:short_fat_svd

Conversation

@eric-czech
Copy link
Contributor

@eric-czech eric-czech commented Sep 2, 2020

  • Tests added / passed
  • Passes black dask / flake8 dask

This is for #6590 and adds support for SVD on short-and-fat arrays by transposing inputs/outputs to/from tsqr.

To do a comparison to np.lingalg.svd, I needed to ensure that the eigenvector signs are always the same. I lifted svd_flip from scikit-learn for that, but I'm not sure where to put it. Alternatively, the comparison could look for equality on either sign in each vector as in dask-ml#731. Any thoughts on that @mrocklin?

EDIT

FYI there were a few changes in here from the current upstream master branch that weren't passing the black formatter check. This commit bases on 0ca1607. Hm how did that get into master without passing CI?

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're just using svd_flip in testing then I'm fine with where you've put it. Do you think it's OK to just adjust the output in the test, or should we be doing that in svd for the user?

@eric-czech
Copy link
Contributor Author

eric-czech commented Sep 3, 2020

Hm well I can imagine that many end users don't currently care about sign indeterminacy with SVD, but I can't see any reason why they shouldn't other than for performance.

Some options that I had in mind were:

  1. Leave it as a numpy-only testing utility (inline with the current test, as you suggested)
  2. Move it to dask.array.utils since it's pretty essential for downstream library maintainers, but not necessarily something most users need to be aware of
  3. Move it to dask.array.linalg (or maybe still utils?) and make it optional in svd

Option 2 seems like a logical choice now to keep what could be a somewhat unstable piece of code off of such a critical path but still surface the functionality. If I'm not mistaken, https://github.com/dask/dask-ml/blob/master/dask_ml/utils.py#L26 is pulling the entire array into memory for the sign flip so it would probably take some work to rewrite that as a chunked operation without the nd fancy indexing currently used in sklearn.svd_flip. I'm also not sure what u_based_decision is getting at, but I'd be happy to figure it out if you guys are on board with making svd_flip a part of the API (even if undocumented).

@TomAugspurger
Copy link
Member

I'll trust your judgement on the importance of determinacy here. And I'd hate for performance to be the only reason not to do it.

Let's spend a bit of time seeing if we can implement a fast version and just always use it. I don't think the version in dask-ml is a good starting point. I'll look through your suggestion in dask/dask-ml#731 (comment).

@eric-czech
Copy link
Contributor Author

Let's spend a bit of time seeing if we can implement a fast version and just always use it

Ok cool, if it's all the same to you then I'll clean up this PR a little but would prefer to use the inline in-memory version of it within the test now and then submit a separate PR for svd_flip #6599 (after tinkering with how to do it a bit).

@eric-czech
Copy link
Contributor Author

I rebased off your fix for those formatting issues (thanks for that @TomAugspurger) and this is ready to go afaict. Let me know if there's anything else you think is worth testing or revisiting.

@TomAugspurger
Copy link
Member

Sounds good to me, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants