Add SVD support for short-and-fat arrays by eric-czech · Pull Request #6591 · dask/dask

eric-czech · 2020-09-02T21:39:06Z

Tests added / passed
Passes black dask / flake8 dask

This is for #6590 and adds support for SVD on short-and-fat arrays by transposing inputs/outputs to/from tsqr.
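
The transposition works because of a simple identity: for a real matrix, if the tall-and-skinny transpose factors as A^T = U S V^T, then A = V S U^T, so the two factor matrices swap roles (and get transposed). Below is a minimal NumPy sketch of that idea. It assumes real-valued input, uses `np.linalg.svd` as a stand-in for the tsqr-based tall-and-skinny routine, and `svd_short_fat` is a hypothetical helper name, not the code in this PR.

```python
import numpy as np


def svd_short_fat(a):
    """Thin SVD of a real short-and-fat matrix via its tall-and-skinny transpose.

    Hypothetical helper: np.linalg.svd stands in for a tall-and-skinny
    routine such as a tsqr-based SVD.
    """
    m, n = a.shape
    if m > n:
        raise ValueError("expected a short-and-fat array (rows <= columns)")
    # a.T is tall-and-skinny: a.T == u_t @ np.diag(s) @ vh_t
    u_t, s, vh_t = np.linalg.svd(a.T, full_matrices=False)
    # Transposing back gives a == vh_t.T @ np.diag(s) @ u_t.T,
    # so the left and right factors swap roles.
    return vh_t.T, s, u_t.T


rng = np.random.default_rng(0)
a = rng.standard_normal((3, 10))
u, s, vh = svd_short_fat(a)
print(u.shape, s.shape, vh.shape)  # (3, 3) (3,) (3, 10)
```

The singular values of A and A^T are identical, so only the factors need rearranging.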

To do a comparison to np.linalg.svd, I needed to ensure that the eigenvector signs are always the same. I lifted svd_flip from scikit-learn for that, but I'm not sure where to put it. Alternatively, the comparison could accept either sign for each vector, as in dask-ml#731. Any thoughts on that @mrocklin?
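
For reference, the core of a scikit-learn-style sign correction fits in a few lines: each left singular vector is flipped so that its largest-magnitude entry is positive, and the matching right singular vector is flipped with it so the product U S V^T is unchanged. This is a simplified sketch under that assumption (the u-based decision only); `svd_flip_u_based` is a hypothetical name, and this is not the exact code lifted in this PR.

```python
import numpy as np


def svd_flip_u_based(u, vh):
    """Make the largest-magnitude entry of each column of u positive.

    Simplified sketch of a scikit-learn-style svd_flip (u-based decision only).
    """
    # Row index of the largest-magnitude entry in each column of u
    max_abs_rows = np.argmax(np.abs(u), axis=0)
    # Fancy indexing picks that entry from every column; keep only its sign
    signs = np.sign(u[max_abs_rows, np.arange(u.shape[1])])
    # Flip columns of u and the matching rows of vh together,
    # which leaves u @ np.diag(s) @ vh unchanged
    return u * signs, vh * signs[:, np.newaxis]


rng = np.random.default_rng(0)
a = rng.standard_normal((3, 10))
u, s, vh = np.linalg.svd(a, full_matrices=False)
# Negating every singular pair gives another equally valid SVD of a
uf1, vhf1 = svd_flip_u_based(u, vh)
uf2, vhf2 = svd_flip_u_based(-u, -vh)
print(np.allclose(uf1, uf2) and np.allclose(vhf1, vhf2))  # True
```

After flipping, both sign choices collapse to the same output, which is what makes an elementwise comparison against np.linalg.svd possible.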

EDIT

FYI, a few changes pulled in here from the current upstream master branch weren't passing the black formatter check. This commit is based on 0ca1607. Hm, how did that get into master without passing CI?

TomAugspurger

If we're just using svd_flip in testing then I'm fine with where you've put it. Do you think it's OK to just adjust the output in the test, or should we be doing that in svd for the user?
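
The other option raised above, comparing each singular vector against the reference up to sign instead of normalizing first, can also live entirely inside a test. A hedged sketch follows; `assert_svd_close_up_to_sign` is a hypothetical helper, not an existing dask or NumPy testing function, and it assumes distinct singular values (with repeated values, singular vectors are only determined up to a rotation, not just a sign).

```python
import numpy as np


def assert_svd_close_up_to_sign(u1, s1, vh1, u2, s2, vh2, atol=1e-8):
    """Hypothetical test helper: compare two thin SVDs, ignoring per-pair signs.

    Assumes distinct singular values, so each singular pair is unique up to sign.
    """
    np.testing.assert_allclose(s1, s2, atol=atol)
    for k in range(len(s1)):
        # A left/right pair may only be negated together, never separately
        same = np.allclose(u1[:, k], u2[:, k], atol=atol) and np.allclose(
            vh1[k], vh2[k], atol=atol
        )
        flipped = np.allclose(u1[:, k], -u2[:, k], atol=atol) and np.allclose(
            vh1[k], -vh2[k], atol=atol
        )
        assert same or flipped, f"singular pair {k} differs by more than a sign"


rng = np.random.default_rng(1)
a = rng.standard_normal((4, 12))
u, s, vh = np.linalg.svd(a, full_matrices=False)
# Flipping selected singular pairs still reconstructs a, so this passes
signs = np.array([1.0, -1.0, 1.0, -1.0])
assert_svd_close_up_to_sign(u, s, vh, u * signs, s, vh * signs[:, np.newaxis])
```

This keeps the library output untouched, at the cost of a per-vector loop in the test.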

eric-czech · 2020-09-03T13:23:13Z

Hm, well, I can imagine that many end users don't currently care about sign indeterminacy in SVD, but I can't see any reason why they shouldn't, other than performance.

Some options that I had in mind were:

1. Leave it as a numpy-only testing utility (inline in the current test, as you suggested)
2. Move it to dask.array.utils, since it's pretty essential for downstream library maintainers but not necessarily something most users need to be aware of
3. Move it to dask.array.linalg (or maybe still utils?) and make it optional in svd

Option 2 seems like the logical choice for now: it keeps what could be a somewhat unstable piece of code off such a critical path while still surfacing the functionality. If I'm not mistaken, https://github.com/dask/dask-ml/blob/master/dask_ml/utils.py#L26 pulls the entire array into memory for the sign flip, so it would probably take some work to rewrite that as a chunked operation without the nd fancy indexing currently used in sklearn's svd_flip. I'm also not sure what u_based_decision is getting at, but I'd be happy to figure it out if you guys are on board with making svd_flip part of the API (even if undocumented).
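
On `u_based_decision`: in scikit-learn's `svd_flip` the flag selects which factor drives the sign choice (the columns of `u` when true, the rows of `v` when false), and the other factor is flipped to match so the product is preserved. One way to sidestep the nd fancy indexing is a sign convention that needs only a reduction, for example making each deciding vector's sum non-negative: per-vector sums are ordinary reductions on chunked arrays and yield just k signs. The sketch below assumes that convention (a different one from the largest-magnitude-entry rule, so its signs will not match scikit-learn's), uses NumPy for brevity, and `svd_flip_by_sum` is a hypothetical name rather than the function from #6599.

```python
import numpy as np


def svd_flip_by_sum(u, vh, u_based_decision=True):
    """Orient each singular pair so the deciding vector has a non-negative sum.

    Hypothetical sketch: uses only reductions and broadcasting (no fancy
    indexing), so the same steps could be expressed on chunked arrays.
    """
    if u_based_decision:
        # Decide from the columns of u: one sum per singular vector
        totals = u.sum(axis=0)
    else:
        # Decide from the rows of vh instead
        totals = vh.sum(axis=1)
    # +1 where the sum is >= 0 and -1 otherwise (never 0, unlike np.sign)
    signs = np.where(totals >= 0, 1.0, -1.0)
    # Flip columns of u and the matching rows of vh together
    return u * signs, vh * signs[:, np.newaxis]


rng = np.random.default_rng(2)
a = rng.standard_normal((3, 8))
u, s, vh = np.linalg.svd(a, full_matrices=False)
uf, vhf = svd_flip_by_sum(-u, -vh, u_based_decision=False)
print(np.allclose(uf @ np.diag(s) @ vhf, a))  # True
print(bool(np.all(vhf.sum(axis=1) >= 0)))  # True
```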

TomAugspurger · 2020-09-03T14:56:51Z

I'll trust your judgement on the importance of determinacy here. And I'd hate for performance to be the only reason not to do it.

Let's spend a bit of time seeing if we can implement a fast version and just always use it. I don't think the version in dask-ml is a good starting point. I'll look through your suggestion in dask/dask-ml#731 (comment).

eric-czech · 2020-09-03T17:51:09Z

> Let's spend a bit of time seeing if we can implement a fast version and just always use it

OK, cool. If it's all the same to you, I'll clean up this PR a little, use the inline in-memory version within the test for now, and then submit a separate PR for svd_flip #6599 (after tinkering with how to do it a bit).

eric-czech · 2020-09-03T18:03:48Z

I rebased off your fix for those formatting issues (thanks for that @TomAugspurger) and this is ready to go afaict. Let me know if there's anything else you think is worth testing or revisiting.

TomAugspurger · 2020-09-03T18:28:55Z

Sounds good to me, thanks!

eric-czech force-pushed the short_fat_svd branch from ea865a6 to e6e93ff on September 2, 2020 22:16

eric-czech mentioned this pull request Sep 3, 2020

PCA sgkit-dev/sgkit#227

Closed

TomAugspurger reviewed Sep 3, 2020

eric-czech mentioned this pull request Sep 3, 2020

Expose svd_flip for controlling sign indeterminacy #6599

Closed

Add SVD support for short-and-fat arrays

07f8172

eric-czech force-pushed the short_fat_svd branch from e6e93ff to 07f8172 on September 3, 2020 17:59

TomAugspurger merged commit 00843e3 into dask:master Sep 3, 2020

This was referenced Sep 9, 2020

Improve single chunk array handling in svd #6612

Closed

Add svd_flip (#6599) #6613

Merged

Support short-fat array chunking in svd #6590

Closed

PCA implementation sgkit-dev/sgkit#262

Merged

eric-czech mentioned this pull request Oct 2, 2020

PCA for short-fat arrays dask/dask-ml#731

Closed

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020

Add SVD support for short-and-fat arrays (dask#6591)

d6f1411

TomAugspurger merged 1 commit into dask:master from eric-czech:short_fat_svd

3 participants