Add SVD support for short-and-fat arrays#6591
Force-pushed from ea865a6 to e6e93ff
TomAugspurger left a comment:
If we're just using svd_flip in testing then I'm fine with where you've put it. Do you think it's OK to just adjust the output in the test, or should we be doing that in svd for the user?
Hm well I can imagine that many end users don't currently care about sign indeterminacy with SVD, but I can't see any reason why they shouldn't other than for performance. Some options that I had in mind were:
Option 2 seems like a logical choice now to keep what could be a somewhat unstable piece of code off of such a critical path but still surface the functionality. If I'm not mistaken, https://github.com/dask/dask-ml/blob/master/dask_ml/utils.py#L26 is pulling the entire array into memory for the sign flip, so it would probably take some work to rewrite that as a chunked operation without the nd fancy indexing currently used in sklearn.svd_flip. I'm also not sure what
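One way a chunked sign flip might avoid the nd fancy indexing is to recover each column's largest-magnitude entry with a masked reduction instead of `argmax` plus indexing. This is only a sketch with a hypothetical `svd_flip_blockwise` helper, written in NumPy but using only elementwise ops and axis reductions that have direct `dask.array` equivalents; it assumes the per-column maximum magnitude in `u` is attained uniquely (a tie with opposite signs would cancel in the sum):

```python
import numpy as np


def svd_flip_blockwise(u, v):
    """Deterministic SVD signs via reductions only (hypothetical sketch).

    Every step below is an elementwise op or an axis reduction, so it
    would map onto dask.array without nd fancy indexing. Assumes the
    largest-magnitude entry in each column of u is unique.
    """
    # largest magnitude per column of u
    max_abs = np.max(np.abs(u), axis=0, keepdims=True)
    # recover the signed value of that entry with a masked sum
    pivots = np.sum(u * (np.abs(u) == max_abs), axis=0)
    signs = np.sign(pivots)
    # flip columns of u and the matching rows of v so u @ diag(s) @ v
    # is unchanged
    return u * signs, v * signs[:, np.newaxis]
```

With random data the tie case is vanishingly unlikely, but a production version would need to handle it (and the `signs == 0` degenerate case) explicitly.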
I'll trust your judgement on the importance of determinacy here. And I'd hate for performance to be the only reason not to do it. Let's spend a bit of time seeing if we can implement a fast version and just always use it. I don't think the version in dask-ml is a good starting point. I'll look through your suggestion in dask/dask-ml#731 (comment).
Ok cool, if it's all the same to you then I'll clean up this PR a little, but would prefer to use the inline in-memory version of it within the test now and then submit a separate PR for svd_flip #6599 (after tinkering with how to do it a bit).
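For reference, the in-memory sign convention lifted from scikit-learn looks roughly like this. This is a sketch of the u-based `svd_flip` variant (adapted from `sklearn.utils.extmath.svd_flip`, not the exact upstream code):

```python
import numpy as np


def svd_flip(u, v):
    """Make SVD output signs deterministic (sketch of the u-based
    convention from scikit-learn's svd_flip).

    Flips each column of u so its largest-magnitude entry is positive,
    and flips the matching row of v, so u @ diag(s) @ v is unchanged.
    """
    # row index of the largest-magnitude entry in each column of u
    max_abs_cols = np.argmax(np.abs(u), axis=0)
    signs = np.sign(u[max_abs_cols, np.arange(u.shape[1])])
    u = u * signs                 # flip columns of u
    v = v * signs[:, np.newaxis]  # flip matching rows of v
    return u, v
```

After flipping both factorizations this way, two SVDs of the same matrix can be compared entrywise with `np.allclose` despite sign indeterminacy.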
Force-pushed from e6e93ff to 07f8172
I rebased off your fix for those formatting issues (thanks for that @TomAugspurger) and this is ready to go afaict. Let me know if there's anything else you think is worth testing or revisiting.
Sounds good to me, thanks! |
This is for #6590 and adds support for SVD on short-and-fat arrays by transposing inputs/outputs to/from tsqr.

To do a comparison to np.linalg.svd, I needed to ensure that the eigenvector signs are always the same. I lifted svd_flip from scikit-learn for that, but I'm not sure where to put it. Alternatively, the comparison could look for equality on either sign in each vector as in dask-ml#731. Any thoughts on that @mrocklin?

EDIT: FYI there were a few changes in here from the current upstream master branch that weren't passing the black formatter check. This commit bases on 0ca1607. Hm how did that get into master without passing CI?
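The transpose trick mentioned in the description can be sketched like this. `svd_tall_only` here is a hypothetical stand-in for a tall-and-skinny-only SVD path such as tsqr; the mapping relies on the identity A.T = U S V implying A = V.T S U.T. This is an illustration of the idea, not the PR's actual code:

```python
import numpy as np


def svd_tall_only(a):
    """Stand-in for an SVD that only accepts tall-and-skinny input,
    like dask's tsqr-based path (hypothetical helper)."""
    assert a.shape[0] >= a.shape[1], "tall-and-skinny input required"
    return np.linalg.svd(a, full_matrices=False)


def svd_any_shape(a):
    """Sketch of handling short-and-fat input by transposing.

    If A is short-and-fat, A.T is tall-and-skinny; factor it as
    A.T = U S V, then A = V.T S U.T, so we return (V.T, S, U.T).
    """
    if a.shape[0] >= a.shape[1]:
        return svd_tall_only(a)
    u, s, v = svd_tall_only(a.T)
    return v.T, s, u.T
```

Note the sign indeterminacy discussed above applies to both branches, which is why a convention like svd_flip is needed before comparing results entrywise.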