Add random_state parameter and refactor by pavlin-policar · Pull Request #9 · facebookarchive/fbpca

pavlin-policar · 2018-12-15T22:06:08Z

Thanks for this awesome SVD/PCA implementation!

I've made a fair bit of changes that hopefully improve the code. I've structured the commits such that it should be fairly simple to exclude anything if needed and it will be easiest to review this PR commit by commit.

I've done the following:

Tests are generally kept in a separate file or directory. This makes everything much more readable. I guessed I probably need to include the copyright in the test file as well.
Black is a fairly standard code formatter that makes the code more readable.
There were a lot of empty comment lines which made everything much harder to read, so I removed those.
The reason I even started this - add the random_state parameter to all the functions. check_random_state was taken from scikit-learn to be consistent (adding scikit-learn as a requirement would be overkill for the simple function). Requiring the user to set a global random seed is not very user-friendly and this method is very familliar in python.

If you don't like any of formatting changes or moving tests to their own file, I can remove that, but having a random_state parameter would be really useful.

tygert · 2018-12-15T22:25:29Z

Could you perhaps fork into a new project? The randomized method in scikit-learn for PCA has by now largely caught up with fbpca in terms of accuracy, speed, and ease-of-use. For some reason the storage (memory) requirements are still larger for the implementation in scikit-learn, but otherwise standardizing on scikit-learn and their very active pool of maintainers would be my preference. Maybe you would want to take over fbpca by creating a new, more definitive repository that includes special functionality beyond the capabilities of scikit-learn? I would like to deprecate fbpca in favor of scikit-learn, but the limitations of the latter's implementation has prevented full deprecation so far. A new forked repo would be most welcome.

pavlin-policar · 2018-12-15T22:36:46Z

I see, thanks for the response!

The main functionality I am missing in scikit-learn is PCA for sparse, non-centred data, which this implementation does include. As far as I am aware, this is the only Python implementation that does this.

I'll see about maintaining using a fork unless I find a better alternative.

tygert · 2018-12-15T22:46:04Z

For what it's worth, @pavlin-policar, scikit-learn definitely welcomes contributors ;-)

pavlin-policar added 4 commits December 15, 2018 22:39

Move tests to separate file

a92aad5

Run black on python files

6253032

Remove extra lines

254d6ce

Add random_state parameter

f605740

tygert closed this Dec 15, 2018

pavlin-policar mentioned this pull request Dec 15, 2018

PCA on sparse, noncentered data scikit-learn/scikit-learn#12794

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add random_state parameter and refactor#9

Add random_state parameter and refactor#9
pavlin-policar wants to merge 4 commits intofacebookarchive:masterfrom
pavlin-policar:refactor

pavlin-policar commented Dec 15, 2018

Uh oh!

tygert commented Dec 15, 2018

Uh oh!

pavlin-policar commented Dec 15, 2018

Uh oh!

tygert commented Dec 15, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pavlin-policar commented Dec 15, 2018

Uh oh!

tygert commented Dec 15, 2018

Uh oh!

pavlin-policar commented Dec 15, 2018

Uh oh!

tygert commented Dec 15, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants