Skip to content
This repository was archived by the owner on Nov 2, 2020. It is now read-only.

Add random_state parameter and refactor#9

Closed
pavlin-policar wants to merge 4 commits intofacebookarchive:masterfrom
pavlin-policar:refactor
Closed

Add random_state parameter and refactor#9
pavlin-policar wants to merge 4 commits intofacebookarchive:masterfrom
pavlin-policar:refactor

Conversation

@pavlin-policar
Copy link
Copy Markdown

Thanks for this awesome SVD/PCA implementation!

I've made a fair bit of changes that hopefully improve the code. I've structured the commits such that it should be fairly simple to exclude anything if needed and it will be easiest to review this PR commit by commit.

I've done the following:

  1. Tests are generally kept in a separate file or directory. This makes everything much more readable. I guessed I probably need to include the copyright in the test file as well.
  2. Black is a fairly standard code formatter that makes the code more readable.
  3. There were a lot of empty comment lines which made everything much harder to read, so I removed those.
  4. The reason I even started this - add the random_state parameter to all the functions. check_random_state was taken from scikit-learn to be consistent (adding scikit-learn as a requirement would be overkill for the simple function). Requiring the user to set a global random seed is not very user-friendly and this method is very familliar in python.

If you don't like any of formatting changes or moving tests to their own file, I can remove that, but having a random_state parameter would be really useful.

@tygert
Copy link
Copy Markdown
Contributor

tygert commented Dec 15, 2018

Could you perhaps fork into a new project? The randomized method in scikit-learn for PCA has by now largely caught up with fbpca in terms of accuracy, speed, and ease-of-use. For some reason the storage (memory) requirements are still larger for the implementation in scikit-learn, but otherwise standardizing on scikit-learn and their very active pool of maintainers would be my preference. Maybe you would want to take over fbpca by creating a new, more definitive repository that includes special functionality beyond the capabilities of scikit-learn? I would like to deprecate fbpca in favor of scikit-learn, but the limitations of the latter's implementation has prevented full deprecation so far. A new forked repo would be most welcome.

@tygert tygert closed this Dec 15, 2018
@pavlin-policar
Copy link
Copy Markdown
Author

I see, thanks for the response!

The main functionality I am missing in scikit-learn is PCA for sparse, non-centred data, which this implementation does include. As far as I am aware, this is the only Python implementation that does this.

I'll see about maintaining using a fork unless I find a better alternative.

@tygert
Copy link
Copy Markdown
Contributor

tygert commented Dec 15, 2018

For what it's worth, @pavlin-policar, scikit-learn definitely welcomes contributors ;-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants