Add random_state parameter and refactor#9
Add random_state parameter and refactor#9pavlin-policar wants to merge 4 commits intofacebookarchive:masterfrom
Conversation
|
Could you perhaps fork into a new project? The randomized method in scikit-learn for PCA has by now largely caught up with fbpca in terms of accuracy, speed, and ease-of-use. For some reason the storage (memory) requirements are still larger for the implementation in scikit-learn, but otherwise standardizing on scikit-learn and their very active pool of maintainers would be my preference. Maybe you would want to take over fbpca by creating a new, more definitive repository that includes special functionality beyond the capabilities of scikit-learn? I would like to deprecate fbpca in favor of scikit-learn, but the limitations of the latter's implementation has prevented full deprecation so far. A new forked repo would be most welcome. |
|
I see, thanks for the response! The main functionality I am missing in scikit-learn is PCA for sparse, non-centred data, which this implementation does include. As far as I am aware, this is the only Python implementation that does this. I'll see about maintaining using a fork unless I find a better alternative. |
|
For what it's worth, @pavlin-policar, scikit-learn definitely welcomes contributors ;-) |
Thanks for this awesome SVD/PCA implementation!
I've made a fair bit of changes that hopefully improve the code. I've structured the commits such that it should be fairly simple to exclude anything if needed and it will be easiest to review this PR commit by commit.
I've done the following:
random_stateparameter to all the functions.check_random_statewas taken from scikit-learn to be consistent (adding scikit-learn as a requirement would be overkill for the simple function). Requiring the user to set a global random seed is not very user-friendly and this method is very familliar in python.If you don't like any of formatting changes or moving tests to their own file, I can remove that, but having a
random_stateparameter would be really useful.