IncrementalPCA #617

@fujiisoup

Hi all,

dask-ml already has PCA and wrappers.Incremental, but there is no IncrementalPCA.
Is this within the scope of development?

With dask-ml's PCA, we can handle large data, but if the data is too large, an approximate algorithm may be necessary.
With wrappers.Incremental, we could use scikit-learn's IncrementalPCA, but it internally uses scipy.linalg.svd, which becomes impractical when the data has a large number of features.
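
To make the bottleneck concrete, here is a minimal NumPy-only sketch of the kind of per-batch update an incremental PCA performs. It loosely follows the update used in scikit-learn's IncrementalPCA but is not its actual code: `partial_fit_step` and the `state` tuple are hypothetical names, and `np.linalg.svd` stands in for `scipy.linalg.svd`. The point is that every batch triggers a dense SVD of a matrix with `n_features` columns.

```python
import numpy as np

def partial_fit_step(state, batch, n_components):
    """One incremental PCA update (hypothetical helper, not a library API).

    `state` is None for the first batch, otherwise a tuple
    (mean, n_seen, components, singular_values).
    """
    batch = np.asarray(batch, dtype=float)
    n_batch = batch.shape[0]
    batch_mean = batch.mean(axis=0)
    if state is None:
        mean, n_total = batch_mean, n_batch
        stacked = batch - batch_mean
    else:
        old_mean, n_seen, components, singular_values = state
        n_total = n_seen + n_batch
        mean = (n_seen * old_mean + n_batch * batch_mean) / n_total
        # Extra row accounting for the shift between the old and new means.
        mean_correction = np.sqrt(n_seen * n_batch / n_total) * (old_mean - batch_mean)
        stacked = np.vstack([
            singular_values[:, None] * components,  # summary of earlier batches
            batch - batch_mean,                     # new batch, centered
            mean_correction,
        ])
    # Dense SVD of a (n_components + n_batch + 1, n_features) matrix:
    # this is the step that gets expensive when n_features is large.
    _, s, vt = np.linalg.svd(stacked, full_matrices=False)
    return mean, n_total, vt[:n_components], s[:n_components]

# Made-up data: 300 samples, 20 features, exactly rank 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 20))

state = None
for batch in np.array_split(X, 6):
    state = partial_fit_step(state, batch, n_components=3)
mean, n_seen, components, singular_values = state
```

Because the toy data has rank 3 and three components are kept, the incremental result here coincides with a full PCA; on general data the truncation makes it an approximation.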

It would be nice if dask-ml had a dask-compatible IncrementalPCA.

I implemented it in my fork; most of it is taken from scikit-learn, but it uses dask.array.linalg.svd_compressed internally.
If it is in scope, I will clean it up a bit and send a PR.
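
As a rough illustration of that substitution (not the code from the fork), the sketch below calls `dask.array.linalg.svd_compressed` on a centered, chunked array instead of running a dense SVD. The array sizes, chunking, and seed are assumptions chosen only to keep the example small.

```python
import numpy as np
import dask.array as da

# Made-up data: 2000 samples, 400 features, exactly rank 3.
rng = np.random.default_rng(0)
X_np = rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 400))

# Chunk along both samples and features, so no single block holds all features.
X = da.from_array(X_np, chunks=(500, 100))
X_centered = X - X.mean(axis=0)

n_components = 3
# Randomized, approximate SVD that only needs the top-k factors.
u, s, vt = da.linalg.svd_compressed(X_centered, k=n_components, seed=0)

# Nothing has been computed so far; evaluate the pieces we need.
singular_values, components = da.compute(s, vt)
```

`svd_compressed` is a randomized algorithm, so on data that is not exactly low rank the result is approximate; its `n_power_iter` argument trades extra passes over the data for accuracy.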

Thank you!

cc:
@demaheim
@yasahi-hpc
