
CountVectorizer for text preprocessing #689

@jrdzha

I understand why hashing seems to be the better approach for distributed text preprocessing, but I also need my features to be human-readable, meaning each column should map back to an actual term. Spark has a CountVectorizer. Would it be possible to implement one for dask-ml?
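A minimal sketch of how a count vectorizer could work on partitioned data, in plain Python, is below. It is not dask-ml's or Spark's implementation. The idea is two passes. The first pass collects the set of terms in each partition and unions them into a single sorted vocabulary. The second pass counts terms per document against that fixed vocabulary, and it can run independently on every partition. The tokenizer imitates scikit-learn's default token pattern (runs of 2+ word characters, lowercased), which is an assumption. All function names (`tokenize`, `partition_vocab`, `fit_vocabulary`, `transform_partition`) are hypothetical, and partitions are modeled as plain lists of strings.

```python
import re
from collections import Counter
from functools import reduce

# Assumption: imitate scikit-learn's default token_pattern (2+ word chars), lowercased.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Split one document into lowercase tokens."""
    return TOKEN_RE.findall(doc.lower())


def partition_vocab(partition):
    """First pass (map): the set of distinct terms seen in one partition."""
    terms = set()
    for doc in partition:
        terms.update(tokenize(doc))
    return terms


def fit_vocabulary(partitions):
    """First pass (reduce): union per-partition sets, then assign stable sorted indices."""
    all_terms = reduce(set.union, map(partition_vocab, partitions), set())
    return {term: i for i, term in enumerate(sorted(all_terms))}


def transform_partition(partition, vocabulary):
    """Second pass (map): sparse term counts per document as {column_index: count}."""
    rows = []
    for doc in partition:
        counts = Counter(t for t in tokenize(doc) if t in vocabulary)
        rows.append({vocabulary[t]: c for t, c in counts.items()})
    return rows


# Hypothetical toy data: two "partitions" of documents.
partitions = [
    ["The cat sat", "the cat ran"],
    ["A dog sat"],
]
vocab = fit_vocabulary(partitions)
# Human-readable column names, ordered by column index.
feature_names = sorted(vocab, key=vocab.get)
transformed = [transform_partition(p, vocab) for p in partitions]
```

Here `feature_names` is `["cat", "dog", "ran", "sat", "the"]`, so every column index maps back to a readable term. This is the property hashing gives up. The cost is that the vocabulary must be built and gathered in one place before transforming, whereas hashing needs no fitted state. With dask, the partitions could be those of a `dask.bag`, and the two map steps could run through `Bag.map_partitions`.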

Labels: Algorithm (Implement a new algorithm)