MMSeqs2 DB slimmer #316

@genomewalker

Hi,
This is not an issue but a potential enhancement that we discussed with @martin-steinegger.
We have a seed clustering database that is continuously updated with new sequences. The DB is growing quite fast, and eventually we will have problems storing and distributing it. Since each cluster contains many redundant sequences, we thought that a module that takes a DB and filters it with a criterion similar to --diff from result2msa or result2profile would be very useful for keeping only the informative sequences in each cluster.
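
To make the idea concrete, here is a minimal Python sketch of per-cluster redundancy filtering. It is only an illustration under stated assumptions: it is not how --diff is implemented in MMseqs2, it does not read or write MMseqs2 DBs, and the names `pairwise_identity`, `slim_cluster`, and `max_identity` are hypothetical. It assumes the members of a cluster are already aligned to the same length and greedily keeps a member only if it is sufficiently different from every member kept so far.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical, non-gap positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def slim_cluster(members, max_identity=0.9):
    """Greedily drop members that are too similar to an already-kept member.

    `members` is a list of (id, aligned_sequence) pairs. The first entry is
    treated as the cluster representative and is always kept.
    Returns the ids of the retained members, in input order.
    """
    kept = []
    for seq_id, seq in members:
        # Keep the sequence only if it is below the identity cutoff
        # against every sequence retained so far.
        if all(pairwise_identity(seq, k_seq) < max_identity for _, k_seq in kept):
            kept.append((seq_id, seq))
    return [seq_id for seq_id, _ in kept]


# Toy cluster (made-up sequences, for illustration only).
cluster = [
    ("rep", "MKTAYIAKQR"),
    ("dup", "MKTAYIAKQR"),   # identical to the representative
    ("near", "MKTAYIAKQK"),  # differs from the representative at one position
    ("far", "MKSGYLAHQK"),   # differs at several positions
]

print(slim_cluster(cluster, max_identity=0.9))
```

With the cutoff at 0.9, the exact duplicate and the near-duplicate are dropped and only the representative and the divergent member remain. A real module would presumably work on the alignments already stored in the result DB rather than on raw strings.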

Thanks
Antonio
