Implement subsampling with strain lists instead of sequences and metadata #807

@huddlej

Context
Subsampling currently extracts metadata and sequences for each subsample rule, so every corresponding augur filter call has to loop through the full sequence and metadata files.

Description
We can speed up subsampling by rewriting these rules to output only strain names (or strain names and metadata), and then adding a single rule that uses augur filter to emit the subsampled sequences and metadata based on those strain names.
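
To illustrate the idea, here is a minimal sketch in plain Python rather than the actual workflow rules or augur filter itself. It assumes each subsample rule has already produced a list of strain names, that FASTA record names are the first token of each header, and that the metadata is a TSV with a `strain` column. The function names (`merge_strain_lists`, `extract_sequences`, `extract_metadata`) are hypothetical and exist only for this example.

```python
import csv
import io


def merge_strain_lists(strain_lists):
    """Union the per-rule strain lists, keeping first-seen order."""
    seen = {}
    for strains in strain_lists:
        for name in strains:
            name = name.strip()
            if name:
                seen.setdefault(name, None)
    return list(seen)


def extract_sequences(fasta_lines, keep):
    """Yield FASTA lines for records named in `keep`, in a single pass."""
    keep = set(keep)
    emit = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the record name is the first token of the header.
            parts = line[1:].split()
            emit = bool(parts) and parts[0] in keep
        if emit:
            yield line


def extract_metadata(tsv_text, keep, id_column="strain"):
    """Return metadata rows whose id column is in `keep`, in a single pass."""
    keep = set(keep)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[id_column] in keep]


if __name__ == "__main__":
    # Two hypothetical subsample rules that each emitted only strain names.
    merged = merge_strain_lists([["A", "B"], ["B", "C"]])
    fasta = [">A\n", "ACGT\n", ">X\n", "GGGG\n", ">C\n", "TTTT\n"]
    print("".join(extract_sequences(fasta, merged)), end="")
```

With this shape, the expensive pass over the full sequences and metadata happens once, after all strain lists are merged, instead of once per subsample rule.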

Labels

enhancement (New feature or request)