filter: Grouping by day works when it shouldn't #1069

@victorlin

Current Behavior

--group-by month day works. It groups on the day integer extracted from the YYYY-MM-DD date string. This generally works whenever day is combined with month and/or year, because those options trigger the creation of the day column.

Expected behavior

A warning that the day column was not found, with the command then behaving the same as --group-by month.

How to reproduce

cat >metadata.tsv <<~~
strain	date
SEQ1	2022-01-01
SEQ2	2022-01-01
SEQ3	2022-01-02
SEQ4	2022-01-03
SEQ5	2022-01-04
~~

augur filter \
   --metadata metadata.tsv \
   --group-by month day \
   --sequences-per-group 1 \
   --subsample-seed 0 \
   --output-metadata out.tsv

cat out.tsv
# strain	date
# SEQ1	2022-01-01
# SEQ3	2022-01-02
# SEQ4	2022-01-03
# SEQ5	2022-01-04
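
The four rows kept above are consistent with grouping on (month, day) pairs. As a rough check, and not a reproduction of augur's internals, the sketch below counts the distinct groups in the sample metadata using standard shell tools. It assumes complete YYYY-MM-DD dates in the second column and ignores the year, which is constant in this sample; the variable names are illustrative.

```shell
# Recreate the sample metadata from the reproduction steps (tab-separated).
printf 'strain\tdate\nSEQ1\t2022-01-01\nSEQ2\t2022-01-01\nSEQ3\t2022-01-02\nSEQ4\t2022-01-03\nSEQ5\t2022-01-04\n' > metadata.tsv

# Distinct (month, day) pairs: the groups that --group-by month day appears to form.
month_day_groups=$(tail -n +2 metadata.tsv | cut -f2 | awk -F- '{print $2 "-" $3}' | sort -u | wc -l | tr -d ' ')

# Distinct months: the groups expected if day were ignored.
month_groups=$(tail -n +2 metadata.tsv | cut -f2 | awk -F- '{print $2}' | sort -u | wc -l | tr -d ' ')

echo "month+day groups: $month_day_groups"
echo "month groups: $month_groups"
```

With --sequences-per-group 1, four (month, day) groups account for the four sequences in out.tsv, whereas a single month group would be expected to yield only one.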

Possible solutions

  1. Formally enable --group-by day. This has been ruled out as impractical in "filter: Reduce over-sampling in partial months with --group-by month" #960 (comment).
  2. Disable --group-by day.
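
To illustrate what option 2 combined with the expected behavior might look like, here is a minimal shell sketch of the validation step. It is not augur code: check_group_by is a hypothetical helper, and treating year and month as the only date-derived columns is an assumption made for this example.

```shell
# Sample metadata from the reproduction steps (tab-separated).
printf 'strain\tdate\nSEQ1\t2022-01-01\nSEQ2\t2022-01-01\nSEQ3\t2022-01-02\nSEQ4\t2022-01-03\nSEQ5\t2022-01-04\n' > metadata.tsv

# Hypothetical helper: print the --group-by columns that are usable,
# warning on stderr about any that are neither date-derived nor in the header.
check_group_by() {
    metadata=$1
    shift
    header=$(head -n 1 "$metadata")
    kept=""
    for col in "$@"; do
        case "$col" in
            year|month)
                # Assumed to be generated from the date column.
                kept="$kept $col"
                ;;
            *)
                if printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF -- "$col"; then
                    kept="$kept $col"
                else
                    echo "WARNING: '$col' was not found in the metadata columns; ignoring it." >&2
                fi
                ;;
        esac
    done
    # Unquoted on purpose so the leading space is dropped.
    echo $kept
}

check_group_by metadata.tsv month day
```

Called with month day on the sample metadata, the helper emits a warning about day and prints month, so the remaining grouping would match --group-by month.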

Labels: bug (Something isn't working)
