Allow build-specific --narrow-bandwidth param in frequencies#1130
Allow build-specific --narrow-bandwidth param in frequencies#1130
Conversation
There was a bug in the current behavior where the rule `frequencies` was calling `config["frequencies"]["narrow_bandwidth"]`. This resulted in --narrow-bandwidth to be fixed to whatever was specified in parameters.yaml. However, we were relying on build-specific settings to differentiate behavior in the all-time vs 6m vs 2m vs 1m builds, ala:
```
frequencies
global_1m:
narrow_bandwidth: 0.019
```
This commit fixes this issue and allows overriding of narrow_bandwidth in parameters.yaml by build-specific settings. It also provides a genuine default (equal to augur default) so that parameters.yaml doesn't have to always specify narrow_bandwidth.
1363c19 to
3d8bed9
Compare
This removes `recent_days_to_censor` from GISAID/21L and open builds to match removal from GISAID builds.
8eca4ee to
f8081fe
Compare
This follows recommendation in issue #1131 to use the pattern frequencies: default: min_date: "6M" narrow_bandwidth: 0.038 This use of "default" extends just to min_date, max_date and narrow_bandwidth. This behavior is now documented in parameters.yaml as well as workflow-config-file.rst. The specification of frequencies parameters in builds.yaml now follows this pattern.
f8081fe to
6458357
Compare
|
Okay. I've updated this PR to follow "strategy 2" from #1131 and use I've also re-triggered trial builds with https://github.com/nextstrain/ncov/actions/runs/11062321363. @victorlin: Could you review this when you have a moment? I think it should be ready to be merged. |
victorlin
left a comment
There was a problem hiding this comment.
Looks good – one small thing and a couple non-blocking comments. I can make these changes if it helps.
|
|
||
|
|
||
| # settings that can be over-ridden across all builds, but not for specific builds | ||
| recent_days_to_censor: 0 |
There was a problem hiding this comment.
The default value is now defined in two places: the config file (where this comment is added) and in the workflow:
ncov/workflow/snakemake_rules/common.smk
Line 203 in 6458357
Since all workflow invocations¹ source from defaults/parameters.yaml, the default value defined in common.smk never gets used. I suggest removing it:
recent_days_to_censor = config["frequencies"]["recent_days_to_censor"]¹ under expected usage
There was a problem hiding this comment.
In general, I liked having workflow default in addition to the parameters.yaml default. As you point out this redundancy also exists for frequencies.narrow_bandwidth.
But perhaps I understand your concern, where by having multiple layers of "defaults" it might add confusion. It would be nice if people could look just at parameters.yaml to understand defaults rather than digging into the workflow.
I think the original push for adding workflow redundancy here was that parameters.yaml (still) doesn't include recent_days_to_censor and if we start looking for it in the workflow and people are stilling using old parameters.yaml files then things will break.
I think we need a broader pattern to adhere to here (kind of like the conversion in issue #1131). I'm not immediately sure what's best for this specific PR.
There was a problem hiding this comment.
if we start looking for it in the workflow and people are stilling using old
parameters.yamlfiles then things will break.
I'm not familiar with external usage of ncov so I appreciate the thoughts here.
Since parameters.yaml is under version control, shouldn't it be sufficient to have workflow changes + parameters.yaml changes (adding recent_days_to_censor) in the same PR? If someone pulls the workflow changes, they'll also pull the default recent_days_to_censor in parameters.yaml. I can think of a few possibilities where breakage might happen:
-
User has made custom changes to
parameters.yaml.git pullwill try to auto-merge and and flag any merge conflicts. -
User has explicitly removed this line from Snakefile:
Line 42 in 82ca8d4
-
User is using a profile that does not source from
parameters.yaml, i.e. missing this line:ncov/nextstrain_profiles/nextstrain-open/config.yaml
Lines 1 to 2 in 82ca8d4
This seems the most likely for external usage, and I believe it's one of the reasons why we now recommend
--configfileover--profilenowadays.
There was a problem hiding this comment.
I think we need a broader pattern to adhere to here
The broader pattern seems to be having a default in parameters.yaml + direct access in Snakemake. This is how it's done for other config:
ncov/workflow/snakemake_rules/main_workflow.smk
Lines 1165 to 1168 in 8399290
Lines 143 to 152 in 8399290
There was a problem hiding this comment.
Thanks for the thoughts. I agree with your logic. Though I would like to push the fixes to a separate PR.
| # else return augur frequencies default value | ||
| else: | ||
| return 0.0833 |
There was a problem hiding this comment.
(non-blocking)
| # else return augur frequencies default value | |
| else: | |
| return 0.0833 |
This shouldn't be necessary since defaults/parameters.yaml already defines the default value and should always be used¹.
Non-blocking because _get_min_date_for_frequencies already has similar redundancy.
¹ under expected usage
| max_date: "2022-01-01" | ||
| narrow_bandwidth: 0.076 | ||
|
|
||
| Each named traits configuration (``default`` or build-named) supports specification of ``min_date``, ``max_date`` and ``narrow_bandwidth``. Other parameters can only be specified across all builds. |
There was a problem hiding this comment.
(non-blocking)
Would it be worth allowing build-specific recent_days_to_censor, pivot_interval, etc. as well? If done right, this could reduce the duplication of functions _get_min_date_for_frequencies/_get_max_date_for_frequencies/_get_narrow_bandwidth_for_wildcards.
Obviously beyond the scope this PR – I could take this or at least make an issue to start it off.
There was a problem hiding this comment.
Totally. Ideally, every entry in parameters.yaml would be able to be over-ridden with build-specific entries in builds.yaml. As you say, the current Snakemake workflow strategy here however would require padding out individual functions for each parameter. It would be wonderful to have a generic strategy that makes for automatic over-rides.
I think maybe more important than making this work for ncov is thinking through how to do this right across pathogens. Everyone requests the ncov style parameter-overrides (this came up for mpox most obviously). cc @joverlee521
There was a problem hiding this comment.
There's definitely been a lot of discussions around build configs outside of ncov. I'll take some time to wrangle previous discussions and write up a summary outside of this PR (probably in pathogen-repo-guide).
Description of proposed changes
This is a bug fix PR. There was a bug in the current behavior where the rule
frequencieswas callingconfig["frequencies"]["narrow_bandwidth"]. This resulted in--narrow-bandwidthto be fixed to whatever was specified inparameters.yaml. However, we were relying on build-specific settings to differentiate behavior in theall-timevs6mvs2mvs1mbuilds, ala:This PR fixes this issue and allows overriding of
narrow_bandwidthinparameters.yamlby build-specific settings. It also provides a genuine default (equal to augur default) so thatparameters.yamldoesn't have to always specifynarrow_bandwidth.Additionally, this PR removes the build-specific setting for
recent_days_to_censor. This setting wasn't actually being applied in a build-specific fashion and so we had been always using the workflow default of0. I decided to stick with this.Testing
I've tested locally and via trial build. Trail builds are visible at:
etc...