Accept hyphens in config key given to CLI #2986
Description
TL;DR Can we allow hyphens in config keys given on the command line?
I'm designing a distributed pipeline consisting of many different bioinformatics tools. To make it easy to use, I want to be able to pass any command line argument to individual rules when calling snakemake on the command line, using Snakemake's built-in command line config parser.
I have already implemented a system that looks for config keys with a specific syntax and sends each matching key-value pair to the corresponding rule. The syntax encodes the rule, the option key and the value for the command line argument: "set_<rule><key>=<value>".
Here "set_" is a static prefix; <rule> is the Snakemake rule; <key> is the command line option key; and <value> is the argument value. By setting these in the config.yaml file, one can easily configure any number of settings for individual rules in the pipeline. For example, you might want to pass the "--kingdom archaea" command line argument to the prokka rule when analysing archaeal genomes. In this case you would write "set_prokka--kingdom: archaea" in config.yaml, and my pipeline's parser will then automatically call prokka as "prokka --kingdom archaea [...]".
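To illustrate the scheme, here is a minimal sketch of how such keys could be routed to rules. It is not the actual code from my pipeline: the helper names `route_rule_options` and `render_options` are hypothetical, and the sketch assumes that rule names contain no hyphens, so the option key starts at the first hyphen after the prefix.

```python
import shlex
from collections import defaultdict

PREFIX = "set_"  # static prefix from the "set_<rule><key>" scheme


def route_rule_options(config):
    """Group 'set_<rule><key>' entries by rule name (hypothetical helper).

    Assumption: the rule name contains no hyphen, so the option key
    begins at the first hyphen after the prefix.
    """
    per_rule = defaultdict(dict)
    for name, value in config.items():
        if not name.startswith(PREFIX):
            continue  # ordinary config entry, not addressed to a rule
        rest = name[len(PREFIX):]
        dash = rest.find("-")
        if dash <= 0:
            continue  # missing rule name or missing option key
        rule, option = rest[:dash], rest[dash:]
        per_rule[rule][option] = value
    return dict(per_rule)


def render_options(options):
    """Render {'--kingdom': 'archaea'} as a shell-safe argument string."""
    return " ".join(f"{opt} {shlex.quote(str(val))}" for opt, val in options.items())


config = {"set_prokka--kingdom": "archaea", "samples": "samples.tsv"}
routed = route_rule_options(config)
print(routed)
print(render_options(routed["prokka"]))
```

The rendered string could then be interpolated into the rule's shell command, e.g. via a params entry.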
The problem arises when I try to set these on the command line, e.g. with snakemake --snakefile [...] --config set_prokka--kingdom=archaea. I then get the error: "Invalid config definition: Config entry must start with a valid identifier." This is because parse_config(), Snakemake's command line config parser, validates config keys with the regex "[a-zA-Z_]\w*$", which doesn't allow hyphens in config keys at all.
My proposed workaround (cmkobel@9479418) is to change this regex string to "[a-zA-Z_][\w-]*\w$", thereby allowing hyphens in the middle of config keys (when given on the command line). I already implemented this in my pipeline as a proof-of-concept. It works as intended.
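The difference between the two patterns can be checked in isolation. The sketch below assumes the key is validated with `match()`, i.e. anchored at the start of the key; the `accepts` helper and the `VARIANT` pattern are illustrative and not part of Snakemake or of the linked commit.

```python
import re

CURRENT = re.compile(r"[a-zA-Z_]\w*$")           # pattern currently used
PROPOSED = re.compile(r"[a-zA-Z_][\w-]*\w$")     # pattern proposed above
# Hypothetical variant that also keeps single-character keys valid
VARIANT = re.compile(r"[a-zA-Z_]([\w-]*\w)?$")


def accepts(pattern, key):
    # Assumption: validation is done with pattern.match(key)
    return pattern.match(key) is not None


for key in ["samples", "set_prokka--kingdom", "trailing-", "-leading", "k"]:
    print(
        f"{key!r:24}"
        f" current={accepts(CURRENT, key)}"
        f" proposed={accepts(PROPOSED, key)}"
        f" variant={accepts(VARIANT, key)}"
    )
```

One edge case to discuss: because the proposed pattern requires a trailing \w after the first character, a single-character key such as "k" is accepted by the current pattern but rejected by the proposed one. Making the tail optional, as in the variant, would avoid that regression.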
Unfortunately, it is not sustainable to ask the users of my pipeline to patch Snakemake's internal code, even though this can easily be done with the following sed command:
sed -i 's|valid = re.compile(r"\[a-zA-Z_\]\\w\*\$")|valid=re.compile(r"\[a-zA-Z_\]\[\\w-\]\*\\w\$")|g' $CONDA_PREFIX/lib/python3.11/site-packages/snakemake/__init__.py
So with this feature request, I'd like the Snakemake developers to consider if we can change the regex string of parse_config() to allow hyphens in the config keys. For example with r"[a-zA-Z_][\w-]*\w$" instead.
Accepting hyphen-containing config keys on the command line would greatly enhance the usability of the built-in command line parser. As hyphens are already valid in keys given in the config file, I don't see any obvious problems with allowing them on the command line as well.
Just want to humbly brainstorm this with you before making a pull request.
And thanks for developing a great framework. Best, Carl