Confusion with Overriding input After Snakemake Modularization #3713

@jmzhang1911

Hello Developer,

I have encountered the following issue and would like to ask for your help!

  1. When I import a module and override the input of one of its rules, the override does not seem to take effect.

  2. When I run A.smk on its own, the logic works as expected: the workflow takes a.txt as input and finally produces output_run2.txt.

  3. When I import A into B and change the input of rule run to b.txt, the workflow in B does not behave as expected: it still takes a.txt as input when producing output_run2.txt.

  4. However, if I add output_run.txt to the input of rule all in B, then B correctly takes b.txt as input.

I don’t understand why I need to request output_run.txt explicitly for the override to take effect. My understanding is that Snakemake should construct the DAG from the overridden rule and therefore resolve the input to b.txt.

I would greatly appreciate your feedback. If I have misunderstood something, I sincerely apologize for taking up your valuable time.

Create example data

mkdir data && touch data/{a,b}.txt
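
Because `touch` creates two empty files, the contents of output_run2.txt cannot show which input was actually copied. A minimal sketch of an alternative setup, assuming plain Python next to the Snakefiles; the helper names `write_marked_inputs` and `which_input` are hypothetical and not part of Snakemake:

```python
from pathlib import Path


def write_marked_inputs(data_dir="data"):
    """Create a.txt and b.txt with distinct contents so copies can be told apart."""
    d = Path(data_dir)
    d.mkdir(parents=True, exist_ok=True)
    for name in ("a", "b"):
        # Each file carries a one-line marker naming its origin.
        (d / f"{name}.txt").write_text(f"from {name}.txt\n")


def which_input(output="output_run2.txt"):
    """Return the marker line found in the final workflow output."""
    return Path(output).read_text().strip()
```

After calling `write_marked_inputs()` and running the workflow (for example `snakemake -s B.smk --cores 1`), `which_input()` returns the marker of whichever file was propagated through `run` and `run2`.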

A.smk

config = {"a": "data/a.txt"}


rule all:
    input:
        "output_run2.txt",


rule run:
    input:
        rds=config["a"],
    output:
        txt="output_run.txt",
    shell:
        "cp {input.rds} {output.txt}"


rule run2:
    input:
        rds=rules.run.output.txt,
    output:
        txt="output_run2.txt",
    shell:
        "cp {input.rds} {output.txt}"

B.smk (import A.smk and override input of rule run)

config = {"b": "data/b.txt"}


module A:
    snakefile:
        "A.smk"
    config:
        config


use rule * from A as A_*


use rule run from A as A_run with:
    input:
        rds="data/b.txt",


rule all:
    input:
        "output_run2.txt",
    default_target: True

Workaround (adding output_run.txt makes it work as expected)

config = {"b": "data/b.txt"}


module A:
    snakefile:
        "A.smk"
    config:
        config


use rule * from A as A_*


use rule run from A as A_run with:
    input:
        rds="data/b.txt",


rule all:
    input:
        "output_run.txt",
        "output_run2.txt",
    default_target: True
