checkpoint + temp #609
Description
Is your feature request related to a problem? Please describe.
checkpoint is great for dealing with dynamically created files, but it would be even better if temp() worked on those files so that Snakemake removed them automatically. I know that a checkpoint provides input for downstream rules, while temp() is meant for output. Still, there doesn't seem to be a good way to remove the dynamically generated files automatically. The only workaround is to create another rule that deletes all of the files, and then the user must make sure not to delete those files prematurely.
An example from the docs with temp() added:
# a target rule to define the desired final output
rule all:
    input:
        "aggregated/a.txt",
        "aggregated/b.txt"

# the checkpoint that shall trigger re-evaluation of the DAG
checkpoint somestep:
    input:
        "samples/{sample}.txt"
    output:
        "somestep/{sample}.txt"
    shell:
        # simulate some output value
        "echo {wildcards.sample} > somestep/{wildcards.sample}.txt"

# intermediate rule
rule intermediate:
    input:
        "somestep/{sample}.txt"
    output:
        "post/{sample}.txt"
    shell:
        "touch {output}"

# alternative intermediate rule
rule alt_intermediate:
    input:
        "somestep/{sample}.txt"
    output:
        "alt/{sample}.txt"
    shell:
        "touch {output}"

# input function for the rule aggregate
def aggregate_input(wildcards):
    # decision based on content of output file
    # Important: use the method open() of the returned file!
    # This way, Snakemake is able to automatically download the file if it is generated in
    # a cloud environment without a shared filesystem.
    with checkpoints.somestep.get(sample=wildcards.sample).output[0].open() as f:
        if f.read().strip() == "a":
            return temp("post/{sample}.txt")
        else:
            return temp("alt/{sample}.txt")

rule aggregate:
    input:
        aggregate_input
    output:
        "aggregated/{sample}.txt"
    shell:
        "touch {output}"
As far as I can tell, this method of using temp() doesn't work.
The alternative required to delete the files without temp():
# a target rule to define the desired final output
rule all:
    input:
        "aggregated/a.txt",
        "aggregated/b.txt",
        "tmp_deleted/a.done",
        "tmp_deleted/b.done"

# the checkpoint that shall trigger re-evaluation of the DAG
checkpoint somestep:
    input:
        "samples/{sample}.txt"
    output:
        "somestep/{sample}.txt"
    shell:
        # simulate some output value
        "echo {wildcards.sample} > somestep/{wildcards.sample}.txt"

# intermediate rule
rule intermediate:
    input:
        "somestep/{sample}.txt"
    output:
        "post/{sample}.txt"
    shell:
        "touch {output}"

# alternative intermediate rule
rule alt_intermediate:
    input:
        "somestep/{sample}.txt"
    output:
        "alt/{sample}.txt"
    shell:
        "touch {output}"

# input function for the rule aggregate
def aggregate_input(wildcards):
    # decision based on content of output file
    # Important: use the method open() of the returned file!
    # This way, Snakemake is able to automatically download the file if it is generated in
    # a cloud environment without a shared filesystem.
    with checkpoints.somestep.get(sample=wildcards.sample).output[0].open() as f:
        if f.read().strip() == "a":
            return "post/{sample}.txt"
        else:
            return "alt/{sample}.txt"

rule aggregate:
    input:
        aggregate_input
    output:
        "aggregated/{sample}.txt"
    shell:
        "touch {output}"

# cleanup rule: depends on the aggregated file so it only runs after aggregate
rule delete_after_aggregate:
    input:
        tmp = aggregate_input,
        final = "aggregated/{sample}.txt"
    output:
        # one marker file per sample, so the {sample} wildcard used in the
        # inputs can be resolved from the output
        touch("tmp_deleted/{sample}.done")
    shell:
        "rm -f {input.tmp}"
The second option is more verbose, and the user must make sure that the file-delete rule runs only after all rules that require those files have run.
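To make the ordering requirement concrete, here is a minimal plain-Python sketch of the bookkeeping the user currently has to do by hand: a temporary file may only be removed once every output that consumes it exists. The function name `deletable_temp_files` and the `consumers` mapping are hypothetical and only illustrate the idea; this is not how Snakemake implements temp() internally.

```python
from typing import Dict, Iterable, List


def deletable_temp_files(
    consumers: Dict[str, Iterable[str]],
    finished_outputs: Iterable[str],
) -> List[str]:
    """Return the temporary files whose consuming outputs have all been produced.

    `consumers` maps each temporary file to the outputs of the rules that
    read it (hypothetical bookkeeping, maintained by the user).
    """
    done = set(finished_outputs)
    return sorted(
        tmp
        for tmp, outputs in consumers.items()
        if all(out in done for out in outputs)
    )


# Assumed state of the example workflow: sample "a" took the "post" branch,
# sample "b" took the "alt" branch.
consumers = {
    "post/a.txt": ["aggregated/a.txt"],
    "alt/b.txt": ["aggregated/b.txt"],
}

# Only aggregated/a.txt has been produced so far,
# so only post/a.txt is safe to delete.
print(deletable_temp_files(consumers, ["aggregated/a.txt"]))
```

With temp() support for checkpoint-dependent inputs, this tracking would not need to be written by the user at all.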