Hello,
After running a pipeline with checkpoints,and then deleting an output generated by a checkpoint, rerunning the pipeline does not properly detect that the file is missing and that the checkpoint needs to run again.
Below an example pipeline
checkpoint checkpoint_test:
output:
"output/test_{n}.txt"
shell:
'''
shuf -i 1-100 -n 1 > {output[0]}
'''
def aggregate_input(wildcards):
max_value = None
selected_file = None
for i in range(10):
check = checkpoints.checkpoint_test.get(n=i)
curr_file = check.output[0]
with open(curr_file) as f:
n = int(f.read())
if max_value is None or n > max_value:
max_value = n
selected_file = curr_file
return selected_file
rule aggregate:
input:
aggregate_input
output:
"output/selection.txt"
shell:
'''
cp {input[0]} {output[0]}
'''
rule all:
default_target: True
input: "output/selection.txt"
If one of the checkpoint outputs is removed before re-executing the pipeline, checkpoints.get passes and leads to a crash in the following line.
The shell script below systematically produces the error
rm -rf output
snakemake --cores 1
rm -rf output/test_4.txt
snakemake --cores 1
I tried looking at the code, it seems to me that this function
|
async def update_needrun(job): |
doesn't check if output files are missing or have changed. A check is done here
|
missing_output = [f async for f in job_.missing_output(files)] |
but only for jobs that have been already considered for rerunning in the aforementioned function. However I am not sure enough of the reasons behind the current implementation to propose a PR.
Hello,
After running a pipeline with checkpoints,and then deleting an output generated by a checkpoint, rerunning the pipeline does not properly detect that the file is missing and that the checkpoint needs to run again.
Below an example pipeline
If one of the checkpoint outputs is removed before re-executing the pipeline, checkpoints.get passes and leads to a crash in the following line.
The shell script below systematically produces the error
I tried looking at the code, it seems to me that this function
snakemake/src/snakemake/dag.py
Line 1416 in 9598655
snakemake/src/snakemake/dag.py
Line 1598 in 9598655