Skip to content

Deletion of checkpoint outputs is not properly detected #3879

@tkchouaki

Description

@tkchouaki

Hello,
After running a pipeline with checkpoints,and then deleting an output generated by a checkpoint, rerunning the pipeline does not properly detect that the file is missing and that the checkpoint needs to run again.
Below an example pipeline

checkpoint checkpoint_test:
    output:
        "output/test_{n}.txt"
    shell:
        '''
        shuf -i 1-100 -n 1 > {output[0]}
        '''

def aggregate_input(wildcards):
    max_value = None
    selected_file = None
    for i in range(10):
        check = checkpoints.checkpoint_test.get(n=i)
        curr_file = check.output[0]
        with open(curr_file) as f:
            n = int(f.read())
        if max_value is None or n > max_value:
            max_value = n
            selected_file = curr_file
    return selected_file

rule aggregate:
    input:
        aggregate_input
    output:
        "output/selection.txt"
    shell:
        '''
        cp {input[0]} {output[0]}
        '''

rule all:
    default_target: True
    input: "output/selection.txt"

If one of the checkpoint outputs is removed before re-executing the pipeline, checkpoints.get passes and leads to a crash in the following line.

The shell script below systematically produces the error

rm -rf output
snakemake --cores 1
rm -rf output/test_4.txt
snakemake --cores 1

I tried looking at the code, it seems to me that this function

async def update_needrun(job):
doesn't check if output files are missing or have changed. A check is done here
missing_output = [f async for f in job_.missing_output(files)]
but only for jobs that have been already considered for rerunning in the aforementioned function. However I am not sure enough of the reasons behind the current implementation to propose a PR.

Metadata

Metadata

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

Status
Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions