Description
Snakemake version
Occurs in 8.16.0 and 8.13.0, does not occur in 7.32.4.
Describe the bug
I am running into an issue where, under certain circumstances, a checkpoint's get() method does not raise the expected IncompleteCheckpointException when the checkpoint has not yet been run. I've been able to create the minimal example below. The problem happens in the input function of the problem rule, which reads the output of the my_checkpoint checkpoint rule.
The example runs fine starting from scratch. However, if I create an empty file for the problem rule's output file, then execute a rule that takes that file as an input, the call to checkpoints.my_checkpoint.get() succeeds even though the checkpoint has not been run.
Several warnings like the following are generated, which may or may not be related:
RuntimeWarning: coroutine 'DAG.sanitize_local_storage_copies' was never awaited
Minimal example
import os, sys

checkpoint my_checkpoint:
    output: 'checkpoint.txt'
    shell: 'echo 1 2 > {output}'

rule make_problem_input:
    output: 'input-{i}.txt'
    shell: 'echo data-{wildcards[i]} > {output}'

def my_input_func(wc):
    print('!!!!!!!!!! input function called', file=sys.stderr)
    checkpoint_file = checkpoints.my_checkpoint.get().output[0]
    assert os.path.exists(checkpoint_file), f'{checkpoint_file} does not exist!'
    print('!!!!!!!!!! checkpoint file exists', file=sys.stderr)
    data = open(checkpoint_file).read()
    nums = data.strip().split()
    return expand(rules.make_problem_input.output, i=nums)

rule problem:
    input: my_input_func
    output: 'problem.txt'
    shell: 'cat {input} > {output}'

rule final:
    input: 'problem.txt'
    output: 'final.txt'
    shell: 'cat {input} > {output}'
Logs
With directory empty except for the Snakefile:
snakemake --profile none final.txt
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
!!!!!!!!!! input function called
!!!!!!!!!! input function called
/home/jared.lumpe/opt/mambaforge/envs/signature-finder/lib/python3.12/site-packages/snakemake/dag.py:1612: RuntimeWarning: coroutine 'DAG.sanitize_local_storage_copies' was never awaited
self.sanitize_local_storage_copies()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Using shell: /usr/bin/bash
Provided cores: 80
Rules claiming more threads will be scaled down.
Job stats:
job              count
-------------  -------
final                1
my_checkpoint        1
problem              1
total                3
Select jobs to execute...
Execute 1 jobs...
[Fri Aug 9 16:52:43 2024]
localcheckpoint my_checkpoint:
output: checkpoint.txt
jobid: 2
reason: Missing output files: <TBD>
resources: tmpdir=/tmp
DAG of jobs will be updated after completion.
[Fri Aug 9 16:52:43 2024]
Finished job 2.
1 of 3 steps (33%) done
!!!!!!!!!! input function called
!!!!!!!!!! checkpoint file exists
/home/jared.lumpe/opt/mambaforge/envs/signature-finder/lib/python3.12/site-packages/snakemake/dag.py:1612: RuntimeWarning: coroutine 'DAG.sanitize_local_storage_copies' was never awaited
self.sanitize_local_storage_copies()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Select jobs to execute...
Execute 2 jobs...
[Fri Aug 9 16:52:43 2024]
localrule make_problem_input:
output: input-2.txt
jobid: 6
reason: Missing output files: input-2.txt
wildcards: i=2
resources: tmpdir=/tmp
[Fri Aug 9 16:52:43 2024]
localrule make_problem_input:
output: input-1.txt
jobid: 5
reason: Missing output files: input-1.txt
wildcards: i=1
resources: tmpdir=/tmp
[Fri Aug 9 16:52:43 2024]
Finished job 6.
2 of 5 steps (40%) done
[Fri Aug 9 16:52:43 2024]
Finished job 5.
3 of 5 steps (60%) done
Select jobs to execute...
Execute 1 jobs...
[Fri Aug 9 16:52:43 2024]
localrule problem:
input: input-1.txt, input-2.txt
output: problem.txt
jobid: 1
reason: Missing output files: problem.txt; Input files updated by another job: input-1.txt, input-2.txt
resources: tmpdir=/tmp
[Fri Aug 9 16:52:43 2024]
Finished job 1.
4 of 5 steps (80%) done
Select jobs to execute...
Execute 1 jobs...
[Fri Aug 9 16:52:43 2024]
localrule final:
input: problem.txt
output: final.txt
jobid: 0
reason: Missing output files: final.txt; Input files updated by another job: problem.txt
resources: tmpdir=/tmp
[Fri Aug 9 16:52:43 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-08-09T165243.478887.snakemake.log
Remove all files and .snakemake/ directory, touch problem.txt, and rerun:
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
!!!!!!!!!! input function called
!!!!!!!!!! input function called
InputFunctionException in rule problem in file /home/jared.lumpe/tmp/24-08-09-snakemake-bug/Snakefile, line 20:
Error:
AssertionError: checkpoint.txt does not exist!
Wildcards:
Traceback:
File "/home/jared.lumpe/tmp/24-08-09-snakemake-bug/Snakefile", line 14, in my_input_func (rule problem, line 40, /home/jared.lumpe/tmp/24-08-09-snakemake-bug/Snakefile)
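The second scenario above (clean directory, empty problem.txt, rerun) can be scripted so it is easy to repeat across Snakemake versions. The following is only a minimal sketch: the helper names `reset_workdir` and `run_target` and the `GENERATED` list are hypothetical and not part of Snakemake, and the sketch assumes the Snakefile sits in the working directory and that `snakemake` is on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Files the example Snakefile can generate (names taken from its rules;
# input-1.txt and input-2.txt follow from 'echo 1 2' in the checkpoint).
GENERATED = ["checkpoint.txt", "input-1.txt", "input-2.txt", "problem.txt", "final.txt"]


def reset_workdir(workdir: Path) -> None:
    """Remove workflow outputs and .snakemake/, then create an empty problem.txt."""
    for name in GENERATED:
        (workdir / name).unlink(missing_ok=True)
    shutil.rmtree(workdir / ".snakemake", ignore_errors=True)
    # The empty output file of the 'problem' rule is what triggers the behavior.
    (workdir / "problem.txt").touch()


def run_target(workdir: Path, target: str = "final.txt") -> subprocess.CompletedProcess:
    """Run the same command used for the logs in this report."""
    return subprocess.run(
        ["snakemake", "--profile", "none", target],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
```

Calling `reset_workdir(Path("."))` followed by `run_target(Path("."))` and inspecting the returned `stderr` should, going by the logs above, show the AssertionError on the affected versions.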