
snakemake exit with MissingOutputException when using --immediate-submit #2992

@sirintra

Description


Snakemake version
8.16.0

Describe the bug

I am trying to run Snakemake with the --immediate-submit flag so that all jobs are submitted to a SLURM cluster at once, without waiting for the input files of downstream jobs to appear. This flag previously worked seamlessly for me on the same SLURM cluster with Snakemake v5.31. After migrating to Snakemake v8.16, however, immediate submission no longer works as expected. Although I successfully adapted my Snakefile and configuration to the changes in version 8, I can only run a simple workflow without the --immediate-submit flag. When immediate submission is enabled, Snakemake appears to keep checking for the input files of the next step even while a submitted job is still running, so job submission terminates prematurely with a MissingOutputException. Is this behaviour a bug, or is there a specific setting in version 8 that I need to configure? Any advice would be much appreciated. Thanks.
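For background on the mechanism involved: with a cluster-generic submit command, Snakemake reads the external job id from the submit command's stdout, and --immediate-submit then hands the parent job ids to child submissions via the {dependencies} placeholder. A minimal sketch of the capture step, using echo as a stand-in for sbatch so the example runs without a cluster (the job id is illustrative):

```python
import subprocess

def submit_and_capture(cmdline):
    # Capture stdout so that only the external job id is returned;
    # with `sbatch --parsable`, stdout is just the numeric job id.
    result = subprocess.run(cmdline, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# `echo` stands in for `sbatch --parsable ... jobscript` here.
jobid = submit_and_capture(["echo", "1089"])
print(jobid)
```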

Logs
Executing snakemake without the --immediate-submit flag works fine.
Command executed: snakemake -s snakefile --profile slurm

Using profile slurm for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 999
Job stats:
job      count
-----  -------
all          1
step1        1
step2        1
step3        1
total        4

Select jobs to execute...
Execute 1 jobs...

[Wed Jul 31 10:37:52 2024]
rule step1:
    input: input.txt
    output: output1.txt
    jobid: 3
    reason: Missing output files: output1.txt
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, cpus=1

cat input.txt > output1.txt
Submitted job 3 with external jobid '1089'.
[Wed Jul 31 10:38:02 2024]
Finished job 3.
1 of 4 steps (25%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Jul 31 10:38:02 2024]
rule step2:
    input: output1.txt
    output: output2.txt
    jobid: 2
    reason: Missing output files: output2.txt; Input files updated by another job: output1.txt
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, cpus=1

sleep 20; head -n2 output1.txt > output2.txt
Submitted job 2 with external jobid '1090'.
[Wed Jul 31 10:38:32 2024]
Finished job 2.
2 of 4 steps (50%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Jul 31 10:38:32 2024]
rule step3:
    input: output2.txt
    output: output3.txt
    jobid: 1
    reason: Missing output files: output3.txt; Input files updated by another job: output2.txt
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, cpus=1

head -n1 output2.txt > output3.txt
Submitted job 1 with external jobid '1091'.
[Wed Jul 31 10:38:42 2024]
Finished job 1.
3 of 4 steps (75%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Jul 31 10:38:42 2024]
localrule all:
    input: output3.txt
    jobid: 0
    reason: Input files updated by another job: output3.txt
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=/tmp, cpus=1

[Wed Jul 31 10:38:42 2024]
Finished job 0.
4 of 4 steps (100%) done
Complete log: .snakemake/log/2024-07-31T103751.980121.snakemake.log

Executing snakemake with the --immediate-submit flag fails to submit all jobs and exits prematurely with the following error:
Command executed: snakemake -s snakefile --profile slurm --immediate-submit --notemp

Using profile slurm for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 999
Job stats:
job      count
-----  -------
all          1
step1        1
step2        1
step3        1
total        4

Select jobs to execute...
Execute 1 jobs...

[Wed Jul 31 10:47:31 2024]
rule step1:
    input: input.txt
    output: output1.txt
    jobid: 3
    reason: Missing output files: output1.txt
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, cpus=1

cat input.txt > output1.txt
Submitted job 3 with external jobid '1092'.
Waiting at most 5 seconds for missing files.
[Wed Jul 31 10:47:33 2024]
Finished job 3.
1 of 4 steps (25%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Jul 31 10:47:33 2024]
rule step2:
    input: output1.txt
    output: output2.txt
    jobid: 2
    reason: Missing output files: output2.txt; Input files updated by another job: output1.txt
    resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=<TBD>, cpus=1

sleep 20; head -n2 output1.txt > output2.txt
Submitted job 2 with external jobid '1093'.
Waiting at most 5 seconds for missing files.
MissingOutputException in rule step2 in file /mnt/scratch2/users/3056021/sm_centos8/immSub/snakefile, line 11:
Job 2  completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
output2.txt (missing locally, parent dir not present)
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-07-31T104731.768237.snakemake.log
WorkflowError:
At least one job did not complete successfully.

Minimal example

Content of the snakefile:

rule all:
  input:
      'output3.txt'
rule step1:
  input:
      'input.txt'
  output:
      'output1.txt'
  shell:
      'cat {input} > {output}'
rule step2:
  input:
      'output1.txt'
  output:
      'output2.txt'
  shell:
      'sleep 20; head -n2 {input} > {output}'
rule step3:
  input:
      'output2.txt'
  output:
      'output3.txt'
  shell:
      'head -n1 {input} > {output}'

Additional context

config.yaml file:

executor: cluster-generic 
jobs: 999
default-resources: [cpus=1, mem_mb=2000]
cluster-generic-submit-cmd: "./slurm/sbatch.py {resources.cpus} {resources.mem_mb} {rule} {dependencies}"
max-status-checks-per-second: 10
rerun-incomplete: True
scheduler: greedy
keep-going: True
printshellcmds: True
show-failed-logs: True

sbatch.py

#!/usr/bin/env python3

import sys
import subprocess

# The job script is always the last command-line argument; the preceding
# arguments are the placeholders from the profile's
# cluster-generic-submit-cmd: cpus, mem_mb, rule, dependencies.
jobscript = sys.argv[-1]

cpu = sys.argv[1]
mem = sys.argv[2]
rulename = sys.argv[3]
dependencies = set(sys.argv[4:-1])

cmdline = [
    "sbatch",
    "--chdir=./log_hpc",
    "--output=%j.out",
    "--error=%j.err",
    "--nodes=1",
    "--parsable",
    "--job-name=" + rulename,
    "--ntasks=" + cpu,
    "--mem=" + mem,
    "--partition=k2-hipri",
    "--time=2:59:00",
]

if dependencies:
    # SLURM separates job ids with colons within a single dependency
    # specification (afterok:id1:id2), not commas.
    cmdline.append("--dependency=afterok:" + ":".join(dependencies))

cmdline.append(jobscript)

# Log the command to stderr: stdout should carry only the job id that
# --parsable prints, since Snakemake parses it as the external job id.
print(" ".join(cmdline), file=sys.stderr)
subprocess.run(cmdline, check=True)
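One detail worth noting about the submit script: SLURM's --dependency flag separates job ids with colons within a single specification (afterok:id1:id2); commas are reserved for combining dependency specifications of different types. A small sketch of that formatting (job ids are illustrative):

```python
def dependency_flag(dependencies):
    # Colon-join the parent job ids into one afterok specification,
    # as expected by `sbatch --dependency`; sort for a stable order.
    return "--dependency=afterok:" + ":".join(sorted(dependencies))

print(dependency_flag({"1093", "1092"}))
```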

Labels

    bug