continuously fewer jobs scheduled by snakemake on cluster (slurm), as jobs very slowly registered as finished by snakemake #2091

@dlaehnemann

Snakemake version
7.20.0

Describe the bug
When running a workflow with many parallel jobs and submitting them to a cluster (Slurm in my case), Snakemake initially schedules as many jobs as the provided resources allow. Over time, however, it schedules fewer and fewer new jobs. The same behaviour has also been described for other scheduling / cluster systems:
#759 (comment)

For me, the output of --verbose, together with manually checking the output of squeue -u <username>, suggests that jobs that have finished (according to squeue) are either not picked up by Snakemake as Finished, or are picked up only after a very long delay. As a result, the Snakemake scheduler does not see the respective resources as free, and nothing new gets scheduled.
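
The manual comparison described above could be automated roughly as follows. This is only a minimal sketch under stated assumptions, not how Snakemake itself tracks jobs: the function names are made up for illustration, the set of job IDs that Snakemake still treats as running is assumed to be collected by hand from the --verbose output, and the squeue text is assumed to come from something like squeue -u <username> -h -o %i (no header, one job ID per line).

```python
from __future__ import annotations


def parse_squeue_ids(squeue_output: str) -> set[str]:
    """Collect job IDs from squeue output that has one job ID per line.

    Assumes the output was produced without a header, e.g. with
    `squeue -u <username> -h -o %i`. Blank lines are ignored.
    """
    return {line.strip() for line in squeue_output.splitlines() if line.strip()}


def finished_but_tracked(tracked_ids: set[str], queued_ids: set[str]) -> set[str]:
    """Return IDs that are still tracked as running but no longer listed by squeue.

    `tracked_ids` is a hypothetical input: the Slurm job IDs that Snakemake
    has submitted and not yet reported as finished.
    """
    return tracked_ids - queued_ids


def report(tracked_ids: set[str], squeue_output: str) -> list[str]:
    """Build one human-readable line per job that looks finished but unregistered."""
    stale = finished_but_tracked(tracked_ids, parse_squeue_ids(squeue_output))
    return [f"{job_id}: no longer in squeue, not yet reported as finished"
            for job_id in sorted(stale)]
```

Calling report() with the tracked IDs and a fresh squeue dump at regular intervals would show whether the list of finished-but-unregistered jobs keeps growing. A job missing from squeue is only a hint that it has ended; it does not say whether it succeeded.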

Logs

Minimal example
I'll try to put a minimal example together if I can find the time. Others are welcome to chime in here.

Additional context
