Snakemake version
7.20.0
Describe the bug
When running a workflow with many parallel jobs that are submitted to a cluster (Slurm in my case), Snakemake initially schedules as many jobs as the provided resources allow. Over time, however, it schedules fewer and fewer new jobs. This has also been described for other scheduling / cluster systems:
#759 (comment)
For me, the output of --verbose, combined with manually checking the output of squeue -u <username>, suggests that jobs that have finished (according to squeue) are not picked up by Snakemake as Finished, or are picked up only after a very long delay. As a result, the Snakemake scheduler does not see the respective resources as free, and nothing new gets scheduled.
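For anyone who wants to repeat the manual check, here is a rough sketch of how I would script it. It is not part of Snakemake: the function names are made up for illustration, and the set of job IDs that Snakemake still considers running is assumed to be collected by hand (for example from the --verbose output). The only external call is squeue with -h (no header), -u (user) and -o %i (print only the job ID).

```python
import subprocess


def parse_squeue_ids(squeue_output):
    """Return the job IDs from `squeue -h -o %i` output (one ID per line)."""
    return {line.strip() for line in squeue_output.splitlines() if line.strip()}


def stale_jobs(tracked_ids, squeue_output):
    """Job IDs that are still tracked as running but that squeue no longer lists.

    tracked_ids is a hypothetical, hand-collected set of Slurm job IDs that
    Snakemake has submitted and not yet reported as Finished.
    """
    return set(tracked_ids) - parse_squeue_ids(squeue_output)


def query_squeue(user):
    """Ask Slurm for the IDs of all jobs of `user` that are still in the queue."""
    result = subprocess.run(
        ["squeue", "-h", "-u", user, "-o", "%i"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the cluster, calling stale_jobs(tracked_ids, query_squeue("<username>")) repeatedly while the workflow runs should return an empty set most of the time. If the set keeps growing, that would match the behaviour described above: jobs have left the queue, but their resources have not been released for scheduling.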
Logs
Minimal example
I'll try to put a minimal example together if I can find the time. Others are welcome to chime in here.
Additional context