Description
Snakemake version
7.19.1
Describe the bug
I have a large workflow with a total of 720,000 jobs, which also includes checkpoints. It takes a long time before any jobs are submitted. I also noticed that no new jobs were submitted after the first jobs were sent off, and that the detection of finished jobs was severely delayed. Several other issues on GitHub might be related to this topic, for example #2091 and #1700.
To find out what was causing the delay, I used faulthandler to take multiple peeks at where Snakemake was spending its time (a normal profiler would generate too much data to handle with ease).
I was a bit surprised that, after checking for the presence of files, it spent an awfully long time in the toposort function. This function comes from an external dependency. On the toposort GitLab I found a merge request that sounds like it addresses the problem. I implemented it (as a dirty hack), and Snakemake seems to be more responsive. However, I have a hard time benchmarking only this part of the pipeline, so I am not sure whether this is a placebo effect. I will try to gather some dry-run measurements in the future.
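One way to time the sorting step in isolation might be a small synthetic benchmark like the sketch below. Everything in it is an assumption for illustration: the layered graph shape, the helper names `layered_dag`, `stdlib_sort` and `time_sort`, and the use of the standard library's `graphlib` as a stand-in sorter. It does not reproduce Snakemake's actual DAG or the merge request mentioned above.

```python
import time
from graphlib import TopologicalSorter


def layered_dag(layers, width):
    """Build {node: set(dependencies)}; each node depends on every node in the previous layer."""
    graph = {}
    for layer in range(layers):
        for i in range(width):
            # Nodes in the first layer have no dependencies.
            deps = {(layer - 1, j) for j in range(width)} if layer else set()
            graph[(layer, i)] = deps
    return graph


def stdlib_sort(graph):
    """Stand-in sorter using the standard library (not the toposort package)."""
    return list(TopologicalSorter(graph).static_order())


def time_sort(sort_fn, graph, repeats=3):
    """Return the best wall-clock time in seconds of sort_fn(graph) over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        sort_fn(graph)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    graph = layered_dag(layers=50, width=50)
    print(f"{len(graph)} nodes: {time_sort(stdlib_sort, graph):.4f} s")
```

If the `toposort` package is installed, the same harness could time it with `time_sort(lambda g: list(toposort.toposort(g)), graph)`, once before and once after applying the patch, since that function also accepts a dict mapping each item to its set of dependencies.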
Additional context
About faulthandler
I added into $(which snakemake) the following lines:
import faulthandler
import signal
faulthandler.register(signal.SIGUSR1.value)
Run Snakemake, fetch its PID, and while it is running call
kill -s SIGUSR1 $PID
This will output where Snakemake is spending time at that moment in the Snakemake terminal (watch out for selection bias!).
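To reduce the selection bias of a single peek, the signal can be sent several times at a fixed interval. The following is a minimal, Unix-only sketch of that idea. The helper names `sample_stacks` and `demo` are hypothetical, and the demo signals its own process and writes the dumps to a temporary file only so that the result can be inspected; the snippet above writes to stderr instead.

```python
import faulthandler
import os
import signal
import tempfile
import time


def sample_stacks(pid, samples=5, interval=1.0):
    """Send SIGUSR1 to `pid` repeatedly; each signal triggers one traceback dump."""
    for _ in range(samples):
        os.kill(pid, signal.SIGUSR1)
        time.sleep(interval)


def demo():
    """Register the handler in this process, sample it three times, return the dump text."""
    with tempfile.TemporaryFile(mode="w+") as dump:
        # The file must stay open for as long as the handler is registered.
        faulthandler.register(signal.SIGUSR1, file=dump, all_threads=True)
        try:
            sample_stacks(os.getpid(), samples=3, interval=0.05)
        finally:
            faulthandler.unregister(signal.SIGUSR1)
        dump.seek(0)
        return dump.read()


if __name__ == "__main__":
    print(demo())
```

For a running Snakemake process that already has the handler registered, calling `sample_stacks(pid, samples=20, interval=5.0)` from a separate Python session would produce a series of dumps; frames that show up in most of them are the likelier hot spots.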