
Lot of time spent in toposort in a large workflow #2134

@maarten-k


Snakemake version
7.19.1

Describe the bug
I have a large workflow with a total of 720,000 jobs. This workflow also includes checkpoints. It takes a long time before any jobs are submitted. I also noticed that no new jobs were submitted after the first jobs had been sent off, and that the detection of finished jobs was severely delayed. Several other issues on GitHub might be related to this topic, for example #2091 and #1700.

To check what was causing the delay, I used faulthandler to take multiple peeks at where snakemake was spending its time (a normal profiler would generate too much data to handle with ease).

I was a bit surprised that, after checking for the presence of files, snakemake spent an awfully long time in the toposort function. This function comes from an external dependency. On the toposort GitLab I found a merge request that sounds like it addresses the problem. I applied it as a dirty hack, and snakemake seems to be more responsive. However, I have a hard time benchmarking only this part of the pipeline, so I am not sure whether this is a placebo effect. I will try to gather some dry-run measurements in the future.
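One way to look at the sorting step in isolation, without running the whole pipeline, is to time a topological sort on a synthetic dependency graph. The following is only a minimal sketch under stated assumptions: `layered_toposort` is a naive stand-in written for this example (it is not the code of the toposort package or of snakemake), and `chain_dag` is a hypothetical worst-case graph in which every job depends on the previous one.

```python
import time


def layered_toposort(deps):
    """Yield sets of nodes whose dependencies are all satisfied.

    Illustrative stand-in only; every node must appear as a key of deps.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    while remaining:
        # Nodes with no unmet dependencies form the next layer.
        ready = {node for node, d in remaining.items() if not d}
        if not ready:
            raise ValueError("cyclic dependency")
        yield ready
        # Rebuilding the whole mapping per layer is what makes deep graphs slow.
        remaining = {n: d - ready for n, d in remaining.items() if n not in ready}


def chain_dag(n):
    """Hypothetical worst case: job i depends on job i - 1."""
    return {i: ({i - 1} if i else set()) for i in range(n)}


def time_toposort(sort_fn, deps):
    """Return (elapsed seconds, list of layers) for one run of sort_fn."""
    start = time.perf_counter()
    layers = list(sort_fn(deps))
    return time.perf_counter() - start, layers
```

Calling `time_toposort(layered_toposort, chain_dag(n))` for growing `n` shows how the runtime scales with graph depth. The `toposort` function from the toposort package could presumably be passed as `sort_fn` instead, since it also takes a dict that maps each item to the set of items it depends on.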

Additional context

About faulthandler
I added the following lines to $(which snakemake):

import faulthandler
import signal
faulthandler.register(signal.SIGUSR1.value)

Run snakemake, fetch its PID, and while it is running call

kill -s SIGUSR1 $PID

This prints the current Python traceback in the snakemake terminal, showing where snakemake is spending time at that moment (watch out for selection bias!).
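The same mechanism can be tried out in a small standalone script before patching the snakemake entry point. The sketch below is a self-contained illustration, not part of snakemake: the helper names `dump_stack_on_signal` and `sample_own_stack` are made up for this example, and the script sends SIGUSR1 to itself instead of relying on an external `kill`. It assumes a Unix-like system, since `faulthandler.register` is not available on Windows.

```python
import faulthandler
import os
import signal
import tempfile
import time


def dump_stack_on_signal(log_file, signum=signal.SIGUSR1):
    """Ask faulthandler to write all thread tracebacks to log_file on signum."""
    # log_file must stay open for as long as the handler is registered.
    faulthandler.register(signum, file=log_file, all_threads=True)


def sample_own_stack():
    """Send SIGUSR1 to this process once and return the dumped traceback text."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        dump_stack_on_signal(log_file)
        try:
            # Same effect as running `kill -s SIGUSR1 $PID` from a shell.
            os.kill(os.getpid(), signal.SIGUSR1)
            time.sleep(0.1)  # give the handler a moment to write
        finally:
            faulthandler.unregister(signal.SIGUSR1)
        log_file.seek(0)
        return log_file.read()
```

Writing the dumps to a file instead of the terminal makes it easier to collect many samples and count which functions show up most often, which helps a little against the selection bias mentioned above.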

Labels: bug
