
snakemake --touch is slow/handicapped for unclear reasons #2517

@bsmith89

Description


Snakemake version:

7.25.4 (the problem does not appear to be fixed on the main branch either, unless I'm misunderstanding something)

Describe the bug

When I want to "clean up" a large workflow (e.g., to record that a software update will not affect the results), I run snakemake -j100 --touch FINAL_OUTPUT. This requires touching a large number of intermediate files: thousands or tens of thousands.

Unfortunately, the jobs seem to run in series rather than in parallel: Snakemake starts jobs relatively quickly, but then takes a few more moments for those jobs to finish before it starts additional ones. For large workflows with thousands of files, touching everything can take 20 minutes.

CPU utilization is low during this process.
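For a sense of scale, here is a minimal sketch that updates the timestamps of 10,000 placeholder files with os.utime and times the loop. It assumes a local filesystem, and touch_all is a hypothetical helper, not Snakemake code. It only suggests that the touch operation itself is cheap. It does not reproduce Snakemake's scheduling.

```python
import os
import tempfile
import time
from pathlib import Path


def touch_all(paths):
    """Set access and modification times of every existing path to 'now'.

    Roughly what a plain `touch` does for files that already exist;
    this is NOT Snakemake's implementation.
    """
    now = time.time()
    for p in paths:
        os.utime(p, (now, now))
    return now


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Create 10,000 empty placeholder files standing in for intermediates.
        paths = [Path(d) / f"file_{i}.txt" for i in range(10_000)]
        for p in paths:
            p.touch()
        start = time.perf_counter()
        touch_all(paths)
        elapsed = time.perf_counter() - start
        print(f"touched {len(paths)} files in {elapsed:.3f} s")
```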

Additional context

Digging into the source code, I found this line in the touch executor: https://github.com/snakemake/snakemake/blob/0ed8eb4ec749d168067666078e8761f10a70656d/snakemake/executors/touch.py#L38C28-L38C28, a time.sleep(0.1) call, which suggests that the slowness may be somewhat artificial. When I patch my installation of Snakemake to change this line to time.sleep(0.01), it seems to run about 10x faster.
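A back-of-the-envelope model helps explain why a single sleep could dominate. This assumes each touch job pays the fixed sleep once and that jobs effectively complete one after another (overlap=1). That is an assumption drawn from the behaviour described above, not from reading the scheduler. estimated_touch_time is a hypothetical helper.

```python
def estimated_touch_time(n_jobs: int, sleep_per_job: float, overlap: int = 1) -> float:
    """Rough wall-time estimate (seconds) if every job pays a fixed sleep.

    overlap: how many of those sleeps effectively run at once (1 = fully serial).
    A back-of-the-envelope model only, not a measurement of Snakemake.
    """
    return n_jobs * sleep_per_job / overlap


for delay in (0.1, 0.01):
    seconds = estimated_touch_time(10_000, delay)
    print(f"delay={delay}: {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

Under these assumptions, 10,000 jobs at 0.1 s comes to about 1,000 s (roughly 17 minutes), in the same ballpark as the 20 minutes reported above. Dropping the sleep to 0.01 s cuts the estimate tenfold, which is consistent with the observed speedup.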

Why is there a sleep here? If it compensates for some kind of file-system latency, is a fixed 0.1 s delay really appropriate for ALL users? Could it be made configurable?
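One way the delay could be made adjustable is sketched below. It keeps 0.1 s as the default but lets users override it. TOUCH_DELAY_SECONDS, touch_delay and touch_job are all hypothetical names chosen for illustration. Snakemake does not currently offer such a setting as far as this issue shows.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Hypothetical environment variable; Snakemake does not define this.
DELAY_ENV_VAR = "TOUCH_DELAY_SECONDS"
DEFAULT_DELAY = 0.1  # the constant discussed in this issue


def touch_delay() -> float:
    """Return the post-touch delay, overridable via the hypothetical env var."""
    raw = os.environ.get(DELAY_ENV_VAR)
    if raw is None:
        return DEFAULT_DELAY
    value = float(raw)
    if value < 0:
        raise ValueError(f"{DELAY_ENV_VAR} must be non-negative, got {value}")
    return value


def touch_job(path: Path, delay: Optional[float] = None) -> None:
    """Touch one output, then sleep for the configured (or explicit) delay."""
    path.touch()  # creates the file if missing, otherwise updates its mtime
    time.sleep(touch_delay() if delay is None else delay)
```

With a setting like this, users on fast local filesystems could set the delay to 0. Users on slower shared filesystems could keep or raise it.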

Now that I'm looking, I see time.sleep calls scattered throughout the code base. On a "perfect OS", could I drop all of them and have my workflow run much faster (assuming the jobs themselves are lightweight)? Could each of these sleeps be commented with the reason it was added?
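The code does not say why these sleeps exist. If one is there to wait out file-system latency, as speculated above, a common alternative to a fixed sleep is to poll for the condition with a timeout. That returns as soon as the condition holds. The sketch below assumes that motivation. wait_until is a hypothetical helper, not part of Snakemake.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.01) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Unlike a fixed time.sleep(0.1), this returns immediately when the
    condition already holds, so the common fast case costs almost nothing.
    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until(lambda: Path("out.txt").exists()) would wait only as long as the file actually takes to appear, up to the timeout.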

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
