Two checkpoint workflow fails to output all files in latest version 8.18.1 #3036

@peterch405

Snakemake version

Fails with v8.18.1
Works with v7.30.2

Describe the bug

A collect rule that aggregates files from two sequential checkpoints no longer produces all of its files in the latest version of Snakemake. The same workflow runs correctly in v7.30.2. In v8.18.1 only the files from the first checkpoint are produced; the second checkpoint is never executed. Inside the input function, output_j = glob.glob(f"{checkpoints.second.get(**wildcards, i=i).output}/*/") returns an empty list and does not trigger the second checkpoint. I am unsure why this no longer works in the latest version and would appreciate help getting it to run.
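The symptom can be reproduced in plain Python, without Snakemake. The sketch below shows that glob.glob returns an empty list, without raising, when the directory it is pointed at does not exist yet. If checkpoints.second.get(...) returns normally before the second checkpoint has run (an assumption on my part, not something I have confirmed in the v8 code), the inner loop in aggregate would be skipped silently in exactly this way. The helper name subdir_names and the scratch directory are made up for illustration.

```python
import glob
import tempfile
from pathlib import Path


def subdir_names(parent):
    """Names of the immediate subdirectories of `parent`.

    Mirrors the glob pattern used in aggregate(); Path(...).name is used
    here in place of split('/')[-2] to pick the last path component.
    """
    return [Path(p).name for p in glob.glob(f"{parent}/*/")]


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the output directory of checkpoint `second`.
    second_dir = Path(tmp) / "second" / "s1" / "1"

    # Directory does not exist yet: glob returns [] and raises nothing.
    before = subdir_names(second_dir)

    # Simulate checkpoint `second` having run.
    for j in range(6, 10):
        (second_dir / str(j)).mkdir(parents=True, exist_ok=True)

    after = sorted(subdir_names(second_dir))

print(before)
print(after)
```

An empty result therefore cannot be told apart from "checkpoint not run yet" by looking at the glob output alone.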

Minimal example

checkpoint_test.smk:

import glob
import random
from pathlib import Path

ALL_SAMPLES = ["s1", "s2"]


rule all:
    input:
        expand('collect/{sample}/all_done.txt',
               sample = ALL_SAMPLES)


checkpoint first:
    input:
        expand("{sample}", sample=ALL_SAMPLES)
    output:
        directory('first/{sample}')
    run:
        for i in range(1,5):
            Path(f"{output[0]}/{i}").mkdir(parents=True, exist_ok=True)
            Path(f"{output[0]}/{i}/test.txt").touch()


checkpoint second:
    input:
        'first/{sample}/{i}/test.txt'
    output:
        directory('second/{sample}/{i}')
    run:
        for j in range(6,10):
            Path(f"{output[0]}/{j}").mkdir(parents=True, exist_ok=True)
            Path(f"{output[0]}/{j}/test2.txt").touch()


rule copy:
    input:
        'second/{sample}/{i}/{j}/test2.txt'
    output:
        'copy/{sample}/{i}/{j}/test2.txt'
    shell:
        """
        cp -f {input} {output}
        """


def aggregate(wildcards):

    outputs_i = glob.glob(f"{checkpoints.first.get(**wildcards).output}/*/")

    outputs_i = [output.split('/')[-2] for output in outputs_i]

    split_files = []
    for i in outputs_i:
        output_j = glob.glob(f"{checkpoints.second.get(**wildcards, i=i).output}/*/")
        outputs_j = [output.split('/')[-2] for output in output_j]
        for j in outputs_j:
            split_files.extend(expand(f"copy/{{sample}}/{i}/{j}/test2.txt",
                               sample=wildcards.sample))
    return split_files




rule collect:
    input:
        aggregate
    output:
        touch('collect/{sample}/all_done.txt')

Command:

snakemake \
    --snakefile checkpoint_test.smk \
    --cores 1
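
To make "fails to output all files" checkable, here is a small helper sketch in plain Python. It assumes the Snakefile above is unchanged: checkpoint first creates subdirectories 1 to 4, checkpoint second creates subdirectories 6 to 9, and there are two samples, so a complete run should leave 32 files under copy/. The function names create_inputs, expected_copy_targets and missing_targets are hypothetical and not part of Snakemake. Note that checkpoint first takes the files s1 and s2 as input, so they must exist in the working directory before the command is run.

```python
from itertools import product
from pathlib import Path

ALL_SAMPLES = ["s1", "s2"]   # as in checkpoint_test.smk
FIRST_DIRS = range(1, 5)     # i values created by checkpoint `first`
SECOND_DIRS = range(6, 10)   # j values created by checkpoint `second`


def create_inputs(workdir="."):
    """Create the empty input files that checkpoint `first` expects."""
    for sample in ALL_SAMPLES:
        (Path(workdir) / sample).touch()


def expected_copy_targets(samples=ALL_SAMPLES):
    """Every file rule `copy` should produce in a complete run."""
    return [
        f"copy/{s}/{i}/{j}/test2.txt"
        for s, i, j in product(samples, FIRST_DIRS, SECOND_DIRS)
    ]


def missing_targets(workdir="."):
    """Expected copy targets that are absent from `workdir`."""
    root = Path(workdir)
    return [t for t in expected_copy_targets() if not (root / t).exists()]
```

Calling create_inputs() before the snakemake command and missing_targets() afterwards gives a list that should be empty on a complete run and non-empty when the second checkpoint was skipped.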

Labels

bug (Something isn't working)

Status

Done
