
Checkpoint "get" still only resolvable once per aggregate function #3475

@ASLeonard

When an aggregate input function calls get on the same checkpoint twice (with different wildcard values), only the first call seems to take effect. This looks similar to #3283, but it does not appear to be fixed by #3341.

My use case was splitting two VCFs whose samples are not known in advance, and then comparing the samples present in both files.

The Snakefile below does not work. You can tell something is wrong because Snakemake only complains if "A.vcf" does not exist; it does not care whether "B.vcf" exists, even though that file is also a required input.

from pathlib import Path

rule all:
    input:
        'A_B.csv'

checkpoint split_files:
    input:
        '{experiment}.vcf'
    output:
        directory('split_{experiment}')

rule happy:
    input:
        truth = 'split_{truth}/{sample}.vcf',
        query = 'split_{query}/{sample}.vcf'
    output:
        '{truth}_{query}.{sample}.vcf'
    shell: 'hap.py ...'

def aggregrate_happy(wildcards):
    truth_samples = set(glob_wildcards(Path(checkpoints.split_files.get(experiment=wildcards.truth).output[0]).joinpath('{sample}.vcf')).sample)
    query_samples = set(glob_wildcards(Path(checkpoints.split_files.get(experiment=wildcards.query).output[0]).joinpath('{sample}.vcf')).sample)
    return  [f'{wildcards.truth}_{wildcards.query}.{sample}.vcf' for sample in (truth_samples & query_samples)]

rule aggregate_happy:
    input:
        aggregrate_happy
    output:
        '{truth}_{query}.csv'
    shell: 'cat {input} > {output}'
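One possible reading of the symptom can be sketched in plain Python. It assumes (not verified against Snakemake's source here) that get signals an unfinished checkpoint by raising an exception, so a single evaluation of the input function stops at the first unfinished call and never reaches the second. The names ToyCheckpoint, IncompleteCheckpoint, and pending_after_one_evaluation are hypothetical stand-ins, not Snakemake classes or functions.

```python
class IncompleteCheckpoint(Exception):
    """Toy stand-in for 'this checkpoint has not produced its output yet'."""


class ToyCheckpoint:
    """Hypothetical model of a checkpoint rule; not Snakemake's implementation."""

    def __init__(self):
        self.finished = set()

    def get(self, experiment):
        # Assumption: an unfinished checkpoint is reported by raising.
        if experiment not in self.finished:
            raise IncompleteCheckpoint(experiment)
        return f"split_{experiment}"


def input_function(checkpoint, truth, query):
    truth_dir = checkpoint.get(truth)  # raises here if 'truth' is unfinished
    query_dir = checkpoint.get(query)  # not reached in that case
    return [truth_dir, query_dir]


def pending_after_one_evaluation(checkpoint, truth, query):
    """Return the checkpoint instances discovered by one evaluation."""
    try:
        input_function(checkpoint, truth, query)
    except IncompleteCheckpoint as exc:
        return [exc.args[0]]
    return []


cp = ToyCheckpoint()
first_pass = pending_after_one_evaluation(cp, "A", "B")   # only 'A' is seen
cp.finished.add("A")
second_pass = pending_after_one_evaluation(cp, "A", "B")  # 'B' is seen only now
print(first_pass, second_pass)
```

In this toy model, "B" is only discovered if the input function is evaluated again after "A" has finished. The behavior described above would be consistent with that second discovery not taking place, but this is a guess rather than a diagnosis.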

Defining two separate checkpoint rules, or adding a second "fake" aggregate function, works:

def aggregrate_happy(wildcards):
    truth_samples = set(glob_wildcards(Path(checkpoints.split_files.get(experiment=wildcards.truth).output[0]).joinpath('{sample}.vcf')).sample)
    return  [f'{wildcards.truth}_{wildcards.query}.{sample}.vcf' for sample in truth_samples]

def fake_aggregate(wildcards):
    query_samples = set(glob_wildcards(Path(checkpoints.split_files.get(experiment=wildcards.query).output[0]).joinpath('{sample}.vcf')).sample)
    return []

rule aggregate_happy:
    input:
        aggregrate_happy,
        fake_aggregate
    output:
        '{truth}_{query}.csv'
    shell: 'cat {input} > {output}'

So checkpoints.<rule>.get still seems to be limited to one call per rule per input function.

This was tested on versions v9.1.1 and v8.29.3.
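The "fake" aggregate workaround above could be generalized with a small factory, so that each generated input function calls get exactly once. This is only a sketch: checkpoint_trigger is a hypothetical helper, not part of Snakemake, and it has not been run inside a real workflow. The get callable is passed in as an argument so the snippet can be checked in plain Python with a fake.

```python
from types import SimpleNamespace


def checkpoint_trigger(get_checkpoint, wildcard_name):
    """Build an input function that requests exactly one checkpoint instance.

    get_checkpoint: a callable such as checkpoints.split_files.get
                    (passed in, so this helper does not import Snakemake).
    wildcard_name:  name of the wildcard holding the experiment.
    """
    def trigger(wildcards):
        # One get call per function; the returned job is deliberately ignored.
        get_checkpoint(experiment=getattr(wildcards, wildcard_name))
        return []  # contributes no input files, like fake_aggregate
    return trigger


# Stand-alone check with a fake 'get' that records what was requested.
requested = []


def fake_get(experiment):
    requested.append(experiment)


trigger_query = checkpoint_trigger(fake_get, "query")
extra_inputs = trigger_query(SimpleNamespace(truth="A", query="B"))
print(requested, extra_inputs)  # ['B'] []
```

In the Snakefile, the input section of aggregate_happy would then list aggregrate_happy followed by checkpoint_trigger(checkpoints.split_files.get, 'query'). Whether Snakemake treats the generated function exactly like the hand-written fake_aggregate is an assumption.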

Labels: bug, stale
