Cache entries are reused even when the inputs differ #6513

@muffato

Bug report

While editing my pipeline and testing with -resume, I ended up in a situation where an entire sub-workflow was being skipped. I traced it back to a join that emitted nothing because the meta maps of the two channels had different keys.
I had indeed changed the key name at some point, but what bothers me is that Nextflow -resume returns cached entries for processes whose inputs differ (here, the meta map).

Expected behavior and actual behavior

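Expected: a cache entry keyed on a meta map should not be reused if the content of the meta map differs.
Actual: with -resume, the cached entry is reused and the process emits the meta map from the earlier run.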

Steps to reproduce the problem

In this minimal example, I create a channel named ch_genome with tuples made of:

  • a meta map that uses a key name chosen by the user,
  • a string symbolising a Fasta file.

The channel goes through the FAIDX process, which outputs similar tuples:

  • the input meta map as is,
  • a string symbolising the faidx index.

Then the input and output channels are combined with join, on the assumption that the meta map (used as the join key) remains the same.

workflow {

    Channel.of(1, 2, 3)
    | map { i -> [ [ 'id': i, "${params.key}": 100 * i * i], "seq_${i}.fa" ] } 
    | set { ch_genome }
    ch_genome.view()

    FAIDX ( ch_genome )
    FAIDX.out.fai.view()

    // join fasta with corresponding fai file
    ch_genome
    | join ( FAIDX.out.fai )
    | set { fasta_fai }
    fasta_fai.view()
}

process FAIDX {
    input:
    tuple val(meta), val(reads)
 
    output:
    tuple val(meta), val(index), emit: fai 
 
    script:
    index = "seq_${meta.id}.fai"
    """ 
    """
}

Program output

In the first run, I choose the key genome_size. You can see all three .view() printing 3 entries each.

$ nextflow run bug2/ --key genome_size
Nextflow 25.10.0 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 25.04.6

Launching `bug2/main.nf` [friendly_kimura] DSL2 - revision: 43f5fd76a3

executor >  local (3)
[60/9c9378] process > FAIDX (3) [100%] 3 of 3 ✔
[[id:1, genome_size:100], seq_1.fa]
[[id:2, genome_size:400], seq_2.fa]
[[id:3, genome_size:900], seq_3.fa]
[[id:1, genome_size:100], seq_1.fai]
[[id:1, genome_size:100], seq_1.fa, seq_1.fai]
[[id:2, genome_size:400], seq_2.fai]
[[id:2, genome_size:400], seq_2.fa, seq_2.fai]
[[id:3, genome_size:900], seq_3.fai]
[[id:3, genome_size:900], seq_3.fa, seq_3.fai]

In the second run, I change the key name to total_length and run in -resume mode. FAIDX.out.fai.view() still prints the meta maps from the first run, i.e. with genome_size as the key!
As a result, join finds no matching key between the two channels and prints nothing.

$ nextflow run bug2/ --key total_length -resume
Nextflow 25.10.0 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 25.04.6

Launching `bug2/main.nf` [special_shockley] DSL2 - revision: 43f5fd76a3

[60/9c9378] process > FAIDX (3) [100%] 3 of 3, cached: 3 ✔
[[id:1, total_length:100], seq_1.fa]
[[id:2, total_length:400], seq_2.fa]
[[id:3, total_length:900], seq_3.fa]
[[id:2, genome_size:400], seq_2.fai]
[[id:1, genome_size:100], seq_1.fai]
[[id:3, genome_size:900], seq_3.fai]
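
For illustration only, here is a sketch of a variant of the join that matches on meta.id alone instead of on the whole meta map. It is not part of the original reproduction and does not address the caching behaviour itself. It only shows that the empty join comes from the stale key name in the cached meta maps. The channel name fasta_fai_by_id is hypothetical. The snippet is meant to be placed inside the workflow block above, after the FAIDX call.

```nextflow
// Hypothetical variant: use meta.id as the join key rather than the full meta map,
// so a stale key name in the cached meta map no longer prevents a match.
ch_genome
| map { meta, fasta -> [ meta.id, meta, fasta ] }                  // key each fasta tuple by id
| join ( FAIDX.out.fai.map { meta, fai -> [ meta.id, fai ] } )     // key each fai tuple by id
| map { id, meta, fasta, fai -> [ meta, fasta, fai ] }             // drop the key, keep the current meta
| set { fasta_fai_by_id }
fasta_fai_by_id.view()
```

With this variant, the tuples would be expected to carry the meta map from ch_genome (the current run) rather than the one replayed from the cache.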

Environment

  • Nextflow version: 25.04.6 build 5954
  • Java version: Groovy 4.0.26 on OpenJDK 64-Bit Server VM 17.0.15+6-LTS
  • Operating system: Ubuntu 22.04.5 LTS Linux 5.15.0-141-generic
  • Bash version: 5.1.16(1)-release (x86_64-pc-linux-gnu)
