Bug report
I have a sample Nextflow script which collects sample IDs from processes throughout the workflow and then groups them at the end to show which samples passed which steps in the pipeline. At the end I use a simple groupTuple operation to gather the list of successful workflow steps for each sample. However, groupTuple is not correctly grouping items by their shared keys. I have tried many combinations of swapping the channels and operators around, and nothing makes this very basic grouping operation work correctly. It seems like there is either some fundamentally broken behavior in groupTuple, or some hidden discrepancy that is not apparent.

The Nextflow script is here:
```nextflow
process DO_THING {
    // debug true
    errorStrategy "ignore"

    input:
    val(x)

    output:
    tuple val("${sampleID}"), val("${task.process}"), topic: passed
    tuple val("${sampleID}"), val("${task.process}"), emit: passed

    script:
    sampleID = "${x}"
    """
    echo "got $x"
    if [ "$x" == "Sample1" ]; then echo "bad sample!"; exit 1; fi
    """
}

process DO_THING2 {
    // debug true
    errorStrategy "ignore"

    input:
    val(x)

    output:
    tuple val("${sampleID}"), val("${task.process}"), topic: passed
    tuple val("${sampleID}"), val("${task.process}"), emit: passed

    script:
    sampleID = "${x}"
    """
    echo "got $x"
    if [ "$x" == "Sample2" ]; then echo "bad sample!"; exit 1; fi
    """
}

process DO_THING3 {
    // debug true
    errorStrategy "ignore"

    input:
    val(x)

    output:
    tuple val("${sampleID}"), val("${task.process}"), topic: passed
    tuple val("${sampleID}"), val("${task.process}"), emit: passed

    script:
    sampleID = "${x}"
    """
    echo "got $x"
    if [ "$x" == "Sample3" ]; then echo "bad sample!"; exit 1; fi
    """
}

process FAKE_MULTIQC {
    // put your multiqc here to do things with the table you made
    // debug true

    input:
    path(input_file)

    script:
    """
    echo "put your multiqc with the table here"
    echo "here is your table:"
    cat "${input_file}"
    """
}

workflow {
    samples = Channel.from("Sample1", "Sample2", "Sample3", "Sample4")

    // list all the input samples
    inputSamples = samples.map { sampleID ->
        // coerce all sample IDs to new string objects for consistency
        return ["${sampleID}", "INPUT"]
    }

    // sanity check that these INPUT items group correctly
    inputSamples2 = samples.map { sampleID ->
        return ["${sampleID}", "INPUT2"]
    }

    DO_THING(samples)
    DO_THING2(samples)
    DO_THING3(samples)

    // view the samples that passed;
    // concat the channels sequentially to force blocking behavior as a sanity check
    allSamples = inputSamples.concat(channel.topic("passed")).concat(inputSamples2).groupTuple()
    // NOTE: this gives the same output, so it is not due to topic channels:
    // allSamples = inputSamples.concat(DO_THING.out.passed, DO_THING2.out.passed).concat(inputSamples2).groupTuple(by: 0)
    allSamples.view()
    // gives the following output:
    // [Sample1, [INPUT, INPUT2]]
    // [Sample2, [INPUT, INPUT2]]
    // [Sample3, [INPUT, INPUT2]]
    // [Sample4, [INPUT, INPUT2]]
    // [Sample1, [DO_THING3, DO_THING2]]
    // [Sample2, [DO_THING3, DO_THING]]
    // [Sample4, [DO_THING2, DO_THING3, DO_THING]]
    // [Sample3, [DO_THING2, DO_THING]]
    // but it is supposed to look like this:
    // [Sample1, [INPUT, INPUT2, DO_THING3, DO_THING2]]
    // [Sample3, [INPUT, INPUT2, DO_THING2, DO_THING]]
    // [Sample4, [INPUT, INPUT2, DO_THING2, DO_THING3, DO_THING]]
    // [Sample2, [INPUT, INPUT2, DO_THING3, DO_THING]]

    // make a table out of the passing samples
    samplesTable = channel.topic("passed")
        .map { processName, sampleId ->
            return "${processName}\t${sampleId}"
        }
        .collectFile(name: "passed.txt", storeDir: "output", newLine: true)
    FAKE_MULTIQC(samplesTable)
}
```

Expected behavior and actual behavior
This line in the script here:

```nextflow
// view the samples that passed
allSamples = inputSamples.concat(channel.topic("passed")).concat(inputSamples2).groupTuple()
allSamples.view()
```

is supposed to give output like this:

```
[Sample1, [INPUT, INPUT2, DO_THING3, DO_THING2]]
[Sample3, [INPUT, INPUT2, DO_THING2, DO_THING]]
[Sample4, [INPUT, INPUT2, DO_THING2, DO_THING3, DO_THING]]
[Sample2, [INPUT, INPUT2, DO_THING3, DO_THING]]
```
But instead it gives this output:

```
[Sample1, [INPUT, INPUT2]]
[Sample2, [INPUT, INPUT2]]
[Sample3, [INPUT, INPUT2]]
[Sample4, [INPUT, INPUT2]]
[Sample4, [DO_THING, DO_THING2]]
[Sample2, [DO_THING]]
[Sample3, [DO_THING, DO_THING2]]
[Sample1, [DO_THING2]]
```
The items are not correctly grouped by their sample IDs. Instead, the items from the "input samples" channels get grouped together, and the items from the downstream Nextflow process channels get grouped together, in two separate partial groups per sample. This behavior seems wrong, because the expectation is that all items sharing a sample ID would end up in a single group.
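One diagnostic I considered but have not verified (this is purely a guess on my part, not something the docs suggest): force every grouping key to a plain `java.lang.String` before grouping, in case the keys coming out of `map` and the keys coming out of the process outputs hash differently despite printing identically. A sketch against the script above:

```nextflow
// HYPOTHETICAL sanity check (untested assumption): normalize every key
// with toString() before grouping. If this makes the partial groups merge,
// the original keys differed in runtime type even though they look the same.
allSamplesCoerced = inputSamples
    .concat(channel.topic("passed"))
    .concat(inputSamples2)
    .map { sampleID, step -> tuple(sampleID.toString(), step) }
    .groupTuple()
allSamplesCoerced.view()
```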
For context, I re-read the groupTuple docs here https://www.nextflow.io/docs/latest/reference/operator.html#grouptuple multiple times, but I cannot figure out what is wrong with the example shown above. The docs there mention something about groupKey and size, but neither I nor my colleagues could make sense of that explanation, so I am not sure whether it could be affecting this or not.
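For reference, my best understanding of what the docs mean by groupKey and size is sketched below (the sample data here is made up for illustration): wrap each key with the number of items expected for that key, so groupTuple can emit a group as soon as it is complete instead of waiting for all upstream channels to finish.

```nextflow
// Minimal sketch of groupKey(key, size) as I understand the docs.
// Each key declares it expects 2 items, so each group is emitted
// as soon as both items for that key have arrived.
workflow {
    Channel.of(
        ["Sample1", "INPUT"],
        ["Sample2", "INPUT"],
        ["Sample1", "INPUT2"],
        ["Sample2", "INPUT2"]
    )
    .map { sampleID, step -> tuple(groupKey(sampleID, 2), step) }
    .groupTuple()
    .view()
}
```

If I understand correctly, this only changes *when* groups are emitted, not *how* items are matched, so I do not see how it would explain the split groups above.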
I also saw discussion #4716, but I am not sure it describes the same problem.
Environment
- Nextflow version: 25.04.6; I tested a variety of older Nextflow versions and saw the same behavior
- Java version:
  openjdk version "17.0.10" 2024-01-16 LTS
  OpenJDK Runtime Environment Corretto-17.0.10.7.1 (build 17.0.10+7-LTS)
  OpenJDK 64-Bit Server VM Corretto-17.0.10.7.1 (build 17.0.10+7-LTS, mixed mode, sharing)
- Operating system: tested on macOS and Linux
- Bash version ($SHELL --version):
  GNU bash, version 3.2.57(1)-release (arm64-apple-darwin24)
  GNU bash, version 5.2.21(1)-release (x86_64-pc-linux-gnu)