Better documentation for groupTuple and groupKey

I have a scatter/gather workflow and need to gather the files belonging to each sample using the `groupTuple()` operator. My input channel emits items in the following format:

```
[
    {
        "id": "SAMP_0002",
        "single_end": "false"
    },
    "SAMP_0002_0.ubam"
],
[
    {
        "id": "SAMP_0002",
        "single_end": "false"
    },
    "SAMP_0002_1.ubam"
]
```

What I would like to do is wait until the upstream channel has finished emitting and then merge the items with:

```
    reads_sized
        .groupTuple(by: 0)
        .set {reads_grouped}
```

To get a merged channel like this:
```
[
    {
        "id": "SAMP_0002",
        "single_end": "false"
    },
    ["SAMP_0002_0.ubam", "SAMP_0002_1.ubam"]
]
```

It's really unclear to me what happens if you don't specify a `size` parameter to `groupTuple()`. Does it block until the input channel has finished all its emissions?  Is it possible that it will emit multiple tuples with the same key if they come in at different times?

The "tip" suggests that you need to either specify a constant `size` or calculate the size of each group in advance and use the built-in function `groupKey`, but it's not clear to me exactly how `groupKey` works, and I can't find any documentation on `groupKey` or its inputs and outputs.
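
My best reading of the tip is something like the untested sketch below. It assumes that, before the scatter, there is a channel (hypothetically named `reads_split` here) emitting one `[meta, [chunk, chunk, ...]]` tuple per sample, so that the number of chunks is known up front. The idea is to wrap the key with `groupKey(key, size)` before scattering, so that `groupTuple()` can tell when a group is complete. The channel name, the chunk-list shape, and the placement of the per-chunk processes are my assumptions, not something taken from the documentation.

```groovy
// Untested sketch: `reads_split` and its [meta, [chunks]] shape are assumptions.
workflow {
    // Assumed shape before the scatter: one [meta, [chunk, chunk, ...]] tuple per sample
    reads_split = Channel.of(
        [ [id: 'SAMP_0002', single_end: false], ['SAMP_0002_0.ubam', 'SAMP_0002_1.ubam'] ]
    )

    reads_sized = reads_split
        // Wrap the key so it carries the number of chunks expected for this sample
        .map { meta, chunks -> tuple(groupKey(meta, chunks.size()), chunks) }
        // Scatter: emit one [key, chunk] tuple per chunk
        .transpose()

    // ... the per-chunk processes would run here ...

    // Gather: group on the first element, which is now the wrapped key
    reads_sized
        .groupTuple(by: 0)
        .view()
}
```

If this is the intended usage, it would help to have it spelled out in the docs, including what the key looks like downstream after it has been wrapped.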

I went to the "patterns" example on groupTuple (https://nextflow-io.github.io/patterns/process-into-groups/), but it does not use `groupKey` at all even though the groups would have variable size in that example.

Overall, scatter/gather seems like an important pattern, and I think users would benefit if this documentation were reviewed. Or is there a better operator to use for my example above?

(Note: the history of `groupKey` seems to be in this issue: https://github.com/nextflow-io/nextflow/issues/796)


Better documentation for groupTuple and groupKey #3935
