Skip to content

Fix groupTuple operator GString vs String key handling#6400

Merged
pditommaso merged 1 commit intomasterfrom
fix/grouptuple-gstring-normalization
Sep 14, 2025
Merged

Fix groupTuple operator GString vs String key handling#6400
pditommaso merged 1 commit intomasterfrom
fix/grouptuple-gstring-normalization

Conversation

@pditommaso
Copy link
Member

@pditommaso pditommaso commented Sep 14, 2025

Summary

Fixes the groupTuple operator to consistently handle GString vs String keys that represent the same logical value.

Problem

The groupTuple operator was not properly grouping items when the same logical key was represented as both GString and String objects. This occurred when:

  • Channel values contained interpolated strings ("${variable}" - GStringImpl)
  • These were mixed with regular String literals ("same_value" - String)
  • Despite having identical content, they were treated as separate grouping keys

Root Cause

GString and String objects with identical content have different:

  • Hash codes
  • Equality behavior (== returns false between GString and String)
  • Class types

This caused them to create separate entries in the internal grouping map.

Solution

Added key normalization in the groupTuple operator:

  • Convert GString objects to String before grouping
  • Preserve all other data types unchanged
  • Maintain backward compatibility

Changes

  • GroupTupleOp.groovy:

    • Add normalizeKey() method to convert GString elements to String
    • Modify collect() method to normalize keys before grouping
    • Use enhanced for loops as requested
  • GroupTupleOpTest.groovy:

    • Add comprehensive unit tests for GString/String scenarios
    • Add tests for preserving non-string key types
    • All existing tests continue to pass

Test Plan

  • Unit tests pass (including new GString-specific tests)
  • Integration tests pass (all existing groupTuple tests)
  • No regressions in existing functionality
  • Non-string key types preserved correctly

Related Issues

Fixes #6399

🤖 Generated with Claude Code

The groupTuple operator was not properly grouping items when the same
logical key was represented as both GString and String objects. This
occurred when channel values contained interpolated strings (GString)
that were later mixed with regular String literals having the same
content.

The root cause was that GString and String objects with identical
content have different hash codes and equality behavior, causing them
to be treated as separate grouping keys.

This commit adds key normalization in the groupTuple operator that
converts GString objects to String while preserving all other data
types unchanged.

Changes:
- Add normalizeKey() method to convert GString elements to String
- Modify collect() method to normalize keys before grouping
- Add comprehensive unit tests covering GString/String scenarios
- Preserve existing behavior for all non-GString key types

Fixes #6399

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@netlify
Copy link

netlify bot commented Sep 14, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 2e43e7b
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/68c6f6e93126eb0008a5f609

@pditommaso
Copy link
Member Author

@claude review

@claude
Copy link

claude bot commented Sep 14, 2025

Claude finished @pditommaso's task —— View job


Code Review

I've reviewed the PR that fixes GString vs String key handling in the groupTuple operator. Overall, this is a solid fix for a real problem with clean implementation.

Tasks

  • Read and analyze the implementation changes in GroupTupleOp.groovy
  • Review the test additions in GroupTupleOpTest.groovy
  • Evaluate the solution approach and code quality
  • Check for potential edge cases or issues
  • Provide comprehensive review feedback

Review Summary

✅ Strengths:

  • Clean solution: The normalizeKey() method is simple and focused - converts GString to String while preserving all other types
  • Minimal impact: Only 2 lines changed in core logic (line 94 and 265-272), reducing risk of regressions
  • Proper integration: Key normalization happens at the right point in collect() method before grouping
  • Good test coverage: Tests cover both the main GString/String scenario and preservation of other key types

📋 Code Quality:

  • Implementation follows existing code patterns (enhanced for loops, consistent style)
  • Proper null handling with size allocation
  • Uses static protected method appropriately
  • Good separation of concerns

🧪 Test Quality:

  • Test 'should handle GString vs String keys correctly' properly demonstrates the core issue and fix
  • Test 'should preserve non-string key types' ensures backward compatibility
  • Tests use realistic scenarios (sample names with interpolation)

Detailed Analysis

Implementation (GroupTupleOp.groovy:94, 265-272)

The fix is minimal and surgical:

final key = normalizeKey(tuple[indices])        // Line 94

The normalizeKey() method correctly:

  • Processes each element in the key list independently
  • Only converts GString instances to String using toString()
  • Preserves all other data types unchanged
  • Uses efficient ArrayList pre-sizing

Edge Cases Considered ✅

  • Null keys: Would be preserved (not GString instances)
  • Nested structures: Only top-level GString objects normalized (appropriate scope)
  • Mixed types: Integer, File, Map keys properly preserved as shown in tests
  • Performance: Minimal overhead - only processes keys, not values

Potential Improvements (Minor)

  1. Documentation: Consider adding a brief comment explaining why GString normalization is needed
  2. Null safety: While likely not an issue in practice, could add null check in normalizeKey if defensive programming is preferred

Backward Compatibility ✅

  • All existing functionality preserved
  • Only affects the problematic GString/String equality case
  • No changes to public API or behavior for other key types

Conclusion

This is a well-implemented fix that addresses the root cause of the GString/String grouping issue. The solution is minimal, focused, and well-tested. The approach of normalizing keys before grouping is the correct strategy, and the implementation handles edge cases appropriately.

Recommendation: Approve - This fix resolves the issue without introducing risks or breaking changes.

@pditommaso pditommaso added the bug label Sep 14, 2025
@pditommaso pditommaso merged commit 206cc01 into master Sep 14, 2025
22 checks passed
@pditommaso pditommaso deleted the fix/grouptuple-gstring-normalization branch September 14, 2025 18:09
@bentsherman
Copy link
Member

This is not needed because it is already addressed in the strict syntax at the compiler level.

If you tried to fix it at the runtime level like this, you would have to add extra logic to every possible location where a string might be compared to a gstring, and even then you probably wouldn't catch every case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

groupTuple not correctly grouping based on shared key

2 participants