BREAKING CHANGE: Improve eval output hash with semantic names instead of raw commands by pditommaso · Pull Request #6346 · nextflow-io/nextflow

pditommaso · 2025-08-15T08:34:58Z

Summary

This PR fixes issue #5470 by implementing a more robust approach to including eval output commands in the task hash calculation. The solution pairs semantic parameter names with command values, mirroring the way input parameters are hashed.

Problem Statement

Issue #5470 identified that changes to eval output commands did not invalidate the task cache, so stale cached results could be reused. The previous fix (commit b0fe0a9) was reverted because it included the raw bash commands directly in the hash.
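The failure mode can be illustrated with a simplified cache-key model. This is a hypothetical sketch: the class, the method names, the `nxf_out_eval_version` entry and the hash layout are all invented for illustration and are not Nextflow's actual `TaskProcessor` code. If only the script and inputs feed the key, editing an eval command leaves the key unchanged, so the old cached result is reused.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a task cache key; NOT Nextflow's implementation.
class CacheKeySketch {

    // SHA-256 over the parts, each followed by a 0 byte so that
    // ["ab", "c"] and ["a", "bc"] produce different digests.
    static String sha256(List<String> parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String part : parts) {
                md.update(part.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Buggy behaviour: the evals argument is ignored, so editing an
    // eval command cannot change the key.
    static String keyWithoutEvals(String script, Map<String, String> inputs, Map<String, String> evals) {
        List<String> parts = new ArrayList<>();
        parts.add(script);
        new TreeMap<>(inputs).forEach((k, v) -> { parts.add(k); parts.add(v); });
        return sha256(parts);
    }

    // Fixed behaviour: eval outputs are appended as name/command pairs,
    // sorted by name, after the inputs.
    static String keyWithEvals(String script, Map<String, String> inputs, Map<String, String> evals) {
        List<String> parts = new ArrayList<>();
        parts.add(script);
        new TreeMap<>(inputs).forEach((k, v) -> { parts.add(k); parts.add(v); });
        new TreeMap<>(evals).forEach((k, v) -> { parts.add(k); parts.add(v); });
        return sha256(parts);
    }
}
```

In this sketch a task with no eval outputs gets the same key under both schemes, which is consistent with the statement that only tasks using output eval parameters lose their cache entries.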

Key Improvements Over Reverted Approach

Semantic names: Uses nxf_out_eval_* parameter names instead of raw bash commands for better readability
Symmetric pattern: Follows the same name + value approach as input parameter hashing
Deterministic ordering: Sorts eval outputs by name for consistent hash generation across JVM runs
Testable architecture: Separates logic into dedicated computeEvalOutputsContent() method
Better documentation: Comprehensive comments explaining rationale and implementation
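A minimal Java sketch of what a method like `computeEvalOutputsContent()` could do, assuming eval outputs are held in a map from semantic name to command string and rendered as sorted `name=command` lines. The real method is Groovy code in `TaskProcessor`; the output format, the null return for an empty map, and the `nxf_out_eval_a`/`nxf_out_eval_b` names used in the example are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical Java rendering of the idea behind computeEvalOutputsContent();
// the exact format produced by the real Groovy method may differ.
class EvalOutputsContent {

    static String compute(Map<String, String> outEvals) {
        // Assumption: nothing is contributed to the hash when there are no eval outputs
        if (outEvals == null || outEvals.isEmpty())
            return null;
        List<Map.Entry<String, String>> entries = new ArrayList<>(outEvals.entrySet());
        // Sort by semantic name so the result does not depend on map iteration order
        entries.sort(Map.Entry.comparingByKey());
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : entries) {
            // One "name=command" line per eval output keeps the hashed content readable
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

Two maps holding the same entries in different insertion orders yield the same string, for example `nxf_out_eval_a=cmd a\nnxf_out_eval_b=cmd b\n`, which is the property the deterministic-ordering bullet relies on.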

Breaking Change Notice

⚠️ BREAKING CHANGE: This change will invalidate existing task cache entries that use output eval parameters, requiring re-execution of those tasks.

The cache invalidation is intentional and necessary to ensure proper cache behavior when eval output definitions change.

Code Changes

Added eval output hash calculation in TaskProcessor.createTaskHashKey()
Implemented computeEvalOutputsContent() method with sorting and proper formatting
Added comprehensive unit tests for deterministic behavior verification

Fixes #5470

🤖 Generated with Claude Code

netlify · 2025-08-15T08:35:03Z

✅ Deploy Preview for nextflow-docs-staging canceled.

🔨 Latest commit: `f1aa2d0`
🔍 Latest deploy log: https://app.netlify.com/projects/nextflow-docs-staging/deploys/689ef21286be460008a77590

… of raw commands

This commit fixes issue #5470 by implementing a more robust approach to including eval output commands in task hash calculation. Instead of using raw command strings directly, we now use semantic parameter names paired with command values, creating a symmetric pattern with input parameter hashing.

Key improvements over the reverted approach (b0fe0a9):
- Uses semantic names (nxf_out_eval_*) instead of raw bash commands for better readability
- Maintains deterministic ordering through sorting for cache consistency
- Follows the same name+value pattern as input parameters for symmetry
- Separates hash computation logic into testable computeEvalOutputsContent() method
- Provides comprehensive comments explaining the rationale

BREAKING CHANGE: This change will invalidate existing task cache entries that use output eval parameters, requiring re-execution of those tasks. The cache invalidation is intentional and necessary to ensure proper cache behavior when eval output definitions change.

Fixes #5470

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

bentsherman · 2025-08-19T14:10:55Z

modules/nextflow/src/main/groovy/nextflow/processor/TaskProcessor.groovy

+        // Without sorting, HashMap iteration order can vary between executions, leading to
+        // different cache keys for identical eval output configurations and causing
+        // unnecessary cache misses and task re-execution
+        final sortedEntries = outEvals.entrySet().sort { a, b -> a.key.compareTo(b.key) }


@pditommaso Instead of sorting the entries, it would be better to hash them as an unordered collection (e.g. ArrayBag).

pditommaso · Aug 20, 2025

What would be the difference?

bentsherman · Aug 20, 2025

It would just be a more efficient hash calculation, since you wouldn't need to sort the entries. We took the same approach with the directory hash.

pditommaso · Aug 20, 2025

Considering the average number of entries (< 10), I'm not sure it would have any noticeable impact, but I'm not against it. Feel free to make a PR if you think it's important.
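The reviewer's alternative can be sketched as follows, assuming each entry is digested on its own and the per-entry digests are combined with a commutative operation. Byte-wise addition is used here, similar in spirit to Guava's `Hashing.combineUnordered`. This illustrates unordered hashing in general and is not Nextflow's own implementation; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical sketch: hash map entries as an unordered collection, no sorting.
class UnorderedEvalHash {

    // Digest of a single name/command pair, with a 0 byte as separator
    static byte[] entryDigest(String name, String command) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(name.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0);
            md.update(command.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Combine per-entry digests by byte-wise addition (wrapping modulo 256).
    // Addition is commutative and associative, so iteration order is irrelevant.
    static byte[] hash(Map<String, String> outEvals) {
        byte[] acc = new byte[32]; // SHA-256 digest length in bytes
        for (Map.Entry<String, String> e : outEvals.entrySet()) {
            byte[] d = entryDigest(e.getKey(), e.getValue());
            for (int i = 0; i < acc.length; i++) {
                acc[i] += d[i];
            }
        }
        return acc;
    }
}
```

The trade-off is the one discussed in the thread: the unordered combination skips the O(n log n) sort, but with fewer than ten entries per task the saving is likely negligible, and sorting keeps the hashed content easy to inspect.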

pditommaso force-pushed the fix-eval-outputs-hash-semantic branch from a147370 to f1aa2d0 August 15, 2025 08:38

pditommaso merged commit d86be1a into master Aug 15, 2025
22 checks passed

pditommaso deleted the fix-eval-outputs-hash-semantic branch August 15, 2025 09:27

bentsherman reviewed Aug 19, 2025

pditommaso merged 1 commit into master from fix-eval-outputs-hash-semantic
