BREAKING CHANGE: Improve eval output hash with semantic names instead of raw commands#6346
Merged
pditommaso merged 1 commit intomasterfrom Aug 15, 2025
Merged
Conversation
✅ Deploy Preview for nextflow-docs-staging canceled.
|
… of raw commands This commit fixes issue #5470 by implementing a more robust approach to including eval output commands in task hash calculation. Instead of using raw command strings directly, we now use semantic parameter names paired with command values, creating a symmetric pattern with input parameter hashing. Key improvements over the reverted approach (b0fe0a9): - Uses semantic names (nxf_out_eval_*) instead of raw bash commands for better readability - Maintains deterministic ordering through sorting for cache consistency - Follows the same name+value pattern as input parameters for symmetry - Separates hash computation logic into testable computeEvalOutputsContent() method - Provides comprehensive comments explaining the rationale BREAKING CHANGE: This change will invalidate existing task cache entries that use output eval parameters, requiring re-execution of those tasks. The cache invalidation is intentional and necessary to ensure proper cache behavior when eval output definitions change. Fixes #5470 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
a147370 to
f1aa2d0
Compare
bentsherman
reviewed
Aug 19, 2025
| // Without sorting, HashMap iteration order can vary between executions, leading to | ||
| // different cache keys for identical eval output configurations and causing | ||
| // unnecessary cache misses and task re-execution | ||
| final sortedEntries = outEvals.entrySet().sort { a, b -> a.key.compareTo(b.key) } |
Member
There was a problem hiding this comment.
@pditommaso instead of sorting the entries, it would be better to hash them as an unordered collection (e.g. ArrayBag)
Member
Author
There was a problem hiding this comment.
What would be the difference?
Member
There was a problem hiding this comment.
It would just be a more efficient hash calculation since you wouldn't need to sort the entries. Like we did with the directory hash
Member
Author
There was a problem hiding this comment.
Considering the average number of entries (< 10) not sure it's something it could have an impact, but not against that. Feel free to make a PR if you think it's important.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes issue #5470 by implementing a more robust approach to including eval output commands in task hash calculation. The solution uses semantic parameter names paired with command values, creating a symmetric pattern with input parameter hashing.
Problem Statement
Issue #5470 identified that changes to eval output commands were not properly invalidating task cache, leading to incorrect cached results. The previous fix (commit b0fe0a9) was reverted because it included raw bash commands directly in the hash.
Key Improvements Over Reverted Approach
nxf_out_eval_*parameter names instead of raw bash commands for better readabilityname + valueapproach as input parameter hashingcomputeEvalOutputsContent()methodBreaking Change Notice
The cache invalidation is intentional and necessary to ensure proper cache behavior when eval output definitions change.
Code Changes
TaskProcessor.createTaskHashKey()computeEvalOutputsContent()method with sorting and proper formattingFixes #5470
🤖 Generated with Claude Code