Skip to content

BREAKING CHANGE: Improve eval output hash with semantic names instead of raw commands#6346

Merged
pditommaso merged 1 commit intomasterfrom
fix-eval-outputs-hash-semantic
Aug 15, 2025
Merged

BREAKING CHANGE: Improve eval output hash with semantic names instead of raw commands#6346
pditommaso merged 1 commit intomasterfrom
fix-eval-outputs-hash-semantic

Conversation

@pditommaso
Copy link
Member

@pditommaso pditommaso commented Aug 15, 2025

Summary

This PR fixes issue #5470 by implementing a more robust approach to including eval output commands in task hash calculation. The solution uses semantic parameter names paired with command values, creating a symmetric pattern with input parameter hashing.

Problem Statement

Issue #5470 identified that changes to eval output commands were not properly invalidating task cache, leading to incorrect cached results. The previous fix (commit b0fe0a9) was reverted because it included raw bash commands directly in the hash.

Key Improvements Over Reverted Approach

  • Semantic names: Uses nxf_out_eval_* parameter names instead of raw bash commands for better readability
  • Symmetric pattern: Follows the same name + value approach as input parameter hashing
  • Deterministic ordering: Sorts eval outputs by name for consistent hash generation across JVM runs
  • Testable architecture: Separates logic into dedicated computeEvalOutputsContent() method
  • Better documentation: Comprehensive comments explaining rationale and implementation

Breaking Change Notice

⚠️ BREAKING CHANGE: This change will invalidate existing task cache entries that use output eval parameters, requiring re-execution of those tasks.

The cache invalidation is intentional and necessary to ensure proper cache behavior when eval output definitions change.

Code Changes

  • Added eval output hash calculation in TaskProcessor.createTaskHashKey()
  • Implemented computeEvalOutputsContent() method with sorting and proper formatting
  • Added comprehensive unit tests for deterministic behavior verification

Fixes #5470

🤖 Generated with Claude Code

@netlify
Copy link

netlify bot commented Aug 15, 2025

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit f1aa2d0
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/689ef21286be460008a77590

… of raw commands

This commit fixes issue #5470 by implementing a more robust approach to
including eval output commands in task hash calculation. Instead of using
raw command strings directly, we now use semantic parameter names paired
with command values, creating a symmetric pattern with input parameter hashing.

Key improvements over the reverted approach (b0fe0a9):
- Uses semantic names (nxf_out_eval_*) instead of raw bash commands for better readability
- Maintains deterministic ordering through sorting for cache consistency
- Follows the same name+value pattern as input parameters for symmetry
- Separates hash computation logic into testable computeEvalOutputsContent() method
- Provides comprehensive comments explaining the rationale

BREAKING CHANGE: This change will invalidate existing task cache entries
that use output eval parameters, requiring re-execution of those tasks.
The cache invalidation is intentional and necessary to ensure proper
cache behavior when eval output definitions change.

Fixes #5470

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@pditommaso pditommaso force-pushed the fix-eval-outputs-hash-semantic branch from a147370 to f1aa2d0 Compare August 15, 2025 08:38
@pditommaso pditommaso merged commit d86be1a into master Aug 15, 2025
22 checks passed
@pditommaso pditommaso deleted the fix-eval-outputs-hash-semantic branch August 15, 2025 09:27
// Without sorting, HashMap iteration order can vary between executions, leading to
// different cache keys for identical eval output configurations and causing
// unnecessary cache misses and task re-execution
final sortedEntries = outEvals.entrySet().sort { a, b -> a.key.compareTo(b.key) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pditommaso instead of sorting the entries, it would be better to hash them as an unordered collection (e.g. ArrayBag)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the difference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would just be a more efficient hash calculation since you wouldn't need to sort the entries. Like we did with the directory hash

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the average number of entries (< 10) not sure it's something it could have an impact, but not against that. Feel free to make a PR if you think it's important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pipeline uses cached process despite edited eval script

2 participants