
Add opt-in patch for unordered directory hashing #6353

Merged
pditommaso merged 4 commits into STABLE-25.04.x from patch-unordered-dir-hashing on Sep 3, 2025

Conversation

pditommaso (Member) commented Aug 20, 2025

Summary

This PR implements improved order-independent hashing for directories and unordered collections, controlled by the NXF_PATCH_DIRECTORY_HASH environment variable.

Key Features

  • Order-independent directory hashing: Directory traversal order no longer affects hash values
  • Consistent unordered collection hashing: Sets and Bags produce consistent hashes regardless of internal ordering
  • Backward compatible: Feature is disabled by default, enabled via NXF_PATCH_DIRECTORY_HASH=true
  • Addresses edge cases: Fixes an issue where directories with similar contents could produce the same hash (relates to #6198, "Add additional test for hasher")

Technical Implementation

  • Uses commutative byte addition for order independence in both directory and collection hashing
  • Implements separate "patched" methods: hashDirSha256Patched() and hashUnorderedCollectionPatched()
  • Maintains full backward compatibility - default behavior unchanged
  • Added comprehensive tests covering both default and patched behaviors
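
The commutative byte addition described above can be illustrated with a minimal Python sketch. This is not the Nextflow implementation behind hashDirSha256Patched() or hashUnorderedCollectionPatched(); the helper names `entry_digest`, `combine_unordered`, and `hash_entries` are hypothetical, and hashing each entry as "path, separator, content" is an assumption made only for illustration.

```python
import hashlib

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def entry_digest(relative_path: str, content: bytes) -> bytes:
    """Hash one directory entry (assumed scheme: path, NUL separator, content)."""
    h = hashlib.sha256()
    h.update(relative_path.encode("utf-8"))
    h.update(b"\x00")
    h.update(content)
    return h.digest()


def combine_unordered(digests) -> bytes:
    """Add digests byte by byte, modulo 256.

    Addition is commutative and associative, so the result does not
    depend on the order in which the digests arrive.
    """
    acc = [0] * DIGEST_SIZE
    for d in digests:
        for i, b in enumerate(d):
            acc[i] = (acc[i] + b) % 256
    return bytes(acc)


def hash_entries(entries) -> str:
    """entries: iterable of (relative_path, content) pairs, in any order."""
    return combine_unordered(entry_digest(p, c) for p, c in entries).hex()


files = [("a.txt", b"alpha"), ("sub/b.txt", b"beta"), ("c.txt", b"gamma")]

# Same entries, different traversal order: the hashes match.
order_independent = hash_entries(files) == hash_entries(reversed(files))

# Changing one file's content changes the combined hash.
modified = [("a.txt", b"ALPHA")] + files[1:]
content_sensitive = hash_entries(files) != hash_entries(modified)
```

Unlike XOR, byte-wise addition does not cancel out when the same digest appears twice, which matters for collections such as Bags that may hold duplicate elements.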

Usage

To enable the patch:

export NXF_PATCH_DIRECTORY_HASH=true
nextflow run your-pipeline.nf
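
For illustration, an opt-in switch of this kind might be read as in the Python sketch below. The function name `patch_enabled` and the parsing rule (only the string "true", case-insensitive, enables the patch) are assumptions; the PR states only that the feature is disabled by default and enabled with NXF_PATCH_DIRECTORY_HASH=true.

```python
import os


def patch_enabled(env=None) -> bool:
    """Return True only when NXF_PATCH_DIRECTORY_HASH is explicitly 'true'.

    The parsing rule is an assumption for illustration: an unset variable
    or any other value keeps the default (unpatched) behaviour.
    """
    env = os.environ if env is None else env
    value = env.get("NXF_PATCH_DIRECTORY_HASH", "false")
    return value.strip().lower() == "true"
```

Passing a plain dictionary instead of the process environment keeps the check easy to test without modifying global state.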

Test Coverage

  • ✅ Original tests continue to pass (backward compatibility)
  • ✅ New tests verify order independence for directories with same content but different creation order
  • ✅ Tests verify different content yields different hashes (no false positives)
  • ✅ Comprehensive SysEnv testing for environment variable control

🤖 Generated with Claude Code

This implements improved order-independent hashing for directories and
unordered collections, controlled by the NXF_PATCH_UNORDERED_DIR environment
variable.

Key improvements:
- Directory traversal order no longer affects hash values
- Unordered collections (Sets, Bags) produce consistent hashes
- Addresses edge cases with similar directory contents (fixes #6198)
- Uses commutative byte addition for order independence
- Maintains backward compatibility (disabled by default)

The patch can be enabled by setting NXF_PATCH_UNORDERED_DIR=true.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@pditommaso pditommaso requested a review from bentsherman August 20, 2025 09:55
bentsherman (Member) commented

This seems out of scope for a backport

bentsherman (Member) left a comment

Well, it's a hidden environment var and it's only on 25.04 so it won't clutter the code or docs on master. My only suggestion is to improve the name: NXF_PATCH_DIRECTORY_HASH

bentsherman (Member) commented

Don't forget to backport #6313 while you're at it

pditommaso and others added 3 commits September 3, 2025 11:01
…PATCH_DIRECTORY_HASH

Rename the environment variable to better reflect its purpose for directory hashing configuration.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@pditommaso pditommaso merged commit 1f1e9d4 into STABLE-25.04.x Sep 3, 2025
18 checks passed
@pditommaso pditommaso deleted the patch-unordered-dir-hashing branch September 3, 2025 10:32
@bentsherman bentsherman mentioned this pull request Mar 11, 2026
2 participants