feat: Enable saving and reloading IOCache object#3386
feat: Enable saving and reloading IOCache object#3386johanneskoester merged 19 commits intosnakemake:mainfrom
Conversation
📝 WalkthroughWalkthroughThis pull request introduces a new parameter, Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant CLI as Command Line Interface
participant API as Workflow API
participant DAG as DAG
participant P as Persistence
U->>CLI: Execute command with --trust-io-cache
CLI->>API: Parse args (trust_io_cache=True) and call args_to_api
API->>DAG: Initialize DAG with trust_io_cache flag
alt Trust Cache
DAG->>P: Load IOCache (if trust_io_cache is True)
end
alt Dry-run Execution
DAG->>P: Save IOCache after execution
else Non-dry-run Execution
DAG->>P: Drop outdated IOCache
end
Possibly related PRs
Suggested reviewers
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Path-based instructions (1)`**/*.py`: Do not try to improve formatting. Do not suggest ...
🔇 Additional comments (9)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
snakemake/cli.py (2)
862-2192: Consider adding test coverage for the mtime cache functionalityThis PR introduces the first component of dry-run caching, but there are no tests included yet. As mentioned in the PR description, you're running local tests using specific test files. Consider formalizing these into proper test cases to ensure the functionality works as expected.
I could help draft a test case that validates the behavior of the
--trust-mtime-cacheflag, ensuring that it properly reloads the cache when specified and ignores it when not specified. Would that be helpful?🧰 Tools
🪛 Ruff (0.8.2)
1880-1880: Do not use bare
except(E722)
862-868: Document the relationship between--dry-runand--trust-mtime-cacheSince this flag is related to reloading the cache from a previous dry-run, it would be helpful to clarify in the help text how this flag interacts with the
--dry-runflag and what the expected workflow is (first run with--dry-run, then run with--trust-mtime-cache).- "Reload the mtime cache from a previous dry-run, if it exists." + "Reload the mtime cache from a previous dry-run, if it exists. " + "This flag is meant to be used in a workflow where you first run with --dry-run " + "to populate the cache, then run again with this flag to utilize the cached information."snakemake/dag.py (1)
236-246: Logic for persisting or dropping the iocache is well structuredThe implementation properly handles both cases for dry runs and non-dry runs. For dry runs, the cache is saved for future use, while for regular runs it's dropped to prevent using potentially stale data.
One minor suggestion - the TODO comment above this block could be removed or refined since the implementation is now complete.
-# TODO: The above is where the inventory got built, so -# this is where we have a built inventory and we're ready -# to save it or reload it? if self.workflow.dryrun: # The iocache is now up-to-date and can be persisted for future # non-dry-runs. self.workflow.persistence.save_iocache() else: # The iocache is now up-to-date, but it's not a dry run, # so we shouldn't trust the previously persisted version. self.workflow.persistence.drop_iocache()snakemake/io.py (1)
143-151: Clean implementation of serialization methodsThe
saveandloadmethods provide a straightforward API for serializing and deserializing the IOCache. Using_simple_copybefore dumping ensures proper serialization.As a consideration for future improvement, you might want to handle potential pickle errors. However, this is not critical for the current implementation.
def save(self, handle): """Dump IOCache to file.""" simplified = self._simple_copy() - pickle.dump(simplified, handle) + try: + pickle.dump(simplified, handle) + except (pickle.PickleError, TypeError) as e: + raise WorkflowError(f"Failed to save IOCache: {e}") @classmethod def load(cls, handle): """Load an IOCache from file.""" - return pickle.load(handle) + try: + return pickle.load(handle) + except (pickle.PickleError, TypeError, EOFError) as e: + raise WorkflowError(f"Failed to load IOCache: {e}")snakemake/persistence.py (1)
744-750: Implementation of IOCache loading with minor issueThe
load_iocachemethod correctly loads the previously saved IOCache object when the file exists. However, there's an unnecessary 'f' prefix in the logging statement on line 747 since it doesn't contain any placeholders.- logger.info(f"Loading trusted IOCache from latest dry-run.") + logger.info("Loading trusted IOCache from latest dry-run.")🧰 Tools
🪛 Ruff (0.8.2)
747-747: f-string without any placeholders
Remove extraneous
fprefix(F541)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
snakemake/cli.py(2 hunks)snakemake/dag.py(2 hunks)snakemake/io.py(2 hunks)snakemake/persistence.py(4 hunks)snakemake/settings/types.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/settings/types.pysnakemake/cli.pysnakemake/dag.pysnakemake/io.pysnakemake/persistence.py
🪛 GitHub Actions: CI
snakemake/cli.py
[error] 1-1: Black formatting check failed. 3 files would be reformatted, 140 files would be left unchanged.
snakemake/io.py
[error] 1-1: Black formatting check failed. 3 files would be reformatted, 140 files would be left unchanged.
snakemake/persistence.py
[error] 1-1: Black formatting check failed. 3 files would be reformatted, 140 files would be left unchanged.
🪛 Ruff (0.8.2)
snakemake/persistence.py
747-747: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (12)
snakemake/settings/types.py (1)
120-120: Looks good: New attribute for mtime cache settingsThe new
trust_mtime_cacheboolean attribute is properly added to theExecutionSettingsclass with a sensible default value ofFalse. The attribute follows the existing naming conventions.snakemake/cli.py (2)
862-868: Looks good: CLI argument added for controlling mtime cache behaviorThe new
--trust-mtime-cacheargument is added to the appropriate argument group with a clear help message explaining its purpose.
2192-2192: Looks good: Properly passing CLI argument to execution settingsThe CLI argument is correctly passed to the
ExecutionSettingsconstructor, ensuring the user's preference is properly propagated through the system.snakemake/dag.py (1)
196-199: Implementation looks good for loading cached IO dataThe code correctly checks for the
trust_mtime_cachesetting before loading the persisted iocache. The comments are clear about the purpose of this code section.snakemake/io.py (3)
25-25: Import addition is appropriateAdding the pickle import is necessary for the new serialization functionality.
120-141: Well-implemented internal helper method for serializationThe
_simple_copymethod is well designed for creating a serializable copy of the IOCache instance. The conversion of dictionary keys to strings is a good approach to ensure serialization compatibility.The method handles both direct attributes and dictionaries appropriately, with special handling for
ExistsDictinstances.
1-1:Details
❓ Verification inconclusive
Black formatting issues should be addressed
The pipeline failure indicates that the Black formatting check failed with "3 files would be reformatted".
🏁 Script executed:
#!/bin/bash # Run black to see which files need formatting black --check --diff .Length of output: 120
Please verify the Black formatter installation in the pipeline environment
It appears that the error from running the formatting check is due to Black not being found ("black: command not found"). This prevents us from confirming whether the reported formatting issues (i.e., "3 files would be reformatted") are genuinely affecting the codebase or are a result of an unavailable tool. Please ensure that Black is installed (for example, via pip) and available in your PATH, then re-run the formatting check. Once Black is properly installed, verify that its output aligns with the pipeline failure message and address any formatting issues if they persist.
- Verify that Black is installed and accessible in your CI/local environment.
- Re-run the formatting check with
black --check --diff .to confirm which files need reformatting.🧰 Tools
🪛 GitHub Actions: CI
[error] 1-1: Black formatting check failed. 3 files would be reformatted, 140 files would be left unchanged.
snakemake/persistence.py (5)
31-31: Import added for IOCache functionalityThe IOCache import has been appropriately added to support the new cache persistence functionality.
80-81: Storage path for IOCache properly definedThe
iocache_pathattribute is correctly defined to store cache files within the .snakemake directory, following the same pattern as other persistence paths.
124-124: Directory creation for IOCache pathThe
iocache_pathis properly included in the list of directories to create during initialization, ensuring the cache directory exists when needed.
739-743: Implementation of IOCache savingThe
save_iocachemethod correctly serializes the workflow's IOCache object to a file named "latest.pkl". The implementation follows best practices by using the appropriate file mode ('wb') for binary pickle data.
751-755: Implementation of IOCache droppingThe
drop_iocachemethod correctly removes the cache file when it exists. The implementation properly checks for file existence before attempting removal, avoiding potential errors.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
snakemake/io.py (1)
120-141: Well-structured implementation for creating a simplified copy.The method creates a simplified copy of IOCache where dictionary keys are converted to strings, which is essential for reliable serialization. This handles both non-dictionary attributes and dictionary attributes appropriately.
However, consider adding a comment about why string conversion is necessary for serialization/deserialization purposes, and potentially handle any unexpected errors during the copying process.
def _simple_copy(self): - """Identical copy except dictionary keys are downcast to str.""" + """Identical copy except dictionary keys are downcast to str. + + This conversion is necessary for reliable pickle serialization/deserialization + because complex keys such as _IOFile objects might not be properly reconstructed. + """ simplified = type(self)(self.max_wait_time) # Copy non-dictionary attributes the same. for name in ['remaining_wait_time', 'active']: old_attr = getattr(self, name) setattr(simplified, name, old_attr) # Copy dictionary attributes, casting keys to str. for name in ['_mtime', '_exists_local', '_exists_in_storage', '_size']: old_dict = getattr(self, name) if isinstance(old_dict, ExistsDict): setattr(simplified, name, ExistsDict(simplified)) else: setattr(simplified, name, {}) new_dict = getattr(simplified, name) for key in old_dict: new_dict[str(key)] = old_dict[key] return simplified
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
snakemake/cli.py(2 hunks)snakemake/dag.py(2 hunks)snakemake/io.py(2 hunks)snakemake/persistence.py(4 hunks)snakemake/persistence.py(1 hunks)snakemake/dag.py(0 hunks)snakemake/cli.py(1 hunks)snakemake/io.py(1 hunks)snakemake/persistence.py(1 hunks)snakemake/persistence.py(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- snakemake/io.py
- snakemake/dag.py
- snakemake/persistence.py
🚧 Files skipped from review as they are similar to previous changes (5)
- snakemake/persistence.py
- snakemake/persistence.py
- snakemake/cli.py
- snakemake/dag.py
- snakemake/cli.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/persistence.pysnakemake/io.py
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: testing (10, 3.12, bash)
- GitHub Check: testing (10, 3.11, bash)
- GitHub Check: testing-windows (10)
- GitHub Check: testing-windows (9)
- GitHub Check: testing-windows (8)
- GitHub Check: testing-windows (7)
- GitHub Check: testing-windows (6)
- GitHub Check: testing-windows (5)
- GitHub Check: testing-windows (4)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.12, bash)
- GitHub Check: testing-windows (2)
- GitHub Check: testing (1, 3.11, bash)
- GitHub Check: testing-windows (1)
🔇 Additional comments (9)
snakemake/io.py (3)
25-25: Good addition of the pickle import.This import is necessary for the new serialization/deserialization functionality you're adding to the IOCache class.
143-147: Straightforward implementation of saving mechanism.This method properly serializes the IOCache object by first creating a simplified copy and then using pickle to dump it to the file.
148-151: Clean implementation of loading mechanism.The class method appropriately uses pickle to load an IOCache instance from a file. Since this is a class method, it can be called without an existing instance.
snakemake/persistence.py (6)
31-31: Good addition of IOCache import.This import is necessary for the new functionality to save and load IOCache objects.
80-81: Well-placed attribute for IOCache path.The
iocache_pathattribute follows the established pattern for path attributes in the Persistence class.
124-124: Properly included iocache_path in directories to create.This ensures the directory exists when the Persistence class is initialized, following the pattern used for other directories.
739-743: Clean implementation of IOCache saving.The method correctly opens the file in binary write mode and delegates to the IOCache's save method.
744-750: Well-implemented IOCache loading with appropriate checks.The method checks if the file exists before attempting to load it and provides a helpful log message about loading from the trusted cache.
751-757: Simple and effective cache cleanup method.The method checks if the file exists before attempting to remove it, preventing potential errors.
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🧹 Nitpick comments (2)
snakemake/cli.py (1)
862-2192: Add unit test for mtime cache functionalityThe PR description mentions manual testing with various test files, but there doesn't appear to be an automated test to ensure the mtime cache reloading works correctly.
Consider adding a unit test that validates the following flow:
- Run with
--dry-runto generate the mtime cache- Verify the cache is created
- Run with
--trust-mtime-cacheand verify the cache is properly loaded- Test both CLI entry points (behavior and execution groups)
This will help ensure the functionality works as expected and prevent regressions in future changes.
🧰 Tools
🪛 Ruff (0.8.2)
1878-1878: Do not use bare
except(E722)
snakemake/dag.py (1)
236-247: Confirm the intended behavior when dropping the existing I/O cache after a real run.Dropping the cache on real runs may discard useful intermediate data. Consider whether you want to preserve the orchestration data for subsequent runs or partial restarts.
🛑 Comments failed to post (2)
snakemake/cli.py (2)
862-868:
⚠️ Potential issueFlag duplication detected:
--trust-mtime-cacheappears twice in the CLIThis adds
--trust-mtime-cacheto the execution group, but the same flag is already defined in the behavior group at lines 1303-1306 with identical help text. Having the same CLI flag defined in two different argument groups can cause confusion and potential issues with precedence.Consider removing one of these duplicated definitions, likely keeping only the behavior group one (lines 1303-1306) since that aligns with your PR objective of moving this setting from execution parameters to the behavior group.
2192-2192:
⚠️ Potential issueRemove or update setting source based on flag relocation
This line is still passing
trust_mtime_cacheto theExecutionSettingsconstructor, but according to the PR description and other changes, this setting is being moved toDAGSettings(as seen at line 2103).This creates inconsistent behavior where the flag is passed to both settings objects. You should remove this parameter from the
ExecutionSettingsconstructor to complete the migration.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
snakemake/cli.py (1)
1311-1315: Add a brief clarification to the help text.While this CLI flag is straightforward, consider expanding the help message to highlight any consequences of trusting potentially outdated mtime data. This can help clarify scenarios where relying on an old cache might lead to skipped runs.
- help=("Reload the mtime cache from a previous dry-run, if it exists."), + help=( + "Reload the mtime cache from a previous dry-run, if it exists. " + "Note that this may cause older file modifications to be ignored if the cache is out of date." + ),snakemake/persistence.py (3)
747-750: Property _iocache_filename is well-defined.Storing the IOCache in a single file (
latest.pkl) is straightforward. However, be mindful of potential concurrency issues if multiple processes attempt to read or write simultaneously.
751-755: Automatic saving of IOCache is implemented cleanly.Consider wrapping the file writing in a try-except block for robust error handling and potential partial writes on network/global filesystems.
def save_iocache(self): filepath = self._iocache_filename try: with open(filepath, "wb") as handle: self.dag.workflow.iocache.save(handle) + except OSError as e: + logger.error(f"Failed to write IOCache: {e}")
756-762: Loading IOCache gracefully.The load routine is straightforward. Consider aligning its error handling with
_read_record_uncached(e.g., handling file corruption) in case partial files are encountered.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
snakemake/cli.py(2 hunks)snakemake/dag.py(2 hunks)snakemake/persistence.py(4 hunks)snakemake/settings/types.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- snakemake/settings/types.py
- snakemake/dag.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/cli.pysnakemake/persistence.py
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: testing (10, 3.12, bash)
- GitHub Check: testing (10, 3.11, bash)
- GitHub Check: testing (9, 3.12, bash)
- GitHub Check: testing (9, 3.11, bash)
- GitHub Check: testing (8, 3.12, bash)
- GitHub Check: testing (8, 3.11, bash)
- GitHub Check: testing (7, 3.12, bash)
- GitHub Check: testing (7, 3.11, bash)
- GitHub Check: testing (6, 3.12, bash)
- GitHub Check: testing (6, 3.11, bash)
- GitHub Check: testing (5, 3.12, bash)
- GitHub Check: testing (5, 3.11, bash)
- GitHub Check: testing-windows (10)
- GitHub Check: testing (4, 3.12, bash)
- GitHub Check: testing-windows (9)
- GitHub Check: testing-windows (8)
- GitHub Check: testing (4, 3.11, bash)
- GitHub Check: testing-windows (7)
- GitHub Check: testing (3, 3.12, bash)
- GitHub Check: testing-windows (6)
- GitHub Check: testing (3, 3.11, bash)
- GitHub Check: testing-windows (5)
- GitHub Check: testing (2, 3.12, bash)
- GitHub Check: testing-windows (4)
- GitHub Check: testing (2, 3.11, bash)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.12, bash)
- GitHub Check: testing-windows (2)
- GitHub Check: testing (1, 3.11, bash)
- GitHub Check: testing-windows (1)
🔇 Additional comments (4)
snakemake/cli.py (1)
2112-2113: Usage of trust_mtime_cache in DAGSettings looks correct.Passing the new CLI argument into
DAGSettingsis consistent and ensures the workflow logic can properly handle the trusted cache scenario. No further action needed.snakemake/persistence.py (3)
31-31: The addition of IOCache import is appropriate.Importing
IOCachefor the new caching functionality is coherent with the rest of the changes and does not introduce any obvious issues.
80-81: Clear separation of iocache path.Defining a separate path for IOCache files improves organization and clarity. No concerns here.
124-125: Ensuring iocache_path directory creation.Including
self.iocache_pathin the list of directories foros.makedirsensures the path is always available before saving or loading. This looks good.
|
I'm finding it challenging to devise a good benchmark to show if/how this speeds up workflow execution. Snakemake doesn't currently have any instrumentation for DAG building time, so it's unclear how much of that is mtime/existence checks. However, theoretically this patch should lower that quite a bit on slow filesystems. Indeed, while I don't yet have anything reproducible, benchmarking on one of my own projects with a very complex workflow—10s of thousands of files on a network filesystem; dozens of rules; many of those files already exist—does show a fairly sizable speed-up:
|
|
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
snakemake/cli.py(2 hunks)tests/tests.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
tests/tests.pysnakemake/cli.py
🪛 GitHub Actions: CI
tests/tests.py
[error] 2074-2074: Black formatting check failed. The file would be reformatted due to missing newline at end of file.
🔇 Additional comments (4)
snakemake/cli.py (2)
1273-1277: Implementation of--trust-mtime-cacheflag looks good.The new command-line flag is appropriately placed in the behavior group with clear help text explaining its purpose.
2060-2060: Correctly passingtrust_mtime_cacheto DAGSettings.The flag value is correctly passed from the parsed arguments to the DAG settings object.
tests/tests.py (2)
2056-2069: Good test coverage through parameterization.The parameterized test provides good coverage across multiple test directories. The commented-out tests with TODOs help track platform-specific issues that need addressing in the future.
2070-2074: Test correctly verifies the--trust-mtime-cachefunctionality.This test effectively validates that a workflow can run with cached file modification times from a previous dry run. It's a good approach to verify the expected behavior of the new feature.
🧰 Tools
🪛 GitHub Actions: CI
[error] 2074-2074: Black formatting check failed. The file would be reformatted due to missing newline at end of file.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/tests.py (1)
2062-2081: Good test implementation, but some test cases are commented outThe parameterized test implementation is well-structured to verify the new mtime caching functionality. It first executes a dry-run, then uses that cached information in a subsequent run with
trust_mtime_cache=True.However, several test cases are commented out (lines 2065, 2069, 2073). If these test cases are important, consider either uncommented them or adding TODOs explaining why they're disabled and when they might be enabled.
src/snakemake/io.py (1)
147-150: Good implementation of load class method.The classmethod appropriately loads an IOCache object from a file handle using pickle.
Consider adding error handling for potential pickle load failures, as serialized data could become corrupt or incompatible between versions.
@classmethod def load(cls, handle): """Load an IOCache from file.""" - return pickle.load(handle) + try: + return pickle.load(handle) + except (pickle.UnpicklingError, EOFError, AttributeError) as e: + logger.warning(f"Failed to load IOCache: {e}") + return cls(0) # Return a new cache with default max_wait_time=0
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/snakemake/cli.py(2 hunks)src/snakemake/dag.py(2 hunks)src/snakemake/io.py(2 hunks)src/snakemake/persistence.py(4 hunks)src/snakemake/settings/types.py(1 hunks)tests/tests.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
tests/tests.pysrc/snakemake/persistence.pysrc/snakemake/dag.pysrc/snakemake/cli.pysrc/snakemake/settings/types.pysrc/snakemake/io.py
🪛 GitHub Actions: docs
src/snakemake/cli.py
[error] 1293-1293: SyntaxError: positional argument follows keyword argument
⏰ Context from checks skipped due to timeout of 90000ms (38)
- GitHub Check: tests (10, windows-latest, py312, bash)
- GitHub Check: tests (10, windows-latest, py311, bash)
- GitHub Check: tests (10, ubuntu-latest, py312, bash)
- GitHub Check: tests (10, ubuntu-latest, py311, bash)
- GitHub Check: tests (9, windows-latest, py312, bash)
- GitHub Check: tests (9, windows-latest, py311, bash)
- GitHub Check: tests (9, ubuntu-latest, py312, bash)
- GitHub Check: tests (9, ubuntu-latest, py311, bash)
- GitHub Check: tests (8, windows-latest, py312, bash)
- GitHub Check: tests (8, windows-latest, py311, bash)
- GitHub Check: tests (7, windows-latest, py312, bash)
- GitHub Check: tests (7, windows-latest, py311, bash)
- GitHub Check: tests (7, ubuntu-latest, py312, bash)
- GitHub Check: tests (7, ubuntu-latest, py311, bash)
- GitHub Check: tests (6, windows-latest, py312, bash)
- GitHub Check: tests (6, windows-latest, py311, bash)
- GitHub Check: tests (6, ubuntu-latest, py312, bash)
- GitHub Check: tests (6, ubuntu-latest, py311, bash)
- GitHub Check: tests (5, windows-latest, py312, bash)
- GitHub Check: tests (5, windows-latest, py311, bash)
- GitHub Check: tests (5, ubuntu-latest, py312, bash)
- GitHub Check: tests (5, ubuntu-latest, py311, bash)
- GitHub Check: tests (4, windows-latest, py312, bash)
- GitHub Check: tests (4, windows-latest, py311, bash)
- GitHub Check: tests (4, ubuntu-latest, py312, bash)
- GitHub Check: tests (4, ubuntu-latest, py311, bash)
- GitHub Check: tests (3, windows-latest, py312, bash)
- GitHub Check: tests (3, windows-latest, py311, bash)
- GitHub Check: tests (3, ubuntu-latest, py312, bash)
- GitHub Check: tests (3, ubuntu-latest, py311, bash)
- GitHub Check: tests (2, windows-latest, py312, bash)
- GitHub Check: tests (2, windows-latest, py311, bash)
- GitHub Check: tests (2, ubuntu-latest, py312, bash)
- GitHub Check: tests (2, ubuntu-latest, py311, bash)
- GitHub Check: tests (1, windows-latest, py312, bash)
- GitHub Check: tests (1, windows-latest, py311, bash)
- GitHub Check: tests (1, ubuntu-latest, py312, bash)
- GitHub Check: tests (1, ubuntu-latest, py311, bash)
🔇 Additional comments (14)
src/snakemake/settings/types.py (1)
210-210: New configuration parameter added for DAG settingsThe
trust_mtime_cacheparameter is a good addition that enables reusing the modification time cache from a previous dry-run. This aligns well with the feature described in the PR objectives and should help improve performance for workflow executions, especially on slower filesystems.src/snakemake/cli.py (1)
2035-2035: Proper integration with the DAG settingsThe
trust_mtime_cacheparameter is correctly passed to the DAG settings, ensuring the feature is available through the CLI interface.src/snakemake/persistence.py (7)
31-31: Update to IOCache import looks good.This import is needed for the new functionality to save and load the IOCache state.
80-81: Good addition of iocache_path.The path is properly defined using the standardized approach consistent with other paths in the class.
124-124: Proper directory creation ensured.Adding the iocache_path to the list of directories to create ensures the path exists, consistent with other paths in the class.
752-754: Properly implemented _iocache_filename property.The property returns the path to the pickle file using a consistent naming convention.
756-759: Well-implemented save_iocache method.The method correctly opens a file for binary writing and calls the IOCache.save method to serialize the cache.
761-766: Good implementation of load_iocache method.The method checks if the cache file exists, logs the action, and loads the cache using the IOCache.load method.
768-771: Properly implemented drop_iocache method.The method checks if the cache file exists before attempting to remove it, preventing potential errors.
src/snakemake/io.py (3)
25-25: Added pickle import for serialization.This is necessary for the new serialization functionality.
120-140: Well-implemented _simple_copy method for serialization.The method creates a simplified copy of the IOCache object with string keys, which is important for serialization compatibility. The implementation handles both simple attributes and dictionary attributes appropriately, with special handling for ExistsDict objects.
142-145: Clean implementation of save method.The method properly creates a simplified copy of the object before serializing it using pickle, ensuring the saved state can be loaded correctly later.
src/snakemake/dag.py (2)
240-248: Cache persistence logic looks correct.The conditional logic for saving or dropping the I/O cache based on whether it's a dry run is well-implemented. Saving the cache after a dry run while dropping it during regular runs ensures cache consistency. The comments clearly explain the rationale behind each operation.
200-204:Details
❓ Verification inconclusive
Adding IOCache loading looks good, but validate the feature with tests.
The implementation for loading the I/O cache when
trust_mtime_cacheis enabled makes logical sense. It's properly placed at the beginning of the DAG initialization to ensure cache data is available before processing.Since this feature is crucial for performance improvements (especially on slower filesystems as mentioned in the PR), ensure it's covered by tests:
🏁 Script executed:
#!/bin/bash # Check if there are any test files that verify the iocache loading functionality # Search for test files related to IOCache rg -l "trust_mtime_cache" tests/Length of output: 32
Action Required: Add Tests for IOCache Loading Functionality
The IOCache loading implementation in
src/snakemake/dag.py(lines 200–204) is logically sound and well-placed during DAG initialization. However, our search for tests referencingtrust_mtime_cachedid not return any results in thetests/directory, which raises concerns about test coverage for this performance-critical feature.
- Location:
src/snakemake/dag.py, lines 200–204- Concern: No automated tests appear to verify that IOCache loading works as intended when
trust_mtime_cacheis enabled.- Recommendation: Please add test cases in the
tests/directory to validate thatself.workflow.persistence.load_iocache()is correctly invoked and that the functionality maintains its performance benefits, especially on slower filesystems.
7c1baac to
2299398
Compare
2299398 to
ddaf65a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/tests.py (1)
2062-2081: Good test coverage for the new feature.The parametrized test appropriately validates the new functionality by:
- First running with
executor="dryrun"to create the cache- Then running a second time with
trust_mtime_cache=Trueto ensure the cache is usedThis effectively tests the end-to-end functionality that a user would experience.
Consider adding a comment explaining what these tests verify, especially since some test cases are commented out with TODOs.
@pytest.mark.parametrize( "testdir,kwargs", [ # ("test01", {}), # TODO: Figure out how to @skip_on_windows ("test02", {}), ("test03", {"targets": ["test.out"]}), ("test04", {"targets": ["test.out"]}), # ("test05", {}), # TODO: Figure out how to @skip_on_windows ("test06", {"targets": ["test.bla.out"]}), ("test07", {"targets": ["test.out", "test2.out"]}), ("test08", {"targets": ["test.out", "test2.out"]}), # ("test09", {"shouldfail": True}), # TODO: Figure out how to only fail in the second pass. ], ) +# This test validates the mtime cache functionality by first running in dry-run mode +# to generate the cache, then running with trust_mtime_cache=True to use that cache. +# This simulates the workflow described in PR #3386 for implementing dry-run caching. def test_inventory_cache_with_dryrun_first(testdir, kwargs): tmpdir = run( dpath(testdir), executor="dryrun", **kwargs, cleanup=False, check_results=False ) run(dpath(testdir), **kwargs, tmpdir=tmpdir, trust_mtime_cache=True)src/snakemake/io.py (2)
120-141: Good implementation of a simplified copy mechanism.The
_simple_copymethod properly handles creating a simplified version of the IOCache by converting dictionary keys to strings, which is necessary for reliable serialization. The method carefully handles each attribute type, with special handling forExistsDictinstances.However, this implementation lacks error handling for edge cases.
Consider adding error handling for unexpected attribute types or values:
def _simple_copy(self): """Identical copy except dictionary keys are downcast to str.""" simplified = type(self)(self.max_wait_time) # Copy non-dictionary attributes the same. for name in ["remaining_wait_time", "active"]: old_attr = getattr(self, name) setattr(simplified, name, old_attr) # Copy dictionary attributes, casting keys to str. for name in ["_mtime", "_exists_local", "_exists_in_storage", "_size"]: old_dict = getattr(self, name) + if not isinstance(old_dict, dict) and not isinstance(old_dict, ExistsDict): + logger.warning(f"Unexpected type for {name}, expected dict or ExistsDict. Skipping.") + continue if isinstance(old_dict, ExistsDict): setattr(simplified, name, ExistsDict(simplified)) else: setattr(simplified, name, {}) new_dict = getattr(simplified, name) for key in old_dict: new_dict[str(key)] = old_dict[key] return simplified
147-150: Missing error handling in pickle load operation.The
loadclass method is straightforward but lacks error handling for corrupt cache files or version incompatibility.Add error handling to gracefully handle loading failures:
@classmethod def load(cls, handle): """Load an IOCache from file.""" - return pickle.load(handle) + try: + return pickle.load(handle) + except (pickle.UnpicklingError, EOFError, AttributeError) as e: + logger.warning(f"Failed to load IOCache: {e}. Creating a new one.") + return cls(0) # Create a new cache with zero wait time to avoid using it
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/snakemake/cli.py(2 hunks)src/snakemake/dag.py(2 hunks)src/snakemake/io.py(2 hunks)src/snakemake/persistence.py(4 hunks)src/snakemake/settings/types.py(1 hunks)tests/tests.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/snakemake/settings/types.py
- src/snakemake/persistence.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
src/snakemake/cli.pytests/tests.pysrc/snakemake/io.pysrc/snakemake/dag.py
⏰ Context from checks skipped due to timeout of 90000ms (40)
- GitHub Check: tests (10, windows-latest, py312, bash)
- GitHub Check: tests (10, windows-latest, py311, bash)
- GitHub Check: tests (10, ubuntu-latest, py312, bash)
- GitHub Check: tests (10, ubuntu-latest, py311, bash)
- GitHub Check: tests (9, windows-latest, py312, bash)
- GitHub Check: tests (9, windows-latest, py311, bash)
- GitHub Check: tests (9, ubuntu-latest, py312, bash)
- GitHub Check: tests (9, ubuntu-latest, py311, bash)
- GitHub Check: tests (8, windows-latest, py312, bash)
- GitHub Check: tests (8, windows-latest, py311, bash)
- GitHub Check: tests (8, ubuntu-latest, py312, bash)
- GitHub Check: tests (8, ubuntu-latest, py311, bash)
- GitHub Check: tests (7, windows-latest, py312, bash)
- GitHub Check: tests (7, windows-latest, py311, bash)
- GitHub Check: tests (7, ubuntu-latest, py312, bash)
- GitHub Check: tests (7, ubuntu-latest, py311, bash)
- GitHub Check: tests (6, windows-latest, py312, bash)
- GitHub Check: tests (6, windows-latest, py311, bash)
- GitHub Check: tests (6, ubuntu-latest, py312, bash)
- GitHub Check: tests (6, ubuntu-latest, py311, bash)
- GitHub Check: tests (5, windows-latest, py312, bash)
- GitHub Check: tests (5, windows-latest, py311, bash)
- GitHub Check: tests (5, ubuntu-latest, py312, bash)
- GitHub Check: tests (5, ubuntu-latest, py311, bash)
- GitHub Check: tests (4, windows-latest, py312, bash)
- GitHub Check: tests (4, windows-latest, py311, bash)
- GitHub Check: tests (4, ubuntu-latest, py312, bash)
- GitHub Check: tests (4, ubuntu-latest, py311, bash)
- GitHub Check: tests (3, windows-latest, py312, bash)
- GitHub Check: tests (3, windows-latest, py311, bash)
- GitHub Check: tests (3, ubuntu-latest, py312, bash)
- GitHub Check: tests (3, ubuntu-latest, py311, bash)
- GitHub Check: tests (2, windows-latest, py312, bash)
- GitHub Check: tests (2, windows-latest, py311, bash)
- GitHub Check: tests (2, ubuntu-latest, py312, bash)
- GitHub Check: tests (2, ubuntu-latest, py311, bash)
- GitHub Check: tests (1, windows-latest, py312, bash)
- GitHub Check: tests (1, windows-latest, py311, bash)
- GitHub Check: tests (1, ubuntu-latest, py312, bash)
- GitHub Check: tests (1, ubuntu-latest, py311, bash)
🔇 Additional comments (6)
src/snakemake/cli.py (2)
1282-1286: LGTM: The new CLI argument looks good.The
--trust-mtime-cacheargument is well-defined with clear help text that explains its purpose. This enables the workflow to utilize the modification time cache from a previous dry run, which should improve performance as mentioned in the PR objectives.
2039-2039: Properly integrated with the API.The new parameter is correctly passed to the workflow API, ensuring the CLI argument will actually affect the workflow execution.
src/snakemake/io.py (2)
25-25: Imported pickle module for object serialization.This is appropriate for saving and loading the IOCache state between runs, supporting the dry-run caching functionality described in the PR objectives.
142-146: Simple and effective implementation of cache serialization.The
savemethod appropriately uses the_simple_copymethod to create a serializable version of the cache before pickling it to the provided file handle.src/snakemake/dag.py (2)
200-204: Well-designed I/O cache loading based on trust settingThis change adds functionality to load the I/O cache from a persisted version when the user explicitly trusts the cache, which is a good performance optimization especially for workflows with many files or on slower filesystems.
240-248: Appropriate I/O cache persistence logic based on execution modeThis code correctly handles cache persistence based on the workflow's execution mode:
- In dry-run mode: Persists the I/O cache for future non-dry runs
- In normal mode: Drops the persisted cache to avoid relying on potentially outdated information
This implementation aligns well with the PR objectives of enabling saving and reloading the IOCache object to improve performance across successive runs.
|
@johanneskoester ready for review |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/snakemake/dag.py (2)
201-211: Consider logging mismatched versions for easier debugging.While handling
IOCacheLoadError, you log an informational message but do not indicate which version was found. Including the runtime version from the loaded file can help diagnose caching issues more quickly.except IOCacheLoadError as e: logger.info( - "Most recently saved inventory has mismatched version. Removing." + f"Most recently saved inventory has mismatched version: {e}. Removing." ) self.workflow.persistence.drop_iocache()
246-255: Validate whether to keep the pre-existing cache for non-dry runs.Dropping the cache for real runs ensures a fresh start but might impact performance for large workflows. Consider adding a user-facing option or an explicit confirmation before discarding a valid cache.
src/snakemake/io.py (1)
150-154: Handle possible errors during serialization.No retries or error-handling logic is present if
pickle.dumpfails on the provided file handle (e.g., I/O issues). Logging or an exception message could enhance debuggability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/snakemake/dag.py(3 hunks)src/snakemake/io.py(3 hunks)tests/tests.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
tests/tests.pysrc/snakemake/dag.pysrc/snakemake/io.py
🔇 Additional comments (7)
tests/tests.py (3)
2062-2078: Well-structured parameterized test implementation.This is a good implementation of a parameterized test that verifies the new
trust_mtime_cachefunctionality across multiple test cases. The test first performs a dry run to generate the cache, then runs with the cache enabled to verify it works correctly.
2080-2093: Appropriate Windows-specific test handling.Good use of the
@skip_on_windowsdecorator for tests that aren't compatible with the Windows platform. The parameterization allows testing the cache functionality with different test cases that might have platform-specific behaviors.
2095-2098: Good negative test case implementation.This test properly verifies error handling when using the inventory cache. It ensures that when a test is expected to fail, it still fails appropriately even when using the cache from a previous dry run.
src/snakemake/dag.py (1)
65-65: No concerns on importingIOCacheLoadError.This import looks straightforward and is used correctly in subsequent code.
src/snakemake/io.py (3)
110-112: Exception class definition is fine.No issues found with the new
IOCacheLoadErrorclass.
128-129: Confirm if deeper nesting needs handling in_simple_copy.You currently only downcast the top-level keys to strings. If a nested dict has non-string keys, it remains untouched. Verify if deeper nesting is needed or acceptable.
25-25:Details
❓ Verification inconclusive
Be aware of
picklesecurity considerations.When using
pickle, loading data from untrusted sources can lead to code execution vulnerabilities. Ensure the cache file is controlled and not accessible to untrusted parties.Do you want me to provide a script to check for references to untrusted paths or confirm secure file permissions in the repository?
🏁 Script executed:
#!/bin/bash # Searching for suspicious usage of untrusted or unsafe file paths. rg -A 5 "pickle.load" . rg -A 5 "open(" .Length of output: 1009
Security Alert: Validate Use of Pickle with Trusted Data Sources
The code in
src/snakemake/io.py(line 25) usespickle.load(handle), and there’s also a usage ofpickle.loadsinsrc/snakemake/script/__init__.py. Since unverified pickle data can lead to arbitrary code execution, please ensure that:
- The cache file (or any file being loaded) is controlled and not accessible from untrusted sources.
- File access permissions and the construction of file paths (e.g., via
open(...)) are securely managed. Note that the earlier regex search foropen(produced an error—consider revising it (for example, usingrg -A 5 "open\(" .) and manually verifying file access patterns.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/snakemake/io.py (3)
128-149: Consider handling non-string objects in dictionaries more robustly.The
_simple_copymethod converts dictionary keys to strings, which is generally fine for serialization purposes. However, this approach may lead to key collisions if the original dictionaries contained different object types with the same string representation.Consider adding safeguards against potential key collisions:
# Copy dictionary attributes, casting keys to str. for name in ["_mtime", "_exists_local", "_exists_in_storage", "_size"]: old_dict = getattr(self, name) if isinstance(old_dict, ExistsDict): setattr(simplified, name, ExistsDict(simplified)) else: setattr(simplified, name, {}) new_dict = getattr(simplified, name) for key in old_dict: - new_dict[str(key)] = old_dict[key] + # Use a prefixing scheme if needed to prevent key collisions + key_str = str(key) + new_dict[key_str] = old_dict[key]
150-154: Add error handling for the save method.The
savemethod should handle potential pickling errors to provide more meaningful error messages to users.def save(self, handle): """Dump IOCache to file.""" simplified = self._simple_copy() - pickle.dump(simplified, handle) + try: + pickle.dump(simplified, handle) + except (TypeError, pickle.PicklingError) as e: + raise WorkflowError(f"Failed to save IOCache: {e}")
155-168: Add error handling for pickle.load in the load method.The
loadmethod needs to handle potential errors when unpickling, especially if the file is corrupted or from an incompatible version.@classmethod def load(cls, handle): """Load an IOCache from file.""" - loaded = pickle.load(handle) - if loaded.IOCACHE_VERSION != cls.IOCACHE_VERSION: - raise IOCacheLoadError( - ( - f"Trying to load IOCache object with a mismatched version: " - f"{loaded.IOCACHE_VERSION} (loaded) != {cls.IOCACHE_VERSION} (current)" - ) - ) - else: - return loaded + try: + loaded = pickle.load(handle) + if not hasattr(loaded, 'IOCACHE_VERSION'): + raise IOCacheLoadError("Invalid IOCache file format: missing version information") + if loaded.IOCACHE_VERSION != cls.IOCACHE_VERSION: + raise IOCacheLoadError( + ( + f"Trying to load IOCache object with a mismatched version: " + f"{loaded.IOCACHE_VERSION} (loaded) != {cls.IOCACHE_VERSION} (current)" + ) + ) + return loaded + except (pickle.UnpicklingError, AttributeError, EOFError) as e: + raise IOCacheLoadError(f"Failed to load IOCache: {e}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/snakemake/io.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
src/snakemake/io.py
🔇 Additional comments (3)
src/snakemake/io.py (3)
25-25: LGTM - Importing pickle for IOCache serialization.The
picklemodule is an appropriate choice for serializing the IOCache object to enable its persistence between runs.
110-112: LGTM - Clean exception class definition.The
IOCacheLoadErrorexception class is well-defined for handling specific errors related to loading IOCache objects.
115-117: LGTM - Good versioning implementation.Adding a version identifier to the IOCache class is essential for ensuring backward compatibility when loading cached objects. The comment clearly explains the purpose of this constant.
johanneskoester
left a comment
There was a problem hiding this comment.
Just one modification of the CLI interface (which should propagate to the API of course).
|
Fantastic work! |
🤖 I have created a release *beep* *boop* --- ## [9.0.0](v8.30.0...v9.0.0) (2025-03-14) ### ⚠ BREAKING CHANGES * Logging refactor & add LoggerPluginInterface ([#3107](#3107)) ### Features * [#3412](#3412) - keep shadow folder of failed job if --keep-incomplete flag is set. ([#3430](#3430)) ([22978c3](22978c3)) * add flag --report-after-run to automatically generate the report after a successfull workflow run ([#3428](#3428)) ([b0a7f03](b0a7f03)) * add flatten function to IO utils ([#3424](#3424)) ([67fa392](67fa392)) * add helper functions to parse input files ([#2918](#2918)) ([63e45a7](63e45a7)) * Add option to print redacted file names ([#3089](#3089)) ([ba4d264](ba4d264)) * add support for validation of polars dataframe and lazyframe ([#3262](#3262)) ([c7473a6](c7473a6)) * added support for rendering dag with mermaid js ([#3409](#3409)) ([7bf8381](7bf8381)) * adding --replace-workflow-config to fully replace workflow configs (from config: directive) with --configfile, instead of merging them ([#3381](#3381)) ([47504a0](47504a0)) * Dynamic module name ([#3401](#3401)) ([024dc32](024dc32)) * Enable saving and reloading IOCache object ([#3386](#3386)) ([c935953](c935953)) * files added in rule params with workflow.source_path will be available in used containers ([#3385](#3385)) ([a6e45bf](a6e45bf)) * Fix keep_local in storage directive and more freedom over remote retrieval behaviour ([#3410](#3410)) ([67b4739](67b4739)) * inherit parameters of use rule and extend/replace individual items them when using 'with' directive ([#3365](#3365)) ([93e4b92](93e4b92)) * Logging refactor & add LoggerPluginInterface ([#3107](#3107)) ([86f1d6e](86f1d6e)) * Maximal file size for checksums ([#3368](#3368)) ([b039f8a](b039f8a)) * Modernize package configuration using Pixi ([#3369](#3369)) ([77992d8](77992d8)) * multiext support for named input/output ([#3372](#3372)) ([05e1378](05e1378)) * optionally auto-group jobs via temp files in case of remote execution ([#3378](#3378)) ([cc9bba2](cc9bba2)) ### Bug Fixes * `--delete-all-output` ignores `--dry-run` ([#3265](#3265)) ([23fef82](23fef82)) * 3342 faster touch runs and warning messages for non-existing files ([#3398](#3398)) ([cd9c3c3](cd9c3c3)) * add default value to max-jobs-per-timespan ([#3043](#3043)) ([2959abe](2959abe)) * checkpoints inside modules are overwritten ([#3359](#3359)) ([fba3ac7](fba3ac7)) * Convert Path to IOFile ([#3405](#3405)) ([c58684c](c58684c)) * Do not perform storage object cleanup with --keep-storage-local-copies set ([#3358](#3358)) ([9a6d14b](9a6d14b)) * edgecases of source deployment in case of remote execution ([#3396](#3396)) ([5da13be](5da13be)) * enhance error message formatting for strict DAG-building mode ([#3376](#3376)) ([a1c39ee](a1c39ee)) * fix bug in checkpoint handling that led to exceptions in case checkpoint output was missing upon rerun ([#3423](#3423)) ([8cf4a2f](8cf4a2f)) * force check all required outputs ([#3341](#3341)) ([495a4e7](495a4e7)) * group job formatting ([#3442](#3442)) ([f0b10a3](f0b10a3)) * in remote jobs, upload storage in topological order such that modification dates are preserved (e.g. in case of group jobs) ([#3377](#3377)) ([eace08f](eace08f)) * only skip eval when resource depends on input ([#3374](#3374)) ([4574c92](4574c92)) * Prevent execution of conda in apptainer when not explicitly requested in software deployment method ([#3388](#3388)) ([c43c5c0](c43c5c0)) * print filenames with quotes around them in RuleException ([#3269](#3269)) ([6baeda5](6baeda5)) * Re-evaluation of free resources ([#3399](#3399)) ([6371293](6371293)) * ReadTheDocs layout issue due to src directory change ([#3419](#3419)) ([695b127](695b127)) * robustly escaping quotes in generated bash scripts (v2) ([#3297](#3297)) ([#3389](#3389)) ([58720bd](58720bd)) * Show apptainer image URL in snakemake report ([#3407](#3407)) ([45f0450](45f0450)) * Update ReadTheDocs configuration for documentation build to use Pixi ([#3433](#3433)) ([3f227a6](3f227a6)) ### Documentation * Add pixi setup instructions to general use tutorial ([#3382](#3382)) ([115e81b](115e81b)) * fix contribution section heading levels, fix docs testing setup order ([#3360](#3360)) ([051dc53](051dc53)) * fix link to github.com/snakemake/poetry-snakemake-plugin ([#3436](#3436)) ([ec6d97c](ec6d97c)) * fix quoting ([#3394](#3394)) ([b40f599](b40f599)) * fix rerun-triggers default ([#3403](#3403)) ([4430e23](4430e23)) * fix typo 'safe' -> 'save' ([#3384](#3384)) ([7755861](7755861)) * mention code formatting in the contribution section ([#3431](#3431)) ([e8682b7](e8682b7)) * remove duplicated 'functions'. ([#3356](#3356)) ([7c595db](7c595db)) * update broken links documentation ([#3437](#3437)) ([e3d0d88](e3d0d88)) * Updating contributing guidelines with new pixi dev setup ([#3415](#3415)) ([8e95a12](8e95a12)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: snakemake-bot <snakemake-bot-admin@googlegroups.com>
First component of dry-run caching, part of a larger plan described in issue snakemake#3330. ### QC * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. No test case yet. I'm testing locally using `tests/test01`, `02`, `05`, `10`, `11`, `12`, and `13`. first by running with `--dry-run` and then with `--trust-mtime-cache`. I'm open to any ideas about how to use the existing infrastructure to test these successive runs. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). I'll update the docs once the feature is ready for general use. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Introduced an option to optionally trust cached modification times, enhancing workflow efficiency during both standard and dry-run operations. - Enhanced caching persistence, enabling the saving, loading, and clearing of cached I/O data based on execution context. - **Tests** - Expanded test coverage with new parameterized tests to validate inventory caching across various configurations and conditions. - Added tests to ensure proper handling of inventory caching scenarios, including dry runs and expected failures. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
🤖 I have created a release *beep* *boop* --- ## [9.0.0](snakemake/snakemake@v8.30.0...v9.0.0) (2025-03-14) ### ⚠ BREAKING CHANGES * Logging refactor & add LoggerPluginInterface ([snakemake#3107](snakemake#3107)) ### Features * [snakemake#3412](snakemake#3412) - keep shadow folder of failed job if --keep-incomplete flag is set. ([snakemake#3430](snakemake#3430)) ([22978c3](snakemake@22978c3)) * add flag --report-after-run to automatically generate the report after a successfull workflow run ([snakemake#3428](snakemake#3428)) ([b0a7f03](snakemake@b0a7f03)) * add flatten function to IO utils ([snakemake#3424](snakemake#3424)) ([67fa392](snakemake@67fa392)) * add helper functions to parse input files ([snakemake#2918](snakemake#2918)) ([63e45a7](snakemake@63e45a7)) * Add option to print redacted file names ([snakemake#3089](snakemake#3089)) ([ba4d264](snakemake@ba4d264)) * add support for validation of polars dataframe and lazyframe ([snakemake#3262](snakemake#3262)) ([c7473a6](snakemake@c7473a6)) * added support for rendering dag with mermaid js ([snakemake#3409](snakemake#3409)) ([7bf8381](snakemake@7bf8381)) * adding --replace-workflow-config to fully replace workflow configs (from config: directive) with --configfile, instead of merging them ([snakemake#3381](snakemake#3381)) ([47504a0](snakemake@47504a0)) * Dynamic module name ([snakemake#3401](snakemake#3401)) ([024dc32](snakemake@024dc32)) * Enable saving and reloading IOCache object ([snakemake#3386](snakemake#3386)) ([c935953](snakemake@c935953)) * files added in rule params with workflow.source_path will be available in used containers ([snakemake#3385](snakemake#3385)) ([a6e45bf](snakemake@a6e45bf)) * Fix keep_local in storage directive and more freedom over remote retrieval behaviour ([snakemake#3410](snakemake#3410)) ([67b4739](snakemake@67b4739)) * inherit parameters of use rule and extend/replace individual items them when using 'with' directive ([snakemake#3365](snakemake#3365)) ([93e4b92](snakemake@93e4b92)) * Logging refactor & add LoggerPluginInterface ([snakemake#3107](snakemake#3107)) ([86f1d6e](snakemake@86f1d6e)) * Maximal file size for checksums ([snakemake#3368](snakemake#3368)) ([b039f8a](snakemake@b039f8a)) * Modernize package configuration using Pixi ([snakemake#3369](snakemake#3369)) ([77992d8](snakemake@77992d8)) * multiext support for named input/output ([snakemake#3372](snakemake#3372)) ([05e1378](snakemake@05e1378)) * optionally auto-group jobs via temp files in case of remote execution ([snakemake#3378](snakemake#3378)) ([cc9bba2](snakemake@cc9bba2)) ### Bug Fixes * `--delete-all-output` ignores `--dry-run` ([snakemake#3265](snakemake#3265)) ([23fef82](snakemake@23fef82)) * 3342 faster touch runs and warning messages for non-existing files ([snakemake#3398](snakemake#3398)) ([cd9c3c3](snakemake@cd9c3c3)) * add default value to max-jobs-per-timespan ([snakemake#3043](snakemake#3043)) ([2959abe](snakemake@2959abe)) * checkpoints inside modules are overwritten ([snakemake#3359](snakemake#3359)) ([fba3ac7](snakemake@fba3ac7)) * Convert Path to IOFile ([snakemake#3405](snakemake#3405)) ([c58684c](snakemake@c58684c)) * Do not perform storage object cleanup with --keep-storage-local-copies set ([snakemake#3358](snakemake#3358)) ([9a6d14b](snakemake@9a6d14b)) * edgecases of source deployment in case of remote execution ([snakemake#3396](snakemake#3396)) ([5da13be](snakemake@5da13be)) * enhance error message formatting for strict DAG-building mode ([snakemake#3376](snakemake#3376)) ([a1c39ee](snakemake@a1c39ee)) * fix bug in checkpoint handling that led to exceptions in case checkpoint output was missing upon rerun ([snakemake#3423](snakemake#3423)) ([8cf4a2f](snakemake@8cf4a2f)) * force check all required outputs ([snakemake#3341](snakemake#3341)) ([495a4e7](snakemake@495a4e7)) * group job formatting ([snakemake#3442](snakemake#3442)) ([f0b10a3](snakemake@f0b10a3)) * in remote jobs, upload storage in topological order such that modification dates are preserved (e.g. in case of group jobs) ([snakemake#3377](snakemake#3377)) ([eace08f](snakemake@eace08f)) * only skip eval when resource depends on input ([snakemake#3374](snakemake#3374)) ([4574c92](snakemake@4574c92)) * Prevent execution of conda in apptainer when not explicitly requested in software deployment method ([snakemake#3388](snakemake#3388)) ([c43c5c0](snakemake@c43c5c0)) * print filenames with quotes around them in RuleException ([snakemake#3269](snakemake#3269)) ([6baeda5](snakemake@6baeda5)) * Re-evaluation of free resources ([snakemake#3399](snakemake#3399)) ([6371293](snakemake@6371293)) * ReadTheDocs layout issue due to src directory change ([snakemake#3419](snakemake#3419)) ([695b127](snakemake@695b127)) * robustly escaping quotes in generated bash scripts (v2) ([snakemake#3297](snakemake#3297)) ([snakemake#3389](snakemake#3389)) ([58720bd](snakemake@58720bd)) * Show apptainer image URL in snakemake report ([snakemake#3407](snakemake#3407)) ([45f0450](snakemake@45f0450)) * Update ReadTheDocs configuration for documentation build to use Pixi ([snakemake#3433](snakemake#3433)) ([3f227a6](snakemake@3f227a6)) ### Documentation * Add pixi setup instructions to general use tutorial ([snakemake#3382](snakemake#3382)) ([115e81b](snakemake@115e81b)) * fix contribution section heading levels, fix docs testing setup order ([snakemake#3360](snakemake#3360)) ([051dc53](snakemake@051dc53)) * fix link to github.com/snakemake/poetry-snakemake-plugin ([snakemake#3436](snakemake#3436)) ([ec6d97c](snakemake@ec6d97c)) * fix quoting ([snakemake#3394](snakemake#3394)) ([b40f599](snakemake@b40f599)) * fix rerun-triggers default ([snakemake#3403](snakemake#3403)) ([4430e23](snakemake@4430e23)) * fix typo 'safe' -> 'save' ([snakemake#3384](snakemake#3384)) ([7755861](snakemake@7755861)) * mention code formatting in the contribution section ([snakemake#3431](snakemake#3431)) ([e8682b7](snakemake@e8682b7)) * remove duplicated 'functions'. ([snakemake#3356](snakemake#3356)) ([7c595db](snakemake@7c595db)) * update broken links documentation ([snakemake#3437](snakemake#3437)) ([e3d0d88](snakemake@e3d0d88)) * Updating contributing guidelines with new pixi dev setup ([snakemake#3415](snakemake#3415)) ([8e95a12](snakemake@8e95a12)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: snakemake-bot <snakemake-bot-admin@googlegroups.com>



First component of dry-run caching, part of a larger plan described in issue #3330.
QC
No test case yet. I'm testing locally using
tests/test01,02,05,10,11,12, and13. first by running with--dry-runand then with--trust-mtime-cache.I'm open to any ideas about how to use the existing infrastructure to test these successive runs.
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).I'll update the docs once the feature is ready for general use.
Summary by CodeRabbit
New Features
Tests