fix: fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions#3162
Conversation
…d as incomplete, fix bug leading to missing log file after edit notebook sessions
WalkthroughThe changes in this pull request primarily focus on the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- snakemake/jobs.py (0 hunks)
- snakemake/notebook.py (3 hunks)
💤 Files with no reviewable changes (1)
- snakemake/jobs.py
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/notebook.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
🔇 Additional comments (2)
snakemake/notebook.py (2)
4-4: Importingshutilfor file operationsThe addition of the
shutilimport is appropriate as it is used later for file copying operations.
77-82: Correct handling of output parameters based onfname_outThe updated logic ensures that the
output_parameteris set correctly depending on whetherfname_outis provided. This improves the robustness of the output path handling during notebook execution.
| if fname_out is not None: | ||
| # store log file (executed notebook) in requested path | ||
| shutil.copyfile(fname, fname_out) | ||
|
|
There was a problem hiding this comment.
Ensure destination directory exists before copying the output notebook
When copying the executed notebook to fname_out, the destination directory might not exist, leading to an error. To prevent this, consider creating the parent directories of fname_out before copying.
Apply this diff to fix the issue:
if fname_out is not None:
+ os.makedirs(os.path.dirname(fname_out), exist_ok=True)
# store log file (executed notebook) in requested path
shutil.copyfile(fname, fname_out)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if fname_out is not None: | |
| # store log file (executed notebook) in requested path | |
| shutil.copyfile(fname, fname_out) | |
| if fname_out is not None: | |
| os.makedirs(os.path.dirname(fname_out), exist_ok=True) | |
| # store log file (executed notebook) in requested path | |
| shutil.copyfile(fname, fname_out) |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
snakemake/dag.py (1)
528-534: Add docstring to clarify the difference between notebook job types.Consider adding a docstring to explain:
- The purpose of this method
- How it differs from
is_edit_notebook_job- When to use each method
Here's a suggested docstring:
def is_draft_notebook_job(self, job): + """Check if the given job is a draft notebook job. + + Unlike is_edit_notebook_job which identifies any notebook being edited, + this method specifically identifies draft notebook jobs where: + 1. Edit notebook mode is enabled + 2. draft_only flag is set to True + 3. Job's target file is in the target files + + Args: + job: The job to check + + Returns: + bool: True if the job is a draft notebook job + """ return ( self.workflow.execution_settings.edit_notebook and self.workflow.execution_settings.edit_notebook.draft_only and job.targetfile in self.targetfiles )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- snakemake/dag.py (1 hunks)
- snakemake/jobs.py (1 hunks)
- snakemake/persistence.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
snakemake/dag.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.snakemake/jobs.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.snakemake/persistence.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
🔇 Additional comments (5)
snakemake/persistence.py (2)
295-298: LGTM! Good extraction of marker removal logic.The new method follows the Single Responsibility Principle by extracting the incomplete marker removal logic into a dedicated method, making it reusable and the code more maintainable.
300-302: LGTM! Improved metadata handling logic.The changes improve the code by:
- Using the extracted
remove_incomplete_markermethod for better maintainability- Adding an early return optimization when metadata storage is not needed
snakemake/jobs.py (1)
1124-1126: LGTM! The changes correctly handle draft notebook jobs.The code properly removes the incomplete marker for draft notebook jobs and returns early, which fixes the issue with output files being incorrectly marked as incomplete during edit notebook sessions.
Let's verify the existence and usage of
is_draft_notebook_job:✅ Verification successful
Based on the shell script results, I can now generate the final response since I have sufficient information about the
is_draft_notebook_jobimplementation and its usage across the codebase.Verification successful:
is_draft_notebook_jobis properly implemented and usedThe verification confirms that:
- The
is_draft_notebook_jobmethod is properly defined in theDAGclass (found insnakemake/dag.py)- The feature is well-documented in the codebase, with references in:
- CLI implementation (
snakemake/cli.py)- Documentation (
docs/snakefiles/rules.rst)- Changelog entries showing its intentional addition and maintenance
The changes in the
postprocessmethod correctly integrate with this existing functionality, making it a safe and proper implementation.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the existence and usage of is_draft_notebook_job method # Test 1: Check if is_draft_notebook_job is defined in DAG class echo "Checking for is_draft_notebook_job definition..." ast-grep --pattern $'class DAG { $$$ is_draft_notebook_job($$$) { $$$ } $$$ }' # Test 2: Check for any other references to draft notebook jobs echo "Checking for other draft notebook references..." rg "draft.*notebook"Length of output: 2577
snakemake/dag.py (2)
528-534: LGTM! The implementation correctly identifies draft notebook jobs.The new method
is_draft_notebook_jobproperly checks all required conditions to identify draft notebook jobs:
- Verifies that edit notebook mode is enabled
- Ensures draft_only flag is set
- Confirms the job's target file is in the target files
This implementation aligns with the PR objectives to fix bugs related to output files being marked as incomplete during edit notebook sessions.
528-534: Verify the usage of the new method in the codebase.Let's verify how this new method is used to fix the incomplete output files bug.
✅ Verification successful
Let me gather more context about how this method is used in
jobs.pyto understand the complete flow.
I have enough context now to provide the final response based on the verification results.
The new method is correctly integrated and serves a specific purpose
The
is_draft_notebook_jobmethod is properly used in the codebase:
- It's called in
jobs.pyto handle a special case where draft notebook jobs should skip output handling and only remove incomplete markers- The method is part of a larger feature controlled by
NotebookEditModewithdraft_onlyflag- The implementation is consistent with the CLI options (
--draft-notebook) and test casesThe changes are well-integrated and don't require additional modifications in other parts of the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for usage of is_draft_notebook_job echo "Searching for usage of is_draft_notebook_job..." rg "is_draft_notebook_job" # Search for related notebook job handling echo "Searching for related notebook job handling..." rg "edit_notebook.*draft_only"Length of output: 631
Script:
#!/bin/bash # Get context around the usage in jobs.py echo "Context around usage in jobs.py:" rg -B 5 -A 5 "is_draft_notebook_job" snakemake/jobs.py # Look for NotebookEditMode definition and usage echo -e "\nNotebookEditMode definition and usage:" rg -B 2 -A 2 "NotebookEditMode" --type py # Look for incomplete output files handling echo -e "\nIncomplete output files handling:" rg -B 2 -A 2 "incomplete.*output" --type pyLength of output: 4849
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
snakemake/jobs.py (1)
1137-1185: LGTM: Improved error handling with proper cleanup.The addition of the try-except block ensures proper cleanup of metadata in case of any errors during postprocessing, which helps prevent inconsistent state.
Consider adding a debug log statement before cleaning up metadata to help with troubleshooting:
except Exception as e: + logger.debug(f"Cleaning up metadata due to error during postprocessing: {e}") # cleanup metadata in case of any exception above self.dag.workflow.persistence.cleanup(self) raise e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- snakemake/jobs.py (2 hunks)
- snakemake/persistence.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- snakemake/persistence.py
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/jobs.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
🔇 Additional comments (1)
snakemake/jobs.py (1)
1124-1126: LGTM: Early cleanup for draft notebook jobs.The early return with persistence cleanup for draft notebook jobs correctly addresses the issue of output files being marked as incomplete.
Let's verify the implementation of
is_draft_notebook_job:✅ Verification successful
Let me try a different approach to find the implementation by searching for the method name with ripgrep.
Implementation of
is_draft_notebook_jobis correctly usedThe implementation in
snakemake/dag.pyshows thatis_draft_notebook_jobchecks three conditions:
- Notebook editing is enabled (
workflow.execution_settings.edit_notebook)- Draft-only mode is active (
edit_notebook.draft_only)- The job's target file is in the DAG's target files
This aligns perfectly with the usage in
jobs.pywhere it's used to identify draft notebook jobs that need early cleanup. The implementation ensures that incomplete markers are properly cleaned up only for draft notebook jobs, which addresses the original issue of output files being marked as incomplete during notebook editing sessions.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the implementation of is_draft_notebook_job method # Test: Search for the method definition ast-grep --pattern 'def is_draft_notebook_job($$$)'Length of output: 53
Script:
#!/bin/bash # Search for is_draft_notebook_job in the codebase rg "is_draft_notebook_job" -A 5Length of output: 724
🤖 I have created a release *beep* *boop* --- ## [8.24.1](v8.24.0...v8.24.1) (2024-10-23) ### Bug Fixes * fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions ([#3162](#3162)) ([19c6c0a](19c6c0a)) * proper error message if conda info fails ([#3157](#3157)) ([4f99c20](4f99c20)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Fix: This PR adds back the removal of the incomplete marker file also when keep_metadata is False (i.e. --drop-metadata is active). The removal calls were merged and put at the start of the function in #3162, but this caused loss of starttime info in the metadata record. In #3197 this was fixed, but by moving the removal call to the end of the function, the incomplete tracker file was not removed anymore when --drop-metadata was active. This PR adds the call to the removal function back also for that case, and adds a note to inform why the code is structured this way. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Bug Fixes** - Improved handling of job metadata by ensuring the incomplete marker is removed only after successful metadata record creation, preserving job start time. - **Documentation** - Added clarifying comments to explain changes regarding the incomplete marker removal. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>



QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes
Refactor