fix: fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions by johanneskoester · Pull Request #3162 · snakemake/snakemake

johanneskoester · 2024-10-22T06:55:54Z

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change modifies neither the language nor the behavior or functionality of Snakemake).

Summary by CodeRabbit

New Features
- Enhanced functionality for executing and editing notebooks, including improved output handling and saving of modified notebooks.
- Introduced a method to identify draft notebook jobs, enhancing job classification.
Bug Fixes
- Simplified postprocessing for draft notebook jobs, ensuring consistent handling across all job types.
Refactor
- Streamlined logic for determining output file paths in notebook execution.
- Improved organization of job completion and metadata handling in persistence management.


coderabbitai · 2024-10-22T06:56:02Z

Walkthrough

The changes in this pull request primarily focus on the postprocess method in the Job class of snakemake/jobs.py, enhancing the handling of "draft notebook jobs." The conditional check for "edit notebook jobs" has been replaced, allowing "draft notebook jobs" to undergo postprocessing. Additionally, the notebook.py file has been updated to streamline output path handling and ensure that modified notebooks are saved correctly after execution. A new method in the DAG class identifies draft notebook jobs, and a method in the Persistence class manages incomplete job markers.

Changes

| File Path | Change Summary |
| --- | --- |
| snakemake/jobs.py | Modified `postprocess` method in `Job` class by replacing the check for "edit notebook jobs" with "draft notebook jobs," allowing for postprocessing of the latter. |
| snakemake/notebook.py | Added import for `shutil`, refactored output file path logic in `execute_script`, and introduced code to copy executed notebooks to the output path if edited. |
| snakemake/dag.py | Added method `is_draft_notebook_job` in `DAG` class to check if a job is a draft notebook job based on workflow execution settings. |
| snakemake/persistence.py | Added method `_remove_incomplete_marker` in `Persistence` class to handle removal of incomplete job markers, called from the `finished` method. |

Possibly related PRs

fix: notebook execution for apptainer #3131: This PR modifies the execute_script method in notebook.py, which is closely related to the changes made to the postprocess method in this PR, as both involve handling notebook jobs and their execution logic.
fix: fix bug in code change detection leading to spurious code change reporting when relying on older snakemake metadata #3144: This PR modifies the _code_changed method in persistence.py, which is relevant because it deals with job completion and metadata management, similar to the changes in the finished method that affect job handling.


coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between cd6d035 and 0b0e355.

📒 Files selected for processing (2)

snakemake/jobs.py (0 hunks)
snakemake/notebook.py (3 hunks)

💤 Files with no reviewable changes (1)

snakemake/jobs.py

🧰 Additional context used

📓 Path-based instructions (1)

snakemake/notebook.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🔇 Additional comments (2)

snakemake/notebook.py (2)

4-4: Importing shutil for file operations

The addition of the shutil import is appropriate as it is used later for file copying operations.

77-82: Correct handling of output parameters based on fname_out

The updated logic ensures that the output_parameter is set correctly depending on whether fname_out is provided. This improves the robustness of the output path handling during notebook execution.

coderabbitai · 2024-10-22T07:01:02Z

snakemake/notebook.py

+                if fname_out is not None:
+                    # store log file (executed notebook) in requested path
+                    shutil.copyfile(fname, fname_out)
+


⚠️ Potential issue

Ensure destination directory exists before copying the output notebook

When copying the executed notebook to fname_out, the destination directory might not exist, leading to an error. To prevent this, consider creating the parent directories of fname_out before copying.

Apply this diff to fix the issue:

                 if fname_out is not None:
+                    os.makedirs(os.path.dirname(fname_out), exist_ok=True)
                     # store log file (executed notebook) in requested path
                     shutil.copyfile(fname, fname_out)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-                if fname_out is not None:
-                    # store log file (executed notebook) in requested path
-                    shutil.copyfile(fname, fname_out)
+                if fname_out is not None:
+                    os.makedirs(os.path.dirname(fname_out), exist_ok=True)
+                    # store log file (executed notebook) in requested path
+                    shutil.copyfile(fname, fname_out)
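
One caveat with the suggested change: `os.path.dirname` returns an empty string when `fname_out` is a bare filename, and `os.makedirs("")` raises `FileNotFoundError`. A minimal sketch of a guard follows. The helper name `copy_executed_notebook` is hypothetical and chosen only for illustration; it is not part of Snakemake.

```python
import os
import shutil


def copy_executed_notebook(fname, fname_out):
    """Copy fname to fname_out, creating the parent directory if needed.

    Hypothetical helper for illustration; not Snakemake's actual code.
    """
    if fname_out is None:
        return
    parent = os.path.dirname(fname_out)
    # dirname is "" for a bare filename; os.makedirs("") would raise
    if parent:
        os.makedirs(parent, exist_ok=True)
    # store log file (executed notebook) in requested path
    shutil.copyfile(fname, fname_out)
```

Whether the guard is needed in practice depends on how `fname_out` is constructed upstream, which this thread does not show.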

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

snakemake/dag.py (1)

528-534: Add docstring to clarify the difference between notebook job types.

Consider adding a docstring to explain:

- The purpose of this method
- How it differs from is_edit_notebook_job
- When to use each method

Here's a suggested docstring:

 def is_draft_notebook_job(self, job):
+    """Check if the given job is a draft notebook job.
+    
+    Unlike is_edit_notebook_job which identifies any notebook being edited,
+    this method specifically identifies draft notebook jobs where:
+    1. Edit notebook mode is enabled
+    2. draft_only flag is set to True
+    3. Job's target file is in the target files
+    
+    Args:
+        job: The job to check
+        
+    Returns:
+        bool: True if the job is a draft notebook job
+    """
     return (
         self.workflow.execution_settings.edit_notebook
         and self.workflow.execution_settings.edit_notebook.draft_only
         and job.targetfile in self.targetfiles
     )
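
To see how the three conditions combine, here is a minimal standalone sketch. The `SimpleNamespace` stand-ins, the `make_dag` helper, and the free-function form are illustrative assumptions, not Snakemake's real `DAG`, workflow, or job classes. The sketch also wraps the result in `bool`, which the method above does not do.

```python
from types import SimpleNamespace


def is_draft_notebook_job(dag, job):
    # Mirrors the expression above; `dag` replaces `self` for this sketch.
    settings = dag.workflow.execution_settings
    return bool(
        settings.edit_notebook
        and settings.edit_notebook.draft_only
        and job.targetfile in dag.targetfiles
    )


def make_dag(edit_notebook, targetfiles):
    # Hypothetical stand-in exposing only the attributes the check reads.
    return SimpleNamespace(
        workflow=SimpleNamespace(
            execution_settings=SimpleNamespace(edit_notebook=edit_notebook)
        ),
        targetfiles=set(targetfiles),
    )


job = SimpleNamespace(targetfile="results/plot.pdf")
draft = SimpleNamespace(draft_only=True)
edit = SimpleNamespace(draft_only=False)

print(is_draft_notebook_job(make_dag(draft, ["results/plot.pdf"]), job))  # True
print(is_draft_notebook_job(make_dag(edit, ["results/plot.pdf"]), job))   # False
print(is_draft_notebook_job(make_dag(None, ["results/plot.pdf"]), job))   # False
print(is_draft_notebook_job(make_dag(draft, ["other.txt"]), job))         # False
```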

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0b0e355 and 92ac5a4.

📒 Files selected for processing (3)

snakemake/dag.py (1 hunks)
snakemake/jobs.py (1 hunks)
snakemake/persistence.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

snakemake/dag.py (1), snakemake/jobs.py (1), snakemake/persistence.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🔇 Additional comments (5)

snakemake/persistence.py (2)

295-298: LGTM! Good extraction of marker removal logic.

The new method follows the Single Responsibility Principle by extracting the incomplete marker removal logic into a dedicated method, making it reusable and the code more maintainable.
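
As a rough illustration of this kind of extraction, the sketch below uses a toy persistence class. The marker-file layout, the `incomplete` query, and the method signatures are assumptions made for demonstration; only the names `_remove_incomplete_marker` and `finished` are taken from this PR.

```python
import os
import tempfile


class ToyPersistence:
    """Toy stand-in for a persistence layer; not Snakemake's real class."""

    def __init__(self, root):
        self.incomplete_dir = os.path.join(root, "incomplete")
        os.makedirs(self.incomplete_dir, exist_ok=True)

    def _marker(self, output_file):
        # Assumed layout: one flat marker file per output file.
        return os.path.join(self.incomplete_dir, output_file.replace("/", "_"))

    def started(self, outputs):
        for f in outputs:
            open(self._marker(f), "w").close()

    def _remove_incomplete_marker(self, outputs):
        # The single place that knows how to delete markers.
        for f in outputs:
            marker = self._marker(f)
            if os.path.exists(marker):
                os.remove(marker)

    def finished(self, outputs):
        self._remove_incomplete_marker(outputs)

    def incomplete(self, outputs):
        return [f for f in outputs if os.path.exists(self._marker(f))]


with tempfile.TemporaryDirectory() as root:
    p = ToyPersistence(root)
    outs = ["results/a.txt"]
    p.started(outs)
    print(p.incomplete(outs))  # ['results/a.txt']
    p.finished(outs)
    print(p.incomplete(outs))  # []
```

With the removal logic in one method, any code path that ends a job (normal completion or an early exit) can clear the marker the same way.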

300-302: LGTM! Improved metadata handling logic.

The changes improve the code by:

- Using the extracted _remove_incomplete_marker method for better maintainability
- Adding an early return optimization when metadata storage is not needed

snakemake/jobs.py (1)

1124-1126: LGTM! The changes correctly handle draft notebook jobs.

The code properly removes the incomplete marker for draft notebook jobs and returns early, which fixes the issue with output files being incorrectly marked as incomplete during edit notebook sessions.

Let's verify the existence and usage of is_draft_notebook_job:

✅ Verification successful


Verification successful: is_draft_notebook_job is properly implemented and used

The verification confirms that:

- The is_draft_notebook_job method is properly defined in the DAG class (found in snakemake/dag.py)
- The feature is well-documented in the codebase, with references in:
  - CLI implementation (snakemake/cli.py)
  - Documentation (docs/snakefiles/rules.rst)
  - Changelog entries showing its intentional addition and maintenance

The changes in the postprocess method correctly integrate with this existing functionality, making it a safe and proper implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the existence and usage of is_draft_notebook_job method

# Test 1: Check if is_draft_notebook_job is defined in DAG class
echo "Checking for is_draft_notebook_job definition..."
ast-grep --pattern $'class DAG {
  $$$
  is_draft_notebook_job($$$) {
    $$$
  }
  $$$
}'

# Test 2: Check for any other references to draft notebook jobs
echo "Checking for other draft notebook references..."
rg "draft.*notebook"

Length of output: 2577

snakemake/dag.py (2)

528-534: LGTM! The implementation correctly identifies draft notebook jobs.

The new method is_draft_notebook_job properly checks all required conditions to identify draft notebook jobs:

- Verifies that edit notebook mode is enabled
- Ensures draft_only flag is set
- Confirms the job's target file is in the target files

This implementation aligns with the PR objectives to fix bugs related to output files being marked as incomplete during edit notebook sessions.

528-534: Verify the usage of the new method in the codebase.

Let's verify how this new method is used to fix the incomplete output files bug.

✅ Verification successful


The new method is correctly integrated and serves a specific purpose

The is_draft_notebook_job method is properly used in the codebase:

- It's called in jobs.py to handle a special case where draft notebook jobs should skip output handling and only remove incomplete markers
- The method is part of a larger feature controlled by NotebookEditMode with draft_only flag
- The implementation is consistent with the CLI options (--draft-notebook) and test cases

The changes are well-integrated and don't require additional modifications in other parts of the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Search for usage of is_draft_notebook_job
echo "Searching for usage of is_draft_notebook_job..."
rg "is_draft_notebook_job"

# Search for related notebook job handling
echo "Searching for related notebook job handling..."
rg "edit_notebook.*draft_only"

Length of output: 631

Script:
#!/bin/bash
# Get context around the usage in jobs.py
echo "Context around usage in jobs.py:"
rg -B 5 -A 5 "is_draft_notebook_job" snakemake/jobs.py

# Look for NotebookEditMode definition and usage
echo -e "\nNotebookEditMode definition and usage:"
rg -B 2 -A 2 "NotebookEditMode" --type py

# Look for incomplete output files handling
echo -e "\nIncomplete output files handling:"
rg -B 2 -A 2 "incomplete.*output" --type py

Length of output: 4849

sonarqubecloud · 2024-10-23T08:05:48Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)

snakemake/jobs.py (1)

1137-1185: LGTM: Improved error handling with proper cleanup.

The addition of the try-except block ensures proper cleanup of metadata in case of any errors during postprocessing, which helps prevent inconsistent state.

Consider adding a debug log statement before cleaning up metadata to help with troubleshooting:

         except Exception as e:
+            logger.debug(f"Cleaning up metadata due to error during postprocessing: {e}")
             # cleanup metadata in case of any exception above
             self.dag.workflow.persistence.cleanup(self)
             raise e
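
The clean-up-and-re-raise pattern discussed here can be sketched in isolation as follows. The `postprocess` function, its `steps` parameter, and `StubPersistence` are hypothetical stand-ins; only the `cleanup` call and the suggested debug message come from the review comment. The sketch uses a bare `raise`, which is the more idiomatic way to re-raise the active exception.

```python
import logging

logger = logging.getLogger(__name__)


def postprocess(job, persistence, steps):
    """Run postprocessing steps; on any failure, clean up metadata and re-raise.

    `steps` is a hypothetical list of callables standing in for the real work.
    """
    try:
        for step in steps:
            step(job)
    except Exception as e:
        logger.debug(f"Cleaning up metadata due to error during postprocessing: {e}")
        # cleanup metadata in case of any exception above
        persistence.cleanup(job)
        raise


class StubPersistence:
    """Records cleanup calls so the behavior can be observed."""

    def __init__(self):
        self.cleaned = []

    def cleanup(self, job):
        self.cleaned.append(job)


def failing_step(job):
    raise RuntimeError("disk full")


persistence = StubPersistence()
try:
    postprocess("job1", persistence, [failing_step])
except RuntimeError:
    pass
print(persistence.cleaned)  # ['job1']
```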

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 92ac5a4 and e9feb30.

📒 Files selected for processing (2)

snakemake/jobs.py (2 hunks)
snakemake/persistence.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

snakemake/persistence.py

🧰 Additional context used

📓 Path-based instructions (1)

snakemake/jobs.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🔇 Additional comments (1)

snakemake/jobs.py (1)

1124-1126: LGTM: Early cleanup for draft notebook jobs.

The early return with persistence cleanup for draft notebook jobs correctly addresses the issue of output files being marked as incomplete.

Let's verify the implementation of is_draft_notebook_job:

✅ Verification successful


Implementation of is_draft_notebook_job is correctly used

The implementation in snakemake/dag.py shows that is_draft_notebook_job checks three conditions:

- Notebook editing is enabled (workflow.execution_settings.edit_notebook)
- Draft-only mode is active (edit_notebook.draft_only)
- The job's target file is in the DAG's target files

This aligns perfectly with the usage in jobs.py where it's used to identify draft notebook jobs that need early cleanup. The implementation ensures that incomplete markers are properly cleaned up only for draft notebook jobs, which addresses the original issue of output files being marked as incomplete during notebook editing sessions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Description: Verify the implementation of is_draft_notebook_job method
# Test: Search for the method definition
ast-grep --pattern 'def is_draft_notebook_job($$$)'

Length of output: 53

Script:
#!/bin/bash
# Search for is_draft_notebook_job in the codebase
rg "is_draft_notebook_job" -A 5

Length of output: 724

🤖 I have created a release *beep* *boop*

---

## [8.24.1](v8.24.0...v8.24.1) (2024-10-23)

### Bug Fixes

* fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions ([#3162](#3162)) ([19c6c0a](19c6c0a))
* proper error message if conda info fails ([#3157](#3157)) ([4f99c20](4f99c20))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Fix: This PR adds back the removal of the incomplete marker file also when keep_metadata is False (i.e. --drop-metadata is active). The removal calls were merged and put at the start of the function in #3162, but this caused loss of starttime info in the metadata record. In #3197 this was fixed, but by moving the removal call to the end of the function, the incomplete tracker file was not removed anymore when --drop-metadata was active. This PR adds the call to the removal function back also for that case, and adds a note to inform why the code is structured this way.

## Summary by CodeRabbit

- **Bug Fixes**
  - Improved handling of job metadata by ensuring the incomplete marker is removed only after successful metadata record creation, preserving job start time.
- **Documentation**
  - Added clarifying comments to explain changes regarding the incomplete marker removal.

---------

Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
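
The ordering constraint described in this follow-up can be modeled with a small toy class. This is a sketch under stated assumptions: it assumes the start time is recoverable only while the incomplete marker exists, and it uses in-memory dicts in place of marker and metadata files. The class, its attributes, and the `keep_metadata` parameter placement are illustrative, not Snakemake's actual implementation.

```python
import time


class ToyPersistence:
    """Toy model of the ordering described above; not Snakemake's real code."""

    def __init__(self):
        self.markers = {}   # job -> start time, stands in for the incomplete marker
        self.metadata = {}  # job -> metadata record

    def started(self, job):
        self.markers[job] = time.time()

    def _remove_incomplete_marker(self, job):
        self.markers.pop(job, None)

    def finished(self, job, keep_metadata=True):
        if not keep_metadata:
            # --drop-metadata case: no record is written,
            # but the marker must still be removed
            self._remove_incomplete_marker(job)
            return
        # read the start time while the marker still exists ...
        starttime = self.markers.get(job)
        self.metadata[job] = {"starttime": starttime, "endtime": time.time()}
        # ... and only then remove the marker
        self._remove_incomplete_marker(job)


p = ToyPersistence()
p.started("a")
p.finished("a")
print(p.metadata["a"]["starttime"] is not None)  # True
print("a" in p.markers)                          # False
```

Removing the marker first would leave `starttime` as `None` in this model, and skipping the removal in the `keep_metadata=False` branch would leave the job looking incomplete.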

fix: fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions

0b0e355

coderabbitai bot reviewed Oct 22, 2024

View reviewed changes

improved handling of incomplete markers

92ac5a4

coderabbitai bot reviewed Oct 23, 2024

View reviewed changes

further metadata cleanup

e9feb30

coderabbitai bot reviewed Oct 23, 2024

View reviewed changes

johanneskoester merged commit 19c6c0a into main Oct 23, 2024

johanneskoester deleted the fix/notebook-output branch October 23, 2024 08:27

github-actions bot mentioned this pull request Oct 23, 2024

chore(main): release 8.24.1 #3166

Merged

mhulsman mentioned this pull request Nov 20, 2024

fix: Remove incomplete marker also when drop-metadata is active #3215

Merged

coderabbitai bot mentioned this pull request Feb 19, 2025

feat: New CLI argument --premable-only, acting as a flexible side-kick to --draft-notebook #3281

Open

3 tasks

coderabbitai bot mentioned this pull request Mar 13, 2025

fix: Convert Path to IOFile #3405

Merged

2 tasks

coderabbitai bot mentioned this pull request Jun 19, 2025

fix: avoid checking output files in immediate-submit mode #3569

Merged

2 tasks

johanneskoester merged 3 commits into main from fix/notebook-output
