fix: fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions #3162

Merged
johanneskoester merged 3 commits into main from fix/notebook-output
Oct 23, 2024

Conversation

@johanneskoester

@johanneskoester johanneskoester commented Oct 22, 2024

QC

  • The PR contains a test case for the changes, or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes, or this is not necessary (e.g. if the change modifies neither the language nor the behavior or functionality of Snakemake).

Summary by CodeRabbit

  • New Features

    • Enhanced functionality for executing and editing notebooks, including improved output handling and saving of modified notebooks.
    • Introduced a method to identify draft notebook jobs, enhancing job classification.
  • Bug Fixes

    • Simplified postprocessing for draft notebook jobs, ensuring consistent handling across all job types.
  • Refactor

    • Streamlined logic for determining output file paths in notebook execution.
    • Improved organization of job completion and metadata handling in persistence management.

…d as incomplete, fix bug leading to missing log file after edit notebook sessions
@coderabbitai

coderabbitai bot commented Oct 22, 2024

Walkthrough

The changes in this pull request primarily focus on the postprocess method in the Job class of snakemake/jobs.py, enhancing the handling of "draft notebook jobs." The conditional check for "edit notebook jobs" has been replaced, allowing "draft notebook jobs" to undergo postprocessing. Additionally, the notebook.py file has been updated to streamline output path handling and ensure that modified notebooks are saved correctly after execution. A new method in the DAG class identifies draft notebook jobs, and a method in the Persistence class manages incomplete job markers.

Changes

| File Path | Change Summary |
|---|---|
| snakemake/jobs.py | Modified postprocess method in Job class by replacing the check for "edit notebook jobs" with "draft notebook jobs," allowing for postprocessing of the latter. |
| snakemake/notebook.py | Added import for shutil, refactored output file path logic in execute_script, and introduced code to copy executed notebooks to the output path if edited. |
| snakemake/dag.py | Added method is_draft_notebook_job in DAG class to check if a job is a draft notebook job based on workflow execution settings. |
| snakemake/persistence.py | Added method _remove_incomplete_marker in Persistence class to handle removal of incomplete job markers, called from the finished method. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between cd6d035 and 0b0e355.

📒 Files selected for processing (2)
  • snakemake/jobs.py (0 hunks)
  • snakemake/notebook.py (3 hunks)
💤 Files with no reviewable changes (1)
  • snakemake/jobs.py
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/notebook.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🔇 Additional comments (2)
snakemake/notebook.py (2)

4-4: Importing shutil for file operations

The addition of the shutil import is appropriate as it is used later for file copying operations.


77-82: Correct handling of output parameters based on fname_out

The updated logic ensures that the output_parameter is set correctly depending on whether fname_out is provided. This improves the robustness of the output path handling during notebook execution.

Comment on lines +103 to +106
if fname_out is not None:
    # store log file (executed notebook) in requested path
    shutil.copyfile(fname, fname_out)


⚠️ Potential issue

Ensure destination directory exists before copying the output notebook

When copying the executed notebook to fname_out, the destination directory might not exist, leading to an error. To prevent this, consider creating the parent directories of fname_out before copying.

Apply this diff to fix the issue:

 if fname_out is not None:
+    os.makedirs(os.path.dirname(fname_out), exist_ok=True)
     # store log file (executed notebook) in requested path
     shutil.copyfile(fname, fname_out)
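
One caveat with the diff above: `os.path.dirname` returns an empty string when `fname_out` is a bare filename, and `os.makedirs("")` raises `FileNotFoundError`. The following is a minimal sketch of a guarded variant. The helper name `copy_executed_notebook` is hypothetical and used only for illustration; it is not part of snakemake/notebook.py.

```python
import os
import shutil


def copy_executed_notebook(fname, fname_out):
    """Copy the executed notebook to fname_out, creating parent directories.

    Hypothetical helper for illustration; not part of snakemake/notebook.py.
    """
    if fname_out is None:
        return
    parent = os.path.dirname(fname_out)
    # dirname is "" for a bare filename; os.makedirs("") would raise.
    if parent:
        os.makedirs(parent, exist_ok=True)
    # store log file (executed notebook) in requested path
    shutil.copyfile(fname, fname_out)
```

The only difference from the suggested change is the `if parent:` guard, so the behavior for paths that do contain a directory component is the same.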


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
snakemake/dag.py (1)

528-534: Add docstring to clarify the difference between notebook job types.

Consider adding a docstring to explain:

  1. The purpose of this method
  2. How it differs from is_edit_notebook_job
  3. When to use each method

Here's a suggested docstring:

 def is_draft_notebook_job(self, job):
+    """Check if the given job is a draft notebook job.
+    
+    Unlike is_edit_notebook_job which identifies any notebook being edited,
+    this method specifically identifies draft notebook jobs where:
+    1. Edit notebook mode is enabled
+    2. draft_only flag is set to True
+    3. Job's target file is in the target files
+    
+    Args:
+        job: The job to check
+        
+    Returns:
+        bool: True if the job is a draft notebook job
+    """
     return (
         self.workflow.execution_settings.edit_notebook
         and self.workflow.execution_settings.edit_notebook.draft_only
         and job.targetfile in self.targetfiles
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 0b0e355 and 92ac5a4.

📒 Files selected for processing (3)
  • snakemake/dag.py (1 hunks)
  • snakemake/jobs.py (1 hunks)
  • snakemake/persistence.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
snakemake/dag.py (1)
snakemake/jobs.py (1)
snakemake/persistence.py (1)

Pattern **/*.py: same instructions as listed for snakemake/notebook.py in the first review.

🔇 Additional comments (5)
snakemake/persistence.py (2)

295-298: LGTM! Good extraction of marker removal logic.

The new method follows the Single Responsibility Principle by extracting the incomplete marker removal logic into a dedicated method, making it reusable and the code more maintainable.


300-302: LGTM! Improved metadata handling logic.

The changes improve the code by:

  1. Using the extracted _remove_incomplete_marker method for better maintainability
  2. Adding an early return optimization when metadata storage is not needed
snakemake/jobs.py (1)

1124-1126: LGTM! The changes correctly handle draft notebook jobs.

The code properly removes the incomplete marker for draft notebook jobs and returns early, which fixes the issue with output files being incorrectly marked as incomplete during edit notebook sessions.
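
The early-return behavior described here can be sketched as follows. This is a simplified illustration, not the code from snakemake/jobs.py: `FakePersistence`, its methods, and the `postprocess` signature are stand-ins assumed for this example, and the regular output handling is reduced to appending to a list.

```python
class FakePersistence:
    """Stand-in that tracks incomplete markers in a set (assumption)."""

    def __init__(self):
        self.incomplete = set()

    def started(self, job_name):
        # Mark the job's outputs as incomplete while it is running.
        self.incomplete.add(job_name)

    def remove_incomplete_marker(self, job_name):
        self.incomplete.discard(job_name)


def postprocess(job_name, is_draft_notebook_job, persistence, handled):
    """Simplified postprocess: draft notebook jobs only clear the marker."""
    if is_draft_notebook_job:
        persistence.remove_incomplete_marker(job_name)
        return
    # Regular jobs go through the usual output handling (elided here),
    # after which the marker is cleared as well.
    handled.append(job_name)
    persistence.remove_incomplete_marker(job_name)
```

In this sketch a draft notebook job never reaches the output handling, yet its marker is still removed, which is the property the review comment highlights.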

Let's verify the existence and usage of is_draft_notebook_job:

✅ Verification successful

Based on the shell script results, I can now generate the final response since I have sufficient information about the is_draft_notebook_job implementation and its usage across the codebase.

Verification successful: is_draft_notebook_job is properly implemented and used

The verification confirms that:

  • The is_draft_notebook_job method is properly defined in the DAG class (found in snakemake/dag.py)
  • The feature is well-documented in the codebase, with references in:
    • CLI implementation (snakemake/cli.py)
    • Documentation (docs/snakefiles/rules.rst)
    • Changelog entries showing its intentional addition and maintenance

The changes in the postprocess method correctly integrate with this existing functionality, making it a safe and proper implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and usage of is_draft_notebook_job method

# Test 1: Check if is_draft_notebook_job is defined in DAG class
echo "Checking for is_draft_notebook_job definition..."
ast-grep --pattern $'class DAG {
  $$$
  is_draft_notebook_job($$$) {
    $$$
  }
  $$$
}'

# Test 2: Check for any other references to draft notebook jobs
echo "Checking for other draft notebook references..."
rg "draft.*notebook"

Length of output: 2577

snakemake/dag.py (2)

528-534: LGTM! The implementation correctly identifies draft notebook jobs.

The new method is_draft_notebook_job properly checks all required conditions to identify draft notebook jobs:

  1. Verifies that edit notebook mode is enabled
  2. Ensures draft_only flag is set
  3. Confirms the job's target file is in the target files

This implementation aligns with the PR objectives to fix bugs related to output files being marked as incomplete during edit notebook sessions.
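
To make the three conditions concrete, here is a self-contained sketch using stand-in classes. `MiniDAG`, `ExecutionSettings`, `Job`, and the simplified `NotebookEditMode` are assumptions made for this example and do not mirror the real Snakemake classes; unlike the method in the diff, the sketch wraps the result in `bool` so it never returns `None`.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class NotebookEditMode:
    # Simplified stand-in; only the flag discussed in the review is modeled.
    draft_only: bool = False


@dataclass
class ExecutionSettings:
    edit_notebook: Optional[NotebookEditMode] = None


@dataclass
class Job:
    targetfile: str


@dataclass
class MiniDAG:
    execution_settings: ExecutionSettings
    targetfiles: Set[str] = field(default_factory=set)

    def is_draft_notebook_job(self, job):
        edit = self.execution_settings.edit_notebook
        return bool(
            edit is not None  # 1. edit notebook mode is enabled
            and edit.draft_only  # 2. draft_only flag is set
            and job.targetfile in self.targetfiles  # 3. job is a requested target
        )
```

All three conditions must hold; dropping any one of them makes the predicate false.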


528-534: Verify the usage of the new method in the codebase.

Let's verify how this new method is used to fix the incomplete output files bug.

✅ Verification successful

Let me gather more context about how this method is used in jobs.py to understand the complete flow.


I have enough context now to provide the final response based on the verification results.

The new method is correctly integrated and serves a specific purpose

The is_draft_notebook_job method is properly used in the codebase:

  • It's called in jobs.py to handle a special case where draft notebook jobs should skip output handling and only remove incomplete markers
  • The method is part of a larger feature controlled by NotebookEditMode with draft_only flag
  • The implementation is consistent with the CLI options (--draft-notebook) and test cases

The changes are well-integrated and don't require additional modifications in other parts of the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for usage of is_draft_notebook_job
echo "Searching for usage of is_draft_notebook_job..."
rg "is_draft_notebook_job"

# Search for related notebook job handling
echo "Searching for related notebook job handling..."
rg "edit_notebook.*draft_only"

Length of output: 631


Script:

#!/bin/bash
# Get context around the usage in jobs.py
echo "Context around usage in jobs.py:"
rg -B 5 -A 5 "is_draft_notebook_job" snakemake/jobs.py

# Look for NotebookEditMode definition and usage
echo -e "\nNotebookEditMode definition and usage:"
rg -B 2 -A 2 "NotebookEditMode" --type py

# Look for incomplete output files handling
echo -e "\nIncomplete output files handling:"
rg -B 2 -A 2 "incomplete.*output" --type py

Length of output: 4849


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
snakemake/jobs.py (1)

1137-1185: LGTM: Improved error handling with proper cleanup.

The addition of the try-except block ensures proper cleanup of metadata in case of any errors during postprocessing, which helps prevent inconsistent state.

Consider adding a debug log statement before cleaning up metadata to help with troubleshooting:

         except Exception as e:
+            logger.debug(f"Cleaning up metadata due to error during postprocessing: {e}")
             # cleanup metadata in case of any exception above
             self.dag.workflow.persistence.cleanup(self)
             raise e
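
The cleanup-then-re-raise pattern in this suggestion can be sketched in isolation. The names `run_with_cleanup`, `step`, and `cleanup` are hypothetical; in the PR the cleanup call is `self.dag.workflow.persistence.cleanup(self)`. The sketch uses a bare `raise`, which re-raises the active exception with its traceback intact, as an alternative to `raise e`.

```python
import logging

logger = logging.getLogger("postprocess_sketch")


def run_with_cleanup(step, cleanup):
    """Run step(); on any exception, log, call cleanup(), and re-raise."""
    try:
        return step()
    except Exception as e:
        logger.debug(
            "Cleaning up metadata due to error during postprocessing: %s", e
        )
        # cleanup metadata in case of any exception above
        cleanup()
        raise
```

Passing the exception as a logging argument instead of formatting an f-string defers string formatting until the debug level is actually enabled.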
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 92ac5a4 and e9feb30.

📒 Files selected for processing (2)
  • snakemake/jobs.py (2 hunks)
  • snakemake/persistence.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake/persistence.py
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/jobs.py (1)

Pattern **/*.py: same instructions as listed for snakemake/notebook.py in the first review.

🔇 Additional comments (1)
snakemake/jobs.py (1)

1124-1126: LGTM: Early cleanup for draft notebook jobs.

The early return with persistence cleanup for draft notebook jobs correctly addresses the issue of output files being marked as incomplete.

Let's verify the implementation of is_draft_notebook_job:

✅ Verification successful

Let me try a different approach to find the implementation by searching for the method name with ripgrep.


Implementation of is_draft_notebook_job is correctly used

The implementation in snakemake/dag.py shows that is_draft_notebook_job checks three conditions:

  1. Notebook editing is enabled (workflow.execution_settings.edit_notebook)
  2. Draft-only mode is active (edit_notebook.draft_only)
  3. The job's target file is in the DAG's target files

This aligns perfectly with the usage in jobs.py where it's used to identify draft notebook jobs that need early cleanup. The implementation ensures that incomplete markers are properly cleaned up only for draft notebook jobs, which addresses the original issue of output files being marked as incomplete during notebook editing sessions.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of is_draft_notebook_job method
# Test: Search for the method definition
ast-grep --pattern 'def is_draft_notebook_job($$$)'

Length of output: 53


Script:

#!/bin/bash
# Search for is_draft_notebook_job in the codebase
rg "is_draft_notebook_job" -A 5

Length of output: 724

@johanneskoester johanneskoester merged commit 19c6c0a into main Oct 23, 2024
@johanneskoester johanneskoester deleted the fix/notebook-output branch October 23, 2024 08:27
johanneskoester pushed a commit that referenced this pull request Oct 23, 2024
🤖 I have created a release *beep* *boop*
---


## [8.24.1](v8.24.0...v8.24.1) (2024-10-23)

### Bug Fixes

* fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions ([#3162](#3162)) ([19c6c0a](19c6c0a))
* proper error message if conda info fails ([#3157](#3157)) ([4f99c20](4f99c20))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
johanneskoester added a commit that referenced this pull request Nov 21, 2024
Fix: This PR adds back the removal of the incomplete marker file also
when keep_metadata is False (i.e. --drop-metadata is active).

The removal calls were merged and put at the start of the function in
#3162, but this caused loss of starttime info in the metadata record. In
#3197 this was fixed, but by moving the removal call to the end of the
function, the incomplete tracker file was not removed anymore when
--drop-metadata was active. This PR adds the call to the removal
function back also for that case, and adds a note to inform why the code
is structured this way.
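
The ordering constraint described above can be sketched as follows. This is an illustration under assumptions, not the code from snakemake/persistence.py: the function signatures are simplified, and using the marker file's modification time as the job start time is an assumption made for this sketch.

```python
import os
import time


def _remove_incomplete_marker(marker_path):
    if os.path.exists(marker_path):
        os.remove(marker_path)


def finished(marker_path, keep_metadata):
    """Hypothetical stand-in showing the order of operations."""
    if not keep_metadata:
        # --drop-metadata case: nothing to record, but the marker must still go.
        _remove_incomplete_marker(marker_path)
        return None
    # Read the start time while the marker still exists (assumed source) ...
    starttime = (
        os.path.getmtime(marker_path) if os.path.exists(marker_path) else None
    )
    record = {"starttime": starttime, "endtime": time.time()}
    # ... and only then remove the marker.
    _remove_incomplete_marker(marker_path)
    return record
```

Removing the marker at the top of `finished` would leave `starttime` empty in this sketch, while removing it only at the bottom would skip the `keep_metadata=False` branch; both call sites are needed.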

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **Bug Fixes**
  - Improved handling of job metadata by ensuring the incomplete marker is removed only after successful metadata record creation, preserving job start time.

- **Documentation**
  - Added clarifying comments to explain changes regarding the incomplete marker removal.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
@coderabbitai coderabbitai bot mentioned this pull request Mar 13, 2025
2 tasks