
fix: Prevent execution of conda in apptainer when not explicitly requested in software deployment method#3388

Merged
johanneskoester merged 3 commits into snakemake:main from johausmann:3202_fix_conda_status_127
Mar 11, 2025

Conversation

@johausmann
Contributor

@johausmann johausmann commented Mar 11, 2025

This is a quick fix for the issue described in #3202.

From the original issue:

Snakemake crashes with a 'conda info --json' returned non-zero exit status 127 when a rule contains both conda and container directives.

This behaviour was caused by the hash retrieval in _software_stack_hash() within the Persistence class. If a Snakemake rule specified both a container and a conda file, this method would always try to compute a hash of the conda file, which instantiated the Conda class using the container as the base image (the ad-hoc combination). This happened even when conda was not specified with --sdm, and also in get_log_error_info() in jobs.py. This PR fixes the issue and ensures that rules can be executed either with conda, with apptainer, or with the ad-hoc combination.
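The guard behind this fix can be sketched as follows; the enum and function shape here are simplified stand-ins for illustration, not the actual snakemake classes or API:

```python
from enum import Enum, auto

class DeploymentMethod(Enum):
    CONDA = auto()
    APPTAINER = auto()

def get_conda_env_address(deployment_methods, conda_env):
    # Resolve the conda env only when conda deployment is active; otherwise
    # return None so the Conda class is never instantiated on top of a
    # container image (the "ad-hoc combination").
    if DeploymentMethod.CONDA in deployment_methods and conda_env is not None:
        return conda_env
    return None
```

With only apptainer active, the conda directive is ignored even if present on the rule.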

QC

  • [x] The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • [x] The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change neither modifies the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced handling of environment details to prevent errors when specific deployment modes aren’t active.
    • Improved logic for updating environment-related configurations.
  • New Features

    • Introduced a workflow that supports execution in both containerized and Conda environments.
    • Added a dependency configuration file for consistent package management.
    • New output file created with expected results from the workflow.
    • Expanded validation capabilities for data structures, including support for both Pandas and Polars.
  • Tests

    • Implemented new tests to verify proper execution across varying deployment methods.
    • Enhanced validation tests for sample data against specified schemas.

@coderabbitai
Contributor

coderabbitai bot commented Mar 11, 2025

📝 Walkthrough

Walkthrough

The changes update the core logic in handling conda environments and container deployments. In the Job class, the method for generating log error information now conditionally retrieves the conda environment address only when the deployment method is set to conda; it also ensures the container image field is consistently present. In the Persistence class, the MD5 hash update logic now includes deployment method checks for both conda and Apptainer. Additionally, new test resources and a workflow (Snakefile, expected output, test environment YAML, and corresponding test function) have been added for issue 3202, validating execution in both conda and containerized environments.

Changes

File(s) Change Summary
snakemake/jobs.py, snakemake/persistence.py Modified conditional logic for conda environment handling: In Job.get_log_error_info, the conda environment address is retrieved only if the deployment method is CONDA; in Persistence._software_stack_hash, hash updates now check for deployment methods for conda and container images. Also added an import for DeploymentMethod.
tests/test_issue_3202/Snakefile, tests/test_issue_3202/expected-results/output.txt, tests/test_issue_3202/test-env.yaml, tests/tests_using_conda.py Added a new workflow and testing setup for issue 3202: Introduced rules for executing jobs in both Conda and containerized environments, provided an expected output file and a conda environment YAML configuration, and implemented a test function to validate execution under both deployment methods.
docs/project_info/codebase.rst, docs/project_info/contributing.rst, docs/snakefiles/configuration.rst Updated documentation for clarity and consistency, including corrections to headings and enhancements to configuration validation descriptions.
snakemake/assets/__init__.py, snakemake/report/html_reporter/data/packages.py Added numerous package declarations for dependencies related to vega, vega-lite, and vega-embed, enhancing asset management capabilities.
snakemake/utils.py Introduced new internal functions for validating records and data frames for pandas and polars, improving validation capabilities and code organization.
tests/test_validate/Snakefile, tests/test_validate/samples.schema.yaml Expanded validation functionality for sample data using both Pandas and Polars, and updated schema definitions for samples.

Sequence Diagram(s)

sequenceDiagram
    participant J as Job
    participant C as CondaEnv Checker
    participant L as LogInfo Constructor

    J->>+J: get_log_error_info()
    alt DeploymentMethod is CONDA and conda_env exists
        J->>+C: Retrieve conda_env.address
        C-->>-J: Return conda_env.address
    else
        J-->>J: Set conda_env to None
    end
    J->>L: Include container_img in log data
    L-->>J: Return error info dictionary
sequenceDiagram
    participant T as Test (test_issue_3202)
    participant S as Workflow (Snakefile)
    participant R as Rule Executor

    T->>+S: Invoke run() with DeploymentMethod.APPTAINER
    S->>R: Execute rule (run_in_container_and_conda)
    R-->>S: Write "foo" to output.txt
    S-->>T: Validate container workflow result

    T->>+S: Invoke run() with DeploymentMethod.CONDA
    S->>R: Execute rule (run_in_container_and_conda)
    R-->>S: Write "foo" to output.txt
    S-->>T: Validate conda workflow result

Suggested reviewers

  • johanneskoester

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0a5968 and d18f99e.

⛔ Files ignored due to path filters (1)
  • tests/test_validate/samples.tsv is excluded by !**/*.tsv
📒 Files selected for processing (9)
  • docs/project_info/codebase.rst (1 hunks)
  • docs/project_info/contributing.rst (2 hunks)
  • docs/snakefiles/configuration.rst (1 hunks)
  • snakemake/assets/__init__.py (4 hunks)
  • snakemake/dag.py (1 hunks)
  • snakemake/report/html_reporter/data/packages.py (1 hunks)
  • snakemake/utils.py (1 hunks)
  • tests/test_validate/Snakefile (1 hunks)
  • tests/test_validate/samples.schema.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/project_info/codebase.rst
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • snakemake/assets/__init__.py
  • snakemake/utils.py
  • snakemake/dag.py
  • snakemake/report/html_reporter/data/packages.py
🪛 Ruff (0.8.2)
snakemake/utils.py

132-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


168-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


206-208: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

⏰ Context from checks skipped due to timeout of 90000ms (30)
  • GitHub Check: testing (10, 3.12, bash)
  • GitHub Check: testing (10, 3.11, bash)
  • GitHub Check: testing (9, 3.12, bash)
  • GitHub Check: testing (9, 3.11, bash)
  • GitHub Check: testing (8, 3.12, bash)
  • GitHub Check: testing (8, 3.11, bash)
  • GitHub Check: testing (7, 3.12, bash)
  • GitHub Check: testing (7, 3.11, bash)
  • GitHub Check: testing (6, 3.12, bash)
  • GitHub Check: testing (6, 3.11, bash)
  • GitHub Check: testing (5, 3.12, bash)
  • GitHub Check: testing (5, 3.11, bash)
  • GitHub Check: testing-windows (10)
  • GitHub Check: testing (4, 3.12, bash)
  • GitHub Check: testing-windows (9)
  • GitHub Check: testing (4, 3.11, bash)
  • GitHub Check: testing-windows (8)
  • GitHub Check: testing (3, 3.12, bash)
  • GitHub Check: testing-windows (7)
  • GitHub Check: testing (3, 3.11, bash)
  • GitHub Check: testing-windows (6)
  • GitHub Check: testing (2, 3.12, bash)
  • GitHub Check: testing-windows (5)
  • GitHub Check: testing (2, 3.11, bash)
  • GitHub Check: testing-windows (4)
  • GitHub Check: testing (1, 3.12, bash)
  • GitHub Check: testing-windows (3)
  • GitHub Check: testing (1, 3.11, bash)
  • GitHub Check: testing-windows (2)
  • GitHub Check: testing-windows (1)
🔇 Additional comments (20)
snakemake/dag.py (1)

412-434: Improved file processing order with topological sorting

The changes to store_storage_outputs method enhance the reliability of storage operations by processing jobs in topologically sorted levels rather than in arbitrary order. This ensures that jobs are processed with respect to their dependencies, which is particularly important when files from one job may be needed by subsequent jobs.

By organizing the storage operations by levels in the dependency graph, the code now offers:

  1. More predictable execution order
  2. Better parallelization as each level can be processed concurrently
  3. Proper handling of dependencies between jobs
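Level-wise topological processing as described above can be illustrated with a small self-contained sketch (a Kahn-style grouping; the dependency mapping and names are hypothetical, not snakemake's DAG API):

```python
def toposort_levels(deps):
    # deps: mapping node -> set of nodes it depends on (which must run earlier).
    # Returns a list of levels; every node depends only on earlier levels,
    # so each level can be processed concurrently.
    pending = {n: set(d) for n, d in deps.items()}
    levels = []
    while pending:
        ready = {n for n, d in pending.items() if not d}
        if not ready:
            raise ValueError("cyclic dependency among jobs")
        levels.append(sorted(ready))
        pending = {n: d - ready for n, d in pending.items() if n not in ready}
    return levels
```

For a diamond-shaped graph this yields three levels, with the two middle jobs grouped together.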
snakemake/report/html_reporter/data/packages.py (1)

41-185: LGTM! Comprehensive package documentation for visualization libraries.

The additions provide thorough documentation of visualization library dependencies, correctly organizing them into logical sections with clear comments. This improves license compliance and package management.

snakemake/assets/__init__.py (2)

96-531: LGTM! Well-documented asset specifications for visualization libraries.

These additions comprehensively document all the dependencies for visualization libraries with appropriate licensing information, version details, and SHA256 checksums. The comments clearly delineate different dependency groups and explain relationships between libraries.


96-531: Request clarification on relation to PR objectives.

These comprehensive asset additions seem unrelated to the PR's stated objective of "Prevent execution of conda in apptainer when not explicitly requested in software deployment method". The PR description mentions fixes to _software_stack_hash() and get_log_error_info(), but these changes focus on visualizations.

Could you clarify how these asset and package additions relate to fixing the conda/apptainer issue described in the PR objectives?

tests/test_validate/samples.schema.yaml (1)

9-17: Schema additions improve validation but lack connection to PR objective.

The schema has been enhanced with two new default properties (n and tissue) and a simplified description for the condition property. These are good improvements for robust validation, but don't appear to relate to the PR's stated goal of fixing conda and container directive issues.

tests/test_validate/Snakefile (5)

4-4: Added Polars import for enhanced validation.

The import of Polars alongside Pandas enables support for an alternative columnar data processing library.


11-28: New dictionary and Pandas validation tests added.

These tests validate different aspects of schema validation:

  1. Dictionary validation from single row data
  2. Dictionary validation with null value filtering
  3. Pandas DataFrame validation without an index

The tests ensure that default values are properly applied and validated against the schema.


29-40: Added Polars DataFrame validation tests.

The test creates a Polars DataFrame with explicitly defined schema and null values handling, then validates it against the sample schema. The assertions confirm default values are properly applied.


41-50: Added Polars LazyFrame validation test with disabled defaults.

This test demonstrates validation of a Polars LazyFrame with set_default=False, showing flexibility in the validation process. Note that LazyFrames allow for deferred execution which can improve performance for large datasets.


51-57: Renamed and enhanced existing Pandas test.

The original validation test has been preserved but enhanced with assertions to verify that default values are properly applied to the indexed DataFrame.

docs/project_info/contributing.rst (3)

188-189: Documentation guidelines improved with authoritative references.

The section structure recommendation now includes links to both Sphinx documentation and Python Developer's Guide, providing clearer guidance for contributors.


193-213: Updated section heading structure and naming conventions.

The section heading structure has been standardized:

  • "heading_1" → "section_heading"
  • "heading_2" → "subsection_heading"
  • "heading_3" → "subsubsection_heading"
  • "heading_4" → "paragraph_heading"

This improves consistency with standard documentation practices.


228-229: Updated environment name for documentation builds.

Changed the conda environment name from "snakemake" to "snakemake_docs" for documentation building, which helps distinguish it from the main development environment.

snakemake/utils.py (7)

118-118: New _validate_pandas function looks good
This concise and well-named function enhances clarity in handling Pandas-based validations.


123-123: Helpful debug statement
Logging the specific DataFrame type aids in troubleshooting and debugging.


125-130: Confirm the intended dropping of null-valued fields
By excluding NULL fields, columns with NaN or None values will disappear from the record. Verify this aligns with your desired behavior.


137-142: Check partial-column updates
Using data[newcol] = None and then data.update(newdata) is a neat way to insert default columns. Confirm that this approach doesn’t inadvertently overwrite valid existing data with None.


150-150: New _validate_polars function
This is a valuable addition that mirrors the Pandas validation approach for Polars users.


219-222: Dictionary validation approach
Logging and then reusing _validate_record keeps things consistent for dict-based configs.


227-230: Good fallback chain for Pandas and Polars
This ensures that multiple data types are smoothly validated.
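That fallback chain can be sketched as a hypothetical dispatcher; the return markers and the module-name check stand in for the real calls to _validate_pandas and _validate_polars, and the check avoids importing either library when it is not installed:

```python
def validate(data, validate_record):
    # Dicts are validated directly as a single record.
    if isinstance(data, dict):
        validate_record(data)
        return "record"
    # Dispatch frames by the root module of their type without importing
    # pandas or polars up front.
    root = type(data).__module__.split(".")[0]
    if root == "pandas":
        return "pandas"  # real code would call _validate_pandas(data)
    if root == "polars":
        return "polars"  # real code would call _validate_polars(data)
    raise TypeError(f"unsupported data type: {type(data).__name__}")
```

Unsupported types fail fast with a TypeError instead of silently passing validation.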



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
snakemake/jobs.py (1)

1100-1113: Good fix for preventing conda instantiation in containers.

This change properly checks if conda deployment is enabled before accessing the conda environment address, preventing the 'conda info --json' error when using a container with a rule that also includes a conda directive but isn't using conda for deployment.

However, there's a small typo in the variable name conda_env_adress (should be conda_env_address).

-        conda_env_adress = (
+        conda_env_address = (
tests/test_issue_3202/test-env.yaml (1)

1-6: Good test environment configuration.

This conda environment configuration specifies appropriate dependencies for testing the issue fix. It correctly defines the channels and required packages with version constraints.

Consider adding a newline at the end of the file to follow YAML best practices.

  - melt ==1.0.3
  - python <3.10
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 6-6: no new line character at the end of file

(new-line-at-end-of-file)

tests/test_issue_3202/Snakefile (1)

8-8: Fix typo in comment.

There's a spelling error in the comment.

-# Test that rule can be run idenpently in conda and apptainer
+# Test that rule can be run independently in conda and apptainer
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ba62fe and 794d15a.

📒 Files selected for processing (6)
  • snakemake/jobs.py (1 hunks)
  • snakemake/persistence.py (2 hunks)
  • tests/test_issue_3202/Snakefile (1 hunks)
  • tests/test_issue_3202/expected-results/output.txt (1 hunks)
  • tests/test_issue_3202/test-env.yaml (1 hunks)
  • tests/tests_using_conda.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • tests/tests_using_conda.py
  • snakemake/persistence.py
  • snakemake/jobs.py
🪛 YAMLlint (1.35.1)
tests/test_issue_3202/test-env.yaml

[error] 6-6: no new line character at the end of file

(new-line-at-end-of-file)

🪛 GitHub Actions: CI
tests/tests_using_conda.py

[error] 1-1: Black formatting check failed. 1 file would be reformatted. Please run 'black' to fix code style issues in this file.

🔇 Additional comments (6)
snakemake/persistence.py (2)

33-33: Good addition of necessary import.

Adding the import for DeploymentMethod to support the conditional logic in the _software_stack_hash method.


522-532: Excellent fix to prevent conda instantiation when not explicitly requested.

This change fixes the issue described in #3202 by ensuring that the MD5 hash is only updated for conda environments when conda is explicitly specified as a deployment method. Similarly, it only updates the hash for Apptainer when it's explicitly requested. This prevents the unnecessary instantiation of the Conda class when a rule includes both conda and container directives but conda isn't being used.
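A minimal sketch of this check; the enum, parameter names, and hashing of raw strings are assumptions for illustration, not the actual Persistence implementation:

```python
import hashlib
from enum import Enum, auto

class DeploymentMethod(Enum):
    CONDA = auto()
    APPTAINER = auto()

def software_stack_hash(active_methods, conda_env_spec=None, container_url=None):
    md5 = hashlib.md5()
    # Hash each component only when its deployment method is active, so a
    # rule carrying both directives contributes only what is actually used.
    if DeploymentMethod.CONDA in active_methods and conda_env_spec is not None:
        md5.update(conda_env_spec.encode())
    if DeploymentMethod.APPTAINER in active_methods and container_url is not None:
        md5.update(container_url.encode())
    return md5.hexdigest()
```

With only apptainer active, the presence of a conda spec no longer changes the hash, so no Conda instantiation is triggered.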

snakemake/jobs.py (1)

1121-1122: Consistent handling of container image.

The code now consistently includes the container image field regardless of deployment method, which is a good practice.

tests/test_issue_3202/expected-results/output.txt (1)

1-1: Appropriate test output file for issue #3202.

This expected output file is used to verify that the rule with both conda and container directives runs correctly.

tests/test_issue_3202/Snakefile (2)

1-7: LGTM: Clear setup for test case.

The configuration correctly sets up bash as the shell executable and defines the conda environment path.


9-17: LGTM: Well-structured test rule with both conda and container directives.

This rule effectively demonstrates the scenario from issue #3202 by including both conda and container directives in a single rule. The simple shell command ensures the test is focused specifically on the deployment method conflict resolution.
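Such a rule looks roughly like this (the rule name and environment file match the test directory; the container image is a placeholder, not the one used in the PR):

```
rule run_in_container_and_conda:
    output:
        "output.txt"
    conda:
        "test-env.yaml"
    container:
        "docker://example/image:latest"  # placeholder image
    shell:
        "echo foo > {output}"
```

Which directive takes effect is then selected at run time via --sdm, which is exactly the behaviour the fix restores.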

@sonarqubecloud

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🧹 Nitpick comments (4)
snakemake/utils.py (4)

111-117: Consider adding a docstring for _validate_record
Providing a brief explanation of this function’s logic and parameter expectations (e.g., that record should be a dictionary) can aid maintainability and readability.


145-149: Consolidate return paths
Multiple return False statements can make the flow less obvious. Consider combining them or restructuring for improved readability.

 if condition_A:
     return False
 if condition_B:
     return False
 return True

168-170: Raise from the original exception
Use raise WorkflowError(...) from e to better preserve the traceback chain.

-    raise WorkflowError(
-        f"Error validating row {i} of data frame.", e
-    )
+    raise WorkflowError(
+        f"Error validating row {i} of data frame."
+    ) from e
🧰 Tools
🪛 Ruff (0.8.2)

168-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


206-208: Raise from the original exception
Same rationale here: attach the original exception to keep the traceback intact.

-    raise WorkflowError(
-        f"Error validating row {i} of data frame.", e
-    )
+    raise WorkflowError(
+        f"Error validating row {i} of data frame."
+    ) from e
🧰 Tools
🪛 Ruff (0.8.2)

206-208: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
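The effect of the suggested `raise ... from e` pattern can be shown with a tiny runnable example (using RuntimeError in place of snakemake's WorkflowError; the row value is hypothetical):

```python
def parse_row(row):
    try:
        return int(row)
    except ValueError as e:
        # "from e" stores the original exception on __cause__, preserving the
        # full traceback chain instead of the ambiguous "During handling of
        # the above exception, another exception occurred" message.
        raise RuntimeError(f"Error validating row {row!r}") from e

try:
    parse_row("not-a-number")
except RuntimeError as err:
    assert isinstance(err.__cause__, ValueError)
```

This is why Ruff's B904 flags bare re-raises inside except clauses.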

🛑 Comments failed to post (1)
docs/snakefiles/configuration.rst (1)

115-116: 💡 Verification agent

🧩 Analysis chain

Documentation update aligns with functionality changes but lacks connection to PR objective.

The documentation has been updated to reflect that validation now supports polars DataFrames and LazyFrames, which is good for user awareness. However, these changes don't seem directly related to the PR's objective of preventing conda execution in apptainer.


🌐 Web query:

What is Polars and how does it compare to Pandas for data processing?

💡 Result:

Polars is a high-performance DataFrame library designed for fast data processing in Python, built in Rust and optimized for parallel computing. It leverages Apache Arrow's memory model for efficient columnar data handling, enabling speed improvements of 5–10x over pandas for common operations, with benchmarks showing up to 40x faster joins and 8–25x faster aggregations on large datasets[3][6][12]. Here's how they compare:

Key Differences

| Feature           | Polars                              | Pandas                             |
|-------------------|-------------------------------------|------------------------------------|
| Execution Model   | Eager + lazy (with query optimizer) | Eager only                         |
| Core Language     | Rust                                | Python/Cython                      |
| Memory Management | Apache Arrow (columnar)             | NumPy arrays (row-oriented)        |
| Parallelism       | Full CPU core utilization           | Single-threaded by default         |
| Memory Efficiency | 2–4x dataset size in RAM required   | 5–10x dataset size in RAM required |
| API Style         | Expressive method chaining          | Mix of methods + apply/lambdas     |

Performance Highlights

  1. Data Ingestion

    • Reads CSV files 11x faster than pandas (1.25s vs 14.14s for 5GB datasets)[12]
    • Handles joins 40x faster in some benchmarks[3][16]
  2. Memory Efficiency
    Polars consumes 63–87% less energy for large-scale operations while processing data 8x faster[10]

  3. Scalability

    • Processes datasets up to 10x larger than pandas on the same hardware[4][8]
    • Maintains performance with 100M+ row datasets where pandas fails[16]

When to Choose Polars

  • Big data processing (1GB–100GB datasets)
  • Complex ETL pipelines requiring parallel execution
  • Memory-constrained environments
  • Operations benefiting from query optimization (filter → groupby patterns)[6]

When to Prefer Pandas

  • Small datasets (<1M rows) for exploratory analysis
  • Tight integration with ML libraries (scikit-learn, PyTorch)
  • Legacy codebases using pandas-specific features
  • Interactive workflows requiring immediate feedback[2][6]

Ecosystem Considerations

While Polars excels at raw performance:

  • Interoperability: Convert to/from pandas with .to_pandas() and from_pandas()[9]
  • Visualization: Requires conversion to pandas for integration with Matplotlib/Seaborn[6]
  • GPU Support: New RAPIDS integration enables 13x speedups on NVIDIA GPUs[11]

For most modern data engineering tasks, Polars provides substantial advantages in speed and memory efficiency while maintaining pandas-like syntax. However, pandas remains preferable for small-scale analytics and ML integration scenarios[4][6][16].



Action Required: Link Documentation Update to PR Objective

  • The updated documentation accurately reflects the new validation support for Polars DataFrame and LazyFrame types.
  • However, it does not address the PR’s primary objective of preventing conda execution in Apptainer.
  • Please clarify in the docs whether these changes are intended to be part of a broader initiative (including the Apptainer context) or if they should be documented separately.

@johanneskoester johanneskoester merged commit c43c5c0 into snakemake:main Mar 11, 2025
41 checks passed
johanneskoester pushed a commit that referenced this pull request Mar 14, 2025
🤖 I have created a release *beep* *boop*
---


## [9.0.0](v8.30.0...v9.0.0) (2025-03-14)


### ⚠ BREAKING CHANGES

* Logging refactor & add LoggerPluginInterface
([#3107](#3107))

### Features

* [#3412](#3412) - keep
shadow folder of failed job if --keep-incomplete flag is set.
([#3430](#3430))
([22978c3](22978c3))
* add flag --report-after-run to automatically generate the report after
a successful workflow run
([#3428](#3428))
([b0a7f03](b0a7f03))
* add flatten function to IO utils
([#3424](#3424))
([67fa392](67fa392))
* add helper functions to parse input files
([#2918](#2918))
([63e45a7](63e45a7))
* Add option to print redacted file names
([#3089](#3089))
([ba4d264](ba4d264))
* add support for validation of polars dataframe and lazyframe
([#3262](#3262))
([c7473a6](c7473a6))
* added support for rendering dag with mermaid js
([#3409](#3409))
([7bf8381](7bf8381))
* adding --replace-workflow-config to fully replace workflow configs
(from config: directive) with --configfile, instead of merging them
([#3381](#3381))
([47504a0](47504a0))
* Dynamic module name
([#3401](#3401))
([024dc32](024dc32))
* Enable saving and reloading IOCache object
([#3386](#3386))
([c935953](c935953))
* files added in rule params with workflow.source_path will be available
in used containers
([#3385](#3385))
([a6e45bf](a6e45bf))
* Fix keep_local in storage directive and more freedom over remote
retrieval behaviour
([#3410](#3410))
([67b4739](67b4739))
* inherit parameters of use rule and extend/replace individual items
of them when using 'with' directive
([#3365](#3365))
([93e4b92](93e4b92))
* Logging refactor & add LoggerPluginInterface
([#3107](#3107))
([86f1d6e](86f1d6e))
* Maximal file size for checksums
([#3368](#3368))
([b039f8a](b039f8a))
* Modernize package configuration using Pixi
([#3369](#3369))
([77992d8](77992d8))
* multiext support for named input/output
([#3372](#3372))
([05e1378](05e1378))
* optionally auto-group jobs via temp files in case of remote execution
([#3378](#3378))
([cc9bba2](cc9bba2))


### Bug Fixes

* `--delete-all-output` ignores `--dry-run`
([#3265](#3265))
([23fef82](23fef82))
* 3342 faster touch runs and warning messages for non-existing files
([#3398](#3398))
([cd9c3c3](cd9c3c3))
* add default value to max-jobs-per-timespan
([#3043](#3043))
([2959abe](2959abe))
* checkpoints inside modules are overwritten
([#3359](#3359))
([fba3ac7](fba3ac7))
* Convert Path to IOFile
([#3405](#3405))
([c58684c](c58684c))
* Do not perform storage object cleanup with --keep-storage-local-copies
set ([#3358](#3358))
([9a6d14b](9a6d14b))
* edgecases of source deployment in case of remote execution
([#3396](#3396))
([5da13be](5da13be))
* enhance error message formatting for strict DAG-building mode
([#3376](#3376))
([a1c39ee](a1c39ee))
* fix bug in checkpoint handling that led to exceptions in case
checkpoint output was missing upon rerun
([#3423](#3423))
([8cf4a2f](8cf4a2f))
* force check all required outputs
([#3341](#3341))
([495a4e7](495a4e7))
* group job formatting
([#3442](#3442))
([f0b10a3](f0b10a3))
* in remote jobs, upload storage in topological order such that
modification dates are preserved (e.g. in case of group jobs)
([#3377](#3377))
([eace08f](eace08f))
* only skip eval when resource depends on input
([#3374](#3374))
([4574c92](4574c92))
* Prevent execution of conda in apptainer when not explicitly requested
in software deployment method
([#3388](#3388))
([c43c5c0](c43c5c0))
* print filenames with quotes around them in RuleException
([#3269](#3269))
([6baeda5](6baeda5))
* Re-evaluation of free resources
([#3399](#3399))
([6371293](6371293))
* ReadTheDocs layout issue due to src directory change
([#3419](#3419))
([695b127](695b127))
* robustly escaping quotes in generated bash scripts (v2)
([#3297](#3297))
([#3389](#3389))
([58720bd](58720bd))
* Show apptainer image URL in snakemake report
([#3407](#3407))
([45f0450](45f0450))
* Update ReadTheDocs configuration for documentation build to use Pixi
([#3433](#3433))
([3f227a6](3f227a6))
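
The conda-in-apptainer fix listed above concerns rules that declare both the `conda` and `container` directives. A hedged sketch of the situation it addresses (rule name, env file, image tag, and shell command are all placeholders):

```python
# Hypothetical rule declaring both software deployment directives.
# Which one is used is decided by --software-deployment-method (--sdm):
#   --sdm conda            -> run inside the conda environment
#   --sdm apptainer        -> run inside the container image
#   --sdm conda apptainer  -> build the conda env inside the container
# Before this fix, Snakemake probed `conda info --json` inside the
# container even when conda was not requested via --sdm, failing with
# exit status 127 if the image did not ship conda.
rule align:
    input:
        "reads.fq",
    output:
        "aligned.bam",
    conda:
        "envs/align.yaml"
    container:
        "docker://condaforge/miniforge3:latest"
    shell:
        "aligner {input} > {output}"
```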


### Documentation

* Add pixi setup instructions to general use tutorial
([#3382](#3382))
([115e81b](115e81b))
* fix contribution section heading levels, fix docs testing setup order
([#3360](#3360))
([051dc53](051dc53))
* fix link to github.com/snakemake/poetry-snakemake-plugin
([#3436](#3436))
([ec6d97c](ec6d97c))
* fix quoting
([#3394](#3394))
([b40f599](b40f599))
* fix rerun-triggers default
([#3403](#3403))
([4430e23](4430e23))
* fix typo 'safe' -> 'save'
([#3384](#3384))
([7755861](7755861))
* mention code formatting in the contribution section
([#3431](#3431))
([e8682b7](e8682b7))
* remove duplicated 'functions'.
([#3356](#3356))
([7c595db](7c595db))
* update broken links documentation
([#3437](#3437))
([e3d0d88](e3d0d88))
* Updating contributing guidelines with new pixi dev setup
([#3415](#3415))
([8e95a12](8e95a12))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: snakemake-bot <snakemake-bot-admin@googlegroups.com>