chore: Add license texts for things bundled in the assets#3099
chore: Add license texts for things bundled in the assets#3099fgvieira merged 16 commits intosnakemake:mainfrom
Conversation
Referencing the license file from the release tag corresponding to the bundled release of each asset makes it less likely that the checksum will eventually stop matching due to later upstream license changes – perhaps as simple as a new copyright date – and makes errors in the included license less likely. Since the copyright statement in the react license changed from "Copyright (c) Facebook, Inc. and its affiliates." to "Copyright (c) Meta Platforms, Inc. and affiliates." since the bundled version of react was released – and this commit uses the older version that corresponds to the bundled library – the checksum of the react license file changes. Similarly, in the vega license, the copyright date range changed from 2015-2021 to 2015-2023 after the bundled version was released, so using the correct license file for the bundled version changes the checksum.
The current version as of when icon paths from heroicons were incorporated in Snakemake (in d09df0c on 2022-03-13 and 142a452 on 2022-03-23) was 1.0.3; in version 2.1.2, the copyright statement changed from "Copyright (c) 2020 Refactoring UI Inc." to "Copyright (c) Tailwind Labs, Inc.". Since this commit switches to the older license text, it changes the checksum.
These are the NPM packages that are included in the vega.js bundle via dependencies. An effort has been made to verify that every license file added in this commit actually corresponds to some code present in vega.js.
These correspond to routines that are maintained with the rest of the vega sources, but are attributed to other projects.
📝 WalkthroughWalkthroughThe pull request introduces significant updates to asset management in the Snakemake project, specifically in the Changes
Possibly related PRs
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
|
thanks a lot!! |
|
Can you use the |
Sure, I’ll update that. I’ve really wanted to find time to return to this and finish auditing |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
snakemake/assets/__init__.py(1 hunks)snakemake/report/html_reporter/data/packages.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/report/html_reporter/data/packages.pysnakemake/assets/__init__.py
🪛 GitHub Actions: CI
snakemake/assets/__init__.py
[error] 93-93: Black formatting check failed. 1 file would be reformatted. Please run 'black' to fix code style issues.
🔇 Additional comments (2)
snakemake/assets/__init__.py (1)
96-318: Optionally verify the correctness of newly added URLs and checksums.You have referenced numerous new assets for the vega ecosystem. Please confirm that each URL is correct and that its associated SHA-256 digest aligns with upstream releases. Minor discrepancies can crop up if the upstream repository tags shift or if new content is introduced without changing the tag name.
snakemake/report/html_reporter/data/packages.py (1)
41-136: No issues found.These package declarations appear valid, and each package references the correct license path. No functional or security concerns are apparent.
Done. Now I’ll start looking at the other libraries. (I’ve put off finishing this PR, but I really want to get it done so I can update |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
snakemake/assets/__init__.py (1)
96-351: Consider externalizing asset references for maintainability.All of these asset entries are stored in one large Python dictionary. If you plan to continue adding more entries or adjust them over time, you might benefit from moving them to a structured data file (e.g., YAML or JSON) that can be loaded here at runtime. This can reduce code churn and simplify auditing for future changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
snakemake/assets/__init__.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/assets/__init__.py
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: testing (10, 3.12)
- GitHub Check: testing (10, 3.11)
- GitHub Check: testing (9, 3.12)
- GitHub Check: testing (9, 3.11)
- GitHub Check: testing (8, 3.12)
- GitHub Check: testing (8, 3.11)
- GitHub Check: testing (7, 3.12)
- GitHub Check: testing (7, 3.11)
- GitHub Check: testing (6, 3.12)
- GitHub Check: testing (6, 3.11)
- GitHub Check: testing (5, 3.12)
- GitHub Check: testing-windows (10)
- GitHub Check: testing (5, 3.11)
- GitHub Check: testing-windows (9)
- GitHub Check: testing (4, 3.12)
- GitHub Check: testing-windows (8)
- GitHub Check: testing (4, 3.11)
- GitHub Check: testing-windows (7)
- GitHub Check: testing (3, 3.12)
- GitHub Check: testing-windows (6)
- GitHub Check: testing (3, 3.11)
- GitHub Check: testing-windows (5)
- GitHub Check: testing (2, 3.12)
- GitHub Check: testing-windows (4)
- GitHub Check: testing (2, 3.11)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.12)
- GitHub Check: testing-windows (2)
- GitHub Check: testing (1, 3.11)
- GitHub Check: testing-windows (1)
🔇 Additional comments (2)
snakemake/assets/__init__.py (2)
96-351: Re-run black to ensure consistent indentation.It appears that these newly added lines (96-351) might still contain tabs or be otherwise misaligned. A past review comment already flagged this and the pipeline confirmed reformatting is necessary.
96-351: Confirm that all new asset references and checksums remain valid.These changes introduce a large batch of URLs and pinned checksums for various dependencies within Vega. Any mismatch or shift in these remote resources can break the build or violate license compliance.
Would you like a helper script to systematically fetch each URL, compute the SHA-256, and verify they match the pinned values?
|
Let me know when you are done so I can review it. |
6ef45d2 to
91312ce
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
snakemake/assets/__init__.py (2)
96-351: Good addition of license references for Vega dependencies.This large block of entries comprehensively covers various licensing requirements. As a good-to-have improvement, consider verifying that none of the listed dependencies are fully tree-shaken and effectively unused in
vega/vega.js. Removing unused entries would keep the asset list concise while still fulfilling license obligations. If you need assistance automating that verification, let me know.
362-412: Ensure only necessary license entries for Vega-Lite and related packages.Similarly, this block accounts for numerous dependencies bundled in
vega-lite.js. If any of these references turn out to be unused, you can remove them to simplify maintenance. Everything else looks good from a correctness standpoint.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
snakemake/assets/__init__.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/assets/__init__.py
⏰ Context from checks skipped due to timeout of 90000ms (23)
- GitHub Check: testing (10, 3.12)
- GitHub Check: testing (10, 3.11)
- GitHub Check: testing (9, 3.12)
- GitHub Check: testing (9, 3.11)
- GitHub Check: testing (8, 3.12)
- GitHub Check: testing (8, 3.11)
- GitHub Check: testing (7, 3.12)
- GitHub Check: testing (7, 3.11)
- GitHub Check: testing (6, 3.12)
- GitHub Check: testing (6, 3.11)
- GitHub Check: testing (5, 3.12)
- GitHub Check: testing (5, 3.11)
- GitHub Check: testing-windows (10)
- GitHub Check: testing (4, 3.12)
- GitHub Check: testing-windows (9)
- GitHub Check: testing-windows (8)
- GitHub Check: testing-windows (7)
- GitHub Check: testing-windows (6)
- GitHub Check: testing-windows (5)
- GitHub Check: testing-windows (4)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.12)
- GitHub Check: testing-windows (1)
|
@fgvieira : Ok, I think I’m done, and I’ve taken this as far as could reasonably be expected. |
|
Great work!!! |
|
Hold on. In my latest commits, I forgot I should add new |
These are the NPM packages that are included in the vega-lite.js bundle via dependencies. An effort has been made to verify that every license file added in this commit actually corresponds to some code present in vega-lite.js. Note that no license files needed to be added for software copied/derived/adapted into vega-lite.js; hashlru was bundled via vega-util, but this was already covered for vega.js.
57ade3c to
72a6549
Compare
There, that last force-push should take care of it. |
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
snakemake/assets/__init__.py (1)
533-551: 💡 Verification agent🧩 Analysis chain
Verify license file deployment in production.
The
deploymethod handles license file deployment with proper validation. However, it's important to verify that all license files are properly deployed in the production environment.Run this script to verify license file deployment:
🏁 Script executed:
#!/bin/bash # Description: Verify that all license files are properly deployed # Check if all license files exist and are readable for file in $(rg -o 'license_path="([^"]+)"' | cut -d'"' -f2); do if [ -f "snakemake/assets/data/$file" ]; then echo "✓ Found $file" else echo "✗ Missing $file" fi doneLength of output: 9701
License Deployment Verification: Missing Assets Detected
The
deploymethod appears correctly implemented in terms of checksum validation and file creation; however, our verification script indicates that a substantial number of license files are not being deployed. Notable missing files include:
snakemake/LICENSE.mdpygments/LICENSEtailwindcss/LICENSEreact/LICENSEvega/LICENSE- (and many others as shown in the verification output)
Please review the license asset specifications in
cls.specto ensure that the license file paths are correct and that the deployment mechanism covers all expected assets. Adjustments may be required either in the asset definitions or in the deployment logic to guarantee that all necessary license files are available in production.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
snakemake/assets/__init__.py(4 hunks)snakemake/report/html_reporter/data/packages.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/report/html_reporter/data/packages.pysnakemake/assets/__init__.py
🪛 Ruff (0.8.2)
snakemake/report/html_reporter/data/packages.py
172-172: Dictionary key literal "json-stringify-pretty-compact" repeated
Remove repeated key literal "json-stringify-pretty-compact"
(F601)
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: testing (10, 3.12)
- GitHub Check: testing (10, 3.11)
- GitHub Check: testing (9, 3.12)
- GitHub Check: testing (9, 3.11)
- GitHub Check: testing (8, 3.12)
- GitHub Check: testing (8, 3.11)
- GitHub Check: testing (7, 3.12)
- GitHub Check: testing (7, 3.11)
- GitHub Check: testing (6, 3.12)
- GitHub Check: testing (6, 3.11)
- GitHub Check: testing (5, 3.12)
- GitHub Check: testing-windows (10)
- GitHub Check: testing (5, 3.11)
- GitHub Check: testing-windows (9)
- GitHub Check: testing (4, 3.12)
- GitHub Check: testing-windows (8)
- GitHub Check: testing (4, 3.11)
- GitHub Check: testing-windows (7)
- GitHub Check: testing (3, 3.12)
- GitHub Check: testing-windows (6)
- GitHub Check: testing (3, 3.11)
- GitHub Check: testing-windows (5)
- GitHub Check: testing (2, 3.12)
- GitHub Check: testing-windows (4)
- GitHub Check: testing (2, 3.11)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.12)
- GitHub Check: testing-windows (2)
- GitHub Check: testing (1, 3.11)
- GitHub Check: testing-windows (1)
🔇 Additional comments (2)
snakemake/report/html_reporter/data/packages.py (1)
41-136: Excellent documentation and organization of package dependencies!The code demonstrates thorough documentation of package dependencies, including:
- Clear section markers for dependencies vs copied/derived code
- Detailed comments about version sources and dependency relationships
- Comprehensive tracking of license origins
Also applies to: 246-351
snakemake/assets/__init__.py (1)
96-494: Well-implemented asset management with proper validation!The implementation includes:
- Proper SHA256 checksum validation for each asset
- Clear version tracking
- Comprehensive error handling
- Detailed documentation of asset origins and relationships
These are the NPM packages that are included in the vega-embed.js bundle via dependencies. An effort has been made to verify that every license file added in this commit actually corresponds to some code present in vega-embed.js. Note that no license files needed to be added for software copied/derived/adapted into vega-embed.js; everything is already covered via the dependencies for vega-lite.js or vega.js.
72a6549 to
ed8f337
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
snakemake/report/html_reporter/data/packages.py (1)
106-136: Consider storing the notice text in the resulting JSON record for clarity.The
commons-mathpackage entry includes anotice="commons-math/NOTICE.txt"argument, which is correctly fetched in__init__, but is not surfaced by thePackage.get_record()method. If you want the notice’s text to be displayed or tracked (similar to the license text), you may store it in the final JSON record:def get_record(self): return { "version": self.version, "license": self.license, + # Optionally surface the notice text. + # "notice": self.sources.get("notice", None), }snakemake/assets/__init__.py (1)
248-351: Notice handling for “commons-math” could be made more transparent.Similar to the recommendation in
packages.py, you may wish to surface the “NOTICE.txt” within the relevant record or logs, if needed, to maintain clarity about the source and required acknowledgments.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
snakemake/assets/__init__.py(4 hunks)snakemake/report/html_reporter/data/packages.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/report/html_reporter/data/packages.pysnakemake/assets/__init__.py
⏰ Context from checks skipped due to timeout of 90000ms (40)
- GitHub Check: testing (10, 3.12, dash)
- GitHub Check: testing (10, 3.12, bash)
- GitHub Check: testing (10, 3.11, bash)
- GitHub Check: testing (9, 3.12, dash)
- GitHub Check: testing (9, 3.12, bash)
- GitHub Check: testing (9, 3.11, bash)
- GitHub Check: testing (8, 3.12, dash)
- GitHub Check: testing (8, 3.12, bash)
- GitHub Check: testing (8, 3.11, bash)
- GitHub Check: testing (7, 3.12, dash)
- GitHub Check: testing (7, 3.12, bash)
- GitHub Check: testing (7, 3.11, bash)
- GitHub Check: testing (6, 3.12, dash)
- GitHub Check: testing (6, 3.12, bash)
- GitHub Check: testing (6, 3.11, bash)
- GitHub Check: testing (5, 3.12, dash)
- GitHub Check: testing (5, 3.12, bash)
- GitHub Check: testing (5, 3.11, bash)
- GitHub Check: testing (4, 3.12, dash)
- GitHub Check: testing (4, 3.12, bash)
- GitHub Check: testing (4, 3.11, bash)
- GitHub Check: testing-windows (10)
- GitHub Check: testing (3, 3.12, dash)
- GitHub Check: testing-windows (9)
- GitHub Check: testing (3, 3.12, bash)
- GitHub Check: testing-windows (8)
- GitHub Check: testing (3, 3.11, bash)
- GitHub Check: testing-windows (7)
- GitHub Check: testing (2, 3.12, dash)
- GitHub Check: testing-windows (6)
- GitHub Check: testing (2, 3.12, bash)
- GitHub Check: testing-windows (5)
- GitHub Check: testing (2, 3.11, bash)
- GitHub Check: testing-windows (4)
- GitHub Check: testing (1, 3.12, dash)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.12, bash)
- GitHub Check: testing-windows (2)
- GitHub Check: testing (1, 3.11, bash)
- GitHub Check: testing-windows (1)
🔇 Additional comments (7)
snakemake/report/html_reporter/data/packages.py (3)
41-105: All newly added Vega dependency entries look consistent.No duplication or obvious issues were detected in these
Packagedeclarations; each library is mapped to the correct license path according to its references invega.js.
141-161: Dependencies for vega-lite appear correctly declared.These
Packageentries match the license files indicated in the inline comments, reflecting the correct version references from the vega-lite build.
167-185: No duplication or syntax issues with the vega-embed dependencies.All license paths and version numbers appear consistent with the references in the inline documentation.
snakemake/assets/__init__.py (4)
96-247: Vega asset entries look consistent and well-annotated.Each newly added asset references the correct URL, version, and license path. Checksums also appear appropriately aligned with the documented versions.
362-409: Vega-Lite asset definitions match the versions indicated.All references for vega-lite dependencies appear correct, and no duplication or missing references were found.
423-494: New vega-embed assets look properly tracked.No discrepancies detected between the declared assets and their documented usage in
vega-embed.js.
510-530: Prop-types dependencies exhibit no discovered issues.The newly added lines reference potential dependencies that are ultimately excluded through tree-shaking. This is consistent with the project’s approach to bundling minimal code in production.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (16)
.github/workflows/main.yml (1)
51-52: Remove trailing whitespaceThere is a trailing whitespace after
dashon line 51 that should be removed for consistency.- - shell: dash + - shell: dash🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 51-51: trailing spaces
(trailing-spaces)
snakemake/cli.py (1)
107-146: Well-implemented function for parsing ancient input specificationsThe new
parse_consider_ancientfunction is well-structured with appropriate error handling. It properly handles both string and integer input specifications with clear error messages.However, there's a minor issue with the exception handling pattern:
Consider using the
raise ... from errpattern to maintain the exception context:- except ValueError: - if item.isidentifier(): - return item - else: - raise ValueError(f"{errmsg} (Unparsable value: {repr(item)})") + except ValueError as err: + if item.isidentifier(): + return item + else: + raise ValueError(f"{errmsg} (Unparsable value: {repr(item)})") from errThis helps with debugging by preserving the original exception context.
🧰 Tools
🪛 Ruff (0.8.2)
135-135: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
snakemake/workflow.py (3)
127-127: Remove unused import.The
snakemake.apiimport is reported as unused. Please remove for cleanliness:-from snakemake import api, caching, sourcecache +from snakemake import caching, sourcecache🧰 Tools
🪛 Ruff (0.8.2)
127-127:
snakemake.apiimported but unusedRemove unused import:
snakemake.api(F401)
257-259: Address the TODO comment.These TODO lines question potential file-regeneration issues with the source cache. Consider creating an issue or describing the intended handling logic to avoid confusion.
Do you want me to open a new issue for further discussion on this?
1323-1369: Minor refinement to log message phrasing.Overall, these new logging methods look good. Consider a small tweak to clarify this portion of the message (line 1332~):
- f"{len(jobs)} jobs have {description} " - "provenance/metadata so that it in part " - "cannot be used to trigger re-runs.\n" + f"{len(jobs)} jobs have {description} provenance/metadata, " + "which (in part) cannot be used to trigger re-runs.\n"snakemake/io.py (5)
1775-1780: Consider caching or memoizingsize_files.Since
size_files()triggers asynchronous calls for every file each time it is accessed, repeated invocations could be expensive. Caching or memoizing the result might reduce overhead for large or frequently accessedInputFiles.
1781-1792: Refactor repetitive conversion to KB, MB, GB.The
self.size_files_kb,self.size_files_mb, andself.size_files_gbproperties chain multiple divisions by 1024. Consider introducing a small helper function to convert from bytes to the desired unit to reduce repetition and make the code more self-descriptive.- @property - def size_files_kb(self): - return [f / 1024 for f in self.size_files] - @property - def size_files_mb(self): - return [f / 1024 for f in self.size_files_kb] - @property - def size_files_gb(self): - return [f / 1024 for f in self.size_files_mb] + def _convert_sizes(self, factor: float) -> List[float]: + return [size / factor for size in self.size_files] + @property + def size_files_kb(self): + return self._convert_sizes(1024) + @property + def size_files_mb(self): + return self._convert_sizes(1024 * 1024) + @property + def size_files_gb(self): + return self._convert_sizes(1024 * 1024 * 1024)
1793-1796: Potential for repeated summations insize.Summing
size_fileswill trigger the asynchronous size retrieval for each file on every call. If performance becomes a concern, consider persisting the result in a cached property.
1797-1800: Consistent naming and caching approach forsize_kb.
size_kbcallssum(self.size_files_kb), which re-runs the division logic for each file. If repeated calls are likely, exploring a cached approach would help, maintaining consistent caching across all size-based properties.
1805-1808: Unify all size-based properties.
size_gbis another repetitive summation pattern. For maintainability, unify all these conversions (KB, MB, GB) into a single logic path or caching mechanism, similar to the suggestions for the other properties.snakemake/script/__init__.py (6)
8-11: Remove redundant imports and consolidate them.You already do
import osat line 13. Consider removing duplicate imports at lines 9–11 to improve clarity and adhere to Python best practices about avoiding repeated imports.- import os + # remove the duplicated import if it's not needed
68-68: Clarify naming of_safely_store_params.While this method presumably handles safe serialization of parameters, the name “safely” is somewhat vague. Consider a more descriptive name like
_serialize_paramsor_persist_paramsto clarify its role.
119-158: Ensure resilience inparamsproperty rehydration.This block dynamically reconstructs pandas/numpy/polars objects. Consider:
- Partial Failure Handling: If certain items can’t be reimported, ensure you handle or log partial successes.
- Version Mismatch: Dependencies’ versions can introduce subtle breakage on object rehydration.
- Performance: Re-creating large DataFrames or arrays may be costly; consider caching if re-invoked.
Otherwise, the approach helps keep Snakemake parameters portable.
160-202: Centralize logic for_safely_store_params.While this block carefully converts numerous pandas, numpy, and polars structures to dictionaries, consider:
- Single Dispatcher: Using a small dispatcher or pattern matching for each object type can reduce nested
ifblocks.- Error Logging: Provide context (e.g. param index) if a conversion fails.
- Docstring: Add a docstring clarifying how each type is transformed.
This can simplify maintenance and usage clarity.
313-329: Reduce repetition inencode_listfor homogeneous vs. heterogeneous data.The logic is clear, but you may unify your checks for homogeneity and produce the result in a single pass. Also consider whether you need separate handling of
bytesor prefer them as strings in the final R code.
454-455: Use more descriptive naming than_params_storesuffix.Renaming
_params_storeto something likestored_paramsmight be more transparent. The existing fallback to “params” is workable but can lead to confusion for new contributors.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (8)
CHANGELOG.mdis excluded by!CHANGELOG.mdCHANGELOG.mdis excluded by!CHANGELOG.mdCHANGELOG.mdis excluded by!CHANGELOG.mdCHANGELOG.mdis excluded by!CHANGELOG.mddocs/gurobi.logis excluded by!**/*.logdocs/project_info/img/architecture.svgis excluded by!**/*.svgpyproject.tomlis excluded by!pyproject.tomltests/test_script_xsh/expected-results/test.outis excluded by!**/*.out
📒 Files selected for processing (63)
.github/workflows/main.yml(4 hunks)docs/index.rst(3 hunks)docs/snakefiles/deployment.rst(2 hunks)docs/snakefiles/rules.rst(11 hunks)snakemake/cli.py(21 hunks)snakemake/io.py(2 hunks)snakemake/script/__init__.py(11 hunks)snakemake/workflow.py(11 hunks)test-environment.yml(3 hunks)tests/tests.py(13 hunks).github/workflows/main.yml(1 hunks)docs/snakefiles/deployment.rst(4 hunks)docs/snakefiles/reporting.rst(4 hunks)snakemake/io.py(1 hunks)snakemake/script/__init__.py(4 hunks)tests/tests.py(7 hunks).github/workflows/main.yml(1 hunks).github/workflows/main.yml(4 hunks)docs/snakefiles/rules.rst(2 hunks)snakemake/io.py(5 hunks)test-environment.yml(0 hunks).github/workflows/main.yml(3 hunks).gitignore(2 hunks)docs/index.rst(2 hunks)docs/project_info/codebase.rst(1 hunks)docs/project_info/contributing.rst(3 hunks)docs/snakefiles/deployment.rst(2 hunks)docs/snakefiles/reporting.rst(5 hunks)docs/snakefiles/rules.rst(1 hunks)misc/vim/ftplugin/snakemake/folding.vim(2 hunks)misc/vim/ftplugin/snakemake/sections.vim(1 hunks)misc/vim/syntax/snakemake.vim(5 hunks)setup.py(1 hunks)snakemake/cli.py(2 hunks)snakemake/common/tests/__init__.py(1 hunks)snakemake/io.py(1 hunks)snakemake/script/__init__.py(3 hunks)snakemake/workflow.py(1 hunks)test-environment.yml(2 hunks)tests/README.md(1 hunks)tests/test01/Snakefile(1 hunks)tests/test_benchmark/Snakefile(1 hunks)tests/test_benchmark_jsonl/Snakefile(1 hunks)tests/test_conda_python_3_7_script/Snakefile(1 hunks)tests/test_conda_python_3_7_script/test_script_python_3_7.py(1 hunks)tests/test_conda_run/Snakefile(1 hunks)tests/test_conda_run/expected-results/test.txt(1 hunks)tests/test_conda_run/test_python_env.yaml(1 hunks)tests/test_conda_run/test_script_run.py(1 hunks)tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/PKG-INFO(0 hunks)tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/SOURCES.txt(0 hunks)tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/dependency_links.txt(0 hunks)tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/top_level.txt(0 hunks)tests/test_script_xsh/Snakefile(1 hunks)tests/test_script_xsh/envs/xonsh.yaml(1 hunks)tests/test_script_xsh/scripts/test.xsh(1 hunks)tests/test_validate/Snakefile(1 hunks)tests/test_validate/config.schema.yaml(1 hunks)tests/test_validate/module-test/Snakefile(1 hunks)tests/test_validate/module-test/config.schema.yaml(1 hunks)tests/test_validate/samples.schema.yaml(1 hunks)tests/tests.py(4 hunks)tests/tests_using_conda.py(1 hunks)
💤 Files with no reviewable changes (4)
- tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/top_level.txt
- tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/PKG-INFO
- tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/SOURCES.txt
- tests/test_prebuilt_conda_script/dummy_package/src/dummy.egg-info/dependency_links.txt
✅ Files skipped from review due to trivial changes (13)
- tests/test_validate/samples.schema.yaml
- tests/test_conda_python_3_7_script/test_script_python_3_7.py
- test-environment.yml
- tests/test_script_xsh/scripts/test.xsh
- tests/README.md
- tests/test_conda_run/test_python_env.yaml
- .gitignore
- .github/workflows/main.yml
- tests/test_script_xsh/envs/xonsh.yaml
- tests/test_conda_run/test_script_run.py
- tests/test_validate/config.schema.yaml
- snakemake/io.py
- tests/test_conda_run/expected-results/test.txt
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
setup.pysnakemake/cli.pysnakemake/workflow.pysnakemake/io.pysnakemake/script/__init__.pysnakemake/common/tests/__init__.pytests/tests.pytests/tests_using_conda.py
🧠 Learnings (1)
tests/test01/Snakefile (1)
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-11-12T11:27:07.572Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
🪛 Ruff (0.8.2)
snakemake/workflow.py
1812-1812: Avoid inequality comparisons to False; use if ruleinfo.container_img: for truth checks
Replace with ruleinfo.container_img
(E712)
127-127: snakemake.api imported but unused
Remove unused import: snakemake.api
(F401)
snakemake/io.py
686-688: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
695-697: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
snakemake/cli.py
135-135: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
snakemake/script/__init__.py
97-100: Use ternary operator anchor = f"#{urllib.parse.quote(self._anchor)}" if self._anchor else "" instead of if-else-block
Replace if-else-block with anchor = f"#{urllib.parse.quote(self._anchor)}" if self._anchor else ""
(SIM108)
tests/tests.py
18-18: snakemake.utils.min_version imported but unused
Remove unused import: snakemake.utils.min_version
(F401)
18-18: snakemake.utils.min_version imported but unused
Remove unused import: snakemake.utils.min_version
(F401)
🪛 YAMLlint (1.35.1)
.github/workflows/main.yml
[error] 51-51: trailing spaces
(trailing-spaces)
🔇 Additional comments (81)
tests/test_validate/module-test/Snakefile (3)
1-5: Solid setup with proper imports and shell specification.The file correctly sets up the bash shell and imports the necessary modules for validation.
10-14: Rule implementation looks good.The rule 'a' is properly structured with output and shell directives. The use of a wildcard in the output pattern allows for flexible file generation.
7-7:Details
✅ Verification successful
Verify schema file location.
The validation is referencing "config.schema.yaml" which should exist in the same directory as this Snakefile.
🏁 Script executed:
#!/bin/bash # Check if the schema file exists in the expected location if [ -f "tests/test_validate/module-test/config.schema.yaml" ]; then echo "✅ Schema file exists in the correct location" else echo "❌ Schema file not found in expected location" echo "Looking for schema file in other locations..." find tests/test_validate -name "config.schema.yaml" fiLength of output: 162
Schema file location verified.
The verification script confirms that the file
config.schema.yamlexists in the expected directory (tests/test_validate/module-test). No further action is needed regarding its placement.tests/test_validate/module-test/config.schema.yaml (3)
1-4: Appropriate schema declaration and description.The file correctly specifies the JSON Schema version and includes a clear description.
5-13: Well-structured schema definition with proper validation constraints.The schema correctly defines the properties with appropriate types and validation patterns. The regex pattern for the adapter property ensures it only contains valid nucleotide characters.
14-15: Required field properly specified.Marking 'samples' as required ensures that essential configuration data is provided.
tests/test_validate/Snakefile (4)
10-10: Config validation moved after configfile declaration.This is the correct order as validation should happen after the config is loaded.
18-18: Appropriate use of trailing comma in expand list.The trailing comma in the expand function's arguments prepares for possible future additions and follows Python best practices.
21-25: Well-structured module declaration.The module is correctly defined with the appropriate snakefile path and config passing. This modular approach promotes code reuse and maintainability.
28-28: Effective use of rule namespace with prefixing.Using the wildcard import with a prefix follows best practices for module integration, preventing namespace conflicts while making all rules available.
setup.py (1)
33-33: LGTM - Proper inclusion of template files in package dataThis addition ensures that template files in the unit_tests directory are properly included when packaging the application.
tests/test_script_xsh/Snakefile (1)
1-12: LGTM - Well-structured test Snakefile for Xonsh scriptsThe rules are clearly defined with proper input/output relationships. The Snakefile correctly specifies the conda environment and script path.
tests/test_conda_python_3_7_script/Snakefile (1)
7-7: LGTM - Improved script naming for version clarityThe script name change clearly indicates this test is specific to Python 3.7, improving organization and clarity.
tests/test_conda_run/Snakefile (1)
1-13: LGTM - Well-documented test rule with helpful commentsThe rule is clearly structured with good documentation. I appreciate the comments clarifying that this approach is for testing purposes only, with a recommendation to use the script directive in production workflows.
tests/test01/Snakefile (1)
6-9: Good documentation on commented codeThe added comments clearly explain why the shell.prefix line is commented out and provide an important warning about restoring default settings if re-enabled. This is valuable documentation that will help prevent potential issues in the test suite.
tests/test_benchmark/Snakefile (2)
16-17: Good refactoring to use parameterizationAdding a
paramssection with a configurable timeout value improves maintainability by centralizing configuration values.
19-19: Updated to use stress-ng with explicit time unitThe change from
stresstostress-ngalong with the explicit "s" suffix for seconds improves clarity and updates the test to use a more current stress testing tool.misc/vim/ftplugin/snakemake/sections.vim (2)
7-7: Improved section navigation for Snakemake filesAdding support for
use,onstart,onsuccess, andonerrorkeywords enhances the Vim navigation capabilities, allowing users to jump between more types of Snakemake sections.
9-9: Consistent pattern updates across navigation typesThe same keywords have been properly added to the second pattern as well, ensuring consistency in navigation behavior for all commands.
tests/test_benchmark_jsonl/Snakefile (2)
19-19: Reduced benchmark timeout for faster testsReducing the timeout from 10 to 4 seconds will make the benchmark tests run faster while still being adequate for testing purposes.
22-22: Consistent use of stress-ng across benchmark testsThe update to use
stress-ngwith explicit seconds suffix aligns with the changes in the other benchmark file, maintaining consistency across the test suite.docs/index.rst (4)
32-38: Updated social media presenceGood update replacing the Twitter badge with Bluesky and Mastodon badges, aligning with current social media trends and providing alternative platforms for users to follow updates.
104-105: Welcome new maintainersGreat addition of Michael Jahn and Cade Mirchandani to the maintainers list. This acknowledges their contributions and gives users additional points of contact.
148-148: New tutorial path addedThe addition of the interaction visualization reporting tutorial expands the documentation resources, which is valuable for users looking to create interactive visualizations or reports.
152-152: Renamed toctree for better clarityChanging
:name: executionto:name: execution-toctreemakes the purpose of this toctree clearer and more consistent with naming conventions.misc/vim/ftplugin/snakemake/folding.vim (3)
15-16: Enhanced folding for additional Snakemake constructsGood improvement to the Vim folding functionality by including checkpoints, use statements, and event handlers (onstart, onsuccess, onerror) in the folding logic.
34-34: Updated top-level rule detectionThis change properly extends the detection of top-level constructs to include checkpoints and use statements, matching the changes made to the folding logic above.
36-36: Added 'use' statement to indented rule detectionConsistent update to also check for indented 'use' statements, ensuring the folding behavior treats all similar constructs uniformly.
.github/workflows/main.yml (4)
44-52: Expanded testing matrix with shell variantsGood enhancement to test with both bash and dash shells, improving coverage of different shell environments. The exclusion of dash with Python 3.11 helps focus testing resources efficiently.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 51-51: trailing spaces
(trailing-spaces)
91-91: Added dash shell to apt dependenciesAppropriate addition of the dash shell to the apt install list, ensuring it's available for the expanded testing matrix.
121-124: Added shell switching logicGood implementation to change the default shell to dash when testing with dash, ensuring the tests are actually using the specified shell.
147-147: Updated artifact upload conditionThe condition for uploading the report has been correctly updated to include the shell type check, ensuring reports are only uploaded from the dash shell tests when test_group is 1.
test-environment.yml (3)
7-7: Added setuptools dependencyGood addition of setuptools to the environment dependencies, which is often required for package installation and development workflows.
55-55: Added stress-ng for testingAdding stress-ng is appropriate for conducting stress tests in the environment, which can help validate system behavior under load.
61-62: Moved cwltool and cwl-utils from pip to conda dependenciesGood change moving cwltool and cwl-utils from pip dependencies to conda dependencies, which typically provides better integration with the conda environment and dependency resolution.
docs/snakefiles/deployment.rst (2)
288-293: Clear documentation enhancement for conda with run directivesThis addition provides important clarification that while conda environments can be used with the
rundirective, it's a special case with different behavior than other directives. The documentation now properly explains that thecondadirective only affects shell function calls within therunscript because therundirective must execute in the same process as Snakemake itself.
464-467: Good parallel documentation for apptainer/singularity with run directivesThis addition provides the same important clarification for container usage as was provided for conda environments. The parallel structure makes it clear that containers have the same limitations when used with the
rundirective.snakemake/common/tests/__init__.py (2)
224-224: Configuration updated to use local MinIO endpointChanged from remote endpoint (
play.minio.io) to a local one (127.0.0.1) for testing purposes.
228-232: MinIO credentials simplified for testingThe access and secret keys have been simplified to standard testing credentials (
minio/minio123), which is appropriate for local development and testing environments.docs/snakefiles/reporting.rst (4)
7-10: Enhanced report documentation with better explanationsThe documentation now better explains the types of reports that can be generated (HTML file for smaller datasets vs. ZIP archive for larger ones), providing clearer guidance to users.
13-16: Well-structured section on including results in reportsThis new section heading properly separates the content, making the documentation more navigable.
262-317: Comprehensive report generation instructions addedThis section now includes clear instructions for generating both HTML and ZIP reports, explains when to use each type, and details how to create partial reports. The examples with command-line syntax are particularly helpful.
318-330: Useful custom layout instructions addedThis new section provides valuable information on how to customize report appearance with a custom stylesheet, including a specific example for branding reports.
snakemake/cli.py (3)
777-789: Good addition of --consider-ancient argumentThis new CLI option provides users with a useful way to override rerun triggers for specific inputs, which helps in workflows where the user knows better than the file modification dates. The help text clearly explains the purpose and usage.
1474-1482: New scheduler-subsample parameter addedThis new parameter allows limiting the number of jobs considered for scheduling through random sampling, which is useful for very large DAGs where the scheduler might otherwise take a long time.
1685-1686:Details
❓ Verification inconclusive
Default conda frontend changed from mamba to conda
The default value for
--conda-frontendhas been changed from "mamba" to "conda". This is significant as mamba is typically much faster.
🌐 Web query:
Why might Snakemake change the default conda frontend from mamba to conda?💡 Result:
Snakemake switched its default conda frontend to mamba in 2021 due to its faster dependency resolution and installation speed[3][5][11]. However, there are scenarios where reverting to the original conda frontend might be necessary or occur unintentionally:
Key Reasons for Potential Reversion to Conda
Compatibility Issues with Mamba
Newer versions of mamba (e.g., 2.0.0) dropped support for certain flags like--no-default-packages, which Snakemake relies on for environment creation[16]. This breaks workflows that depend on these flags.Mamba Installation/Path Errors
- Snakemake requires mamba to be installed in the base conda environment or visible in the
PATH[9].- Common error:
The 'mamba' command is not available in the shell...[9].- Workaround: Explicitly use
--conda-frontend condaif mamba isn't properly configured.Error Handling Differences
Mamba sometimes fails to surface environment creation errors properly. For example:
- Misspelled packages in
.yamlfiles might log a misleading "environment created" message with mamba, while conda correctly reports the error[6].- Snakemake may falsely activate a nonexistent environment when using mamba[6].
How to Control the Frontend
- Force conda:
snakemake --use-conda --conda-frontend conda- Force mamba (if compatible):
snakemake --use-conda --conda-frontend mamba- Persistent configuration: Set
conda-frontend: condain Snakemake profiles to override the default globally[4][10].
Historical Context
- Pre-2021: Conda was the default frontend[3][13].
- 2021–2024: Mamba became default starting in Snakemake 6.1.0[3][5][11].
- Post-2024: Continued mamba default, but edge cases necessitate manual conda fallbacks[16][19].
For stability, always verify conda/mamba behavior in critical workflows and pin versions in environment files[14][18].
Citations:
- 1: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html
- 2: https://openprairie.sdstate.edu/cgi/viewcontent.cgi?article=1007&context=honors_isp
- 3: https://snakemake.readthedocs.io/en/v6.8.0/project_info/history.html
- 4: https://snakemake.readthedocs.io/en/stable/executing/cli.html
- 5: https://snakemake.readthedocs.io/en/v6.1.0/project_info/history.html
- 6: #581
- 7: https://snakemake.readthedocs.io/en/v7.29.0/executing/cli.html
- 8: https://workflowhub.eu/workflows/554
- 9: https://turakhia.ucsd.edu/ROADIES/troubleshooting/
- 10: https://snakemake.readthedocs.io/en/v6.15.1/executing/cli.html
- 11: https://snakemake.readthedocs.io/en/v8.7.0/project_info/history.html
- 12: https://snakemake.readthedocs.io/en/v8.1.1/project_info/history.html
- 13: https://snakemake.readthedocs.io/en/v5.32.1/executing/cli.html
- 14: https://pmc.ncbi.nlm.nih.gov/articles/PMC9754251/
- 15: https://discussion.nextstrain.org/t/spaces-in-full-paths-causing-problems/1195
- 16: #3108
- 17: https://aws.amazon.com/blogs/startups/how-stoke-therapeutics-is-turbo-charging-drug-discovery-using-snakemake-and-aws/
- 18: https://megalinter.io/7.1.0/descriptors/snakemake_snakemake/
- 19: https://github.com/snakemake/snakemake/actions/runs/7268918419
- 20: https://support.bioconductor.org/p/9135916/
Attention: Confirm the Intended Default for Conda Frontend
It looks like the default for the
--conda-frontendoption is now set to"conda"instead of"mamba". Since mamba has been the default since 2021—primarily for its faster dependency resolution and installation speed—this change could impact performance. However, based on known compatibility issues (such as dropped support for certain flags in newer mamba versions and error handling differences), reverting to conda might have been an intentional fallback.Please verify the following:
- Default Value Check: Confirm that the CLI now explicitly sets the default to
"conda"insnakemake/cli.py(lines 1685-1686) and that no residual code is affecting this.- Rationale Documentation: If this change is intentional due to mamba compatibility issues (e.g., missing support for the
--no-default-packagesflag or installation/path errors), consider adding a note in the documentation or in the help string to inform users about the trade-offs.- User Override: Ensure that users can still easily switch to mamba via the
--conda-frontend mambaflag if they prefer its performance benefits.docs/project_info/contributing.rst (1)
3-122: Documentation looks good.No issues found with these changes. The heading updates, additional explanations, and clarifications across lines 3–122 improve readability and flow.
snakemake/workflow.py (5)
57-57: Import of the newMissingOutputFileCachePathExceptionlooks appropriate.The exception is being used later in a try-except block to handle cache path misconfiguration.
689-709: Caching logic is well-structured.The addition of a try-except with
MissingOutputFileCachePathExceptionprovides clear handling for missing cache configuration.
1266-1267: Enhanced logging for metadata checks even with no jobs to run.Logging missing and outdated metadata at the end of a no-op run can help diagnose future rerun triggers or debugging, so this addition seems beneficial.
1282-1283: Provenance info logging on exceptions is helpful.Capturing provenance-triggered reasons before re-raising exceptions is a good usability improvement.
1301-1302: Provenance logging on dry-run completion.Makes sense to provide additional details on what triggered the run, even on a dry-run.
docs/project_info/codebase.rst (1)
1-224: Thorough architecture documentation.This new reference provides a clear overview of Snakemake's codebase and plugin system. The conceptual hierarchy and references to key modules are well-described.
snakemake/io.py (1)
1801-1804: Validate sizes before summation insize_mb.Before summing in megabytes, ensure no file size retrieval has failed or resulted in
None. A quick check and graceful handling might help avoid runtime errors.snakemake/script/__init__.py (2)
266-280: Extend numeric encoding for R with caution on potential edge cases.
- Large Integers: Python
intcan exceed typical R integer bounds. Potential for overflow in R.- Complex Splitting: For complex numbers,
a+biform is correct but confirm any user needs for slight variations (likea-biif imag < 0).Do you want a script to scan for usage of extremely large integers or negative imaginary parts in the codebase to confirm correct handling?
286-289: Boolean vs numeric type ordering.Here, you handle
boolbefore(int, float, complex)—which is correct in Python 3. Just ensure no fallback to numeric occurs for booleans. Looks good.docs/snakefiles/rules.rst (9)
12-20: Improved example clarity with specific rule naming.The example has been updated to use a concrete rule name (
myrule) instead of the generic placeholder (NAME), and the input/output section formatting has been improved with line breaks for better readability.
22-22: Enhanced rule capabilities description.The explanation of rule capabilities has been expanded to explicitly mention that rules can use plain Python, boilerplate-free scripting, wildcards, non-file parameters, and log files, providing a more comprehensive overview.
24-26: Improved shell command documentation.Additional details have been added about accessing local and global variables within shell commands, including how input and output files are evaluated and formatted, with special mention of the
:qformatting instruction for quoting list elements that contain whitespace.
214-215: Broadened helper section scope.The section title has been changed from the narrowly focused "Helper functions for defining input and output files" to the more comprehensive "Helpers for defining rules," reflecting an expansion of the section's scope.
551-571: Added Rule item access helpers section.This new section explains how to access other items of the same rule via globally available objects (
input,output,resources, andthreads), which simplifies code by eliminating the need for custom functions with specific parameter signatures.
586-636: Added Sub-path access functionality.The new
subpathfunction is documented, showing how to access sub-paths of input or output files with various options (strip_suffix,basename,parent,ancestor). Practical examples demonstrate usage patterns for each parameter option.
1074-1075: Removed duplicated section header.Fixed a duplicate "Log-Files" section header that was erroneously repeated.
3009-3026: Added support for continuously updated input.New feature in Snakemake 8.2 allows rules to continuously accept new input files during workflow execution via a synchronized Python queue and the
from_queuehelper function. This is useful for streaming data analysis scenarios.
3119-3138: Added procedural rule definition documentation.New section explains how to create rules programmatically without explicit names or by overriding the name attribute, allowing for dynamic rule generation in loops.
misc/vim/syntax/snakemake.vim (5)
4-4: Updated last change date.The "Last Change" date has been updated to March 6, 2025, reflecting recent modifications to the syntax file.
23-56: Updated Snakemake syntax definitions to version 8.29.The syntax definitions have been significantly expanded to accommodate new Snakemake features:
- Added grammar for
module,configfile, anduserulestatements- Extended rule parameters to include additional options like
log,benchmark,cache, andpriority- Added definitions for container integration, script handling, and notebook support
- Included module-specific parameters for configuration and validation
These changes ensure proper syntax highlighting for all Snakemake 8.29 constructs.
103-103: Added 'use' directive.The
usekeyword has been added to the list of directives with a label, supporting rule inheritance and module imports.
119-122: Added new file handling functions.Two new functions have been added to the syntax highlighting:
before_update: For marking files that should be considered as input before being updatedexists: For checking file existence within workflowsThese functions enhance file dependency management in Snakemake workflows.
131-131: Added 'update' function.The
updatefunction has been added to the list of Snakemake functions, allowing files to be modified rather than recreated from scratch.tests/tests.py (5)
18-18: Good import of min_version for patching.The import of
min_versionfromsnakemake.utilsis correctly marked with a comment explaining its purpose (to be patched during tests). This is a good practice as it clearly documents the intent.🧰 Tools
🪛 Ruff (0.8.2)
18-18:
snakemake.utils.min_versionimported but unusedRemove unused import:
snakemake.utils.min_version(F401)
1483-1487: Well-implemented patching for version check.Patching
min_versionto always returnTrueduring testing is a good approach. This prevents version check failures in the test environment while still testing the core functionality.
1492-1496: Consistent implementation of min_version patching.This test follows the same pattern as the previous one, maintaining consistency in how version checks are bypassed in the test suite.
1517-1520: Good application of patching technique.The same patching technique is applied consistently across all module tests that might be affected by version checks.
2016-2017: Clear function rename for better specificity.Renaming from what was likely
test_shell_exectotest_shell_exec_singularityimproves clarity by being more specific about what's being tested.tests/tests_using_conda.py (6)
1-24: Well-structured file header and imports.The new file maintains consistent authorship information and imports necessary modules. The organization follows the same pattern as the existing test files in the project.
31-39: Good test function with clear deployment method.This test explicitly specifies the deployment method as Conda, which makes the test's purpose clear. The
@skip_on_windowsdecorator correctly handles platform-specific limitations.
61-81: Well-documented test with detailed error explanation.The test is properly documented with comments explaining the failure scenario on Windows. The implementation includes proper cleanup and verification steps, checking both for environment directory existence and setup completion flag.
105-106: Good verification of environment hashing.The test correctly verifies that two separate conda environments are created (checking the number of directories), which validates the hashing mechanism for environment separation.
137-155: Excellent implementation of Git-based wrapper test.This test demonstrates a sophisticated approach by dynamically cloning the wrapper repository into a temporary directory and using it for testing. The use of
tempfile.TemporaryDirectory()ensures proper cleanup after the test.
323-326: Helpful comment explaining test dependencies.This comment provides valuable context about the following tests, explaining that while they don't explicitly use Conda for environment creation, they still depend on Conda functionality.
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
snakemake/ioutils/input.py (1)
12-31:⚠️ Potential issueFix missing import and improve function definition.
The
extract_checksumfunction has some issues that should be addressed:
WorkflowErroris not imported at the top of the file- The lambda function can be rewritten as a regular function
- The exception handling could be improved
+from snakemake.exceptions import WorkflowError def parse_input(infile, parser, **kwargs): def inner(wildcards, input, output): with open(infile, "r") as fh: if parser is None: return fh.read().strip() else: return parser(fh, **kwargs) return inner def extract_checksum(infile, **kwargs): try: import pandas as pd - fix_file_name = lambda x: x.removeprefix("./") + def fix_file_name(x): + return x.removeprefix("./") + return ( pd.read_csv( infile, sep=" ", header=None, engine="python", converters={1: fix_file_name}, ) .set_index(1) .loc[fix_file_name(kwargs.get("file"))] .item() ) except ImportError as err: - raise WorkflowError("Pandas is required to extract checksum from file.") + raise WorkflowError("Pandas is required to extract checksum from file. Please install it with 'pip install pandas'.") from err🧰 Tools
🪛 Ruff (0.8.2)
16-16: Do not assign a
lambdaexpression, use adefRewrite
fix_file_nameas adef(E731)
30-30: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
30-30: Undefined name
WorkflowError(F821)
snakemake/cli.py (1)
569-569: 🛠️ Refactor suggestionDuplicate parse_func assignment
This appears to be an unintended duplicate of line 568.- parse_func=maybe_base64(parse_set_threads),
🧹 Nitpick comments (4)
.github/workflows/main.yml (2)
51-54: Empty exclude section can be removedThe empty exclude array isn't necessary and can be removed if you're no longer excluding any specific shell configurations.
- exclude: [] - #- shell: dash - # py_ver: "3.11"🧰 Tools
🪛 actionlint (1.7.4)
52-52: "exclude" section should not be empty
(syntax-check)
🪛 YAMLlint (1.35.1)
[error] 53-53: trailing spaces
(trailing-spaces)
211-218: Good addition: New testing summary jobThe new
testing-donejob provides a clear indication that all tests have passed successfully, improving workflow visibility.Don't forget to add a newline at the end of the file.
- run: echo "All tests passed." +🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 218-218: no new line character at the end of file
(new-line-at-end-of-file)
docs/snakefiles/rules.rst (1)
3192-3192: Minor formatting inconsistencyLine 3192 has a tilde character indicating a change, but appears to be unchanged from the original note section.
snakemake/cli.py (1)
107-147: New functionparse_consider_ancient
The parsing logic forRULE=INPUTITEMSis concise and works as intended. One edge case is the reliance onisidentifier()to validate names—if this is intentional, it may skip strings with special characters or hyphens. Otherwise, this is a sound implementation.🧰 Tools
🪛 Ruff (0.8.2)
135-135: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
🛑 Comments failed to post (1)
snakemake/workflow.py (1)
127-127: 🛠️ Refactor suggestion
Unused import
snakemake.api
Static analysis indicates this import is never referenced.-from snakemake import api, caching, sourcecache +from snakemake import caching, sourcecache📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.-from snakemake import api, caching, sourcecache +from snakemake import caching, sourcecache🧰 Tools
🪛 Ruff (0.8.2)
127-127:
snakemake.apiimported but unusedRemove unused import:
snakemake.api(F401)
|
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
snakemake/dag.py (1)
700-710:⚠️ Potential issueRemove apparent leftover code from merging conflict.
Lines 701–703 (including the extra “try:” at line 704) look like duplicate or partial conflict remnants. Retaining them may cause confusion or errors. Consider removing the extraneous statements so that the final logic is just one validtry/await wait_for_files(...)block.if not ignore_missing_output and expanded_output: - try: - await wait_for_files( - expanded_output, - try: try: await wait_for_files( expanded_output, latency_wait=wait, wait_for_local=wait_for_local, ignore_pipe_or_service=True, )
🧹 Nitpick comments (3)
snakemake/dag.py (1)
80-93: Consider renaming the variablesortedto avoid overshadowing the built-in.
While the functionality here is clear and correct for producing a list of concurrency sets, the identifiersortedoverrides Python’s built-in. Renaming it tosorted_layersor similar would improve clarity.def toposort(graph): from graphlib import TopologicalSorter sorter = TopologicalSorter(graph) sorter.prepare() - sorted = list() + sorted_layers = list() while sorter.is_active(): ready = set() for task in sorter.get_ready(): ready.add(task) sorter.done(task) - sorted.append(ready) - return sorted + sorted_layers.append(ready) + return sorted_layerstests/test_validate/Snakefile (1)
22-27: Consider DRY for repeated asserts.
These lines replicate the same data validation steps for different DataFrame forms. If you plan to add more test rows or more conditions, consider parameterizing the tests or using a common helper to reduce code repetition.snakemake/utils.py (1)
118-149: Efficient handling of Pandas DataFrame validation.
Using_validate_recordfor each row and adding missing columns is a solid approach. Consider logging the names of any columns that get newly populated with defaults, to aid debugging in large DataFrames.🧰 Tools
🪛 Ruff (0.8.2)
132-134: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
🛑 Comments failed to post (2)
tests/test_validate/Snakefile (2)
29-39:
⚠️ Potential issueSchema mismatch for 'tissue' column.
The Polars schema references{"sample": pl.String, "condition": pl.String, "n": pl.UInt8}, but this code assertssamples[0, "tissue"] == "blood". This may cause runtime errors or unexpected parsing if thetissuecolumn is missing from the schema. Either add"tissue": pl.Stringto the schema or change the asserts to match the defined columns.schema={ "sample": pl.String, "condition": pl.String, "n": pl.UInt8, + "tissue": pl.String },📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.# Polars DataFrame samples = pl.read_csv( config["samples"], separator="\t", schema={"sample": pl.String, "condition": pl.String, "n": pl.UInt8, "tissue": pl.String}, null_values="NA", ) validate(samples, "samples.schema.yaml") assert samples[0, "tissue"] == "blood" assert samples[0, "n"] == 1 assert samples[1, "n"] == 0
4-4: 💡 Verification agent
🧩 Analysis chain
Ensure Polars is installed and properly declared.
The newimport polars as plstatement introduces a direct dependency on Polars. Verify that Polars is included in the project's environment and documented where appropriate.
🏁 Script executed:
#!/bin/bash # Check if polars is declared as a dependency rg -A 3 "polars" | grep -i "install_requires"Length of output: 43
I'll generate an additional shell script to broaden our search for an explicit Polars dependency in typical dependency configuration files (like
setup.py,requirements.txt, andpyproject.toml) to help verify whether Polars is properly declared. Please run the script below and share its output:Once we have the output from this script, we can determine if Polars has been declared as a dependency.
🏁 Script executed:
#!/bin/bash echo "Checking dependency files for 'polars' declaration:" echo -e "\nFile: setup.py" if [ -f setup.py ]; then rg -i "polars" setup.py || echo "No 'polars' found in setup.py" else echo "setup.py not found" fi echo -e "\nFile: requirements.txt" if [ -f requirements.txt ]; then rg -i "polars" requirements.txt || echo "No 'polars' found in requirements.txt" else echo "requirements.txt not found" fi echo -e "\nFile: pyproject.toml" if [ -f pyproject.toml ]; then rg -i "polars" pyproject.toml || echo "No 'polars' found in pyproject.toml" else echo "pyproject.toml not found" fiLength of output: 704
Action Required: Declare Polars Dependency Properly
- The
tests/test_validate/Snakefilenow imports Polars directly (import polars as pl).- A search in the common dependency files (setup.py, requirements.txt, and pyproject.toml) did not reveal any declaration of Polars.
- Please update the appropriate dependency configuration (e.g., add Polars to
install_requiresin setup.py) to ensure that the project environment includes Polars.
…3099) <!--Add a description of your PR here--> This PR is intended to apply on top of snakemake#3098; see that PR for the first two commits. From there, I attempted to account for all of the NPM packages that are bundled in `vega/vega.js` via dependencies. I made an effort to verify that every license file added in this commit actually corresponds to some code present in `vega.js`. Next, I inspected `vega.js` for comments indicating that particular routines were copied, derived, or adapted from other projects, and added the license files for those projects. I didn’t attempt to judge whether or not any of snippets might be distant enough from their inspirations that they could perhaps claim not to be derived works under a particular set of copyright laws – I just took the attributions at their word. Almost all of these licenses are ones (like `MIT`, `Apache-2.0`, `ISC` or `BSD-3-Clause`) that require including the copyright and permission statements (the license text) in copies and derived works, so while this work is fussy, tedious, and unrewarding, it would seem to be necessary. So far, I only dug through `vega.js`. I still need to check if there is anything in `vega-lite.js` or `vega-embed.js` that isn’t already accounted for in `vega.js`, and I need to look at the other libraries too. Still, I thought I should post my work in progress in case it collected any feedback. ### QC <!-- Make sure that you can tick the boxes below. --> * [ ] The PR contains a test case for the changes or the changes are already covered by an existing test case. **N/A – this change is not really testable – except that building the package confirms the checksums match.** * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). **N/A – no documentation update is believed to be necessary.** <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added numerous new assets and dependencies related to the Vega library, enhancing functionality. - Expanded package declarations for various D3 modules and other libraries in the reporting system. - Introduced a new class for executing Xonsh scripts, allowing for more scripting options. - Added a new rule for running Python scripts in a Conda environment, improving workflow flexibility. - Enhanced documentation with a new "Maintainers" section and expanded resources. - Introduced a new test suite for validating Conda functionalities. - Enhanced data validation capabilities using both Pandas and Polars for handling sample data. - **Bug Fixes** - Ensured that licenses and versions are accurately maintained for various libraries. - Improved error handling for Windows-specific issues in the testing framework. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com> Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>



This PR is intended to apply on top of #3098; see that PR for the first two commits.
From there, I attempted to account for all of the NPM packages that are bundled in
vega/vega.jsvia dependencies. I made an effort to verify that every license file added in this commit actually corresponds to some code present invega.js.Next, I inspected
vega.jsfor comments indicating that particular routines were copied, derived, or adapted from other projects, and added the license files for those projects. I didn’t attempt to judge whether or not any of snippets might be distant enough from their inspirations that they could perhaps claim not to be derived works under a particular set of copyright laws – I just took the attributions at their word.Almost all of these licenses are ones (like
MIT,Apache-2.0,ISCorBSD-3-Clause) that require including the copyright and permission statements (the license text) in copies and derived works, so while this work is fussy, tedious, and unrewarding, it would seem to be necessary.So far, I only dug through
vega.js. I still need to check if there is anything invega-lite.jsorvega-embed.jsthat isn’t already accounted for invega.js, and I need to look at the other libraries too. Still, I thought I should post my work in progress in case it collected any feedback.QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). N/A – no documentation update is believed to be necessary.Summary by CodeRabbit
New Features
Bug Fixes