fix: robustly escaping quotes in generated bash scripts (#3297)#3303
fix: robustly escaping quotes in generated bash scripts (#3297)#3303prs513rosewood wants to merge 5 commits intosnakemake:mainfrom
Conversation
The fix runs Python's format-generated string through shlex.quote to properly interpret in shell scripts generated with the `script:` directive.
📝 WalkthroughWalkthroughThis pull request introduces significant updates across multiple files in the Snakemake codebase. Key changes include the addition of the Changes
Suggested reviewers
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
snakemake/script/__init__.py (1)
472-473: Consider escaping dictionary keys as well.While the value escaping is now robust, dictionary keys in bash associative arrays should also be escaped to handle special characters safely.
Apply this diff to escape both keys and values:
- formatted_v = shlex.quote(f"{v}") - s += f'[{k}]="$(printf "%s" {formatted_v})" ' + formatted_k = shlex.quote(f"{k}") + formatted_v = shlex.quote(f"{v}") + s += f'[{formatted_k}]="$(printf "%s" {formatted_v})" '
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
snakemake/script/__init__.py(2 hunks)tests/test_script/config.yaml(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/test_script/config.yaml
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/script/__init__.py
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: testing (10, 3.12)
- GitHub Check: testing (10, 3.11)
- GitHub Check: testing (9, 3.12)
- GitHub Check: testing (9, 3.11)
- GitHub Check: testing (8, 3.12)
- GitHub Check: testing (8, 3.11)
- GitHub Check: testing (7, 3.12)
- GitHub Check: testing (7, 3.11)
- GitHub Check: testing (6, 3.12)
- GitHub Check: testing (6, 3.11)
- GitHub Check: testing-windows (10)
- GitHub Check: testing (5, 3.12)
- GitHub Check: testing (5, 3.11)
- GitHub Check: testing-windows (9)
- GitHub Check: testing (4, 3.12)
- GitHub Check: testing-windows (8)
- GitHub Check: testing (4, 3.11)
- GitHub Check: testing-windows (7)
- GitHub Check: testing (3, 3.12)
- GitHub Check: testing-windows (6)
- GitHub Check: testing (3, 3.11)
- GitHub Check: testing-windows (5)
- GitHub Check: testing (2, 3.12)
- GitHub Check: testing-windows (4)
- GitHub Check: testing (2, 3.11)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.12)
- GitHub Check: testing-windows (2)
- GitHub Check: testing (1, 3.11)
- GitHub Check: testing-windows (1)
🔇 Additional comments (2)
snakemake/script/__init__.py (2)
20-20: LGTM! Good choice usingshlexfor shell escaping.The
shlexmodule is the correct choice for safely escaping shell command arguments.
472-473: Excellent improvement to shell script safety!The changes provide multiple security and robustness benefits:
shlex.quoteensures proper escaping of special characters in shell valuesprintf "%s"prevents unintended interpretation of escape sequences- The solution is more robust than direct string concatenation
|
I'm taking a look at this today. This PR is a good start, but I can quite easily add further tests that still break things, so I'm seeing this as an opportunity to beef up the tests further and try to make the entire business of passing values to scripts (including R and Julia) as robust as I can. In the proposed modified test case, we have: Then since BASH does not support nested dicts, the entire dictionary is quoted as a string And here the dict I'll also add some further tests for R and Julia. If this turns out to be a really big job and can't be done in a few days then the PR as it stands will at least avoid a very problematic failure case, and I think should be applied. |
|
I think I have it all working - see https://github.com/EdinburghGenomics/snakemake/tree/issue_3297 @prs513rosewood how would you like to proceed? I can make a fresh PR based on the above branch, which will be easy to do, but then you will not get any credit for your work on the fix because I somehow lost your commits from my Git history. Or you can pull the above commit then rebase the changes onto your own branch. It's a little bit of faff for you, but you will then get credit in the changelog for your work. It also avoids the need to make a second PR. Either is fine by me. Hopefully this being the Snakemake hackathon week it's a good time to get a PR accepted! |
|
Note that in my new version, a nested dict within the config gets represented as a string containing just the dict keys. This is consistent with how params are represented, and consistent with the existing documentation regarding list values. I don't think shell-escaping the full Python representation of the dict is very helpful to anyone - if something nested within a dict within the config needs to be extracted and passed to the shell script then it can easily be copied into a param. |
|
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🧹 Nitpick comments (29)
snakemake/workflow.py (1)
1812-1813: Fix boolean comparison style in conditional check.The direct comparison with
Falseshould be avoided. Python best practices favor implicit truth value testing.-if not ruleinfo.template_engine and ruleinfo.container_img != False: +if not ruleinfo.template_engine and ruleinfo.container_img is not False:Alternatively, a more Pythonic approach would be:
-if not ruleinfo.template_engine and ruleinfo.container_img != False: +if not ruleinfo.template_engine and ruleinfo.container_img:However, this might change the behavior if
ruleinfo.container_imgcould be an empty string or 0, as those would evaluate to False in a truthiness check, so the first suggestion is safer without knowing the context.🧰 Tools
🪛 Ruff (0.8.2)
1812-1812: Avoid inequality comparisons to
False; useif ruleinfo.container_img:for truth checksReplace with
ruleinfo.container_img(E712)
snakemake/report/html_reporter/template/components/app.js (1)
42-43: Added renderTrigger update logic - LGTM!The implementation to update renderTrigger with a random number when content changes is a good approach to force re-renders when needed.
Consider adding a comment to explain the purpose of the renderTrigger state and how it works to improve code maintainability:
renderTrigger: view.content !== undefined ? Math.random() : this.state.renderTrigger, +// Generate a new random number when content changes to force iframe re-renderingsnakemake/executors/local.py (1)
261-264: Improved error handling and reporting flowThe changes to the _callback method improve error handling by ensuring that exceptions are printed before reporting the job error, and by clarifying the success path with an explicit else block.
While the changes are good, consider adding a small comment to explain the control flow:
if ex is not None: print_exception(ex, self.workflow.linemaps) self.report_job_error(job_info) else: + # No exception occurred, report job success self.report_job_success(job_info)snakemake/report/html_reporter/template/components/toggle.js (2)
1-1: Remove redundant "use strict" directive.Since JavaScript modules are already in strict mode, this directive is unnecessary.
Apply this diff to remove it:
-'use strict';🧰 Tools
🪛 Biome (1.9.4)
[error] 1-1: Redundant use strict directive.
The entire contents of JavaScript modules are automatically in strict mode, with no statement needed to initiate it.
Safe fix: Remove the redundant use strict directive.(lint/suspicious/noRedundantUseStrict)
42-43: Use “htmlFor” instead of “for” in React.In standard React usage, the label element should use “htmlFor” instead of “for” to properly associate the label with the input.
For instance:
- {for: valueId(0), className: `${labelClasses} rounded-l`} + {htmlFor: valueId(0), className: `${labelClasses} rounded-l`}snakemake/jobs.py (1)
1599-1603: Consider re-raising with origin exception.The static analysis suggests clarifying exception chaining by explicitly using “raise … from e”. This helps preserve the traceback origin and improves debuggability.
Possible fix:
- except ExceptionGroup as e: - raise WorkflowError( - f"Error postprocessing group job {self.jobid}.", - *e.exceptions, - ) + except ExceptionGroup as e: + raise WorkflowError( + f"Error postprocessing group job {self.jobid}.", + *e.exceptions, + ) from e🧰 Tools
🪛 Ruff (0.8.2)
1599-1602: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
snakemake/report/__init__.py (4)
440-443: Handle potential path mismatch.
Whenparent_pathis not truly a parent ofself.path,relative_to()will raise an error. Consider whether this scenario should be handled or validated explicitly.def filename(self): - if self.parent_path is None: - return os.path.basename(self.path) - else: - return str(self.path.relative_to(self.parent_path)) + if self.parent_path is None or not self.path.is_relative_to(self.parent_path): + return os.path.basename(self.path) + return str(self.path.relative_to(self.parent_path))
545-550: Extend documentation forregister_filearguments.
The newparent_pathparameter is not explained in the docstring, potentially confusing future maintainers. Consider adding a short note about its purpose and usage.
617-617: Bitwise or usage.
Using|=for boolean accumulation is compact and valid. Just ensure clarity for newcomers. No functional issue here.
627-632: Helpful warning for missed patterns.
Emitting a warning when no file is found is beneficial. Consider making it more explicit (e.g., printing the unmatched pattern) if diagnosing user mistakes is a frequent scenario.docs/snakefiles/deployment.rst (2)
288-292: Grammar refinement for clarity.
Replace “from the withinrunscript” with “from within therunscript” to improve readability.- The ``conda`` directive for rules with a ``run`` directive therefore only affects ``shell`` function calls that are executed from the within ``run`` script. + The ``conda`` directive for rules with a ``run`` directive therefore only affects ``shell`` function calls that are executed from within the ``run`` script.
464-467: Mirror the same improvement forcontainerdirective text.
Similar grammar fix to clarify “from within therunscript.”- The ``container`` directive for rules with a ``run`` directive therefore only affects ``shell`` function calls that are executed from the within the ``run`` script. + The ``container`` directive for rules with a ``run`` directive therefore only affects ``shell`` function calls that are executed from within the ``run`` script.snakemake/output_index.py (1)
28-39: Filtered matching based on prefix and suffix.
This approach is simpler than a full trie, but confirm performance for large rule sets. If there's a risk of collisions or many false positives, consider further checks or indexing. Otherwise, this is a good improvement in maintainability.snakemake/common/tests/__init__.py (3)
224-224: Consider using HTTPS for endpoint.
Switching the endpoint URL from HTTPS to HTTP might introduce security issues unless this is strictly for local or testing scenarios. If security is not a concern here, consider clarifying it with comments or environment-based toggles.
228-228: Avoid storing plain credentials in code.
Returning a static access key in code may be acceptable for local testing only. If these credentials ever need secrecy, prefer environment variables or configuration.
232-232: Store sensitive secret keys securely.
Similar to the access key, consider using environment variables or a secure secret manager for the secret key to avoid exposing credentials in code.snakemake/common/prefix_lookup.py (2)
21-45: match_iter() prefix logic is well-structured.
The function employs binary search with bisect for efficiency. The iterative approach to find progressively shorter prefixes is sensible. Confirm that an empty key or unexpected boundary cases won’t break the logic if the dataset includes unusual strings.
46-56: _common_prefix method handles mismatch cleanly.
The early break on mismatch is correct. The docstring’s mention of “undefined if strings are empty” could be clarified or guarded with a quick check, but this appears to match your stated requirement. The static analysis warning about the loop variable is a false positive.🧰 Tools
🪛 Ruff (0.8.2)
52-52: Loop control variable
inot used within loop body(B007)
docs/snakefiles/rules.rst (3)
923-927: Document GPU resource usage carefully.
Providing integer forgpuand strings forgpu_manufacturer/gpu_modelis clear. Consider adding an explicit example for more clarity, e.g., specifying “gpu=2, gpu_manufacturer='nvidia'” in a rule.
967-967: Clarify multiple resource assignments.
The text states that bothgpuandfooare set to local. Ensure the documentation or examples show how to handle multiple local resources in real workflows.
972-972: Command-line snippet is helpful.
Demonstrating “snakemake --set-resource-scopes gpu=global disk_mb=local” is good. Consider including an example scenario explaining why GPU might be set as global or local.snakemake/report/html_reporter/template/components/result_view_button.js (1)
64-70: renderButton fosters reusability.
Generating the button from a shared method reduces duplicated code. Consider customizing the icon label for each content type if relevant.snakemake/script/__init__.py (1)
88-103: Simplify anchor assignment using a ternary operator
Currently the anchor is assigned with an if-else block. It's more concise to use a one-line conditional:- if self._anchor: - anchor = f"#{urllib.parse.quote(self._anchor)}" - else: - anchor = "" + anchor = f"#{urllib.parse.quote(self._anchor)}" if self._anchor else ""🧰 Tools
🪛 Ruff (0.8.2)
98-101: Use ternary operator
anchor = f"#{urllib.parse.quote(self._anchor)}" if self._anchor else ""instead ofif-else-blockReplace
if-else-block withanchor = f"#{urllib.parse.quote(self._anchor)}" if self._anchor else ""(SIM108)
snakemake/report/html_reporter/template/components/abstract_view_manager.js (1)
1-35: Handle unimplemented result handlers
Currently, thehandleImg,handleHtml,handlePdf, andhandleDefaultmethods throw errors. If these are intentional placeholders, consider adding a clear comment or implementing them to avoid runtime failures.Would you like me to propose an implementation or stub out each method with a more descriptive message?
.github/workflows/main.yml (2)
51-51: Remove trailing spaces.
Line 51 has trailing spaces.Apply this diff to remove trailing spaces:
- - shell: dash + - shell: dash🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 51-51: trailing spaces
(trailing-spaces)
100-113: Ensure sensitive credentials are not misused.
Using a staticaccess_keyandsecret_keyfor MinIO is acceptable for local testing, but be mindful of not reusing these in production. Ensure they are safe in test logs and do not leak to external systems.snakemake/io.py (3)
686-688: Raise exceptions withraise ... from e.
Consider chaining the exception for better tracebacks:- raise WorkflowError( - f"Failed to create output directory {dirpath}.", e - ) + raise WorkflowError( + f"Failed to create output directory {dirpath}." + ) from e🧰 Tools
🪛 Ruff (0.8.2)
686-688: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
693-697: Use exception chaining for clarity.
Same reasoning as above; attach the original exception's context:- raise WorkflowError( - f"Failed to create FIFO for pipe output {self._file}.", e - ) + raise WorkflowError( + f"Failed to create FIFO for pipe output {self._file}." + ) from e🧰 Tools
🪛 Ruff (0.8.2)
695-697: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
1775-1808: Async file-size retrieval within properties.
These property getters (size_files*,size*) invoke an async call under the hood, then block withasync_run(). This could be expensive for large sets of files. If usage is frequent, consider caching or avoiding repeated property lookups.
🛑 Comments failed to post (1)
snakemake/report/html_reporter/template/components/common.js (1)
19-21: 🛠️ Refactor suggestion
Add error handling to the getResultMimeType function.
The new getResultMimeType function directly accesses properties without validating if they exist. If resultPath doesn't exist in the results object or if the mime_type property is missing, this will cause a JavaScript error.
Add defensive programming to handle potential missing values:
function getResultMimeType(resultPath) { - return results[resultPath].mime_type + return results[resultPath]?.mime_type || 'application/octet-stream' }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.function getResultMimeType(resultPath) { return results[resultPath]?.mime_type || 'application/octet-stream' }
|
To be honest I don't really have the time to worry about it, so it's probably best if you make a seperate PR. |
|
Closing in favour of #3389 but the commentary above still applies |
…3389) See original PR #3303 ### QC * [x ] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). (I checked that the fixed behaviour is still consistent with the docs) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Updated the CI/CD workflow configuration for improved clarity. - Removed an outdated iterative processing example. - **Refactor** - Enhanced internal code styling and documentation to maintain consistency. - **Tests** - Updated test configurations with dynamic parameterization and refined output naming. - Improved logging, debugging, and error handling across test scripts to ensure more informative execution feedback. - Added assertions to validate configuration and parameter values in the test scripts. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Lucas Frérot <lucas.frerot@sorbonne-universite.fr>
…ke#3297) (snakemake#3389) See original PR snakemake#3303 ### QC * [x ] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). (I checked that the fixed behaviour is still consistent with the docs) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Chores** - Updated the CI/CD workflow configuration for improved clarity. - Removed an outdated iterative processing example. - **Refactor** - Enhanced internal code styling and documentation to maintain consistency. - **Tests** - Updated test configurations with dynamic parameterization and refined output naming. - Improved logging, debugging, and error handling across test scripts to ensure more informative execution feedback. - Added assertions to validate configuration and parameter values in the test scripts. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Lucas Frérot <lucas.frerot@sorbonne-universite.fr>



The fix runs Python's format-generated string through
shlex.quoteto properly interpret in shell scripts generated with thescript:directive.Closes #3297
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation