fix: allow "--containerize apptainer" to output apptainer format instead of dockerfile#4030
Conversation
📝 WalkthroughWalkthroughAdds selectable containerization output formats by introducing a ContainerFormat abstraction with Docker and Apptainer implementations, threads a fmt parameter through CLI → API → Workflow → containerize, and updates docs to document Apptainer definition generation. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI
participant API as SnakemakeApi
participant Workflow as Workflow
participant Containerize as containerize()
participant Formatter as Formatter (Docker/Apptainer)
CLI->>API: containerize(fmt="apptainer")
API->>Workflow: containerize(fmt="apptainer")
Workflow->>Containerize: containerize(workflow, dag, fmt="apptainer")
Containerize->>Formatter: select implementation (fmt)
Containerize->>Formatter: header(image, env_hash)
Containerize->>Formatter: label(key, value)
Containerize->>Formatter: copy_file(src, dest)
Containerize->>Formatter: run_command(cmd)
Containerize->>Formatter: run_combined(...)/finish() (apptainer)
Formatter-->>Containerize: formatted output emitted
Containerize-->>Workflow: done
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/snakemake/cli.py (1)
1020-1026: Add a real CLI regression for--containerize.This changed the option from a boolean flag to
nargs="?"plusconst="dockerfile". The existing containerize coverage intests/tests_using_conda.pygoes through the helper/API path (containerize=True), so it won’t catch parser regressions for bare--containerizeversus--containerize apptainer.Also applies to: 2144-2145
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/cli.py` around lines 1020 - 1026, The argparse change for "--containerize" (nargs="?" with const="dockerfile") can break bare-flag parsing; add CLI-level regression tests that exercise both forms. Update tests/tests_using_conda.py (or add a new test file) to invoke the CLI parser or run the CLI entry with args ["--containerize"] and ["--containerize", "apptainer"] and assert the parsed/observed value is "dockerfile" for the bare flag and "apptainer" for the explicit value; also keep the existing API-path test that passes containerize=True to ensure that path remains covered. Use the same parser function or entry-point used by src/snakemake/cli.py so the test detects parser regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/deployment/containerize.py`:
- Around line 112-113: The copy_file method currently appends cwd-relative src
paths to self._files which makes generated Apptainer `%files` entries
location-dependent; update copy_file(src, dest) so it normalizes src to an
absolute host path (e.g., via os.path.abspath or pathlib.Path(src).resolve())
before appending to self._files, ensuring the stored tuple (absolute_src, dest)
is used when rendering Apptainer `.def` files.
- Around line 115-118: The add_remote_file method appends unquoted url and dest
into /bin/sh %post commands (self._post), which breaks on characters like &;
update add_remote_file to shell-quote both operands when building the mkdir and
wget lines so the produced commands use quoted dirname(dest) and quoted dest/url
(i.e., wrap the interpolated {dest} and {url} in single quotes or use proper
sh-escaping) before appending to self._post to prevent word-splitting and
injection.
---
Nitpick comments:
In `@src/snakemake/cli.py`:
- Around line 1020-1026: The argparse change for "--containerize" (nargs="?"
with const="dockerfile") can break bare-flag parsing; add CLI-level regression
tests that exercise both forms. Update tests/tests_using_conda.py (or add a new
test file) to invoke the CLI parser or run the CLI entry with args
["--containerize"] and ["--containerize", "apptainer"] and assert the
parsed/observed value is "dockerfile" for the bare flag and "apptainer" for the
explicit value; also keep the existing API-path test that passes
containerize=True to ensure that path remains covered. Use the same parser
function or entry-point used by src/snakemake/cli.py so the test detects parser
regressions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b67a1ac8-21b6-4eb7-8ba9-21a6c90e34a9
📒 Files selected for processing (5)
docs/snakefiles/deployment.rstsrc/snakemake/api.pysrc/snakemake/cli.pysrc/snakemake/deployment/containerize.pysrc/snakemake/workflow.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/snakemake/deployment/containerize.py (2)
115-116:⚠️ Potential issue | 🟡 MinorMake Apptainer
%filessources location-independent.The
srcpath reaches this method as a cwd-relative path fromrelfile(). Apptainer's%filessection consumes host paths during build, so the generated.defwill fail if written or built from a different directory. Normalize to an absolute path.Proposed fix
def copy_file(self, src, dest): - self._files.append((src, dest)) + self._files.append((os.path.abspath(src), dest))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` around lines 115 - 116, The copy_file method stores cwd-relative sources which makes Apptainer %files entries location-dependent; update copy_file (and the call site expecting relfile() inputs) to normalize src to an absolute path before appending (e.g., use os.path.abspath or Path(src).resolve()) so the tuple stored in self._files contains an absolute host path and the generated .def is portable across working directories.
118-121:⚠️ Potential issue | 🟠 MajorShell-quote remote downloads in
%postto prevent injection.The
urlanddestparameters are interpolated directly into shell commands. A URL with&in the query string (or other shell metacharacters) will break or behave unexpectedly. Useshlex.quote()for both operands.Proposed fix
Add the import at the top of the file:
import shlexThen update the method:
def add_remote_file(self, url, dest): # Apptainer doesn't have ADD, so we download in %post - self._post.append(f"mkdir -p $(dirname {dest})") - self._post.append(f"wget -O {dest} {url}") + dest_quoted = shlex.quote(str(dest)) + url_quoted = shlex.quote(str(url)) + self._post.append(f"mkdir -p $(dirname {dest_quoted})") + self._post.append(f"wget -O {dest_quoted} {url_quoted}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` around lines 118 - 121, The add_remote_file method currently interpolates url and dest directly into shell lines, risking shell injection; import shlex at top and wrap both url and dest with shlex.quote() when building the _post entries in add_remote_file (e.g., quote url and dest before embedding into the mkdir and wget commands) so the generated %post commands are safely shell-escaped.
🧹 Nitpick comments (2)
src/snakemake/deployment/containerize.py (2)
85-87: Consider using spread syntax for list construction.Static analysis suggests using
[*commands, final_cmd]instead of concatenation.Proposed fix
def run_combined(self, commands, final_cmd): - all_cmds = commands + [final_cmd] + all_cmds = [*commands, final_cmd] print("\nRUN", " \\\n ".join(all_cmds))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` around lines 85 - 87, The list concatenation in run_combined uses commands + [final_cmd] to build all_cmds; replace this with the spread-style construction to create all_cmds as [*commands, final_cmd] and then continue using all_cmds (or inline it into the join) so the behavior is identical but uses the suggested list construction; update the expression where all_cmds is assigned in the run_combined function.
132-148: Addfinish()to the ABC with a default no-op.The
finish()method exists only inApptainerFormat, and the caller useshasattr(formatter, "finish")to conditionally invoke it. This breaks the abstraction—all formatters should expose the same interface. Addingfinish()toContainerFormatwith a default no-op implementation improves type safety and consistency.Proposed fix
Add to the ABC (around line 55):
`@abstractmethod` def run_combined(self, commands, final_cmd): """Print multiple commands combined together (e.g., conda env creation).""" pass + + def finish(self): + """Finalize output. Override in subclasses that need post-processing.""" + passThen update the caller (line 267-268):
- # For Apptainer, print the collected sections at the end - if hasattr(formatter, "finish"): - formatter.finish() + # Finalize output (no-op for Docker, prints sections for Apptainer) + formatter.finish()Also applies to: 267-268
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` around lines 132 - 148, Add a default no-op finish() method to the ContainerFormat abstract base class so all formatters implement the same interface (mirror the existing ApptainerFormat.finish behavior), and then change callers that currently use hasattr(formatter, "finish") to call formatter.finish() directly (remove the hasattr conditional). Reference the ContainerFormat class to add the method and the ApptainerFormat.finish implementation to copy semantics from, and update the call site that currently checks for hasattr(formatter, "finish") to invoke finish() unconditionally on the formatter instance.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/deployment/containerize.py`:
- Line 242: The call to formatter.comment currently uses an unnecessary
f-string; change formatter.comment(f"Conda environment:") to a plain string
literal formatter.comment("Conda environment:") so the redundant f prefix is
removed (search for the formatter.comment invocation in containerize.py and
replace the f-prefixed string).
---
Duplicate comments:
In `@src/snakemake/deployment/containerize.py`:
- Around line 115-116: The copy_file method stores cwd-relative sources which
makes Apptainer %files entries location-dependent; update copy_file (and the
call site expecting relfile() inputs) to normalize src to an absolute path
before appending (e.g., use os.path.abspath or Path(src).resolve()) so the tuple
stored in self._files contains an absolute host path and the generated .def is
portable across working directories.
- Around line 118-121: The add_remote_file method currently interpolates url and
dest directly into shell lines, risking shell injection; import shlex at top and
wrap both url and dest with shlex.quote() when building the _post entries in
add_remote_file (e.g., quote url and dest before embedding into the mkdir and
wget commands) so the generated %post commands are safely shell-escaped.
---
Nitpick comments:
In `@src/snakemake/deployment/containerize.py`:
- Around line 85-87: The list concatenation in run_combined uses commands +
[final_cmd] to build all_cmds; replace this with the spread-style construction
to create all_cmds as [*commands, final_cmd] and then continue using all_cmds
(or inline it into the join) so the behavior is identical but uses the suggested
list construction; update the expression where all_cmds is assigned in the
run_combined function.
- Around line 132-148: Add a default no-op finish() method to the
ContainerFormat abstract base class so all formatters implement the same
interface (mirror the existing ApptainerFormat.finish behavior), and then change
callers that currently use hasattr(formatter, "finish") to call
formatter.finish() directly (remove the hasattr conditional). Reference the
ContainerFormat class to add the method and the ApptainerFormat.finish
implementation to copy semantics from, and update the call site that currently
checks for hasattr(formatter, "finish") to invoke finish() unconditionally on
the formatter instance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 39c1b233-e735-4260-a6fe-dc339782b627
📒 Files selected for processing (1)
src/snakemake/deployment/containerize.py
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
src/snakemake/deployment/containerize.py (2)
242-242:⚠️ Potential issue | 🟡 MinorRemove extraneous
fprefix from string literal.This f-string has no placeholders.
Proposed fix
- formatter.comment(f"Conda environment:") + formatter.comment("Conda environment:")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` at line 242, The call to formatter.comment uses an unnecessary f-string: change formatter.comment(f"Conda environment:") to a regular string by removing the f prefix so it becomes formatter.comment("Conda environment:"); update the occurrence in containerize.py where formatter.comment is invoked to avoid the redundant f-string.
118-121:⚠️ Potential issue | 🟠 MajorShell quoting remains incomplete for URLs with single quotes.
The current implementation wraps
urlanddestin single quotes, which handles&characters but doesn't escape single quotes within the URL itself. If a URL contains a single quote, the shell command will break or potentially allow injection.Proposed fix using shlex.quote
Add
import shlexat the top, then:def add_remote_file(self, url, dest): # Apptainer doesn't have ADD, so we download in %post - self._post.append(f"mkdir -p $(dirname {dest})") - self._post.append(f"curl -L '{url}' > '{dest}'") + self._post.append(f"mkdir -p $(dirname {shlex.quote(str(dest))})") + self._post.append(f"curl -L {shlex.quote(str(url))} -o {shlex.quote(str(dest))}")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` around lines 118 - 121, The add_remote_file method builds shell lines using raw url and dest which breaks if either contains single quotes; import shlex and use shlex.quote to safely shell-escape both values before inserting into the commands in add_remote_file (e.g., q_url = shlex.quote(url); q_dest = shlex.quote(dest)), then append f"mkdir -p $(dirname {q_dest})" and f"curl -L {q_url} > {q_dest}" to self._post so all embedded quotes and special chars are properly escaped.
🧹 Nitpick comments (2)
src/snakemake/deployment/containerize.py (2)
266-268: Consider definingfinish()in the ABC with a default no-op.Using
hasattr(formatter, "finish")to conditionally call the method is a code smell. A cleaner approach is to definefinish()as a concrete method inContainerFormatwith an empty body, allowing all formatters to call it unconditionally.Proposed changes
In the ABC:
class ContainerFormat(ABC): # ... existing abstract methods ... def finish(self): """Finalize output. Override if the format needs deferred printing.""" passThen at the end of
containerize():- # For Apptainer, print the collected sections at the end - if hasattr(formatter, "finish"): - formatter.finish() + # Finalize output (e.g., print collected sections for Apptainer) + formatter.finish()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` around lines 266 - 268, The current conditional hasattr check around formatter.finish() is a smell; add a concrete no-op finish(self) method to the ContainerFormat ABC (so implementations can override if needed) and then remove the hasattr guard in containerize() so you call formatter.finish() unconditionally; update the ContainerFormat class to include a finish method and ensure containerize() uses formatter.finish() directly (referencing ContainerFormat, finish, and containerize to locate the changes).
85-87: Consider using spread syntax for list concatenation.Static analysis suggests using
[*commands, final_cmd]instead ofcommands + [final_cmd]for slightly better performance and idiomatic style.Suggested change
def run_combined(self, commands, final_cmd): - all_cmds = commands + [final_cmd] + all_cmds = [*commands, final_cmd] print("\nRUN", " \\\n ".join(all_cmds))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/deployment/containerize.py` around lines 85 - 87, In run_combined, replace the list concatenation expression commands + [final_cmd] with the spread-style literal [*commands, final_cmd] to be more idiomatic and slightly faster; update the all_cmds assignment in the run_combined method to use all_cmds = [*commands, final_cmd] and keep the rest of the function behavior (the print call) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/deployment/containerize.py`:
- Around line 126-130: The current run_combined method uses cmd.rstrip(" &")
which indiscriminately removes any trailing spaces or ampersands; change this to
remove only the literal " &&" suffix so legitimate trailing " &" (background
operator) or spaces aren’t stripped—replace the rstrip call in run_combined
(where commands are appended into self._post) with a precise removal such as
cmd.removesuffix(" &&") (or an equivalent explicit check-and-slice) before
appending, ensuring final_cmd handling remains unchanged.
---
Duplicate comments:
In `@src/snakemake/deployment/containerize.py`:
- Line 242: The call to formatter.comment uses an unnecessary f-string: change
formatter.comment(f"Conda environment:") to a regular string by removing the f
prefix so it becomes formatter.comment("Conda environment:"); update the
occurrence in containerize.py where formatter.comment is invoked to avoid the
redundant f-string.
- Around line 118-121: The add_remote_file method builds shell lines using raw
url and dest which breaks if either contains single quotes; import shlex and use
shlex.quote to safely shell-escape both values before inserting into the
commands in add_remote_file (e.g., q_url = shlex.quote(url); q_dest =
shlex.quote(dest)), then append f"mkdir -p $(dirname {q_dest})" and f"curl -L
{q_url} > {q_dest}" to self._post so all embedded quotes and special chars are
properly escaped.
---
Nitpick comments:
In `@src/snakemake/deployment/containerize.py`:
- Around line 266-268: The current conditional hasattr check around
formatter.finish() is a smell; add a concrete no-op finish(self) method to the
ContainerFormat ABC (so implementations can override if needed) and then remove
the hasattr guard in containerize() so you call formatter.finish()
unconditionally; update the ContainerFormat class to include a finish method and
ensure containerize() uses formatter.finish() directly (referencing
ContainerFormat, finish, and containerize to locate the changes).
- Around line 85-87: In run_combined, replace the list concatenation expression
commands + [final_cmd] with the spread-style literal [*commands, final_cmd] to
be more idiomatic and slightly faster; update the all_cmds assignment in the
run_combined method to use all_cmds = [*commands, final_cmd] and keep the rest
of the function behavior (the print call) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 31ca21e4-7246-48b2-9345-8a36528cc902
📒 Files selected for processing (1)
src/snakemake/deployment/containerize.py
🤖 I have created a release *beep* *boop* --- ## [9.17.0](v9.16.3...v9.17.0) (2026-03-13) ### Features * Allow storing snakemake metadata in files or databases ([#4012](#4012)) ([dd75f31](dd75f31)) * Allow to specify comparison command per-unit test ([#3956](#3956)) ([b88171c](b88171c)) * job table orderd topological when run is started ([#4018](#4018)) ([75cf506](75cf506)) * lambda functions for priority in rules ([#3253](#3253)) ([d2aa226](d2aa226)) * Make on... directive of modules accessible ([#4050](#4050)) ([e9f2e1c](e9f2e1c)) ### Bug Fixes * adjust conda tests to not fail on apple silicon; fix [#4040](#4040) ([#4049](#4049)) ([f5b0142](f5b0142)) * allow "--containerize apptainer" to output apptainer format instead of dockerfile ([#4030](#4030)) ([f5cac30](f5cac30)) * apptainer command not recognized when singularity is absent ([#4010](#4010)) ([b8162e2](b8162e2)) * capture stderr when tests fail ([#3995](#3995)) ([97d74ba](97d74ba)) * **docs:** make Data-dependent conditional execution a complete example ([#4043](#4043)) ([3a1d7f2](3a1d7f2)) * don't build the DAG when running unlock. Fixes [#4000](#4000) and [#198](#198) ([#4007](#4007)) ([acf79fd](acf79fd)) * Ensure pixi tasks may be run as advertised ([#4046](#4046)) ([88253c2](88253c2)) * fix checkpoint handling corner cases ([#3870](#3870) and [#3559](#3559)) ([#4015](#4015)) ([63f4257](63f4257)) * issue 3642 ([#4054](#4054)) ([76e6fc2](76e6fc2)) * issue 3815 ([#4026](#4026)) ([b0eec96](b0eec96)) * logging None in shellcmd context causes error ([#4064](#4064)) ([d0652cd](d0652cd)) * lookup function returns default value for empty DataFrame queries ([#4056](#4056)) ([f71de97](f71de97)) * make `cache: omit-software` a rule specific property ([#4085](#4085)) ([034a9e7](034a9e7)) * reduce number of tests leaving temporary files behind ([#4033](#4033)) ([a3a1c97](a3a1c97)) * regression in dynamic resource handling ([#4038](#4038)) ([f2c554a](f2c554a)) * somewhat shorter announce message ([#4080](#4080)) ([57efc71](57efc71)) ### Performance Improvements * switch reretry with tenacity; decouple container classes (with Python 3.7 compat for old scripts) from rest of the codebase (enabling moving to newer python versions) ([#4032](#4032)) ([ffb19e7](ffb19e7)) ### Documentation * Add AI-assisted contributions policy to contributing guidelines ([#4051](#4051)) ([dd70526](dd70526)) * **codebase:** Update & simplify plugin architecture section ([#4052](#4052)) ([176cf63](176cf63)) * Correct workflow.source_path() description in documentation ([#4036](#4036)) ([45883c5](45883c5)) * fixed wrong code example for collect() function ([#4037](#4037)) ([5c85ed8](5c85ed8)) * Minor docs improvements ([#4089](#4089)) ([29ea226](29ea226)) * switch to sphinx_design for tabs ([#3976](#3976)) ([9674614](9674614)) * typo in the migration table breaking a pip install command ([#4024](#4024)) ([66f9dda](66f9dda)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Fix snakemake#4114 `--containerize apptainer` option from snakemake#4030, is used to create apptainer definition file from workflow. When the workflow has rule with `wrapper` the image doesn't build, it's missing `curl`. Add check for a wrapper rule in workflow and add `curl`. Add image size optimization if wrapper rule in workflow. Add test for containerize with wrapper.
Fixes #4114 `--containerize apptainer` option from #4030, is used to create apptainer definition file from workflow. When the workflow has rule with `wrapper:` the image doesn't build, it's missing `curl`. Add check for a wrapper rule in workflow and add `curl`. Add image size optimization if wrapper rule in workflow. Add test for containerize with wrapper. ### QC * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). * [x] I, as a human being, have checked each line of code in this pull request ### AI-assistance disclosure <!-- If AI tools were involved in creating this PR, please check all boxes that apply below and make sure that you adhere to our AI-assisted contributions policy: https://github.com/snakemake/snakemake/blob/main/docs/project_info/contributing.rst --> I used AI assistance for: * [ ] Code generation (e.g., when writing an implementation or fixing a bug) * [ ] Test/benchmark generation * [ ] Documentation (including examples) * [ ] Research and understanding <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Container images for Apptainer now conditionally install curl when remote environment files are fetched and perform cleanup of package lists to reduce image size. * **Tests** * Added test coverage validating Apptainer containerization behavior for wrapper-rule scenarios, including dependency installation and cleanup steps. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Closes #3989
Problem
The
--containerizeflag only generated Dockerfile output. Users working in HPC environments that use Apptainer (the successor to Singularity) had no way to directly generate Apptainer definition files from their workflows.Solution
Refactored the containerization logic using an Abstract Base Class (ABC) pattern. This allows different container formats to be added by simply creating a new class that inherits from the base class. The dockerfile and apptainer format available now.
Changes
snakemake/deployment/containerize.py— AddedContainerFormatABC with abstract methods (header,label,copy_file,run_command, etc.),DockerFormatclass (produces same output as before), andApptainerFormatclass (generates%labels,%files,%postsections)snakemake/cli.py— Changed--containerizefrom a boolean flag to an optional format choice (dockerfileorapptainer), defaulting todockerfilefor backward compatibilitysnakemake/api.py— Addedfmtparameter to pass the chosen format throughsnakemake/workflow.py— Addedfmtparameter to pass the chosen format throughdocs/snakefiles/deployment.rst— Documented the new--containerize apptaineroptionUsage
QC
test_containerize_checkpointdocs/) is updated: Updateddocs/snakefiles/deployment.rstto document the new--containerize apptaineroption.Summary by CodeRabbit
New Features
Documentation
Refactor