Skip to content

docs: Correct workflow.source_path() description in documentation#4036

Merged
johanneskoester merged 6 commits intosnakemake:mainfrom
LKress:test_source_path
Mar 11, 2026
Merged

docs: Correct workflow.source_path() description in documentation#4036
johanneskoester merged 6 commits intosnakemake:mainfrom
LKress:test_source_path

Conversation

@LKress
Copy link
Copy Markdown
Contributor

@LKress LKress commented Mar 10, 2026

This PR is intended to clarify the workflow.source_path() documentation and additionally ensures the intended functionality with two new tests. It thereby addresses the following issues: #1805, #2331.

In the documentation, the usage of workflow.source_path() is clarified by specifying what that when workflow.source_path() is used in input no rerun is triggered while the use in params causes a rerun of the given rule upon every start of the pipeline.

The newly introduced tests check:

  • That workflow.source_path() correctly caches files that are at arbitrary positions in the filesystem
  • That workflow.source_path() does not result in rule reruns if it is used in input but it does if it is used in params

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • Documentation
    • Clarified how files are mounted and available in workflows/containers; recommend using input with workflow.source_path to avoid automatic reruns caused by params
    • Updated examples and guidance to prefer input-based access over params for auxiliary source files, and noted source-path inputs aren’t mapped to default storage providers
  • Tests
    • Added new tests and example workflows covering source_path usage, nested cases, and rerun behavior verification

@LKress LKress requested a review from johanneskoester as a code owner March 10, 2026 14:53
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 10, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90a9e5dc-91e8-4ece-adf2-52efe4fb5b1d

📥 Commits

Reviewing files that changed from the base of the PR and between 8ca72cb and ca5c5f4.

📒 Files selected for processing (2)
  • docs/snakefiles/deployment.rst
  • docs/snakefiles/rules.rst

📝 Walkthrough

Walkthrough

Docs updated to instruct using input with workflow.source_path(...) instead of params for mounting source files; tests and example Snakefiles added to validate nested workflow.source_path usage and rerun differences when workflow.source_path is used from params vs input.

Changes

Cohort / File(s) Summary
Docs
docs/snakefiles/deployment.rst, docs/snakefiles/rules.rst
Clarify examples and notes to mount files via input: ... = workflow.source_path(...) rather than params; add notes that referencing workflow.source_path from params yields changing cached paths and can trigger reruns; note that workflow.source_path inputs are non-storage (won't map to default storage).
Test workflows & resources
tests/test_source_path_parent/nest/subdir/Snakefile, tests/test_source_path_rerun/Snakefile, tests/test_source_path_parent/resources/test.txt, tests/test_source_path_rerun/resources/test.txt
Add two Snakefiles demonstrating nested workflow.source_path use and rerun semantics (one using input, one showing params vs input behavior) and add small resource files (test.txt).
Tests — runner
tests/tests.py
Add test_source_path_parent() and test_source_path_rerun() (both @skip_on_windows) to exercise nested resolution and rerun behavior by comparing mtimes across runs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: clarifying the workflow.source_path() documentation to reflect its actual behavior regarding reruns.
Description check ✅ Passed The description is comprehensive and complete. It clearly explains the purpose of the PR, what documentation was updated, what tests were added, and both QC checkboxes are marked as completed.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@LKress LKress marked this pull request as draft March 10, 2026 14:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/snakefiles/deployment.rst`:
- Around line 464-465: The note incorrectly implies any file in params causes
reruns; change the explanation to state that only using workflow.source_path
within params produces a cached path that changes on each start and thus
triggers reruns. Update the text to explicitly mention workflow.source_path as
the cause, advise using workflow.source_path in input rather than params, and
keep references to params, workflow.source_path, and input so reviewers can find
and update the sentence accordingly.

In `@tests/tests.py`:
- Around line 1997-2012: The test is flaky due to filesystem timestamp
resolution: after calling run(dpath("test_source_path_rerun"), cleanup=False)
then rerunning into tmpdir the mtime check between mtime_param_1 and
mtime_param_2 can be identical; to fix, import time and insert a short sleep
(e.g., time.sleep(1)) after the first run (before computing mtime_param_1 or
before the second run) or, alternatively, use higher-resolution timestamps via
os.stat(...).st_mtime_ns when computing mtime_param_1/mtime_param_2; also ensure
the temporary workspace(s) created when cleanup=False (path1 and path2 returned
by run) are removed in a finally block (use shutil.rmtree on path1/path2) so the
tempdirs are cleaned up even on assertion failure—look for run, dpath,
path1/path2, mtime_param_1/mtime_param_2 to apply these changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 542026fa-58e8-4cfc-9bda-f8017b7b486c

📥 Commits

Reviewing files that changed from the base of the PR and between b88171c and bfa4bee.

⛔ Files ignored due to path filters (1)
  • tests/test_source_path_parent/expected-results/test.out is excluded by !**/*.out
📒 Files selected for processing (4)
  • docs/snakefiles/deployment.rst
  • docs/snakefiles/rules.rst
  • tests/test_source_path_parent/resources/test.txt
  • tests/tests.py

tests/tests.py Outdated
Comment on lines +1997 to +2012
path1 = run(dpath("test_source_path_rerun"), cleanup=False)

mtime_param_1 = os.path.getmtime(os.path.join(path1, "test.out"))

mtime_input_1 = os.path.getmtime(os.path.join(path1, "test2.out"))

path2 = run(dpath("test_source_path_rerun"), tmpdir=path1, cleanup=False)

mtime_param_2 = os.path.getmtime(os.path.join(path2, "test.out"))

mtime_input_2 = os.path.getmtime(os.path.join(path2, "test2.out"))

# rerun of the same workflow with source_path as param results in a rerun
assert mtime_param_1 != mtime_param_2
# rerun of the same workflow with source_path as input results in no rerun
assert mtime_input_1 == mtime_input_2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Make this rerun assertion robust across filesystem timestamp resolution.

os.path.getmtime() can return the same value for two back-to-back runs on coarse-resolution filesystems, so Line 2010 can fail even when test.out was rebuilt. Since this test also keeps the workspace around with cleanup=False, it should clean that tempdir up in a finally block.

🧪 Suggested hardening

Add near the other stdlib imports:

import time
 def test_source_path_rerun():
     path1 = run(dpath("test_source_path_rerun"), cleanup=False)
-
-    mtime_param_1 = os.path.getmtime(os.path.join(path1, "test.out"))
-
-    mtime_input_1 = os.path.getmtime(os.path.join(path1, "test2.out"))
-
-    path2 = run(dpath("test_source_path_rerun"), tmpdir=path1, cleanup=False)
-
-    mtime_param_2 = os.path.getmtime(os.path.join(path2, "test.out"))
-
-    mtime_input_2 = os.path.getmtime(os.path.join(path2, "test2.out"))
-
-    # rerun of the same workflow with source_path as param results in a rerun
-    assert mtime_param_1 != mtime_param_2
-    # rerun of the same workflow with source_path as input results in no rerun
-    assert mtime_input_1 == mtime_input_2
+    try:
+        param_out = os.path.join(path1, "test.out")
+        input_out = os.path.join(path1, "test2.out")
+
+        mtime_param_1 = os.stat(param_out).st_mtime_ns
+        mtime_input_1 = os.stat(input_out).st_mtime_ns
+
+        time.sleep(1.1)
+        run(dpath("test_source_path_rerun"), tmpdir=path1, cleanup=False)
+
+        mtime_param_2 = os.stat(param_out).st_mtime_ns
+        mtime_input_2 = os.stat(input_out).st_mtime_ns
+
+        # rerun of the same workflow with source_path as param results in a rerun
+        assert mtime_param_1 != mtime_param_2
+        # rerun of the same workflow with source_path as input results in no rerun
+        assert mtime_input_1 == mtime_input_2
+    finally:
+        shutil.rmtree(path1, ignore_errors=ON_WINDOWS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/tests.py` around lines 1997 - 2012, The test is flaky due to filesystem
timestamp resolution: after calling run(dpath("test_source_path_rerun"),
cleanup=False) then rerunning into tmpdir the mtime check between mtime_param_1
and mtime_param_2 can be identical; to fix, import time and insert a short sleep
(e.g., time.sleep(1)) after the first run (before computing mtime_param_1 or
before the second run) or, alternatively, use higher-resolution timestamps via
os.stat(...).st_mtime_ns when computing mtime_param_1/mtime_param_2; also ensure
the temporary workspace(s) created when cleanup=False (path1 and path2 returned
by run) are removed in a finally block (use shutil.rmtree on path1/path2) so the
tempdirs are cleaned up even on assertion failure—look for run, dpath,
path1/path2, mtime_param_1/mtime_param_2 to apply these changes.

@LKress LKress force-pushed the test_source_path branch from 3279eaa to b94d84b Compare March 10, 2026 15:40
LKress added 3 commits March 10, 2026 16:49
* Check if paths of parent directories are correctly resolved
* Check that no rerun is triggered if workflow.source_path is used in
  input and that a rerun is triggered if workflow.source_path is used in
params.
@LKress LKress force-pushed the test_source_path branch from b94d84b to 47f0852 Compare March 10, 2026 15:53
@LKress LKress marked this pull request as ready for review March 11, 2026 08:28
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/test_source_path_parent/nest/subdir/Snakefile (1)

6-8: Drop the extra echo unless the test asserts logs.

It looks like a debug artifact and just adds noise to test output; the cat alone already exercises the behavior under test.

♻️ Suggested cleanup
     shell:
-        "echo {input.a};"
         "cat {input.a} > {output}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_source_path_parent/nest/subdir/Snakefile` around lines 6 - 8, The
shell step in the Snakefile contains an extra debug command ("echo {input.a};")
that should be removed; update the shell: block in
tests/test_source_path_parent/nest/subdir/Snakefile to only run the actual
command (keep "cat {input.a} > {output}") so the test output isn’t noisy and the
behavior under test is still exercised.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/test_source_path_parent/nest/subdir/Snakefile`:
- Around line 6-8: The shell step in the Snakefile contains an extra debug
command ("echo {input.a};") that should be removed; update the shell: block in
tests/test_source_path_parent/nest/subdir/Snakefile to only run the actual
command (keep "cat {input.a} > {output}") so the test output isn’t noisy and the
behavior under test is still exercised.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1face5f4-685b-4e2e-b280-ec775fa9fc73

📥 Commits

Reviewing files that changed from the base of the PR and between bfa4bee and 8ca72cb.

⛔ Files ignored due to path filters (2)
  • tests/test_source_path_parent/expected-results/test.out is excluded by !**/*.out
  • tests/test_source_path_rerun/expected-results/test.out is excluded by !**/*.out
📒 Files selected for processing (7)
  • docs/snakefiles/deployment.rst
  • docs/snakefiles/rules.rst
  • tests/test_source_path_parent/nest/subdir/Snakefile
  • tests/test_source_path_parent/resources/test.txt
  • tests/test_source_path_rerun/Snakefile
  • tests/test_source_path_rerun/resources/test.txt
  • tests/tests.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_source_path_rerun/resources/test.txt
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/snakefiles/deployment.rst
  • docs/snakefiles/rules.rst
  • tests/tests.py

@johanneskoester
Copy link
Copy Markdown
Contributor

closes #1805, #2331.

@johanneskoester johanneskoester merged commit 45883c5 into snakemake:main Mar 11, 2026
1 of 2 checks passed
johanneskoester pushed a commit that referenced this pull request Mar 13, 2026
🤖 I have created a release *beep* *boop*
---


##
[9.17.0](v9.16.3...v9.17.0)
(2026-03-13)


### Features

* Allow storing snakemake metadata in files or databases
([#4012](#4012))
([dd75f31](dd75f31))
* Allow to specify comparison command per-unit test
([#3956](#3956))
([b88171c](b88171c))
* job table orderd topological when run is started
([#4018](#4018))
([75cf506](75cf506))
* lambda functions for priority in rules
([#3253](#3253))
([d2aa226](d2aa226))
* Make on... directive of modules accessible
([#4050](#4050))
([e9f2e1c](e9f2e1c))


### Bug Fixes

* adjust conda tests to not fail on apple silicon; fix
[#4040](#4040)
([#4049](#4049))
([f5b0142](f5b0142))
* allow "--containerize apptainer" to output apptainer format instead of
dockerfile ([#4030](#4030))
([f5cac30](f5cac30))
* apptainer command not recognized when singularity is absent
([#4010](#4010))
([b8162e2](b8162e2))
* capture stderr when tests fail
([#3995](#3995))
([97d74ba](97d74ba))
* **docs:** make Data-dependent conditional execution a complete example
([#4043](#4043))
([3a1d7f2](3a1d7f2))
* don't build the DAG when running unlock. Fixes
[#4000](#4000) and
[#198](#198)
([#4007](#4007))
([acf79fd](acf79fd))
* Ensure pixi tasks may be run as advertised
([#4046](#4046))
([88253c2](88253c2))
* fix checkpoint handling corner cases
([#3870](#3870) and
[#3559](#3559))
([#4015](#4015))
([63f4257](63f4257))
* issue 3642
([#4054](#4054))
([76e6fc2](76e6fc2))
* issue 3815
([#4026](#4026))
([b0eec96](b0eec96))
* logging None in shellcmd context causes error
([#4064](#4064))
([d0652cd](d0652cd))
* lookup function returns default value for empty DataFrame queries
([#4056](#4056))
([f71de97](f71de97))
* make `cache: omit-software` a rule specific property
([#4085](#4085))
([034a9e7](034a9e7))
* reduce number of tests leaving temporary files behind
([#4033](#4033))
([a3a1c97](a3a1c97))
* regression in dynamic resource handling
([#4038](#4038))
([f2c554a](f2c554a))
* somewhat shorter announce message
([#4080](#4080))
([57efc71](57efc71))


### Performance Improvements

* switch reretry with tenacity; decouple container classes (with Python
3.7 compat for old scripts) from rest of the codebase (enabling moving
to newer python versions)
([#4032](#4032))
([ffb19e7](ffb19e7))


### Documentation

* Add AI-assisted contributions policy to contributing guidelines
([#4051](#4051))
([dd70526](dd70526))
* **codebase:** Update & simplify plugin architecture section
([#4052](#4052))
([176cf63](176cf63))
* Correct workflow.source_path() description in documentation
([#4036](#4036))
([45883c5](45883c5))
* fixed wrong code example for collect() function
([#4037](#4037))
([5c85ed8](5c85ed8))
* Minor docs improvements
([#4089](#4089))
([29ea226](29ea226))
* switch to sphinx_design for tabs
([#3976](#3976))
([9674614](9674614))
* typo in the migration table breaking a pip install command
([#4024](#4024))
([66f9dda](66f9dda))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

workflow.source_path does not work as expected using source_path in params always triggers unneeded rerun

2 participants