fix: ensure that tokens are not leaked when paths or uris of source files are logged#3821
Conversation
…or without any secrets (thus avoiding their leakage into log files or scripts.
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughThis PR adds a Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant SourceFile
Note over Caller,SourceFile: Two retrieval modes for paths/URIs
rect rgb(220,240,255)
Note over Caller: Internal operations (need secrets)
Caller->>SourceFile: get_path_or_uri(secret_free=False)
SourceFile-->>Caller: full_uri_with_tokens
end
rect rgb(255,240,220)
Note over Caller: User-facing / logs / errors (redact)
Caller->>SourceFile: get_path_or_uri(secret_free=True)
SourceFile-->>Caller: redacted_uri_without_tokens
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Please format your code with |
…ake into fix/cache-wrapper-files
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/snakemake/parser.py (1)
1355-1364: Prevent credential leak in error messages (use secret_free=True for display path)The path stored in
self.pathis used in two contexts where it's exposed to users:
- Line 178: embedded in
SyntaxErrortuples raised during parsing- Line 795: embedded in generated
@workflow.rule()decorator stringsWhen using remote sources like Github or Gitlab,
get_path_or_uri(secret_free=False)includes authentication tokens. This fix correctly usessecret_free=Truefor display purposes while preserving functionality—file operations continue using the originalSourceFileobject viaworkflow.sourcecache.open(path).Apply the suggested fix to prevent token leakage.
src/snakemake/workflow.py (1)
1625-1660: Keep tokens out of compiled filenamesUsing
secret_free=Falsewhen registering and compiling a Snakefile injects any embedded credentials intoco_filename. Those filenames surface in tracebacks, logging, and cache keys, so this change reintroduces the very token leakage this PR is trying to eliminate. Switch the bookkeeping to the redacted form while still retrieving the non‑redacted path on demand for actual I/O.- self._included[snakefile.get_path_or_uri(secret_free=False)] = snakefile - self.included_stack.append(snakefile) - - default_target = self.default_target - linemap: Dict[int, int] = dict() - self.linemaps[snakefile.get_path_or_uri(secret_free=False)] = linemap + path_key = snakefile.get_path_or_uri(secret_free=True) + self._included[path_key] = snakefile + self.included_stack.append(snakefile) + + default_target = self.default_target + linemap: Dict[int, int] = dict() + self.linemaps[path_key] = linemap @@ - exec( - compile(code, snakefile.get_path_or_uri(secret_free=False), "exec"), + exec( + compile(code, snakefile.get_path_or_uri(secret_free=True), "exec"), self.globals, )
🧹 Nitpick comments (1)
src/snakemake/sourcecache.py (1)
458-466: Correct dual usage ofsecret_freeparameter!The implementation perfectly demonstrates the intended usage pattern:
- Line 458 uses
secret_free=Falseto get the full authenticated URI for opening the file- Line 464 uses
secret_free=Trueto exclude tokens from the error messageHowever, consider improving the exception chaining:
try: return open(path_or_uri, mode, encoding=None if "b" in mode else encoding) except Exception as e: - raise WorkflowError( - f"Failed to open source file {source_file.get_path_or_uri(secret_free=True)}", - e, - ) + raise WorkflowError( + f"Failed to open source file {source_file.get_path_or_uri(secret_free=True)}" + ) from eThis preserves the exception chain for better debugging while maintaining Python best practices.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
src/snakemake/dag.py(1 hunks)src/snakemake/deployment/conda.py(10 hunks)src/snakemake/deployment/containerize.py(2 hunks)src/snakemake/jobs.py(1 hunks)src/snakemake/parser.py(1 hunks)src/snakemake/report/__init__.py(2 hunks)src/snakemake/script/__init__.py(10 hunks)src/snakemake/sourcecache.py(12 hunks)src/snakemake/utils.py(3 hunks)src/snakemake/workflow.py(6 hunks)src/snakemake/wrapper.py(3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/jobs.pysrc/snakemake/dag.pysrc/snakemake/deployment/containerize.pysrc/snakemake/workflow.pysrc/snakemake/wrapper.pysrc/snakemake/report/__init__.pysrc/snakemake/parser.pysrc/snakemake/deployment/conda.pysrc/snakemake/script/__init__.pysrc/snakemake/utils.pysrc/snakemake/sourcecache.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
src/snakemake/wrapper.py
🧠 Learnings (3)
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Applied to files:
src/snakemake/workflow.pysrc/snakemake/parser.pysrc/snakemake/script/__init__.py
📚 Learning: 2025-09-17T04:03:59.943Z
Learnt from: Hocnonsense
Repo: snakemake/snakemake PR: 3714
File: src/snakemake/modules.py:236-241
Timestamp: 2025-09-17T04:03:59.943Z
Learning: In Snakemake's WorkflowModifier, rule filtering via rule_whitelist and rule_exclude_list happens upstream during rule creation in the rule() decorator. The avail_rulename() method returns None for filtered-out rules, causing the entire rule creation to be skipped. This means child_modifier.rule_proxies only contains rules that passed the filtering, so inherit_rule_proxies doesn't need additional filtering logic.
Applied to files:
src/snakemake/workflow.py
📚 Learning: 2024-10-11T13:12:35.827Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3132
File: snakemake/deployment/conda.py:85-88
Timestamp: 2024-10-11T13:12:35.827Z
Learning: In the `snakemake/deployment/conda.py` file, within the `Env` class, validation of `env_dir` occurs later in the code, so it's unnecessary to validate `env_dir` in the `__init__` method.
Applied to files:
src/snakemake/deployment/conda.py
🧬 Code graph analysis (11)
src/snakemake/jobs.py (1)
src/snakemake/sourcecache.py (6)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)
src/snakemake/dag.py (1)
src/snakemake/sourcecache.py (6)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)
src/snakemake/deployment/containerize.py (1)
src/snakemake/sourcecache.py (6)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)
src/snakemake/workflow.py (1)
src/snakemake/sourcecache.py (13)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)get_basedir(56-58)get_basedir(168-175)get_basedir(256-264)LocalSourceFile(111-141)join(63-66)join(158-166)join(266-279)
src/snakemake/wrapper.py (2)
src/snakemake/script/__init__.py (1)
script(1700-1779)src/snakemake/sourcecache.py (15)
GithubFile(286-298)SourceCache(353-466)infer_source_file(320-350)get_path(391-393)join(63-66)join(158-166)join(266-279)ref(184-185)ref(253-254)get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)
src/snakemake/report/__init__.py (1)
src/snakemake/sourcecache.py (6)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)
src/snakemake/parser.py (1)
src/snakemake/sourcecache.py (6)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)
src/snakemake/deployment/conda.py (2)
src/snakemake/io/__init__.py (2)
file(457-464)IOFile(235-237)src/snakemake/sourcecache.py (6)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)
src/snakemake/script/__init__.py (1)
src/snakemake/sourcecache.py (9)
get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)get_basedir(56-58)get_basedir(168-175)get_basedir(256-264)
src/snakemake/utils.py (2)
src/snakemake/workflow.py (2)
current_basedir(1541-1545)sourcecache(500-501)src/snakemake/sourcecache.py (10)
join(63-66)join(158-166)join(266-279)get_path_or_uri(47-47)get_path_or_uri(97-98)get_path_or_uri(115-116)get_path_or_uri(155-156)get_path_or_uri(294-298)get_path_or_uri(307-317)open(378-382)
src/snakemake/sourcecache.py (1)
src/snakemake/common/__init__.py (2)
parse_uri(131-146)smart_join(149-166)
🪛 Ruff (0.14.2)
src/snakemake/workflow.py
1656-1656: Use of exec detected
(S102)
src/snakemake/sourcecache.py
97-97: Unused method argument: secret_free
(ARG002)
115-115: Unused method argument: secret_free
(ARG002)
155-155: Unused method argument: secret_free
(ARG002)
462-462: Do not catch blind exception: Exception
(BLE001)
463-466: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
463-466: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (47)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (8, windows-2022, py313)
- GitHub Check: tests (8, macos-latest, py313)
- GitHub Check: tests (9, windows-2022, py313)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-2022, py313)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (6, windows-2022, py313)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-2022, py313)
- GitHub Check: tests (5, macos-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (4, windows-2022, py313)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, macos-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (2, macos-latest, py313)
- GitHub Check: tests (1, macos-latest, py313)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: docs
- GitHub Check: apidocs
🔇 Additional comments (11)
src/snakemake/dag.py (1)
3303-3306: Good: secret-free local path in sources listingUsing get_path_or_uri(secret_free=True) for LocalSourceFile in get_sources() avoids leaking secrets in archive/log contexts.
src/snakemake/jobs.py (1)
409-417: LGTM: secret-free local script path for mtime checkUsing get_path_or_uri(secret_free=True) for LocalSourceFile keeps behavior while avoiding accidental token propagation if logged elsewhere.
src/snakemake/report/__init__.py (2)
431-436: Good: redact secrets in caption-load errorUsing get_path_or_uri(secret_free=True) in the WorkflowError message prevents leaking tokens when a caption file fails to load.
736-740: Good: redact secrets in global report caption errorsThe switch to secret_free=True for dag.workflow.report_text in error reporting avoids exposing tokens in report-generation errors.
src/snakemake/sourcecache.py (7)
45-90: Excellent implementation of the secret_free parameter in the base class!The propagation of
secret_freethroughout the base class methods is correct:
- Cache paths, hashing, equality checks, and string representations use
secret_free=Trueto prevent token leakage- Operational methods like
get_basedir()andjoin()usesecret_free=Falseto preserve full paths for actual file operationsThis ensures sensitive tokens are hidden in logs and user-facing output while remaining available for authentication during file access.
97-98: Unusedsecret_freeparameter is acceptable here.While static analysis flags the unused parameter, this is correct—
GenericSourceFileserves as a fallback for arbitrary URIs that don't contain authentication tokens. The parameter must be present to satisfy the abstract interface contract.
115-116: Correct implementation for local files.Local filesystem paths don't contain authentication tokens, so not using the
secret_freeparameter is appropriate. The parameter is required by the interface contract.
155-156: Appropriate handling for git+file:// protocol.Local git repositories authenticate via SSH keys or system credentials rather than URL-embedded tokens, so the
secret_freeparameter doesn't need to affect the returned URI. The interface requirement is satisfied.
294-298: Perfect implementation of token redaction for GitHub files!The conditional auth string correctly includes the token only when
secret_free=False, ensuring:
- Authenticated requests work properly when accessing files
- Tokens are omitted from logs, error messages, and user-facing output
This is exactly the security behavior the PR aims to achieve.
307-317: Correct token redaction for GitLab files!The implementation properly conditionally includes the
private_tokenquery parameter, matching the GitHub implementation pattern. Tokens are correctly hidden whensecret_free=True.
325-325: Correct use ofsecret_free=Truefor path inference.When converting an existing
SourceFileto a string for re-inference, usingsecret_free=Trueensures tokens aren't exposed in the derived path.
🤖 I have created a release *beep* *boop* --- ## [9.13.5](v9.13.4...v9.13.5) (2025-11-04) ### Bug Fixes * cache wrapper files and wait for them in case of shared filesystem for sources ([#3809](#3809)) ([498fff7](498fff7)) * correctly handle meta-wrapper tag replacement depending on the used snakemake-wrapper release ([#3826](#3826)) ([8d46a4a](8d46a4a)) * ensure that flags are properly considered for input files before applying path modifiers (i.e. default storage providers) ([#3813](#3813)) ([d3bfe32](d3bfe32)) * ensure that tokens are not leaked when paths or uris of source files are logged ([#3821](#3821)) ([a217e50](a217e50)) * print secs as numeric in jsonl benchmark format ([#3814](#3814)) ([395a5e6](395a5e6)) * revert breaking change in 9.11.9 disallowing empty input files even when unused ([#3810](#3810)) ([83c05cc](83c05cc)) * shorten report ids (thus dir names) in order to avoid issues with path length on windows ([#3822](#3822)) ([b24d971](b24d971)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
…iles are logged (snakemake#3821) ### Description <!--Add a description of your PR here--> ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved security by preventing sensitive authentication credentials from appearing in log messages and error outputs across script execution, conda environment handling, report generation, and source file processing. * **Chores** * Updated internal path resolution methods to consistently manage credential exposure based on context. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [9.13.5](snakemake/snakemake@v9.13.4...v9.13.5) (2025-11-04) ### Bug Fixes * cache wrapper files and wait for them in case of shared filesystem for sources ([snakemake#3809](snakemake#3809)) ([498fff7](snakemake@498fff7)) * correctly handle meta-wrapper tag replacement depending on the used snakemake-wrapper release ([snakemake#3826](snakemake#3826)) ([8d46a4a](snakemake@8d46a4a)) * ensure that flags are properly considered for input files before applying path modifiers (i.e. default storage providers) ([snakemake#3813](snakemake#3813)) ([d3bfe32](snakemake@d3bfe32)) * ensure that tokens are not leaked when paths or uris of source files are logged ([snakemake#3821](snakemake#3821)) ([a217e50](snakemake@a217e50)) * print secs as numeric in jsonl benchmark format ([snakemake#3814](snakemake#3814)) ([395a5e6](snakemake@395a5e6)) * revert breaking change in 9.11.9 disallowing empty input files even when unused ([snakemake#3810](snakemake#3810)) ([83c05cc](snakemake@83c05cc)) * shorten report ids (thus dir names) in order to avoid issues with path length on windows ([snakemake#3822](snakemake#3822)) ([b24d971](snakemake@b24d971)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Description
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
Bug Fixes
Chores