feat: Issue a warning when unsupported characters are used in wildcard names#1587
Conversation
|
The unix testing seems to be failing with setting up conda env. I have no idea what caused this.. |
|
Kudos, SonarCloud Quality Gate passed!
|
|
Kudos, SonarCloud Quality Gate passed!
|
|
You have successfully added a new SonarCloud configuration ``. As part of the setup process, we have scanned this repository and found no existing alerts. In the future, you will see all code scanning alerts on the repository Security tab. |
📝 WalkthroughWalkthroughAdds detection of illegal wildcard names in IO patterns, surfaces a warning when such names are found, and changes Rule.register_wildcards to accept item objects (deriving wildcard names via item.get_wildcard_names()) with earlier IOFile validation and consistent wildcard-set enforcement across outputs/logs/benchmarks. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Rule
participant IOFile as IOFile/Item
participant Logger
User->>Rule: set_output / set_log / set_benchmark(item)
Rule->>IOFile: construct IOFile(item) and call .check()
IOFile-->>Rule: validated item
Rule->>Rule: register_wildcards(item)
Rule->>IOFile: item.get_wildcard_names()
IOFile-->>Rule: (wildcard_names, illegal_flag)
alt illegal_flag == true
Note over Rule,Logger: illegal wildcard name detected
Rule->>Logger: warn("Illegal wildcard characters in <rule>: <names>")
end
Rule->>Rule: compare/update _wildcard_names
alt mismatch
Rule-->>User: raise RuleException("Wildcard mismatch across items")
else match or first set
Rule-->>User: continue
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Assessment against linked issues
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/snakemake/io/__init__.py (1)
994-1004: Regex edge cases and intent clarityGood approach to skip escaped "{{...}}". Two nits:
- Comment “Start matching from the second {” is confusing; it's actually “a single { not followed by {”.
- Consider whether Unicode letters in wildcard names should be allowed. If not, use an ASCII definition for word chars (e.g., explicit [A-Za-z0-9_] or re.ASCII) to avoid locale-dependent behavior. Otherwise, keep as-is.
If ASCII-only is intended, confirm and I can provide a precise replacement pattern.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/snakemake/io/__init__.py(2 hunks)src/snakemake/rules.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/rules.pysrc/snakemake/io/__init__.py
🧬 Code graph analysis (1)
src/snakemake/rules.py (1)
src/snakemake/io/__init__.py (2)
get_wildcard_names(862-863)get_wildcard_names(1073-1083)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (10, windows-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (8, windows-latest, py313)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (9, windows-latest, py312)
- GitHub Check: tests (8, windows-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (2, windows-latest, py312)
- GitHub Check: tests (7, windows-latest, py313)
- GitHub Check: tests (6, windows-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (5, windows-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-latest, py313)
- GitHub Check: tests (4, windows-latest, py312)
- GitHub Check: tests (3, windows-latest, py313)
- GitHub Check: tests (3, windows-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (2, windows-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: tests (1, windows-latest, py313)
- GitHub Check: tests (1, windows-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: apidocs
🔇 Additional comments (3)
src/snakemake/rules.py (3)
249-250: Benchmark wildcard registration call looks goodSwitching to pass the IOFile item directly is consistent with the new register_wildcards signature.
377-378: Output wildcard registration usage is correctIterating outputs and registering per item aligns with the consistency check below.
627-629: Handle callable log entries during wildcard registrationBecause self.log can contain callables, this loop can pass non-IOFile items into register_wildcards. The guard proposed in register_wildcards prevents errors without changing this call site.
After applying the guard, please run a quick sanity on a rule with callable logs to confirm no AttributeError is raised.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/snakemake/rules.py (1)
322-335: Guard register_wildcards against callable/non-IO log itemsLog entries can be callables; calling get_wildcard_names() on them raises AttributeError. Guard and skip until materialized.
Apply:
- def register_wildcards(self, item): - wildcard_names = item.get_wildcard_names() + def register_wildcards(self, item): + # Log items can be callables; only IO-like items expose wildcard introspection. + if not isinstance(item, _IOFile): + return + wildcard_names = item.get_wildcard_names() if self._wildcard_names is None: self._wildcard_names = wildcard_names else: if self.wildcard_names != wildcard_names: raise RuleException(
🧹 Nitpick comments (2)
src/snakemake/rules.py (2)
619-621: Defensively skip callables when registering log wildcardsEven with the register_wildcards guard, filtering here keeps call sites robust and intention-revealing.
Apply:
- for item in self.log: - self.register_wildcards(item) + for item in self.log: + if isinstance(item, _IOFile): + self.register_wildcards(item)
534-537: Verify IOFile.check() idempotencyPlease confirm that IOFile.check() internally tracks and suppresses repeat warnings (e.g., via a
_checkedflag or deduplicating logger) so invoking it multiple times (initial rule creation and later in expand methods) won’t duplicate logs.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/snakemake/io/__init__.py(2 hunks)src/snakemake/rules.py(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/snakemake/io/init.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/rules.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (45)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (9, windows-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (10, windows-latest, py313)
- GitHub Check: tests (1, windows-latest, py312)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (9, windows-latest, py313)
- GitHub Check: tests (8, windows-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (7, windows-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (7, windows-latest, py313)
- GitHub Check: tests (6, windows-latest, py312)
- GitHub Check: tests (5, windows-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (4, macos-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (4, windows-latest, py313)
- GitHub Check: tests (3, windows-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (2, macos-latest, py312)
- GitHub Check: tests (3, macos-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (2, windows-latest, py313)
- GitHub Check: tests (1, windows-latest, py313)
- GitHub Check: tests (2, macos-latest, py313)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (2, windows-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (1, macos-latest, py312)
- GitHub Check: apidocs
🔇 Additional comments (3)
src/snakemake/rules.py (3)
249-251: Benchmark: early validation and wildcard registration look goodChecking the benchmark IOFile early and registering its wildcards is the right call. No issues here.
367-369: Registering wildcards for each output item is correctOutputs are always IO files (callables disallowed), so this is safe and preserves the consistency check.
631-636: String logs -> IOFile + check is good; pairs with the guard aboveConverting string logs to IOFile and validating early is sound. With the callable guard in place, this path is safe.
🤖 I have created a release *beep* *boop* --- ## [9.11.0](v9.10.1...v9.11.0) (2025-09-05) ### Features * add landing page to the report containing custom metadata defined with a YTE yaml template ([#3452](#3452)) ([c754d80](c754d80)) * Issue a warning when unsupported characters are used in wildcard names ([#1587](#1587)) ([fa57355](fa57355)) ### Bug Fixes * avoid checking output files in immediate-submit mode ([#3569](#3569)) ([58c42cf](58c42cf)) * clarify --keep-going flag help text to distinguish runtime vs DAG construction errors ([#3705](#3705)) ([a3a8ef4](a3a8ef4)) * enable docstring assignment in `use rule ... with:` block ([#3710](#3710)) ([2191962](2191962)) * fix missing attribute error in greedy scheduler settings when using touch, dryrun or immediate submit. ([6471004](6471004)) * only trigger script with CODE label ([#3707](#3707)) ([2d01f8c](2d01f8c)) * parser.py incorrectly parsing multiline f-strings ([#3701](#3701)) ([06ece76](06ece76)) * parsing ordinary yaml strings ([#3506](#3506)) ([ddf334e](ddf334e)) * remove temp files when using checkpoints ([#3716](#3716)) ([5ff3e20](5ff3e20)) * Restore python rules changes triggering reruns. (GH: [#3213](#3213)) ([#3485](#3485)) ([2f663be](2f663be)) * unit testing ([#3680](#3680)) ([06ba7e6](06ba7e6)) * use queue_input_wait_time when updating queue input jobs ([#3712](#3712)) ([a945a19](a945a19)) ### Documentation * output files within output directories is an error ([#2848](#2848)) ([#2913](#2913)) ([37272e5](37272e5)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
…d names (snakemake#1587) ### Description Currently, wildcard names can only include alphanumeric characters and underscores. When an unsupported character such as `-` is used in a wildcard name, it is silently ignored. Making matters worse, the entire wildcard name is truncated, leading to confusing error messages like seen in snakemake#1349. This PR should close snakemake#1349 and snakemake#1579 #### Solution When registering wildcard names, attempt is first made to match "legal" regex pattern for a whilcard name. When that matching fails and returns `None`, it's either because there isn't a wildcard in the file name, or an "illegal" name is used for the wildcard. We then match again to a relaxed regex pattern, where we specifically look for any non-word characters (`\W`) before any commas in wildcard definition. If this returns a match, we issue a warning informing users of illegal names. #### Alternative Alternatively, an error can be raised in lieu of a warning. I opted not to do this because I'm not sure if there could be any legitimate case where `{}` are used but not for wildcard difinition (`{{}}` escaped brackets are already accounted for in relaxed regex matching) and therefore I'm not comfortable raising an exception at this stage. But if not, then we should probably raise a `WildcardError` instead. I see that `WildcardError` is currently not implemented. Do we have a plan for this exception class yet? ### QC <!-- Make sure that you can tick the boxes below. --> * [ ] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added warnings when file patterns contain illegal wildcard names. * **Bug Fixes** * Validation now runs earlier to catch wildcard/name issues sooner. * Ensures outputs, logs, and benchmarks use a consistent wildcard set with clearer errors. * **Refactor** * Public wildcard-registration API updated; integrations or extensions relying on previous behavior may need small adjustments. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de> Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>
🤖 I have created a release *beep* *boop* --- ## [9.11.0](snakemake/snakemake@v9.10.1...v9.11.0) (2025-09-05) ### Features * add landing page to the report containing custom metadata defined with a YTE yaml template ([snakemake#3452](snakemake#3452)) ([c754d80](snakemake@c754d80)) * Issue a warning when unsupported characters are used in wildcard names ([snakemake#1587](snakemake#1587)) ([fa57355](snakemake@fa57355)) ### Bug Fixes * avoid checking output files in immediate-submit mode ([snakemake#3569](snakemake#3569)) ([58c42cf](snakemake@58c42cf)) * clarify --keep-going flag help text to distinguish runtime vs DAG construction errors ([snakemake#3705](snakemake#3705)) ([a3a8ef4](snakemake@a3a8ef4)) * enable docstring assignment in `use rule ... with:` block ([snakemake#3710](snakemake#3710)) ([2191962](snakemake@2191962)) * fix missing attribute error in greedy scheduler settings when using touch, dryrun or immediate submit. ([6471004](snakemake@6471004)) * only trigger script with CODE label ([snakemake#3707](snakemake#3707)) ([2d01f8c](snakemake@2d01f8c)) * parser.py incorrectly parsing multiline f-strings ([snakemake#3701](snakemake#3701)) ([06ece76](snakemake@06ece76)) * parsing ordinary yaml strings ([snakemake#3506](snakemake#3506)) ([ddf334e](snakemake@ddf334e)) * remove temp files when using checkpoints ([snakemake#3716](snakemake#3716)) ([5ff3e20](snakemake@5ff3e20)) * Restore python rules changes triggering reruns. (GH: [snakemake#3213](snakemake#3213)) ([snakemake#3485](snakemake#3485)) ([2f663be](snakemake@2f663be)) * unit testing ([snakemake#3680](snakemake#3680)) ([06ba7e6](snakemake@06ba7e6)) * use queue_input_wait_time when updating queue input jobs ([snakemake#3712](snakemake#3712)) ([a945a19](snakemake@a945a19)) ### Documentation * output files within output directories is an error ([snakemake#2848](snakemake#2848)) ([snakemake#2913](snakemake#2913)) ([37272e5](snakemake@37272e5)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).








Description
Currently, wildcard names can only include alphanumeric characters and underscores. When an unsupported character such as
-is used in a wildcard name, it is silently ignored. Making matters worse, the entire wildcard name is truncated, leading to confusing error messages like seen in #1349.This PR should close #1349 and #1579
Solution
When registering wildcard names, attempt is first made to match "legal" regex pattern for a whilcard name. When that matching fails and returns
None, it's either because there isn't a wildcard in the file name, or an "illegal" name is used for the wildcard. We then match again to a relaxed regex pattern, where we specifically look for any non-word characters (\W) before any commas in wildcard definition. If this returns a match, we issue a warning informing users of illegal names.Alternative
Alternatively, an error can be raised in lieu of a warning. I opted not to do this because I'm not sure if there could be any legitimate case where
{}are used but not for wildcard difinition ({{}}escaped brackets are already accounted for in relaxed regex matching) and therefore I'm not comfortable raising an exception at this stage. But if not, then we should probably raise aWildcardErrorinstead. I see thatWildcardErroris currently not implemented. Do we have a plan for this exception class yet?QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes
Refactor