Skip to content

Commit fa57355

Browse files
feat: Issue a warning when unsupported characters are used in wildcard names (#1587)
### Description Currently, wildcard names can only include alphanumeric characters and underscores. When an unsupported character such as `-` is used in a wildcard name, it is silently ignored. Making matters worse, the entire wildcard name is truncated, leading to confusing error messages like seen in #1349. This PR should close #1349 and #1579 #### Solution When registering wildcard names, attempt is first made to match "legal" regex pattern for a whilcard name. When that matching fails and returns `None`, it's either because there isn't a wildcard in the file name, or an "illegal" name is used for the wildcard. We then match again to a relaxed regex pattern, where we specifically look for any non-word characters (`\W`) before any commas in wildcard definition. If this returns a match, we issue a warning informing users of illegal names. #### Alternative Alternatively, an error can be raised in lieu of a warning. I opted not to do this because I'm not sure if there could be any legitimate case where `{}` are used but not for wildcard difinition (`{{}}` escaped brackets are already accounted for in relaxed regex matching) and therefore I'm not comfortable raising an exception at this stage. But if not, then we should probably raise a `WildcardError` instead. I see that `WildcardError` is currently not implemented. Do we have a plan for this exception class yet? ### QC <!-- Make sure that you can tick the boxes below. --> * [ ] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added warnings when file patterns contain illegal wildcard names. * **Bug Fixes** * Validation now runs earlier to catch wildcard/name issues sooner. * Ensures outputs, logs, and benchmarks use a consistent wildcard set with clearer errors. * **Refactor** * Public wildcard-registration API updated; integrations or extensions relying on previous behavior may need small adjustments. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de> Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>
1 parent 06ba7e6 commit fa57355

2 files changed

Lines changed: 28 additions & 5 deletions

File tree

src/snakemake/io/__init__.py

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -487,6 +487,11 @@ def check(self):
487487
f"File path {self._file} contains double '{os.path.sep}'. "
488488
f"This is likely unintended. {hint}"
489489
)
490+
if _illegal_wildcard_name_regex.search(self._file) is not None:
491+
logger.warning(
492+
f"File path '{self._file}' contains illegal characters in a wildcard "
493+
f"name (only alphanumerics and underscores are allowed)."
494+
)
490495

491496
async def exists(self):
492497
if self.is_storage:
@@ -991,6 +996,17 @@ def remove_flag(value: MaybeAnnotated, flag: str) -> MaybeAnnotated:
991996

992997
_CONSIDER_LOCAL_DEFAULT = frozenset()
993998

999+
_illegal_wildcard_name_regex = re.compile(
1000+
r"""
1001+
\{(?!\{) # Start matching from the second {, otherwise \W will match the second {
1002+
\s*
1003+
(?P<name>
1004+
.*?\W[^,\{\}]*
1005+
),?[^,\{\}]*? # Do we see any non-word character before comma?
1006+
\}
1007+
""",
1008+
)
1009+
9941010

9951011
async def wait_for_files(
9961012
files,

src/snakemake/rules.py

Lines changed: 12 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -246,7 +246,8 @@ def benchmark(self, benchmark):
246246
benchmark = self._update_item_wildcard_constraints(benchmark)
247247

248248
self._benchmark = IOFile(benchmark, rule=self)
249-
self.register_wildcards(self._benchmark.get_wildcard_names())
249+
self._benchmark.check()
250+
self.register_wildcards(self._benchmark)
250251

251252
@property
252253
def conda_env(self):
@@ -318,7 +319,8 @@ def get_some_product(self):
318319
def has_products(self):
319320
return self.get_some_product() is not None
320321

321-
def register_wildcards(self, wildcard_names):
322+
def register_wildcards(self, item):
323+
wildcard_names = item.get_wildcard_names()
322324
if self._wildcard_names is None:
323325
self._wildcard_names = wildcard_names
324326
else:
@@ -363,7 +365,7 @@ def set_output(self, *output, **kwoutput):
363365
self._set_inoutput_item(item, output=True, name=name)
364366

365367
for item in self.output:
366-
self.register_wildcards(item.get_wildcard_names())
368+
self.register_wildcards(item)
367369
# Check output file name list for duplicates
368370
self.check_output_duplicates()
369371
self.check_caching()
@@ -531,6 +533,7 @@ def _set_inoutput_item(self, item, output=False, name=None, mark_ancient=False):
531533

532534
# record rule if this is an output file output
533535
_item = IOFile(item, rule=self)
536+
_item.check()
534537

535538
if is_flagged(item, "temp"):
536539
if output:
@@ -614,7 +617,7 @@ def set_log(self, *logs, **kwlogs):
614617
self._set_log_item(item, name=name)
615618

616619
for item in self.log:
617-
self.register_wildcards(item.get_wildcard_names())
620+
self.register_wildcards(item)
618621

619622
def _set_log_item(self, item, name=None):
620623
# Pathlib compatibility
@@ -625,7 +628,11 @@ def _set_log_item(self, item, name=None):
625628
item = self.apply_path_modifier(item, self.log_modifier, property="log")
626629
item = self._update_item_wildcard_constraints(item)
627630

628-
self.log.append(IOFile(item, rule=self) if isinstance(item, str) else item)
631+
if isinstance(item, str):
632+
item = IOFile(item, rule=self)
633+
item.check()
634+
635+
self.log.append(item)
629636
if name:
630637
self.log._add_name(name)
631638
else:

0 commit comments

Comments
 (0)