No default schema variables loaded when skipping schema validation inside of remote jobs #3614

@visze

Description

Snakemake version
9.5.1

Describe the bug
The recently introduced fix in 9.5.1:

  • skip unnecessary schema validation inside of remote jobs (#3601) (9129654)

causes remote jobs to crash when the config schema contains default values for properties. These defaults are no longer loaded in the remote job, which leads to a KeyError when the config is accessed. I do not get the error in 9.5.0.

Logs
For example, this is what I get:

KeyError in file "/data/cephfs-1/home/users/schubacm_c/work/projects/MPRAsnakeflow/workflow/rules/common.smk", line 70:
'skip_version_check'
  File "/data/cephfs-1/home/users/schubacm_c/work/projects/MPRAsnakeflow/workflow/rules/common.smk", line 94, in <module>

Minimal example
For example, my config files are versioned so that I can check whether the config file matches the version of the Snakemake pipeline. By default (within the schema) the check is always on (skipping is set to false), so the user does not need to specify this option.
Start of my config schema:

type: object
properties:
  version:
    description: Version of MPRAsnakeflow
    type: string
    pattern: ^(\d+(\.\d+)?(\.\d+)?)|(0\.\d+(\.\d+)?)$
  skip_version_check:
    description: Skip version check
    type: boolean
    default: false
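
To illustrate what the schema default contributes, here is a minimal standard-library sketch of default filling. It is not Snakemake's implementation: `apply_schema_defaults`, the `SCHEMA` dict and the version string `"0.5"` are hypothetical names and values chosen for illustration, and only top-level properties are handled.

```python
import copy

# Hypothetical plain-dict stand-in for the schema above (pattern omitted).
SCHEMA = {
    "type": "object",
    "properties": {
        "version": {"type": "string"},
        "skip_version_check": {"type": "boolean", "default": False},
    },
}


def apply_schema_defaults(config, schema):
    """Return a copy of config with missing top-level keys filled from schema defaults."""
    filled = copy.deepcopy(config)
    for key, subschema in schema.get("properties", {}).items():
        # Only fill keys the user did not set and that declare a default.
        if key not in filled and "default" in subschema:
            filled[key] = copy.deepcopy(subschema["default"])
    return filled


# The user config only sets the version (hypothetical value).
user_config = {"version": "0.5"}
validated = apply_schema_defaults(user_config, SCHEMA)
```

After this step `validated` contains `skip_version_check`, while `user_config` does not. If the default-filling step is skipped, only the user-supplied keys are available.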

At the beginning of my general common.smk, I first validate the config and then check whether the version in the config matches the workflow version. But the key skip_version_check is no longer present in remotely executed jobs, so they fail. My code in common.smk:

from snakemake.utils import validate
validate(config, schema="../schemas/config.schema.yaml")

import re

# Regular expressions to match the major version (leading digits) and the
# development version (a leading 0, optionally followed by ".<digits>")
pattern_major_version = r"^(\d+)"
pattern_development_version = r"^(0(\.\d+)?)"


def check_version(pattern, version, config_version):
    # Search for the pattern in the string
    match_version = re.search(pattern, version)

    match_config = re.search(pattern, config_version)

    # If both strings match the pattern, raise an error when the matched parts differ
    if match_version and match_config:
        if match_version.group(1) != match_config.group(1):
            raise ValueError(
                f"\033[38;2;255;165;0mVersion mismatch: MPRAsnakeflow version is {version}, but config version is {config_version}\033[0m"
            )


if not config["skip_version_check"]:
    check_version(pattern_development_version, version, config["version"])
    check_version(pattern_major_version, version, config["version"])

Running the workflow works fine at first because the key skip_version_check is available. But when the first rule is executed remotely (SLURM cluster), it fails with KeyError: 'skip_version_check'.
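
The difference between the two situations can be sketched without Snakemake. The two dicts below are assumptions about what the config looks like in each case (with and without the schema default applied), and `read_flag` is a hypothetical helper that mirrors the plain subscript lookup in common.smk.

```python
def read_flag(config):
    # Same kind of lookup as in common.smk: plain subscript, no fallback.
    return config["skip_version_check"]


# Assumed state in the main process: validation ran and injected the default.
local_config = {"version": "0.5", "skip_version_check": False}

# Assumed state in the remote job: validation was skipped, so only the
# user-supplied keys are present.
remote_config = {"version": "0.5"}

local_result = read_flag(local_config)

try:
    read_flag(remote_config)
    remote_error = None
except KeyError as err:
    # The missing key name is carried in the exception arguments.
    remote_error = err.args[0]
```

Here `local_result` is `False`, while the remote lookup raises and `remote_error` holds the missing key name, matching the key reported in the log above.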

Additional context
I have always found it a smart approach to define default values for variables within the schema. This no longer seems to be possible.
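
A possible stopgap, sketched under the assumption that the config behaves like a regular dict, is to fill the missing keys in the workflow code itself. `WORKFLOW_DEFAULTS` and `fill_missing_defaults` are hypothetical names, and the approach duplicates the default that already lives in the schema, so it is a workaround rather than a fix.

```python
# Hypothetical fallback table that duplicates the defaults from the schema.
WORKFLOW_DEFAULTS = {"skip_version_check": False}


def fill_missing_defaults(config, defaults):
    """Set each default only if the key is absent; user-supplied values win."""
    for key, value in defaults.items():
        config.setdefault(key, value)
    return config


# Config as a remote job might see it when the schema defaults were not applied.
remote_config = fill_missing_defaults({"version": "0.5"}, WORKFLOW_DEFAULTS)

# A value set explicitly by the user is left untouched.
explicit_config = fill_missing_defaults(
    {"version": "0.5", "skip_version_check": True}, WORKFLOW_DEFAULTS
)
```

For a single key, `config.get("skip_version_check", False)` at the point of use would have the same effect.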

Labels: bug (Something isn't working)
