
Indirect vs direct rule invocations raise different errors (when inputs use remote storage plugins) #3663

@jameshadfield

Versions
snakemake v9.6.2
snakemake-interface-storage-plugins 4.2.1
snakemake-storage-plugin-s3 0.3.3

Describe the bug

When attempting to fetch an S3 file while credentials are missing, the raised error depends on whether the target rule uses storage.s3() directly or indirectly via another rule. Specifically, targeting an indirect rule (see example) results in a clear NoCredentialsError, while targeting a direct rule results in a confusing MissingInputException.

This bug was triggered in the context of snakemake_storage_plugin_s3; however, I believe it's ultimately a bug in the main Snakemake codebase. Please transfer this issue if I've got that wrong.

Minimal example

Using the following toy snakefile (the S3 file is public access, but you still need some valid AWS credentials to fetch such files):

storage:
    provider="s3",
    retries=1, 
    keep_local=True,

rule direct_access:
    input: storage.s3("s3://nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst")
    output: "metadata_usvi.tsv"
    shell:
        r"""
        zstd --decompress --stdout {input[0]} > {output[0]}
        """

rule indirect_access:
    input: "metadata_usvi.tsv"
    output: "sample.tsv"
    shell:
        r"""
        head {input[0]} > {output[0]}
        """

Running rm -rf .snakemake && snakemake --cores 1 -pf indirect_access (without any AWS credentials set), we get a suitably helpful error (see Footnote [1]):

WorkflowError:
    Failed to get expected local footprint (i.e. size) of s3://nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst
    NoCredentialsError: Unable to locate credentials

However, running rm -rf .snakemake && snakemake --cores 1 -pf direct_access results in a different error, which doesn't point to the underlying issue of missing credentials:

MissingInputException in rule direct_access in file "/Users/naboo/scratch/snakemake-s3-exceptions/Snakefile", line 6:
Missing input files for rule direct_access:
    output: metadata_usvi.tsv
    affected files:
        .snakemake/storage/s3/nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst

Why this happens

This MissingInputException doesn't occur if credentials are provided. I dug into the code paths and figured out why this happens but I don't have enough big-picture understanding of Snakemake to propose a solution.

  • The MissingInputException is due to snakemake's dag.py calling res.file.exists(). This function will ultimately return False if credentials are missing and True if they're present.
  • The exists() function in snakemake.io calls exists_in_storage()
  • exists_in_storage() returns a different value depending on missing vs present credentials.
    • Specifically if we remove the @iocache decorator the difference goes away and we ultimately get the appropriate NoCredentialsError
  • Within the iocache decorator, cache[self] returns True when credentials exist and False when they don't.
  • Ultimately the IOCache's _exists_in_storage (ExistsDict) looks very different when run with/without credentials:
    • Missing credentials: A single key (.snakemake/storage/s3/nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst) is set, and given value False
    • Credentials provided: A lot of keys are set, all with value True. These keys indicate that a listing of the S3 bucket is happening here.
  • These cache keys are set via the s3 plugin's inventory method. Without credentials, self.bucket_exists() is False and thus we set a single entry in the cache to False; with credentials, we crawl the bucket and set a number of cache keys.
    • bucket_exists() raises a botocore.exceptions.NoCredentialsError, but it's swallowed by a blanket except. Perhaps this error should be left to propagate and caught elsewhere?
  • If we target an indirect rule (indirect_access in the example Snakefile), then we don't provision any keys in IOCache._exists_in_storage, and thus the @iocache decorator falls through to the underlying exists_in_storage function and we call the managed_exists function (see Footnote [1]).
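The interaction described in the list above can be modelled in a few lines of plain Python. This is only a simplified sketch of the control flow as I understand it, not Snakemake's or the S3 plugin's actual code: the exception classes are local stand-ins, KEY is a made-up cache key, and check_input, its run_inventory flag and the have_credentials parameter are hypothetical names introduced for illustration.

```python
class NoCredentialsError(Exception):
    """Local stand-in for botocore.exceptions.NoCredentialsError."""


class WorkflowError(Exception):
    """Local stand-in for Snakemake's WorkflowError."""


# Hypothetical cache key, standing in for the local storage path of the S3 object.
KEY = ".snakemake/storage/s3/example-bucket/example.tsv.zst"


def bucket_exists(have_credentials):
    """Model of a bucket check whose blanket except hides the credential error."""
    try:
        if not have_credentials:
            raise NoCredentialsError("Unable to locate credentials")
        return True
    except Exception:
        # The NoCredentialsError is swallowed here and becomes a plain False.
        return False


def inventory(cache, have_credentials):
    """Model of the inventory step that pre-populates the existence cache."""
    if not bucket_exists(have_credentials):
        cache[KEY] = False  # single entry, indistinguishable from "file missing"
        return
    cache[KEY] = True  # the real plugin would list the bucket and set many keys


def exists_in_storage(cache, have_credentials):
    """Model of a cached existence check: a cache hit short-circuits the real call."""
    if KEY in cache:
        return cache[KEY]
    # Cache miss: the underlying call runs and the credential error surfaces.
    if not have_credentials:
        raise WorkflowError("NoCredentialsError: Unable to locate credentials")
    return True


def check_input(run_inventory, have_credentials):
    """Return the outcome a user would see for one input file."""
    cache = {}
    if run_inventory:  # per the list above, only the direct target provisions keys
        inventory(cache, have_credentials)
    try:
        exists = exists_in_storage(cache, have_credentials)
    except WorkflowError as err:
        return f"WorkflowError: {err}"
    return "ok" if exists else "MissingInputException"


print(check_input(run_inventory=True, have_credentials=False))
print(check_input(run_inventory=False, have_credentials=False))
```

In this model the first call reports MissingInputException and the second reports the WorkflowError naming the missing credentials, which mirrors the direct_access and indirect_access outcomes above. It also shows why removing the cache lookup, or letting bucket_exists() raise, would make both paths report the same error.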

Footnote [1]
The error comes from this self.s3obj().load() call, which throws <class 'botocore.exceptions.NoCredentialsError'>. That's caught by the surrounding managed_exists function in snakemake-interface-storage-plugins.
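For readers unfamiliar with this pattern, here is a rough sketch of how a catch-and-re-raise wrapper can yield a WorkflowError whose message still names the underlying NoCredentialsError. It is an assumption-laden illustration, not the actual managed_exists implementation: managed_check, load_object, the message text and the example query are all hypothetical, and the exception classes are local stand-ins.

```python
class NoCredentialsError(Exception):
    """Local stand-in for botocore.exceptions.NoCredentialsError."""


class WorkflowError(Exception):
    """Local stand-in for Snakemake's WorkflowError."""


def load_object():
    """Hypothetical stand-in for the storage call that fails without credentials."""
    raise NoCredentialsError("Unable to locate credentials")


def managed_check(query, check):
    """Run a storage check and re-raise any failure as a WorkflowError.

    The original exception is kept as __cause__ and its type and message
    are appended to the new message, so the root cause stays visible.
    """
    try:
        return check()
    except Exception as err:
        raise WorkflowError(
            f"Failed to check {query}\n{type(err).__name__}: {err}"
        ) from err


try:
    managed_check("s3://example-bucket/example.tsv.zst", load_object)
except WorkflowError as err:
    print(err)
```

The key point is that the exception is converted rather than swallowed, so the user still sees the credential problem.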
