Versions
snakemake v9.6.2
snakemake-interface-storage-plugins 4.2.1
snakemake-storage-plugin-s3 0.3.3
Describe the bug
When attempting to fetch an S3 file while credentials are missing, the error raised depends on whether the target rule uses storage.s3() directly or indirectly via another rule. Specifically, targeting an indirect rule (see example) results in a clear NoCredentialsError, while targeting a direct rule results in a confusing MissingInputException.
This bug was triggered in the context of snakemake_storage_plugin_s3; however, I believe it's ultimately a bug in the main Snakemake codebase. Please transfer this issue if I've got that wrong.
Minimal example
Using the following toy Snakefile (the S3 file is publicly accessible, but you still need some valid AWS credentials to fetch such files):
storage:
    provider="s3",
    retries=1,
    keep_local=True,

rule direct_access:
    input: storage.s3("s3://nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst")
    output: "metadata_usvi.tsv"
    shell:
        r"""
        zstd --decompress --stdout {input[0]} > {output[0]}
        """

rule indirect_access:
    input: "metadata_usvi.tsv"
    output: "sample.tsv"
    shell:
        r"""
        head {input[0]} > {output[0]}
        """
Running rm -rf .snakemake && snakemake --cores 1 -pf indirect_access (without any AWS credentials set), we get a suitably helpful error (see footnote [1]):
WorkflowError:
Failed to get expected local footprint (i.e. size) of s3://nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst
NoCredentialsError: Unable to locate credentials
However, running rm -rf .snakemake && snakemake --cores 1 -pf direct_access results in a different error, which doesn't point to the underlying issue of missing credentials:
MissingInputException in rule direct_access in file "/Users/naboo/scratch/snakemake-s3-exceptions/Snakefile", line 6:
Missing input files for rule direct_access:
output: metadata_usvi.tsv
affected files:
.snakemake/storage/s3/nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst
Why this happens
This MissingInputException doesn't occur if credentials are provided. I dug into the code paths and figured out why this happens but I don't have enough big-picture understanding of Snakemake to propose a solution.
- The MissingInputException is due to snakemake's dag.py calling res.file.exists(). This function will ultimately return False if credentials are missing and True if they're present.
- The exists() function in snakemake.io calls exists_in_storage().
  - exists_in_storage() returns a different value depending on whether credentials are missing or present.
  - Specifically, if we remove the @iocache decorator the difference goes away and we ultimately get the appropriate NoCredentialsError.
- Within the iocache decorator, cache[self] returns True when credentials exist and False when they don't.
- Ultimately the IOCache's _exists_in_storage (ExistsDict) looks very different when run with/without credentials:
  - Missing credentials: a single key (.snakemake/storage/s3/nextstrain-data/files/workflows/zika/metadata_usvi.tsv.zst) is set, with value False.
  - Credentials provided: many keys are set, all with value True. These keys indicate that a listing of the S3 bucket is happening here.
- These cache keys are set via the s3 plugin's inventory method. Without credentials self.bucket_exists() is False and thus we set a single entry in the cache to False; with credentials we crawl the bucket and set a number of cache keys.
- bucket_exists() raises a botocore.exceptions.NoCredentialsError, but it's swallowed by a blanket except. Perhaps this error should be left to propagate and be caught elsewhere?
- If we target an indirect rule (indirect_access in the example Snakefile) then we don't provision any keys in IOCache._exists_in_storage, and thus the @iocache decorator falls through to the underlying exists_in_storage function and we call the managed_exists function (see Footnote 1).
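To make the two code paths above easier to compare, here is a toy model in plain Python. It is only a sketch of the behaviour described in the list, not Snakemake's or the plugin's actual code: the class ToyS3Object, the helpers direct_path and indirect_path, and the local NoCredentialsError stand-in are all hypothetical names invented for illustration, and the cache is a plain dict rather than the real IOCache.

```python
"""Toy model of the two code paths; not Snakemake's actual implementation."""


class NoCredentialsError(Exception):
    """Hypothetical stand-in for botocore.exceptions.NoCredentialsError."""


class ToyS3Object:
    """Hypothetical storage object with just enough behaviour for the demo."""

    def __init__(self, key, have_credentials):
        self.key = key
        self.have_credentials = have_credentials

    def _request(self):
        # Models any real S3 request: it fails when credentials are missing.
        if not self.have_credentials:
            raise NoCredentialsError("Unable to locate credentials")
        return True

    def bucket_exists(self):
        # Blanket except: the credentials error is swallowed and becomes False.
        try:
            return self._request()
        except Exception:
            return False

    def inventory(self, cache):
        # Pre-populates the cache before existence is checked.
        if not self.bucket_exists():
            cache[self.key] = False
            return
        cache[self.key] = True

    def exists(self, cache):
        # A cache hit short-circuits; a cache miss reaches the real request.
        if self.key in cache:
            return cache[self.key]
        return self._request()


def direct_path(obj):
    # Assumed model of the direct rule: inventory runs first, then exists().
    cache = {}
    obj.inventory(cache)
    return obj.exists(cache)


def indirect_path(obj):
    # Assumed model of the indirect rule: nothing was cached beforehand.
    return obj.exists({})
```

In this model, direct_path on an object without credentials returns False (which a caller would read as "input file missing"), whereas indirect_path lets the NoCredentialsError surface. Letting bucket_exists() re-raise credential errors instead of returning False would make both paths report the same underlying problem, though whether that is the right fix for the real code is an open question.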
Footnote [1]
The error comes from this self.s3obj().load() call, which throws <class 'botocore.exceptions.NoCredentialsError'>. That's caught by the surrounding managed_exists function in snakemake-interface-storage-plugins.