--jobs does not consider --local-cores when executing localrules #2896

@ckrushton

Description

Snakemake version

8.4.11

Describe the bug

I'm running a snakemake workflow on a SLURM cluster (using snakemake-executor-plugin-cluster-generic), using cluster nodes with 48 cores and a head node with 8 cores.

My profile file looks like this:
```yaml
jobs: 600
cores: 48
local-cores: 4
```

My workflow has a rule which waits for all samples to complete, then executes an I/O-intensive step for each sample. Because of filesystem sync issues between cluster nodes, I mark this rule as a localrule so it runs on the head node, minimizing filesystem burden.

However, when Snakemake executes this step, it launches up to 600 jobs on the local node instead of 4. I understand that this matches the documented behaviour of --jobs; however, it seems sensible that Snakemake should consider the resources available on the local node (i.e. --local-cores) when deciding how many local jobs to execute simultaneously.
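The expected behaviour can be sketched as a small calculation (a hypothetical helper, not actual Snakemake code): the number of concurrent local jobs should be capped by --local-cores, not by the global --jobs value.

```python
def max_concurrent_local_jobs(jobs: int, local_cores: int, threads_per_job: int = 1) -> int:
    """Hypothetical cap on concurrent local jobs: limited by the cores
    available on the head node, not by the global --jobs value."""
    return min(jobs, local_cores // threads_per_job)

# With the profile above (jobs=600, local-cores=4, single-threaded jobs),
# only 4 local jobs should run at once, not 600.
print(max_concurrent_local_jobs(600, 4))
```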
Minimal example

Example Snakefile (with `rule all` first so it is the default target):

```python
#!/usr/bin/env snakemake

localrules:
    test_rule

samples = list(range(1000))

rule all:
    input:
        expand("output/{sample}.txt", sample=samples)

rule test_rule:
    output:
        txt="output/{sample}.txt"
    shell:
        """
        echo SCIENCE > {output.txt} &&
        sleep 10
        """
```
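A possible workaround (untested sketch): declare a custom resource on the local rule, here hypothetically named `local_io`, and cap it on the command line with `--resources local_io=4`, which Snakemake enforces independently of `--jobs` (assuming the resource constraint is applied to local rules in this setup).

```python
rule test_rule:
    output:
        txt="output/{sample}.txt"
    resources:
        local_io=1  # hypothetical resource name; each job consumes one unit
    shell:
        """
        echo SCIENCE > {output.txt} &&
        sleep 10
        """
```

Running with `--resources local_io=4` should then limit this rule to 4 concurrent jobs regardless of the `--jobs` value.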

Profile (note `>-` on `cluster-generic-submit-cmd`, which folds the multi-line command into a single line of valid YAML):

```yaml
executor: cluster-generic
cluster-generic-submit-cmd: >-
  mkdir -p results/logs/cluster/{rule}/ &&
  sbatch
  --parsable
  --cpus-per-task={threads}
  --time={resources.time}
  --mem={resources.mem_mb}
  --job-name=smk-{rule}
  --output=results/logs/cluster/{rule}/{jobid}.out
  --error=results/logs/cluster/{rule}/{jobid}.err
  --partition={resources.partition}
default-resources:
  - time=1440
  - partition='large'
  - tmpdir='/tmp'
local-cores: 4
jobs: 600
cores: 48  # Match maximum node size.
latency-wait: 120
keep-going: True
rerun-incomplete: True
printshellcmds: True
scheduler: greedy
use-conda: True
conda-frontend: mamba
cluster-generic-cancel-cmd: scancel
```
Log of running the above Snakefile using cluster-generic:

```
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 600
Job stats:
job          count
---------  -------
all              1
test_rule     1000
total         1001

Select jobs to execute...
Execute 600 jobs...

[Mon May 27 10:56:12 2024]
localrule test_rule:
    output: output/141.txt
    jobid: 142
    reason: Missing output files: output/141.txt
    wildcards: sample=141
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=/tmp, time=1440, partition=large

echo SCIENCE > output/141.txt &&
sleep 10
...more job messages...
```

Labels: bug