Hi,
when rerunning a canu-2.1.1 job on a different machine I realized that canu picks up the total number of CPU cores available locally instead of respecting what I reserved through the queuing system. Here is how I started the job:
#PBS -l select=1:ncpus=240:mem=6000gb:scratch_local=12tb,walltime=48:00:00
...
canu useGrid=false ... genomeSize=6.8g correctedErrorRate=0.16 corMhapSensitivity=high ovsMemory=1024 ovsConcurrency=5
-- Detected 504 CPUs and 10074 gigabytes of memory.
-- Detected PBSPro '19.0.0' with 'pbsnodes' binary in /opt/pbs/bin/pbsnodes.
-- Grid engine and staging disabled per useGrid=false option.
--
-- (tag)Concurrency
-- (tag)Threads |
-- (tag)Memory | |
-- (tag) | | | total usage algorithm
-- ------- ---------- -------- -------- -------------------- -----------------------------
-- Local: meryl 64.000 GB 8 CPUs x 63 jobs 4032.000 GB 504 CPUs (k-mer counting)
-- Local: hap 16.000 GB 63 CPUs x 8 jobs 128.000 GB 504 CPUs (read-to-haplotype assignment)
-- Local: cormhap 64.000 GB 14 CPUs x 36 jobs 2304.000 GB 504 CPUs (overlap detection with mhap)
-- Local: obtovl 24.000 GB 14 CPUs x 36 jobs 864.000 GB 504 CPUs (overlap detection)
-- Local: utgovl 24.000 GB 14 CPUs x 36 jobs 864.000 GB 504 CPUs (overlap detection)
-- Local: cor 24.000 GB 4 CPUs x 126 jobs 3024.000 GB 504 CPUs (read correction)
-- Local: ovb 4.000 GB 1 CPU x 504 jobs 2016.000 GB 504 CPUs (overlap store bucketizer)
-- Local: ovs 1024.000 GB 1 CPU x 5 jobs 5120.000 GB 5 CPUs (overlap store sorting)
-- Local: red 64.000 GB 9 CPUs x 56 jobs 3584.000 GB 504 CPUs (read error detection)
-- Local: oea 8.000 GB 1 CPU x 504 jobs 4032.000 GB 504 CPUs (overlap error adjustment)
-- Local: bat 1024.000 GB 64 CPUs x 1 job 1024.000 GB 64 CPUs (contig construction with bogart)
-- Local: cns -.--- GB 8 CPUs x - jobs -.--- GB - CPUs (consensus)
It picked up 504 CPU cores and 10 TB of RAM, although I have in the environment:
PBS_NCPUS=240
PBS_NGPUS=0
PBS_NUM_NODES=1
PBS_NUM_PPN=240
PBS_RESC_MEM=6442450944000
PBS_RESC_SCRATCH_SSD=13194139533312
PBS_RESC_SCRATCH_VOLUME=13194139533312
PBS_RESC_TOTAL_MEM=6442450944000
PBS_RESC_TOTAL_PROCS=240
PBS_RESC_TOTAL_SCRATCH_VOLUME=13194139533312
PBS_RESC_TOTAL_WALLTIME=172800
SCRATCH=/scratch.ssd/mmokrejs/job_2227881.cerit-pbs.cerit-sc.cz
SCRATCHDIR=/scratch.ssd/mmokrejs/job_2227881.cerit-pbs.cerit-sc.cz
SCRATCH_TYPE=ssd
SCRATCH_VOLUME=13194139533312
TORQUE_RESC_MEM=6442450944000
TORQUE_RESC_PROC=240
TORQUE_RESC_SCRATCH_SSD=13194139533312
TORQUE_RESC_SCRATCH_VOLUME=13194139533312
TORQUE_RESC_TOTAL_MEM=6442450944000
TORQUE_RESC_TOTAL_PROCS=240
TORQUE_RESC_TOTAL_SCRATCH_VOLUME=13194139533312
TORQUE_RESC_TOTAL_WALLTIME=172800
I see some code in canu/src/utility/src/utility/system.C, but although the comments there mention more PBSPro variables, only PBS_NUM_PPN is looked up (in theory).
Could it be that this code is skipped altogether because I started canu with useGrid=false? That would be bad. I just wanted to avoid submitting child jobs to the queuing system, but of course I expected canu to understand that it is still running under a job scheduler, on an exec host picked by me, and to respect its limits (6 TB RAM and only 240 CPUs).
-- BEGIN CORRECTION
--
--
-- Creating overlap store correction/my_genome.ovlStore using:
-- 147 buckets
-- 616 slices
-- using at most 29 GB memory each
-- Finished stage 'cor-overlapStoreConfigure', reset canuIteration.
--
-- Running jobs. First attempt out of 2.
----------------------------------------
-- Starting 'ovB' concurrent execution on Thu Mar 4 09:11:40 2021 with 214055.941 GB free disk space (147 processes; 504 concurrently)
cd correction/my_genome.ovlStore.BUILDING
./scripts/1-bucketize.sh 1 > ./logs/1-bucketize.000001.out 2>&1
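In the meantime, my understanding is that canu's maxThreads and maxMemory options can cap the detected resources explicitly. A sketch of deriving those caps from the PBS environment (the variable values match the dump above; the canu line is commented out and shown only as an example invocation):

```shell
# Derive canu resource caps from the PBS reservation instead of
# letting canu probe the host.  PBS_RESC_TOTAL_MEM is in bytes.
PBS_NUM_PPN=240
PBS_RESC_TOTAL_MEM=6442450944000

mem_gb=$(( PBS_RESC_TOTAL_MEM / 1024 / 1024 / 1024 ))

echo "maxThreads=${PBS_NUM_PPN} maxMemory=${mem_gb}g"
# prints: maxThreads=240 maxMemory=6000g

# canu useGrid=false maxThreads=${PBS_NUM_PPN} maxMemory=${mem_gb}g ...
```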