
Confused by memory reports using SLURMCluster and the dashboard #440

@jmarkow

Description


What happened:

The Dask dashboard appears to mix individual memory limits with the total memory limit when using SLURMCluster.

What you expected to happen:

I expect the total memory % in the Dask dashboard to be the total memory usage divided by the total memory limit (the sum of all workers' limits). Instead, the total memory % appears to be the total memory usage divided by a single worker's memory limit. So if I have 40 workers, each with a 56 GB memory limit, the total memory % looks like the total memory used across all workers divided by 56 GB, rather than by 56 GB × 40 = 2240 GB (what I expect).
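For concreteness, here is a sketch of the two calculations (the usage number is illustrative, not from a real run):

```python
# Hypothetical numbers matching the report: 40 workers, 56 GB limit each.
n_workers = 40
per_worker_limit_gb = 56
total_limit_gb = n_workers * per_worker_limit_gb  # 2240 GB aggregate limit

# Suppose the cluster is currently using 112 GB in total.
total_used_gb = 112

# What I expect the dashboard to show: usage over the aggregate limit.
expected_pct = total_used_gb / total_limit_gb * 100       # -> 5.0

# What it appears to show: usage over one worker's limit.
observed_pct = total_used_gb / per_worker_limit_gb * 100  # -> 200.0
```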


Example:

I spin up my workers with the following, then call cluster.adapt(minimum=0, maximum=40):

from dask_jobqueue import SLURMCluster

extra = ['--resources WORKERS=1']
cluster = SLURMCluster(
    queue="small-compute-b-preemptible",
    cores=16,
    walltime="72:00:00",  # keyword is `walltime`, not `wall_time`
    memory="60GB",
    processes=1,
    extra=extra,
)
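As a side note, the per-worker limits can be summed manually from the scheduler. This is a sketch using a LocalCluster as a stand-in for SLURMCluster (the sizes here are illustrative, not my real job sizes):

```python
from dask.distributed import Client, LocalCluster

# Two local workers, each capped at 1 GB, standing in for SLURM workers.
cluster = LocalCluster(n_workers=2, threads_per_worker=1, memory_limit="1GB")
client = Client(cluster)

# Sum each worker's memory_limit to get the aggregate limit that I would
# expect the dashboard's total memory % to use as its denominator.
workers = client.scheduler_info()["workers"]
total_limit = sum(w["memory_limit"] for w in workers.values())  # bytes

client.close()
cluster.close()
```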

Environment:

  • Dask version: 2.18.1
  • dask-jobqueue version: 0.7.1
  • Python version: 3.6
  • Operating System: Ubuntu 18.04
  • Install method (conda, pip, source): pip
