2.28.0 performance related issues #6694

@sofroniewn

What happened:

Hello! Since the 2.28.0 release, the napari project has been seeing dask-related performance issues, including timeout failures in some tests that now take significantly longer. Changing the dask config did not restore the old behavior, as I would have expected based on the docs.

A more detailed investigation is in the napari issue tracker at napari/napari#1656, but I will reproduce a minimal example below.

What you expected to happen:
I didn't expect such a large slowdown with the new release.

Minimal Complete Verifiable Example:

import dask.array as da
import numpy as np
import time


data = da.random.random(
    size=(100_000, 1000, 1000), chunks=(1, 1000, 1000)
)

idxs = [(0,), (50_000,), (99_999,)]

t0 = time.time()
reduced_data = np.min([np.min(data[idx]) for idx in idxs])
t1 = time.time()
print(t1 - t0)

On 2.27.0 this takes about 0.13 seconds to run; on 2.28.0 it takes 4.3 seconds.
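A single `time.time()` measurement can be noisy, so when comparing the two releases it may help to repeat the run and keep the best time. Below is a minimal sketch of such a helper using only the standard library; `best_of` is a hypothetical name (not part of dask or napari), and the workload shown is a stand-in for the slicing expression above.

```python
import time


def best_of(fn, repeats=3):
    """Call fn() `repeats` times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload; swap in the dask expression from the example above.
elapsed = best_of(lambda: sum(range(100_000)))
print(f"{elapsed:.4f} s")
```

To time the example from this report, the lambda could wrap `np.min([np.min(data[idx]) for idx in idxs])` instead.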

Anything else we need to know?:

Looking at the 2.28.0 release notes, I saw #6665 and think this may be related. I noticed the note on efficiency, which suggested there may now be some additional overhead, but I didn't expect the slowdown to be so large. While the above is just a toy example, the slowdown is quite noticeable in real-world cases too.

I tried using

import dask

with dask.config.set({"array.slicing.split-large-chunks": False}):
    data = da.random.random(
        size=(100_000, 1000, 1000), chunks=(1, 1000, 1000)
    )

    idxs = [(0,), (50_000,), (99_999,)]

    t0 = time.time()
    reduced_data = np.min([np.min(data[idx]) for idx in idxs])
    t1 = time.time()
    print(t1 - t0)

but I saw no difference between setting it to True or False, either in this toy example or in the real config inside napari when I added it to napari here. If I could get the config working as expected, I could just change the default value and we would be fine.

Environment:

  • Dask version: 2.28.0
  • Python version: 3.7
  • Operating System: macOS
  • Install method (conda, pip, source): pip
