Skip to content

dask.array.store: support for distributed locks #10238

@ahnsws

Description

@ahnsws

Describe the issue:
Calling dask.array.store with a Lock, a distributed.Client context, and a zarr target throws an error.

Please let me know if this should be filed elsewhere, with zarr or distributed.

Minimal Complete Verifiable Example:

import dask.array as da
import zarr
from distributed import Client


def run_succeeds1():
    z_arr: zarr.Array = zarr.zeros(shape=(5, 5))
    d_arr: da.Array = da.from_zarr(z_arr)
    z = zarr.create(shape=d_arr.shape)
    with Client():
        return da.store(d_arr, z, lock=False)


def run_succeeds2():
    z_arr: zarr.Array = zarr.zeros(shape=(5, 5))
    d_arr: da.Array = da.from_zarr(z_arr)
    z = zarr.create(shape=d_arr.shape)
    return da.store(d_arr, z, lock=True)


def run_fails():
    z_arr: zarr.Array = zarr.zeros(shape=(5, 5))
    d_arr: da.Array = da.from_zarr(z_arr)
    z = zarr.create(shape=d_arr.shape)
    with Client():
        return da.store(d_arr, z, lock=True)


if __name__ == "__main__":
    run_succeeds1()
    run_succeeds2()
    # run_fails()

Anything else we need to know?:
The full stacktrace I get when I run run_fails is:

2023-04-28 19:51:54,465 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7fb099e26560>
 0. 140396455233728
>.
Traceback (most recent call last):
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.lock' object
Traceback (most recent call last):
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
TypeError: cannot pickle '_thread.lock' object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 350, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 73, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
    return Pickler.dump(self, obj)
TypeError: cannot pickle '_thread.lock' object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/titanium/PycharmProjects/debug-ome-zarr/run.py", line 32, in <module>
    run_fails()
  File "/home/titanium/PycharmProjects/debug-ome-zarr/run.py", line 26, in run_fails
    return da.store(d_arr, z, lock=True)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/dask/array/core.py", line 1237, in store
    compute_as_if_collection(Array, store_dsk, map_keys, **kwargs)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/dask/base.py", line 341, in compute_as_if_collection
    return schedule(dsk2, keys, **kwargs)
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/client.py", line 3204, in get
    futures = self._graph_to_futures(
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/client.py", line 3103, in _graph_to_futures
    header, frames = serialize(ToPickle(dsk), on_error="raise")
  File "/home/titanium/.cache/pypoetry/virtualenvs/debug-ome-zarr-XhBiG45F-py3.10/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 372, in serialize
    raise TypeError(msg, str(x)[:10000]) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7fb099e26560>\n 0. 140396455233728\n>')

Process finished with exit code 1

Environment:

  • Dask version: 2023.4.1
  • Python version: 3.10.6
  • Operating System: ubuntu 22.04
  • Install method (conda, pip, source): poetry

Metadata

Metadata

Assignees

No one assigned

    Labels

    arrayneeds attentionIt's been a while since this was pushed on. Needs attention from the owner or a maintainer.needs triageNeeds a response from a contributor

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions