Fix persist error due to pickling #104
Closed
Description
Currently, when trying to bin multiple sequences, we hit an error in dask.persist because HDF5 file objects can't be pickled.
Essentially, we're passing a memory-mapped file handle to dask.persist as part of the metadata, and dask can't serialize it.
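The failure can be reproduced without dask or h5py, because pickle refuses open file objects in the same way h5py.File does (the trace shows h5py raising a TypeError). A common workaround is to serialize only the file path and reopen the file lazily in whichever process unpickles the object. This is a minimal sketch under that assumption: a plain binary file stands in for the HDF5 file, and LazyFile and demo are hypothetical names, not part of tristan. With h5py, the lazy open would presumably be h5py.File(path, "r") rather than open(path, "rb").

```python
import os
import pickle
import tempfile


class LazyFile:
    """Hypothetical wrapper: stores a file *path*; the OS handle is opened
    on first use and is never pickled, so each process reopens the file."""

    def __init__(self, path):
        self.path = path
        self._fh = None  # opened lazily on first access

    @property
    def fh(self):
        if self._fh is None:
            self._fh = open(self.path, "rb")  # stand-in for h5py.File(path, "r")
        return self._fh

    def read_all(self):
        self.fh.seek(0)
        return self.fh.read()

    def __getstate__(self):
        # Drop the unpicklable handle; only the path travels with the object.
        state = self.__dict__.copy()
        state["_fh"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


def demo():
    """Return (whether a raw handle pickles, data read via an unpickled LazyFile)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"event data")
        path = tmp.name
    try:
        # A raw open file handle cannot be pickled.
        with open(path, "rb") as raw:
            try:
                pickle.dumps(raw)
                raw_ok = True
            except TypeError:
                raw_ok = False

        lazy = LazyFile(path)
        lazy.read_all()                              # handle is now open here
        clone = pickle.loads(pickle.dumps(lazy))     # succeeds: handle was dropped
        data = clone.read_all()                      # clone reopens the file itself
        lazy.fh.close()
        clone.fh.close()
        return raw_ok, data
    finally:
        os.remove(path)
```

Applied here, the idea would be to keep only file paths (not open h5py objects) in whatever reaches dask.persist, and open the file inside the task function; whether that fits compute_with_progress depends on where the handle is created.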
Things tried:
- passing different serializers/deserializers (ones that shouldn't use pickle) to the Client and/or dask.persist; it always fell back on the default
- passing only the event_id and time_bins columns to compute_with_progress
- creating a tmp_collection dataframe inside compute_with_progress with only the relevant columns and calling persist on it
- using dask arrays in compute_with_progress to try to avoid the open file handle
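To narrow down which object in the graph actually breaks serialization, a small recursive checker can try pickle on every leaf and report where it fails. This is a hypothetical debugging helper, not a dask or distributed API. It assumes a tuple-based graph (dicts of tuples, as in the dask version used here), has no cycle detection, and uses stdlib pickle, so it will also flag local lambdas such as '_take_dask_array_from_numpy.<locals>.<lambda>' even though distributed falls back to cloudpickle for those; the trace shows cloudpickle then failing on the h5py object. For a dask collection, one could try feeding it dict(collection.__dask_graph__()).

```python
import pickle


def find_unpicklable(obj, path="root"):
    """Return (path, error) pairs for every leaf that pickle rejects.

    Recurses into dicts, lists and tuples (the containers a tuple-based
    dask task graph is built from); anything else is treated as a leaf.
    """
    if isinstance(obj, dict):
        found = []
        for key, value in obj.items():
            found.extend(find_unpicklable(value, f"{path}[{key!r}]"))
        return found
    if isinstance(obj, (list, tuple)):
        found = []
        for i, value in enumerate(obj):
            found.extend(find_unpicklable(value, f"{path}[{i}]"))
        return found
    try:
        pickle.dumps(obj)
        return []
    except Exception as exc:  # TypeError, AttributeError, PicklingError, ...
        return [(path, f"{type(exc).__name__}: {exc}")]
```

For example, find_unpicklable({"a": (len, [1, 2]), "b": (max, 1, some_handle)}) would report the path of some_handle, e.g. "root['b'][2]", together with the error pickle raised.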
Since these tools were last needed a few months ago, it's possible that something has changed in the way dask handles persist/compute.
Full error trace, for reference:
2023-09-21 14:08:57,792 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7f4026517450>
0. make_images-c26a593aeb9229ba8092420f91c84cbf
>.
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 63, in dumps
result = pickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 68, in dumps
pickler.dump(x)
AttributeError: Can't pickle local object '_take_dask_array_from_numpy.<locals>.<lambda>'
During handling of the above exception, another exception occurred:
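Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 352, in serialize
header, frames = dumps(x, context=context) if wants_context else dumps(x)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 75, in pickle_dumps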
frames[0] = pickle.dumps(
^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/pickle.py", line 81, in dumps
result = cloudpickle.dumps(x, **dump_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/cloudpickle/cloudpickle_fast.py", line 632, in dump
return Pickler.dump(self, obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/dls/science/users/uhz96441/tristan-env/bin/images", line 33, in <module>
sys.exit(load_entry_point('tristan', 'console_scripts', 'images')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 774, in main
args.func(args)
File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/command_line/images.py", line 505, in multiple_sequences_cli
compute_with_progress(events_data)
File "/scratch/uhz96441/workspaces/python-tristan/src/tristan/__init__.py", line 32, in compute_with_progress
(collection,) = dask.persist(collection)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/uhz96441/.local/lib/python3.11/site-packages/dask/base.py", line 917, in persist
results = client.persist(
^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3566, in persist
futures = self._graph_to_futures(
^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/client.py", line 3146, in _graph_to_futures
header, frames = serialize(ToPickle(dsk), on_error="raise")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/dls/science/users/uhz96441/tristan-env/lib/python3.11/site-packages/distributed/protocol/serialize.py", line 374, in serialize
raise TypeError(msg, str(x)[:10000]) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7f4026517450>\n 0. make_images-c26a593aeb9229ba8092420f91c84cbf\n>')