Avoid double msgpack serialization of HLGs#7525
Conversation
08facdd to
443aef2
Compare
dask/dask#7525 makes them obsolete
gjoseph92
left a comment
There was a problem hiding this comment.
LGTM, also this seems like a key first step for dask/distributed#4673
| } | ||
| ) | ||
| return dumps_msgpack({"layers": layers}) | ||
| return {"layers": layers} |
There was a problem hiding this comment.
This isn't important, but is there a msgpack/serialization reason for wrapping in a dict? Versus:
| return {"layers": layers} | |
| return layers |
There was a problem hiding this comment.
This is a good point. I think we can probably just pass the list of layers directly. @madsbk mentioned that serializing twice is blocking some folks, so I'm going to merge this as is and we can address the dict vs. list point in a follow-up PR
There was a problem hiding this comment.
The idea here was to make it easy to extend in the future but yes, changing to a list of layers is also fine.
There was a problem hiding this comment.
Thanks @madsbk! I agree utilizing distributeds built-in serialization is a better way to go
HighLevelGraph.__dask_distributed_pack__()would explicit serialize usingdumps_msgpack()instead of relying on implicit serialization. The result was double serialzaion of the high level graphs and large bytearrays in communication frames.Also, this PR makes
dumps_msgpack()andloads_msgpack()obsolete and can be removed.Fixes dask/distributed#4652
black dask/flake8 dask/isort dask