Skip to content

Avoid double msgpack serialization of HLGs#7525

Merged
jrbourbeau merged 1 commit intodask:mainfrom
madsbk:fix_double_msgpack_dumps
Apr 6, 2021
Merged

Avoid double msgpack serialization of HLGs#7525
jrbourbeau merged 1 commit intodask:mainfrom
madsbk:fix_double_msgpack_dumps

Conversation

@madsbk
Copy link
Contributor

@madsbk madsbk commented Apr 6, 2021

HighLevelGraph.__dask_distributed_pack__() would explicit serialize using dumps_msgpack() instead of relying on implicit serialization. The result was double serialzaion of the high level graphs and large bytearrays in communication frames.

Also, this PR makes dumps_msgpack() and loads_msgpack() obsolete and can be removed.

Fixes dask/distributed#4652

  • Tests added / passed
  • Passes black dask / flake8 dask / isort dask

Copy link
Collaborator

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, also this seems like a key first step for dask/distributed#4673

}
)
return dumps_msgpack({"layers": layers})
return {"layers": layers}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't important, but is there a msgpack/serialization reason for wrapping in a dict? Versus:

Suggested change
return {"layers": layers}
return layers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. I think we can probably just pass the list of layers directly. @madsbk mentioned that serializing twice is blocking some folks, so I'm going to merge this as is and we can address the dict vs. list point in a follow-up PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here was to make it easy to extend in the future but yes, changing to a list of layers is also fine.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @madsbk! I agree utilizing distributeds built-in serialization is a better way to go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error when loading dataframe into GPUs

3 participants