Skip to content

Conversation

@Licht-T
Copy link
Contributor

@Licht-T Licht-T commented Nov 22, 2017

This closes ARROW-1758.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no custom serializer/deserializer is passed, should this default to pickle?

Copy link
Contributor

@pcmoritz pcmoritz Nov 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it defaults to using __dict__, which is more efficient so it seems a good default, we use that in Ray for pretty much all types (however rarely I have seen cases where it doesn't work). I'd prefer to keep __dict__ the default but no big deals since it can easily be changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about lambdas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Ray, lambdas are handled with cloudpickle as are builtin types like "type"; most user defined types are handled via dict; namedtuples are special cased.

One reasonable way to handle this is to provide default callbacks for types we encounter that don't work with dict (using cloudpickle or a custom serializer/deserializer), there are already examples of this in the repo, and use dict for anything else. That worked well for us so far. If this strategy doesn't work for the user, they have full flexibility by providing their own serialization context.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sufficiently advanced stuff that I think it's not unreasonable to ask people to explicitly use cloudpickle in a case like this. I would be OK if we changed this patch to do that to get the build passing, and we can always move on later

@wesm wesm force-pushed the clean-pickle-option-for-object-serialization branch from ebfc414 to 4e71bd3 Compare November 22, 2017 22:33
@wesm
Copy link
Member

wesm commented Nov 23, 2017

The tests fail here, need to use cloudpickle, I think

@Licht-T
Copy link
Contributor Author

Licht-T commented Nov 23, 2017

Okay, I'll check.

@wesm wesm force-pushed the clean-pickle-option-for-object-serialization branch from 4e71bd3 to 927f154 Compare November 26, 2017 19:01
@wesm
Copy link
Member

wesm commented Nov 26, 2017

@pcmoritz @robertnishihara does this look ok?

@robertnishihara
Copy link
Contributor

Looks good to me.

@pcmoritz
Copy link
Contributor

+1 LGTM

@pcmoritz pcmoritz closed this in 85e2d89 Nov 26, 2017
@Licht-T
Copy link
Contributor Author

Licht-T commented Nov 27, 2017

Thanks @wesm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants