-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-1758: [Python] Remove pickle=True option for object serialization #1347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1758: [Python] Remove pickle=True option for object serialization #1347
Conversation
python/pyarrow/serialization.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no custom serializer/deserializer is passed, should this default to pickle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it defaults to using __dict__, which is more efficient so it seems a good default, we use that in Ray for pretty much all types (however rarely I have seen cases where it doesn't work). I'd prefer to keep __dict__ the default but no big deals since it can easily be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about lambdas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Ray, lambdas are handled with cloudpickle as are builtin types like "type"; most user defined types are handled via dict; namedtuples are special cased.
One reasonable way to handle this is to provide default callbacks for types we encounter that don't work with dict (using cloudpickle or a custom serializer/deserializer), there are already examples of this in the repo, and use dict for anything else. That worked well for us so far. If this strategy doesn't work for the user, they have full flexibility by providing their own serialization context.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is sufficiently advanced stuff that I think it's not unreasonable to ask people to explicitly use cloudpickle in a case like this. I would be OK if we changed this patch to do that to get the build passing, and we can always move on later
ebfc414 to
4e71bd3
Compare
|
The tests fail here, need to use cloudpickle, I think |
|
Okay, I'll check. |
Change-Id: Id4423a228ae2388c3e3f75d5650f0f0126fa9cc8
4e71bd3 to
927f154
Compare
|
@pcmoritz @robertnishihara does this look ok? |
|
Looks good to me. |
|
+1 LGTM |
|
Thanks @wesm! |
This closes ARROW-1758.