Skip to content

fix serialization of nn.Parameter with dill#10296

Closed
elanmart wants to merge 3 commits intopytorch:masterfrom
elanmart:fix-dill-serialization
Closed

fix serialization of nn.Parameter with dill#10296
elanmart wants to merge 3 commits intopytorch:masterfrom
elanmart:fix-dill-serialization

Conversation

@elanmart
Copy link
Contributor

@elanmart elanmart commented Aug 7, 2018

Should resolve #9981.

@elanmart
Copy link
Contributor Author

elanmart commented Aug 7, 2018

@pytorchbot retest this please

1 similar comment
@elanmart
Copy link
Contributor Author

elanmart commented Aug 7, 2018

@pytorchbot retest this please

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SsnL has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor

ezyang commented Aug 7, 2018

This will lose backward hooks, you'll need to serialize (and reconstitute them as well).

@ezyang
Copy link
Contributor

ezyang commented Aug 7, 2018

See also #10220, where we have a parallel code path specifically for the multiprocessing serialization case.

@ailzhang
Copy link
Contributor

@elanmart Could you update the diff to include backward hooks like here? https://github.com/pytorch/pytorch/pull/10220/files#diff-0d553ed0e9ca41fdac21632b4d503a10R41

@elanmart
Copy link
Contributor Author

@ailzhang please excuse my late response, I was having a few days off. Yes, of course I'll do my best to fix that.

@ssnl
Copy link
Collaborator

ssnl commented Aug 28, 2018

No rush. Just wondering if there is any progress on this?

@elanmart elanmart force-pushed the fix-dill-serialization branch from 211f606 to f679203 Compare August 28, 2018 21:46
@elanmart
Copy link
Contributor Author

@ssnl well yeah there's really no excuse, I lost track of this PR, which kept people waiting.

I'm not sure how the multiprocessing codepath should affect the solution here. Is sticking an extra classmethod to the nn.Parameter an acceptable solution?

@ssnl
Copy link
Collaborator

ssnl commented Aug 29, 2018

@elanmart Adding a classmethod is fine by me, but it seems to break other things (see CI).

@elanmart
Copy link
Contributor Author

@ssnl the CI seems to run fine now, what do you think?

Copy link
Collaborator

@ssnl ssnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM. I wonder if we should install dill in CI and add a test...

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

soumith is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@elanmart elanmart deleted the fix-dill-serialization branch September 2, 2018 14:44
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
Summary:
Should resolve pytorch#9981.
Pull Request resolved: pytorch#10296

Differential Revision: D9196353

Pulled By: soumith

fbshipit-source-id: 109b6da42b7240cdbc7a0586745c735bce5e1279
@tscholak
Copy link

tscholak commented Oct 2, 2018

Hi @elanmart
Serialization of my model with dill works with pytorch 0.4.0, but stopped working in 0.4.1.
I just tried to serialize it with torch-nightly 1.0.0.dev20180930, but unsuccessfully so.
I hit the default recursion limit, and increasing it results in a segfault. Sounds like infinite recursion to me. Any ideas?
Thanks.

@tscholak
Copy link

tscholak commented Oct 2, 2018

I just had a look at the initial issue, #9981.
The error I'm seeing on torch-nightly 1.0.0.dev20180930 is identical.
Either your patch is not in there, or it doesn't have the desired effect.

@tscholak
Copy link

tscholak commented Oct 2, 2018

Argh, never mind. I was still on torch 0.4.1 because a dependency pulled it in.
It is really unfortunate that the nightly build of torch is not called torch, but torch-nightly...
that is a headache for package management.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Torch.save with dill

7 participants