Conversation
Storage views were previously used to implement CUDA IPC sharing, but they weren't necessary. The new strategy is described in Note [CUDA IPC and the caching allocator]. This also fixes an unrelated bug, where we weren't actually using the Tensor forking pickler, because we didn't register a pickler for torch.Tensor. Fixes pytorch#9447. Fixes pytorch#46. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
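The "we didn't register a pickler for torch.Tensor" bug can be illustrated with a minimal sketch. This is a hedged analogy, not PyTorch's actual code: `Tensor`, `reduce_tensor`, and `rebuild_tensor` are hypothetical stand-ins (the real reducers live in `torch.multiprocessing.reductions`), but the `ForkingPickler` registration mechanism is the real stdlib API.

```python
# Sketch of the fix: unless a reducer is registered for the class
# itself, ForkingPickler falls back to default pickling and any custom
# sharing logic is silently skipped.
import pickle
from multiprocessing.reduction import ForkingPickler

class Tensor:  # hypothetical stand-in for torch.Tensor
    def __init__(self, data):
        self.data = data

def rebuild_tensor(data):
    # In PyTorch this would reattach shared memory; here it just rebuilds.
    return Tensor(data)

def reduce_tensor(t):
    # In PyTorch this would return shared-memory / CUDA IPC handles.
    return (rebuild_tensor, (t.data,))

# The bug: registering a reducer only for the Storage classes does
# nothing for Tensor objects; Tensor must be registered explicitly.
ForkingPickler.register(Tensor, reduce_tensor)

roundtripped = pickle.loads(ForkingPickler.dumps(Tensor([1, 2, 3])))
```

With the registration in place, `ForkingPickler.dumps` routes `Tensor` through `reduce_tensor` instead of the default pickler, which is exactly the hook the forking pickler needs to substitute shared memory for a copy.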
facebook-github-bot
left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Force-pushed from 2618def to b1e7aaf.
facebook-github-bot
left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@@ -296,18 +296,7 @@ PyObject * THPStorage_(_setCdata)(THPStorage *self, PyObject *new_cdata)
 PyObject * THPStorage_(_rootStorage)(THPStorage *self)
       && (_handle == Py_None || PyBytes_Check(_handle)))) {
     THPUtils_invalidArguments(args, NULL, "_new_shared in CUDA mode", 1,
-        "(int device, bytes handle, int storage_size, int offset, int view_size");
+        "(int device, bytes handle, int storage_size)");
   _handle = PyBytes_FromStringAndSize((char *)&handle, CUDA_IPC_HANDLE_SIZE);
-  _offset = PyLong_FromSsize_t((Py_ssize_t)offset);
+  _offset = PyLong_FromSsize_t((Py_ssize_t)offset / sizeof(real));
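The `/ sizeof(real)` in the hunk above is a unit conversion: the IPC machinery hands back a byte offset into the base allocation, while Python-side storages index by element. A small sketch (using `c_float` as an example element type; `real` is whatever scalar type the TH storage is instantiated with):

```python
# Byte offset -> element offset, as in the fixed line above.
import ctypes

offset_bytes = 32
element_size = ctypes.sizeof(ctypes.c_float)  # 4 bytes per float element
element_offset = offset_bytes // element_size
# 32 bytes into a float storage is element index 8
```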
-  if (offset != 0 || view_size != storage_size) {
-    return THPStorage_(newTHView)(base.get(), offset, view_size);
-  }
+  base->flag = TH_STORAGE_REFCOUNTED; // NB: Not resizable
torch/serialization.py
Outdated
-                deserialized_objects[target_cdata] = root[offset:offset + size]
+                if offset != 0 or size != root.size():
+                    warnings.warn("Detected storage view in legacy serialized data: "
+                                  "storages are no longer natively supported, so we are making "
torch/serialization.py
Outdated
                    tensor = torch._utils._rebuild_tensor(root, offset, (size,), (1,))
                    obj = tensor.clone().storage()
                else:
                    obj = root[offset:offset + size]
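What this deserialization path does, in pure-Python terms: a storage view shares memory with its root, and rebuilding through a tensor and calling `clone()` materializes an unshared copy. A minimal analogy using stdlib buffers (not PyTorch objects):

```python
# A "view" shares the root's buffer; a "clone" is an independent copy.
import array

root = array.array('f', range(10))
view = memoryview(root)[2:6]        # storage view: aliases root's memory
unshared = array.array('f', view)   # the clone(): independent copy

root[2] = 99.0
# view reflects the write; the cloned copy keeps the original value 2.0
```

This is why the "unshare by cloning" fix is only safe if nobody keeps updating the root afterward, which is exactly the caveat raised for the legacy-format fix.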
facebook-github-bot
left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
facebook-github-bot
left a comment
@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Storage views were previously used to implement CUDA IPC sharing, but they weren't necessary. The new strategy is described in Note [CUDA IPC and the caching allocator]. This also fixes an unrelated bug, where we weren't actually using the Tensor forking pickler, because we didn't register a pickler for torch.Tensor. Fixes #9447. Fixes #46. Signed-off-by: Edward Z. Yang <ezyang@fb.com> CC apaszke Pull Request resolved: pytorch/pytorch#9466 Reviewed By: apaszke Differential Revision: D8859698 Pulled By: ezyang fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
In pytorch#9466 I got rid of storage views and eliminated all places where they were used... OR SO I THOUGHT. In actuality, under certain conditions (specifically, if you trained a CUDA multiprocessing model shared over CUDA IPC and then serialized your parameters), you could also serialize storage slices to the saved model format. In pytorch#9466, I "fixed" the case when you loaded the legacy model format (really, just unshared the storages--not strictly kosher but if you aren't updating the parameters, shouldn't matter), but NOT the modern model format, so such models would fail. So, I could have applied the legacy model format fix too, but hyperfraise remarked that he had applied a fix that was effectively the same as unsharing the storages, but it had caused his model to behave differently. So I looked into it again, and realized that using a custom deleter, I could simulate the same behavior as old storage slices. So back they come. In principle, I could also reimplement storage views entirely using our allocators, but I'm not going to do that unless someone really really wants it. Fixes pytorch#10120. Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: In #9466 I got rid of storage views and eliminated all places where they were used... OR SO I THOUGHT. In actuality, under certain conditions (specifically, if you trained a CUDA multiprocessing model shared over CUDA IPC and then serialized your parameters), you could also serialize storage slices to the saved model format. In #9466, I "fixed" the case when you loaded the legacy model format (really, just unshared the storages--not strictly kosher but if you aren't updating the parameters, shouldn't matter), but NOT the modern model format, so such models would fail. So, I could have applied the legacy model format fix too, but hyperfraise remarked that he had applied a fix that was effectively the same as unsharing the storages, but it had caused his model to behave differently. So I looked into it again, and realized that using a custom deleter, I could simulate the same behavior as old storage slices. So back they come. In principle, I could also reimplement storage views entirely using our allocators, but I'm not going to do that unless someone really really wants it. Fixes #10120. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #11314 Reviewed By: ailzhang Differential Revision: D9671966 Pulled By: ezyang fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
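The custom-deleter trick described above can be sketched in pure Python. This is a hedched analogy, not PyTorch's C++ `DataPtr` API: `RawBuffer` and `storage_slice` are invented names. The idea is that the slice aliases the base allocation's memory, and the slice's deleter is what releases the base, so the base allocation stays alive exactly as long as the slice does:

```python
class RawBuffer:
    """Stands in for one big allocation from the CUDA caching allocator."""
    def __init__(self, nbytes):
        self.data = bytearray(nbytes)
        self.freed = False

    def free(self):
        self.freed = True

def storage_slice(base, offset, nbytes):
    """Return (view, deleter). The view aliases base's memory; the
    deleter captures base, mirroring a custom deleter that keeps the
    base allocation alive until the slice is destroyed."""
    view = memoryview(base.data)[offset:offset + nbytes]

    def deleter():
        view.release()
        base.free()

    return view, deleter

buf = RawBuffer(16)
view, deleter = storage_slice(buf, 4, 8)
view[0] = 42          # writes through to buf.data[4]: truly shared
deleter()             # only now is the base allocation released
```

Because the slice writes through to the base, this reproduces the old storage-view sharing semantics (the behavior the unsharing "fix" broke) without reintroducing view objects themselves.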