Avoid unnecessary copy in TensorSource #8849
Conversation
Force-pushed from 2b3b31f to 636a787.
Force-pushed from 636a787 to e483f51.
Hi @ysiraichi, just following up on our offline discussion about the copy operation. PTAL at the PR, thanks!
ysiraichi left a comment:
As a side note, we can use the DLPack machinery to do the CUDA to XLA:CUDA transfer (it wasn't implemented at the time I worked on this). I will open an issue for this.
```cpp
// The purposes of the copy are:
// 1. Ensure the memory is contiguous, which is expected by PJRT.
// 2. Move CUDA tensors to the CPU, since we cannot pass CUDA memory to
//    PJRT yet.
// 3. Cast the data type.
// We can skip the copy when none of these apply.
if (tensor.device() == at::kCPU && tensor.is_contiguous() &&
    tensor.dtype() == target_torch_type) {
  tensor_ = std::move(tensor);
} else {
  // TODO(ysiraichi): check, first, if tensor lives in a device that the
  // current PjRt client has access. If so, we don't need to go through the
  // CPU.
  tensor_ = std::move(tensor.to(
      at::TensorOptions().device(at::kCPU).dtype(target_torch_type),
      /*non_blocking=*/false,
      /*copy=*/true, at::MemoryFormat::Contiguous));
}
```
As far as I understand it, tensor.to(...) (without the copy argument) already checks whether it should actually copy or not. So, what do you think of reverting to the old tensor.to(...) usage, but removing the copy argument, instead?
Hi @ysiraichi, I didn't find a tensor.to(...) overload without the copy arg in C++; is it only available in Python?
You are right. But I think we can just pass /* copy= */ false.
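To make the suggestion concrete, here is a minimal sketch of the simplified call (illustrative, not the exact merged patch; it assumes the surrounding TensorSource context for `tensor_`, `tensor`, and `target_torch_type`). With `/*copy=*/false`, `tensor.to(...)` returns the input tensor unchanged when it already matches the requested device, dtype, and memory format, and performs the conversion copy otherwise, so the explicit if-else becomes unnecessary:

```cpp
// Sketch: let tensor.to(...) decide whether a copy is needed.
// With copy=false, ATen copies only if the tensor does not already
// satisfy the requested device, dtype, and memory format.
tensor_ = tensor.to(
    at::TensorOptions().device(at::kCPU).dtype(target_torch_type),
    /*non_blocking=*/false,
    /*copy=*/false, at::MemoryFormat::Contiguous);
```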
pgmoka left a comment:
Left one small NIT. Otherwise, LGTM. If you could address the NIT before submission, that might be nice.
```cpp
  tensor_ = std::move(tensor);
} else {
  TORCH_LAZY_COUNTER("AtenSourceTensorCopy", 1);
  // TODO(ysiraichi): check, first, if tensor lives in a device that the
  // current PjRt client has access. If so, we don't need to go through the
  // CPU.
```
NIT: I personally prefer to have TODOs linked to issues, and then having those assigned to people. That way, things can be more easily followed up if a contributor is no longer active.
@ysiraichi @tengyifei Actually, I had second thoughts on this. Skipping the copy seems to be unsafe if the underlying PJRT transfer is asynchronous.
Hmm... I don't think this is the case. That's because PyTorch tensors are ref-counted on the C++ side (unless otherwise specified). So, if we hold a C++ `at::Tensor`, its underlying storage stays alive for as long as we keep that reference, even while the transfer is in flight.
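To illustrate the ref-counting point, here is a small standalone sketch (the `BorrowedTensor` class is hypothetical, not from the PR): holding an `at::Tensor` by value shares ownership of its storage, so the buffer cannot be freed while the holder is alive, even if the caller drops its own reference before an asynchronous transfer finishes.

```cpp
#include <ATen/ATen.h>

// Hypothetical holder, mimicking what TensorSource does by keeping the
// tensor in a member: at::Tensor is a ref-counted handle, so storing it
// by value keeps the underlying storage alive.
class BorrowedTensor {
 public:
  explicit BorrowedTensor(at::Tensor t) : tensor_(std::move(t)) {}

  // Safe to hand to PJRT: the buffer outlives any in-flight transfer
  // as long as this object does.
  const void* data() const { return tensor_.data_ptr(); }

 private:
  at::Tensor tensor_;  // holds a storage reference for our lifetime
};
```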
By the way, I still don't think you need the if-else there. I believe you can just leave the old `tensor.to(...)` call, since it already skips the copy when none of the conversions are needed.
Thank you so much @ysiraichi for the suggestion! Updated accordingly.
This reverts commit 8dc5b49.
Whenever a tensor is transferred to our runtime from torch_xla, the tensor is copied. This causes unnecessary data duplication on our side. This change disables the tensor copy when the tensor can be borrowed. Note that PJRT expects data to be contiguous in memory, which is not always true on the PyTorch side, so copies cannot be completely avoided. Since this change was checked in before and later reverted, we need to test whether it will work in our environment. Please take a look at these for more info: pytorch#8849 pytorch#9379 pytorch#9378
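As a hedged sketch of the borrowability condition described above (the `CanBorrow` helper is hypothetical; the condition itself mirrors the check in the PR's code): a tensor can be handed to PJRT without a copy only when it is already a contiguous CPU tensor of the expected dtype.

```cpp
#include <ATen/ATen.h>

// Hypothetical helper: true when the tensor can be borrowed by PJRT
// directly, i.e. no copy is needed for contiguity, device, or dtype.
bool CanBorrow(const at::Tensor& tensor, at::ScalarType target_torch_type) {
  return tensor.device() == at::kCPU && tensor.is_contiguous() &&
         tensor.dtype() == target_torch_type;
}
```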
Avoid `at::Tensor` copy in `TensorSource` if it's not necessary. The copy operation is needed in 2 cases:
The copy operation needs to be blocking, since the transfer operation depends on the copied tensor.
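To illustrate why, here is a hedged sketch (tensor names are illustrative) contrasting a non-blocking and a blocking device-to-host copy. With `non_blocking=true`, PyTorch may only enqueue the copy on the CUDA stream, with no guarantee that the host buffer is populated by the time `to(...)` returns:

```cpp
// Unsafe if PJRT reads the buffer right away: the copy may still be
// in flight on the CUDA stream when to(...) returns.
at::Tensor host_async = cuda_tensor.to(
    at::TensorOptions().device(at::kCPU), /*non_blocking=*/true);

// Safe: to(...) does not return until the device-to-host copy has
// completed, so PJRT can read host_sync immediately.
at::Tensor host_sync = cuda_tensor.to(
    at::TensorOptions().device(at::kCPU), /*non_blocking=*/false);
```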