small fixes in fusion_compiler #7776
Conversation
torch/csrc/jit/fusion_compiler.cpp (outdated)

    std::unique_lock<std::mutex> cudaFreeMutexLock(
        *(THCCachingAllocator_getCudaFreeMutex()));
    cudaFree(0);
    cudaFreeMutexLock.unlock();
torch/csrc/jit/fusion_compiler.cpp (outdated)

    TORCH_CU_CHECK(cuCtxGetCurrent(&pctx));
    if (!pctx) {
      std::unique_lock<std::mutex> cudaFreeMutexLock(
          *(THCCachingAllocator_getCudaFreeMutex()));
* origin:
  - [Caffe2] Enabling AMD GPU Backend for Caffe2 (pytorch#7566)
  - Call grad_mode.py context managers as decorators (pytorch#7737)
  - catch CPU tensors in checkSameGPU (fixes pytorch#7689) (pytorch#7767)
  - Mark stack as non-executable in NNPACK (pytorch#7752)
  - small fixes in fusion_compiler (pytorch#7776)
  - Run clang-format on c10d (pytorch#7791)
* small fixes in fusion_compiler
* address review comments
What's actually needed to implement that for MPS?

just got this error:

    Error occurred when executing AIO_Preprocessor: The operator
    'aten::upsample_bicubic2d.out' is not currently implemented for the MPS
    device. If you want this op to be added in priority during the prototype
    phase of this feature, please comment on #77764. As a temporary fix, you
    can set the environment variable
Don't call cudaFree unconditionally; guard the cudaFree call with the cudaFreeMutex; submit the kernel to the current stream.