Conversation
💊 CI failures summary and remediations
As of commit de8bfbd (more details on the Dr. CI page):
🕵️ 3 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
fmassa left a comment:
Thanks for all your work, Prabhat!
I've left some more comments, but I think all of them can be addressed in a follow-up PR.
As we discussed in our call earlier, I've also given a try at implementing the NV12->RGB conversion on the GPU, and it avoids the two memcopies that we are currently doing (so a single kernel does the reading + conversion).
I'll post it in a branch soon.
check_for_cuda_errors(cuCtxPushCurrent(cu_context), __LINE__, __FILE__);
check_for_cuda_errors(
    cuvidDecodePicture(decoder, pic_params), __LINE__, __FILE__);
check_for_cuda_errors(cuCtxPopCurrent(NULL), __LINE__, __FILE__);
For the future: I think it might be a good idea to guard the cuCtxPushCurrent / cuCtxPopCurrent pair with an RAII-style guard, something like what VPF does or (more involved) what decord does.
Such a guard would be more robust than what we currently have: if the code between the push and the pop fails, the pop never happens, which could be problematic (although in a synthetic case I tried out it didn't seem to cause a problem).
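A minimal sketch of what such a guard could look like. The `CtxGuard` name and the callable-based design are made up here (this is not VPF's or decord's actual implementation); the callables stand in for cuCtxPushCurrent / cuCtxPopCurrent so the sketch compiles and runs without the CUDA headers:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard: "push" runs on construction and "pop" runs on
// destruction, so the pop happens even when the guarded code throws.
// In the decoder, the callables would wrap cuCtxPushCurrent and
// cuCtxPopCurrent (each checked with check_for_cuda_errors).
class CtxGuard {
 public:
  CtxGuard(const std::function<void()>& push, std::function<void()> pop)
      : pop_(std::move(pop)) {
    push();
  }
  ~CtxGuard() {
    pop_();  // also runs during stack unwinding after an exception
  }
  // Non-copyable: the pop must run exactly once.
  CtxGuard(const CtxGuard&) = delete;
  CtxGuard& operator=(const CtxGuard&) = delete;

 private:
  std::function<void()> pop_;
};
```

With this, a throw between the push and the pop no longer leaks the pushed context: the destructor pops it during unwinding.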
if (!(decode_caps.nOutputFormatMask & (1 << video_output_format))) {
  if (decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_NV12)) {
    video_output_format = cudaVideoSurfaceFormat_NV12;
  } else if (
      decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_P016)) {
    video_output_format = cudaVideoSurfaceFormat_P016;
  } else if (
      decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_YUV444)) {
    video_output_format = cudaVideoSurfaceFormat_YUV444;
  } else if (
      decode_caps.nOutputFormatMask &
      (1 << cudaVideoSurfaceFormat_YUV444_16Bit)) {
    video_output_format = cudaVideoSurfaceFormat_YUV444_16Bit;
  } else {
    TORCH_CHECK(false, "No supported output format found");
I believe we can clean this up now that we will only support returning RGB. But this can be done in a follow-up PR.
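One way the fallback chain above could be cleaned up, sketched here as a suggestion rather than the actual change: drive the selection from a preference-ordered list instead of chained if/else. `pick_output_format` is a hypothetical helper; the enum values below are assumed to mirror cudaVideoSurfaceFormat from cuviddec.h:

```cpp
#include <array>
#include <cassert>

// Assumed to mirror cudaVideoSurfaceFormat values from cuviddec.h.
enum SurfaceFormat : int { NV12 = 0, P016 = 1, YUV444 = 2, YUV444_16Bit = 3 };

// Return the first format from the preference list whose bit is set in
// the decode caps' nOutputFormatMask, or -1 if none is supported.
int pick_output_format(unsigned output_format_mask) {
  constexpr std::array<SurfaceFormat, 4> preferred = {
      NV12, P016, YUV444, YUV444_16Bit};
  for (SurfaceFormat fmt : preferred) {
    if (output_format_mask & (1u << fmt)) {
      return fmt;
    }
  }
  // The caller would TORCH_CHECK(false, "No supported output format found").
  return -1;
}
```

Changing the preference order then only requires editing the array, not restructuring the branches.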
video_codec = video_format->codec;
video_chroma_format = video_format->chroma_format;
bit_depth_minus8 = video_format->bit_depth_luma_minus8;
bytes_per_pixel = bit_depth_minus8 > 0 ? 2 : 1;
// Set the output surface format same as chroma format
switch (video_chroma_format) {
  case cudaVideoChromaFormat_Monochrome:
  case cudaVideoChromaFormat_420:
    video_output_format = video_format->bit_depth_luma_minus8
        ? cudaVideoSurfaceFormat_P016
        : cudaVideoSurfaceFormat_NV12;
    break;
  case cudaVideoChromaFormat_444:
    video_output_format = video_format->bit_depth_luma_minus8
        ? cudaVideoSurfaceFormat_YUV444_16Bit
        : cudaVideoSurfaceFormat_YUV444;
    break;
  case cudaVideoChromaFormat_422:
    video_output_format = cudaVideoSurfaceFormat_NV12;
Maybe this can be cleaned up in the future.
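One possible future cleanup, sketched: pull the switch into a small mapping helper so the chroma-to-surface-format policy lives in one named function. `surface_for` and the local enums are hypothetical stand-ins for the cuviddec.h types (the values are assumed to mirror cudaVideoChromaFormat / cudaVideoSurfaceFormat):

```cpp
#include <cassert>

// Assumed to mirror cudaVideoChromaFormat values from cuviddec.h.
enum ChromaFormat : int { Monochrome = 0, C420 = 1, C422 = 2, C444 = 3 };
// Assumed to mirror cudaVideoSurfaceFormat values from cuviddec.h.
enum SurfaceFormat : int { NV12 = 0, P016 = 1, YUV444 = 2, YUV444_16Bit = 3 };

// Map chroma format + luma bit depth to an output surface format,
// reproducing the behavior of the switch in the decoder above.
SurfaceFormat surface_for(ChromaFormat chroma, int bit_depth_luma_minus8) {
  const bool high_bit_depth = bit_depth_luma_minus8 > 0;
  switch (chroma) {
    case Monochrome:
    case C420:
      return high_bit_depth ? P016 : NV12;
    case C444:
      return high_bit_depth ? YUV444_16Bit : YUV444;
    case C422:
    default:
      return NV12;  // 4:2:2 is decoded to NV12, as in the original switch
  }
}
```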
static auto check_for_cuda_errors =
    [](CUresult result, int line_num, std::string file_name) {
nit: we don't usually define lambdas outside of function scope in the PyTorch codebase. But I believe this is just a stylistic nit.
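What the nit suggests might look like this: the file-scope lambda rewritten as a static free function. `CUresult` is stubbed as an int alias here so the sketch compiles without the CUDA driver headers (in the real code it comes from cuda.h, where CUDA_SUCCESS is 0):

```cpp
#include <stdexcept>
#include <string>

// Stub for the CUDA driver type, so this sketch is self-contained.
using CUresult = int;

// Free-function form of the error-checking lambda: throw when the driver
// call did not return CUDA_SUCCESS, tagging the error with the call site.
static void check_for_cuda_errors(
    CUresult result,
    int line_num,
    const std::string& file_name) {
  if (result != 0 /* CUDA_SUCCESS */) {
    throw std::runtime_error(
        "CUDA error " + std::to_string(result) + " at " + file_name + ":" +
        std::to_string(line_num));
  }
}
```

Call sites stay identical (`check_for_cuda_errors(cuCtxPopCurrent(NULL), __LINE__, __FILE__);`), so the change is purely stylistic.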
auto options = torch::TensorOptions().dtype(torch::kU8).device(torch::kCUDA);
torch::Tensor frame = torch::zeros({0}, options);
nit: I believe you can just do `torch::Tensor frame;` now that decoder.fetch_frame() returns a tensor of the right dtype and device.
We need to check if the errors are related or not (maybe not). Also, in a follow-up PR we will need to add CI tests for those functions. This might be a bit involved, but it looks like the NVIDIA docker image has nvdec available in the container.
Looks like it's only failing on Python 3.6, which reached its EOL last week (https://endoflife.date/python). So we should probably just remove support for Python 3.6 in the torchvision main branch (as long as PyTorch also drops support for it).
Summary:
* [WIP] Add video GPU decoder
* Expose use_dev_frame to python class and handle it internally
* Fixed invalid argument CUDA error
* Fixed empty and missing frames
* Free remaining frames in the queue
* Added nv12 to yuv420 conversion support for host frames
* Added unit test and cleaned up code
* Use CUDA_HOME inside if
* Undo commented out code
* Add Readme
* Remove output_format and use_device_frame optional arguments from the VideoReader API
* Cleaned up init()
* Fix warnings
* Fix python linter errors
* Fix linter issues in setup.py
* clang-format
* Make reformat private
* Member function naming
* Add comments
* Variable renaming
* Code cleanup
* Make return type of decode() void
* Replace printing errors with throwing runtime_error
* Replaced runtime_error with TORCH_CHECK in demuxer.h
* Use CUDAGuard instead of cudaSetDevice
* Remove printf
* Use Tensor instead of uint8* and remove cuMemAlloc/cuMemFree
* Use TORCH_CHECK instead of runtime_error
* Use TORCHVISION_INCLUDE and TORCHVISION_LIBRARY to pass video codec location
* Include ffmpeg_include_dir
* Remove space
* Removed use of runtime_error
* Update Readme
* Check for bsf.h
* Change struct initialisation style
* Clean-up get_operating_point
* Make variable naming convention uniform
* Move checking for bsf.h around
* Fix linter error

Reviewed By: datumbox, prabhat00155
Differential Revision: D33405358
fbshipit-source-id: 0e6251389508309a23c7afd843f298208dcd67e8
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Differential Revision: D33405358
Original commit changeset: 0e6251389508
Original Phabricator Diff: D33405358
fbshipit-source-id: b554aaa8003aca08826540883783644aa7eebea9
Summary: same change list as the commit above.

Reviewed By: NicolasHug
Differential Revision: D33476941
fbshipit-source-id: e310435c966fe79ab77eaba305a03dd0af7a17a5
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
This PR adds support for GPU decoding in torchvision’s video reading API.
Resolves #2439 and partly addresses #4392.
This is the initial version of the GPU video decoder, exposed through the VideoReader API.
The result of GPU decoding can be returned either as a CUDA tensor (when using use_device_frame=True) or as a CPU tensor (use_device_frame=False). With use_device_frame=True, nv12 is the only supported output format; with use_device_frame=False, both nv12 and yuv420 are supported. Work items to extend this further can be found here.