Conversation
💊 CI failures summary and remediations
As of commit de8bfbd (more details on the Dr. CI page):
🕵️ 3 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
fmassa left a comment:
Thanks for all your work, Prabhat!
I've left some more comments, but I think all of them can be addressed in a follow-up PR.
As we discussed in our call earlier, I've also given a try at implementing the NV12->RGB conversion on the GPU, and it avoids the two memcopies that we are currently doing (so a single kernel does the reading + conversion).
I'll post it in a branch soon.
check_for_cuda_errors(cuCtxPushCurrent(cu_context), __LINE__, __FILE__);
check_for_cuda_errors(
    cuvidDecodePicture(decoder, pic_params), __LINE__, __FILE__);
check_for_cuda_errors(cuCtxPopCurrent(NULL), __LINE__, __FILE__);
For the future: I think it might be a good idea to guard the cuCtxPushCurrent / cuCtxPopCurrent pair with an RAII-style guard, something like what VPF does or (more involved) what decord does.
Such a guard would be more robust than what we currently have: if the code between the push and the pop fails, the pop never happens, which could be problematic (although in a synthetic case I tried out it didn't seem to cause a problem).
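A minimal sketch of what such a guard could look like. The `CtxGuard` name and the callable-based design are made up here (this is not VPF's or decord's actual implementation); the callables stand in for cuCtxPushCurrent / cuCtxPopCurrent so the sketch compiles and runs without the CUDA headers:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard: "push" runs on construction and "pop" runs on
// destruction, so the pop happens even when the guarded code throws.
// In the decoder, the callables would wrap cuCtxPushCurrent and
// cuCtxPopCurrent (each checked with check_for_cuda_errors).
class CtxGuard {
 public:
  CtxGuard(const std::function<void()>& push, std::function<void()> pop)
      : pop_(std::move(pop)) {
    push();
  }
  ~CtxGuard() {
    pop_();  // also runs during stack unwinding after an exception
  }
  // Non-copyable: the pop must run exactly once.
  CtxGuard(const CtxGuard&) = delete;
  CtxGuard& operator=(const CtxGuard&) = delete;

 private:
  std::function<void()> pop_;
};
```

With this, a throw between the push and the pop no longer leaks the pushed context: the destructor pops it during unwinding.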
if (!(decode_caps.nOutputFormatMask & (1 << video_output_format))) {
  if (decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_NV12)) {
    video_output_format = cudaVideoSurfaceFormat_NV12;
  } else if (
      decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_P016)) {
    video_output_format = cudaVideoSurfaceFormat_P016;
  } else if (
      decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_YUV444)) {
    video_output_format = cudaVideoSurfaceFormat_YUV444;
  } else if (
      decode_caps.nOutputFormatMask &
      (1 << cudaVideoSurfaceFormat_YUV444_16Bit)) {
    video_output_format = cudaVideoSurfaceFormat_YUV444_16Bit;
  } else {
    TORCH_CHECK(false, "No supported output format found");
I believe we can clean this up now that we will only support returning RGB. But this can be done in a follow-up PR.
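One way the fallback chain above could be cleaned up, sketched here as a suggestion rather than the actual change: drive the selection from a preference-ordered list instead of chained if/else. `pick_output_format` is a hypothetical helper; the enum values below are assumed to mirror cudaVideoSurfaceFormat from cuviddec.h:

```cpp
#include <array>
#include <cassert>

// Assumed to mirror cudaVideoSurfaceFormat values from cuviddec.h.
enum SurfaceFormat : int { NV12 = 0, P016 = 1, YUV444 = 2, YUV444_16Bit = 3 };

// Return the first format from the preference list whose bit is set in
// the decode caps' nOutputFormatMask, or -1 if none is supported.
int pick_output_format(unsigned output_format_mask) {
  constexpr std::array<SurfaceFormat, 4> preferred = {
      NV12, P016, YUV444, YUV444_16Bit};
  for (SurfaceFormat fmt : preferred) {
    if (output_format_mask & (1u << fmt)) {
      return fmt;
    }
  }
  // The caller would TORCH_CHECK(false, "No supported output format found").
  return -1;
}
```

Changing the preference order then only requires editing the array, not restructuring the branches.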
video_codec = video_format->codec;
video_chroma_format = video_format->chroma_format;
bit_depth_minus8 = video_format->bit_depth_luma_minus8;
bytes_per_pixel = bit_depth_minus8 > 0 ? 2 : 1;
// Set the output surface format same as chroma format
switch (video_chroma_format) {
  case cudaVideoChromaFormat_Monochrome:
  case cudaVideoChromaFormat_420:
    video_output_format = video_format->bit_depth_luma_minus8
        ? cudaVideoSurfaceFormat_P016
        : cudaVideoSurfaceFormat_NV12;
    break;
  case cudaVideoChromaFormat_444:
    video_output_format = video_format->bit_depth_luma_minus8
        ? cudaVideoSurfaceFormat_YUV444_16Bit
        : cudaVideoSurfaceFormat_YUV444;
    break;
  case cudaVideoChromaFormat_422:
    video_output_format = cudaVideoSurfaceFormat_NV12;
Maybe this can be cleaned up in the future.
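One possible future cleanup, sketched: pull the switch into a small mapping helper so the chroma-to-surface-format policy lives in one named function. `surface_for` and the local enums are hypothetical stand-ins for the cuviddec.h types (the values are assumed to mirror cudaVideoChromaFormat / cudaVideoSurfaceFormat):

```cpp
#include <cassert>

// Assumed to mirror cudaVideoChromaFormat values from cuviddec.h.
enum ChromaFormat : int { Monochrome = 0, C420 = 1, C422 = 2, C444 = 3 };
// Assumed to mirror cudaVideoSurfaceFormat values from cuviddec.h.
enum SurfaceFormat : int { NV12 = 0, P016 = 1, YUV444 = 2, YUV444_16Bit = 3 };

// Map chroma format + luma bit depth to an output surface format,
// reproducing the behavior of the switch in the decoder above.
SurfaceFormat surface_for(ChromaFormat chroma, int bit_depth_luma_minus8) {
  const bool high_bit_depth = bit_depth_luma_minus8 > 0;
  switch (chroma) {
    case Monochrome:
    case C420:
      return high_bit_depth ? P016 : NV12;
    case C444:
      return high_bit_depth ? YUV444_16Bit : YUV444;
    case C422:
    default:
      return NV12;  // 4:2:2 is decoded to NV12, as in the original switch
  }
}
```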
static auto check_for_cuda_errors =
    [](CUresult result, int line_num, std::string file_name) {
nit: we don't usually define lambdas outside of function scope in the PyTorch codebase. But I believe this is just a stylistic nit.
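What the nit suggests might look like this: the file-scope lambda rewritten as a static free function. `CUresult` is stubbed as an int alias here so the sketch compiles without the CUDA driver headers (in the real code it comes from cuda.h, where CUDA_SUCCESS is 0):

```cpp
#include <stdexcept>
#include <string>

// Stub for the CUDA driver type, so this sketch is self-contained.
using CUresult = int;

// Free-function form of the error-checking lambda: throw when the driver
// call did not return CUDA_SUCCESS, tagging the error with the call site.
static void check_for_cuda_errors(
    CUresult result,
    int line_num,
    const std::string& file_name) {
  if (result != 0 /* CUDA_SUCCESS */) {
    throw std::runtime_error(
        "CUDA error " + std::to_string(result) + " at " + file_name + ":" +
        std::to_string(line_num));
  }
}
```

Call sites stay identical (`check_for_cuda_errors(cuCtxPopCurrent(NULL), __LINE__, __FILE__);`), so the change is purely stylistic.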
auto options = torch::TensorOptions().dtype(torch::kU8).device(torch::kCUDA);
torch::Tensor frame = torch::zeros({0}, options);
nit: I believe you can just do `torch::Tensor frame;` now that decoder.fetch_frame() returns a tensor of the right dtype and device.
We need to check if the errors are related or not (maybe not). Also, in a follow-up PR we will need to add CI tests for those functions. This might be a bit involved, but it looks like the NVIDIA docker image has nvdec available in the container.
Looks like it's only failing on Python 3.6, which reached its EOL last week (https://endoflife.date/python). So we should probably just remove support for Python 3.6 in the torchvision main branch (as long as PyTorch also drops support for it).
Summary:
* [WIP] Add video GPU decoder
* Expose use_dev_frame to python class and handle it internally
* Fixed invalid argument CUDA error
* Fixed empty and missing frames
* Free remaining frames in the queue
* Added nv12 to yuv420 conversion support for host frames
* Added unit test and cleaned up code
* Use CUDA_HOME inside if
* Undo commented out code
* Add Readme
* Remove output_format and use_device_frame optional arguments from the VideoReader API
* Cleaned up init()
* Fix warnings
* Fix python linter errors
* Fix linter issues in setup.py
* clang-format
* Make reformat private
* Member function naming
* Add comments
* Variable renaming
* Code cleanup
* Make return type of decode() void
* Replace printing errors with throwing runtime_error
* Replaced runtime_error with TORCH_CHECK in demuxer.h
* Use CUDAGuard instead of cudaSetDevice
* Remove printf
* Use Tensor instead of uint8* and remove cuMemAlloc/cuMemFree
* Use TORCH_CHECK instead of runtime_error
* Use TORCHVISION_INCLUDE and TORCHVISION_LIBRARY to pass video codec location
* Include ffmpeg_include_dir
* Remove space
* Removed use of runtime_error
* Update Readme
* Check for bsf.h
* Change struct initialisation style
* Clean-up get_operating_point
* Make variable naming convention uniform
* Move checking for bsf.h around
* Fix linter error

Reviewed By: datumbox, prabhat00155
Differential Revision: D33405358
fbshipit-source-id: 0e6251389508309a23c7afd843f298208dcd67e8
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Differential Revision: D33405358
Original commit changeset: 0e6251389508
Original Phabricator Diff: D33405358
fbshipit-source-id: b554aaa8003aca08826540883783644aa7eebea9
Summary: same change list as the commit above.

Reviewed By: NicolasHug
Differential Revision: D33476941
fbshipit-source-id: e310435c966fe79ab77eaba305a03dd0af7a17a5
Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
This PR adds support for GPU decoding in torchvision’s video reading API.
Resolves #2439 and partly addresses #4392.
This is the initial version of the GPU video decoder, exposed through the VideoReader API.
The result of GPU decoding can be returned either as a CUDA tensor (when using use_device_frame=True) or as a CPU tensor (use_device_frame=False). With use_device_frame=True, nv12 is the only supported output format; with use_device_frame=False, both nv12 and yuv420 are supported. Work items to extend this further can be found here.