Description
🚀 The feature
Add a second backend for torchvision.io.write_video, built on ffmpeg-python, that otherwise has exactly the same interface and functionality.
Motivation, pitch
torchvision.io.write_video currently calls PyAV, which in turn is a wrapper for ffmpeg. PyAV has an apparently still-unresolved issue in which setting the CRF (constant rate factor) through the options has no effect. That issue has been referenced as recently as March of this year. As far as I can tell, adjusting CRF is the canonical way to tune a video's level of compression. Supporting ffmpeg-python as a backend would let users set the CRF and so choose the level of compression they want.
Alternatives
If there is some other set of options that can be passed to write_video to alter the level of compression, that would be an acceptable alternative (at least for my use case). In that case, it would be ideal to include those options as an example in the write_video documentation.
Additional context
I already have a rough version working in a notebook, but it's missing support for audio and such:
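import ffmpeg  # provided by the ffmpeg-python package
import numpy as np

# video_array is assumed to be a uint8 NumPy array of shape (T, H, W, 3) in RGB order
# Define output video parameters
output_filename = 'output_video.mp4'
fps = 30
codec = 'libx264'
height, width = video_array.shape[1:3]

# Create the ffmpeg process, which reads raw RGB frames from stdin
process1 = (
    ffmpeg
    # Set the frame rate on the input; the rawvideo demuxer otherwise assumes 25 fps
    .input('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(width, height), r=fps)
    .output(output_filename, pix_fmt='yuv420p', vcodec=codec, crf=10)
    .overwrite_output()
    .run_async(pipe_stdin=True)
)

# Write the NumPy array to the input pipe, one frame at a time
for frame in video_array:
    process1.stdin.write(frame.astype(np.uint8).tobytes())

# Close the input pipe so ffmpeg sees end-of-input
process1.stdin.close()
# Wait for the ffmpeg process to finish
process1.wait()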
With this snippet, crf=10 produces good-looking output, while crf=50 produces visibly heavily compressed output, as expected.
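For anyone who wants to try this without installing ffmpeg-python, the same pipeline can be sketched with only the standard library by building the ffmpeg command line directly. This is a rough sketch, not a proposed implementation: build_ffmpeg_cmd and write_raw_frames are hypothetical helper names, the frames are assumed to be raw rgb24 bytes (e.g. frame.tobytes() from a uint8 array), and the frame rate is given as an input option so that the raw frames are timestamped at the intended rate. Audio is still not handled.

```python
import subprocess
from typing import Iterable, List


def build_ffmpeg_cmd(filename: str, width: int, height: int, fps: float,
                     crf: int = 23, codec: str = "libx264") -> List[str]:
    """Hypothetical helper: build an ffmpeg command that encodes raw RGB frames from stdin."""
    return [
        "ffmpeg",
        "-y",                        # overwrite the output file if it exists
        "-f", "rawvideo",            # input is headerless raw frames
        "-pix_fmt", "rgb24",         # 8-bit RGB, 3 bytes per pixel
        "-s", f"{width}x{height}",   # frame size as WIDTHxHEIGHT
        "-r", str(fps),              # input frame rate
        "-i", "pipe:",               # read the frames from stdin
        "-vcodec", codec,
        "-pix_fmt", "yuv420p",       # widely compatible output pixel format
        "-crf", str(crf),            # lower CRF = higher quality, larger file
        filename,
    ]


def write_raw_frames(cmd: List[str], frames: Iterable[bytes]) -> int:
    """Hypothetical helper: pipe raw frames into ffmpeg and return its exit code.

    Requires the ffmpeg executable to be on PATH.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    for frame in frames:
        proc.stdin.write(frame)
    proc.stdin.close()  # signal end-of-input so ffmpeg can finalize the file
    return proc.wait()
```

With a (T, H, W, 3) uint8 array, usage would look something like write_raw_frames(build_ffmpeg_cmd('output_video.mp4', W, H, 30, crf=10), (f.tobytes() for f in video_array)).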