[VLM] Replace decord with torchcodec for video decoding #20055
ispobock merged 17 commits into sgl-project:main
torchcodec 0.8.0 is only compatible with torch 2.6/2.7. Updated to 0.10.0 which supports torch 2.9.x. Also added ARM fallback with av package to main pyproject.toml (was already present in xpu.toml). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
decord is unmaintained (3+ years), causes segfaults on import order, hangs on certain videos, and its VideoReader is unpicklable (blocking multiprocess parallelism). torchcodec is already a declared dependency but was unused. This commit migrates all video decoding from decord to torchcodec's VideoDecoder API.

Key changes:
- load_video(), encode_video(), sample_video_frames() in common.py
- All processor files (qwen_vl, internvl, base_processor, nano_nemotron)
- Test and example files
- Remove decord2/decord from all pyproject*.toml dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
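To make the migration concrete, here is a minimal sketch of uniform frame sampling on top of torchcodec's VideoDecoder. The helper names `uniform_frame_indices` and `decode_sampled_frames` are hypothetical and are not taken from this PR; the actual `sample_video_frames()` in common.py may select frames differently.

```python
from typing import List


def uniform_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick up to num_samples frame indices spread evenly over the clip.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    if num_samples == 1:
        return [0]
    # Integer arithmetic keeps the first and last frame and avoids
    # floating-point rounding surprises.
    last = total_frames - 1
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


def decode_sampled_frames(path: str, num_samples: int):
    """Decode evenly spaced frames with torchcodec (assumes it is installed)."""
    # Imported lazily so the index helper works without torchcodec.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path)
    indices = uniform_frame_indices(len(decoder), num_samples)
    # get_frames_at returns a FrameBatch; .data is a uint8 tensor (N, C, H, W).
    return decoder.get_frames_at(indices=indices).data
```

Keeping index selection separate from decoding means the sampling logic can be tested without a video file or a decoder backend.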
torchcodec has no prebuilt wheels for Linux aarch64/arm64. Add graceful fallback to decord on those platforms so video loading still works. decord2 is kept as a conditional dependency for Linux ARM only.
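A platform check along these lines could decide when the decord fallback is expected. This is only a sketch: the function name `is_linux_arm` is hypothetical, and the PR may detect the platform differently (for example, purely through dependency markers in pyproject.toml).

```python
import platform
from typing import Optional


def is_linux_arm(system: Optional[str] = None, machine: Optional[str] = None) -> bool:
    """Return True on Linux aarch64/arm64, where the fallback would apply.

    Hypothetical helper; arguments default to the running platform and
    exist mainly so the check can be exercised in tests.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Linux" and machine.lower() in ("aarch64", "arm64")
```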
Code Review
This pull request successfully replaces the unmaintained decord library with the officially supported torchcodec for video decoding, providing a fallback for ARM Linux platforms where torchcodec is not available. The changes are consistently applied across the codebase, including examples, tests, and dependency configurations. My review focuses on improving maintainability by reducing code duplication in several areas and addressing potential ZeroDivisionError bugs that could arise from video files with a reported framerate of zero.
Note: Security Review did not run due to the size of the PR.
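The ZeroDivisionError concern raised in the review can be illustrated with a small guard. This is a sketch under assumptions: the helper name `safe_duration_seconds` and the 30 fps default are hypothetical and do not come from the PR.

```python
import math
from typing import Optional


def safe_duration_seconds(
    num_frames: int, fps: Optional[float], default_fps: float = 30.0
) -> float:
    """Compute clip duration without dividing by a missing or zero frame rate.

    Hypothetical helper; the fallback value of 30 fps is an assumption.
    """
    if not default_fps > 0:
        raise ValueError("default_fps must be positive")
    # Treat None, 0, negative, and NaN frame rates as "unknown".
    if fps is None or math.isnan(fps) or fps <= 0:
        fps = default_fps
    return num_frames / fps
```

Whether to substitute a default or raise a descriptive error is a design choice; a default keeps inference running on files with broken metadata, at the cost of possibly inaccurate timestamps.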
Co-Authored-By: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
- VideoDecoderWrapper accepts device="cuda" for torchcodec GPU decoding
via set_cuda_backend("beta"), falls back to CPU silently
- _normalize_video_input returns bytes for URLs/base64 instead of temp
files, since both torchcodec and the wrapper handle bytes natively
- load_video passes use_gpu through to wrapper device parameter
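The bytes-based normalization described above might look roughly like the following. It is a sketch, not the PR's `_normalize_video_input`: the function name `normalize_video_source`, the handled input forms, and the timeout value are all assumptions.

```python
import base64
from typing import Union
from urllib.request import urlopen


def normalize_video_source(source: Union[str, bytes, bytearray]) -> Union[str, bytes]:
    """Return raw bytes for in-memory or remote inputs, or a path for local files.

    Hypothetical sketch; the real helper may accept more input forms.
    """
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if source.startswith("data:"):
        # Expected shape: data:video/mp4;base64,<payload>
        header, _, payload = source.partition(",")
        if ";base64" not in header:
            raise ValueError("only base64 data URIs are handled in this sketch")
        return base64.b64decode(payload)
    if source.startswith(("http://", "https://")):
        # Read the whole response into memory instead of writing a temp file.
        with urlopen(source, timeout=10) as resp:
            return resp.read()
    # Anything else is treated as a local file path.
    return source
```

Returning bytes avoids temp-file cleanup, though it holds the whole video in memory, which may matter for very large inputs.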
Review & Test Results (Updated, post-revision)
Environment
Unit Tests (all passed ✅)
End-to-End VLM Inference (passed ✅)
Launched
Test video:
torchcodec import triggers libtorchcodec .so loading which raises RuntimeError (not ImportError) when FFmpeg shared libraries are unavailable (e.g. CI Docker images). Catch both exceptions to gracefully fall back to decord.
torchcodec requires FFmpeg shared libraries at import time. Without them, importing torchcodec raises RuntimeError (not ImportError).
- Add ffmpeg + dev libs to CI apt-get install
- Catch RuntimeError alongside ImportError in video_decoder.py
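The import guard described in these commits can be sketched as follows. The helper name `try_import` is hypothetical; the actual code in video_decoder.py may be structured differently.

```python
import importlib
from types import ModuleType
from typing import Optional


def try_import(name: str) -> Optional[ModuleType]:
    """Import a module, returning None if it is missing or fails to load.

    Hypothetical helper. RuntimeError is caught in addition to ImportError
    because, per the commit message, torchcodec raises RuntimeError when
    the FFmpeg shared libraries cannot be found.
    """
    try:
        return importlib.import_module(name)
    except (ImportError, RuntimeError):
        return None
```

A caller could then write `torchcodec = try_import("torchcodec")` and select the decord path when the result is `None`.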
…irect torchcodec import The test was importing torchcodec directly, causing ModuleNotFoundError in CI environments where torchcodec is not installed (e.g., AMD runners). Switch to VideoDecoderWrapper which handles the torchcodec/decord fallback.
Update vlm_utils.py and llava_onevision_server.py example to use VideoDecoderWrapper instead of importing torchcodec directly, ensuring graceful fallback to decord on platforms where torchcodec is unavailable.
Use torchcodec's AudioDecoder for audio loading, which handles decoding, resampling, and channel conversion in a single step. This aligns audio loading with the video decoding migration to torchcodec (sgl-project#20055) and removes the soundfile/torchaudio dependency from the audio path.
…20055) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
…20055) Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Summary
Replace unmaintained decord with PyTorch-official torchcodec for video decoding.

Why

What
- VideoDecoderWrapper (python/sglang/srt/utils/video_decoder.py): unified adapter that abstracts torchcodec vs decord behind one interface. No _HAS_TORCHCODEC checks scattered in consumer code.
- device="cuda" via torchcodec's beta CUDA backend with cached initialization

Files changed
- video_decoder.py (VideoDecoderWrapper)
- common.py: load_video(), encode_video(), sample_video_frames()
- internvl.py, qwen_vl.py, base_processor.py, nano_nemotron_vl.py
- pyproject*.toml: torchcodec bumped to 0.9.1, decord2 conditional for ARM only

Refs: #3885
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>

Test plan