
[VLM] Replace decord with torchcodec for video decoding#20055

Merged
ispobock merged 17 commits into sgl-project:main from JustinTong0323:fix/decord-to-torchcodec
Mar 9, 2026

@JustinTong0323
Collaborator

@JustinTong0323 JustinTong0323 commented Mar 6, 2026

Summary

Replace unmaintained decord with PyTorch-official torchcodec for video decoding.

Why

  • decord has been unmaintained for 3+ years, causes segfaults and hangs, and its reader is unpicklable
  • torchcodec is actively maintained, supports GPU decoding, and was already a dependency (but unused)

What

  • New VideoDecoderWrapper (python/sglang/srt/utils/video_decoder.py): unified adapter that abstracts torchcodec vs decord behind one interface. No _HAS_TORCHCODEC checks scattered in consumer code.
  • GPU video decoding: supports device="cuda" via torchcodec's beta CUDA backend with cached initialization
  • ARM fallback: graceful fallback to decord on Linux ARM where torchcodec has no prebuilt wheels. Context manager support for reliable temp file cleanup.
  • Simplified input handling: URLs and base64 inputs resolved to bytes directly (torchcodec accepts bytes natively), avoiding unnecessary temp files
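
The adapter idea described above can be sketched as follows. This is not the code from video_decoder.py: `UnifiedVideoReader` and the two fake readers are hypothetical names used only for illustration. The sketch relies on the two real accessors named in this PR, torchcodec's `metadata.average_fps` and decord's `get_avg_fps()`, and hides the difference behind a single `avg_fps` property so consumer code needs no backend checks.

```python
class UnifiedVideoReader:
    """Hypothetical, simplified stand-in for VideoDecoderWrapper."""

    def __init__(self, reader):
        self._reader = reader

    def __len__(self):
        # Both torchcodec's VideoDecoder and decord's VideoReader support len().
        return len(self._reader)

    @property
    def avg_fps(self):
        # torchcodec exposes the frame rate as metadata.average_fps,
        # decord exposes it through the get_avg_fps() method.
        metadata = getattr(self._reader, "metadata", None)
        if metadata is not None:
            return float(metadata.average_fps)
        return float(self._reader.get_avg_fps())


# Tiny fakes so the sketch runs without either library installed.
class FakeTorchcodecReader:
    class _Meta:
        average_fps = 30.0

    metadata = _Meta()

    def __len__(self):
        return 524


class FakeDecordReader:
    def __len__(self):
        return 524

    def get_avg_fps(self):
        return 30.0


readers = [
    UnifiedVideoReader(FakeTorchcodecReader()),
    UnifiedVideoReader(FakeDecordReader()),
]
summary = [(len(r), r.avg_fps) for r in readers]
```

Both entries in `summary` are identical, which is the point of the adapter: callers see one interface regardless of which backend was loaded.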

Files changed

  • New: video_decoder.py (VideoDecoderWrapper)
  • common.py: load_video(), encode_video(), sample_video_frames()
  • Processors: internvl.py, qwen_vl.py, base_processor.py, nano_nemotron_vl.py
  • Tests and examples updated
  • pyproject*.toml: torchcodec bumped to 0.9.1, decord2 conditional for ARM only

Refs: #3885

Co-authored-by: BakerBunker 17872844+BakerBunker@users.noreply.github.com

Test plan

  • Test video loading with torchcodec on x86_64
  • Test GPU video decoding (device="cuda")
  • Verify fallback works when torchcodec is unavailable (ARM)
  • Run VLM tests (chunked prefill, InternVL, Qwen-VL)

JustinTong0323 and others added 2 commits March 6, 2026 18:17
torchcodec 0.8.0 is only compatible with torch 2.6/2.7. Updated to
0.10.0 which supports torch 2.9.x. Also added ARM fallback with av
package to main pyproject.toml (was already present in xpu.toml).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

decord is unmaintained (3+ years), causes segfaults on import order,
hangs on certain videos, and its VideoReader is unpicklable (blocking
multiprocess parallelism). torchcodec is already a declared dependency
but was unused. This commit migrates all video decoding from decord to
torchcodec's VideoDecoder API.

Key changes:
- load_video(), encode_video(), sample_video_frames() in common.py
- All processor files (qwen_vl, internvl, base_processor, nano_nemotron)
- Test and example files
- Remove decord2/decord from all pyproject*.toml dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added dependencies Pull requests that update a dependency file Multi-modal multi-modal language model npu labels Mar 6, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the video decoding infrastructure by transitioning from the unmaintained decord library to the actively developed and PyTorch-official torchcodec. This change addresses stability issues like segfaults and unpicklability associated with decord, enhancing the robustness and maintainability of video processing. A conditional fallback mechanism ensures continued support for Linux ARM architectures where torchcodec binaries are not readily available, maintaining broad compatibility.

Highlights

  • Primary Video Decoder Migration: Replaced the unmaintained decord library with torchcodec as the primary video decoder across the codebase, addressing stability issues and leveraging an actively maintained, PyTorch-official solution.
  • Platform Compatibility: Implemented a graceful fallback mechanism to decord specifically for Linux ARM platforms, where torchcodec currently lacks prebuilt wheels, ensuring continued functionality.
  • Dependency Management Updates: Adjusted pyproject.toml files to conditionally include decord2 only for Linux ARM and updated the torchcodec version to 0.10.0, along with adding av as a conditional dependency for ARM.
  • Codebase Refactoring: Updated video loading, encoding, and frame sampling logic in various multimodal processors (internvl.py, qwen_vl.py, base_processor.py, nano_nemotron_vl.py) and utility functions (common.py) to integrate torchcodec with decord as a fallback.


Changelog
  • examples/runtime/multimodal/llava_onevision_server.py
    • Removed decord imports and added torchcodec.decoders.VideoDecoder.
    • Updated prepare_video_messages to use VideoDecoder for video loading and frame extraction.
  • python/pyproject.toml
    • Modified decord2 dependency to be conditional for Linux ARM platforms.
    • Updated torchcodec version from 0.8.0 to 0.10.0.
    • Added av as a conditional dependency for Linux ARM platforms.
  • python/pyproject_cpu.toml
    • Removed decord dependency.
  • python/pyproject_npu.toml
    • Removed decord2 dependency.
  • python/pyproject_other.toml
    • Removed decord2 dependency.
  • python/pyproject_xpu.toml
    • Updated torchcodec version from 0.8.0 to 0.10.0.
    • Removed decord dependency.
  • python/sglang/check_env.py
    • Replaced decord2 with torchcodec in the list of checked dependencies.
  • python/sglang/srt/multimodal/processors/base_processor.py
    • Replaced direct decord import with conditional import of _HAS_TORCHCODEC.
    • Modified get_estimated_frames_list to use torchcodec.decoders.VideoDecoder if available, otherwise fallback to decord.VideoReader.
  • python/sglang/srt/multimodal/processors/internvl.py
    • Introduced _HAS_TORCHCODEC flag based on torchcodec import success.
    • Updated _open_video_reader to return VideoDecoder if torchcodec is available, otherwise VideoReader.
    • Adjusted video object type checking in process_qwen_mm_data_async to account for torchcodec.VideoDecoder and decord.VideoReader.
    • Changed frame extraction from frame.asnumpy() to frame.numpy() for torchcodec compatibility.
  • python/sglang/srt/multimodal/processors/nano_nemotron_vl.py
    • Added conditional import for VideoDecoder (either torchcodec or decord) based on _HAS_TORCHCODEC.
    • Modified parse_video to handle frame extraction and FPS calculation differently for torchcodec and decord.
  • python/sglang/srt/multimodal/processors/qwen_vl.py
    • Introduced _HAS_TORCHCODEC flag based on torchcodec import success.
    • Updated preprocess_video to use torchcodec.VideoDecoder for frame extraction and FPS calculation when available, falling back to decord.VideoReader.
  • python/sglang/srt/utils/common.py
    • Introduced _HAS_TORCHCODEC flag and conditional import for VideoDecoder.
    • Renamed original load_video to _load_video_decord and made it a fallback.
    • Implemented a new load_video function that prioritizes torchcodec.VideoDecoder and handles various input types (bytes, URLs, data URIs, file paths).
    • Updated sample_video_frames to conditionally get average FPS from torchcodec.metadata.average_fps or decord.get_avg_fps().
    • Modified encode_video to use torchcodec.VideoDecoder for frame sampling and conversion to PIL Images, with decord as a fallback.
  • python/sglang/test/vlm_utils.py
    • Replaced decord import with torchcodec.decoders.VideoDecoder.
    • Updated prepare_video_images_messages to use VideoDecoder for video loading and frame extraction.
  • test/registered/vlm/test_vision_chunked_prefill.py
    • Replaced decord import with torchcodec.decoders.VideoDecoder.
    • Updated prepare_video_messages to use VideoDecoder for video loading and frame extraction.
Activity
  • Test video loading with torchcodec on x86_64.
  • Verify fallback mechanism works when torchcodec is unavailable.
  • Run VLM tests including chunked prefill, InternVL, and Qwen-VL.

torchcodec has no prebuilt wheels for Linux aarch64/arm64. Add
graceful fallback to decord on those platforms so video loading
still works. decord2 is kept as a conditional dependency for
Linux ARM only.
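
A runtime check mirroring the "Linux ARM only" condition might look like the sketch below. `is_linux_arm` is a hypothetical helper, not a function from this PR, and the machine names it accepts (aarch64, arm64) are taken from the commit message above.

```python
import platform


def is_linux_arm(system=None, machine=None):
    """Return True on Linux aarch64/arm64, where the decord fallback applies.

    Arguments default to the running interpreter's platform; they are
    overridable so the check can be exercised on any machine.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Linux" and machine.lower() in {"aarch64", "arm64"}
```

Calling `is_linux_arm()` with no arguments inspects the current host; passing explicit values makes the branch easy to cover in tests on x86_64 runners.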
@JustinTong0323 JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 9507c13 to 42f56ac Compare March 6, 2026 18:45
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request successfully replaces the unmaintained decord library with the officially supported torchcodec for video decoding, providing a fallback for ARM Linux platforms where torchcodec is not available. The changes are consistently applied across the codebase, including examples, tests, and dependency configurations. My review focuses on improving maintainability by reducing code duplication in several areas and addressing potential ZeroDivisionError bugs that could arise from video files with a reported framerate of zero.
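
The zero-framerate concern raised in this review can be illustrated with a small guard. This is a sketch of one possible mitigation, not the fix that was applied in the PR; `safe_duration_seconds` is a hypothetical helper name.

```python
def safe_duration_seconds(num_frames, avg_fps):
    """Return the clip duration in seconds, or None if the FPS is unusable.

    Some containers report an average frame rate of 0 (or nothing at all),
    so dividing by it directly would raise ZeroDivisionError.
    """
    if not avg_fps or avg_fps <= 0:
        return None
    return num_frames / avg_fps
```

Returning `None` pushes the decision to the caller, which can then fall back to a default frame rate or reject the input explicitly.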

Note: Security Review did not run due to the size of the PR.

Comment thread python/sglang/srt/multimodal/processors/qwen_vl.py Outdated
Comment thread python/sglang/srt/multimodal/processors/nano_nemotron_vl.py Outdated
Comment thread python/sglang/srt/utils/common.py Outdated
Comment thread python/sglang/srt/utils/common.py Outdated
Comment thread python/sglang/srt/utils/common.py Outdated
@JustinTong0323 JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 0062bf7 to c09c35f Compare March 6, 2026 19:17
- VideoDecoderWrapper accepts device="cuda" for torchcodec GPU decoding
  via set_cuda_backend("beta"), falls back to CPU silently
- _normalize_video_input returns bytes for URLs/base64 instead of temp
  files, since both torchcodec and the wrapper handle bytes natively
- load_video passes use_gpu through to wrapper device parameter
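
The bytes-first input handling described in this commit could be sketched roughly as below. The real `_normalize_video_input` is not shown in this conversation, so `normalize_video_input`, its `timeout` parameter, and the restriction to `data:` URIs with a `;base64` marker are assumptions made for illustration.

```python
import base64
import urllib.request


def normalize_video_input(video, timeout=10.0):
    """Resolve a video argument to raw bytes where possible.

    - bytes / bytearray: returned as bytes
    - base64 data URI:   decoded to bytes
    - http(s) URL:       downloaded to bytes
    - anything else:     treated as a file path and passed through unchanged
    """
    if isinstance(video, (bytes, bytearray)):
        return bytes(video)
    if video.startswith("data:"):
        header, sep, payload = video.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("only base64 data URIs are handled in this sketch")
        return base64.b64decode(payload)
    if video.startswith(("http://", "https://")):
        with urllib.request.urlopen(video, timeout=timeout) as response:
            return response.read()
    return video
```

Because the decoder accepts bytes directly, none of these branches needs a temporary file, which is the simplification the commit message describes.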
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Mar 6, 2026
@JustinTong0323 JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 67c60dc to fa6005e Compare March 6, 2026 19:30
@JustinTong0323
Collaborator Author

JustinTong0323 commented Mar 6, 2026

Review & Test Results (Updated — post-revision)

Environment

  • GPU: NVIDIA B200 (2x, GPUs 6-7)
  • Python 3.12, torch 2.9.1+cu128, torchcodec 0.9.1, flashinfer 0.6.4

Unit Tests (all passed ✅)

  1. torchcodec import & basic API (VideoDecoder, get_frames_at, metadata.average_fps) ✅
  2. VideoDecoderWrapper: file path ✅, bytes ✅, len() ✅, avg_fps ✅, get_frames_at() ✅, __getitem__ ✅
  3. Context manager (with VideoDecoderWrapper(...) as d) — ✅
  4. CUDA backend caching (_cuda_backend_enabled global) — ✅
  5. load_video() — file path ✅, bytes ✅, URL ✅
  6. encode_video() — returns PIL frames ✅
  7. sample_video_frames() — frame sampling ✅
  8. _normalize_video_input() — URL→bytes ✅, base64→bytes ✅, file path passthrough ✅

End-to-End VLM Inference (passed ✅)

Launched lmms-lab/llava-onevision-qwen2-7b-ov server and tested:

  • /v1/chat/completions with 8 torchcodec-decoded video frames → 200 OK, coherent video description
  • /generate endpoint with 4 frames via base64 → 200 OK, correct output

Test video: jobs.mp4 (524 frames, 30fps, 1024×576)
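
The end-to-end runs above pass a handful of frames (8 and 4) taken from a 524-frame clip. The conversation does not show how the frame indices are chosen, so the uniform spacing below is an assumption and `uniform_frame_indices` is a hypothetical helper, not the repository's `sample_video_frames()`.

```python
def uniform_frame_indices(total_frames, num_samples):
    """Pick num_samples frame indices spread evenly over [0, total_frames - 1]."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Fewer frames than requested: return every frame once.
        return list(range(total_frames))
    if num_samples == 1:
        return [0]
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


# Indices for 8 frames out of the 524-frame test clip mentioned above.
indices = uniform_frame_indices(524, 8)
```

A list like `indices` is the kind of argument that would be handed to a batched frame lookup such as `get_frames_at()`.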

@JustinTong0323
Collaborator Author

/tag-run-ci-label

torchcodec import triggers libtorchcodec .so loading which raises
RuntimeError (not ImportError) when FFmpeg shared libraries are
unavailable (e.g. CI Docker images). Catch both exceptions to
gracefully fall back to decord.

torchcodec requires FFmpeg shared libraries at import time. Without
them, importing torchcodec raises RuntimeError (not ImportError).

- Add ffmpeg + dev libs to CI apt-get install
- Catch RuntimeError alongside ImportError in video_decoder.py
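
The import guard described in these commits can be sketched generically. `detect_video_backend` is a hypothetical helper rather than the code in video_decoder.py; the only behavior taken from the PR is that `RuntimeError` is caught alongside `ImportError`, since a missing FFmpeg shared library surfaces as the former.

```python
import importlib


def detect_video_backend(candidates=("torchcodec", "decord")):
    """Return the name of the first backend that imports cleanly, else None."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except (ImportError, RuntimeError):
            # ImportError: package not installed.
            # RuntimeError: package installed but its native libraries
            # (for torchcodec, FFmpeg) could not be loaded.
            continue
        return name
    return None
```

The candidate order encodes the preference: torchcodec first, decord only as the fallback.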
@JustinTong0323 JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 990330a to 2c8198b Compare March 6, 2026 20:37
JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Mar 6, 2026
JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Mar 6, 2026
@JustinTong0323 JustinTong0323 enabled auto-merge (squash) March 6, 2026 21:28
@JustinTong0323
Collaborator Author

/rerun-failed-ci

…irect torchcodec import

The test was importing torchcodec directly, causing ModuleNotFoundError
in CI environments where torchcodec is not installed (e.g., AMD runners).
Switch to VideoDecoderWrapper which handles the torchcodec/decord fallback.

Update vlm_utils.py and llava_onevision_server.py example to use
VideoDecoderWrapper instead of importing torchcodec directly, ensuring
graceful fallback to decord on platforms where torchcodec is unavailable.
@JustinTong0323
Collaborator Author

/rerun-failed-ci

@yuan-luo
Collaborator

yuan-luo commented Mar 8, 2026

/tag-and-rerun-ci

@ispobock ispobock disabled auto-merge March 9, 2026 11:23
@ispobock ispobock merged commit 4a75799 into sgl-project:main Mar 9, 2026
677 of 765 checks passed
JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Mar 9, 2026
Use torchcodec's AudioDecoder for audio loading, which handles decoding,
resampling, and channel conversion in a single step. This aligns audio
loading with the video decoding migration to torchcodec (sgl-project#20055) and
removes the soundfile/torchaudio dependency from the audio path.
liubiyongge pushed a commit to liubiyongge/sglang that referenced this pull request Mar 13, 2026
…20055)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
…20055)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
…20055)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
…20055)

Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>

Labels

dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation Multi-modal multi-modal language model npu run-ci


4 participants