[VLM] Replace decord with torchcodec for video decoding by JustinTong0323 · Pull Request #20055 · sgl-project/sglang

JustinTong0323 · 2026-03-06T18:33:10Z

Summary

Replace unmaintained decord with PyTorch-official torchcodec for video decoding.

Why

decord has been unmaintained for 3+ years, causes segfaults and hangs, and is unpicklable
torchcodec is actively maintained, supports GPU decoding, and was already a declared (but unused) dependency

What

New VideoDecoderWrapper (python/sglang/srt/utils/video_decoder.py): unified adapter that abstracts torchcodec vs decord behind one interface. No _HAS_TORCHCODEC checks scattered in consumer code.
GPU video decoding: supports device="cuda" via torchcodec's beta CUDA backend with cached initialization
ARM fallback: graceful fallback to decord on Linux ARM where torchcodec has no prebuilt wheels. Context manager support for reliable temp file cleanup.
Simplified input handling: URLs and base64 inputs resolved to bytes directly (torchcodec accepts bytes natively), avoiding unnecessary temp files
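The adapter idea in the list above can be illustrated with a minimal sketch. This is not the code from video_decoder.py: the class name `VideoDecoderWrapperSketch` and both fake backends are assumptions made for illustration, and frames are plain strings rather than tensors. The torchcodec-style stand-in exposes `metadata.average_fps` and `get_frames_at`, the decord-style stand-in exposes `get_avg_fps()` and `get_batch()`, matching the names mentioned elsewhere in this PR.

```python
from types import SimpleNamespace


class FakeTorchcodecDecoder:
    """Stand-in with torchcodec-style names: metadata.average_fps, get_frames_at."""

    def __init__(self, num_frames, fps):
        self._frames = [f"frame{i}" for i in range(num_frames)]
        self.metadata = SimpleNamespace(average_fps=fps, num_frames=num_frames)

    def __len__(self):
        return len(self._frames)

    def get_frames_at(self, indices):
        return [self._frames[i] for i in indices]


class FakeDecordReader:
    """Stand-in with decord-style names: get_avg_fps(), get_batch()."""

    def __init__(self, num_frames, fps):
        self._frames = [f"frame{i}" for i in range(num_frames)]
        self._fps = fps

    def __len__(self):
        return len(self._frames)

    def get_avg_fps(self):
        return self._fps

    def get_batch(self, indices):
        return [self._frames[i] for i in indices]


class VideoDecoderWrapperSketch:
    """Hypothetical adapter: one interface over either backend."""

    def __init__(self, backend):
        self._backend = backend
        # Detect the backend flavour once, so callers never branch on it.
        self._is_torchcodec_like = hasattr(backend, "get_frames_at")

    def __len__(self):
        return len(self._backend)

    @property
    def avg_fps(self):
        if self._is_torchcodec_like:
            return self._backend.metadata.average_fps
        return self._backend.get_avg_fps()

    def get_frames_at(self, indices):
        if self._is_torchcodec_like:
            return self._backend.get_frames_at(indices)
        return self._backend.get_batch(indices)

    def __getitem__(self, index):
        return self.get_frames_at([index])[0]


# Consumer code is identical for both backends.
for backend in (FakeTorchcodecDecoder(10, 30.0), FakeDecordReader(10, 30.0)):
    decoder = VideoDecoderWrapperSketch(backend)
    print(len(decoder), decoder.avg_fps, decoder.get_frames_at([0, 5]), decoder[9])
```

Keeping the backend check inside the wrapper is what lets the processors drop their own `_HAS_TORCHCODEC` branches.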

Files changed

New: video_decoder.py (VideoDecoderWrapper)
common.py: load_video(), encode_video(), sample_video_frames()
Processors: internvl.py, qwen_vl.py, base_processor.py, nano_nemotron_vl.py
Tests and examples updated
pyproject*.toml: torchcodec bumped to 0.9.1, decord2 conditional for ARM only

Refs: #3885

Co-authored-by: BakerBunker 17872844+BakerBunker@users.noreply.github.com

Test plan

Test video loading with torchcodec on x86_64
Test GPU video decoding (device="cuda")
Verify fallback works when torchcodec is unavailable (ARM)
Run VLM tests (chunked prefill, InternVL, Qwen-VL)

torchcodec 0.8.0 is only compatible with torch 2.6/2.7. Updated to 0.10.0, which supports torch 2.9.x. Also added an ARM fallback with the av package to the main pyproject.toml (it was already present in xpu.toml).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

decord is unmaintained (3+ years), causes segfaults depending on import order, hangs on certain videos, and its VideoReader is unpicklable (blocking multiprocess parallelism). torchcodec is already a declared dependency but was unused. This commit migrates all video decoding from decord to torchcodec's VideoDecoder API.

Key changes:
- load_video(), encode_video(), sample_video_frames() in common.py
- All processor files (qwen_vl, internvl, base_processor, nano_nemotron)
- Test and example files
- Remove decord2/decord from all pyproject*.toml dependencies

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-03-06T18:35:53Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the video decoding infrastructure by transitioning from the unmaintained decord library to the actively developed and PyTorch-official torchcodec. This change addresses stability issues like segfaults and unpicklability associated with decord, enhancing the robustness and maintainability of video processing. A conditional fallback mechanism ensures continued support for Linux ARM architectures where torchcodec binaries are not readily available, maintaining broad compatibility.

Highlights

Primary Video Decoder Migration: Replaced the unmaintained decord library with torchcodec as the primary video decoder across the codebase, addressing stability issues and leveraging an actively maintained, PyTorch-official solution.
Platform Compatibility: Implemented a graceful fallback mechanism to decord specifically for Linux ARM platforms, where torchcodec currently lacks prebuilt wheels, ensuring continued functionality.
Dependency Management Updates: Adjusted pyproject.toml files to conditionally include decord2 only for Linux ARM and updated the torchcodec version to 0.10.0, along with adding av as a conditional dependency for ARM.
Codebase Refactoring: Updated video loading, encoding, and frame sampling logic in various multimodal processors (internvl.py, qwen_vl.py, base_processor.py, nano_nemotron_vl.py) and utility functions (common.py) to integrate torchcodec with decord as a fallback.

Changelog

examples/runtime/multimodal/llava_onevision_server.py
- Removed decord imports and added torchcodec.decoders.VideoDecoder.
- Updated prepare_video_messages to use VideoDecoder for video loading and frame extraction.
python/pyproject.toml
- Modified decord2 dependency to be conditional for Linux ARM platforms.
- Updated torchcodec version from 0.8.0 to 0.10.0.
- Added av as a conditional dependency for Linux ARM platforms.
python/pyproject_cpu.toml
- Removed decord dependency.
python/pyproject_npu.toml
- Removed decord2 dependency.
python/pyproject_other.toml
- Removed decord2 dependency.
python/pyproject_xpu.toml
- Updated torchcodec version from 0.8.0 to 0.10.0.
- Removed decord dependency.
python/sglang/check_env.py
- Replaced decord2 with torchcodec in the list of checked dependencies.
python/sglang/srt/multimodal/processors/base_processor.py
- Replaced direct decord import with conditional import of _HAS_TORCHCODEC.
- Modified get_estimated_frames_list to use torchcodec.decoders.VideoDecoder if available, otherwise fallback to decord.VideoReader.
python/sglang/srt/multimodal/processors/internvl.py
- Introduced _HAS_TORCHCODEC flag based on torchcodec import success.
- Updated _open_video_reader to return VideoDecoder if torchcodec is available, otherwise VideoReader.
- Adjusted video object type checking in process_qwen_mm_data_async to account for torchcodec.VideoDecoder and decord.VideoReader.
- Changed frame extraction from frame.asnumpy() to frame.numpy() for torchcodec compatibility.
python/sglang/srt/multimodal/processors/nano_nemotron_vl.py
- Added conditional import for VideoDecoder (either torchcodec or decord) based on _HAS_TORCHCODEC.
- Modified parse_video to handle frame extraction and FPS calculation differently for torchcodec and decord.
python/sglang/srt/multimodal/processors/qwen_vl.py
- Introduced _HAS_TORCHCODEC flag based on torchcodec import success.
- Updated preprocess_video to use torchcodec.VideoDecoder for frame extraction and FPS calculation when available, falling back to decord.VideoReader.
python/sglang/srt/utils/common.py
- Introduced _HAS_TORCHCODEC flag and conditional import for VideoDecoder.
- Renamed original load_video to _load_video_decord and made it a fallback.
- Implemented a new load_video function that prioritizes torchcodec.VideoDecoder and handles various input types (bytes, URLs, data URIs, file paths).
- Updated sample_video_frames to conditionally get average FPS from torchcodec.metadata.average_fps or decord.get_avg_fps().
- Modified encode_video to use torchcodec.VideoDecoder for frame sampling and conversion to PIL Images, with decord as a fallback.
python/sglang/test/vlm_utils.py
- Replaced decord import with torchcodec.decoders.VideoDecoder.
- Updated prepare_video_images_messages to use VideoDecoder for video loading and frame extraction.
test/registered/vlm/test_vision_chunked_prefill.py
- Replaced decord import with torchcodec.decoders.VideoDecoder.
- Updated prepare_video_messages to use VideoDecoder for video loading and frame extraction.
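Several of the entries above come down to choosing which frame indices to request from the decoder. A minimal pure-Python sketch of uniform index selection follows. The function name `uniform_frame_indices` and its parameters are hypothetical and do not reflect the actual signature of sample_video_frames() or prepare_video_messages().

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick up to num_samples indices spread evenly over [0, total_frames - 1]."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Fewer frames than requested samples: return every frame once.
        return list(range(total_frames))
    if num_samples == 1:
        return [0]
    # Spacing chosen so the first and last frames are always included.
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


# Example with the frame count of the test video mentioned in this PR.
print(uniform_frame_indices(524, 8))
```

The resulting list is the kind of argument one would pass to a batched frame getter such as the wrapper's get_frames_at().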

Activity

Test video loading with torchcodec on x86_64.
Verify fallback mechanism works when torchcodec is unavailable.
Run VLM tests including chunked prefill, InternVL, and Qwen-VL.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

torchcodec has no prebuilt wheels for Linux aarch64/arm64. Add graceful fallback to decord on those platforms so video loading still works. decord2 is kept as a conditional dependency for Linux ARM only.

gemini-code-assist

Code Review

This pull request successfully replaces the unmaintained decord library with the officially supported torchcodec for video decoding, providing a fallback for ARM Linux platforms where torchcodec is not available. The changes are consistently applied across the codebase, including examples, tests, and dependency configurations. My review focuses on improving maintainability by reducing code duplication in several areas and addressing potential ZeroDivisionError bugs that could arise from video files with a reported framerate of zero.

Note: Security Review did not run due to the size of the PR.
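The zero-framerate hazard raised in the review can be sketched as follows. This is a hypothetical helper, not the guard added in the PR: the name `video_duration_seconds` and the choice of 0.0 as the fallback value are assumptions.

```python
from typing import Optional


def video_duration_seconds(num_frames: int, avg_fps: Optional[float]) -> float:
    """Return the duration in seconds, or 0.0 when the reported frame rate is unusable."""
    # Some containers report a missing or zero frame rate; dividing by it would raise.
    if not avg_fps or avg_fps <= 0:
        return 0.0
    return num_frames / avg_fps


print(video_duration_seconds(524, 30.0))
print(video_duration_seconds(524, 0.0))
```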

Co-Authored-By: BakerBunker <17872844+BakerBunker@users.noreply.github.com>

- VideoDecoderWrapper accepts device="cuda" for torchcodec GPU decoding via set_cuda_backend("beta"), and silently falls back to CPU
- _normalize_video_input returns bytes for URLs/base64 instead of temp files, since both torchcodec and the wrapper handle bytes natively
- load_video passes use_gpu through to the wrapper's device parameter
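The "probe once, then fall back to CPU silently" behaviour described above might look roughly like the sketch below. It deliberately avoids calling torchcodec: the `probe` callable stands in for whatever enables the CUDA backend, and `CudaBackendCache` is a hypothetical name. The test notes in this PR refer to a module-level `_cuda_backend_enabled` flag, so the class here is only a testable stand-in for that idea.

```python
from typing import Callable, Optional


class CudaBackendCache:
    """Remember whether GPU decoding could be enabled so the probe runs only once."""

    def __init__(self, probe: Callable[[], None]):
        self._probe = probe
        self._enabled: Optional[bool] = None  # None means "not probed yet"

    def resolve(self, requested: str) -> str:
        """Map the requested device to the device that will actually be used."""
        if requested != "cuda":
            return "cpu"
        if self._enabled is None:
            try:
                self._probe()
                self._enabled = True
            except Exception:
                # Silent fallback: any probe failure means CPU decoding.
                self._enabled = False
        return "cuda" if self._enabled else "cpu"


def failing_probe() -> None:
    raise RuntimeError("no CUDA decoding available")


cache = CudaBackendCache(failing_probe)
print(cache.resolve("cuda"), cache.resolve("cpu"))
```

Caching the result avoids repeating a potentially slow or noisy initialization for every video.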

JustinTong0323 · 2026-03-06T19:38:14Z

Review & Test Results (Updated — post-revision)

Environment

GPU: NVIDIA B200 (2x, GPUs 6-7)
Python 3.12, torch 2.9.1+cu128, torchcodec 0.9.1, flashinfer 0.6.4

Unit Tests (all passed ✅)

torchcodec import & basic API — VideoDecoder, get_frames_at, metadata.average_fps ✅
VideoDecoderWrapper — file path ✅, bytes ✅, len() ✅, avg_fps ✅, get_frames_at() ✅, __getitem__ ✅
Context manager (with VideoDecoderWrapper(...) as d) — ✅
CUDA backend caching (_cuda_backend_enabled global) — ✅
load_video() — file path ✅, bytes ✅, URL ✅
encode_video() — returns PIL frames ✅
sample_video_frames() — frame sampling ✅
_normalize_video_input() — URL→bytes ✅, base64→bytes ✅, file path passthrough ✅
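The three input cases checked in the last item can be sketched with the standard library alone. This is an illustration, not the body of `_normalize_video_input()`: the function name, the `fetch` parameter, and the assumption that base64 input arrives as a `data:` URI are all hypothetical.

```python
import base64
from typing import Callable, Union
from urllib.request import urlopen


def _fetch_url(url: str) -> bytes:
    """Download the resource at url and return its raw bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def normalize_video_input_sketch(
    video: Union[str, bytes, bytearray],
    fetch: Callable[[str], bytes] = _fetch_url,
) -> Union[str, bytes]:
    """Return bytes for in-memory, URL, and base64 inputs; pass file paths through."""
    if isinstance(video, (bytes, bytearray)):
        return bytes(video)
    if video.startswith(("http://", "https://")):
        return fetch(video)
    if video.startswith("data:"):
        header, _, payload = video.partition(",")
        if ";base64" not in header:
            raise ValueError("only base64-encoded data URIs are handled in this sketch")
        return base64.b64decode(payload)
    # Anything else is treated as a filesystem path and left untouched.
    return video


encoded = base64.b64encode(b"fake video bytes").decode("ascii")
print(normalize_video_input_sketch(f"data:video/mp4;base64,{encoded}"))
print(normalize_video_input_sketch("/tmp/example.mp4"))
```

Making `fetch` injectable keeps the function testable without network access.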

End-to-End VLM Inference (passed ✅)

Launched lmms-lab/llava-onevision-qwen2-7b-ov server and tested:

/v1/chat/completions with 8 torchcodec-decoded video frames → 200 OK, coherent video description
/generate endpoint with 4 frames via base64 → 200 OK, correct output

Test video: jobs.mp4 (524 frames, 30fps, 1024×576)

JustinTong0323 · 2026-03-06T20:19:30Z

/tag-run-ci-label

torchcodec import triggers libtorchcodec .so loading which raises RuntimeError (not ImportError) when FFmpeg shared libraries are unavailable (e.g. CI Docker images). Catch both exceptions to gracefully fall back to decord.

torchcodec requires FFmpeg shared libraries at import time. Without them, importing torchcodec raises RuntimeError (not ImportError).
- Add ffmpeg + dev libs to CI apt-get install
- Catch RuntimeError alongside ImportError in video_decoder.py
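The import guard described above can be sketched as a small helper. The name `optional_import` and the injectable `loader` are assumptions for illustration. The only detail taken from this PR is that both ImportError and RuntimeError are treated as "backend unavailable".

```python
import importlib
from types import ModuleType
from typing import Callable, Optional


def optional_import(
    name: str,
    loader: Callable[[str], ModuleType] = importlib.import_module,
) -> Optional[ModuleType]:
    """Return the imported module, or None if it is missing or fails while loading."""
    try:
        return loader(name)
    except (ImportError, RuntimeError):
        # ImportError: package not installed.
        # RuntimeError: package installed, but its native libraries failed to load.
        return None


def broken_loader(name: str) -> ModuleType:
    """Simulate a package whose shared libraries cannot be loaded."""
    raise RuntimeError(f"could not load shared libraries for {name}")


print(optional_import("json") is not None)
print(optional_import("torchcodec", loader=broken_loader))
```

A caller would then pick the fallback decoder whenever the helper returns None.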

JustinTong0323 · 2026-03-06T21:47:30Z

/rerun-failed-ci

JustinTong0323 · 2026-03-06T21:50:00Z

/rerun-failed-ci

JustinTong0323 · 2026-03-06T21:55:43Z

/rerun-failed-ci

JustinTong0323 · 2026-03-06T22:56:45Z

/rerun-failed-ci

…irect torchcodec import

The test was importing torchcodec directly, causing ModuleNotFoundError in CI environments where torchcodec is not installed (e.g., AMD runners). Switch to VideoDecoderWrapper which handles the torchcodec/decord fallback.

Update vlm_utils.py and llava_onevision_server.py example to use VideoDecoderWrapper instead of importing torchcodec directly, ensuring graceful fallback to decord on platforms where torchcodec is unavailable.

JustinTong0323 · 2026-03-07T00:30:45Z

/rerun-failed-ci

JustinTong0323 · 2026-03-07T01:20:52Z

/rerun-failed-ci

JustinTong0323 · 2026-03-07T03:23:38Z

/rerun-failed-ci

JustinTong0323 · 2026-03-07T08:22:51Z

/rerun-failed-ci

yuan-luo · 2026-03-08T01:55:29Z

/tag-and-rerun-ci

Use torchcodec's AudioDecoder for audio loading, which handles decoding, resampling, and channel conversion in a single step. This aligns audio loading with the video decoding migration to torchcodec (sgl-project#20055) and removes the soundfile/torchaudio dependency from the audio path.

JustinTong0323 and others added 2 commits March 6, 2026 18:17

github-actions Bot added the dependencies, Multi-modal, and npu labels Mar 6, 2026

Add decord fallback for platforms without torchcodec (Linux ARM)

42f56ac

JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 9507c13 to 42f56ac Compare March 6, 2026 18:45

gemini-code-assist Bot reviewed Mar 6, 2026

JustinTong0323 and others added 2 commits March 6, 2026 19:10

Address review: reduce duplication and add ZeroDivisionError guards

39fc7bd

Refactor: unified VideoDecoderWrapper to replace _HAS_TORCHCODEC checks

c09c35f

JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 0062bf7 to c09c35f Compare March 6, 2026 19:17

JustinTong0323 added 2 commits March 6, 2026 19:17

Fix torchcodec version: 0.9.1 for torch 2.9.x (0.10.0 requires torch 2.10)

b9b0be1

github-actions Bot added the documentation label Mar 6, 2026

Fix lint: isort, ruff, black formatting

fa6005e

JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 67c60dc to fa6005e Compare March 6, 2026 19:30

Cache CUDA backend check and add context manager for temp file cleanup

0eaf63b
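The temp file cleanup named in this commit can be sketched with a small context manager. This is an assumption-laden illustration rather than the wrapper's implementation: `TempVideoFile` is a hypothetical name, and the premise is that some backend needs a filesystem path for in-memory video bytes.

```python
import os
import tempfile
from typing import Optional


class TempVideoFile:
    """Write in-memory video bytes to a temp file and delete it on exit."""

    def __init__(self, data: bytes, suffix: str = ".mp4"):
        self._data = data
        self._suffix = suffix
        self.path: Optional[str] = None

    def __enter__(self) -> str:
        fd, self.path = tempfile.mkstemp(suffix=self._suffix)
        with os.fdopen(fd, "wb") as handle:
            handle.write(self._data)
        return self.path

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and when the body raises, so the file never leaks.
        if self.path and os.path.exists(self.path):
            os.remove(self.path)
        self.path = None
        return False  # do not swallow exceptions


with TempVideoFile(b"fake video bytes") as path:
    print(os.path.exists(path))
print(os.path.exists(path))
```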

JustinTong0323 mentioned this pull request Mar 6, 2026

[Bugfix] Bump torchcodec 0.8.0 to 0.9.1 for torch 2.9.x compatibility #20053

Closed

2 tasks

JustinTong0323 marked this pull request as ready for review March 6, 2026 20:19

JustinTong0323 requested review from Fridge003, ispobock, merrymercy, mickqian, yhyang201 and yuan-luo as code owners March 6, 2026 20:19

github-actions Bot added the run-ci label Mar 6, 2026

JustinTong0323 mentioned this pull request Mar 6, 2026

Use torchcodec for audio and video loading #20057

Closed

5 tasks

JustinTong0323 added 2 commits March 6, 2026 20:37

Fix torchcodec import when FFmpeg is missing

6cc589f

JustinTong0323 force-pushed the fix/decord-to-torchcodec branch from 990330a to 2c8198b Compare March 6, 2026 20:37

JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Mar 6, 2026

Revert pyproject changes (belongs in PR sgl-project#20055, not here)

a86cc81

JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Mar 6, 2026

Revert pyproject changes (belongs in PR sgl-project#20055, not here)

9ea0eba

Fix test_video_utils: DummyVideo.get_avg_fps() -> avg_fps property

c75b73c

JustinTong0323 enabled auto-merge (squash) March 6, 2026 21:28

JustinTong0323 added 3 commits March 6, 2026 23:24

Fix remaining direct torchcodec imports to use VideoDecoderWrapper

abfce7b

Fix isort: add blank line before sglang import in example

1566926

yhyang201 approved these changes Mar 7, 2026

yuan-luo approved these changes Mar 8, 2026

ispobock disabled auto-merge March 9, 2026 11:23

ispobock merged commit 4a75799 into sgl-project:main Mar 9, 2026
677 of 765 checks passed

JustinTong0323 mentioned this pull request Mar 9, 2026

Replace soundfile+torchaudio with torchcodec AudioDecoder in load_audio #20190

Merged

8 tasks

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

[VLM] Replace decord with torchcodec for video decoding (sgl-project#20055)

b26dfc0

Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>

4 participants