[Diffusion] LTX-2 Support PR1 #17495
Conversation
Summary of Changes

Hello @gmixiaojin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request integrates full support for the LTX-2 Video & Audio Joint model, allowing the system to handle and generate both video and audio data seamlessly. The changes span core components, from model loading and positional embeddings to the entire diffusion pipeline, ensuring that multimodal generation requests are processed efficiently and produce synchronized outputs.
Code Review
This pull request adds support for the LTX-2 model, enabling joint video and audio generation. The changes are extensive, introducing new pipeline stages, component loaders, and updating data structures to handle audio. The implementation appears solid and well-structured. I've found a potential bug in the text encoding stage that could lead to missing attention masks, and a couple of areas for code improvement related to style and maintainability. Overall, this is a great contribution.
```python
if batch.prompt_attention_mask is None:
    batch.prompt_attention_mask = []
    for am in prompt_masks_list:
        batch.prompt_attention_mask.append(am)
```
The current logic for appending attention masks will fail if `batch.prompt_attention_mask` is already an empty list (`[]`) rather than `None`, since the `if` condition would be false and the loop would be skipped. A more robust approach is to ensure the list exists and then extend it:
```python
if batch.prompt_attention_mask is None:
    batch.prompt_attention_mask = []
batch.prompt_attention_mask.extend(prompt_masks_list)
```

```python
if batch.negative_attention_mask is None:
    batch.negative_attention_mask = []
    for nm in neg_masks_list:
        batch.negative_attention_mask.append(nm)
```
Similar to `prompt_attention_mask`, the logic for `negative_attention_mask` is not robust: if `batch.negative_attention_mask` is an empty list, the new masks won't be added. It's better to ensure the list exists and then extend it.
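The failure mode called out in both comments can be reproduced in isolation. In this sketch the helper names and string masks are stand-ins for illustration, not code from the PR:

```python
def append_masks_buggy(existing, new_masks):
    # Mirrors the reviewed code: the append loop is guarded by the None
    # check, so an already-initialized empty list never receives new masks.
    if existing is None:
        existing = []
        for m in new_masks:
            existing.append(m)
    return existing


def append_masks_robust(existing, new_masks):
    # Suggested shape: ensure the list exists, then always extend it.
    if existing is None:
        existing = []
    existing.extend(new_masks)
    return existing


masks = ["mask_a", "mask_b"]  # stand-ins for attention-mask tensors
print(append_masks_buggy(None, masks))   # ['mask_a', 'mask_b']
print(append_masks_buggy([], masks))     # [] -- the masks are silently dropped
print(append_masks_robust([], masks))    # ['mask_a', 'mask_b']
```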
```python
if batch.negative_attention_mask is None:
    batch.negative_attention_mask = []
batch.negative_attention_mask.extend(neg_masks_list)
```

```diff
 def post_process_sample(
-    sample: torch.Tensor,
+    sample: Any,
     data_type: DataType,
     fps: int,
     save_output: bool = True,
-    save_file_path: str = None,
+    save_file_path: Optional[str] = None,
+    audio_sample_rate: Optional[int] = None,
 ):
```
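One detail in this signature change is the move from `save_file_path: str = None` to `Optional[str] = None`: a `None` default is not a valid `str`, so type checkers such as mypy flag the former. A minimal, self-contained illustration (the function name and fallback value here are hypothetical):

```python
from typing import Optional


# `path: str = None` would be rejected by type checkers, since None is not
# a str. Optional[str] states the real contract: a string, or None to
# request a default path.
def resolve_save_path(path: Optional[str] = None) -> str:
    if path is None:
        path = "output.mp4"  # hypothetical fallback, for illustration only
    return path


print(resolve_save_path())            # output.mp4
print(resolve_save_path("clip.mp4"))  # clip.mp4
```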
The `post_process_sample` function contains several local imports (`numpy`, `shutil`, `subprocess`, `tempfile`, `scipy.io.wavfile`, `imageio_ffmpeg`). It's generally better practice to place all imports at the top of the file for clarity, consistency, and to avoid potential overhead from repeated imports if this function is called in a loop. Please consider moving these imports to the top of `python/sglang/multimodal_gen/runtime/entrypoints/utils.py`.
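As an aside on the overhead point: after the first execution, a repeated local import is re-resolved through `sys.modules` on every call (cheap, but nonzero), and it always yields the same module object as a top-level import. A small stand-alone sketch using `json` as a stand-in module:

```python
import timeit

import json as json_top  # bound once at module load


def with_local_import():
    # The import statement runs on every call; after the first call it is a
    # sys.modules lookup plus a name binding rather than a full module load,
    # but that still costs a little each time.
    import json
    return json


def with_top_level_import():
    return json_top


# Both paths yield the very same module object.
assert with_local_import() is with_top_level_import()

# Timing numbers vary by machine; the local-import variant is the slower one.
print(timeit.timeit(with_local_import, number=100_000))
print(timeit.timeit(with_top_level_import, number=100_000))
```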
```python
    return model


class AdapterLoader(ComponentLoader):
```
This file is too large; consider splitting it into a `component_loader` folder in a follow-up.
Sure, I think we will follow up after the PR.
Could you put the stages that are used only by LTX-2 here first? You could create an `ltx_2` folder and move them there.
As we discussed previously, we will refactor the code once the two PRs are merged.
Lint is also fixed @yhyang201 @mickqian
Remove `decoding_av.py`, `denoising_av.py`, `latent_preparation_av.py`, and `text_connector.py` from this PR, as they should be in PR2.
/tag-and-rerun-ci
/tag-and-rerun-ci |
Co-authored-by: FlamingoPg <1106310035@qq.com>
Motivation
Support LTX-2 Video & Audio Joint model.
Contribution
Review Process
`/tag-run-ci-label`, `/rerun-failed-ci`, `/tag-and-rerun-ci`