
[Diffusion] LTX-2 Support PR1#17495

Merged
mickqian merged 10 commits into sgl-project:main from gmixiaojin:pr1-ltx2
Jan 24, 2026

Conversation

@gmixiaojin
Contributor

Motivation

Support LTX-2 Video & Audio Joint model.

Contribution

  • Modify entrypoints to support audio
  • Modify RoPE to support LTX-2's RoPE mechanism
  • Modify the component loader to load LTX-2's model parameter
  • Modify pipelines_core stages to support audio

Review Process

  1. Ping Merge Oncalls to start the PR flow. See [PR Merge Process].
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions Bot added the diffusion (SGLang Diffusion) label Jan 21, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @gmixiaojin, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates full support for the LTX-2 Video & Audio Joint model, allowing the system to handle and generate both video and audio data seamlessly. The changes span across core components, from model loading and positional embeddings to the entire diffusion pipeline, ensuring that multimodal generation requests are processed efficiently and produce synchronized outputs.

Highlights

  • LTX-2 Model Support: Introduced comprehensive support for the LTX-2 Video & Audio Joint model, enabling the generation of synchronized video and audio outputs.
  • Multimodal Pipeline Stages: Added new pipeline stages including LTX2TextConnectorStage for text embedding transformation, LTX2AVLatentPreparationStage for joint video/audio latent setup, LTX2AVDenoisingStage for combined denoising, LTX2AVDecodingStage for joint decoding, and LTX2UpsamplingStage for video upsampling.
  • Audio Integration: Modified entrypoints and utility functions to process, save, and mux generated audio alongside video, including robust ffmpeg integration for combining audio into MP4 files.
  • Rotary Positional Embeddings (RoPE): Implemented LinearScalingRotaryEmbedding and updated the get_rope function to support linear scaling for RoPE, which is essential for LTX-2's architecture.
  • Component Loading Enhancements: Expanded the component loader to include dedicated loaders for audio VAEs, vocoders, and LTX-2 connectors, ensuring proper model parameter loading and device management.
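
The linear-scaling RoPE variant mentioned above is small enough to sketch. The following is an illustrative standalone example, not the PR's actual LinearScalingRotaryEmbedding class (function names and defaults here are assumptions): positions are divided by a constant factor before the rotary angles are computed, so a model trained on N positions can address roughly factor × N.

```python
import math

def rope_inv_freq(head_dim: int, base: float = 10000.0):
    # One inverse frequency per channel pair, as in standard RoPE.
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def linear_scaling_rope(position: float, head_dim: int,
                        scaling_factor: float = 1.0, base: float = 10000.0):
    # Linear scaling: divide the raw position by a constant factor,
    # compressing long sequences into the trained position range.
    pos = position / scaling_factor
    angles = [pos * f for f in rope_inv_freq(head_dim, base)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]
```

With `scaling_factor=2.0`, position 8 yields the same angles as unscaled position 4, which is the core of the long-context trick.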



Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for the LTX-2 model, enabling joint video and audio generation. The changes are extensive, introducing new pipeline stages, component loaders, and updating data structures to handle audio. The implementation appears solid and well-structured. I've found a potential bug in the text encoding stage that could lead to missing attention masks, and a couple of areas for code improvement related to style and maintainability. Overall, this is a great contribution.

Comment on lines +81 to 84
if batch.prompt_attention_mask is None:
    batch.prompt_attention_mask = []
    for am in prompt_masks_list:
        batch.prompt_attention_mask.append(am)
Contributor


Severity: high

The current logic for appending attention masks will fail if batch.prompt_attention_mask is already an empty list ([]) instead of None, as the if condition would be false. A more robust approach would be to ensure the list exists and then extend it.

        if batch.prompt_attention_mask is None:
            batch.prompt_attention_mask = []
        batch.prompt_attention_mask.extend(prompt_masks_list)
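
The difference between the two guards is easy to demonstrate in isolation. This is a standalone toy helper, not the PR's code; the name `append_masks` is made up for the example:

```python
def append_masks(existing, new_masks):
    # Normalize None to a list first, then extend unconditionally,
    # so both `None` and an already-empty list are handled. The
    # original guard only appended when the attribute was None,
    # silently dropping masks when it was already [].
    if existing is None:
        existing = []
    existing.extend(new_masks)
    return existing
```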

Comment on lines +103 to 106
if batch.negative_attention_mask is None:
    batch.negative_attention_mask = []
    for nm in neg_masks_list:
        batch.negative_attention_mask.append(nm)
Contributor


Severity: high

Similar to the prompt_attention_mask, the logic for negative_attention_mask is not robust. If batch.negative_attention_mask is an empty list, new masks won't be added. It's better to ensure the list exists and then extend it.

            if batch.negative_attention_mask is None:
                batch.negative_attention_mask = []
            batch.negative_attention_mask.extend(neg_masks_list)

Comment on lines 52 to 59
 def post_process_sample(
-    sample: torch.Tensor,
+    sample: Any,
     data_type: DataType,
     fps: int,
     save_output: bool = True,
-    save_file_path: str = None,
+    save_file_path: Optional[str] = None,
+    audio_sample_rate: Optional[int] = None,
 ):
Contributor


Severity: medium

The post_process_sample function contains several local imports (numpy, shutil, subprocess, tempfile, scipy.io.wavfile, imageio_ffmpeg). It's generally better practice to place all imports at the top of the file for clarity, consistency, and to avoid potential overhead from repeated imports if this function is called in a loop. Please consider moving these imports to the top of python/sglang/multimodal_gen/runtime/entrypoints/utils.py.
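
For context, the audio/video muxing that post_process_sample performs can be split into a pure command builder plus a thin subprocess wrapper, which keeps the imports at module top as suggested. This is an illustrative sketch under assumed names and paths, not the PR's implementation, and it presumes an ffmpeg binary on PATH:

```python
import subprocess

def build_mux_command(video_path: str, audio_path: str, out_path: str):
    # -c:v copy keeps the video stream untouched; audio is encoded to
    # AAC; -shortest trims the output to the shorter input stream.
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        out_path,
    ]

def mux_audio_into_video(video_path: str, audio_path: str, out_path: str):
    subprocess.run(build_mux_command(video_path, audio_path, out_path),
                   check=True)
```

Separating the command construction from the subprocess call also makes the muxing logic unit-testable without invoking ffmpeg.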

@gmixiaojin gmixiaojin marked this pull request as ready for review January 21, 2026 12:17
return model


class AdapterLoader(ComponentLoader):
Collaborator


this file is too large, consider splitting it into a folder component_loader in a follow-up

Contributor Author


Sure, I think we will follow up after the PR.

@mickqian mickqian mentioned this pull request Jan 21, 2026
@yhyang201
Collaborator

https://github.com/sgl-project/sglang/tree/main/python/sglang/multimodal_gen/runtime/models/model_stages

Could you put the stages that are used only by LTX-2 here first? You could create an ltx_2 folder and move them there.

@gmixiaojin
Contributor Author

> https://github.com/sgl-project/sglang/tree/main/python/sglang/multimodal_gen/runtime/models/model_stages
>
> Could you put the stages that are used only by LTX-2 here first? You could create an ltx_2 folder and move them there.

As we discussed previously, we will refactor the code once the two PRs are merged.

@gmixiaojin
Contributor Author

lint also fixed @yhyang201 @mickqian

Remove decoding_av.py, denoising_av.py, latent_preparation_av.py,
and text_connector.py from this PR, as they should be in PR2.
@mickqian
Collaborator

/tag-and-rerun-ci

@mickqian
Collaborator

/tag-and-rerun-ci

@mickqian mickqian added the high priority and format (Auto Format Code) labels Jan 23, 2026
@mickqian mickqian merged commit 797a981 into sgl-project:main Jan 24, 2026
166 of 173 checks passed
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026
Co-authored-by: FlamingoPg <1106310035@qq.com>

Labels

diffusion (SGLang Diffusion), format (Auto Format Code), high priority, run-ci


4 participants