Update LTX-2 Docs to Cover LTX-2.3 Models#13337
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    num_inference_steps=30,
    guidance_scale=3.0,  # Recommended LTX-2.3 guidance parameters
    stg_scale=1.0,  # Note that 0.0 (not 1.0) means that STG is disabled (all other guidance is disabled at 1.0)
    modality_scale=3.0,
I think this refers to the modality isolation guidance?
)
```
## Prompt Enhancement
Wonder if we could low-key showcase our prompt enhancement custom block powered by Gemini?
The LTX-2.3 model seems to be quite sensitive in terms of sample quality to the input prompt. Since the current GeminiPromptExpander doesn't accept a system_prompt argument to guide the prompt expansion, I think it may not work well with LTX-2.3 because the prompts may still be out of distribution although they are expanded.
stevhliu left a comment
very nice, thanks for updating!
</div>
LTX-2 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
[LTX-2](https://arxiv.org/abs/2601.03233) is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.

Suggested change:
[LTX-2](https://arxiv.org/abs/2601.03233) is a DiT-based foundation model designed to generate synchronized video and audio within a single model. It brings together the core building blocks of modern video generation, with open weights and a focus on practical, local execution.
2. **Spatio-Temporal Guidance (STG)**: [STG](https://arxiv.org/pdf/2411.18664) moves away from a perturbed output created by short-cutting self-attention operations, substituting in the attention values instead. The idea is that this creates sharper videos and better spatiotemporal consistency.
3. **Modality Isolation Guidance**: this moves away from a perturbed output created by disabling cross-modality (audio-to-video and video-to-audio) cross attention. This guidance is more specific to [LTX-2.X](https://arxiv.org/pdf/2601.03233) models, with the idea that this produces better consistency between the generated audio and video.
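To make the three mechanisms concrete, here is a toy numeric sketch of how guidance terms like these are typically combined, in the spirit of classifier-free guidance. This is an illustration only, not the pipelines' actual implementation; the variable names and values are made up, and the scale conventions follow the comment quoted earlier in this thread (STG is disabled at 0.0, the other two guidances at 1.0).

```python
# Toy scalar stand-ins for the denoiser's predictions (illustrative values,
# not real model outputs; names are invented for this sketch).
cond = 2.0      # fully conditioned prediction
uncond = 1.0    # unconditional prediction (classifier-free guidance term)
stg_pert = 1.8  # prediction with self-attention short-cut (STG term)
mod_pert = 1.9  # prediction with cross-modality attention disabled

guidance_scale, stg_scale, modality_scale = 3.0, 1.0, 3.0

# Each term pushes the result away from its perturbed prediction.
# CFG and modality guidance vanish at scale 1.0; the STG term vanishes at 0.0.
guided = (
    cond
    + (guidance_scale - 1.0) * (cond - uncond)
    + stg_scale * (cond - stg_pert)
    + (modality_scale - 1.0) * (cond - mod_pert)
)
print(guided)  # ≈ 4.4 with the values above
```

High combined scales push `guided` far from `cond`, which is why the docs pair these scales with guidance rescaling to curb over-exposure.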
These are controlled by the `guidance_scale`, `stg_scale`, and `modality_scale` arguments, respectively, and can be set separately for video and audio. Additionally, for STG, the transformer block indices where self-attention is skipped need to be specified via the `spatio_temporal_guidance_blocks` argument. In addition, the LTX-2.X pipelines also support [guidance rescaling](https://arxiv.org/abs/2305.08891) to help reduce over-exposure, which can be a problem when the guidance scales are set to high values.

Suggested change:
These are controlled by the `guidance_scale`, `stg_scale`, and `modality_scale` arguments, and can be set separately for video and audio. Additionally, for STG, the transformer block indices where self-attention is skipped need to be specified via the `spatio_temporal_guidance_blocks` argument. In addition, the LTX-2.X pipelines also support [guidance rescaling](https://huggingface.co/papers/2305.08891) to help reduce over-exposure, which can be a problem when the guidance scales are set to high values.
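As a hedged sketch of how these arguments might be passed in practice: the pipeline class and checkpoint below are placeholders (check the actual LTX-2 pipeline docs for the real names), and the block indices are purely illustrative, but the keyword argument names and scale values come from the snippet quoted earlier in this thread. Collecting them in a dict keeps the call readable:

```python
# Guidance settings from the discussion above; the block indices are
# illustrative placeholders, not recommended values.
guidance_kwargs = {
    "guidance_scale": 3.0,   # classifier-free guidance; disabled at 1.0
    "stg_scale": 1.0,        # spatio-temporal guidance; 0.0 (not 1.0) disables it
    "modality_scale": 3.0,   # modality isolation guidance; disabled at 1.0
    "spatio_temporal_guidance_blocks": [1, 2],  # illustrative indices only
}

# Hypothetical usage (class and checkpoint names are assumptions):
# pipe = LTX2Pipeline.from_pretrained("<ltx-2.3-checkpoint>")
# output = pipe(prompt="...", num_inference_steps=30, **guidance_kwargs)
print(sorted(guidance_kwargs))
```

Since the scales can be set separately for video and audio, the same pattern would apply per modality; again, consult the pipeline signature for the exact argument names.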
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Merging as the CI failures are unrelated.
* Update LTX-2 docs to cover multimodal guidance and prompt enhancement

* Apply suggestions from code review

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer feedback

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
What does this PR do?
This PR updates the LTX-2 docs to cover multimodal guidance and prompt enhancement, which were added with LTX-2.3 model support in #13217. Additionally, the LTX-2.X official default negative prompt, T2V system prompt, and I2V system prompt have been added to `src/diffusers/pipelines/ltx2/utils.py` to make it easier to prompt the model for inference.

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul
@yiyixuxu
@stevhliu