[diffusion] model: support LTX2.3 two stage #22182
Code Review
This pull request implements support for the LTX-2.3 model, including the two-stage pipeline, updated sampling parameters for resolution scaling, and model overlay materialization. Changes also include refinements to the denoising and latent preparation stages, along with new alignment scripts and unit tests. Review feedback identifies opportunities to improve artifact resolution by prioritizing newer versions (22b and 1.1) and suggests adding a safety check for zero-division when calculating the cross-attention gate factor.
LTX23_DEV_CHECKPOINT_FILENAMES = (
    "ltx-2.3-20b-dev.safetensors",
    "ltx-2.3-22b-dev.safetensors",
)
LTX23_DISTILLED_LORA_FILENAMES = (
    "ltx-2.3-20b-distilled-lora-384.safetensors",
    "ltx-2.3-22b-distilled-lora-384.safetensors",
)
LTX23_SPATIAL_UPSAMPLER_FILENAMES = (
    "ltx-2.3-spatial-upscaler-x2-1.0.safetensors",
    "ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
)
The resolution order for LTX-2.3 artifacts currently prioritizes older or smaller versions (e.g., 20b over 22b, 1.0 over 1.1). This is inconsistent with overlay_manifest.json, which specifies the 22b and 1.1 versions. Prioritizing the newer/larger versions ensures that the materializer picks the intended official artifacts when multiple versions are present in the source directory.
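To illustrate why the order matters, here is a minimal sketch of first-match resolution over a candidate tuple. The helper `resolve_first_existing` and the `demo` function are hypothetical and are not the PR's materializer code; only the filenames come from the diff. The tuple is written in the order this comment recommends (22b first).

```python
import os
import tempfile

# Filenames taken from the diff, listed newest/largest first (the suggested order).
LTX23_DEV_CHECKPOINT_FILENAMES = (
    "ltx-2.3-22b-dev.safetensors",
    "ltx-2.3-20b-dev.safetensors",
)


def resolve_first_existing(model_path, filenames):
    """Hypothetical helper: return the first candidate path that exists, else None."""
    for name in filenames:
        candidate = os.path.join(model_path, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    # Build a directory that contains both versions, then resolve.
    with tempfile.TemporaryDirectory() as model_path:
        for name in LTX23_DEV_CHECKPOINT_FILENAMES:
            open(os.path.join(model_path, name), "wb").close()
        picked = resolve_first_existing(model_path, LTX23_DEV_CHECKPOINT_FILENAMES)
        return os.path.basename(picked)
```

With both files present, a first-match lookup returns whichever name appears first in the tuple, so the tuple order alone decides which artifact is used.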
av_ca_gate_factor = (
    self.av_ca_timestep_scale_multiplier / self.timestep_scale_multiplier
)
Calculating av_ca_gate_factor by dividing by self.timestep_scale_multiplier without a zero-check poses a risk of ZeroDivisionError. Although this multiplier is typically non-zero (e.g., 1000), it is safer to handle the zero case or pre-calculate this factor during initialization.
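The second option mentioned here, pre-calculating the factor during initialization, could look roughly like the sketch below. The class name `GateConfig` is hypothetical; only the two multiplier attribute names and `av_ca_gate_factor` come from the diff. Raising on zero is a choice made for this sketch, whereas the reviewer's suggested change falls back to 1.0.

```python
class GateConfig:
    """Hypothetical container; only the attribute names come from the diff."""

    def __init__(self, timestep_scale_multiplier, av_ca_timestep_scale_multiplier):
        if timestep_scale_multiplier == 0:
            # Fail fast at construction rather than at the first division.
            raise ValueError("timestep_scale_multiplier must be non-zero")
        self.timestep_scale_multiplier = timestep_scale_multiplier
        self.av_ca_timestep_scale_multiplier = av_ca_timestep_scale_multiplier
        # Computed once here so later code can reuse it without dividing again.
        self.av_ca_gate_factor = (
            av_ca_timestep_scale_multiplier / timestep_scale_multiplier
        )
```

Validating once at construction keeps the per-step code free of branches, at the cost of rejecting a zero multiplier outright instead of silently substituting a default.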
av_ca_gate_factor = (
    self.av_ca_timestep_scale_multiplier / self.timestep_scale_multiplier
    if self.timestep_scale_multiplier != 0
    else 1.0
)

os.path.join(model_path, "ltx-2.3-spatial-upscaler-x2-1.0.safetensors"),
os.path.join(model_path, "ltx-2.3-spatial-upscaler-x2-1.1.safetensors"),
The resolution order for spatial upsampler candidates should prioritize version 1.1 over 1.0, as 1.1 is the version listed in the overlay_manifest.json and is generally preferred for LTX-2.3.
- os.path.join(model_path, "ltx-2.3-spatial-upscaler-x2-1.0.safetensors"),
- os.path.join(model_path, "ltx-2.3-spatial-upscaler-x2-1.1.safetensors"),
+ os.path.join(model_path, "ltx-2.3-spatial-upscaler-x2-1.1.safetensors"),
+ os.path.join(model_path, "ltx-2.3-spatial-upscaler-x2-1.0.safetensors"),
os.path.join(model_path, "ltx-2.3-20b-distilled-lora-384.safetensors"),
os.path.join(model_path, "ltx-2.3-22b-distilled-lora-384.safetensors"),
The resolution order for distilled LoRA candidates should prioritize the 22b version over 20b to align with the official LTX-2.3 artifacts specified in the manifest.
- os.path.join(model_path, "ltx-2.3-20b-distilled-lora-384.safetensors"),
- os.path.join(model_path, "ltx-2.3-22b-distilled-lora-384.safetensors"),
+ os.path.join(model_path, "ltx-2.3-22b-distilled-lora-384.safetensors"),
+ os.path.join(model_path, "ltx-2.3-20b-distilled-lora-384.safetensors"),
/tag-and-rerun-ci
This reverts commit c192a31.
85a356b to b5553e1
/rerun-failed-ci
sglang:
sglang_two_stage.mp4
diffusers:
official_two_stage.mp4
difference:
sglang (sp):
sglang_two_stage.mp4