[diffusion] fix LTX2 resident defaults and stage profiling #25596

Merged
mickqian merged 4 commits into sgl-project:main from mickqian:codex/ltx2-resident-profiler-fix-20260518
May 19, 2026
@mickqian (Collaborator) commented May 18, 2026

What changed

  • In high-memory LTX-2.3 two-stage resident mode, keep auxiliary components resident when their placement is left unset.
  • Preserve explicit --layerwise-offload-components and explicit component offload args.
  • Record pipeline stage metrics with the registered stage name when a stage has one, so duplicate stage classes no longer overwrite each other in perf logs.

Why

High-memory resident mode keeps both LTX2 DiTs on GPU. The previous auto defaults could still apply non-DiT layerwise offload to the text/image encoders or VAE, so the mode was not fully resident when auxiliary component placement was left unset.
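A minimal sketch of the intended default resolution, not the code in this PR. The names `AUX_COMPONENTS` and `resolve_aux_offload`, and the convention that `None` means "unset", are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical component names, for illustration only.
AUX_COMPONENTS = ("text_encoder", "image_encoder", "vae")


def resolve_aux_offload(
    explicit: dict[str, Optional[bool]],
    high_memory_resident: bool,
    auto_default: bool = True,
) -> dict[str, bool]:
    """Resolve per-component offload flags.

    None (or a missing key) means the user left the setting unset.
    An explicit True/False is always preserved.
    """
    resolved: dict[str, bool] = {}
    for name in AUX_COMPONENTS:
        value = explicit.get(name)
        if value is not None:
            resolved[name] = value  # explicit user choice wins
        elif high_memory_resident:
            resolved[name] = False  # unset: keep the component resident on GPU
        else:
            resolved[name] = auto_default  # unset: fall back to the auto default
    return resolved
```

With this shape, an explicit offload request still wins in resident mode, while unset components stay on GPU.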

The profiler also used only the Python class name, so repeated stage classes such as the two LTX2 LoRA switch stages collapsed into one LTX2LoRASwitchStage metric.
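The collision can be reproduced with a small sketch. The `Stage` base class, its `registered_name` attribute, the helper functions, and the stage names and timings below are all hypothetical; only the class name `LTX2LoRASwitchStage` comes from this PR.

```python
class Stage:
    """Hypothetical stage base class; `registered_name` is an assumed attribute."""

    def __init__(self, registered_name=None):
        self.registered_name = registered_name


class LTX2LoRASwitchStage(Stage):
    pass


def metric_key(stage, use_registered_name=True):
    # Prefer the registered name when the stage has one; otherwise fall back
    # to the Python class name (the old behaviour for every stage).
    if use_registered_name and stage.registered_name:
        return stage.registered_name
    return type(stage).__name__


def record_stage_times(stages_and_seconds, use_registered_name=True):
    metrics = {}
    for stage, seconds in stages_and_seconds:
        # A later stage with the same key overwrites the earlier entry.
        metrics[metric_key(stage, use_registered_name)] = seconds
    return metrics


runs = [
    (LTX2LoRASwitchStage("lora_switch_stage1"), 0.12),  # made-up names and timings
    (LTX2LoRASwitchStage("lora_switch_stage2"), 0.34),
]
by_class = record_stage_times(runs, use_registered_name=False)
by_name = record_stage_times(runs)
```

Keyed by class name, `by_class` holds a single entry and the first stage's timing is lost; keyed by registered name, `by_name` keeps one entry per stage.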

Validation

  • Added unit coverage for high-memory LTX-2.3 resident defaults, original-mode default layerwise behavior, explicit layerwise preservation, and registered stage-name profiling.
  • Not run locally per sglang-diffusion development policy; CI should validate.

CI States

Latest PR Test (Base): ⏳ Run #26044398041
Latest PR Test (Extra): ⚠️ Not enabled -- add run-ci-extra label to opt in.

@github-actions github-actions Bot added the diffusion SGLang Diffusion label May 18, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces logic to automatically keep auxiliary components (text encoder, image encoder, and VAE) resident in GPU memory when running LTX-2.3 in two-stage 'resident' mode on high-memory CUDA devices. It also updates the pipeline stage profiling to use registered stage names and adds comprehensive unit tests for these changes. Feedback was provided to unify the high-memory detection logic by including a specific check for H200 devices, ensuring consistency with existing device-specific configurations.

Comment on lines +508 to +518
    def _uses_ltx23_high_memory_resident_two_stage_mode(self) -> bool:
        if (
            self.ltx2_two_stage_device_mode != "resident"
            or not self._is_ltx23_two_stage_pipeline()
            or not current_platform.is_cuda()
        ):
            return False
        return (
            current_platform.get_device_total_memory() / BYTES_PER_GB
            >= LTX2_RESIDENT_AUTO_ENABLE_MEM_GB
        )

medium

The high-memory check in _uses_ltx23_high_memory_resident_two_stage_mode is inconsistent with the logic used in _resolve_default_ltx2_two_stage_device_mode (lines 478-481). Specifically, it is missing the check for the H200 device name, which is also considered a high-memory platform regardless of the exact reported memory value. Unifying this logic ensures that auxiliary components are correctly kept resident on all high-memory platforms.

    def _uses_ltx23_high_memory_resident_two_stage_mode(self) -> bool:
        if (
            self.ltx2_two_stage_device_mode != "resident"
            or not self._is_ltx23_two_stage_pipeline()
            or not current_platform.is_cuda()
        ):
            return False

        device_name = str(current_platform.get_device_name(0)).upper()
        device_total_memory_gb = (
            current_platform.get_device_total_memory() / BYTES_PER_GB
        )
        return (
            "H200" in device_name
            or device_total_memory_gb >= LTX2_RESIDENT_AUTO_ENABLE_MEM_GB
        )
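One way to keep the two call sites from drifting apart again is to factor the check into a single pure helper that both methods call. The sketch below is an assumption, not code from this PR: `is_high_memory_device` is a hypothetical name, the threshold is passed in rather than guessed, and the `1024**3` default for bytes per GB may not match the project's `BYTES_PER_GB`.

```python
def is_high_memory_device(
    device_name: str,
    total_memory_bytes: int,
    threshold_gb: float,
    bytes_per_gb: int = 1024**3,  # assumed; the project's BYTES_PER_GB may differ
) -> bool:
    """Return True for H200 devices or any device at or above the threshold."""
    if "H200" in device_name.upper():
        # The device name alone qualifies, whatever memory value is reported.
        return True
    return total_memory_bytes / bytes_per_gb >= threshold_gb
```

Because the helper takes plain values instead of reading `current_platform`, it can be unit-tested without a GPU.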

@mickqian mickqian marked this pull request as ready for review May 18, 2026 12:03
@mickqian

Collaborator Author

/tag-and-rerun-ci

@mickqian mickqian merged commit a7b3ced into sgl-project:main May 19, 2026
93 of 99 checks passed
Labels

diffusion SGLang Diffusion run-ci
