[diffusion] fix LTX2 resident defaults and stage profiling #25596
Code Review
This pull request introduces logic to automatically keep auxiliary components (text encoder, image encoder, and VAE) resident in GPU memory when running LTX-2.3 in two-stage 'resident' mode on high-memory CUDA devices. It also updates the pipeline stage profiling to use registered stage names and adds comprehensive unit tests for these changes. Feedback was provided to unify the high-memory detection logic by including a specific check for H200 devices, ensuring consistency with existing device-specific configurations.
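The default resolution summarized above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the `AuxOffloadArgs` dataclass, its field names, and `resolve_aux_offload` are hypothetical, and the rule that explicitly set values always win is an assumption drawn from the PR description's mention of "unset auxiliary placement".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AuxOffloadArgs:
    # Hypothetical fields. None means "not set by the user";
    # True/False is an explicit user choice that must be preserved.
    text_encoder_cpu_offload: Optional[bool] = None
    image_encoder_cpu_offload: Optional[bool] = None
    vae_cpu_offload: Optional[bool] = None


def resolve_aux_offload(
    args: AuxOffloadArgs,
    high_memory_resident: bool,
    fallback: bool = True,
) -> AuxOffloadArgs:
    """Fill in unset auxiliary offload flags without touching explicit ones."""
    # In high-memory resident mode the default is "keep on GPU" (no offload);
    # otherwise fall back to whatever the previous auto default was.
    default = False if high_memory_resident else fallback
    resolved = {}
    for field in fields(args):
        value = getattr(args, field.name)
        resolved[field.name] = default if value is None else value
    return AuxOffloadArgs(**resolved)
```

With this shape, a user who explicitly requests VAE offload keeps it even on a high-memory device, while the text and image encoders default to staying resident.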
def _uses_ltx23_high_memory_resident_two_stage_mode(self) -> bool:
    if (
        self.ltx2_two_stage_device_mode != "resident"
        or not self._is_ltx23_two_stage_pipeline()
        or not current_platform.is_cuda()
    ):
        return False
    return (
        current_platform.get_device_total_memory() / BYTES_PER_GB
        >= LTX2_RESIDENT_AUTO_ENABLE_MEM_GB
    )
The high-memory check in _uses_ltx23_high_memory_resident_two_stage_mode is inconsistent with the logic used in _resolve_default_ltx2_two_stage_device_mode (lines 478-481). Specifically, it is missing the check for the H200 device name, which is also considered a high-memory platform regardless of the exact reported memory value. Unifying this logic ensures that auxiliary components are correctly kept resident on all high-memory platforms.
def _uses_ltx23_high_memory_resident_two_stage_mode(self) -> bool:
    if (
        self.ltx2_two_stage_device_mode != "resident"
        or not self._is_ltx23_two_stage_pipeline()
        or not current_platform.is_cuda()
    ):
        return False
    device_name = str(current_platform.get_device_name(0)).upper()
    device_total_memory_gb = (
        current_platform.get_device_total_memory() / BYTES_PER_GB
    )
    return (
        "H200" in device_name
        or device_total_memory_gb >= LTX2_RESIDENT_AUTO_ENABLE_MEM_GB
    )
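One way to keep the two checks from drifting apart again would be to route both through a single helper. The sketch below is only an illustration: `is_high_memory_device` and `HIGH_MEMORY_DEVICE_MARKERS` are hypothetical names, the threshold of 100 is a placeholder rather than the repository's actual value, and `BYTES_PER_GB` is assumed to be 1024**3.

```python
BYTES_PER_GB = 1024**3  # assumption: binary gigabytes
LTX2_RESIDENT_AUTO_ENABLE_MEM_GB = 100  # placeholder, not taken from the PR
HIGH_MEMORY_DEVICE_MARKERS = ("H200",)  # hypothetical constant


def is_high_memory_device(
    device_name: str,
    total_memory_bytes: int,
    threshold_gb: float = LTX2_RESIDENT_AUTO_ENABLE_MEM_GB,
) -> bool:
    """Return True if the device counts as high-memory by name or by size."""
    name = str(device_name).upper()
    # Name match wins regardless of the exact reported memory value.
    if any(marker in name for marker in HIGH_MEMORY_DEVICE_MARKERS):
        return True
    return total_memory_bytes / BYTES_PER_GB >= threshold_gb
```

Both `_resolve_default_ltx2_two_stage_device_mode` and `_uses_ltx23_high_memory_resident_two_stage_mode` could then pass in the values they read from `current_platform`, so a future change to the high-memory rule happens in one place.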
/tag-and-rerun-ci
What changed
resident mode.
--layerwise-offload-components and explicit component offload args.
Why
High-memory resident mode keeps both LTX2 DiTs on GPU. The previous auto defaults could still apply non-DiT layerwise offload to text/image encoders or VAE, so the mode was not fully resident for unset auxiliary placement.
The profiler also used only the Python class name, so repeated stage classes such as the two LTX2 LoRA switch stages collapsed into one LTX2LoRASwitchStage metric.
Validation
CI States
Latest PR Test (Base): ⏳ Run #26044398041
Latest PR Test (Extra): ⚠️ Not enabled -- add run-ci-extra label to opt in.
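Returning to the profiler change described under Why, the collision between repeated stage classes can be reproduced in a few lines. This is a simplified sketch, not the project's profiler: the registered names `lora_switch_stage1` and `lora_switch_stage2`, both recording functions, and the overwrite behaviour are assumptions made for illustration (the real profiler may aggregate rather than overwrite).

```python
class LTX2LoRASwitchStage:
    """Stand-in for a stage class that appears twice in one pipeline."""


def record_by_class_name(stages, durations):
    """Key metrics by class name; repeated classes share one entry."""
    metrics = {}
    for (_, stage), seconds in zip(stages, durations):
        metrics[type(stage).__name__] = seconds  # later stage overwrites earlier
    return metrics


def record_by_registered_name(stages, durations):
    """Key metrics by the name each stage was registered under."""
    metrics = {}
    for (name, _), seconds in zip(stages, durations):
        metrics[name] = seconds
    return metrics


# Two instances of the same class, registered under distinct (hypothetical) names.
stages = [
    ("lora_switch_stage1", LTX2LoRASwitchStage()),
    ("lora_switch_stage2", LTX2LoRASwitchStage()),
]
durations = [0.1, 0.2]

by_class = record_by_class_name(stages, durations)
by_name = record_by_registered_name(stages, durations)
```

Here `by_class` ends up with a single `LTX2LoRASwitchStage` entry, while `by_name` keeps one entry per registered stage, which is the distinction the PR relies on.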