[Diffusion][MOVA] Fix task type in MOVA pipeline and shared model placement #19489
mickqian merged 2 commits into sgl-project:main
Summary of Changes
This pull request refines the MOVA pipeline by improving resource management for shared models and correcting the pipeline's task type. It ensures that the shared denoising modules are placed on the device when CPU offloading is active, and it sets the task type to match the inputs the pipeline actually requires.
Code Review
This pull request introduces several improvements to the MOVA pipeline. It correctly updates the task type to reflect its image-to-video nature, though a more precise type might be TI2V given it also processes text. The device placement logic for shared models during CPU offloading has been refactored and corrected, ensuring both audio_dit and dual_tower_bridge are on the correct device. I've also suggested a minor code cleanup to remove a commented-out parameter for better maintainability. Overall, these are good changes for correctness and robustness.
      causal=False,
      softmax_scale=None,
-     is_cross_attention=True,
+     # is_cross_attention=True,
nit: why don't we remove it directly?
> nit: why don't we remove it directly?
We may enable it in the future, or remove it in the next refactor.
      """Configuration for MOVA (text+image -> video+audio) pipelines."""

-     task_type: ModelTaskType = ModelTaskType.T2V
+     task_type: ModelTaskType = ModelTaskType.I2V
does MOVA do T2V as well? in that case, we should opt for TI2V
> does MOVA do T2V as well? in that case, we should opt for TI2V
An image input is currently required; text alone is not sufficient.
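To make the distinction in this thread concrete, here is a minimal sketch of how a task type could gate input validation. `TaskTypeSketch`, `requires_image`, and `validate_inputs` are hypothetical names invented for illustration and are not the project's `ModelTaskType` API; the reading of TI2V as "image optional" follows the reviewer's comment and is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class TaskTypeSketch(Enum):
    """Hypothetical stand-in for ModelTaskType; only three members are shown."""

    T2V = auto()   # text -> video
    I2V = auto()   # image (with a prompt) -> video; the image is mandatory
    TI2V = auto()  # assumed here: text alone or text + image -> video


def requires_image(task: TaskTypeSketch) -> bool:
    # In this sketch only I2V makes the image mandatory.
    return task is TaskTypeSketch.I2V


def validate_inputs(task: TaskTypeSketch, prompt: str, image_path: Optional[str]) -> None:
    # Reject requests that omit an image for a task type that needs one.
    if requires_image(task) and image_path is None:
        raise ValueError(f"{task.name} requires an image input; got prompt only")


# A text-only request passes for T2V and TI2V but is rejected for I2V.
validate_inputs(TaskTypeSketch.T2V, "a cat playing piano", None)
validate_inputs(TaskTypeSketch.TI2V, "a cat playing piano", None)
try:
    validate_inputs(TaskTypeSketch.I2V, "a cat playing piano", None)
    rejected = False
except ValueError:
    rejected = True
```

Under this reading, declaring I2V lets a text-only request fail early with a clear message instead of failing somewhere inside the pipeline.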
/tag-and-rerun-ci
/rerun-failed-ci
Motivation
This pull request introduces several updates to the MOVA pipeline, focusing on device placement management for shared models and correcting the pipeline's task type. The changes improve resource handling when CPU offloading is enabled and clarify the model's intended use.
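The device placement idea can be sketched roughly as follows. This is not the code from `mova.py`: `FakeModule` is a stand-in for a `torch.nn.Module` that only records its device, and `DenoisingStageSketch` with its `cpu_offload` flag and `moved` log is a hypothetical container. Only the helper name `_ensure_shared_models_on_device` and the module names `audio_dit` and `dual_tower_bridge` come from this PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeModule:
    """Stand-in for a torch.nn.Module; it only tracks which device it is on."""

    name: str
    device: str = "cpu"

    def to(self, device: str) -> "FakeModule":
        self.device = device
        return self


@dataclass
class DenoisingStageSketch:
    """Hypothetical stage holding the two shared modules named in this PR."""

    audio_dit: FakeModule
    dual_tower_bridge: FakeModule
    cpu_offload: bool = True
    moved: List[str] = field(default_factory=list)  # log of modules that were moved

    def _ensure_shared_models_on_device(self, device: str) -> None:
        # Assumption: without CPU offload nothing needs to be moved here.
        if not self.cpu_offload:
            return
        # Handle both shared modules in one place, not just audio_dit.
        for module in (self.audio_dit, self.dual_tower_bridge):
            if module.device != device:
                module.to(device)
                self.moved.append(module.name)

    def forward(self, device: str) -> str:
        # A single call replaces ad hoc placement inside forward.
        self._ensure_shared_models_on_device(device)
        return f"{self.audio_dit.device},{self.dual_tower_bridge.device}"


stage = DenoisingStageSketch(FakeModule("audio_dit"), FakeModule("dual_tower_bridge"))
result = stage.forward("cuda:0")
```

Routing placement through one helper means a module added to the shared set later only has to be listed in one place, and repeated calls do no extra work once the modules are already on the target device.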
Modifications
Device placement and resource management improvements:
- Added a _ensure_shared_models_on_device method to mova.py to ensure that the shared denoising modules (audio_dit and dual_tower_bridge) are placed on the correct device when CPU offload is enabled.
- Updated the forward method in mova.py to use _ensure_shared_models_on_device instead of directly managing device placement for audio_dit.
Pipeline configuration correction:
- Changed task_type in MOVAPipelineConfig from ModelTaskType.T2V (text-to-video) to ModelTaskType.I2V (image-to-video), matching the pipeline's required inputs.
Code clarity and maintenance:
- Commented out the is_cross_attention=True parameter in the mova_dual_tower.py bridge model initialization.
Accuracy Tests
Benchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci