[QUESTION] Support for Heterogeneous Parallelism in Multimodal Training

I have been using Megatron-LM to train multimodal models and have successfully followed the example under `examples/multimodal`. For efficient training, however, multimodal models often need a different parallelism strategy for each component, because the vision model is typically smaller than the LLM in such setups.

**Does Megatron-LM support heterogeneous parallelism strategies**, where different models within a multimodal system can use distinct parallelization techniques? If not, are there any recommended workarounds?
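
To make the question concrete, here is a minimal sketch of what a heterogeneous layout could look like. It is plain Python for illustration only and does not use any Megatron-LM API: `ParallelPlan`, `tensor_parallel_groups`, and the sizes chosen (8 GPUs, TP=4/PP=2 for the LLM, TP=1/PP=1 for the vision encoder) are all hypothetical, and it assumes both components span the same set of ranks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelPlan:
    """Hypothetical per-component parallel layout (not a Megatron-LM class)."""
    name: str
    tensor_parallel: int
    pipeline_parallel: int

    def data_parallel(self, world_size: int) -> int:
        """Data-parallel replicas left over after TP and PP are assigned."""
        model_parallel = self.tensor_parallel * self.pipeline_parallel
        if world_size % model_parallel != 0:
            raise ValueError(
                f"{self.name}: world size {world_size} is not divisible "
                f"by TP*PP = {model_parallel}"
            )
        return world_size // model_parallel


def tensor_parallel_groups(world_size: int, tp: int) -> list[list[int]]:
    """Group contiguous ranks into tensor-parallel groups of size tp."""
    if world_size % tp != 0:
        raise ValueError("world size must be divisible by the TP size")
    return [list(range(start, start + tp)) for start in range(0, world_size, tp)]


WORLD_SIZE = 8  # assumed number of GPUs

# Small vision encoder: no model parallelism, replicated on every rank.
vision = ParallelPlan("vision_encoder", tensor_parallel=1, pipeline_parallel=1)
# Large LLM: sharded across tensor- and pipeline-parallel ranks.
llm = ParallelPlan("llm", tensor_parallel=4, pipeline_parallel=2)

for plan in (vision, llm):
    print(plan.name, "DP =", plan.data_parallel(WORLD_SIZE))
    print("  TP groups:", tensor_parallel_groups(WORLD_SIZE, plan.tensor_parallel))
```

With these assumed sizes the vision encoder runs as 8 data-parallel replicas while the LLM has a single replica sharded over all 8 ranks, which is the kind of per-component difference the question is about.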

Issue #1375