[support] Training QwenX models without the vision block #3891

@LucaTedeschini

Description

User problem

I recently upgraded Megatron Bridge from 0.3.1 to the latest stable version (0.4.1). The structure of the Qwen model code has changed significantly, and my previous scripts no longer work.

My goal is to train a Qwen3.5-like model without the vision component. The provider (Qwen35VLMoEModelProvider, in megatron.bridge.models.qwen_vl) builds a vision-language model by default, but it also exposes a method that returns a language-only version of the model:

@dataclass
class Qwen35VLMoEModelProvider(GPTModelProvider):
    ...
    def provide(self, pre_process=None, post_process=None, vp_stage=None) -> Qwen3VLModel:
        ...
        return model

    def provide_language_model(self, pre_process=None, post_process=None, vp_stage=None) -> MCoreGPTModel:
        """Provide just the language model component without vision."""
        return GPTModelProvider.provide(self, pre_process=pre_process, post_process=post_process, vp_stage=vp_stage)

So, to force the language-only version, I monkey-patched provide on the class in my script:

from megatron.bridge.models.qwen_vl import Qwen35VLMoEModelProvider
from megatron.bridge.models.gpt_provider import GPTModelProvider

def _provide_language_only(self, pre_process=None, post_process=None, vp_stage=None):
    return GPTModelProvider.provide(
        self,
        pre_process=pre_process,
        post_process=post_process,
        vp_stage=vp_stage,
    )

Qwen35VLMoEModelProvider.provide = _provide_language_only

The training starts, but I was wondering:

  • Is this the intended way to get a language-only model?
  • Is there a way to use provide_language_model without overriding Qwen35VLMoEModelProvider?
  • Will this work when exporting a checkpoint to HF?
  • Are there other parameters I need to change (beyond the default VL-related ones, such as multimodal RoPE) that are not explicitly set in Qwen35VLMoEModelProvider?

Thank you!

Desired outcome

My goal is to train a Qwen3.5-like model without the vision component.

Alternatives or workarounds considered

My workaround is included in the problem description above.
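
A less invasive variant of the same idea would be a subclass whose provide delegates to provide_language_model, so that the original Qwen35VLMoEModelProvider is left unpatched for any other code that uses it. The sketch below only illustrates the pattern: the two provider classes in it are hypothetical stand-ins that mirror the method layout quoted in this issue (they are not the real Megatron Bridge classes), and LanguageOnlyQwen35Provider and num_layers are names made up for the example.

```python
from dataclasses import dataclass


# Hypothetical stand-ins that only mirror the method layout quoted in this
# issue. They are NOT the real Megatron Bridge classes.
@dataclass
class GPTModelProvider:
    num_layers: int = 2  # illustrative field, not taken from the library

    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return f"language-only model ({self.num_layers} layers)"


@dataclass
class Qwen35VLMoEModelProvider(GPTModelProvider):
    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return f"vision-language model ({self.num_layers} layers)"

    def provide_language_model(self, pre_process=None, post_process=None, vp_stage=None):
        # Same delegation as in the snippet quoted in the problem description.
        return GPTModelProvider.provide(
            self, pre_process=pre_process, post_process=post_process, vp_stage=vp_stage
        )


@dataclass
class LanguageOnlyQwen35Provider(Qwen35VLMoEModelProvider):
    """Hypothetical subclass: provide() delegates to provide_language_model()."""

    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return self.provide_language_model(
            pre_process=pre_process, post_process=post_process, vp_stage=vp_stage
        )


# The subclass builds the language-only model; the base class is untouched.
text_only = LanguageOnlyQwen35Provider(num_layers=4).provide()
unpatched = Qwen35VLMoEModelProvider(num_layers=4).provide()
```

With the real library, the subclass would inherit from the imported Qwen35VLMoEModelProvider instead of the stand-in. Whether a subclass like this is handled correctly by HF checkpoint export is part of what I am asking above, not something this sketch confirms.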

Affected area

area:model

Urgency / use case

Important but not blocking

Environment

megatron-bridge==0.4.1
megatron-core==0.17.1
transformers==5.2.0
