User problem
I recently updated Megatron Bridge to the latest stable version (0.4.1) from 0.3.1, and I noticed that the logical structure for the Qwen models has changed significantly, and my previous scripts no longer work.
My goal is to train a Qwen3.5-like model without the vision component. I noticed that the provider (Qwen35VLMoEModelProvider, inside megatron.bridge.models.qwen_vl) is, by default, a vision-language model, but I also see that there is a method to return a language-only version of the model:
@dataclass
class Qwen35VLMoEModelProvider(GPTModelProvider):
    ...
    def provide(self, pre_process=None, post_process=None, vp_stage=None) -> Qwen3VLModel:
        ...
        return model

    def provide_language_model(self, pre_process=None, post_process=None, vp_stage=None) -> MCoreGPTModel:
        """Provide just the language model component without vision."""
        return GPTModelProvider.provide(self, pre_process=pre_process, post_process=post_process, vp_stage=vp_stage)
So, to force the language-only version, I overrode the provide method in my script like this:
from megatron.bridge.models.qwen_vl import Qwen35VLMoEModelProvider
from megatron.bridge.models.gpt_provider import GPTModelProvider


def _provide_language_only(self, pre_process=None, post_process=None, vp_stage=None):
    return GPTModelProvider.provide(
        self,
        pre_process=pre_process,
        post_process=post_process,
        vp_stage=vp_stage,
    )


Qwen35VLMoEModelProvider.provide = _provide_language_only
The training starts, but I was wondering:
- Is this intended?
- Is there a way to use provide_language_model without overriding Qwen35VLMoEModelProvider?
- Will this work when exporting a checkpoint to HF?
- Do I need to change other parameters (besides the default VL-related ones, such as multimodal RoPE) that are not explicitly written in Qwen35VLMoEModelProvider?
Thank you!
Desired outcome
My goal is to train a Qwen3.5-like model without the vision part.
Alternatives or workarounds considered
My workaround is included in the problem description
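A less invasive form of the same workaround might be a subclass instead of a class-level patch, so that any other use of Qwen35VLMoEModelProvider in the same process keeps the vision-language behavior. The sketch below uses stand-in classes that only reuse the real names: their bodies and the hidden_size field are placeholders, and LanguageOnlyQwen35Provider is a hypothetical name. It illustrates the dispatch pattern only; whether the real training and HF export paths accept such a subclass is not verified here.

```python
from dataclasses import dataclass


# Stand-ins for the Megatron Bridge classes. Only the names match the real
# ones; the bodies and the hidden_size field are illustrative placeholders.
@dataclass
class GPTModelProvider:
    hidden_size: int = 8

    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return f"language-only(hidden_size={self.hidden_size})"


@dataclass
class Qwen35VLMoEModelProvider(GPTModelProvider):
    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return f"vision-language(hidden_size={self.hidden_size})"

    def provide_language_model(self, pre_process=None, post_process=None, vp_stage=None):
        return GPTModelProvider.provide(
            self, pre_process=pre_process, post_process=post_process, vp_stage=vp_stage
        )


@dataclass
class LanguageOnlyQwen35Provider(Qwen35VLMoEModelProvider):
    """Hypothetical subclass that routes provide() to the language-only path."""

    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return self.provide_language_model(
            pre_process=pre_process, post_process=post_process, vp_stage=vp_stage
        )


# The subclass returns the language-only model; the parent class is untouched.
text_only = LanguageOnlyQwen35Provider(hidden_size=16).provide()
original = Qwen35VLMoEModelProvider(hidden_size=16).provide()
print(text_only)
print(original)
```

Compared with assigning to Qwen35VLMoEModelProvider.provide, the subclass keeps the change local to the instances created in the training script and leaves the original class available if it is needed elsewhere.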
Affected area
area:model
Urgency / use case
Important but not blocking
Environment
megatron-bridge==0.4.1
megatron-core==0.17.1
transformers==5.2.0