[support] Training QwenX models without the vision block

### User problem

I recently upgraded Megatron Bridge from `0.3.1` to the latest stable version (`0.4.1`). The code structure of the Qwen models has changed significantly, and my previous scripts no longer work.

My goal is to train a Qwen3.5-like model without the vision component. The provider `Qwen35VLMoEModelProvider` (in `megatron.bridge.models.qwen_vl`) builds a vision-language model by default, but it also has a method that returns a language-only model:

```python
@dataclass
class Qwen35VLMoEModelProvider(GPTModelProvider):
    ...
    def provide(self, pre_process=None, post_process=None, vp_stage=None) -> Qwen3VLModel:
        ...
        return model

    def provide_language_model(self, pre_process=None, post_process=None, vp_stage=None) -> MCoreGPTModel:
        """Provide just the language model component without vision."""
        return GPTModelProvider.provide(self, pre_process=pre_process, post_process=post_process, vp_stage=vp_stage)
```

To force the language-only version, my script monkey-patches the class so that `provide` builds only the GPT language model:

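```python
from megatron.bridge.models.qwen_vl import Qwen35VLMoEModelProvider
from megatron.bridge.models.gpt_provider import GPTModelProvider

def _provide_language_only(self, pre_process=None, post_process=None, vp_stage=None):
    # Skip the VL wrapper and build only the GPT language model
    # (same body as Qwen35VLMoEModelProvider.provide_language_model).
    return GPTModelProvider.provide(
        self,
        pre_process=pre_process,
        post_process=post_process,
        vp_stage=vp_stage,
    )

# Class-level monkey-patch: affects every Qwen35VLMoEModelProvider instance in this process.
Qwen35VLMoEModelProvider.provide = _provide_language_only
```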
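For comparison, the same override could be scoped more narrowly than a class-wide patch: either a subclass that routes `provide` to the existing `provide_language_model`, or an override on a single provider instance. The sketch below is a minimal, hypothetical illustration: `StubGPTProvider` and `StubVLProvider` are stand-ins for `GPTModelProvider` and `Qwen35VLMoEModelProvider` so it runs without Megatron Bridge, and whether the library's training and export paths accept a subclass or an instance-level override is untested.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for GPTModelProvider / Qwen35VLMoEModelProvider,
# used only so the pattern runs without Megatron Bridge installed.
@dataclass
class StubGPTProvider:
    num_layers: int = 2

    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return "gpt-only model"


@dataclass
class StubVLProvider(StubGPTProvider):
    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return "vision-language model"

    def provide_language_model(self, pre_process=None, post_process=None, vp_stage=None):
        # Mirrors the excerpt above: bypass the VL provide() and call the GPT base.
        return StubGPTProvider.provide(
            self, pre_process=pre_process, post_process=post_process, vp_stage=vp_stage
        )


# Option A: a subclass dedicated to language-only training.
@dataclass
class LanguageOnlyProvider(StubVLProvider):
    def provide(self, pre_process=None, post_process=None, vp_stage=None):
        return self.provide_language_model(
            pre_process=pre_process, post_process=post_process, vp_stage=vp_stage
        )


# Option B: override provide() on one instance; the instance attribute
# shadows the class method, so other instances are unaffected.
patched = StubVLProvider()
patched.provide = patched.provide_language_model

print(LanguageOnlyProvider().provide())  # gpt-only model
print(patched.provide())                 # gpt-only model
print(StubVLProvider().provide())        # vision-language model (class untouched)
```

The difference from the class-level patch is containment: neither variant changes the behaviour of other provider instances created elsewhere in the same process.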

Training starts with this patch, but I have a few questions:

* Is this the intended way to train a language-only model?
* Is there a way to use `provide_language_model` without monkey-patching `Qwen35VLMoEModelProvider`?
* Will the resulting checkpoint still export correctly to Hugging Face format?
* Besides the obvious VL-related defaults (such as multimodal RoPE), are there other parameters not explicitly set in `Qwen35VLMoEModelProvider` that I need to change?
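To see which settings the VL provider adds or overrides relative to `GPTModelProvider` (the last question above), one option is to diff the dataclass fields of the two classes. This is a sketch assuming both providers are plain dataclasses, as the `@dataclass` decorator in the excerpt suggests; the stub classes and their values (`position_embedding_type`, `vision_config`) are hypothetical and only illustrate the output. With the real library, the call would be `field_diff(GPTModelProvider, Qwen35VLMoEModelProvider)`.

```python
from dataclasses import MISSING, dataclass, field, fields


def _defaults(cls):
    """Map each dataclass field name to its default value (MISSING if none)."""
    out = {}
    for f in fields(cls):
        if f.default is not MISSING:
            out[f.name] = f.default
        elif f.default_factory is not MISSING:
            out[f.name] = f.default_factory()  # note: calls the factory
        else:
            out[f.name] = MISSING
    return out


def field_diff(parent, child):
    """Return (fields added by child, fields whose default child changes)."""
    p, c = _defaults(parent), _defaults(child)
    added = {name: value for name, value in c.items() if name not in p}
    changed = {
        name: (p[name], value)
        for name, value in c.items()
        if name in p and p[name] != value
    }
    return added, changed


# Hypothetical stand-ins; field names and values are illustrative only.
@dataclass
class BaseConfigStub:
    num_layers: int = 2
    position_embedding_type: str = "rope"


@dataclass
class VLConfigStub(BaseConfigStub):
    position_embedding_type: str = "mrope"
    vision_config: dict = field(default_factory=dict)


added, changed = field_diff(BaseConfigStub, VLConfigStub)
print(added)    # {'vision_config': {}}
print(changed)  # {'position_embedding_type': ('rope', 'mrope')}
```

A field diff only catches class-level defaults; values set in `__post_init__` or inside `provide()` would not show up, so it is a starting point rather than a complete answer.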

Thank you!


### Desired outcome

My goal is to train a Qwen3.5-like model without the vision component.

### Alternatives or workarounds considered

My workaround (monkey-patching `Qwen35VLMoEModelProvider.provide`) is described in the problem description above.

### Affected area

area:model

### Urgency / use case

Important but not blocking

### Environment

* `megatron-bridge==0.4.1`
* `megatron-core==0.17.1`
* `transformers==5.2.0`

[support] Training QwenX models without the vision block #3891