[model] Add initial Qwen3-Omni support in Megatron-Bridge by hbhflw2000 · Pull Request #2831 · NVIDIA-NeMo/Megatron-Bridge

hbhflw2000 · 2026-03-17T09:25:51Z

What does this PR do?

This PR adds initial Qwen3-Omni support to Megatron Bridge, including:

Hugging Face ↔ Megatron checkpoint conversion
Thinker-side multimodal runtime support for text, image, video, and audio
Unit and functional test coverage
Model documentation

The current scope focuses on checkpoint conversion and thinker-side multimodal runtime enablement.

Summary

New Features

Added Qwen3-Omni model support to Megatron Bridge.
Implemented Hugging Face ↔ Megatron conversion for Qwen3-Omni checkpoints.
Added thinker-side multimodal runtime support covering:
- text
- image
- video
- audio
Reused the existing Qwen3-VL language and vision path where applicable, and integrated the Hugging Face Qwen3-Omni vision/audio towers.
Added multimodal RoPE handling and Megatron runtime compatibility fixes for Qwen3-Omni.
Fixed mRoPE config propagation by setting position_embedding_type="mrope" in the provider/bridge path.
Registered Qwen3-Omni in Bridge model exports and AutoBridge support.

Implementation

Added the following components under src/megatron/bridge/models/qwen_omni/:

provider
bridge
transformer config
model wrapper
thinker runtime

Tests

Added unit tests for:

provider
bridge
model

Added functional tests for:

smoke checkpoint creation
single-GPU HF → Megatron → HF roundtrip conversion
Hugging Face multimodal end-to-end smoke
Megatron multimodal end-to-end smoke

Documentation

Added Qwen3-Omni model documentation under docs/models/vlm/qwen3-omni.md.
Updated docs and test utilities to be more upstream-friendly:
- removed copied upstream README.md from smoke artifacts
- allowed local asset paths to be overridden via environment variables
- replaced development-plan style docs with model support documentation

Validation

Local test runs

PYTHONPATH=src pytest tests/unit_tests/models/qwen_omni/test_qwen3_omni_provider.py \
  tests/unit_tests/models/qwen_omni/test_qwen3_omni_bridge.py \
  tests/unit_tests/models/qwen_omni/modeling_qwen3_omni/test_omni_model.py -q

PYTHONPATH=src pytest tests/functional_tests/models/qwen_omni -q

Observed results

Targeted unit tests: 11/11 passed
Qwen3-Omni functional suite: 4/4 passed

Current Scope

This PR currently covers:

checkpoint conversion
thinker-side multimodal runtime support
basic single-rank functional validation

Not included in this PR

The following are out of scope for this initial support PR:

Megatron inference with inference_params
packed sequence support
vision runtime with sequence parallel
distributed parallelism validation beyond the current single-rank functional coverage

Notes

Functional smoke tests use user-provided local assets.
Asset, model, and cache paths can be overridden through environment variables.

CI

See the CI section in the contributing documentation for how to trigger CI.

For external contributors, an NVIDIA developer must approve and trigger the CI workflow.

Pre-review Checklist

Read and followed the contributor guidelines
Added necessary tests
Added or updated necessary documentation
Does this PR affect optional-install components? (for example: Numba, Pynini, Apex)
Reviewer: verify correct import guards for optional libraries

If some items are still in progress, this PR can remain in Draft until ready.

Summary by CodeRabbit

Release Notes

New Features
- Added Qwen3-Omni multimodal model support with integrated audio, vision, and text processing capabilities.
- Implemented model-to-Megatron conversion bridge and provider for Qwen3-Omni.
- Added multimodal rotary position embedding support for accurate token positioning across modalities.
Documentation
- Added comprehensive Qwen3-Omni model documentation covering architecture, implementation details, and supported workflows.
Tests
- Added functional and unit tests for Qwen3-Omni model validation and conversion.

Signed-off-by: Lianglipeng <lianglipeng@didiglobal.com> Signed-off-by: hbhflw2000 <417911774@qq.com>

Signed-off-by: hbhflw2000 <417911774@qq.com>

copy-pr-bot · 2026-03-17T09:25:55Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-03-17T09:44:14Z

📝 Walkthrough

Walkthrough

This pull request adds comprehensive support for the Qwen3-Omni multimodal model to the Megatron-Bridge framework. It introduces model wrappers, bridge/provider implementations, multimodal RoPE position indexing logic, transformer configurations, and extensive test coverage for vision and audio modalities alongside existing language capabilities.

Changes

Cohort / File(s)	Summary
Documentation `docs/models/vlm/index.md`, `docs/models/vlm/qwen3-omni.md`	Added Qwen3-Omni documentation entry and new model documentation file describing architecture, checkpoint mappings, supported workflows, and validation coverage.
Core Model Integration `src/megatron/bridge/models/__init__.py`	Exported three new public entities (Qwen3OmniBridge, Qwen3OmniModel, Qwen3OmniModelProvider) for Qwen3-Omni support under the Omni Models section.
Qwen3-Omni Model Implementations `src/megatron/bridge/models/qwen_omni/__init__.py`, `src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/__init__.py`, `src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/model.py`	Introduced Qwen3OmniModel wrapper class delegating to Qwen3OmniThinkerModel with support for language, thinker, talker, and code2wav components; package-level exports for model access.
Transformer Configuration & RoPE `src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/transformer_config.py`, `src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/rope.py`	Added Qwen3OmniTransformerConfig dataclass with multimodal token IDs and audio/vision parameters; implemented complex multimodal RoPE position indexing logic supporting image, video, and audio modalities with grid-based spatial-temporal positioning.
Multimodal Thinker Model `src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/thinker_model.py`	Introduced Qwen3OmniThinkerModel orchestrating text, vision, and audio processing with multimodal embedding fusion, feature alignment, and MRope-based position encoding; supports selective freezing of sub-components.
Bridge & Provider `src/megatron/bridge/models/qwen_omni/qwen3_omni_bridge.py`, `src/megatron/bridge/models/qwen_omni/qwen3_omni_provider.py`	Implemented Qwen3OmniBridge connecting HF Qwen3OmniMoeForConditionalGeneration to Megatron format with parameter mapping registry; added Qwen3OmniModelProvider dataclass extending MoE provider with multimodal configuration knobs and model instantiation logic.
Qwen Provider Fallbacks `src/megatron/bridge/models/qwen/qwen_provider.py`, `src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py`, `src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py`	Replaced direct imports of experimental attention variant specs with guarded try/except blocks and fallback functions raising NotImplementedError; added get_rotary_seq_len helper for inference context compatibility.
Test Infrastructure `tests/unit_tests/models/qwen_omni/conftest.py`, `tests/functional_tests/models/qwen_omni/__init__.py`	Added pytest configuration overriding global cache-clearing fixture and test module initialization.
Functional Tests `tests/functional_tests/models/qwen_omni/test_qwen3_omni_conversion.py`, `tests/functional_tests/models/qwen_omni/test_qwen3_omni_smoke.py`, `tests/functional_tests/models/qwen_omni/utils.py`	Implemented end-to-end conversion and smoke tests validating HF and Megatron model compatibility; added utility functions for creating minimal smoke models and preparing real sample inputs with vision/audio data.
Unit Tests `tests/unit_tests/models/qwen_omni/modeling_qwen3_omni/test_omni_model.py`, `tests/unit_tests/models/qwen_omni/test_qwen3_omni_bridge.py`, `tests/unit_tests/models/qwen_omni/test_qwen3_omni_provider.py`	Added comprehensive unit tests for Qwen3OmniModel multimodal forward paths (text, image, audio), bridge configuration and dtype handling, and provider initialization with custom thinker configurations.

Sequence Diagram(s)

sequenceDiagram
    participant Input as Input Data
    participant TM as Qwen3OmniThinkerModel
    participant VE as Vision Encoder
    participant AE as Audio Encoder
    participant LM as Language Model
    participant Output as Output

    Input->>TM: forward(input_ids, pixel_values, input_features, ...)
    
    rect rgba(100, 150, 200, 0.5)
    Note over TM,AE: Feature Extraction Phase
    
    alt has vision features
        TM->>VE: get_image_features(pixel_values, image_grid_thw)
        VE-->>TM: image_embeddings
        TM->>VE: get_video_features(pixel_values_videos, video_grid_thw)
        VE-->>TM: video_embeddings
    end
    
    alt has audio features
        TM->>AE: get_audio_features(input_features, feature_attention_mask)
        AE-->>TM: audio_embeddings
    end
    end
    
    rect rgba(150, 100, 200, 0.5)
    Note over TM,LM: Embedding Fusion & Position Encoding Phase
    
    TM->>TM: compute position_ids via get_rope_index()
    TM->>TM: build inputs_embeds from input_ids
    TM->>TM: merge image/video/audio embeddings
    TM->>TM: apply attention masks and spatial merging
    end
    
    rect rgba(100, 200, 150, 0.5)
    Note over TM,LM: Language Model Forward Phase
    
    TM->>LM: forward(decoder_input, position_ids, attention_mask, ...)
    LM-->>TM: logits
    end
    
    TM-->>Output: logits

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The review demands careful attention across multiple heterogeneous components: multimodal forward flow logic in the thinker model, intricate RoPE position indexing for image/video/audio grid-based positioning, bridge parameter mapping registries, provider configuration propagation, and comprehensive test coverage spanning unit and functional domains. While individual files follow consistent patterns, the cross-file dependencies and complex multimodal fusion semantics require substantive reasoning per component area.

Possibly related PRs

#2634 (Qwen2.5-Omni multimodal bridge/provider): Introduces analogous "Omni" model family structure for a different Qwen variant with similar multimodal architecture, bridge/provider/rope/config modules, and test patterns.

Suggested labels

community-request

Suggested reviewers

yaoyu-33
ko3n1g
chtruong814

🚥 Pre-merge checks | ✅ 2 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.28% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes	⚠️ Warning	PR introduces major Qwen3-Omni multimodal support (~1000 lines) but no concrete test execution results, logs, or validation outputs are documented despite PR objectives stating they should be provided.	Add test execution results to PR description: unit/functional test outputs, checkpoint conversion results, end-to-end smoke test results with multimodal inputs, and verification that review comment code issues are fixed.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title '[model] Add initial Qwen3-Omni support in Megatron-Bridge' accurately summarizes the main change: introducing Qwen3-Omni model support to the Megatron-Bridge framework with provider, bridge, config, and runtime components.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 17

🧹 Nitpick comments (1)

src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/model.py (1)
33-107: Add Google-style docstrings to the exported wrapper API.

Qwen3OmniModel is a new public entrypoint, but the class and its delegated methods do not currently describe Args, Returns, or Raises in a Sphinx-parseable way.

As per coding guidelines, "Use Google style docstrings for classes and functions (parseable by Sphinx), including Args, Returns, and Raises sections."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/model.py` around
lines 33 - 107, The Qwen3OmniModel class and its public methods (__init__,
shared_embedding_or_output_weight, set_input_tensor, freeze, forward) lack
Google-style docstrings; add Sphinx-parseable docstrings to the class and each
public method describing Args, Returns, and Raises (where applicable),
documenting parameter types/semantics (e.g., input_ids: torch.Tensor,
position_ids: torch.Tensor | None, inference_params: InferenceParams | None),
the return value of forward (torch.Tensor), and any exceptions set_input_tensor
or freeze may raise; ensure the class-level docstring explains that this is a
wrapper delegating to Qwen3OmniThinkerModel and notes which methods delegate to
thinker so generated docs are clear.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/models/vlm/qwen3-omni.md`:
- Around line 7-42: Add a new "Usage" subsection under the top of qwen3-omni.md
showing a minimal HF↔Megatron workflow for Qwen3-Omni-30B-A3B-Instruct (one-line
commands for Hugging Face -> Megatron bridge conversion and Megatron -> Hugging
Face export and a short example of running the single-GPU smoke validation), and
add a "Known limitations" subsection that explicitly lists unsupported
runtime/validation paths mentioned in the PR: inference_params, packed
sequences, sequence-parallel vision, and multi-rank/distributed validation;
ensure both headings are concise, call out the model name
Qwen3-Omni-30B-A3B-Instruct, and reference the bridge/export/validation steps
already described so readers know where to find full commands.

In `@src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/rope.py`:
- Around line 83-87: The current logic computing video_nums and detecting
subsequent vision blocks misses cases where vision blocks are represented by
audio_start_token_id (use_audio_in_video=True), causing ed_vision_start to skip
the combined branch; update the detection to treat audio-start tokens as valid
vision-block markers: when computing video_nums (and where code checks
input_tokens for presence of image_token_id/video_token_id) include
audio_start_token_id as an alternative marker, and when scanning for
ed_vision_start or looping over vision blocks (references: video_nums,
vision_tokens, use_audio_in_video, audio_start_token_id, video_token_id,
image_token_id, input_tokens, ed_vision_start, vision_start_token_id) ensure the
loop/condition tests for any of audio_start_token_id OR video_token_id OR
image_token_id so audio-in-video blocks are not skipped.
- Around line 64-65: The current multimodal branch only handles image/video and
skips pure-audio, causing the cumsum fallback to assert on missing
attention_mask; update the conditional logic in the block that sets
total_input_ids (and the similar block around the 215-220 region) to also detect
audio-only inputs (e.g., presence of audio grid/length variables used in this
module) and perform the audio-length expansion of total_input_ids/positions
before falling back to the plain cumsum path; ensure the fallback that asserts
on attention_mask runs only after all modality-specific expansions (image,
video, audio) are attempted so pure-audio requests produce sequential positions
instead of hitting the assert.

In `@src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/thinker_model.py`:
- Around line 315-323: The zip over deepstack_visual_embeds and
video_embeds_multiscale should fail fast on unequal lengths: update the loop
that builds joint_embeds (the for image_embed, video_embed in
zip(deepstack_visual_embeds, video_embeds_multiscale) block) to call zip with
strict=True (i.e., zip(deepstack_visual_embeds, video_embeds_multiscale,
strict=True)) so a mismatch raises immediately instead of silently truncating;
keep the rest of the logic (image_mask_joint/video_mask_joint, embed_joint
creation, and appending to joint_embeds) unchanged.
- Around line 243-261: The code currently only calls get_rope_index when visual
inputs exist, causing audio-only batches to use
_build_text_only_mrope_position_ids and ignore
position_id_per_seconds/seconds_per_chunk; update the conditional that sets
position_ids in the block that references pixel_values and pixel_values_videos
so it also detects audio inputs (e.g., audio_feature_lengths or audio-related
tensors like audio_feature_lengths/audio_seqlens) and calls get_rope_index when
audio is present; modify the branch around position_ids assignment (the logic
using _build_text_only_mrope_position_ids, get_rope_index, image_grid_thw,
video_grid_thw, audio_feature_lengths, position_id_per_seconds,
seconds_per_chunk) to ensure audio-only batches route through get_rope_index so
temporal rope indices for audio are applied.

In `@src/megatron/bridge/models/qwen_omni/qwen3_omni_bridge.py`:
- Around line 40-45: The provider_bridge function currently accepts
talker_config and code2wav_config but only maps thinker-side weights, which can
silently drop audio-output stacks for checkpoints like
Qwen3OmniMoeForConditionalGeneration; update provider_bridge (the method that
inspects hf_pretrained.config, thinker_config, talker_config, code2wav_config,
and text_config) to detect when talker_config or code2wav_config indicate audio
output (e.g., an enable_audio_output flag or equivalent) and immediately raise a
clear exception identifying that full-Omni audio-output checkpoints are not
supported, instead of proceeding with partial conversion.
- Around line 51-86: The provider instantiation ignores HF config fields used
later at runtime—populate language_max_sequence_length (used by Qwen3VLGPTModel
in Qwen3OmniModelProvider.provide) and spatial_merge_size (used by
Qwen3OmniThinkerModel.forward via self.config.spatial_merge_size) from the
HF/text or thinker configs when creating Qwen3OmniModelProvider; update the
Qwen3OmniModelProvider call site to pass
language_max_sequence_length=getattr(text_config,
"language_max_sequence_length", text_config.max_position_embeddings) and
spatial_merge_size=getattr(thinker_config, "spatial_merge_size",
<reasonable-default>) so the runtime config matches the checkpoint’s actual
context window and vision merge geometry.

In `@src/megatron/bridge/models/qwen_omni/qwen3_omni_provider.py`:
- Around line 67-93: The provide method forwards
pre_process/post_process/vp_stage as None instead of deriving the provider's
pipeline-stage defaults, causing Qwen3OmniModel/Qwen3OmniThinkerModel to skip
tower construction; fix by resolving those defaults before constructing
Qwen3OmniModel: compute resolved_pre_process and resolved_post_process (derive
from vp_stage or the provider's pipeline-stage default logic used elsewhere) so
they are explicit booleans (not None), and pass those resolved values into the
Qwen3OmniModel constructor (keep the same call to model.freeze as-is).

In `@src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py`:
- Around line 63-71: The except block currently only catches
ModuleNotFoundError, which misses ImportError when the module exists but the
symbol is absent; update the exception handler for the import of
get_transformer_block_with_experimental_attention_variant_spec to catch both
ModuleNotFoundError and ImportError (i.e., except (ModuleNotFoundError,
ImportError):) and keep the fallback function
get_transformer_block_with_experimental_attention_variant_spec that raises
NotImplementedError unchanged.

In `@src/megatron/bridge/models/qwen/qwen_provider.py`:
- Around line 25-34: The compatibility guard only catches ModuleNotFoundError so
an ImportError (module present but missing the symbol) still escapes; update the
except clause around the import of
get_transformer_block_with_experimental_attention_variant_spec from
megatron.core.models.gpt.experimental_attention_variant_module_specs to catch
both ModuleNotFoundError and ImportError, preserving the fallback definition of
get_transformer_block_with_experimental_attention_variant_spec that raises
NotImplementedError.

In `@tests/functional_tests/models/qwen_omni/test_qwen3_omni_conversion.py`:
- Around line 96-100: The subprocess invocation using subprocess.run(cmd,
capture_output=True, text=True, cwd=repo_root, env=env) can hang; add a timeout
(e.g., timeout=300) to the subprocess.run call and wrap the call in a try/except
that catches subprocess.TimeoutExpired and other non-zero return cases, printing
captured stdout/stderr and calling pytest.fail(...) (instead of assert False)
with a clear message on timeout or non-zero returncode; reference the
subprocess.run invocation and the variables cmd, result, repo_root, env in your
changes.

In `@tests/functional_tests/models/qwen_omni/test_qwen3_omni_smoke.py`:
- Around line 48-79: The test currently mutates process-global state in
_init_dist and _init_model_parallel (calls to dist.init_process_group,
parallel_state.initialize_model_parallel and setting os.environ MASTER_PORT), so
move the Megatron smoke path into a separate subprocess: create a small helper
entrypoint (or script) that performs the logic in _init_dist and
_init_model_parallel, use subprocess.run (or multiprocessing.Process with spawn)
to execute that entrypoint from the pytest worker, pass any needed env via the
subprocess env (use a dynamically chosen free port instead of a fixed
MASTER_PORT), assert the subprocess exit code and capture stdout/stderr for
failures, and remove direct calls to dist.init_process_group /
parallel_state.initialize_model_parallel from the main test process to avoid
polluting later tests.

In `@tests/functional_tests/models/qwen_omni/utils.py`:
- Around line 72-110: The function create_qwen3_omni_smoke_model has a TOCTOU
race around the exists/rmtree/mkdir sequence; serialize model creation using the
existing SMOKE_LOCK_DIR to prevent two workers from simultaneously
building/deleting the same output_dir. Wrap the cache-check and build logic in
an exclusive lock (create or acquire a lock file inside SMOKE_LOCK_DIR or use a
FileLock), so only one process builds: check if output exists after acquiring
the lock, if not build into a temporary directory (e.g.,
output_dir.with_suffix(".tmp") or ·output_dir + ".tmp"·), write config and
weights there, then atomically rename/replace the temp dir to output_dir, and
finally release the lock; ensure the lock acquisition/release is in try/finally
and do not remove output_dir if it already exists after reacquiring the lock.
Use the symbols create_qwen3_omni_smoke_model and SMOKE_LOCK_DIR to locate where
to add the lock and temp-dir logic.

In `@tests/unit_tests/models/qwen_omni/conftest.py`:
- Around line 18-26: The current autouse fixture clear_lru_cache in
tests/unit_tests/models/qwen_omni/conftest.py unconditionally becomes a no-op;
change it to attempt to import or call the repository-wide clear_lru_cache
behavior and only skip if that import fails — otherwise retain the
clear-before/clear-after semantics around yield so tests that call
read_run_config or read_train_state still get a cleared LRU cache; implement
this by importing the upstream fixture or its clear function inside the conftest
fixture (referencing clear_lru_cache) and delegating to it (or wrapping its
setup/teardown) when available, falling back to the current yield-only noop only
on ImportError or AttributeError.

In `@tests/unit_tests/models/qwen_omni/modeling_qwen3_omni/test_omni_model.py`:
- Around line 98-115: The setup_class currently hardcodes MASTER_PORT="29501",
causing port conflicts for concurrent test runs; modify setup_class to pick an
available ephemeral port (e.g., bind to port 0 on a temporary socket to discover
a free port) and set os.environ["MASTER_PORT"] to that value before calling
dist.init_process_group (keep referencing setup_class and
dist.init_process_group), and implement a matching teardown_class that restores
any previous MASTER_ADDR/MASTER_PORT/RANK/LOCAL_RANK/WORLD_SIZE environment
variables (or removes the ones you set) and calls dist.destroy_process_group if
initialized to clean up the session; ensure you preserve any pre-existing env
values so teardown_class can restore them.
- Around line 175-182: The tests build a Transformer Engine layer spec
unconditionally via _make_layer_spec using
get_gpt_layer_with_transformer_engine_spec (which requires CUDA), causing
CPU-only runs to fail; fix by either (A) adding `@pytest.mark.run_only_on`("gpu")
to the affected test functions so they skip when CUDA is unavailable, or (B)
change _make_layer_spec to detect CUDA availability and return
get_gpt_layer_local_spec() when CUDA is not present (mimic the pattern from
gpt_builder.py/gpt_provider.py); update only the _make_layer_spec implementation
or annotate the test functions accordingly and ensure references remain to
_make_layer_spec, get_gpt_layer_with_transformer_engine_spec, and
get_gpt_layer_local_spec.

In `@tests/unit_tests/models/qwen_omni/test_qwen3_omni_provider.py`:
- Around line 19-22: This test module is missing the module-level pytest unit
marker; add "import pytest" and set "pytestmark = pytest.mark.unit" near the top
alongside the existing import (i.e., next to the "from
megatron.bridge.models.qwen_omni import Qwen3OmniModelProvider") so the
TestQwen3OmniModelProvider class is discovered by marker-based test selection.

---

Nitpick comments:
In `@src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/model.py`:
- Around line 33-107: The Qwen3OmniModel class and its public methods (__init__,
shared_embedding_or_output_weight, set_input_tensor, freeze, forward) lack
Google-style docstrings; add Sphinx-parseable docstrings to the class and each
public method describing Args, Returns, and Raises (where applicable),
documenting parameter types/semantics (e.g., input_ids: torch.Tensor,
position_ids: torch.Tensor | None, inference_params: InferenceParams | None),
the return value of forward (torch.Tensor), and any exceptions set_input_tensor
or freeze may raise; ensure the class-level docstring explains that this is a
wrapper delegating to Qwen3OmniThinkerModel and notes which methods delegate to
thinker so generated docs are clear.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c0051edc-104e-4136-b6ed-e2820150003b

📥 Commits

Reviewing files that changed from the base of the PR and between 57d10ad and 2ba2be9.

📒 Files selected for processing (22)

docs/models/vlm/index.md
docs/models/vlm/qwen3-omni.md
src/megatron/bridge/models/__init__.py
src/megatron/bridge/models/qwen/qwen_provider.py
src/megatron/bridge/models/qwen_omni/__init__.py
src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/__init__.py
src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/model.py
src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/rope.py
src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/thinker_model.py
src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/transformer_config.py
src/megatron/bridge/models/qwen_omni/qwen3_omni_bridge.py
src/megatron/bridge/models/qwen_omni/qwen3_omni_provider.py
src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl/rope.py
src/megatron/bridge/models/qwen_vl/qwen35_vl_provider.py
tests/functional_tests/models/qwen_omni/__init__.py
tests/functional_tests/models/qwen_omni/test_qwen3_omni_conversion.py
tests/functional_tests/models/qwen_omni/test_qwen3_omni_smoke.py
tests/functional_tests/models/qwen_omni/utils.py
tests/unit_tests/models/qwen_omni/conftest.py
tests/unit_tests/models/qwen_omni/modeling_qwen3_omni/test_omni_model.py
tests/unit_tests/models/qwen_omni/test_qwen3_omni_bridge.py
tests/unit_tests/models/qwen_omni/test_qwen3_omni_provider.py

Signed-off-by: hbhflw2000 <417911774@qq.com>

yaoyu-33 · 2026-03-18T22:06:13Z

/claude review

claude · 2026-03-18T22:07:51Z

+    ) -> torch.Tensor:
+        if feature_attention_mask is not None:
+            audio_feature_lengths = torch.sum(feature_attention_mask, dim=1)
+            input_features = input_features.permute(0, 2, 1)[feature_attention_mask.bool()].permute(1, 0)


Nit/potential bug: When feature_attention_mask is not None, line 202 does .permute(0, 2, 1)[mask].permute(1, 0) — but after boolean indexing, the result is 2D (num_selected, num_mel_bins), not 3D, so the second .permute(1, 0) transposes to (num_mel_bins, num_selected). This matches the HF upstream behavior, but it's worth double-checking that self.audio_model expects (feature_dim, time) layout rather than (time, feature_dim) after masking.

This is intentional and matches the HF Qwen3-Omni thinker implementation.
The masked audio features are reshaped into the layout expected by the reused HF audio encoder path. We kept this behavior aligned with upstream, and the current audio unit/smoke validation did not show a shape mismatch on this path.

yaoyu-33 · 2026-03-18T22:10:30Z

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Functional tests for local Qwen3-Omni smoke-checkpoint creation and conversion."""


need L0 bash script to actual run the tests, otherwise this test would not run. Check latest contribute.md

Thanks for calling this out.

I followed up on the latest CONTRIBUTING.md guidance and wired this through the functional CI path using an L2_Launch_*.sh launcher, which is the same pattern currently used by Qwen VL / Qwen3.5 VL model functional tests.

Specifically, I added:

tests/functional_tests/L2_Launch_models_qwen3_omni.sh

the corresponding cicd-functional-tests matrix entry in .github/workflows/cicd-main.yml

So this test is now connected to the functional CI workflow rather than remaining only as a standalone local pytest file.

I also validated the path locally with:

a local CI-equivalent toy functional conversion run

local full-model multimodal inference

local smoke HF -> Megatron -> HF conversion/inference

Signed-off-by: hbhflw2000 <417911774@qq.com>

hbhflw2000 · 2026-03-20T02:14:36Z

Thanks again for the detailed review.

I’ve addressed the review feedback in the latest updates, including the example scripts, functional CI launcher integration, and the related code/doc cleanups. I also re-ran the local validation paths for the toy functional conversion flow and the local multimodal smoke paths.

Whenever you have time, this should be ready for another review pass.

cuichenx · 2026-03-23T03:35:12Z

@hbhflw2000 we had a functional test refactor recently. Could you please rename L2_Launch_models_qwen3_omni.sh -> L0_Launch_models_qwen3_omni.sh and resolve the conflict in cicd-main.yml by adding L0_Launch_models_qwen3_omni.sh to the current list of tests? Thanks

Signed-off-by: hbhflw2000 <417911774@qq.com>

hbhflw2000 · 2026-03-23T06:31:29Z

Thanks for the update.

I’ve renamed the launcher to L0_Launch_models_qwen3_omni.sh and updated cicd-main.yml to add L0_Launch_models_qwen3_omni.sh to the current functional test list.

I’ve also forward-ported the branch onto the current upstream Qwen layout in 31e2295b, so from my side this should have addressed the layout drift and CI launcher alignment issues as well.

Please take another look when convenient, especially at the current mergeability / review state.

Signed-off-by: hbhflw2000 <417911774@qq.com>

suiyoubi · 2026-03-30T13:56:05Z

/ok to test 0c1ff02

copy-pr-bot · 2026-03-30T13:56:09Z

/ok to test 0c1ff02

@suiyoubi, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

hbhflw2000 · 2026-04-03T03:39:59Z

Thanks again for the previous review.

It looks like the last /ok to test hit an infra issue (E2), so the CI workflows didn’t actually start.

From my side, I believe all author-side follow-ups have been addressed. If there is anything still pending for needs-author, please let me know and I’ll update it right away.

Could you help retry the CI when convenient? Thanks!

hbhflw2000 · 2026-04-17T10:26:40Z

Closing this in favor of #3317.

#3317 builds on this work and extends it to training, so I’ll keep the review there to avoid duplicate effort.

hbhflw2000 added 5 commits March 17, 2026 17:09

[model,docs,test] feat: add qwen3 omni stage1 scaffolding

bd917ea

Signed-off-by: Lianglipeng <lianglipeng@didiglobal.com> Signed-off-by: hbhflw2000 <417911774@qq.com>

[model,test] feat: add qwen3 omni vision runtime

f92d8cc

Signed-off-by: Lianglipeng <lianglipeng@didiglobal.com> Signed-off-by: hbhflw2000 <417911774@qq.com>

[model,test] feat: add qwen3 omni audio runtime

866b5d5

Signed-off-by: Lianglipeng <lianglipeng@didiglobal.com> Signed-off-by: hbhflw2000 <417911774@qq.com>

[model,test] fix: add qwen3 omni functional smoke coverage

2b2ef81

Signed-off-by: hbhflw2000 <417911774@qq.com>

[docs,test] fix: align qwen3 omni docs and test paths

2ba2be9

Signed-off-by: hbhflw2000 <417911774@qq.com>

github-actions Bot added the community-request label Mar 17, 2026

hbhflw2000 changed the title ~~Omni3~~ [model,test] Add initial Qwen3-Omni support in Megatron-Bridge Mar 17, 2026

coderabbitai Bot reviewed Mar 17, 2026

View reviewed changes

hbhflw2000 added 2 commits March 17, 2026 20:04

[model,test,docs] fix: address qwen3 omni review feedback

a8d7ed9

Signed-off-by: hbhflw2000 <417911774@qq.com>

[docs] chore: improve qwen3 omni docstring coverage

7ae0f55

Signed-off-by: hbhflw2000 <417911774@qq.com>

yaoyu-33 added needs-follow-up area:model Model implementations and HF bridge logic labels Mar 18, 2026

chtruong814 removed the needs-follow-up label Mar 18, 2026

yaoyu-33 reviewed Mar 18, 2026

View reviewed changes

Comment thread docs/models/vlm/qwen3-omni.md

claude Bot reviewed Mar 18, 2026

View reviewed changes

Comment thread src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/thinker_model.py

claude Bot reviewed Mar 18, 2026

View reviewed changes

Comment thread src/megatron/bridge/models/qwen_omni/modeling_qwen3_omni/rope.py Outdated

claude Bot reviewed Mar 18, 2026

View reviewed changes

yaoyu-33 reviewed Mar 18, 2026

View reviewed changes

Comment thread docs/models/vlm/qwen3-omni.md Outdated

yaoyu-33 reviewed Mar 18, 2026

View reviewed changes

Comment thread docs/models/vlm/qwen3-omni.md Outdated

yaoyu-33 reviewed Mar 18, 2026

View reviewed changes

[examples,model,test] feat: add qwen3 omni repro scripts

3f9d7f7

Signed-off-by: hbhflw2000 <417911774@qq.com>

hbhflw2000 force-pushed the omni3 branch from 85872f9 to 3f9d7f7 Compare March 19, 2026 04:04

hbhflw2000 added 3 commits March 19, 2026 16:52

[model,docs,test] fix: refine qwen3 omni rope and doc review items

4f03b90

Signed-off-by: hbhflw2000 <417911774@qq.com>

[examples,docs] fix: remove local qwen3 omni path defaults

382d144

Signed-off-by: hbhflw2000 <417911774@qq.com>

[test,ci] feat: add qwen3 omni toy functional coverage

1240ba7

Signed-off-by: hbhflw2000 <417911774@qq.com>

hbhflw2000 requested a review from a team as a code owner March 19, 2026 09:20

chtruong814 added the needs-follow-up label Mar 22, 2026

chtruong814 removed the needs-follow-up label Mar 23, 2026

[test,ci] fix: rename qwen3 omni launcher to l0

0c1ff02

Signed-off-by: hbhflw2000 <417911774@qq.com>

[test,ci] fix: resolve qwen3 omni workflow conflict

6468712

chtruong814 added the needs-follow-up label Mar 25, 2026

[model,test] fix: forward-port qwen3 omni onto upstream qwen layout

31e2295

Signed-off-by: hbhflw2000 <417911774@qq.com>

hbhflw2000 force-pushed the omni3 branch from 00ca3e9 to 31e2295 Compare March 26, 2026 09:08

chtruong814 added needs-follow-up and removed needs-follow-up labels Mar 26, 2026

chtruong814 removed the needs-follow-up label Mar 30, 2026

github-actions Bot removed the needs-author label Apr 3, 2026

chtruong814 added the needs-follow-up label Apr 5, 2026

martinzhang03 mentioned this pull request Apr 13, 2026

feat: Add support for Qwen 2.5-Omni model bridge #2296

Closed

hbhflw2000 mentioned this pull request Apr 14, 2026

[model]Add Qwen3‑Omni training support #3317

Merged

5 tasks

hbhflw2000 changed the title ~~[model,test] Add initial Qwen3-Omni support in Megatron-Bridge~~ [model] Add initial Qwen3-Omni support in Megatron-Bridge Apr 15, 2026

hbhflw2000 closed this Apr 17, 2026

chtruong814 removed the needs-follow-up label Apr 17, 2026

Conversation

hbhflw2000 commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Summary

New Features

Implementation

Tests

Documentation

Validation

Local test runs

Observed results

Current Scope

Not included in this PR

Notes

CI

Pre-review Checklist

Summary by CodeRabbit

Release Notes

Uh oh!

copy-pr-bot Bot commented Mar 17, 2026

Uh oh!

coderabbitai Bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

yaoyu-33 commented Mar 18, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

hbhflw2000 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

yaoyu-33 Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

hbhflw2000 Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

hbhflw2000 commented Mar 20, 2026

Uh oh!

cuichenx commented Mar 23, 2026

Uh oh!

hbhflw2000 commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

suiyoubi commented Mar 30, 2026

Uh oh!

copy-pr-bot Bot commented Mar 30, 2026

Uh oh!

hbhflw2000 commented Apr 3, 2026

Uh oh!

hbhflw2000 commented Apr 17, 2026

Uh oh!

hbhflw2000 commented Mar 17, 2026 •

edited

Loading

coderabbitai Bot commented Mar 17, 2026 •

edited

Loading

hbhflw2000 commented Mar 23, 2026 •

edited

Loading