Skip to content

Add falcon h1#1462

Merged
yaoyu-33 merged 15 commits into
NVIDIA-NeMo:mainfrom
dhiaEddineRhaiem:add-falcon-h1
May 18, 2026
Merged

Add falcon h1#1462
yaoyu-33 merged 15 commits into
NVIDIA-NeMo:mainfrom
dhiaEddineRhaiem:add-falcon-h1

Conversation

@dhiaEddineRhaiem

@dhiaEddineRhaiem dhiaEddineRhaiem commented Nov 21, 2025

Copy link
Copy Markdown
Contributor

This PR adds support for FalconH1 in MegatronBridge.
FalconH1 is introduced as a new ParallelHybrid layer in the hybrid design of MegatronBridge.
cc @sbhavani

Summary by CodeRabbit

Release Notes

  • New Features
    • Added support for FalconH1 model family with multiple sizes (500M, 1.5B, 7B, 34B parameters)
    • Enabled configurable hybrid Mamba/attention/MLP layer architectures for flexible model composition
    • Added HuggingFace pretrained model integration for seamless loading and conversion

✏️ Tip: You can customize this high-level summary in your review settings.

@copy-pr-bot

copy-pr-bot Bot commented Nov 21, 2025

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
@yaoyu-33

Copy link
Copy Markdown
Contributor

@dhiaEddineRhaiem thanks a lot for the contribution.
Overall it looks very good!
We will need to verify the conversion between hf / megatron. Can you check https://docs.nvidia.com/nemo/megatron-bridge/latest/adding-new-models.html#validate-with-examples and maybe attach you validation results here?

@dhiaEddineRhaiem

Copy link
Copy Markdown
Contributor Author

hello @yaoyu-33 ,

this the output of the hf conversion of falcon h1 after the previous fixes but before adding Falcon h1 Mup Fwd multipliers.

For reference , here is the complete list of FalconH1 MuP multipliers (from config):

embedding_multiplier: scales embedding outputs
lm_head_multiplier:scales final logits
attention_in_multiplier: scales attention input
attention_out_multiplier: scales attention output
key_multiplier: scales K projection (ref)
mlp_multipliers: gate/down projections (ref)
ssm_in_multiplier: scales SSM input
ssm_out_multiplier: cales SSM output
ssm_multipliers: z/x/B/C/dt components (ref)

So the gibberish output was expected because we were not applying Falcon h1 Mup Fwd multipliers(MLP , attention and ssm multipliers) yet.

I managed to add some of them in bridge except MLP gate and down proj multipliers , ssm_multipliers, key_multiplier which i think should be added to core in the fwd of MambaMixer , MLP and selfAttention.
More info on how these multipliers are being applied could be found here

  1. MLP gate and down proj

2.ssm_multipliers

3.key_multiplier

so IMO , they should be add to core in the fwd of MambaMixer , MLP and self attention.

Any help on this please ?

@Separius

Copy link
Copy Markdown

Hi @dhiaEddineRhaiem, how bad would it be to fuse the multipliers into the weights (during conversion)? I'm guessing changing the mcore is a no go, will the training break or will it be just worse? or can it be accounted for in lr? can we multiply the learning rate of those fused weights with 1/multiplier and get the same updates? (I think wd needs to be changed too)

@sbhavani

sbhavani commented Dec 1, 2025

Copy link
Copy Markdown
Contributor

@dhiaEddineRhaiem thanks for adding the muP multipliers. I think this is a good approach while we figure out general muP support in Megatron Core.

One potential bug in falconh1_layer.py: The attention out multiplier overwrites the tensor:

attn_output = self.config.attention_out_multiplier

@yaoyu-33

yaoyu-33 commented Dec 8, 2025

Copy link
Copy Markdown
Contributor

@dhiaEddineRhaiem : sry for a bit late in reply.
MCore also accept PRs. These configs doesn't seem very intrusive so we can add it in mcore.
Another way we can do is, if we want to merge faster, we can override some the mixer class / mlp class directly in mcore. Example is https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl where we have to do deepstack etc. for qwen3vl

yaoyu-33
yaoyu-33 previously approved these changes Dec 17, 2025
@yaoyu-33

Copy link
Copy Markdown
Contributor

/ok to test ee80ca9

@coderabbitai

coderabbitai Bot commented Jan 30, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR adds comprehensive support for the FalconH1 hybrid model architecture to Megatron, including bridge adapters for HuggingFace model conversion, configuration-driven model providers for multiple model sizes (500M, 1.5B, 7B, 34B), stack implementations with Mamba and attention hybrid layers, module specifications, and layer allocation logic.

Changes

Cohort / File(s) Summary
Public API Exports
src/megatron/bridge/models/__init__.py, src/megatron/bridge/models/falcon_h1/__init__.py
Added imports and re-exports of FalconH1 bridge, model providers, and core model components (FalconH1Model, FalconH1Config, FalconH1Stack, FalconH1Layer) to enable public access and discoverability.
Bridge and Model Providers
src/megatron/bridge/models/falcon_h1/falconh1_bridge.py, src/megatron/bridge/models/falcon_h1/falconh1_provider.py
Introduced FalconH1Bridge class with HuggingFace-to-Megatron parameter mapping registry (including Mamba, attention, QKV, gated MLP mappings). Added FalconH1ModelProvider base and four size-specific providers (500M, 1.5B Deep, 7B, 34B) with architecture presets, MuP/scaling multipliers, and vocab handling logic.
Core Model Architecture
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py
Introduced FalconH1Config dataclass with Mamba/SSM parameters and validation. Added FalconH1Model as a LanguageModule with hybrid attention/Mamba/MLP support, handling embeddings, rotary position embeddings, inference contexts, and language modeling loss computation.
Stack and Layer Implementation
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py, src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py
Implemented FalconH1Stack with layer allocation, FP8 context management, sharded state dictionary construction, and pipeline-parallel support. Added FalconH1Layer composing Mamba mixer, self-attention, and MLP with optional enabling/disabling and bias-dropout-add fusion.
Layer Specifications and Allocation
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py, src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py
Defined falconh1_stack_spec ModuleSpec configuring the full FalconH1Stack architecture with nested submodules. Introduced allocate_layers() utility supporting auto-allocation and override patterns for hybrid layer type distribution (Mamba, attention, MLP, parallel).

Sequence Diagram(s)

sequenceDiagram
    participant HF as HuggingFace Model
    participant Bridge as FalconH1Bridge
    participant Provider as FalconH1ModelProvider
    participant Model as FalconH1Model
    participant Stack as FalconH1Stack
    participant Layers as FalconH1Layer/Mamba/Attention
    
    HF->>Bridge: provider_bridge(hf_pretrained)
    Bridge->>Provider: Create with HF config
    Provider->>Provider: Configure model params
    Provider->>Model: provide() instantiate
    Model->>Stack: build_module with spec
    Stack->>Stack: allocate_layers()
    Stack->>Layers: construct per-layer
    Layers->>Layers: initialize submodules
    Model->>Model: forward() inference
    Stack->>Layers: forward through stack
    Layers->>Layers: Mamba/Attention/MLP
    Layers-->>Stack: aggregated output
    Stack-->>Model: layer outputs
    Model-->>Model: output projection & logits
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 3
❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Results For Major Changes ⚠️ Warning Major feature adding ~2000+ lines affecting model behavior lacks test results, validation metrics, or performance benchmarks in PR description. Add HuggingFace-to-Megatron conversion validation, logit matching tests, output correctness confirmation, and performance benchmarks to PR description with detailed results or links.
Title check ❓ Inconclusive The title 'Add falcon h1' is vague and generic, lacking specific detail about what aspect of Falcon H1 support is being added. Consider a more descriptive title that clarifies the main contribution, such as 'Add FalconH1 model bridge and provider support' or 'Introduce FalconH1 hybrid architecture integration'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/falcon_h1/falconh1_bridge.py`:
- Around line 1-2: The top of
src/megatron/bridge/models/falcon_h1/falconh1_bridge.py currently contains only
a partial copyright line; replace it with the full NVIDIA copyright and Apache
License 2.0 header block (including copyright years, "NVIDIA CORPORATION.  All
rights reserved.", the Apache License, Version 2.0 notice, and license URL) so
the file has the complete standard header; add this header at the very top of
the module before any imports or code to ensure compliance with project
licensing guidelines.

In `@src/megatron/bridge/models/falcon_h1/falconh1_provider.py`:
- Around line 153-154: The call currently uses boolean-or which treats explicit
False as unset; change the logic so defaults are only applied when arguments are
None: for the call site that passes pre_process and post_process (in
falconh1_provider.py) replace the inline "pre_process=pre_process or
parallel_state.is_pipeline_first_stage()" and "post_process=post_process or
parallel_state.is_pipeline_last_stage()" with conditional logic that uses the
provided value when it is not None (e.g., pre_process if pre_process is not None
else parallel_state.is_pipeline_first_stage(), and similarly for
post_process/post_process is not None else
parallel_state.is_pipeline_last_stage()) so explicit False values are preserved.

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`:
- Around line 220-221: The else branch handling unexpected layer_type in
modeling_falconh1/falconh1_block.py should not use `assert False` because
assertions can be stripped with Python -O; replace that assertion with an
explicit exception by raising an AssertionError (e.g., in the else of the
layer_type switch/if in the function/class handling layer dispatch in
falconh1_block.py, replace `assert False, "unexpected layer_type"` with `raise
AssertionError("unexpected layer_type")`) so the error is always raised in
production.
- Around line 367-388: The loop's branching erroneously uses two separate ifs
causing TransformerLayer instances to also hit the MambaLayer branch; change the
second check to an elif so only one branch runs per layer (i.e., use elif
isinstance(layer, FalconH1Layer) instead of a standalone if) and keep the
existing argument sets for TransformerLayer, FalconH1Layer and the else
(MambaLayer) branches as-is to avoid double execution.

In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py`:
- Line 1: Replace the short copyright line at the top of falconh1_layer_specs.py
with an updated full Apache License, Version 2.0 header: change the year from
2023 to 2025 and insert the complete Apache 2.0 boilerplate (including copyright
notice, "Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.", link to
http://www.apache.org/licenses/LICENSE-2.0, and the standard disclaimer). Ensure
the new header appears at the very top of the file before any imports or code
(i.e., replacing the existing copyright comment).

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py`:
- Line 283: The bug is that attn_output is being overwritten by the scalar
self.config.attention_out_multiplier; instead of assigning the multiplier, scale
the attention tensor (e.g., attn_output = attn_output *
self.config.attention_out_multiplier or in-place attn_output *= ...) so the
actual attention result is preserved; ensure the multiplication occurs in the
same dtype/device as the tensor (use .to(attn_output.dtype) or cast if needed)
and update the code around the attn_output assignment in
modeling_falconh1/falconh1_layer.py (the attention block / forward method) to
perform tensor scaling rather than scalar replacement.

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py`:
- Around line 365-367: The current call to self.output_layer(...) returns a
tuple (logits, bias) but the code multiplies the whole tuple by
self.config.lm_head_multiplier; change the unpacking and apply the multiplier
only to the logits tensor by assigning something like logits, bias =
self.output_layer(hidden_states, weight=output_weight,
runtime_gather_output=runtime_gather_output) and then set logits = logits *
self.config.lm_head_multiplier (or multiply logits inline after unpacking) so
only the logits tensor is scaled.

In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py`:
- Around line 192-199: The test harness in the __main__ block defines test_cases
including an override pattern "M*-M*-M*-" but supplies all-zero ratios which
yields all-MAMBA layers and will cause a ValueError when validating the
override; fix by either updating that tuple in test_cases to supply non-zero
ratios that produce a layer distribution compatible with the override pattern
(so the pattern "M*-M*-M*-" can match attention vs MLP counts) or wrap that
specific test case execution in a try/except that asserts the expected
ValueError (i.e., catch the error and treat it as a passed negative test);
locate the test_cases list and the for t in test_cases loop in the __main__
block to apply the change.
- Around line 35-40: Rename the ambiguous loop variable `l` to `layer_idx` (or
similar) in all loops in this module to satisfy linters and improve readability;
specifically update every occurrence in the loops that assign into
layer_type_list using total_layers_count and any other loops within the same
file (e.g., the blocks that test x < 0.5 and set Symbols.ATTENTION or modify x)
so the index variable is consistently named `layer_idx` and all references
inside those loops (assignments, indexing, increments/decrements) are updated to
the new name.
🧹 Nitpick comments (16)
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py (2)

1-1: Update copyright header.

Add the complete Apache 2.0 license text. The copyright year can remain 2024 or be updated to 2025.


3-21: Remove unused imports.

Static analysis indicates several unused imports that should be removed for code cleanliness.

🧹 Proposed cleanup
-import math
 from dataclasses import dataclass
 from typing import Optional, Tuple, Union

 import torch
-import torch.nn as nn
 from megatron.core.process_groups_config import ProcessGroupCollection
 from megatron.bridge.models.falcon_h1.modeling_falconh1.falconh1_model import FalconH1Config
 from megatron.core.transformer.module import MegatronModule
 from megatron.core.transformer.spec_utils import ModuleSpec, build_module
 from megatron.core.transformer.enums import AttnMaskType
 from megatron.core.transformer.identity_op import IdentityOp
 from megatron.core.packed_seq_params import PackedSeqParams
 from megatron.core.inference.contexts import BaseInferenceContext
 from megatron.core.utils import log_single_rank
 import logging
-from megatron.core.ssm.mamba_mixer import MambaMixer, MambaMixerSubmodules
+from megatron.core.ssm.mamba_mixer import MambaMixerSubmodules
-from megatron.core.transformer.attention import SelfAttention, SelfAttentionSubmodules
+from megatron.core.transformer.attention import SelfAttentionSubmodules
 from megatron.core.transformer.mlp import MLP
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py (3)

1-1: Update copyright header to 2025 with full license text.


20-21: Remove duplicate Optional and Tuple imports.

These types are already imported on line 3. The redefinition is flagged by static analysis.

🧹 Proposed fix
-from dataclasses import dataclass
-from typing import Optional, Tuple
+from dataclasses import dataclass

The Tuple type from line 3 can be used, and Optional is already imported.


177-177: Use explicit str | None for nullable type hints.

Per coding guidelines, use T | None instead of implicit Optional. PEP 484 prohibits implicit Optional.

📝 Proposed fix
-        hybrid_override_pattern: str = None,
+        hybrid_override_pattern: str | None = None,
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py (2)

402-406: Remove unused output variable.

The make_viewless_tensor result is assigned to output but hidden_states is returned instead. Either use output or remove the assignment.

🧹 Proposed fix
         # Ensure that the tensor passed between pipeline parallel stages is
         # viewless. See related notes in TransformerBlock and TransformerLayer
-        output = make_viewless_tensor(
+        hidden_states = make_viewless_tensor(
             inp=hidden_states, requires_grad=hidden_states.requires_grad, keep_graph=True
         )

         return hidden_states

131-131: Minor style issues flagged by static analysis.

  1. Line 131: Use explicit str | None for nullable type hint
  2. Line 453: Use is not instead of not ... is
📝 Proposed fixes
-        hybrid_override_pattern: str = None,
+        hybrid_override_pattern: str | None = None,
-            if not module is self.layers:
+            if module is not self.layers:

Also applies to: 453-453

src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py (6)

1-1: Update copyright year to 2025.

The copyright header uses 2024, but per the coding guidelines and consistency with other files in this PR (e.g., falconh1_provider.py), it should be 2025.

-# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

17-22: Consider using frozenset for immutability or annotate with ClassVar.

The VALID set is a mutable class attribute. For a constant set that shouldn't be modified, consider using frozenset for true immutability, or at minimum annotate with ClassVar to satisfy the linter.

+from typing import ClassVar
+
 class Symbols:
     MAMBA = "M"
     ATTENTION = "*"
     MLP = "-"
     PARALLEL = "P"
-    VALID = {MAMBA, ATTENTION, MLP, PARALLEL}
+    VALID: ClassVar[frozenset[str]] = frozenset({MAMBA, ATTENTION, MLP, PARALLEL})

107-108: Use explicit str | None type hint.

Per coding guidelines, use T | None for nullable types instead of implicit Optional. The current signature uses str = None which implicitly means Optional[str].

-    override_pattern: str = None,
+    override_pattern: str | None = None,

As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'"


109-113: Consider splitting compound assertions for clearer error messages.

When a compound assertion fails, it's not immediately clear which condition was violated. Splitting them provides better debugging information.

♻️ Proposed fix
     assert total_layers_count > 0
-    assert target_attention_ratio >= 0.0 and target_attention_ratio <= 1.0
-    assert target_mlp_ratio >= 0.0 and target_mlp_ratio <= 1.0
-    assert target_parallel_hybrid_ratio >= 0.0 and target_parallel_hybrid_ratio <= 1.0
+    assert 0.0 <= target_attention_ratio <= 1.0, f"target_attention_ratio must be in [0, 1], got {target_attention_ratio}"
+    assert 0.0 <= target_mlp_ratio <= 1.0, f"target_mlp_ratio must be in [0, 1], got {target_mlp_ratio}"
+    assert 0.0 <= target_parallel_hybrid_ratio <= 1.0, f"target_parallel_hybrid_ratio must be in [0, 1], got {target_parallel_hybrid_ratio}"
     assert target_attention_ratio + target_mlp_ratio + target_parallel_hybrid_ratio <= 1.0

25-27: Add parameterized return type hints.

The return type list could be more specific as list[str] to improve type safety and IDE support. This applies to _allocate_auto, _allocate_override, and allocate_layers.

 def _allocate_auto(
     total_layers_count: int, target_attention_ratio: float, target_mlp_ratio: float, target_parallel_hybrid_ratio: float
-) -> list:
+) -> list[str]:

133-135: Use logging.WARNING level for warning messages.

The message says "Warning:" but uses logging.INFO level. Consider using logging.WARNING for consistency with the message semantics.

-            log_single_rank(logger, logging.INFO, "Warning: overriding pattern A with pattern B")
+            log_single_rank(logger, logging.WARNING, "Overriding pattern A with pattern B")
src/megatron/bridge/models/falcon_h1/falconh1_provider.py (3)

17-17: Prefer X | None and X | Y syntax over Optional and Union.

Per coding guidelines, use T | None for nullable types and X | Y for union types instead of Optional[T] and Union[X, Y]. The Optional and Union imports can be removed if you update the type hints.

Lines affected:

  • Line 67: Optional[str]str | None
  • Line 73: Optional[float]float | None
  • Line 78: Optional[int]int | None
  • Line 107: Union[ModuleSpec, Callable[[], ModuleSpec]]ModuleSpec | Callable[[], ModuleSpec]

As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'" and "Use 'X | Y' for union types instead of 'Union[X, Y]'"


94-94: Fix typo in comment.

-    `#Falcon` H1 Mup Fwd Multpliers
+    # Falcon H1 MuP Forward Multipliers

102-103: Add parameterized tuple type hints.

For better type safety, specify the tuple element types:

-    mlp_multipliers: tuple = (1.0, 1.0)
-    ssm_multipliers: tuple = (1.0, 1.0, 1.0, 0.5, 1.0)
+    mlp_multipliers: tuple[float, float] = (1.0, 1.0)
+    ssm_multipliers: tuple[float, float, float, float, float] = (1.0, 1.0, 1.0, 0.5, 1.0)

Comment on lines +1 to +2
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add the complete NVIDIA copyright and Apache 2.0 license header.

The file has only a partial copyright line. As per coding guidelines, Python files should include the full NVIDIA copyright header with Apache 2.0 license text.

📝 Proposed fix
-# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
-
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import logging
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/falconh1_bridge.py` around lines 1 - 2,
The top of src/megatron/bridge/models/falcon_h1/falconh1_bridge.py currently
contains only a partial copyright line; replace it with the full NVIDIA
copyright and Apache License 2.0 header block (including copyright years,
"NVIDIA CORPORATION.  All rights reserved.", the Apache License, Version 2.0
notice, and license URL) so the file has the complete standard header; add this
header at the very top of the module before any imports or code to ensure
compliance with project licensing guidelines.

Comment on lines +153 to +154
pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
post_process=post_process or parallel_state.is_pipeline_last_stage(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Bug: Explicit False for pre_process/post_process will be incorrectly overridden.

Using or treats explicit False as falsy, so pre_process=False would be overridden to parallel_state.is_pipeline_first_stage(). The signature uses =None as default, suggesting the intent is to only apply the default when the argument is not provided.

🐛 Proposed fix
-            pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
-            post_process=post_process or parallel_state.is_pipeline_last_stage(),
+            pre_process=parallel_state.is_pipeline_first_stage() if pre_process is None else pre_process,
+            post_process=parallel_state.is_pipeline_last_stage() if post_process is None else post_process,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
post_process=post_process or parallel_state.is_pipeline_last_stage(),
pre_process=parallel_state.is_pipeline_first_stage() if pre_process is None else pre_process,
post_process=parallel_state.is_pipeline_last_stage() if post_process is None else post_process,
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/falconh1_provider.py` around lines 153 -
154, The call currently uses boolean-or which treats explicit False as unset;
change the logic so defaults are only applied when arguments are None: for the
call site that passes pre_process and post_process (in falconh1_provider.py)
replace the inline "pre_process=pre_process or
parallel_state.is_pipeline_first_stage()" and "post_process=post_process or
parallel_state.is_pipeline_last_stage()" with conditional logic that uses the
provided value when it is not None (e.g., pre_process if pre_process is not None
else parallel_state.is_pipeline_first_stage(), and similarly for
post_process/post_process is not None else
parallel_state.is_pipeline_last_stage()) so explicit False values are preserved.

Comment on lines +220 to +221
else:
assert False, "unexpected layer_type"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Replace assert False with raise AssertionError().

assert False statements are removed when Python runs with optimization (-O flag), which could allow unexpected layer types to pass silently in production.

🛡️ Proposed fix
                 else:
-                    assert False, "unexpected layer_type"
+                    raise AssertionError(f"unexpected layer_type: {layer_type}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
else:
assert False, "unexpected layer_type"
else:
raise AssertionError(f"unexpected layer_type: {layer_type}")
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 221-221: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`
around lines 220 - 221, The else branch handling unexpected layer_type in
modeling_falconh1/falconh1_block.py should not use `assert False` because
assertions can be stripped with Python -O; replace that assertion with an
explicit exception by raising an AssertionError (e.g., in the else of the
layer_type switch/if in the function/class handling layer dispatch in
falconh1_block.py, replace `assert False, "unexpected layer_type"` with `raise
AssertionError("unexpected layer_type")`) so the error is always raised in
production.

Comment on lines +367 to +388
if isinstance(layer, TransformerLayer):
hidden_states, _ = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
rotary_pos_emb=rotary_pos_emb,
sequence_len_offset=sequence_len_offset,
)
if isinstance(layer, FalconH1Layer):
hidden_states = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
rotary_pos_emb=rotary_pos_emb,
sequence_len_offset=sequence_len_offset,
)
else: # MambaLayer
hidden_states = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical logic bug: Missing elif causes double execution for TransformerLayer.

The forward loop has incorrect branching:

  1. if isinstance(layer, TransformerLayer) executes for TransformerLayer
  2. if isinstance(layer, FalconH1Layer) is a separate if, not elif
  3. else (MambaLayer) executes for anything that's NOT FalconH1Layer

This means for TransformerLayer, both the TransformerLayer branch AND the MambaLayer branch will execute, processing the layer twice with potentially mismatched arguments.

🐛 Fix the branching logic
                 with inner_fp8_context:
                     if isinstance(layer, TransformerLayer):
                         hidden_states, _  = layer(
                             hidden_states=hidden_states,
                             attention_mask=attention_mask,
                             inference_context=inference_context,
                             rotary_pos_emb=rotary_pos_emb,
                             sequence_len_offset=sequence_len_offset,
                         )
-                    if isinstance(layer, FalconH1Layer):
+                    elif isinstance(layer, FalconH1Layer):
                         hidden_states = layer(
                             hidden_states=hidden_states,
                             attention_mask=attention_mask,
                             inference_context=inference_context,
                             rotary_pos_emb=rotary_pos_emb,
                             sequence_len_offset=sequence_len_offset,
                         )
                     else:  # MambaLayer
                         hidden_states = layer(
                             hidden_states=hidden_states,
                             attention_mask=attention_mask,
                             inference_context=inference_context,
                         )
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`
around lines 367 - 388, The loop's branching erroneously uses two separate ifs
causing TransformerLayer instances to also hit the MambaLayer branch; change the
second check to an elif so only one branch runs per layer (i.e., use elif
isinstance(layer, FalconH1Layer) instead of a standalone if) and keep the
existing argument sets for TransformerLayer, FalconH1Layer and the else
(MambaLayer) branches as-is to avoid double execution.

@@ -0,0 +1,109 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Update copyright header.

The copyright year is 2023 and the full license text is missing. Update to 2025 and add the complete Apache 2.0 license header.

📝 Proposed fix
-# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from megatron.core.extensions.transformer_engine import (
🤖 Prompt for AI Agents
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py`
at line 1, Replace the short copyright line at the top of
falconh1_layer_specs.py with an updated full Apache License, Version 2.0 header:
change the year from 2023 to 2025 and insert the complete Apache 2.0 boilerplate
(including copyright notice, "Licensed under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with the
License.", link to http://www.apache.org/licenses/LICENSE-2.0, and the standard
disclaimer). Ensure the new header appears at the very top of the file before
any imports or code (i.e., replacing the existing copyright comment).

packed_seq_params=packed_seq_params,
sequence_len_offset=sequence_len_offset,
)
attn_output = self.config.attention_out_multiplier

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical bug: Attention output tensor is overwritten with scalar multiplier.

This line replaces the attention output tensor with the scalar multiplier value instead of scaling the tensor. This will cause incorrect model outputs as the actual attention computation result is discarded.

This bug was also identified in the PR comments by @sbhavani.

🐛 Fix the tensor scaling
-            attn_output = self.config.attention_out_multiplier
+            attn_output = attn_output * self.config.attention_out_multiplier
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
attn_output = self.config.attention_out_multiplier
attn_output = attn_output * self.config.attention_out_multiplier
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py` at
line 283, The bug is that attn_output is being overwritten by the scalar
self.config.attention_out_multiplier; instead of assigning the multiplier, scale
the attention tensor (e.g., attn_output = attn_output *
self.config.attention_out_multiplier or in-place attn_output *= ...) so the
actual attention result is preserved; ensure the multiplication occurs in the
same dtype/device as the tensor (use .to(attn_output.dtype) or cast if needed)
and update the code around the attn_output assignment in
modeling_falconh1/falconh1_layer.py (the attention block / forward method) to
perform tensor scaling rather than scalar replacement.

Comment on lines +365 to +367
logits, _ = self.output_layer(
hidden_states, weight=output_weight, runtime_gather_output=runtime_gather_output
) * self.config.lm_head_multiplier

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Bug: Multiplier applied to tuple instead of logits tensor.

self.output_layer() returns a tuple (logits, bias). The multiplication * self.config.lm_head_multiplier is applied to this tuple, not to the logits tensor. This will likely cause a runtime error.

🐛 Proposed fix
-        logits, _ = self.output_layer(
+        logits, _ = self.output_layer(
             hidden_states, weight=output_weight, runtime_gather_output=runtime_gather_output
-        ) * self.config.lm_head_multiplier
+        )
+        logits = logits * self.config.lm_head_multiplier
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py`
around lines 365 - 367, The current call to self.output_layer(...) returns a
tuple (logits, bias) but the code multiplies the whole tuple by
self.config.lm_head_multiplier; change the unpacking and apply the multiplier
only to the logits tensor by assigning something like logits, bias =
self.output_layer(hidden_states, weight=output_weight,
runtime_gather_output=runtime_gather_output) and then set logits = logits *
self.config.lm_head_multiplier (or multiply logits inline after unpacking) so
only the logits tensor is scaled.

Comment on lines +35 to +40
for l in range(total_layers_count):
if x < 0.5:
layer_type_list[l] = Symbols.ATTENTION
x += mamba_section_length
else:
x -= 1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Rename ambiguous variable l to layer_idx or similar.

The variable name l is flagged by linters (E741) because it's easily confused with the digit 1 in many fonts. This applies to all loop variables named l in this file (lines 35, 48, 61, 68, 88).

♻️ Proposed fix (example for first loop)
-    for l in range(total_layers_count):
-        if x < 0.5:
-            layer_type_list[l] = Symbols.ATTENTION
+    for layer_idx in range(total_layers_count):
+        if x < 0.5:
+            layer_type_list[layer_idx] = Symbols.ATTENTION
🧰 Tools
🪛 Flake8 (7.3.0)

[error] 35-35: ambiguous variable name 'l'

(E741)

🪛 Ruff (0.14.14)

[error] 35-35: Ambiguous variable name: l

(E741)

🤖 Prompt for AI Agents
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py`
around lines 35 - 40, Rename the ambiguous loop variable `l` to `layer_idx` (or
similar) in all loops in this module to satisfy linters and improve readability;
specifically update every occurrence in the loops that assign into
layer_type_list using total_layers_count and any other loops within the same
file (e.g., the blocks that test x < 0.5 and set Symbols.ATTENTION or modify x)
so the index variable is consistently named `layer_idx` and all references
inside those loops (assignments, indexing, increments/decrements) are updated to
the new name.

Comment on lines +192 to +199
if __name__ == "__main__":
test_cases = [
(9, 0.0, 0.0, 0.0, "M*-M*-M*-"),
(9, 0.0, 0.0, 0.0, "MMMMMMMMM"),
(10, 0.2, 0.1, 0.2),
]
for t in test_cases:
print("")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test harness has a test case that will fail.

The test case at line 194 passes an override pattern "M*-M*-M*-" but with all ratios set to 0.0. When ratios are all zero, the auto-allocation produces all MAMBA layers, so the override pattern won't match the layer counts (has ATTENTION * and MLP - symbols). This will raise a ValueError at runtime.

If this is intentional test behavior, consider adding a comment. Otherwise, this test case should either:

  1. Have matching ratios for the override pattern, or
  2. Be wrapped in a try-except to verify the expected error
🤖 Prompt for AI Agents
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py`
around lines 192 - 199, The test harness in the __main__ block defines
test_cases including an override pattern "M*-M*-M*-" but supplies all-zero
ratios which yields all-MAMBA layers and will cause a ValueError when validating
the override; fix by either updating that tuple in test_cases to supply non-zero
ratios that produce a layer distribution compatible with the override pattern
(so the pattern "M*-M*-M*-" can match attention vs MLP counts) or wrap that
specific test case execution in a try/except that asserts the expected
ValueError (i.e., catch the error and treat it as a passed negative test);
locate the test_cases list and the for t in test_cases loop in the __main__
block to apply the change.

@gautham-kollu

Copy link
Copy Markdown
Contributor

/ok to test 1fe2d0c

@gautham-kollu

gautham-kollu commented Feb 17, 2026

Copy link
Copy Markdown
Contributor

@dhiaEddineRhaiem Can you please follow the instructions here https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/CONTRIBUTING.md#-linting-and-formatting to fix linting ?

--hf-path "$HF_EXPORT_PATH" \
--trust-remote-code

uv run python - "$HF_MODEL_ID" "$HF_EXPORT_PATH" <<'PY'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need these, clean up if not needed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. I kept the tokenizer copy and documented why it is needed: this export flow reconstructs the HF config/weights from a Megatron checkpoint, but tokenizer artifacts are not written by that config-only export path. The exported HF directory is later used as a standalone AutoTokenizer.from_pretrained(...) path by the inference example, so the tokenizer files need to be copied from the source HF checkpoint.

Comment thread examples/models/falcon_h1/README.md Outdated
To run the same check with a different parallelism layout:

```bash
TP=2 PP=1 NPROC_PER_NODE=2 bash examples/models/falcon_h1/conversion.sh

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does default value TP=1 work? if not , update default value please

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. The default TP=1, PP=1, NPROC_PER_NODE=1 path was validated for tiiuae/Falcon-H1-0.5B-Instruct, and the README now states that this default layout works for the 0.5B model.

@yaoyu-33

Copy link
Copy Markdown
Contributor

/claude review

Comment on lines +367 to +375
if isinstance(layer, FalconH1Layer):
hidden_states = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
rotary_pos_emb=rotary_pos_emb,
sequence_len_offset=sequence_len_offset,
)
else: # MambaLayer

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: if/if/else dispatch causes TransformerLayer to be called twice per forward pass.

When a TransformerLayer is encountered, the first if executes it. Then the second if isinstance(layer, FalconH1Layer) is False (TransformerLayer is not FalconH1Layer), so the else branch fires — calling the same layer again as if it were a MambaLayer.

The second if should be elif:

Suggested change
if isinstance(layer, FalconH1Layer):
hidden_states = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
rotary_pos_emb=rotary_pos_emb,
sequence_len_offset=sequence_len_offset,
)
else: # MambaLayer
if isinstance(layer, TransformerLayer):
hidden_states, _ = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
rotary_pos_emb=rotary_pos_emb,
sequence_len_offset=sequence_len_offset,
)
elif isinstance(layer, FalconH1Layer):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. The second dispatch branch is now elif isinstance(layer, FalconH1Layer), so TransformerLayer instances no longer fall through into the Mamba path.

@@ -0,0 +1,38 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: copyright year is 2025; per project rules it should be 2026.

Suggested change
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.

@@ -0,0 +1,161 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same 2025→2026 copyright year fix needed here.

Suggested change
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.

@claude

claude Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Code Review

Critical: Forward dispatch bug in FalconH1Stack.forward()

falconh1_block.py:367-380 — The layer dispatch loop uses if/if/else instead of if/elif/else. When a TransformerLayer is encountered (attention-only layers), the first if runs it, then the second if isinstance(layer, FalconH1Layer) is False, so the else (MambaLayer path) fires too — calling the layer twice per forward pass. This will produce incorrect hidden states and double the compute for any hybrid config that includes standalone attention layers. Fix: change the second if to elif.

Minor: Copyright years — falcon_h1/init.py and falconh1_provider.py use 2025 copyright headers; should be 2026 per project rules.

Test coverage gaps — The unit tests cover bridge registration, config mapping, provider construction, and layer allocation well. Consider adding: (1) FalconH1Config.post_init validation: test that invalid A_init_dist or non-divisible d_inner % mamba_head_dim raise ValueError. (2) FalconH1MLP.forward() multiplier logic: a unit test with a small tensor would catch regressions. (3) FalconH1SelfAttention.get_query_key_value_tensors() key scaling: verify key_multiplier != 1.0 actually scales the key tensor. (4) FalconH1MambaMixer._scale_zxbc_dt(): verify the 5-element SSM multiplier vector is applied correctly.

Observation: no-op test — test_falcon_h1_provider.py::test_size_specific_providers_are_not_public_api checks that names like FalconH1ModelProvider500M are absent from all — but these classes do not exist anywhere in the PR. The test passes trivially. Consider removing it.

Suggested test cases: No perf tests impacted.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

Copy link
Copy Markdown
Contributor

Addressed the latest Falcon H1 review summary in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.

Changes:

  • Fixed FalconH1Stack.forward() dispatch so standalone TransformerLayer instances do not also run through the Mamba branch.
  • Updated Falcon H1 copyright headers requested in review.
  • Documented why the Falcon H1 conversion example copies tokenizer artifacts into the exported HF directory.
  • Clarified that the default TP=1, PP=1, NPROC_PER_NODE=1 example layout works for the 0.5B Falcon H1 checkpoint.
  • Added focused Falcon H1 unit coverage for config validation, MLP multipliers, attention key scaling, and Mamba SSM multipliers.
  • Removed the no-op size-specific-provider public API test.

Validation:

  • uv run pre-commit run --all-files passed.
  • Targeted Falcon H1 validation passed: unit tests, default TP=1 conversion, and default TP=1 inference flow.

Residual blocker:

  • DCO is still failing from inherited unsigned history and is unchanged by this patch.

@yaoyu-33

Copy link
Copy Markdown
Contributor

/ok to test f454d16

@yaoyu-33 yaoyu-33 added the needs-review PR is ready for code review and waiting on a reviewer label May 15, 2026
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
@yaoyu-33

Copy link
Copy Markdown
Contributor

/ok to test a3d3333

@svcnvidia-nemo-ci svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label May 18, 2026
@yaoyu-33 yaoyu-33 removed the needs-review PR is ready for code review and waiting on a reviewer label May 18, 2026
@yaoyu-33 yaoyu-33 merged commit 5d1d07a into NVIDIA-NeMo:main May 18, 2026
74 checks passed
@svcnvidia-nemo-ci svcnvidia-nemo-ci removed the waiting-on-maintainers Waiting on maintainers to respond label May 18, 2026
vasunvidia pushed a commit to vasunvidia/Megatron-Bridge that referenced this pull request Jun 10, 2026
Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:model Model implementations and HF bridge logic community-request feature New capabilities, enhancements, or enablement work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants