Add falcon h1 by dhiaEddineRhaiem · Pull Request #1462 · NVIDIA-NeMo/Megatron-Bridge

dhiaEddineRhaiem · 2025-11-21T17:39:08Z

This PR adds support for FalconH1 in MegatronBridge.
FalconH1 is introduced as a new ParallelHybrid layer in the hybrid design of MegatronBridge.
cc @sbhavani

Summary by CodeRabbit

Release Notes

New Features
- Added support for FalconH1 model family with multiple sizes (500M, 1.5B, 7B, 34B parameters)
- Enabled configurable hybrid Mamba/attention/MLP layer architectures for flexible model composition
- Added HuggingFace pretrained model integration for seamless loading and conversion


copy-pr-bot · 2025-11-21T17:39:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>

yaoyu-33 · 2025-11-21T21:24:32Z

@dhiaEddineRhaiem thanks a lot for the contribution.
Overall it looks very good!
We will need to verify the conversion between hf / megatron. Can you check https://docs.nvidia.com/nemo/megatron-bridge/latest/adding-new-models.html#validate-with-examples and maybe attach your validation results here?

dhiaEddineRhaiem · 2025-11-22T17:01:51Z

hello @yaoyu-33 ,

This is the output of the HF conversion of Falcon H1 after the previous fixes but before adding the Falcon H1 MuP forward multipliers.

For reference, here is the complete list of FalconH1 MuP multipliers (from config):

embedding_multiplier: scales embedding outputs
lm_head_multiplier: scales final logits
attention_in_multiplier: scales attention input
attention_out_multiplier: scales attention output
key_multiplier: scales K projection (ref)
mlp_multipliers: gate/down projections (ref)
ssm_in_multiplier: scales SSM input
ssm_out_multiplier: scales SSM output
ssm_multipliers: z/x/B/C/dt components (ref)

So the gibberish output was expected because we were not yet applying the Falcon H1 MuP forward multipliers (MLP, attention, and SSM multipliers).

I managed to add some of them in the bridge, except the MLP gate and down proj multipliers, ssm_multipliers, and key_multiplier, which I think should be added to core in the forward of MambaMixer, MLP, and SelfAttention.
More info on how these multipliers are applied can be found here:

1. MLP gate and down proj

2. ssm_multipliers

3. key_multiplier

So IMO, they should be added to core in the forward of MambaMixer, MLP, and SelfAttention.

Any help on this, please?

Separius · 2025-11-27T10:05:20Z

Hi @dhiaEddineRhaiem, how bad would it be to fuse the multipliers into the weights (during conversion)? I'm guessing changing the mcore is a no go, will the training break or will it be just worse? or can it be accounted for in lr? can we multiply the learning rate of those fused weights with 1/multiplier and get the same updates? (I think wd needs to be changed too)
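
One way to reason about the fusing question above is a scalar toy problem. The sketch below is not from the PR: it assumes plain SGD with no momentum or weight decay, a single weight `w`, and a hypothetical forward multiplier `m`. Under those assumptions, folding `m` into the weight leaves the forward output unchanged, and the fused weight tracks the unfused effective weight `m * w` only when its learning rate is scaled by `m**2`. Adaptive optimizers such as Adam normalize gradient magnitude, so the matching factor would likely differ there.

```python
import math


def sgd_step_unfused(w, x, target, m, lr):
    """One SGD step on L = 0.5 * (m * w * x - target)**2; returns the new w."""
    err = m * w * x - target
    grad_w = err * m * x  # the multiplier also scales the gradient
    return w - lr * grad_w


def sgd_step_fused(w_fused, x, target, lr):
    """One SGD step on L = 0.5 * (w_fused * x - target)**2; returns the new w_fused."""
    err = w_fused * x - target
    grad_w = err * x
    return w_fused - lr * grad_w


# Arbitrary toy values, chosen only for illustration.
m, w, x, target, lr = 0.25, 2.0, 3.0, 1.0, 0.1

# Fusing at conversion time: store m * w instead of w.
w_fused = m * w
forward_matches = math.isclose(m * w * x, w_fused * x)

# Compare the effective weight after one step in both parameterizations.
effective_after = m * sgd_step_unfused(w, x, target, m, lr)
fused_after = sgd_step_fused(w_fused, x, target, lr * m**2)
updates_match = math.isclose(effective_after, fused_after)
```

The same bookkeeping would have to be repeated for weight decay and for whichever optimizer is actually used, which is part of why applying the multipliers in the forward pass is the simpler route.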

sbhavani · 2025-12-01T16:29:16Z

@dhiaEddineRhaiem thanks for adding the muP multipliers. I think this is a good approach while we figure out general muP support in Megatron Core.

One potential bug in falconh1_layer.py: The attention out multiplier overwrites the tensor:

attn_output = self.config.attention_out_multiplier
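
A minimal illustration of the difference between assigning and scaling, using a plain Python list as a stand-in for the attention tensor and a hypothetical `Config` class (neither is the PR's actual code):

```python
from dataclasses import dataclass


@dataclass
class Config:
    attention_out_multiplier: float = 0.5  # hypothetical value


def scale_buggy(attn_output, config):
    # Overwrites the attention result with the scalar itself.
    attn_output = config.attention_out_multiplier
    return attn_output


def scale_fixed(attn_output, config):
    # Scales each element, keeping the attention result.
    return [v * config.attention_out_multiplier for v in attn_output]


config = Config()
attn = [2.0, 4.0, 6.0]
buggy = scale_buggy(attn, config)  # a bare float, the attention values are lost
fixed = scale_fixed(attn, config)  # the attention values, scaled
```

With a real tensor the fix would be the equivalent elementwise multiplication, for example `attn_output = attn_output * self.config.attention_out_multiplier`.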

yaoyu-33 · 2025-12-08T00:30:22Z

@dhiaEddineRhaiem: sorry for the late reply.
MCore also accepts PRs. These configs don't seem very intrusive, so we can add them in mcore.
Another way, if we want to merge faster, is to override some of the mixer class / mlp class directly in mcore. An example is https://github.com/NVIDIA-NeMo/Megatron-Bridge/tree/main/src/megatron/bridge/models/qwen_vl/modelling_qwen3_vl where we have to do deepstack etc. for qwen3vl

yaoyu-33 · 2025-12-17T02:53:05Z

/ok to test ee80ca9

coderabbitai · 2026-01-30T20:45:59Z

Walkthrough

This PR adds comprehensive support for the FalconH1 hybrid model architecture to Megatron, including bridge adapters for HuggingFace model conversion, configuration-driven model providers for multiple model sizes (500M, 1.5B, 7B, 34B), stack implementations with Mamba and attention hybrid layers, module specifications, and layer allocation logic.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Public API Exports `src/megatron/bridge/models/__init__.py`, `src/megatron/bridge/models/falcon_h1/__init__.py` | Added imports and re-exports of FalconH1 bridge, model providers, and core model components (FalconH1Model, FalconH1Config, FalconH1Stack, FalconH1Layer) to enable public access and discoverability. |
| Bridge and Model Providers `src/megatron/bridge/models/falcon_h1/falconh1_bridge.py`, `src/megatron/bridge/models/falcon_h1/falconh1_provider.py` | Introduced FalconH1Bridge class with HuggingFace-to-Megatron parameter mapping registry (including Mamba, attention, QKV, gated MLP mappings). Added FalconH1ModelProvider base and four size-specific providers (500M, 1.5B Deep, 7B, 34B) with architecture presets, MuP/scaling multipliers, and vocab handling logic. |
| Core Model Architecture `src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py` | Introduced FalconH1Config dataclass with Mamba/SSM parameters and validation. Added FalconH1Model as a LanguageModule with hybrid attention/Mamba/MLP support, handling embeddings, rotary position embeddings, inference contexts, and language modeling loss computation. |
| Stack and Layer Implementation `src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`, `src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py` | Implemented FalconH1Stack with layer allocation, FP8 context management, sharded state dictionary construction, and pipeline-parallel support. Added FalconH1Layer composing Mamba mixer, self-attention, and MLP with optional enabling/disabling and bias-dropout-add fusion. |
| Layer Specifications and Allocation `src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py`, `src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py` | Defined falconh1_stack_spec ModuleSpec configuring the full FalconH1Stack architecture with nested submodules. Introduced allocate_layers() utility supporting auto-allocation and override patterns for hybrid layer type distribution (Mamba, attention, MLP, parallel). |

Sequence Diagram(s)

sequenceDiagram
    participant HF as HuggingFace Model
    participant Bridge as FalconH1Bridge
    participant Provider as FalconH1ModelProvider
    participant Model as FalconH1Model
    participant Stack as FalconH1Stack
    participant Layers as FalconH1Layer/Mamba/Attention
    
    HF->>Bridge: provider_bridge(hf_pretrained)
    Bridge->>Provider: Create with HF config
    Provider->>Provider: Configure model params
    Provider->>Model: provide() instantiate
    Model->>Stack: build_module with spec
    Stack->>Stack: allocate_layers()
    Stack->>Layers: construct per-layer
    Layers->>Layers: initialize submodules
    Model->>Model: forward() inference
    Stack->>Layers: forward through stack
    Layers->>Layers: Mamba/Attention/MLP
    Layers-->>Stack: aggregated output
    Stack-->>Model: layer outputs
    Model-->>Model: output projection & logits

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 52.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Test Results For Major Changes | ⚠️ Warning | Major feature adding ~2000+ lines affecting model behavior lacks test results, validation metrics, or performance benchmarks in PR description. | Add HuggingFace-to-Megatron conversion validation, logit matching tests, output correctness confirmation, and performance benchmarks to PR description with detailed results or links. |
| Title check | ❓ Inconclusive | The title 'Add falcon h1' is vague and generic, lacking specific detail about what aspect of Falcon H1 support is being added. | Consider a more descriptive title that clarifies the main contribution, such as 'Add FalconH1 model bridge and provider support' or 'Introduce FalconH1 hybrid architecture integration'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 9

🤖 Fix all issues with AI agents

In `@src/megatron/bridge/models/falcon_h1/falconh1_bridge.py`:
- Around line 1-2: The top of
src/megatron/bridge/models/falcon_h1/falconh1_bridge.py currently contains only
a partial copyright line; replace it with the full NVIDIA copyright and Apache
License 2.0 header block (including copyright years, "NVIDIA CORPORATION.  All
rights reserved.", the Apache License, Version 2.0 notice, and license URL) so
the file has the complete standard header; add this header at the very top of
the module before any imports or code to ensure compliance with project
licensing guidelines.

In `@src/megatron/bridge/models/falcon_h1/falconh1_provider.py`:
- Around line 153-154: The call currently uses boolean-or which treats explicit
False as unset; change the logic so defaults are only applied when arguments are
None: for the call site that passes pre_process and post_process (in
falconh1_provider.py) replace the inline "pre_process=pre_process or
parallel_state.is_pipeline_first_stage()" and "post_process=post_process or
parallel_state.is_pipeline_last_stage()" with conditional logic that uses the
provided value when it is not None (e.g., pre_process if pre_process is not None
else parallel_state.is_pipeline_first_stage(), and similarly for
post_process/post_process is not None else
parallel_state.is_pipeline_last_stage()) so explicit False values are preserved.

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`:
- Around line 220-221: The else branch handling unexpected layer_type in
modeling_falconh1/falconh1_block.py should not use `assert False` because
assertions can be stripped with Python -O; replace that assertion with an
explicit exception by raising an AssertionError (e.g., in the else of the
layer_type switch/if in the function/class handling layer dispatch in
falconh1_block.py, replace `assert False, "unexpected layer_type"` with `raise
AssertionError("unexpected layer_type")`) so the error is always raised in
production.
- Around line 367-388: The loop's branching erroneously uses two separate ifs
causing TransformerLayer instances to also hit the MambaLayer branch; change the
second check to an elif so only one branch runs per layer (i.e., use elif
isinstance(layer, FalconH1Layer) instead of a standalone if) and keep the
existing argument sets for TransformerLayer, FalconH1Layer and the else
(MambaLayer) branches as-is to avoid double execution.

In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py`:
- Line 1: Replace the short copyright line at the top of falconh1_layer_specs.py
with an updated full Apache License, Version 2.0 header: change the year from
2023 to 2025 and insert the complete Apache 2.0 boilerplate (including copyright
notice, "Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.", link to
http://www.apache.org/licenses/LICENSE-2.0, and the standard disclaimer). Ensure
the new header appears at the very top of the file before any imports or code
(i.e., replacing the existing copyright comment).

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py`:
- Line 283: The bug is that attn_output is being overwritten by the scalar
self.config.attention_out_multiplier; instead of assigning the multiplier, scale
the attention tensor (e.g., attn_output = attn_output *
self.config.attention_out_multiplier or in-place attn_output *= ...) so the
actual attention result is preserved; ensure the multiplication occurs in the
same dtype/device as the tensor (use .to(attn_output.dtype) or cast if needed)
and update the code around the attn_output assignment in
modeling_falconh1/falconh1_layer.py (the attention block / forward method) to
perform tensor scaling rather than scalar replacement.

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py`:
- Around line 365-367: The current call to self.output_layer(...) returns a
tuple (logits, bias) but the code multiplies the whole tuple by
self.config.lm_head_multiplier; change the unpacking and apply the multiplier
only to the logits tensor by assigning something like logits, bias =
self.output_layer(hidden_states, weight=output_weight,
runtime_gather_output=runtime_gather_output) and then set logits = logits *
self.config.lm_head_multiplier (or multiply logits inline after unpacking) so
only the logits tensor is scaled.

In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py`:
- Around line 192-199: The test harness in the __main__ block defines test_cases
including an override pattern "M*-M*-M*-" but supplies all-zero ratios which
yields all-MAMBA layers and will cause a ValueError when validating the
override; fix by either updating that tuple in test_cases to supply non-zero
ratios that produce a layer distribution compatible with the override pattern
(so the pattern "M*-M*-M*-" can match attention vs MLP counts) or wrap that
specific test case execution in a try/except that asserts the expected
ValueError (i.e., catch the error and treat it as a passed negative test);
locate the test_cases list and the for t in test_cases loop in the __main__
block to apply the change.
- Around line 35-40: Rename the ambiguous loop variable `l` to `layer_idx` (or
similar) in all loops in this module to satisfy linters and improve readability;
specifically update every occurrence in the loops that assign into
layer_type_list using total_layers_count and any other loops within the same
file (e.g., the blocks that test x < 0.5 and set Symbols.ATTENTION or modify x)
so the index variable is consistently named `layer_idx` and all references
inside those loops (assignments, indexing, increments/decrements) are updated to
the new name.
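
The `falconh1_model.py` item above (scaling the `(logits, bias)` tuple) can be illustrated without Megatron. This is a sketch under stated assumptions: `output_layer` below is a hypothetical stand-in that returns a tuple, and the multiplier is assumed to be a plain Python float.

```python
def output_layer(hidden):
    """Hypothetical stand-in that returns (logits, bias), like the call described above."""
    logits = [h * 2.0 for h in hidden]
    return logits, None


lm_head_multiplier = 0.5  # hypothetical value

# Multiplying the whole tuple by a float is rejected by Python.
try:
    output_layer([1.0, 2.0]) * lm_head_multiplier
    tuple_scaling_error = None
except TypeError as exc:
    tuple_scaling_error = exc

# Unpack first, then scale only the logits.
logits, bias = output_layer([1.0, 2.0])
logits = [v * lm_head_multiplier for v in logits]
```

Note that an integer multiplier would not raise at all: Python would repeat the tuple instead, which is a quieter failure mode.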

🧹 Nitpick comments (16)

src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py (2)
1-1: Update copyright header.

Add the complete Apache 2.0 license text. The copyright year can remain 2024 or be updated to 2025.

3-21: Remove unused imports.

Static analysis indicates several unused imports that should be removed for code cleanliness.
🧹 Proposed cleanup
-import math
 from dataclasses import dataclass
 from typing import Optional, Tuple, Union

 import torch
-import torch.nn as nn
 from megatron.core.process_groups_config import ProcessGroupCollection
 from megatron.bridge.models.falcon_h1.modeling_falconh1.falconh1_model import FalconH1Config
 from megatron.core.transformer.module import MegatronModule
 from megatron.core.transformer.spec_utils import ModuleSpec, build_module
 from megatron.core.transformer.enums import AttnMaskType
 from megatron.core.transformer.identity_op import IdentityOp
 from megatron.core.packed_seq_params import PackedSeqParams
 from megatron.core.inference.contexts import BaseInferenceContext
 from megatron.core.utils import log_single_rank
 import logging
-from megatron.core.ssm.mamba_mixer import MambaMixer, MambaMixerSubmodules
+from megatron.core.ssm.mamba_mixer import MambaMixerSubmodules
-from megatron.core.transformer.attention import SelfAttention, SelfAttentionSubmodules
+from megatron.core.transformer.attention import SelfAttentionSubmodules
 from megatron.core.transformer.mlp import MLP
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py (3)
1-1: Update copyright header to 2025 with full license text.

20-21: Remove duplicate Optional and Tuple imports.

These types are already imported on line 3. The redefinition is flagged by static analysis.
🧹 Proposed fix
-from dataclasses import dataclass
-from typing import Optional, Tuple
+from dataclasses import dataclass
The Tuple type from line 3 can be used, and Optional is already imported.
177-177: Use explicit str | None for nullable type hints.

Per coding guidelines, use T | None instead of implicit Optional. PEP 484 prohibits implicit Optional.
📝 Proposed fix
-        hybrid_override_pattern: str = None,
+        hybrid_override_pattern: str | None = None,
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py (2)
402-406: Remove unused output variable.

The make_viewless_tensor result is assigned to output but hidden_states is returned instead. Either use output or remove the assignment.
🧹 Proposed fix
         # Ensure that the tensor passed between pipeline parallel stages is
         # viewless. See related notes in TransformerBlock and TransformerLayer
-        output = make_viewless_tensor(
+        hidden_states = make_viewless_tensor(
             inp=hidden_states, requires_grad=hidden_states.requires_grad, keep_graph=True
         )

         return hidden_states
131-131: Minor style issues flagged by static analysis.

Line 131: Use explicit str | None for nullable type hint

Line 453: Use is not instead of not ... is
📝 Proposed fixes
-        hybrid_override_pattern: str = None,
+        hybrid_override_pattern: str | None = None,
-            if not module is self.layers:
+            if module is not self.layers:
Also applies to: 453-453
src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py (6)
1-1: Update copyright year to 2025.

The copyright header uses 2024, but per the coding guidelines and consistency with other files in this PR (e.g., falconh1_provider.py), it should be 2025.
-# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
17-22: Consider using frozenset for immutability or annotate with ClassVar.

The VALID set is a mutable class attribute. For a constant set that shouldn't be modified, consider using frozenset for true immutability, or at minimum annotate with ClassVar to satisfy the linter.
+from typing import ClassVar
+
 class Symbols:
     MAMBA = "M"
     ATTENTION = "*"
     MLP = "-"
     PARALLEL = "P"
-    VALID = {MAMBA, ATTENTION, MLP, PARALLEL}
+    VALID: ClassVar[frozenset[str]] = frozenset({MAMBA, ATTENTION, MLP, PARALLEL})
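
As a small sketch of what the `frozenset` suggestion buys, the snippet below rebuilds the `Symbols` class from the diff and adds a hypothetical helper, `is_valid_pattern`, that is not part of the PR:

```python
from typing import ClassVar


class Symbols:
    MAMBA = "M"
    ATTENTION = "*"
    MLP = "-"
    PARALLEL = "P"
    VALID: ClassVar[frozenset[str]] = frozenset({MAMBA, ATTENTION, MLP, PARALLEL})


def is_valid_pattern(pattern: str) -> bool:
    """Hypothetical helper: True if every character is a known layer symbol."""
    return all(ch in Symbols.VALID for ch in pattern)


# frozenset exposes no mutating methods such as add(), so the constant
# cannot be extended by accident at runtime.
can_mutate = hasattr(Symbols.VALID, "add")
pattern_ok = is_valid_pattern("M*-M*-M*-")
```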
107-108: Use explicit str | None type hint.

Per coding guidelines, use T | None for nullable types instead of implicit Optional. The current signature uses str = None which implicitly means Optional[str].
-    override_pattern: str = None,
+    override_pattern: str | None = None,
As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'"

109-113: Consider splitting compound assertions for clearer error messages.

When a compound assertion fails, it's not immediately clear which condition was violated. Splitting them provides better debugging information.
♻️ Proposed fix
     assert total_layers_count > 0
-    assert target_attention_ratio >= 0.0 and target_attention_ratio <= 1.0
-    assert target_mlp_ratio >= 0.0 and target_mlp_ratio <= 1.0
-    assert target_parallel_hybrid_ratio >= 0.0 and target_parallel_hybrid_ratio <= 1.0
+    assert 0.0 <= target_attention_ratio <= 1.0, f"target_attention_ratio must be in [0, 1], got {target_attention_ratio}"
+    assert 0.0 <= target_mlp_ratio <= 1.0, f"target_mlp_ratio must be in [0, 1], got {target_mlp_ratio}"
+    assert 0.0 <= target_parallel_hybrid_ratio <= 1.0, f"target_parallel_hybrid_ratio must be in [0, 1], got {target_parallel_hybrid_ratio}"
     assert target_attention_ratio + target_mlp_ratio + target_parallel_hybrid_ratio <= 1.0
25-27: Add parameterized return type hints.

The return type list could be more specific as list[str] to improve type safety and IDE support. This applies to _allocate_auto, _allocate_override, and allocate_layers.
 def _allocate_auto(
     total_layers_count: int, target_attention_ratio: float, target_mlp_ratio: float, target_parallel_hybrid_ratio: float
-) -> list:
+) -> list[str]:
133-135: Use logging.WARNING level for warning messages.

The message says "Warning:" but uses logging.INFO level. Consider using logging.WARNING for consistency with the message semantics.
-            log_single_rank(logger, logging.INFO, "Warning: overriding pattern A with pattern B")
+            log_single_rank(logger, logging.WARNING, "Overriding pattern A with pattern B")
src/megatron/bridge/models/falcon_h1/falconh1_provider.py (3)
17-17: Prefer X | None and X | Y syntax over Optional and Union.

Per coding guidelines, use T | None for nullable types and X | Y for union types instead of Optional[T] and Union[X, Y]. The Optional and Union imports can be removed if you update the type hints.

Lines affected:

Line 67: Optional[str] → str | None

Line 73: Optional[float] → float | None

Line 78: Optional[int] → int | None

Line 107: Union[ModuleSpec, Callable[[], ModuleSpec]] → ModuleSpec | Callable[[], ModuleSpec]

As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'" and "Use 'X | Y' for union types instead of 'Union[X, Y]'"

94-94: Fix typo in comment.
-    #Falcon H1 Mup Fwd Multpliers
+    # Falcon H1 MuP Forward Multipliers
102-103: Add parameterized tuple type hints.

For better type safety, specify the tuple element types:
-    mlp_multipliers: tuple = (1.0, 1.0)
-    ssm_multipliers: tuple = (1.0, 1.0, 1.0, 0.5, 1.0)
+    mlp_multipliers: tuple[float, float] = (1.0, 1.0)
+    ssm_multipliers: tuple[float, float, float, float, float] = (1.0, 1.0, 1.0, 0.5, 1.0)

coderabbitai · 2026-01-30T20:46:02Z

+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+


🛠️ Refactor suggestion | 🟠 Major

Add the complete NVIDIA copyright and Apache 2.0 license header.

The file has only a partial copyright line. As per coding guidelines, Python files should include the full NVIDIA copyright header with Apache 2.0 license text.

📝 Proposed fix

-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
-
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import logging


coderabbitai · 2026-01-30T20:46:02Z

+            pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
+            post_process=post_process or parallel_state.is_pipeline_last_stage(),


⚠️ Potential issue | 🟠 Major

Bug: Explicit False for pre_process/post_process will be incorrectly overridden.

Using or treats explicit False as falsy, so pre_process=False would be overridden to parallel_state.is_pipeline_first_stage(). The signature uses =None as default, suggesting the intent is to only apply the default when the argument is not provided.

🐛 Proposed fix

-            pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
-            post_process=post_process or parallel_state.is_pipeline_last_stage(),
+            pre_process=parallel_state.is_pipeline_first_stage() if pre_process is None else pre_process,
+            post_process=parallel_state.is_pipeline_last_stage() if post_process is None else post_process,
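
The behavioral difference can be checked in isolation. In this sketch the two helper functions and the boolean `is_first_stage` argument are hypothetical stand-ins for the call site and for `parallel_state.is_pipeline_first_stage()`:

```python
def resolve_with_or(pre_process, is_first_stage):
    # Explicit False is falsy, so it falls through to the default.
    return pre_process or is_first_stage


def resolve_with_none_check(pre_process, is_first_stage):
    # Only a missing (None) argument picks up the default.
    return is_first_stage if pre_process is None else pre_process


# A caller on the first pipeline stage explicitly passes False:
overridden = resolve_with_or(False, True)          # the explicit False is lost
preserved = resolve_with_none_check(False, True)   # the explicit False survives
defaulted = resolve_with_none_check(None, True)    # None still takes the default
```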


coderabbitai · 2026-01-30T20:46:02Z

+                else:
+                    assert False, "unexpected layer_type"


⚠️ Potential issue | 🟡 Minor

Replace assert False with raise AssertionError().

assert False statements are removed when Python runs with optimization (-O flag), which could allow unexpected layer types to pass silently in production.

🛡️ Proposed fix

                 else:
-                    assert False, "unexpected layer_type"
+                    raise AssertionError(f"unexpected layer_type: {layer_type}")
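
The effect of optimization on `assert False` can be reproduced without restarting the interpreter, because the built-in `compile()` accepts an `optimize` level. The two dispatch functions below are simplified, hypothetical stand-ins for the layer dispatch in `falconh1_block.py`:

```python
SOURCE = '''
def dispatch_with_assert(layer_type):
    if layer_type == "M":
        return "mamba"
    else:
        assert False, "unexpected layer_type"

def dispatch_with_raise(layer_type):
    if layer_type == "M":
        return "mamba"
    else:
        raise AssertionError(f"unexpected layer_type: {layer_type}")
'''

# optimize=1 compiles the source the way `python -O` would, dropping asserts.
namespace = {}
exec(compile(SOURCE, "<dispatch_demo>", "exec", optimize=1), namespace)

# The assert version silently returns None for an unknown layer type.
silent_result = namespace["dispatch_with_assert"]("?")

# The explicit raise still fires.
try:
    namespace["dispatch_with_raise"]("?")
    raised = False
except AssertionError:
    raised = True
```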


🧰 Tools

🪛 Ruff (0.14.14)

[warning] 221-221: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)


coderabbitai · 2026-01-30T20:46:02Z

+                    if isinstance(layer, TransformerLayer):
+                        hidden_states, _  = layer(
+                            hidden_states=hidden_states,
+                            attention_mask=attention_mask,
+                            inference_context=inference_context,
+                            rotary_pos_emb=rotary_pos_emb,
+                            sequence_len_offset=sequence_len_offset,
+                        )
+                    if isinstance(layer, FalconH1Layer):
+                        hidden_states = layer(
+                            hidden_states=hidden_states,
+                            attention_mask=attention_mask,
+                            inference_context=inference_context,
+                            rotary_pos_emb=rotary_pos_emb,
+                            sequence_len_offset=sequence_len_offset,
+                        )
+                    else:  # MambaLayer
+                        hidden_states = layer(
+                            hidden_states=hidden_states,
+                            attention_mask=attention_mask,
+                            inference_context=inference_context,
+                        )


⚠️ Potential issue | 🔴 Critical

Critical logic bug: Missing elif causes double execution for TransformerLayer.

The forward loop has incorrect branching:

if isinstance(layer, TransformerLayer) executes for TransformerLayer

if isinstance(layer, FalconH1Layer) is a separate if, not elif

else (MambaLayer) executes for anything that's NOT FalconH1Layer

This means for TransformerLayer, both the TransformerLayer branch AND the MambaLayer branch will execute, processing the layer twice with potentially mismatched arguments.

🐛 Fix the branching logic

                 with inner_fp8_context:
                     if isinstance(layer, TransformerLayer):
                         hidden_states, _ = layer(
                             hidden_states=hidden_states,
                             attention_mask=attention_mask,
                             inference_context=inference_context,
                             rotary_pos_emb=rotary_pos_emb,
                             sequence_len_offset=sequence_len_offset,
                         )
-                    if isinstance(layer, FalconH1Layer):
+                    elif isinstance(layer, FalconH1Layer):
                         hidden_states = layer(
                             hidden_states=hidden_states,
                             attention_mask=attention_mask,
                             inference_context=inference_context,
                             rotary_pos_emb=rotary_pos_emb,
                             sequence_len_offset=sequence_len_offset,
                         )
                     else:  # MambaLayer
                         hidden_states = layer(
                             hidden_states=hidden_states,
                             attention_mask=attention_mask,
                             inference_context=inference_context,
                         )
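
The double execution is easy to reproduce with counting stubs. The classes below are hypothetical stand-ins that share only their names with the real layers, and the single-argument call is a simplification of the actual forward signature:

```python
class CountingLayer:
    """Stand-in layer that records how many times it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, hidden_states):
        self.calls += 1
        return hidden_states + 1


class TransformerLayer(CountingLayer):
    pass


class FalconH1Layer(CountingLayer):
    pass


class MambaLayer(CountingLayer):
    pass


def forward_buggy(layer, hidden_states):
    if isinstance(layer, TransformerLayer):
        hidden_states = layer(hidden_states)
    if isinstance(layer, FalconH1Layer):  # separate `if`, so the else also covers TransformerLayer
        hidden_states = layer(hidden_states)
    else:  # intended for MambaLayer only
        hidden_states = layer(hidden_states)
    return hidden_states


def forward_fixed(layer, hidden_states):
    if isinstance(layer, TransformerLayer):
        hidden_states = layer(hidden_states)
    elif isinstance(layer, FalconH1Layer):
        hidden_states = layer(hidden_states)
    else:  # MambaLayer
        hidden_states = layer(hidden_states)
    return hidden_states


buggy_layer, fixed_layer = TransformerLayer(), TransformerLayer()
forward_buggy(buggy_layer, 0)  # runs the layer twice
forward_fixed(fixed_layer, 0)  # runs the layer once
```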


coderabbitai · 2026-01-30T20:46:02Z

@@ -0,0 +1,109 @@
+# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.


🛠️ Refactor suggestion | 🟠 Major

Update copyright header.

The copyright year is 2023 and the full license text is missing. Update to 2025 and add the complete Apache 2.0 license header.

📝 Proposed fix

-# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from megatron.core.extensions.transformer_engine import (

🤖 Prompt for AI Agents

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py` at line 1, Replace the short copyright line at the top of falconh1_layer_specs.py with an updated full Apache License, Version 2.0 header: change the year from 2023 to 2025 and insert the complete Apache 2.0 boilerplate (including copyright notice, "Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.", link to http://www.apache.org/licenses/LICENSE-2.0, and the standard disclaimer). Ensure the new header appears at the very top of the file before any imports or code (i.e., replacing the existing copyright comment).

coderabbitai · 2026-01-30T20:46:02Z

+                packed_seq_params=packed_seq_params,
+                sequence_len_offset=sequence_len_offset,
+            )
+            attn_output = self.config.attention_out_multiplier


⚠️ Potential issue | 🔴 Critical

Critical bug: Attention output tensor is overwritten with scalar multiplier.

This line replaces the attention output tensor with the scalar multiplier value instead of scaling the tensor. This will cause incorrect model outputs as the actual attention computation result is discarded.

This bug was also identified in the PR comments by @sbhavani.

🐛 Fix the tensor scaling

-            attn_output = self.config.attention_out_multiplier
+            attn_output = attn_output * self.config.attention_out_multiplier

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-            attn_output = self.config.attention_out_multiplier
+            attn_output = attn_output * self.config.attention_out_multiplier

🤖 Prompt for AI Agents

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py` at line 283, The bug is that attn_output is being overwritten by the scalar self.config.attention_out_multiplier; instead of assigning the multiplier, scale the attention tensor (e.g., attn_output = attn_output * self.config.attention_out_multiplier or in-place attn_output *= ...) so the actual attention result is preserved; ensure the multiplication occurs in the same dtype/device as the tensor (use .to(attn_output.dtype) or cast if needed) and update the code around the attn_output assignment in modeling_falconh1/falconh1_layer.py (the attention block / forward method) to perform tensor scaling rather than scalar replacement.
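
To make the difference between replacing and scaling concrete, here is a small sketch. It assumes a plain Python list as a stand-in for the attention tensor and a made-up multiplier of 0.5; the function names are hypothetical and only illustrate the two assignments from the diff.

```python
def scale_buggy(attn_output, attention_out_multiplier):
    # Mirrors the flagged line: the attention result is discarded
    # and replaced by the scalar multiplier itself.
    attn_output = attention_out_multiplier
    return attn_output


def scale_fixed(attn_output, attention_out_multiplier):
    # Mirrors the suggested fix: every element is scaled.
    # A list comprehension stands in for `tensor * scalar`.
    return [value * attention_out_multiplier for value in attn_output]


attn_output = [2.0, 4.0, 6.0]  # stand-in for the attention output tensor
print(scale_buggy(attn_output, 0.5))  # 0.5
print(scale_fixed(attn_output, 0.5))  # [1.0, 2.0, 3.0]
```

Note that the buggy form does not raise at the assignment itself, so a failure, if any, would only surface later when the scalar is used where a tensor is expected.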

coderabbitai · 2026-01-30T20:46:02Z

+        logits, _ = self.output_layer(
+            hidden_states, weight=output_weight, runtime_gather_output=runtime_gather_output
+        ) * self.config.lm_head_multiplier


⚠️ Potential issue | 🔴 Critical

Bug: Multiplier applied to tuple instead of logits tensor.

self.output_layer() returns a tuple (logits, bias). The multiplication * self.config.lm_head_multiplier is applied to this tuple, not to the logits tensor. This will likely cause a runtime error.

🐛 Proposed fix

-        logits, _ = self.output_layer(
+        logits, _ = self.output_layer(
             hidden_states, weight=output_weight, runtime_gather_output=runtime_gather_output
-        ) * self.config.lm_head_multiplier
+        )
+        logits = logits * self.config.lm_head_multiplier

🤖 Prompt for AI Agents

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py` around lines 365 - 367, The current call to self.output_layer(...) returns a tuple (logits, bias) but the code multiplies the whole tuple by self.config.lm_head_multiplier; change the unpacking and apply the multiplier only to the logits tensor by assigning something like logits, bias = self.output_layer(hidden_states, weight=output_weight, runtime_gather_output=runtime_gather_output) and then set logits = logits * self.config.lm_head_multiplier (or multiply logits inline after unpacking) so only the logits tensor is scaled.
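
The precedence issue can be checked in isolation. The sketch below assumes a hypothetical `output_layer` stand-in that returns a `(logits, bias)` tuple and a multiplier that is a plain Python number; it is not the Megatron output layer. Because `*` binds tighter than the unpacking assignment, the buggy form multiplies the tuple before it is unpacked.

```python
def output_layer(hidden_states):
    """Hypothetical stand-in returning (logits, bias) like the call in the diff."""
    return [h * 2.0 for h in hidden_states], None


def logits_buggy(hidden_states, lm_head_multiplier):
    # Parsed as: logits, _ = (output_layer(...) * lm_head_multiplier)
    logits, _ = output_layer(hidden_states) * lm_head_multiplier
    return logits


def logits_fixed(hidden_states, lm_head_multiplier):
    # Unpack first, then scale only the logits.
    logits, _ = output_layer(hidden_states)
    return [value * lm_head_multiplier for value in logits]


def error_name(fn, *args):
    """Return the name of the exception raised by fn(*args), or None."""
    try:
        fn(*args)
    except Exception as exc:
        return type(exc).__name__
    return None


print(error_name(logits_buggy, [1.0, 2.0], 0.5))  # TypeError
print(error_name(logits_buggy, [1.0, 2.0], 2))    # ValueError
print(logits_fixed([1.0, 2.0], 0.5))              # [1.0, 2.0]
```

With a float multiplier the tuple multiplication raises `TypeError`; with an integer multiplier of 2 the tuple is repeated and the two-name unpacking raises `ValueError` instead.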

coderabbitai · 2026-01-30T20:46:02Z

+    for l in range(total_layers_count):
+        if x < 0.5:
+            layer_type_list[l] = Symbols.ATTENTION
+            x += mamba_section_length
+        else:
+            x -= 1


🛠️ Refactor suggestion | 🟠 Major

Rename ambiguous variable l to layer_idx or similar.

The variable name l is flagged by linters (E741) because it's easily confused with the digit 1 in many fonts. This applies to all loop variables named l in this file (lines 35, 48, 61, 68, 88).

♻️ Proposed fix (example for first loop)

-    for l in range(total_layers_count):
-        if x < 0.5:
-            layer_type_list[l] = Symbols.ATTENTION
+    for layer_idx in range(total_layers_count):
+        if x < 0.5:
+            layer_type_list[layer_idx] = Symbols.ATTENTION

🧰 Tools

🪛 Flake8 (7.3.0)

[error] 35-35: ambiguous variable name 'l'

(E741)

🪛 Ruff (0.14.14)

[error] 35-35: Ambiguous variable name: l

(E741)

🤖 Prompt for AI Agents

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py` around lines 35 - 40, Rename the ambiguous loop variable `l` to `layer_idx` (or similar) in all loops in this module to satisfy linters and improve readability; specifically update every occurrence in the loops that assign into layer_type_list using total_layers_count and any other loops within the same file (e.g., the blocks that test x < 0.5 and set Symbols.ATTENTION or modify x) so the index variable is consistently named `layer_idx` and all references inside those loops (assignments, indexing, increments/decrements) are updated to the new name.

coderabbitai · 2026-01-30T20:46:02Z

+if __name__ == "__main__":
+    test_cases = [
+        (9, 0.0, 0.0, 0.0, "M*-M*-M*-"),
+        (9, 0.0, 0.0, 0.0, "MMMMMMMMM"),
+        (10, 0.2, 0.1, 0.2),
+    ]
+    for t in test_cases:
+        print("")


⚠️ Potential issue | 🟡 Minor

Test harness has a test case that will fail.

The test case at line 194 passes an override pattern "M*-M*-M*-" but with all ratios set to 0.0. When ratios are all zero, the auto-allocation produces all MAMBA layers, so the override pattern won't match the layer counts (has ATTENTION * and MLP - symbols). This will raise a ValueError at runtime.

If this is intentional test behavior, consider adding a comment. Otherwise, this test case should either:

Have matching ratios for the override pattern, or

Be wrapped in a try-except to verify the expected error

🤖 Prompt for AI Agents

In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py` around lines 192 - 199, The test harness in the __main__ block defines test_cases including an override pattern "M*-M*-M*-" but supplies all-zero ratios which yields all-MAMBA layers and will cause a ValueError when validating the override; fix by either updating that tuple in test_cases to supply non-zero ratios that produce a layer distribution compatible with the override pattern (so the pattern "M*-M*-M*-" can match attention vs MLP counts) or wrap that specific test case execution in a try/except that asserts the expected ValueError (i.e., catch the error and treat it as a passed negative test); locate the test_cases list and the for t in test_cases loop in the __main__ block to apply the change.
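
The try/except option could look roughly like the sketch below. It assumes a hypothetical `validate_override` that rejects an override whose symbol counts differ from the auto-allocated pattern; the real validation in `mamba_hybrid_layer_allocation.py` may check something different, so treat both helpers as illustrative only.

```python
from collections import Counter


def validate_override(auto_pattern, override_pattern):
    """Hypothetical check: the override must use each symbol as often as the auto pattern."""
    if Counter(auto_pattern) != Counter(override_pattern):
        raise ValueError(
            f"override {override_pattern!r} does not match layer counts of {auto_pattern!r}"
        )
    return override_pattern


def raises_value_error(fn, *args):
    """Negative-test helper: True if fn(*args) raises ValueError, else False."""
    try:
        fn(*args)
    except ValueError:
        return True
    return False


# All-zero ratios are assumed to yield nine Mamba layers ("M" symbols).
auto_pattern = "M" * 9
print(raises_value_error(validate_override, auto_pattern, "M*-M*-M*-"))  # True
print(raises_value_error(validate_override, auto_pattern, "MMMMMMMMM"))  # False
```

Treating the mismatch as an expected error keeps the case in the harness while documenting that the failure is intentional.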

gautham-kollu · 2026-02-17T18:15:13Z

/ok to test 1fe2d0c

gautham-kollu · 2026-02-17T18:23:15Z

@dhiaEddineRhaiem Can you please follow the instructions here https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/CONTRIBUTING.md#-linting-and-formatting to fix linting ?

yaoyu-33 · 2026-05-15T22:28:54Z

+    --hf-path "$HF_EXPORT_PATH" \
+    --trust-remote-code
+
+uv run python - "$HF_MODEL_ID" "$HF_EXPORT_PATH" <<'PY'


why do we need these, clean up if not needed

Addressed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. I kept the tokenizer copy and documented why it is needed: this export flow reconstructs the HF config/weights from a Megatron checkpoint, but tokenizer artifacts are not written by that config-only export path. The exported HF directory is later used as a standalone AutoTokenizer.from_pretrained(...) path by the inference example, so the tokenizer files need to be copied from the source HF checkpoint.
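
For illustration only, a tokenizer copy step of this kind might be sketched as follows. This assumes the source checkpoint is available as a local directory and that the tokenizer artifacts use the common Hugging Face file names listed in `TOKENIZER_FILES`; both are assumptions, and the actual script in the PR may instead load the tokenizer by model ID.

```python
import shutil
from pathlib import Path

# Assumed file names; the actual tokenizer artifacts vary by model.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json")


def copy_tokenizer_files(source_dir, export_dir, names=TOKENIZER_FILES):
    """Copy any tokenizer files found in source_dir into export_dir.

    Returns the names that were actually copied; missing files are skipped.
    """
    source_dir, export_dir = Path(source_dir), Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = source_dir / name
        if src.is_file():
            shutil.copy2(src, export_dir / name)
            copied.append(name)
    return copied
```

After such a copy, the export directory holds both the converted weights and the tokenizer files, which is what allows it to be passed as a standalone path to `AutoTokenizer.from_pretrained(...)`.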

yaoyu-33 · 2026-05-15T22:29:24Z

+To run the same check with a different parallelism layout:
+
+```bash
+TP=2 PP=1 NPROC_PER_NODE=2 bash examples/models/falcon_h1/conversion.sh


Does the default value TP=1 work? If not, please update the default value.

Addressed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. The default TP=1, PP=1, NPROC_PER_NODE=1 path was validated for tiiuae/Falcon-H1-0.5B-Instruct, and the README now states that this default layout works for the 0.5B model.

yaoyu-33 · 2026-05-15T22:30:00Z

/claude review

claude · 2026-05-15T22:31:54Z

+                    if isinstance(layer, FalconH1Layer):
+                        hidden_states = layer(
+                            hidden_states=hidden_states,
+                            attention_mask=attention_mask,
+                            inference_context=inference_context,
+                            rotary_pos_emb=rotary_pos_emb,
+                            sequence_len_offset=sequence_len_offset,
+                        )
+                    else:  # MambaLayer


Bug: if/if/else dispatch causes TransformerLayer to be called twice per forward pass.

When a TransformerLayer is encountered, the first if executes it. Then the second if isinstance(layer, FalconH1Layer) is False (TransformerLayer is not FalconH1Layer), so the else branch fires — calling the same layer again as if it were a MambaLayer.

The second if should be elif:

Suggested change

-                    if isinstance(layer, FalconH1Layer):
-                        hidden_states = layer(
-                            hidden_states=hidden_states,
-                            attention_mask=attention_mask,
-                            inference_context=inference_context,
-                            rotary_pos_emb=rotary_pos_emb,
-                            sequence_len_offset=sequence_len_offset,
-                        )
-                    else:  # MambaLayer
+                    if isinstance(layer, TransformerLayer):
+                        hidden_states, _ = layer(
+                            hidden_states=hidden_states,
+                            attention_mask=attention_mask,
+                            inference_context=inference_context,
+                            rotary_pos_emb=rotary_pos_emb,
+                            sequence_len_offset=sequence_len_offset,
+                        )
+                    elif isinstance(layer, FalconH1Layer):

Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. The second dispatch branch is now elif isinstance(layer, FalconH1Layer), so TransformerLayer instances no longer fall through into the Mamba path.

claude · 2026-05-15T22:31:56Z

@@ -0,0 +1,38 @@
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.


Nit: copyright year is 2025; per project rules it should be 2026.

Suggested change

-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.

claude · 2026-05-15T22:31:57Z

@@ -0,0 +1,161 @@
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.


Same 2025→2026 copyright year fix needed here.

Suggested change

-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.

claude · 2026-05-15T22:34:13Z

Code Review

Critical: Forward dispatch bug in FalconH1Stack.forward()

falconh1_block.py:367-380 — The layer dispatch loop uses if/if/else instead of if/elif/else. When a TransformerLayer is encountered (attention-only layers), the first if runs it, then the second if isinstance(layer, FalconH1Layer) is False, so the else (MambaLayer path) fires too — calling the layer twice per forward pass. This will produce incorrect hidden states and double the compute for any hybrid config that includes standalone attention layers. Fix: change the second if to elif.

Minor: Copyright years — falcon_h1/__init__.py and falconh1_provider.py use 2025 copyright headers; should be 2026 per project rules.

Test coverage gaps — The unit tests cover bridge registration, config mapping, provider construction, and layer allocation well. Consider adding: (1) FalconH1Config.__post_init__ validation: test that an invalid A_init_dist or a d_inner that is not divisible by mamba_head_dim raises ValueError. (2) FalconH1MLP.forward() multiplier logic: a unit test with a small tensor would catch regressions. (3) FalconH1SelfAttention.get_query_key_value_tensors() key scaling: verify key_multiplier != 1.0 actually scales the key tensor. (4) FalconH1MambaMixer._scale_zxbc_dt(): verify the 5-element SSM multiplier vector is applied correctly.

Observation: no-op test — test_falcon_h1_provider.py::test_size_specific_providers_are_not_public_api checks that names like FalconH1ModelProvider500M are absent from __all__ — but these classes do not exist anywhere in the PR. The test passes trivially. Consider removing it.

Suggested test cases: No perf tests impacted.

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-15T22:55:07Z

Addressed the latest Falcon H1 review summary in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.

Changes:

Fixed FalconH1Stack.forward() dispatch so standalone TransformerLayer instances do not also run through the Mamba branch.
Updated Falcon H1 copyright headers requested in review.
Documented why the Falcon H1 conversion example copies tokenizer artifacts into the exported HF directory.
Clarified that the default TP=1, PP=1, NPROC_PER_NODE=1 example layout works for the 0.5B Falcon H1 checkpoint.
Added focused Falcon H1 unit coverage for config validation, MLP multipliers, attention key scaling, and Mamba SSM multipliers.
Removed the no-op size-specific-provider public API test.

Validation:

uv run pre-commit run --all-files passed.
Targeted Falcon H1 validation passed: unit tests, default TP=1 conversion, and default TP=1 inference flow.

Residual blocker:

DCO is still failing from inherited unsigned history and is unchanged by this patch.

yaoyu-33 · 2026-05-15T22:55:22Z

/ok to test f454d16

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

yaoyu-33 · 2026-05-15T23:13:54Z

/ok to test a3d3333

Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

github-actions Bot added the community-request label Nov 21, 2025

add Falcon h1 support in Megatron Bridge

89cf2f5

Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>

dhiaEddineRhaiem force-pushed the add-falcon-h1 branch from da96f18 to 7848627 Compare November 21, 2025 17:44

yaoyu-33 reviewed Nov 21, 2025

View reviewed changes

Comment thread src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py

dhiaEddineRhaiem added 2 commits November 21, 2025 20:53

add falconh1 providers to models constructor

5c12ce4

Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>

add modeling_falconh1 folder

12a9be7

Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>

dhiaEddineRhaiem force-pushed the add-falcon-h1 branch from 7848627 to 12a9be7 Compare November 21, 2025 20:53

feat: H1 Mup and fix layer name mappings

ee80ca9

yaoyu-33 previously approved these changes Dec 17, 2025

View reviewed changes

copy-pr-bot Bot temporarily deployed to nemo-ci December 17, 2025 02:53 Inactive

yaoyu-33 self-requested a review December 17, 2025 02:54

chtruong814 added the needs-follow-up label Jan 11, 2026

Merge branch 'main' into add-falcon-h1

1fe2d0c

coderabbitai Bot reviewed Jan 30, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to nemo-ci February 17, 2026 18:15 Inactive

sbhavani mentioned this pull request Mar 23, 2026

[ROADMAP][2026 Q1] Megatron Core Roadmap NVIDIA/Megatron-LM#4003

Open

chtruong814 added waiting-for-customer waiting-on-customer Waiting on the original author to respond and removed waiting-for-customer labels Apr 14, 2026

yaoyu-33 reviewed May 15, 2026

View reviewed changes

claude Bot reviewed May 15, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to public May 15, 2026 22:46 Inactive

[models] fix: address Falcon H1 review comments

f454d16

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 15, 2026 22:56 Inactive

copy-pr-bot Bot had a problem deploying to test May 15, 2026 22:56 Error

yaoyu-33 added the needs-review PR is ready for code review and waiting on a reviewer label May 15, 2026

[models] docs: simplify Falcon H1 examples

a3d3333

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:14 Inactive

copy-pr-bot Bot temporarily deployed to test May 15, 2026 23:14 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:55 Inactive

copy-pr-bot Bot temporarily deployed to public May 15, 2026 23:56 Inactive

copy-pr-bot Bot temporarily deployed to public May 16, 2026 00:10 Inactive

svcnvidia-nemo-ci added the waiting-on-maintainers Waiting on maintainers to respond label May 18, 2026

yaoyu-33 removed the needs-review PR is ready for code review and waiting on a reviewer label May 18, 2026

yaoyu-33 approved these changes May 18, 2026

View reviewed changes

yaoyu-33 merged commit 5d1d07a into NVIDIA-NeMo:main May 18, 2026
74 checks passed

svcnvidia-nemo-ci removed the waiting-on-maintainers Waiting on maintainers to respond label May 18, 2026

		# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.

-# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

		pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
		post_process=post_process or parallel_state.is_pipeline_last_stage(),

