Add falcon h1#1462
Conversation
Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
da96f18 to
7848627
Compare
Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae>
7848627 to
12a9be7
Compare
|
@dhiaEddineRhaiem thanks a lot for the contribution. |
|
hello @yaoyu-33 , this the output of the hf conversion of falcon h1 after the previous fixes but before adding Falcon h1 Mup Fwd multipliers. For reference , here is the complete list of FalconH1 MuP multipliers (from config): embedding_multiplier: scales embedding outputs So the gibberish output was expected because we were not applying Falcon h1 Mup Fwd multipliers(MLP , attention and ssm multipliers) yet. I managed to add some of them in bridge except MLP gate and down proj multipliers , ssm_multipliers, key_multiplier which i think should be added to core in the fwd of MambaMixer , MLP and selfAttention. so IMO , they should be add to core in the fwd of MambaMixer , MLP and self attention. Any help on this please ? |
|
Hi @dhiaEddineRhaiem, how bad would it be to fuse the multipliers into the weights (during conversion)? I'm guessing changing the mcore is a no go, will the training break or will it be just worse? or can it be accounted for in lr? can we multiply the learning rate of those fused weights with 1/multiplier and get the same updates? (I think wd needs to be changed too) |
|
@dhiaEddineRhaiem thanks for adding the muP multipliers. I think this is a good approach while we figure out general muP support in Megatron Core. One potential bug in
|
|
@dhiaEddineRhaiem : sry for a bit late in reply. |
|
/ok to test ee80ca9 |
📝 WalkthroughWalkthroughThis PR adds comprehensive support for the FalconH1 hybrid model architecture to Megatron, including bridge adapters for HuggingFace model conversion, configuration-driven model providers for multiple model sizes (500M, 1.5B, 7B, 34B), stack implementations with Mamba and attention hybrid layers, module specifications, and layer allocation logic. Changes
Sequence Diagram(s)sequenceDiagram
participant HF as HuggingFace Model
participant Bridge as FalconH1Bridge
participant Provider as FalconH1ModelProvider
participant Model as FalconH1Model
participant Stack as FalconH1Stack
participant Layers as FalconH1Layer/Mamba/Attention
HF->>Bridge: provider_bridge(hf_pretrained)
Bridge->>Provider: Create with HF config
Provider->>Provider: Configure model params
Provider->>Model: provide() instantiate
Model->>Stack: build_module with spec
Stack->>Stack: allocate_layers()
Stack->>Layers: construct per-layer
Layers->>Layers: initialize submodules
Model->>Model: forward() inference
Stack->>Layers: forward through stack
Layers->>Layers: Mamba/Attention/MLP
Layers-->>Stack: aggregated output
Stack-->>Model: layer outputs
Model-->>Model: output projection & logits
Estimated code review effort🎯 4 (Complex) | ⏱️ ~70 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🤖 Fix all issues with AI agents
In `@src/megatron/bridge/models/falcon_h1/falconh1_bridge.py`:
- Around line 1-2: The top of
src/megatron/bridge/models/falcon_h1/falconh1_bridge.py currently contains only
a partial copyright line; replace it with the full NVIDIA copyright and Apache
License 2.0 header block (including copyright years, "NVIDIA CORPORATION. All
rights reserved.", the Apache License, Version 2.0 notice, and license URL) so
the file has the complete standard header; add this header at the very top of
the module before any imports or code to ensure compliance with project
licensing guidelines.
In `@src/megatron/bridge/models/falcon_h1/falconh1_provider.py`:
- Around line 153-154: The call currently uses boolean-or which treats explicit
False as unset; change the logic so defaults are only applied when arguments are
None: for the call site that passes pre_process and post_process (in
falconh1_provider.py) replace the inline "pre_process=pre_process or
parallel_state.is_pipeline_first_stage()" and "post_process=post_process or
parallel_state.is_pipeline_last_stage()" with conditional logic that uses the
provided value when it is not None (e.g., pre_process if pre_process is not None
else parallel_state.is_pipeline_first_stage(), and similarly for
post_process/post_process is not None else
parallel_state.is_pipeline_last_stage()) so explicit False values are preserved.
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`:
- Around line 220-221: The else branch handling unexpected layer_type in
modeling_falconh1/falconh1_block.py should not use `assert False` because
assertions can be stripped with Python -O; replace that assertion with an
explicit exception by raising an AssertionError (e.g., in the else of the
layer_type switch/if in the function/class handling layer dispatch in
falconh1_block.py, replace `assert False, "unexpected layer_type"` with `raise
AssertionError("unexpected layer_type")`) so the error is always raised in
production.
- Around line 367-388: The loop's branching erroneously uses two separate ifs
causing TransformerLayer instances to also hit the MambaLayer branch; change the
second check to an elif so only one branch runs per layer (i.e., use elif
isinstance(layer, FalconH1Layer) instead of a standalone if) and keep the
existing argument sets for TransformerLayer, FalconH1Layer and the else
(MambaLayer) branches as-is to avoid double execution.
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py`:
- Line 1: Replace the short copyright line at the top of falconh1_layer_specs.py
with an updated full Apache License, Version 2.0 header: change the year from
2023 to 2025 and insert the complete Apache 2.0 boilerplate (including copyright
notice, "Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License.", link to
http://www.apache.org/licenses/LICENSE-2.0, and the standard disclaimer). Ensure
the new header appears at the very top of the file before any imports or code
(i.e., replacing the existing copyright comment).
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py`:
- Line 283: The bug is that attn_output is being overwritten by the scalar
self.config.attention_out_multiplier; instead of assigning the multiplier, scale
the attention tensor (e.g., attn_output = attn_output *
self.config.attention_out_multiplier or in-place attn_output *= ...) so the
actual attention result is preserved; ensure the multiplication occurs in the
same dtype/device as the tensor (use .to(attn_output.dtype) or cast if needed)
and update the code around the attn_output assignment in
modeling_falconh1/falconh1_layer.py (the attention block / forward method) to
perform tensor scaling rather than scalar replacement.
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py`:
- Around line 365-367: The current call to self.output_layer(...) returns a
tuple (logits, bias) but the code multiplies the whole tuple by
self.config.lm_head_multiplier; change the unpacking and apply the multiplier
only to the logits tensor by assigning something like logits, bias =
self.output_layer(hidden_states, weight=output_weight,
runtime_gather_output=runtime_gather_output) and then set logits = logits *
self.config.lm_head_multiplier (or multiply logits inline after unpacking) so
only the logits tensor is scaled.
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py`:
- Around line 192-199: The test harness in the __main__ block defines test_cases
including an override pattern "M*-M*-M*-" but supplies all-zero ratios which
yields all-MAMBA layers and will cause a ValueError when validating the
override; fix by either updating that tuple in test_cases to supply non-zero
ratios that produce a layer distribution compatible with the override pattern
(so the pattern "M*-M*-M*-" can match attention vs MLP counts) or wrap that
specific test case execution in a try/except that asserts the expected
ValueError (i.e., catch the error and treat it as a passed negative test);
locate the test_cases list and the for t in test_cases loop in the __main__
block to apply the change.
- Around line 35-40: Rename the ambiguous loop variable `l` to `layer_idx` (or
similar) in all loops in this module to satisfy linters and improve readability;
specifically update every occurrence in the loops that assign into
layer_type_list using total_layers_count and any other loops within the same
file (e.g., the blocks that test x < 0.5 and set Symbols.ATTENTION or modify x)
so the index variable is consistently named `layer_idx` and all references
inside those loops (assignments, indexing, increments/decrements) are updated to
the new name.
🧹 Nitpick comments (16)
src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py (2)
1-1: Update copyright header.Add the complete Apache 2.0 license text. The copyright year can remain 2024 or be updated to 2025.
3-21: Remove unused imports.Static analysis indicates several unused imports that should be removed for code cleanliness.
🧹 Proposed cleanup
-import math from dataclasses import dataclass from typing import Optional, Tuple, Union import torch -import torch.nn as nn from megatron.core.process_groups_config import ProcessGroupCollection from megatron.bridge.models.falcon_h1.modeling_falconh1.falconh1_model import FalconH1Config from megatron.core.transformer.module import MegatronModule from megatron.core.transformer.spec_utils import ModuleSpec, build_module from megatron.core.transformer.enums import AttnMaskType from megatron.core.transformer.identity_op import IdentityOp from megatron.core.packed_seq_params import PackedSeqParams from megatron.core.inference.contexts import BaseInferenceContext from megatron.core.utils import log_single_rank import logging -from megatron.core.ssm.mamba_mixer import MambaMixer, MambaMixerSubmodules +from megatron.core.ssm.mamba_mixer import MambaMixerSubmodules -from megatron.core.transformer.attention import SelfAttention, SelfAttentionSubmodules +from megatron.core.transformer.attention import SelfAttentionSubmodules from megatron.core.transformer.mlp import MLPsrc/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py (3)
1-1: Update copyright header to 2025 with full license text.
20-21: Remove duplicateOptionalandTupleimports.These types are already imported on line 3. The redefinition is flagged by static analysis.
🧹 Proposed fix
-from dataclasses import dataclass -from typing import Optional, Tuple +from dataclasses import dataclassThe
Tupletype from line 3 can be used, andOptionalis already imported.
177-177: Use explicitstr | Nonefor nullable type hints.Per coding guidelines, use
T | Noneinstead of implicit Optional. PEP 484 prohibits implicitOptional.📝 Proposed fix
- hybrid_override_pattern: str = None, + hybrid_override_pattern: str | None = None,src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py (2)
402-406: Remove unusedoutputvariable.The
make_viewless_tensorresult is assigned tooutputbuthidden_statesis returned instead. Either useoutputor remove the assignment.🧹 Proposed fix
# Ensure that the tensor passed between pipeline parallel stages is # viewless. See related notes in TransformerBlock and TransformerLayer - output = make_viewless_tensor( + hidden_states = make_viewless_tensor( inp=hidden_states, requires_grad=hidden_states.requires_grad, keep_graph=True ) return hidden_states
131-131: Minor style issues flagged by static analysis.
- Line 131: Use explicit
str | Nonefor nullable type hint- Line 453: Use
is notinstead ofnot ... is📝 Proposed fixes
- hybrid_override_pattern: str = None, + hybrid_override_pattern: str | None = None,- if not module is self.layers: + if module is not self.layers:Also applies to: 453-453
src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py (6)
1-1: Update copyright year to 2025.The copyright header uses 2024, but per the coding guidelines and consistency with other files in this PR (e.g.,
falconh1_provider.py), it should be 2025.-# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. +# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
17-22: Consider usingfrozensetfor immutability or annotate withClassVar.The
VALIDset is a mutable class attribute. For a constant set that shouldn't be modified, consider usingfrozensetfor true immutability, or at minimum annotate withClassVarto satisfy the linter.+from typing import ClassVar + class Symbols: MAMBA = "M" ATTENTION = "*" MLP = "-" PARALLEL = "P" - VALID = {MAMBA, ATTENTION, MLP, PARALLEL} + VALID: ClassVar[frozenset[str]] = frozenset({MAMBA, ATTENTION, MLP, PARALLEL})
107-108: Use explicitstr | Nonetype hint.Per coding guidelines, use
T | Nonefor nullable types instead of implicitOptional. The current signature usesstr = Nonewhich implicitly meansOptional[str].- override_pattern: str = None, + override_pattern: str | None = None,As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'"
109-113: Consider splitting compound assertions for clearer error messages.When a compound assertion fails, it's not immediately clear which condition was violated. Splitting them provides better debugging information.
♻️ Proposed fix
assert total_layers_count > 0 - assert target_attention_ratio >= 0.0 and target_attention_ratio <= 1.0 - assert target_mlp_ratio >= 0.0 and target_mlp_ratio <= 1.0 - assert target_parallel_hybrid_ratio >= 0.0 and target_parallel_hybrid_ratio <= 1.0 + assert 0.0 <= target_attention_ratio <= 1.0, f"target_attention_ratio must be in [0, 1], got {target_attention_ratio}" + assert 0.0 <= target_mlp_ratio <= 1.0, f"target_mlp_ratio must be in [0, 1], got {target_mlp_ratio}" + assert 0.0 <= target_parallel_hybrid_ratio <= 1.0, f"target_parallel_hybrid_ratio must be in [0, 1], got {target_parallel_hybrid_ratio}" assert target_attention_ratio + target_mlp_ratio + target_parallel_hybrid_ratio <= 1.0
25-27: Add parameterized return type hints.The return type
listcould be more specific aslist[str]to improve type safety and IDE support. This applies to_allocate_auto,_allocate_override, andallocate_layers.def _allocate_auto( total_layers_count: int, target_attention_ratio: float, target_mlp_ratio: float, target_parallel_hybrid_ratio: float -) -> list: +) -> list[str]:
133-135: Uselogging.WARNINGlevel for warning messages.The message says "Warning:" but uses
logging.INFOlevel. Consider usinglogging.WARNINGfor consistency with the message semantics.- log_single_rank(logger, logging.INFO, "Warning: overriding pattern A with pattern B") + log_single_rank(logger, logging.WARNING, "Overriding pattern A with pattern B")src/megatron/bridge/models/falcon_h1/falconh1_provider.py (3)
17-17: PreferX | NoneandX | Ysyntax overOptionalandUnion.Per coding guidelines, use
T | Nonefor nullable types andX | Yfor union types instead ofOptional[T]andUnion[X, Y]. TheOptionalandUnionimports can be removed if you update the type hints.Lines affected:
- Line 67:
Optional[str]→str | None- Line 73:
Optional[float]→float | None- Line 78:
Optional[int]→int | None- Line 107:
Union[ModuleSpec, Callable[[], ModuleSpec]]→ModuleSpec | Callable[[], ModuleSpec]As per coding guidelines: "Use 'T | None' for nullable types instead of 'Optional[T]'" and "Use 'X | Y' for union types instead of 'Union[X, Y]'"
94-94: Fix typo in comment.- `#Falcon` H1 Mup Fwd Multpliers + # Falcon H1 MuP Forward Multipliers
102-103: Add parameterized tuple type hints.For better type safety, specify the tuple element types:
- mlp_multipliers: tuple = (1.0, 1.0) - ssm_multipliers: tuple = (1.0, 1.0, 1.0, 0.5, 1.0) + mlp_multipliers: tuple[float, float] = (1.0, 1.0) + ssm_multipliers: tuple[float, float, float, float, float] = (1.0, 1.0, 1.0, 0.5, 1.0)
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add the complete NVIDIA copyright and Apache 2.0 license header.
The file has only a partial copyright line. As per coding guidelines, Python files should include the full NVIDIA copyright header with Apache 2.0 license text.
📝 Proposed fix
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
-
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
import logging📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| # | |
| # Licensed under the Apache License, Version 2.0 (the "License"); | |
| # you may not use this file except in compliance with the License. | |
| # You may obtain a copy of the License at | |
| # | |
| # http://www.apache.org/licenses/LICENSE-2.0 | |
| # | |
| # Unless required by applicable law or agreed to in writing, software | |
| # distributed under the License is distributed on an "AS IS" BASIS, | |
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| # See the License for the specific language governing permissions and | |
| # limitations under the License. | |
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/falconh1_bridge.py` around lines 1 - 2,
The top of src/megatron/bridge/models/falcon_h1/falconh1_bridge.py currently
contains only a partial copyright line; replace it with the full NVIDIA
copyright and Apache License 2.0 header block (including copyright years,
"NVIDIA CORPORATION. All rights reserved.", the Apache License, Version 2.0
notice, and license URL) so the file has the complete standard header; add this
header at the very top of the module before any imports or code to ensure
compliance with project licensing guidelines.
| pre_process=pre_process or parallel_state.is_pipeline_first_stage(), | ||
| post_process=post_process or parallel_state.is_pipeline_last_stage(), |
There was a problem hiding this comment.
Bug: Explicit False for pre_process/post_process will be incorrectly overridden.
Using or treats explicit False as falsy, so pre_process=False would be overridden to parallel_state.is_pipeline_first_stage(). The signature uses =None as default, suggesting the intent is to only apply the default when the argument is not provided.
🐛 Proposed fix
- pre_process=pre_process or parallel_state.is_pipeline_first_stage(),
- post_process=post_process or parallel_state.is_pipeline_last_stage(),
+ pre_process=parallel_state.is_pipeline_first_stage() if pre_process is None else pre_process,
+ post_process=parallel_state.is_pipeline_last_stage() if post_process is None else post_process,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pre_process=pre_process or parallel_state.is_pipeline_first_stage(), | |
| post_process=post_process or parallel_state.is_pipeline_last_stage(), | |
| pre_process=parallel_state.is_pipeline_first_stage() if pre_process is None else pre_process, | |
| post_process=parallel_state.is_pipeline_last_stage() if post_process is None else post_process, |
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/falconh1_provider.py` around lines 153 -
154, The call currently uses boolean-or which treats explicit False as unset;
change the logic so defaults are only applied when arguments are None: for the
call site that passes pre_process and post_process (in falconh1_provider.py)
replace the inline "pre_process=pre_process or
parallel_state.is_pipeline_first_stage()" and "post_process=post_process or
parallel_state.is_pipeline_last_stage()" with conditional logic that uses the
provided value when it is not None (e.g., pre_process if pre_process is not None
else parallel_state.is_pipeline_first_stage(), and similarly for
post_process/post_process is not None else
parallel_state.is_pipeline_last_stage()) so explicit False values are preserved.
| else: | ||
| assert False, "unexpected layer_type" |
There was a problem hiding this comment.
Replace assert False with raise AssertionError().
assert False statements are removed when Python runs with optimization (-O flag), which could allow unexpected layer types to pass silently in production.
🛡️ Proposed fix
else:
- assert False, "unexpected layer_type"
+ raise AssertionError(f"unexpected layer_type: {layer_type}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| else: | |
| assert False, "unexpected layer_type" | |
| else: | |
| raise AssertionError(f"unexpected layer_type: {layer_type}") |
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 221-221: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`
around lines 220 - 221, The else branch handling unexpected layer_type in
modeling_falconh1/falconh1_block.py should not use `assert False` because
assertions can be stripped with Python -O; replace that assertion with an
explicit exception by raising an AssertionError (e.g., in the else of the
layer_type switch/if in the function/class handling layer dispatch in
falconh1_block.py, replace `assert False, "unexpected layer_type"` with `raise
AssertionError("unexpected layer_type")`) so the error is always raised in
production.
| if isinstance(layer, TransformerLayer): | ||
| hidden_states, _ = layer( | ||
| hidden_states=hidden_states, | ||
| attention_mask=attention_mask, | ||
| inference_context=inference_context, | ||
| rotary_pos_emb=rotary_pos_emb, | ||
| sequence_len_offset=sequence_len_offset, | ||
| ) | ||
| if isinstance(layer, FalconH1Layer): | ||
| hidden_states = layer( | ||
| hidden_states=hidden_states, | ||
| attention_mask=attention_mask, | ||
| inference_context=inference_context, | ||
| rotary_pos_emb=rotary_pos_emb, | ||
| sequence_len_offset=sequence_len_offset, | ||
| ) | ||
| else: # MambaLayer | ||
| hidden_states = layer( | ||
| hidden_states=hidden_states, | ||
| attention_mask=attention_mask, | ||
| inference_context=inference_context, | ||
| ) |
There was a problem hiding this comment.
Critical logic bug: Missing elif causes double execution for TransformerLayer.
The forward loop has incorrect branching:
if isinstance(layer, TransformerLayer)executes for TransformerLayerif isinstance(layer, FalconH1Layer)is a separateif, notelifelse(MambaLayer) executes for anything that's NOT FalconH1Layer
This means for TransformerLayer, both the TransformerLayer branch AND the MambaLayer branch will execute, processing the layer twice with potentially mismatched arguments.
🐛 Fix the branching logic
with inner_fp8_context:
if isinstance(layer, TransformerLayer):
hidden_states, _ = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
rotary_pos_emb=rotary_pos_emb,
sequence_len_offset=sequence_len_offset,
)
- if isinstance(layer, FalconH1Layer):
+ elif isinstance(layer, FalconH1Layer):
hidden_states = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
rotary_pos_emb=rotary_pos_emb,
sequence_len_offset=sequence_len_offset,
)
else: # MambaLayer
hidden_states = layer(
hidden_states=hidden_states,
attention_mask=attention_mask,
inference_context=inference_context,
)🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_block.py`
around lines 367 - 388, The loop's branching erroneously uses two separate ifs
causing TransformerLayer instances to also hit the MambaLayer branch; change the
second check to an elif so only one branch runs per layer (i.e., use elif
isinstance(layer, FalconH1Layer) instead of a standalone if) and keep the
existing argument sets for TransformerLayer, FalconH1Layer and the else
(MambaLayer) branches as-is to avoid double execution.
| @@ -0,0 +1,109 @@ | |||
| # Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved. | |||
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Update copyright header.
The copyright year is 2023 and the full license text is missing. Update to 2025 and add the complete Apache 2.0 license header.
📝 Proposed fix
-# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
from megatron.core.extensions.transformer_engine import (🤖 Prompt for AI Agents
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer_specs.py`
at line 1, Replace the short copyright line at the top of
falconh1_layer_specs.py with an updated full Apache License, Version 2.0 header:
change the year from 2023 to 2025 and insert the complete Apache 2.0 boilerplate
(including copyright notice, "Licensed under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with the
License.", link to http://www.apache.org/licenses/LICENSE-2.0, and the standard
disclaimer). Ensure the new header appears at the very top of the file before
any imports or code (i.e., replacing the existing copyright comment).
| packed_seq_params=packed_seq_params, | ||
| sequence_len_offset=sequence_len_offset, | ||
| ) | ||
| attn_output = self.config.attention_out_multiplier |
There was a problem hiding this comment.
Critical bug: Attention output tensor is overwritten with scalar multiplier.
This line replaces the attention output tensor with the scalar multiplier value instead of scaling the tensor. This will cause incorrect model outputs as the actual attention computation result is discarded.
This bug was also identified in the PR comments by @sbhavani.
🐛 Fix the tensor scaling
- attn_output = self.config.attention_out_multiplier
+ attn_output = attn_output * self.config.attention_out_multiplier📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| attn_output = self.config.attention_out_multiplier | |
| attn_output = attn_output * self.config.attention_out_multiplier |
🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_layer.py` at
line 283, The bug is that attn_output is being overwritten by the scalar
self.config.attention_out_multiplier; instead of assigning the multiplier, scale
the attention tensor (e.g., attn_output = attn_output *
self.config.attention_out_multiplier or in-place attn_output *= ...) so the
actual attention result is preserved; ensure the multiplication occurs in the
same dtype/device as the tensor (use .to(attn_output.dtype) or cast if needed)
and update the code around the attn_output assignment in
modeling_falconh1/falconh1_layer.py (the attention block / forward method) to
perform tensor scaling rather than scalar replacement.
| logits, _ = self.output_layer( | ||
| hidden_states, weight=output_weight, runtime_gather_output=runtime_gather_output | ||
| ) * self.config.lm_head_multiplier |
There was a problem hiding this comment.
Bug: Multiplier applied to tuple instead of logits tensor.
self.output_layer() returns a tuple (logits, bias). The multiplication * self.config.lm_head_multiplier is applied to this tuple, not to the logits tensor. This will likely cause a runtime error.
🐛 Proposed fix
- logits, _ = self.output_layer(
+ logits, _ = self.output_layer(
hidden_states, weight=output_weight, runtime_gather_output=runtime_gather_output
- ) * self.config.lm_head_multiplier
+ )
+ logits = logits * self.config.lm_head_multiplier🤖 Prompt for AI Agents
In `@src/megatron/bridge/models/falcon_h1/modeling_falconh1/falconh1_model.py`
around lines 365 - 367, The current call to self.output_layer(...) returns a
tuple (logits, bias) but the code multiplies the whole tuple by
self.config.lm_head_multiplier; change the unpacking and apply the multiplier
only to the logits tensor by assigning something like logits, bias =
self.output_layer(hidden_states, weight=output_weight,
runtime_gather_output=runtime_gather_output) and then set logits = logits *
self.config.lm_head_multiplier (or multiply logits inline after unpacking) so
only the logits tensor is scaled.
| for l in range(total_layers_count): | ||
| if x < 0.5: | ||
| layer_type_list[l] = Symbols.ATTENTION | ||
| x += mamba_section_length | ||
| else: | ||
| x -= 1 |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Rename ambiguous variable l to layer_idx or similar.
The variable name l is flagged by linters (E741) because it's easily confused with the digit 1 in many fonts. This applies to all loop variables named l in this file (lines 35, 48, 61, 68, 88).
♻️ Proposed fix (example for first loop)
- for l in range(total_layers_count):
- if x < 0.5:
- layer_type_list[l] = Symbols.ATTENTION
+ for layer_idx in range(total_layers_count):
+ if x < 0.5:
+ layer_type_list[layer_idx] = Symbols.ATTENTION🧰 Tools
🪛 Flake8 (7.3.0)
[error] 35-35: ambiguous variable name 'l'
(E741)
🪛 Ruff (0.14.14)
[error] 35-35: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py`
around lines 35 - 40, Rename the ambiguous loop variable `l` to `layer_idx` (or
similar) in all loops in this module to satisfy linters and improve readability;
specifically update every occurrence in the loops that assign into
layer_type_list using total_layers_count and any other loops within the same
file (e.g., the blocks that test x < 0.5 and set Symbols.ATTENTION or modify x)
so the index variable is consistently named `layer_idx` and all references
inside those loops (assignments, indexing, increments/decrements) are updated to
the new name.
| if __name__ == "__main__": | ||
| test_cases = [ | ||
| (9, 0.0, 0.0, 0.0, "M*-M*-M*-"), | ||
| (9, 0.0, 0.0, 0.0, "MMMMMMMMM"), | ||
| (10, 0.2, 0.1, 0.2), | ||
| ] | ||
| for t in test_cases: | ||
| print("") |
There was a problem hiding this comment.
Test harness has a test case that will fail.
The test case at line 194 passes an override pattern "M*-M*-M*-" but with all ratios set to 0.0. When ratios are all zero, the auto-allocation produces all MAMBA layers, so the override pattern won't match the layer counts (has ATTENTION * and MLP - symbols). This will raise a ValueError at runtime.
If this is intentional test behavior, consider adding a comment. Otherwise, this test case should either:
- Have matching ratios for the override pattern, or
- Be wrapped in a try-except to verify the expected error
🤖 Prompt for AI Agents
In
`@src/megatron/bridge/models/falcon_h1/modeling_falconh1/mamba_hybrid_layer_allocation.py`
around lines 192 - 199, The test harness in the __main__ block defines
test_cases including an override pattern "M*-M*-M*-" but supplies all-zero
ratios which yields all-MAMBA layers and will cause a ValueError when validating
the override; fix by either updating that tuple in test_cases to supply non-zero
ratios that produce a layer distribution compatible with the override pattern
(so the pattern "M*-M*-M*-" can match attention vs MLP counts) or wrap that
specific test case execution in a try/except that asserts the expected
ValueError (i.e., catch the error and treat it as a passed negative test);
locate the test_cases list and the for t in test_cases loop in the __main__
block to apply the change.
|
/ok to test 1fe2d0c |
|
@dhiaEddineRhaiem Can you please follow the instructions here https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/CONTRIBUTING.md#-linting-and-formatting to fix linting ? |
| --hf-path "$HF_EXPORT_PATH" \ | ||
| --trust-remote-code | ||
|
|
||
| uv run python - "$HF_MODEL_ID" "$HF_EXPORT_PATH" <<'PY' |
There was a problem hiding this comment.
why do we need these, clean up if not needed
There was a problem hiding this comment.
Addressed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. I kept the tokenizer copy and documented why it is needed: this export flow reconstructs the HF config/weights from a Megatron checkpoint, but tokenizer artifacts are not written by that config-only export path. The exported HF directory is later used as a standalone AutoTokenizer.from_pretrained(...) path by the inference example, so the tokenizer files need to be copied from the source HF checkpoint.
| To run the same check with a different parallelism layout: | ||
|
|
||
| ```bash | ||
| TP=2 PP=1 NPROC_PER_NODE=2 bash examples/models/falcon_h1/conversion.sh |
There was a problem hiding this comment.
does default value TP=1 work? if not , update default value please
There was a problem hiding this comment.
Addressed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. The default TP=1, PP=1, NPROC_PER_NODE=1 path was validated for tiiuae/Falcon-H1-0.5B-Instruct, and the README now states that this default layout works for the 0.5B model.
|
/claude review |
| if isinstance(layer, FalconH1Layer): | ||
| hidden_states = layer( | ||
| hidden_states=hidden_states, | ||
| attention_mask=attention_mask, | ||
| inference_context=inference_context, | ||
| rotary_pos_emb=rotary_pos_emb, | ||
| sequence_len_offset=sequence_len_offset, | ||
| ) | ||
| else: # MambaLayer |
There was a problem hiding this comment.
Bug: if/if/else dispatch causes TransformerLayer to be called twice per forward pass.
When a TransformerLayer is encountered, the first if executes it. Then the second if isinstance(layer, FalconH1Layer) is False (TransformerLayer is not FalconH1Layer), so the else branch fires — calling the same layer again as if it were a MambaLayer.
The second if should be elif:
| if isinstance(layer, FalconH1Layer): | |
| hidden_states = layer( | |
| hidden_states=hidden_states, | |
| attention_mask=attention_mask, | |
| inference_context=inference_context, | |
| rotary_pos_emb=rotary_pos_emb, | |
| sequence_len_offset=sequence_len_offset, | |
| ) | |
| else: # MambaLayer | |
| if isinstance(layer, TransformerLayer): | |
| hidden_states, _ = layer( | |
| hidden_states=hidden_states, | |
| attention_mask=attention_mask, | |
| inference_context=inference_context, | |
| rotary_pos_emb=rotary_pos_emb, | |
| sequence_len_offset=sequence_len_offset, | |
| ) | |
| elif isinstance(layer, FalconH1Layer): |
There was a problem hiding this comment.
Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. The second dispatch branch is now elif isinstance(layer, FalconH1Layer), so TransformerLayer instances no longer fall through into the Mamba path.
| @@ -0,0 +1,38 @@ | |||
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |||
There was a problem hiding this comment.
Nit: copyright year is 2025; per project rules it should be 2026.
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| # Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved. |
There was a problem hiding this comment.
Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.
| @@ -0,0 +1,161 @@ | |||
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |||
There was a problem hiding this comment.
Same 2025→2026 copyright year fix needed here.
| # Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved. | |
| # Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved. |
There was a problem hiding this comment.
Fixed in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0.
Code ReviewCritical: Forward dispatch bug in FalconH1Stack.forward() falconh1_block.py:367-380 — The layer dispatch loop uses if/if/else instead of if/elif/else. When a TransformerLayer is encountered (attention-only layers), the first if runs it, then the second if isinstance(layer, FalconH1Layer) is False, so the else (MambaLayer path) fires too — calling the layer twice per forward pass. This will produce incorrect hidden states and double the compute for any hybrid config that includes standalone attention layers. Fix: change the second if to elif. Minor: Copyright years — falcon_h1/init.py and falconh1_provider.py use 2025 copyright headers; should be 2026 per project rules. Test coverage gaps — The unit tests cover bridge registration, config mapping, provider construction, and layer allocation well. Consider adding: (1) FalconH1Config.post_init validation: test that invalid A_init_dist or non-divisible d_inner % mamba_head_dim raise ValueError. (2) FalconH1MLP.forward() multiplier logic: a unit test with a small tensor would catch regressions. (3) FalconH1SelfAttention.get_query_key_value_tensors() key scaling: verify key_multiplier != 1.0 actually scales the key tensor. (4) FalconH1MambaMixer._scale_zxbc_dt(): verify the 5-element SSM multiplier vector is applied correctly. Observation: no-op test — test_falcon_h1_provider.py::test_size_specific_providers_are_not_public_api checks that names like FalconH1ModelProvider500M are absent from all — but these classes do not exist anywhere in the PR. The test passes trivially. Consider removing it. Suggested test cases: No perf tests impacted. |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
Addressed the latest Falcon H1 review summary in f454d169c1f3bd6acfb09f2d6b19a63b3d248c3b0. Changes:
Validation:
Residual blocker:
|
|
/ok to test f454d16 |
Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
|
/ok to test a3d3333 |
Signed-off-by: dhiaEddineRhaiem <dhia.rhaiem@tii.ae> Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
This PR adds support for FalconH1 in MegatronBridge.
FalconH1 is introduced as a new ParallelHybrid layer in the hybrid design of MegatronBridge.
cc @sbhavani
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.