Skip to content

Fix MoE module_count alignment and prefix inference for Qwen3.5 merge#555

Merged
danielhanchen merged 3 commits into
mainfrom
fix/moe-count-and-prefix-remap
Mar 17, 2026
Merged

Fix MoE module_count alignment and prefix inference for Qwen3.5 merge#555
danielhanchen merged 3 commits into
mainfrom
fix/moe-count-and-prefix-remap

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

Cherry-picked the two targeted fixes from #499 (by @GoldenGrapeGentleman), without the unrelated removals of vocab-resize handling and tied-embedding logic that would introduce regressions.

  • MoE module_count alignment (#3405, #3701): Custom MoE LoRA wrappers (e.g. GPT-OSS expert LoRA) that match the fallback branch have lora_A/lora_B/scaling but no .base_layer child, leaving module_count short. Count entries with module is None after the main loop to suppress the false [Unsloth merge debug] LoRA count mismatch warning. Diagnostic-only; does not affect merge behavior.
  • Prefix inference fallback (fixes #4294): Composite models like Qwen3.5 store safetensors with prefix model.language_model. but runtime LoRA keys use model.. When no _checkpoint_conversion_mapping exists, _infer_prefix_and_remap() detects the discrepancy by suffix-matching a LoRA key against safetensor keys and remaps all LoRA keys. Includes validation that at least one remapped key actually exists in the shard to prevent bad inferences from partial suffix matches.

What was excluded from #499 and why

The original PR also removed vocab-resize handling (_merge_lora shape-mismatch branch, _resized shard-rewrite path) and the tie_word_embeddings lm_head-to-embed_tokens fallback. Those removals would break:

  1. Users who call resize_token_embeddings() then merge (crash on shape mismatch)
  2. Models with tie_word_embeddings=True targeting lm_head in LoRA (silent data corruption -- merge completes but LoRA weights not applied to shared embeddings)

Test plan

Tested on GPU with two models, three conditions each:

Model Baseline (main) With PR Merge Verified
imdatta0/tiny_qwen3_moe_2.8B_0.7B train OK, merge OK train OK, merge OK PASS (96/4704 layers modified)
imdatta0/small_qwen3_5_20b train OK, merge FAILED (LoRAs=280, saved=0) train OK, merge OK PASS (200/11720 layers modified)
  • MoE model: no regression, no mismatch warning
  • Qwen3.5 model: merge now succeeds (was n_saved_modules = 0)
  • Merged weights verified: non-zero diffs on LoRA-targeted layers confirm deltas were applied
  • No changes to vocab-resize or tied-embedding code paths

Co-authored-by: GoldenGrapeGentleman yueyuan@amd.com

…t _checkpoint_conversion_mapping

Two fixes extracted from PR #499 (by @GoldenGrapeGentleman), without the
unrelated removals of vocab-resize and tied-embedding handling.

1. MoE module_count alignment (#3405, #3701): Custom MoE LoRA wrappers
   (e.g. GPT-OSS expert LoRA) that match the fallback branch have
   lora_A/B/scaling but no .base_layer child, leaving module_count short.
   Count entries with module=None after the main loop to suppress the
   false "[Unsloth merge debug] LoRA count mismatch" warning.

2. Prefix inference fallback (fixes #4294): Composite models like Qwen3.5
   store safetensors with prefix "model.language_model." but runtime LoRA
   keys use "model.". When no _checkpoint_conversion_mapping exists, infer
   the prefix by suffix-matching a LoRA key against safetensor keys and
   remap. Includes validation that at least one remapped key exists in the
   shard to prevent bad inferences from partial suffix matches.
@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses two critical issues affecting LoRA merging in specific model architectures. It refines the counting of LoRA modules to correctly handle custom MoE wrappers and introduces an intelligent prefix inference system to align LoRA keys with safetensor keys in composite models, thereby enabling successful merges that previously failed.

Highlights

  • MoE Module Count Alignment: Corrected the module_count for custom MoE LoRA wrappers that lack a .base_layer child, preventing false LoRA count mismatch warnings and aligning statistics.
  • Prefix Inference Fallback: Implemented a mechanism to infer and remap LoRA keys for composite models (like Qwen3.5) where safetensor prefixes differ from runtime keys, ensuring successful merges when no explicit conversion mapping exists.
Changelog
  • unsloth_zoo/saving_utils.py
    • Adjusted module_count logic to correctly account for custom MoE LoRA wrappers that may not have a .base_layer child.
    • Removed a TODO comment related to MoE target parameters.
    • Introduced a new function _infer_prefix_and_remap to dynamically determine and apply prefixes to LoRA keys based on safetensor keys.
    • Integrated the new prefix inference logic into _convert_lora_keys_to_safetensor_format to be used when no explicit checkpoint conversion mapping is available.
Activity
  • The author cherry-picked two targeted fixes from a previous PR (Fix MoE target_parameters module_count alignment (#3405, #3701) #499) by @GoldenGrapeGentleman, specifically excluding unrelated removals to prevent regressions.
  • Extensive GPU testing was performed on two models (imdatta0/tiny_qwen3_moe_2.8B_0.7B and imdatta0/small_qwen3_5_20b) under various conditions.
  • Verification confirmed no regressions for MoE models and successful merges for Qwen3.5 models, which previously failed.
  • Merged weights were verified to have non-zero diffs on LoRA-targeted layers, confirming delta application.
  • It was confirmed that no changes were made to vocab-resize or tied-embedding code paths, as intended.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Comment thread unsloth_zoo/saving_utils.py Fixed

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two targeted fixes. First, it corrects the module_count alignment for MoE models to prevent spurious mismatch warnings during merging. This is achieved by correctly accounting for custom LoRA wrappers. Second, it adds a prefix inference mechanism to handle key discrepancies in composite models like Qwen3.5, ensuring successful merges. The changes are well-implemented and include necessary validation. I've provided a couple of minor suggestions to improve code conciseness and readability in the new _infer_prefix_and_remap function.

Comment thread unsloth_zoo/saving_utils.py Outdated
Comment on lines +2681 to +2687
found_any = False
for k in lora_weights:
if isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set:
found_any = True
break
if not found_any:
return None

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This validation loop can be made more concise and Pythonic by using an any() with a generator expression. This improves readability without changing the logic.

Suggested change
found_any = False
for k in lora_weights:
if isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set:
found_any = True
break
if not found_any:
return None
if not any(
isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set
for k in lora_weights
):
return None

Comment thread unsloth_zoo/saving_utils.py Outdated
Comment on lines +2689 to +2693
remapped = defaultdict(lora_weights.default_factory)
for k, v in lora_weights.items():
new_key = inferred_prefix + k if isinstance(k, str) else k
remapped[new_key] = v
return remapped

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The remapping of keys can be expressed more concisely using a dictionary comprehension passed to the defaultdict constructor. This is a more Pythonic way to build the new dictionary.

Suggested change
remapped = defaultdict(lora_weights.default_factory)
for k, v in lora_weights.items():
new_key = inferred_prefix + k if isinstance(k, str) else k
remapped[new_key] = v
return remapped
return defaultdict(
lora_weights.default_factory,
{
(inferred_prefix + k if isinstance(k, str) else k): v
for k, v in lora_weights.items()
},
)

Address review suggestions: replace explicit loop with any() generator
expression for validation, and use dict comprehension for remapping.
Remove trailing pass statement.
Replace the global single-prefix rewrite with per-key matching:
- Keys that already match a safetensor key are preserved as-is
- Keys with a single unambiguous suffix match are remapped individually
- Keys that cannot be suffix-matched (e.g. MoE fused expert params
  with different naming) inherit the most common inferred prefix from
  successfully matched keys

This handles mixed-prefix shards correctly and avoids double-prefixing
keys that are already in the right format, while still supporting MoE
expert parameters that use different naming conventions.
@danielhanchen danielhanchen merged commit e5f2095 into main Mar 17, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants