Fix MoE module_count alignment and prefix inference for Qwen3.5 merge by danielhanchen · Pull Request #555 · unslothai/unsloth-zoo

danielhanchen · 2026-03-17T10:49:21Z

Summary

Cherry-picked the two targeted fixes from #499 (by @GoldenGrapeGentleman), without the unrelated removals of vocab-resize handling and tied-embedding logic that would introduce regressions.

MoE module_count alignment (#3405, #3701): Custom MoE LoRA wrappers (e.g. GPT-OSS expert LoRA) that match the fallback branch have lora_A/lora_B/scaling but no .base_layer child, leaving module_count short. Count entries with module is None after the main loop to suppress the false [Unsloth merge debug] LoRA count mismatch warning. Diagnostic-only; does not affect merge behavior.
Prefix inference fallback (fixes #4294): Composite models like Qwen3.5 store safetensors with prefix model.language_model. but runtime LoRA keys use model.. When no _checkpoint_conversion_mapping exists, _infer_prefix_and_remap() detects the discrepancy by suffix-matching a LoRA key against safetensor keys and remaps all LoRA keys. Includes validation that at least one remapped key actually exists in the shard to prevent bad inferences from partial suffix matches.

What was excluded from #499 and why

The original PR also removed vocab-resize handling (_merge_lora shape-mismatch branch, _resized shard-rewrite path) and the tie_word_embeddings lm_head-to-embed_tokens fallback. Those removals would break:

Users who call resize_token_embeddings() then merge (crash on shape mismatch)
Models with tie_word_embeddings=True targeting lm_head in LoRA (silent data corruption -- merge completes but LoRA weights not applied to shared embeddings)

Test plan

Tested on GPU with two models, three conditions each:

Model	Baseline (main)	With PR	Merge Verified
`imdatta0/tiny_qwen3_moe_2.8B_0.7B`	train OK, merge OK	train OK, merge OK	PASS (96/4704 layers modified)
`imdatta0/small_qwen3_5_20b`	train OK, merge FAILED (`LoRAs=280, saved=0`)	train OK, merge OK	PASS (200/11720 layers modified)

MoE model: no regression, no mismatch warning
Qwen3.5 model: merge now succeeds (was n_saved_modules = 0)
Merged weights verified: non-zero diffs on LoRA-targeted layers confirm deltas were applied
No changes to vocab-resize or tied-embedding code paths

Co-authored-by: GoldenGrapeGentleman yueyuan@amd.com

@GoldenGrapeGentleman

…t _checkpoint_conversion_mapping Two fixes extracted from PR #499 (by @GoldenGrapeGentleman), without the unrelated removals of vocab-resize and tied-embedding handling. 1. MoE module_count alignment (#3405, #3701): Custom MoE LoRA wrappers (e.g. GPT-OSS expert LoRA) that match the fallback branch have lora_A/B/scaling but no .base_layer child, leaving module_count short. Count entries with module=None after the main loop to suppress the false "[Unsloth merge debug] LoRA count mismatch" warning. 2. Prefix inference fallback (fixes #4294): Composite models like Qwen3.5 store safetensors with prefix "model.language_model." but runtime LoRA keys use "model.". When no _checkpoint_conversion_mapping exists, infer the prefix by suffix-matching a LoRA key against safetensor keys and remap. Includes validation that at least one remapped key exists in the shard to prevent bad inferences from partial suffix matches.

chatgpt-codex-connector · 2026-03-17T10:49:28Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

gemini-code-assist · 2026-03-17T10:49:36Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses two critical issues affecting LoRA merging in specific model architectures. It refines the counting of LoRA modules to correctly handle custom MoE wrappers and introduces an intelligent prefix inference system to align LoRA keys with safetensor keys in composite models, thereby enabling successful merges that previously failed.

Highlights

MoE Module Count Alignment: Corrected the module_count for custom MoE LoRA wrappers that lack a .base_layer child, preventing false LoRA count mismatch warnings and aligning statistics.
Prefix Inference Fallback: Implemented a mechanism to infer and remap LoRA keys for composite models (like Qwen3.5) where safetensor prefixes differ from runtime keys, ensuring successful merges when no explicit conversion mapping exists.

Changelog

unsloth_zoo/saving_utils.py
- Adjusted module_count logic to correctly account for custom MoE LoRA wrappers that may not have a .base_layer child.
- Removed a TODO comment related to MoE target parameters.
- Introduced a new function _infer_prefix_and_remap to dynamically determine and apply prefixes to LoRA keys based on safetensor keys.
- Integrated the new prefix inference logic into _convert_lora_keys_to_safetensor_format to be used when no explicit checkpoint conversion mapping is available.

Activity

The author cherry-picked two targeted fixes from a previous PR (Fix MoE target_parameters module_count alignment (#3405, #3701) #499) by @GoldenGrapeGentleman, specifically excluding unrelated removals to prevent regressions.
Extensive GPU testing was performed on two models (imdatta0/tiny_qwen3_moe_2.8B_0.7B and imdatta0/small_qwen3_5_20b) under various conditions.
Verification confirmed no regressions for MoE models and successful merges for Qwen3.5 models, which previously failed.
Merged weights were verified to have non-zero diffs on LoRA-targeted layers, confirming delta application.
It was confirmed that no changes were made to vocab-resize or tied-embedding code paths, as intended.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces two targeted fixes. First, it corrects the module_count alignment for MoE models to prevent spurious mismatch warnings during merging. This is achieved by correctly accounting for custom LoRA wrappers. Second, it adds a prefix inference mechanism to handle key discrepancies in composite models like Qwen3.5, ensuring successful merges. The changes are well-implemented and include necessary validation. I've provided a couple of minor suggestions to improve code conciseness and readability in the new _infer_prefix_and_remap function.

gemini-code-assist · 2026-03-17T10:52:16Z

+    found_any = False
+    for k in lora_weights:
+        if isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set:
+            found_any = True
+            break
+    if not found_any:
+        return None


This validation loop can be made more concise and Pythonic by using an any() with a generator expression. This improves readability without changing the logic.

Suggested change

found_any = False

for k in lora_weights:

if isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set:

found_any = True

break

if not found_any:

return None

if not any(

isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set

for k in lora_weights

):

return None

gemini-code-assist · 2026-03-17T10:52:16Z

+    remapped = defaultdict(lora_weights.default_factory)
+    for k, v in lora_weights.items():
+        new_key = inferred_prefix + k if isinstance(k, str) else k
+        remapped[new_key] = v
+    return remapped


The remapping of keys can be expressed more concisely using a dictionary comprehension passed to the defaultdict constructor. This is a more Pythonic way to build the new dictionary.

Suggested change

remapped = defaultdict(lora_weights.default_factory)

for k, v in lora_weights.items():

new_key = inferred_prefix + k if isinstance(k, str) else k

remapped[new_key] = v

return remapped

return defaultdict(

lora_weights.default_factory,

{

(inferred_prefix + k if isinstance(k, str) else k): v

for k, v in lora_weights.items()

},

)

Address review suggestions: replace explicit loop with any() generator expression for validation, and use dict comprehension for remapping. Remove trailing pass statement.

Replace the global single-prefix rewrite with per-key matching: - Keys that already match a safetensor key are preserved as-is - Keys with a single unambiguous suffix match are remapped individually - Keys that cannot be suffix-matched (e.g. MoE fused expert params with different naming) inherit the most common inferred prefix from successfully matched keys This handles mixed-prefix shards correctly and avoids double-prefixing keys that are already in the right format, while still supporting MoE expert parameters that use different naming conventions.

github-code-quality Bot found potential problems Mar 17, 2026

View reviewed changes

Comment thread unsloth_zoo/saving_utils.py Fixed

gemini-code-assist Bot reviewed Mar 17, 2026

View reviewed changes

GoldenGrapeGentleman added 2 commits March 17, 2026 12:44

Style: use any() and dict comprehension in _infer_prefix_and_remap

f52bc8b

Address review suggestions: replace explicit loop with any() generator expression for validation, and use dict comprehension for remapping. Remove trailing pass statement.

danielhanchen merged commit e5f2095 into main Mar 17, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix MoE module_count alignment and prefix inference for Qwen3.5 merge#555

Fix MoE module_count alignment and prefix inference for Qwen3.5 merge#555
danielhanchen merged 3 commits into
mainfrom
fix/moe-count-and-prefix-remap

danielhanchen commented Mar 17, 2026

Uh oh!

chatgpt-codex-connector Bot commented Mar 17, 2026

Uh oh!

gemini-code-assist Bot commented Mar 17, 2026

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Mar 17, 2026

Uh oh!

gemini-code-assist Bot Mar 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danielhanchen commented Mar 17, 2026

Summary

What was excluded from #499 and why

Test plan

Uh oh!

chatgpt-codex-connector Bot commented Mar 17, 2026

Uh oh!

gemini-code-assist Bot commented Mar 17, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Mar 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants