Fix MoE module_count alignment and prefix inference for Qwen3.5 merge#555
Conversation
…t _checkpoint_conversion_mapping Two fixes extracted from PR #499 (by @GoldenGrapeGentleman), without the unrelated removals of vocab-resize and tied-embedding handling. 1. MoE module_count alignment (#3405, #3701): Custom MoE LoRA wrappers (e.g. GPT-OSS expert LoRA) that match the fallback branch have lora_A/B/scaling but no .base_layer child, leaving module_count short. Count entries with module=None after the main loop to suppress the false "[Unsloth merge debug] LoRA count mismatch" warning. 2. Prefix inference fallback (fixes #4294): Composite models like Qwen3.5 store safetensors with prefix "model.language_model." but runtime LoRA keys use "model.". When no _checkpoint_conversion_mapping exists, infer the prefix by suffix-matching a LoRA key against safetensor keys and remap. Includes validation that at least one remapped key exists in the shard to prevent bad inferences from partial suffix matches.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses two critical issues affecting LoRA merging in specific model architectures. It refines the counting of LoRA modules to correctly handle custom MoE wrappers and introduces an intelligent prefix inference system to align LoRA keys with safetensor keys in composite models, thereby enabling successful merges that previously failed. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces two targeted fixes. First, it corrects the module_count alignment for MoE models to prevent spurious mismatch warnings during merging. This is achieved by correctly accounting for custom LoRA wrappers. Second, it adds a prefix inference mechanism to handle key discrepancies in composite models like Qwen3.5, ensuring successful merges. The changes are well-implemented and include necessary validation. I've provided a couple of minor suggestions to improve code conciseness and readability in the new _infer_prefix_and_remap function.
| found_any = False | ||
| for k in lora_weights: | ||
| if isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set: | ||
| found_any = True | ||
| break | ||
| if not found_any: | ||
| return None |
There was a problem hiding this comment.
This validation loop can be made more concise and Pythonic by using an any() with a generator expression. This improves readability without changing the logic.
| found_any = False | |
| for k in lora_weights: | |
| if isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set: | |
| found_any = True | |
| break | |
| if not found_any: | |
| return None | |
| if not any( | |
| isinstance(k, str) and (inferred_prefix + k + ".weight") in sf_key_set | |
| for k in lora_weights | |
| ): | |
| return None |
| remapped = defaultdict(lora_weights.default_factory) | ||
| for k, v in lora_weights.items(): | ||
| new_key = inferred_prefix + k if isinstance(k, str) else k | ||
| remapped[new_key] = v | ||
| return remapped |
There was a problem hiding this comment.
The remapping of keys can be expressed more concisely using a dictionary comprehension passed to the defaultdict constructor. This is a more Pythonic way to build the new dictionary.
| remapped = defaultdict(lora_weights.default_factory) | |
| for k, v in lora_weights.items(): | |
| new_key = inferred_prefix + k if isinstance(k, str) else k | |
| remapped[new_key] = v | |
| return remapped | |
| return defaultdict( | |
| lora_weights.default_factory, | |
| { | |
| (inferred_prefix + k if isinstance(k, str) else k): v | |
| for k, v in lora_weights.items() | |
| }, | |
| ) |
Address review suggestions: replace explicit loop with any() generator expression for validation, and use dict comprehension for remapping. Remove trailing pass statement.
Replace the global single-prefix rewrite with per-key matching: - Keys that already match a safetensor key are preserved as-is - Keys with a single unambiguous suffix match are remapped individually - Keys that cannot be suffix-matched (e.g. MoE fused expert params with different naming) inherit the most common inferred prefix from successfully matched keys This handles mixed-prefix shards correctly and avoids double-prefixing keys that are already in the right format, while still supporting MoE expert parameters that use different naming conventions.
Summary
Cherry-picked the two targeted fixes from #499 (by @GoldenGrapeGentleman), without the unrelated removals of vocab-resize handling and tied-embedding logic that would introduce regressions.
lora_A/lora_B/scalingbut no.base_layerchild, leavingmodule_countshort. Count entries withmodule is Noneafter the main loop to suppress the false[Unsloth merge debug] LoRA count mismatchwarning. Diagnostic-only; does not affect merge behavior.model.language_model.but runtime LoRA keys usemodel.. When no_checkpoint_conversion_mappingexists,_infer_prefix_and_remap()detects the discrepancy by suffix-matching a LoRA key against safetensor keys and remaps all LoRA keys. Includes validation that at least one remapped key actually exists in the shard to prevent bad inferences from partial suffix matches.What was excluded from #499 and why
The original PR also removed vocab-resize handling (
_merge_lorashape-mismatch branch,_resizedshard-rewrite path) and thetie_word_embeddingslm_head-to-embed_tokens fallback. Those removals would break:resize_token_embeddings()then merge (crash on shape mismatch)tie_word_embeddings=Truetargetinglm_headin LoRA (silent data corruption -- merge completes but LoRA weights not applied to shared embeddings)Test plan
Tested on GPU with two models, three conditions each:
imdatta0/tiny_qwen3_moe_2.8B_0.7Bimdatta0/small_qwen3_5_20bLoRAs=280, saved=0)n_saved_modules = 0)Co-authored-by: GoldenGrapeGentleman yueyuan@amd.com