FIX: weight tying for LoRA embeddings and lm_head by oKatanaaa · Pull Request #3711 · unslothai/unsloth

oKatanaaa · 2025-12-11T03:31:27Z

Issue

Training of models with tied weights (such as Qwen3-4B) is broken when training token embeddings.

When using LoRA with modules_to_save=["embed_tokens","lm_head"] and ensure_weight_tying=True (param from PEFT config), Unsloth’s offload/duplication of embeddings and lm_head breaks the shared storage, so the trainable copies diverge and PEFT’s tying isn’t applied. This leaves embed_tokens and lm_head untied during training/merging.
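A minimal sketch of the failure mode, using plain PyTorch rather than Unsloth or PEFT internals: two tied modules stop sharing storage as soon as each one is duplicated on its own. The toy sizes and the use of `copy.deepcopy` as the duplication step are assumptions for illustration only.

```python
import copy

import torch
import torch.nn as nn

vocab_size, hidden_size = 16, 4  # toy sizes, chosen only for illustration

embed_tokens = nn.Embedding(vocab_size, hidden_size)
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
lm_head.weight = embed_tokens.weight  # tie: both modules hold the same Parameter

tied_before = embed_tokens.weight.data_ptr() == lm_head.weight.data_ptr()

# Duplicating each module independently gives every copy its own storage.
embed_copy = copy.deepcopy(embed_tokens)
head_copy = copy.deepcopy(lm_head)
tied_after = embed_copy.weight.data_ptr() == head_copy.weight.data_ptr()

# An update to one copy is no longer visible through the other.
with torch.no_grad():
    embed_copy.weight.add_(1.0)
diverged = not torch.equal(embed_copy.weight, head_copy.weight)

print(tied_before, tied_after, diverged)
```

Once the copies have separate storage, an optimizer step on one of them leaves the other unchanged, which matches the divergence described above.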

Summary

Add ensure_weight_tying to FastLanguageModel.get_peft_model and forward it into the LoRA config.
Re-tie embed_tokens and lm_head trainable copies after PEFT wrapping so Unsloth’s offload/rebuild path keeps them sharing storage.
Leave vision.py compatible with ensure_weight_tying via LoraConfig passthrough.

for more information, see https://pre-commit.ci

gemini-code-assist · 2025-12-11T03:31:42Z

Summary of Changes

Hello @oKatanaaa, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where weight tying between token embeddings and the language model head was being disrupted when using LoRA with Unsloth's offloading mechanism, particularly for models like Qwen3-4B. The changes introduce a new configuration option and explicit re-tying logic to preserve the shared storage of these weights, preventing their divergence during training and ensuring the integrity of models that rely on this architectural constraint.

Highlights

New ensure_weight_tying Parameter: Introduced a new ensure_weight_tying parameter to the FastLanguageModel.get_peft_model function in both llama.py and vision.py, allowing explicit control over weight tying behavior during LoRA fine-tuning.
Re-tying Logic for Embeddings and LM Head: Implemented a mechanism to re-tie embed_tokens (input embeddings) and lm_head (output embeddings) after PEFT wrapping. This ensures that Unsloth's offloading and duplication processes do not inadvertently break the shared storage required for tied weights, which was causing trainable copies to diverge.
Vision Model Compatibility: Ensured that the vision.py module is compatible with the new ensure_weight_tying parameter by passing it through to the underlying LoRA configuration, maintaining consistent behavior across different model types.

gemini-code-assist

Code Review

This pull request introduces a fix for a weight tying issue with LoRA embeddings and the language model head, which occurs when ensure_weight_tying=True. The changes correctly propagate this flag through get_peft_model and add logic to manually re-tie the weights after the PEFT model is created, addressing the problem of weights diverging due to Unsloth's offloading mechanism. The implementation is sound and effectively resolves the described issue. I have one minor suggestion to enhance logging within an exception handler, which will improve debuggability should any unexpected errors arise.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

chatgpt-codex-connector · 2025-12-11T03:34:53Z

            loftq_config = loftq_config,
            use_rslora = use_rslora,
            modules_to_save = modules_to_save,
+            ensure_weight_tying = ensure_weight_tying,


Guard ensure_weight_tying for older PEFT versions

The new arguments dict always injects ensure_weight_tying and then passes it straight into LoraConfig(**arguments) later in get_peft_model. Our dependency range in pyproject.toml still allows PEFT 0.7.x, whose LoraConfig signature does not accept this keyword; with such a supported version, this call will raise TypeError: __init__() got an unexpected keyword argument 'ensure_weight_tying' even if callers leave the default False. Either filter this argument based on inspect.signature(LoraConfig) (as vision.py already does) or bump the minimum PEFT version to one that supports the parameter.
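The signature-based filtering suggested above could look roughly like the sketch below. `LegacyLoraConfig` is a hypothetical stand-in for a config class that predates the new keyword, and `filter_supported_kwargs` is an illustrative helper name; neither is taken from PEFT or Unsloth.

```python
import inspect


class LegacyLoraConfig:
    """Hypothetical stand-in for a config class that predates ensure_weight_tying."""

    def __init__(self, r=8, modules_to_save=None):
        self.r = r
        self.modules_to_save = modules_to_save


def filter_supported_kwargs(config_cls, arguments):
    """Keep only the keyword arguments that config_cls.__init__ accepts."""
    params = inspect.signature(config_cls.__init__).parameters
    # A **kwargs parameter means the class accepts anything, so pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(arguments)
    return {k: v for k, v in arguments.items() if k in params}


arguments = {
    "r": 16,
    "modules_to_save": ["embed_tokens", "lm_head"],
    "ensure_weight_tying": True,
}
supported = filter_supported_kwargs(LegacyLoraConfig, arguments)
config = LegacyLoraConfig(**supported)  # no TypeError: the unknown key was dropped
print(sorted(supported))
```

One trade-off of this approach is that the flag is dropped silently on older versions, so a caller who asked for tying would not get it; raising the minimum supported version avoids that ambiguity.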

I think bumping the version is the way here to avoid complicating logic further

Datta0 · 2025-12-11T10:51:36Z


        model = FastLlamaModel.patch_peft_model(model, use_gradient_checkpointing)

+        if ensure_weight_tying:


Can you please explain the need for such an argument?
Can't we infer that from the config itself?

Given Unsloth's spirit, having it inferred automatically would be nice. But I'm not sure if all models have the tie_word_embeddings parameter in their config. Looking at Gemma3 models, their config does not declare this parameter, but ALL Gemma3 models actually have tied word embeddings.
EDIT: Gemma3 has tie_word_embeddings in its config, it's just not exposed on huggingface model pages. I personally would still have it as an explicit parameter (or by checking if it is present in kwargs), since tie_word_embeddings only tells us that the base model ships tied weights, not that the user actually wants PEFT to re-alias the trainable modules_to_save copies.

So I think it is safer to have an explicit argument.

I guess we could check weights' pointers to see if they are equal and then raise a warning to the user (not enforce tying though, as there might be cases when you want to untie weights).
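The pointer check mentioned here might be sketched as follows. The helper names `weights_share_storage` and `warn_if_untied` are hypothetical, and a `SimpleNamespace` stands in for a real model config; the sketch only warns and never re-ties.

```python
import warnings
from types import SimpleNamespace

import torch.nn as nn


def weights_share_storage(a: nn.Module, b: nn.Module) -> bool:
    """True when both modules' weight tensors start at the same memory address."""
    return a.weight.data_ptr() == b.weight.data_ptr()


def warn_if_untied(config, embed: nn.Module, head: nn.Module) -> bool:
    """Warn, without re-tying, when the config declares tied weights but storage differs."""
    wants_tying = bool(getattr(config, "tie_word_embeddings", False))
    untied = wants_tying and not weights_share_storage(embed, head)
    if untied:
        warnings.warn(
            "tie_word_embeddings is set, but embed_tokens and lm_head do not share storage."
        )
    return untied


config = SimpleNamespace(tie_word_embeddings=True)  # stand-in for a model config
embed = nn.Embedding(16, 4)
head = nn.Linear(4, 16, bias=False)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    untied_result = warn_if_untied(config, embed, head)  # separate storage: warns
    head.weight = embed.weight                           # tie the two modules
    tied_result = warn_if_untied(config, embed, head)    # shared storage: silent

print(untied_result, tied_result, len(caught))
```

Leaving the decision to the user keeps the case where weights are intentionally untied working, as noted above.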

Datta0 · 2025-12-11T10:56:22Z

+                            target_module._parameters.pop("weight")
+                        if hasattr(target_module, "weight"):
+                            try:
+                                delattr(target_module, "weight")


Can you elaborate on when each of these cases happens?
target_module.weight vs target_module._parameters.weight

The `if "weight" in getattr(target_module, "_parameters", {}):` branch applies to PEFT’s trainable copies (modules_to_save.default for embed_tokens/lm_head) and to non-offloaded originals. They still have a proper nn.Parameter registered under "weight", so we have to pop it before re-registering the shared one to avoid a name collision error.

The `if hasattr(target_module, "weight")` branch applies to Unsloth’s offload/rebuild of lm_head. In offload_output_embeddings, Unsloth deletes the registered parameter (del new_output_embeddings.weight) and then assigns a plain tensor back (new_output_embeddings.weight = offloaded_W). That leaves a weight attribute that is not registered, so it must be deleted first or it shadows the new registration.

Since we retie both the saved modules ("default" in the wrappers) and the original ones (for merge consistency and to avoid any other potential issues), both guards are needed.
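The two guards can be illustrated on toy modules. This is a sketch under stated assumptions, not the PR's actual code: `share_weight` is a hypothetical helper, and the "rebuilt" module is emulated by deleting the parameter and assigning a plain tensor, following the description above.

```python
import torch
import torch.nn as nn


def share_weight(source: nn.Module, target: nn.Module) -> None:
    """Make target.weight the same nn.Parameter object as source.weight."""
    # Case 1: a Parameter is still registered under "weight"; remove it first.
    if "weight" in getattr(target, "_parameters", {}):
        target._parameters.pop("weight")
    # Case 2: "weight" survives as a plain, unregistered tensor attribute.
    if hasattr(target, "weight"):
        delattr(target, "weight")
    target.register_parameter("weight", source.weight)


embed = nn.Embedding(16, 4)

# Case 1: an ordinary module with its own registered Parameter.
head_registered = nn.Linear(4, 16, bias=False)
share_weight(embed, head_registered)

# Case 2: emulate a rebuilt module whose Parameter was deleted
# and replaced by a plain tensor attribute.
head_plain = nn.Linear(4, 16, bias=False)
del head_plain.weight
head_plain.weight = torch.zeros(16, 4)
share_weight(embed, head_plain)

print(head_registered.weight is embed.weight, head_plain.weight is embed.weight)
```

In the second case the leftover attribute lives outside `_parameters`, so only the `hasattr` guard catches it; an unregistered attribute of the same name would otherwise conflict with the new registration.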

oKatanaaa · 2025-12-12T23:45:22Z

Here is a simple script to reproduce the issue btw:

import torch
from unsloth import FastLanguageModel

# Assumes unsloth without the post-PEFT retie fix. ensure_weight_tying is passed via kwargs.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-4B-Base-unsloth-bnb-4bit",
    max_seq_length=256,
    dtype=None,
    load_in_4bit=True,
)

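def ptr(mod):
    # Address of the module's weight storage; None when the module is absent.
    return None if mod is None else mod.weight.data_ptr()

def check(stage):
    in_emb = model.get_input_embeddings()
    out_emb = model.get_output_embeddings()
    # PEFT keeps the trainable copies under modules_to_save["default"]; absent before wrapping.
    in_def = getattr(getattr(in_emb, "modules_to_save", {}), "default", None)
    out_def = getattr(getattr(out_emb, "modules_to_save", {}), "default", None)
    # Report "n/a" instead of a misleading True when there are no trainable copies yet.
    copies = "n/a" if in_def is None or out_def is None else ptr(in_def) == ptr(out_def)
    print(f"{stage}: base tied? {ptr(in_emb)==ptr(out_emb)}, "
          f"default tied? {copies}")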

check("before PEFT")

model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    target_modules=["q_proj","k_proj","v_proj","o_proj"],
    modules_to_save=["embed_tokens","lm_head"],
    ensure_weight_tying=True,              # flag is set, but no fix present
    use_gradient_checkpointing="unsloth",  # triggers offload
    max_seq_length=256,
)

check("after PEFT (expect untied without fix)")

Datta0 · 2025-12-16T03:51:19Z

hey @oKatanaaa can you please tell me the motivation to tie the embedding and lm_head weights for a model that didn't have it to begin with?

oKatanaaa · 2025-12-16T07:47:07Z

> hey @oKatanaaa can you please tell me the motivation to tie the embedding and lm_head weights for a model that didn't have it to begin with?

I'm not sure. This PR is not enabling this functionality though. Even if ensure_weight_tying=True, PEFT won't tie weights if the model's embeddings are untied initially (huggingface/peft#2864).

Some advanced users could enforce weight tying to intentionally reduce model size (word embeddings usually take ~13% of total model size, so tying would reduce it by ~6%, which could be significant).
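One way to read the ~13% / ~6% estimate is as simple parameter arithmetic, sketched below. The reading that embed_tokens and lm_head together make up the 13% is an assumption, and the vocabulary size, hidden size, and total parameter count are hypothetical numbers, not taken from any specific model.

```python
def tying_savings(vocab_size: int, hidden_size: int, total_params: int) -> tuple[float, float]:
    """Return (share held by embed_tokens + lm_head, share saved by tying) for an untied model."""
    matrix_params = vocab_size * hidden_size       # one vocab x hidden matrix
    both_share = 2 * matrix_params / total_params  # untied: two such matrices
    saved_share = matrix_params / total_params     # tying removes one of them
    return both_share, saved_share


# Hypothetical numbers for illustration only.
both, saved = tying_savings(
    vocab_size=130_000,
    hidden_size=4_000,
    total_params=8_000_000_000,
)
print(f"{both:.1%} in embeddings, {saved:.1%} saved by tying")
```

The saving is always half of the combined embedding share, so it matters most for small models with large vocabularies.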

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

danielhanchen · 2026-01-01T12:54:53Z

Thanks for the PR and Happy New Year!

I tested this on a B200 with the following configurations:

Test Results:

| Test | Status |
| --- | --- |
| Qwen3 4-bit training + inference | [PASS] |
| Qwen3 8-bit training + inference | [PASS] |
| Qwen3 16-bit training + inference | [PASS] |
| Qwen3-MoE 4-bit training + inference | [PASS] |
| Vision (Qwen2-VL) load + LoRA + inference | [PASS] |
| ensure_weight_tying=True with modules_to_save | [PASS] |

Weight tying verification:

Before PEFT: base embeddings tied = True
After PEFT with ensure_weight_tying=True: base tied = True, trainable copies tied = True
After training: weights remain tied = True

The fix correctly re-ties the trainable copies after PEFT wrapping, which preserves weight sharing during Unsloth's offload/rebuild path.

I also added a commit with a TODO comment for vision.py since the parameter is added but not yet implemented for vision models.

Sidenote: This PR was reviewed automatically by the Unsloth Code Review Bot.

danielhanchen · 2026-01-01T12:59:01Z

Thanks @oKatanaaa again :) Was trying out our new auto review system as well!

oKatanaaa and others added 2 commits December 11, 2025 03:21

fix: weights tying

e6f9c41

[pre-commit.ci] auto fixes from pre-commit.com hooks

cd0ca56

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed Dec 11, 2025

Comment thread unsloth/models/llama.py Outdated

chatgpt-codex-connector Bot reviewed Dec 11, 2025

Datta0 reviewed Dec 11, 2025

oKatanaaa requested a review from Datta0 December 12, 2025 22:12

fix: add a log instead of silent exception

4d2fe69

Add TODO comment for ensure_weight_tying in vision models

9176dd3

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

danielhanchen merged commit cf4342b into unslothai:main Jan 1, 2026
1 check passed

danielhanchen mentioned this pull request Jan 1, 2026

Add target_parameters support for MoE models and fix trainer bugs #3708

Closed

marcandrelarochelle mentioned this pull request Feb 24, 2026

[Bug] lm_head is not trained using LoRA and merging is broken #4098

Closed

abiswas-realadvice pushed a commit to abiswas-realadvice/unsloth that referenced this pull request May 14, 2026

Merge pull request unslothai#3711 from oKatanaaa/ensure-weight-tying

12004df
