[SFTTrainer] Adds NEFTune into SFTTrainer #871
Conversation
```python
# After training we make sure to retrieve back the original forward pass method
# for the embedding layer
if self.neftune_noise_alpha is not None:
    if isinstance(self.model, PreTrainedModel):
        embeddings = self.model.get_input_embeddings()
    elif isinstance(self.model, PeftModel):
        embeddings = self.model.base_model.get_input_embeddings()

    if hasattr(embeddings, "_trl_old_forward"):
        embeddings.forward = embeddings._trl_old_forward
        del embeddings._trl_old_forward
```
Here we make sure to restore the original behaviour after training.
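The save-and-restore pattern in the snippet above can be sketched in plain Python, independent of TRL. All names here (`attach_patch`, `_old_forward`, etc.) are illustrative, mirroring the `_trl_old_forward` trick from this PR:

```python
class Embedding:
    """Stand-in for an embedding module with a forward method."""
    def forward(self, x):
        return x * 2

def attach_patch(module):
    # Save the original forward so it can be restored after training.
    module._old_forward = module.forward
    def noisy_forward(x):
        # Pretend this adds training-time noise to the output.
        return module._old_forward(x) + 1
    module.forward = noisy_forward

def detach_patch(module):
    # Restore the original forward and drop the saved reference.
    if hasattr(module, "_old_forward"):
        module.forward = module._old_forward
        del module._old_forward

emb = Embedding()
attach_patch(emb)
print(emb.forward(3))   # patched behaviour: 7
detach_patch(emb)
print(emb.forward(3))   # original behaviour restored: 6
```

The `hasattr` guard makes `detach_patch` safe to call even if the patch was never applied, which is why the PR checks for `_trl_old_forward` before restoring.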
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
I got this error.
I wonder if the implementation would be cleaner by using a post forward hook for the embedding layer instead of replacing the `forward` method.
@imrankh46 that feature is not merged yet in the TRL main branch; to use it, please run: `pip install -U git+https://github.com/huggingface/trl.git@add-neftune`

@BenjaminBossan, yes, this is possible indeed and I think it is cleaner. However, I found a standalone forward method easier for future users to understand. Would it also hurt existing hooks that accelerate attaches, in case we manipulate forward post hooks?
My thinking was that with a forward hook, you don't need to monkey patch (which can break stuff sometimes) and there is no need to restore the original `forward` afterwards. Existing hooks shouldn't be affected. When registering the hook, you get back a handle for this specific hook, which allows you to remove the hook once you don't need it anymore.
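The hook-with-handle idea can be sketched without depending on torch. The plain-Python classes below mimic the semantics of PyTorch's `Module.register_forward_hook`, which returns a removable handle; this is an illustration of the mechanism, not the actual torch implementation:

```python
class RemovableHandle:
    """Handle returned at registration; removes exactly one hook."""
    def __init__(self, hooks, hook):
        self._hooks, self._hook = hooks, hook
    def remove(self):
        self._hooks.remove(self._hook)

class Module:
    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)
        return RemovableHandle(self._forward_hooks, hook)

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        out = self.forward(x)
        # Hooks run after forward; a non-None return replaces the output.
        for hook in self._forward_hooks:
            result = hook(self, x, out)
            if result is not None:
                out = result
        return out

m = Module()
handle = m.register_forward_hook(lambda mod, inp, out: out + 1)
print(m(3))      # hook active: 7
handle.remove()  # no monkey patching to undo; just drop the hook
print(m(3))      # original behaviour: 6
```

Because `forward` itself is never replaced, cleanup reduces to `handle.remove()`, and other hooks on the same module keep working untouched.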
lewtun left a comment
Thanks for adding this amazing trick to boost SFT performance @younesbelkada 🔥 !
I'll leave the decision about hooks vs monkey patching to @lvwerra but otherwise this looks great to me.
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
I tried to update TRL with NEFTune using your command, but it was unsuccessful.
The branch got deleted after merging. Now you can install from the main branch: `pip install -U git+https://github.com/huggingface/trl.git`
Installation check
* v1 neftune
* docstring
* add doc + fix nit
* add more docs
* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
What does this PR do?
This PR adds NEFTune, a new technique for enhancing supervised fine-tuning results, proposed in https://arxiv.org/abs/2310.05914.
I propose a very simple API: just pass a valid `neftune_noise_alpha` argument when initializing the `SFTTrainer`. To avoid any surprising behaviour, we revert to the original forward method at the end of training. This is handled inside `def train()`, which is a wrapper around `Trainer`'s `train` method. I still need to add a few lines in the documentation.
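For reference, the NEFTune paper samples uniform noise and scales it by `alpha / sqrt(L * d)`, where `L` is the sequence length and `d` the embedding dimension, then adds it to the embeddings during training. A minimal dependency-free sketch of that scaling (illustrative only, not the TRL implementation, which operates on torch tensors):

```python
import math
import random

def neftune_noise(embeddings, alpha):
    """Add NEFTune-style uniform noise, scaled by alpha / sqrt(L * d),
    to a list-of-lists embedding matrix of shape (L, d)."""
    L, d = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(L * d)
    return [
        [x + random.uniform(-scale, scale) for x in row]
        for row in embeddings
    ]

emb = [[0.0] * 8 for _ in range(4)]   # L=4 tokens, d=8 dims
noisy = neftune_noise(emb, alpha=5.0)

# Every perturbation is bounded in magnitude by alpha / sqrt(L * d).
bound = 5.0 / math.sqrt(4 * 8)
assert all(abs(x) <= bound for row in noisy for x in row)
```

The `1 / sqrt(L * d)` factor keeps the expected norm of the added noise roughly constant as sequence length and embedding size grow, so a single `neftune_noise_alpha` value transfers across models.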
Fixes: #870
cc @lvwerra @neelsjain @YuxinWenRick