[Llama2] Add disabling TP behavior #728
Merged
younesbelkada merged 3 commits into huggingface:main on Jul 19, 2023

Conversation
pacman100 (Contributor) approved these changes on Jul 19, 2023:
Thank you @younesbelkada for the quick fix, LGTM!
BenjaminBossan (Member) approved these changes on Jul 19, 2023:
It sounds reasonable to me that for fine-tuning, TP is disabled, especially if it is just simulated. I wonder if this should be documented somewhere, since, as you mentioned, it can lead to small numerical differences. Perhaps a comment above these new lines of code?
younesbelkada (Contributor, Author) replied:
Sure yes, will add this!
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request on May 13, 2025:

* add disabling TP behavior
* add comments
* adapt from new changes of transformers PR
cyyever pushed a commit to cyyever/peft that referenced this pull request on Sep 4, 2025 (a sketch of the helper usage follows the commit message):

* update to `prepare_model_for_kbit_training` from the deprecated `prepare_model_for_int8_training` and add `use_gradient_checkpointing=args.gradient_checkpointing` to automatically follow the gradient checkpointing choice; this is also the workaround for huggingface#694
* workaround for the gradient checkpointing issue: calling model.gradient_checkpointing_enable() twice causes issues, so this workaround calls it in prepare_model_for_kbit_training and then sets the arg to false to make sure it isn't called again in the Hugging Face Trainer inner loop; also changes the stack_llama_2 sft trainer to use the correct device map for ddp training so that this issue can be tested
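For context, a minimal sketch of what that commit message describes, under a few assumptions: the `args.gradient_checkpointing` flag, the 4-bit quantized load, and the per-process device map are taken from the example training scripts rather than from this PR, and the exact script wiring may differ.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import prepare_model_for_kbit_training

args = TrainingArguments(output_dir="out", gradient_checkpointing=True)

# Quantized load; k-bit preparation targets models loaded in 8-bit/4-bit.
# The per-process device map mirrors what the commit suggests for DDP training.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map={"": 0},
)

# Replaces the deprecated prepare_model_for_int8_training and enables gradient
# checkpointing here, exactly once.
model = prepare_model_for_kbit_training(
    model, use_gradient_checkpointing=args.gradient_checkpointing
)

# Avoid a second gradient_checkpointing_enable() call inside the Trainer.
args.gradient_checkpointing = False
```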
Fixes #726
This PR is on par with huggingface/transformers#24906
Currently, the TP paradigm supported by transformers is technically not real Tensor Parallelism but rather a simulation of TP: the layers are manually split into chunks and the results are concatenated.
The motivation for that implementation is to mimic the TP paradigm that was used during pre-training of these models, since slicing the weight tensors and the input leads to slight numerical differences: pytorch/pytorch#76232
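To make that numerical point concrete, here is a small self-contained illustration (a toy example, not the transformers code): splitting the reduction dimension of a linear layer into shards and summing the partial outputs changes the floating point accumulation order, so the result only matches the single matmul up to rounding.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 4096)
weight = torch.randn(1024, 4096)  # (out_features, in_features)

full = F.linear(x, weight)

tp = 2  # number of simulated tensor-parallel shards
w_shards = weight.split(weight.shape[1] // tp, dim=1)
x_shards = x.split(x.shape[-1] // tp, dim=-1)
sharded = sum(F.linear(xs, ws) for xs, ws in zip(x_shards, w_shards))

print(torch.equal(full, sharded))    # typically False
print((full - sharded).abs().max())  # tiny but non-zero difference
```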
I would argue that this might not be that important for training, since the model will be fine-tuned and its weights will be adapted accordingly.
cc @pacman100 @BenjaminBossan
I propose to properly support TP in the future once it is properly implemented; the TP that is currently in place is more of a patch to match the logits of the original implementation.
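For anyone who wants the same behavior outside PEFT, a minimal sketch of the equivalent manual workaround; `pretraining_tp` is the LlamaConfig attribute that gates the chunked forward path in transformers, and the exact place where this PR disables it inside PEFT may differ from this user-level sketch.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Disable the simulated TP forward (chunked matmuls) before fine-tuning so that
# adapter training goes through the plain linear layers.
model.config.pretraining_tp = 1

peft_model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))
peft_model.print_trainable_parameters()
```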