do not scale gradient in bf16 mode #21428
Conversation
The documentation is not available anymore as the PR was closed or merged.
stas00
left a comment
Thank you, Kashif. This has been long overdue!
sgugger
left a comment
Thanks for working on this! I think we can clean up the code a tiny bit more, but this is the crux of the issue.
src/transformers/trainer.py
Outdated
    else:
        self.do_grad_scaling = False
        self.use_cuda_amp = False
        self.amp_dtype = None
Just realized there is this else block here. Clearly self.do_grad_scaling = False is not necessary, but you might need to keep the other two lines somewhere else.
@pacman100 FSDP doesn't handle bfloat16 at all?
Hello @sgugger, similar to DeepSpeed, FSDP also manages its own half-precision; however, for FP16 it needs ShardedGradScaler. Here's an example notebook from the PyTorch team regarding FSDP MixedPrecision: https://github.com/lessw2020/transformer_central/blob/main/mixed_precision/mixed_precision_fsdp.ipynb
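To make the scaler-selection rule above concrete, here is a minimal, framework-free sketch of the decision logic (the function name and string return values are illustrative, not the actual Trainer API): a gradient scaler is only needed for fp16, and under FSDP the sharded variant is required because gradients are sharded across ranks.

```python
from typing import Optional


def pick_grad_scaler(amp_dtype: str, use_fsdp: bool) -> Optional[str]:
    """Illustrative sketch: which scaler class a trainer would use.

    Returns the name of the scaler class, or None when no scaling
    is needed (bf16 has the same exponent range as fp32, so fp16-style
    gradient underflow is not a concern).
    """
    if amp_dtype != "float16":
        # bf16 (or full precision): no gradient scaling at all
        return None
    # fp16 under FSDP needs the sharded scaler; otherwise the plain one
    return "ShardedGradScaler" if use_fsdp else "GradScaler"
```

In the real code the returned names would correspond to torch.cuda.amp.GradScaler and the FSDP ShardedGradScaler; the sketch only captures the branching.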
What does this PR do?
Turn off gradient scaling in the trainer when bf16 mode is selected. Only use gradient scaling in float16 mode.
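The PR's core condition can be sketched as a small predicate (a hedged illustration, not the exact Trainer attributes beyond those quoted in the diff above): scaling is enabled only when AMP is active with float16.

```python
def should_scale_gradients(use_amp: bool, amp_dtype: str) -> bool:
    """Illustrative sketch of the bf16 fix in this PR.

    Gradient scaling exists to prevent fp16 gradient underflow; bf16
    keeps fp32's exponent range, so it never needs a scaler.
    """
    return use_amp and amp_dtype == "float16"
```

With this predicate, bf16 runs proceed with do_grad_scaling effectively False, matching the PR description.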
Who can review?
@sgugger and @stas00