[BUG] bf16 incorrectly configured in src/transformers/deepspeed.py #16596

@michaelroyzen

Description

Environment info

  • transformers version: 4.17.0
  • Platform: Ubuntu
  • Python version: 3.8
  • PyTorch version (GPU?): 8x A10
  • Tensorflow version (GPU?):
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?:

Who can help

@stas00

Models: T5

Information

I am trying to fine-tune T5 with the Hugging Face Trainer in bf16, using its built-in DeepSpeed integration. Although I set bf16=True in TrainingArguments and "bf16": { "enabled": true } in the DeepSpeed config, the flag makes no difference in GPU memory usage or training speed. Digging in, I found a typo in src/transformers/deepspeed.py at line 253:

```python
if self.is_true("bfoat16.enabled"):
    self._dtype = torch.bfloat16
```

Instead of bfoat16.enabled, the key should be bfloat16.enabled. But even the corrected spelling is outdated: the latest DeepSpeed docs name the config section bf16, not bfloat16, so the check never matches the recommended config either way.
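To illustrate, here is a minimal, self-contained sketch of the dtype detection (detect_dtype and its inline is_true helper are hypothetical stand-ins for the logic in transformers' DeepSpeed config wrapper; the real code sets torch dtypes, which are represented as strings here to keep the sketch dependency-free). It accepts both the current bf16 section name and the legacy bfloat16 spelling:

```python
def detect_dtype(config: dict) -> str:
    """Pick the training dtype from a DeepSpeed config dict.

    Hypothetical helper illustrating the fix: the original code only
    checked the misspelled "bfoat16" key, so bf16 configs were silently
    ignored and training fell back to fp32.
    """
    def is_true(path: str) -> bool:
        # Resolve a dotted "section.key" path in the config dict.
        section, key = path.split(".")
        return bool(config.get(section, {}).get(key, False))

    if is_true("fp16.enabled"):
        return "float16"
    # Accept both the current "bf16" and the legacy "bfloat16" section names.
    if is_true("bf16.enabled") or is_true("bfloat16.enabled"):
        return "bfloat16"
    return "float32"


# With the config from this report, bf16 is now detected:
print(detect_dtype({"bf16": {"enabled": True}}))       # bfloat16
# The misspelled key from the buggy code matches nothing:
print(detect_dtype({"bfoat16": {"enabled": True}}))    # float32
```

With the current code, the second case is effectively what happens for every user: no real config contains a "bfoat16" section, so the bf16 branch is dead code.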
