Description
Environment info
- transformers version: 4.17.0
- Platform: Ubuntu
- Python version: 3.8
- PyTorch version (GPU?): 8x A10
- Tensorflow version (GPU?):
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?:
Who can help
Models: T5
Information
I am trying to fine-tune T5 with the Hugging Face Trainer in bf16 using its built-in DeepSpeed integration. Although I added `bf16=True` to the `TrainingArguments` and `"bf16": { "enabled": true }` to the DeepSpeed config, the flag makes no difference in GPU memory usage or training speed. Digging in, I found a typo in `src/transformers/deepspeed.py` at line 253:
```python
if self.is_true("bfoat16.enabled"):
    self._dtype = torch.bfloat16
```
Instead of `bfoat16.enabled`, it should be `bfloat16.enabled`. Even that spelling is outdated, though: the latest DeepSpeed docs say the config key should be `bf16`, not `bfloat16`.
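A minimal, self-contained sketch of what a corrected check could look like. The `is_true` helper and the dtype-name strings below are illustrative stand-ins for the real `HfDeepSpeedConfig` method and the torch dtypes; accepting both the new `bf16` key and the legacy `bfloat16` key is my assumption about a reasonable fix, not the actual patch:

```python
def is_true(config, ds_key_long):
    """Walk a dotted key path through a nested config dict; return True
    only if the leaf exists and is truthy (stand-in for the real helper)."""
    node = config
    for key in ds_key_long.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return bool(node)


def detect_dtype(config):
    # Current DeepSpeed configs use "bf16.enabled"; older ones used
    # "bfloat16.enabled". The buggy line checked the misspelled
    # "bfoat16.enabled", which never matches, so a bf16 config was
    # silently ignored and training fell back to fp32.
    if is_true(config, "bf16.enabled") or is_true(config, "bfloat16.enabled"):
        return "bfloat16"  # stand-in for torch.bfloat16
    if is_true(config, "fp16.enabled"):
        return "float16"   # stand-in for torch.float16
    return "float32"


# The config from this issue is now detected correctly:
print(detect_dtype({"bf16": {"enabled": True}}))    # bfloat16
# The misspelled key from the buggy check matches nothing:
print(detect_dtype({"bfoat16": {"enabled": True}}))  # float32
```

With the misspelled key, the bf16 branch is unreachable for any valid config, which matches the symptom reported above: no change in memory usage or speed.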