
[BUG] FP16 used for all-reduce even if BFLOAT16 is enabled #2071

@owmohamm

Description


Even if BFLOAT16 is enabled, FP16 is still used for communication unless the communication data type is also explicitly set to bf16: https://github.com/microsoft/DeepSpeed/blob/9b70ce56e7af89d5226f9b06ebe1137407f371dc/deepspeed/runtime/engine.py#L702-L711
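For comparison, here is a minimal sketch of the behavior one would expect: an explicitly configured communication data type wins, and otherwise the communication data type follows the enabled training precision. The function `resolve_comm_dtype` and its string return values are hypothetical illustrations, not DeepSpeed's API (the real engine method returns `torch` dtypes).

```python
from typing import Optional


def resolve_comm_dtype(configured: Optional[str], fp16_enabled: bool, bf16_enabled: bool) -> str:
    """Hypothetical resolution order for the all-reduce data type.

    Strings stand in for torch dtypes so the sketch runs without torch.
    """
    # 1. An explicit user setting always takes priority.
    if configured is not None:
        return configured
    # 2. Otherwise follow the training precision: bf16 training should
    #    communicate in bf16 instead of silently falling back to fp16.
    if bf16_enabled:
        return "bf16"
    if fp16_enabled:
        return "fp16"
    # 3. Full-precision training communicates in fp32.
    return "fp32"
```

With this ordering, enabling BFLOAT16 alone is enough to get bf16 communication, which is the behavior the report expects.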

Moreover, in ZeRO, even if the communication data type is set to bf16, the default value in some places is still fp16:
https://github.com/microsoft/DeepSpeed/blob/9b70ce56e7af89d5226f9b06ebe1137407f371dc/deepspeed/runtime/zero/stage_1_and_2.py#L1334-L1338
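One common way to avoid a hard-coded fp16 default is to use `None` as a sentinel and inherit the parameters' precision when no value is passed. The class below is a hypothetical stand-in for illustration only; it is not the ZeRO stage 1/2 optimizer and does not mirror its actual signature.

```python
from typing import Optional


class ZeroOptimizerSketch:
    """Hypothetical stand-in for a ZeRO optimizer (not DeepSpeed's class)."""

    def __init__(self, param_dtype: str, communication_data_type: Optional[str] = None):
        # A None default means "not set by the caller", so an unset value
        # inherits the parameter precision instead of silently becoming fp16.
        if communication_data_type is not None:
            self.communication_data_type = communication_data_type
        else:
            self.communication_data_type = param_dtype

    def reduce_dtype(self) -> str:
        # Data type that gradient reduction would use in this sketch.
        return self.communication_data_type
```

For example, `ZeroOptimizerSketch("bf16").reduce_dtype()` yields `"bf16"`, while an explicit `communication_data_type="fp32"` still overrides the parameter precision.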

Labels: bug (Something isn't working)