The code at https://github.com/microsoft/DeepSpeed/blob/0a73e6e6137a91e1b776f725b637f8b37a75f8e7/deepspeed/runtime/zero/stage3.py#L1168 only works for float32.