This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Safe accumulation type for nd.norm #14126

@eric-haibin-lin

Description

For FP16, reduction operators tend to lose precision if the accumulation data type remains fp16. Instead, the accumulation dtype should be fp32, i.e.:

dtype = fp16 -> acc_type = fp32
dtype = fp32 -> acc_type = fp64
dtype = fp64 -> acc_type = fp64

We should apply the same accumulation types to the norm op.

Reference impl for softmax: #14098
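
To illustrate the precision loss and the mapping above, here is a minimal sketch in NumPy. It is not the MXNet implementation: `ACC_TYPE`, `safe_l2_norm`, and `naive_l2_norm` are hypothetical names used only for this example, and the naive version uses an explicit loop so that every intermediate stays in the input dtype.

```python
import numpy as np

# Accumulation dtype mapping taken from the description above.
# (Hypothetical lookup table for this sketch, not an MXNet API.)
ACC_TYPE = {
    np.dtype(np.float16): np.float32,
    np.dtype(np.float32): np.float64,
    np.dtype(np.float64): np.float64,
}


def safe_l2_norm(x):
    """L2 norm that accumulates in a wider dtype, then casts back to x.dtype."""
    acc = ACC_TYPE[x.dtype]
    total = np.sum(np.square(x.astype(acc)), dtype=acc)
    return x.dtype.type(np.sqrt(total))


def naive_l2_norm(x):
    """L2 norm that keeps the running sum in x.dtype (for comparison only)."""
    total = x.dtype.type(0)
    for v in x:
        # Each partial sum is rounded back to the input dtype.
        total = x.dtype.type(total + v * v)
    return np.sqrt(total)


# 4096 ones in fp16: the exact sum of squares is 4096, so the exact norm is 64.
x = np.ones(4096, dtype=np.float16)

print("safe :", safe_l2_norm(x))
print("naive:", naive_l2_norm(x))
```

With fp16 accumulation the running sum stops growing once it reaches 2048, because adding 1 to 2048 is no longer representable in fp16 and rounds back down. The naive result is therefore well below the exact norm of 64, while the fp32 accumulator recovers it.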
