Safe accumulation type for nd.norm

For FP16, reduction operators tend to lose precision if the accumulation data type remains fp16. Instead, accumulation should happen in a wider dtype than the input (fp32 for fp16 inputs), i.e.
```
dtype = fp16 -> acc_type = fp32
dtype = fp32 -> acc_type = fp64
dtype = fp64 -> acc_type = fp64
```
We should apply the same accumulation-type promotion to the norm op.
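
To illustrate the problem, here is a minimal NumPy sketch (not MXNet code). The `ACC_TYPE` table and the `l2_norm` helper are hypothetical names made up for this example, and the sequential loop is an assumption about how a naive reduction accumulates; the actual kernel may reduce in a different order.

```python
import numpy as np

# Hypothetical mapping mirroring the table above; not an MXNet API.
ACC_TYPE = {
    np.dtype(np.float16): np.float32,
    np.dtype(np.float32): np.float64,
    np.dtype(np.float64): np.float64,
}


def l2_norm(x, safe_acc=True):
    """Sequential L2 norm of a 1-D array.

    With safe_acc=True the sum of squares is accumulated in the wider
    type from ACC_TYPE; otherwise it stays in the input dtype.
    """
    acc_dtype = ACC_TYPE[x.dtype] if safe_acc else x.dtype.type
    acc = acc_dtype(0)
    for v in x:
        v = acc_dtype(v)
        # Round the running sum to the accumulation dtype at every step.
        acc = acc_dtype(acc + v * v)
    # Cast the result back to the input dtype, as an operator output would be.
    return x.dtype.type(np.sqrt(acc))


x = np.ones(4096, dtype=np.float16)  # exact L2 norm is sqrt(4096) = 64
unsafe = l2_norm(x, safe_acc=False)  # fp16 accumulator
safe = l2_norm(x, safe_acc=True)     # fp32 accumulator
print("fp16 accumulation:", unsafe)
print("fp32 accumulation:", safe)
```

With an fp16 accumulator the running sum stops growing once it reaches 2048, because 2048 + 1 is not representable in fp16 and rounds back to 2048, so the result falls well short of 64. With an fp32 accumulator the sum is exact and the norm comes out as 64.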

Reference impl for softmax: https://github.com/apache/incubator-mxnet/pull/14098