Roberta FP16 gives wrong inference results #2466
Description
A Roberta (bert4keras) model produces wrong inference results with FP16 (FP32 is correct).
The TensorFlow SavedModel was converted to ONNX with tf2onnx, and the resulting ONNX model contains two constants that fall outside the FP16 range:
- Infinity (float32), the Min op's input[1] in the layernorm structure
- -999999995904 (float32), the Mul op's input[0] in the self-attention structure
I found that the layernorm structure is what causes the wrong FP16 inference results.
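Why these two constants are a problem can be seen directly from the float16 number range (a minimal NumPy check, independent of TensorRT):

```python
import numpy as np

# float16 cannot represent magnitudes beyond 65504, so both constants
# found in the exported graph become infinite when cast to FP16,
# corrupting the layernorm / attention-mask arithmetic.
print(np.finfo(np.float16).max)                 # 65504.0
print(np.float16(np.float32(-999999995904.0)))  # -inf (overflow)
print(np.float16(np.float32(np.inf)))           # inf
```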
Environment
TensorRT Version: 8.4.2.4
NVIDIA GPU: A30
NVIDIA Driver Version: 510.68.02
CUDA Version: 11.7
CUDNN Version:
Operating System: Ubuntu 20.04.2 LTS
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): 1.15
Container version: nvcr.io/nvidia/tensorrt:22.08-py3
Relevant Files
Steps To Reproduce
Method 1: Use the Clip op to replace Min and Max
- Modify the ONNX model: replace every Min-->Max pair in the layernorm structures with a single Clip
- When building the FP16 engine from ONNX, pin the Clip and Mul layers to FP32:
network.get_layer(i).precision = trt.DataType.FLOAT
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
- In this case, FP16 inference is correct. However, compared with FP32:
  - With two encoders in the model, FP16 improves throughput.
  - With six encoders, FP16 is a net loss: throughput decreases and latency increases.
Method 2: Set all ops in layernorm to FP32 precision (the red box in the image above)
- When building the FP16 engine from ONNX, set the layernorm ops (from GlobalAveragePool to Add) and the Mul to FP32 precision
- In this case, FP16 inference is still wrong; the result is no different from running the model in FP16 directly
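Both methods rely on pinning selected layers to FP32 during the engine build. The selection logic can be sketched as a name-based filter; the layer names and keyword patterns below are hypothetical and must be adapted to the names TensorRT assigns in your network:

```python
def fp32_layer_indices(layer_names, keywords=("Clip", "layernorm", "Mul")):
    """Return indices of layers to pin to FP32.

    The keyword patterns are hypothetical examples; inspect the
    actual layer names in your parsed network to choose them.
    """
    return [i for i, name in enumerate(layer_names)
            if any(k.lower() in name.lower() for k in keywords)]

# In the real build script (TensorRT API, not executed here):
# for i in fp32_layer_indices([network.get_layer(j).name
#                              for j in range(network.num_layers)]):
#     network.get_layer(i).precision = trt.DataType.FLOAT
# config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

names = ["embedding", "encoder0/layernorm/Clip",
         "encoder0/attn/Mul", "encoder0/ffn/Add"]
print(fp32_layer_indices(names))  # [1, 2]
```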
How can FP16 be used to improve inference performance while preserving the model's accuracy?
