
Roberta fp16 got wrong inference results  #2466

@binbabou

Description


The Roberta (bert4keras) model produces wrong inference results with fp16 (fp32 results are correct).

The TensorFlow SavedModel was converted to ONNX using tf2onnx, and the resulting ONNX model contains two constants that are out of the fp16 range:

  1. Infinity (float32), the Min op's input[1] in the layernorm structure
  2. -999999995904 (float32), the Mul op's input[0] in the self-attention structure

I found that the layernorm structure is what causes the wrong fp16 inference results.
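Both constants overflow when cast to half precision, since fp16 can only represent magnitudes up to 65504. A minimal sketch for spotting such constants in an ONNX model (the scanning function assumes the `onnx` package; checking only initializers and Constant nodes is a simplification):

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

# Casting the two offending constants to fp16 overflows to +/-inf:
#   np.float16(np.inf)          -> inf
#   np.float16(-999999995904.0) -> -inf


def is_fp16_unsafe(values):
    """True if a float array holds values fp16 cannot represent."""
    a = np.asarray(values, dtype=np.float64)
    return bool(a.size) and (not np.isfinite(a).all()
                             or float(np.abs(a).max()) > FP16_MAX)


def find_fp16_unsafe_constants(model_path):
    """Scan an ONNX model for float32 constants outside the fp16 range."""
    import onnx  # requires the `onnx` package
    from onnx import numpy_helper

    model = onnx.load(model_path)
    tensors = list(model.graph.initializer)
    # Constants may also be embedded as Constant-node attributes.
    for node in model.graph.node:
        if node.op_type == "Constant":
            for attr in node.attribute:
                if attr.name == "value":
                    tensors.append(attr.t)
    return [t.name for t in tensors
            if t.data_type == onnx.TensorProto.FLOAT
            and is_fp16_unsafe(numpy_helper.to_array(t))]
```

Running this on the converted model should flag at least the two constants listed above.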

Environment

TensorRT Version: 8.4.2.4
NVIDIA GPU: A30
NVIDIA Driver Version: 510.68.02
CUDA Version: 11.7
CUDNN Version:
Operating System: Ubuntu 20.04.2 LTS
Python Version (if applicable): 3.8
Tensorflow Version (if applicable): 1.15
Container version: nvcr.io/nvidia/tensorrt:22.08-py3

Relevant Files

(screenshot: ONNX graph with the layernorm structure marked in a red box)

Steps To Reproduce

Method 1: Use Clip op to replace Min and Max

  1. Modify the ONNX model: replace the Min-->Max pair in every layernorm structure with a single Clip op
  2. When building the engine from ONNX with fp16, pin the Clip and Mul layers to fp32:

network.get_layer(i).precision = trt.DataType.FLOAT

config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
  3. With these changes, the fp16 inference results are correct. However, compared with fp32:
  • When the model has two encoders, fp16 improves throughput.

  • When the model has 6 encoders, fp16 results in negative optimization: throughput decreases and latency increases.
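The build step of Method 1 can be sketched with the TensorRT 8.4 Python API roughly as follows. This is a sketch, not the exact script used in the issue; selecting layers by name keyword is an assumption (TensorRT derives layer names from ONNX node names), and it assumes the Min-->Max pairs have already been replaced by Clip in the ONNX file:

```python
def should_pin_to_fp32(layer_name, keywords=("Clip", "Mul")):
    """Pure helper: decide whether a layer stays in fp32.
    Keyword matching is an assumption; note that "Mul" also matches
    "MatMul", so exact node names should be used in a real graph."""
    return any(kw in layer_name for kw in keywords)


def build_fp16_engine(onnx_path):
    import tensorrt as trt  # requires the TensorRT Python bindings

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    # Force TensorRT to honour the per-layer precisions set below.
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if should_pin_to_fp32(layer.name):
            layer.precision = trt.DataType.FLOAT
            layer.set_output_type(0, trt.DataType.FLOAT)

    return builder.build_serialized_network(network, config)
```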

Method 2: Set all ops' precision to fp32 in the layernorm structure (the red box in the image above)

  1. When building the engine with fp16, set fp32 precision on the layernorm ops (from GlobalAveragePool to Add) and on the Mul op
  2. In this case, fp16 inference is still wrong; the results are no different from building the model with fp16 directly
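For reference, Method 2's layer selection (every layer from GlobalAveragePool through the next Add) can be expressed with a small pure helper before pinning precisions as in Method 1; per the issue, this approach did not fix the accuracy. Keyword matching on layer names is an assumption here; exact node names from the real graph are safer:

```python
def layers_to_pin(layer_names, start_kw="GlobalAveragePool", end_kw="Add"):
    """Indices of layers from each `start_kw` layer through the next
    `end_kw` layer, inclusive (the layernorm span in Method 2)."""
    pinned, active = [], False
    for i, name in enumerate(layer_names):
        if start_kw in name:
            active = True
        if active:
            pinned.append(i)
            if end_kw in name:
                active = False
    return pinned


# Applying it during engine build (same TensorRT API as in Method 1):
#   names = [network.get_layer(i).name for i in range(network.num_layers)]
#   for i in layers_to_pin(names):
#       network.get_layer(i).precision = trt.DataType.FLOAT
```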

How can fp16 be used to improve inference performance while preserving the model's accuracy?

Metadata

Labels

triaged — Issue has been triaged by maintainers
