TF-Lite converter bug in 2.17 for TransposeConv with quantized weights #76624
Description
On Apple M1 Pro with macOS Sonoma 14.1.2 (23B92)
TensorFlow version 2.17.0 (not an issue with TF 2.16.1)
I ran this on a Samsung Galaxy S24 Ultra
sources + assets:
TransposeConv_bug.zip
The zip contains a TF saved model and a convert.py script that converts it to a TF-Lite model.
The zip file also contains two TF-Lite models, one created with TF 2.16.1 and the other with 2.17.0.
convert.py also runs the TF-Lite model, but it needs to run on a target with GPU and/or NPU delegates to really see the issue.
When running the TF 2.16.1 TF-Lite model, the model runs on the NPU (Qualcomm QNN delegate).
When running the TF 2.17.0 TF-Lite model, the model runs on the CPU, using the XNNPACK/TFLite delegates.
we see this log from TF-Lite on the device:
tflite.log
the interesting lines are:
[27/Sept/2024:11:32:40 +08:00: profiler/warning] [job_id: j87gj31pd] [model.tflite] [tflite] tensorflow/lite/kernels/transpose_conv.cc:487 affine_quantization->scale->size != weights->dims->data[affine_quantization->quantized_dimension] (1 != 4)
and
[27/Sept/2024:11:32:40 +08:00: profiler/warning] [job_id: j87gj31pd] [model.tflite] [tflite] tensorflow/lite/kernels/transpose_conv.cc:487 affine_quantization->scale->size != weights->dims->data[affine_quantization->quantized_dimension] (1 != 4)
the first one fails the NPU delegate check and the latter the GPU delegate check.
the check here is:
https://github.com/tensorflow/tensorflow/blob/9e4fc3d09c298ca56bd11f72d2f1beb622a0f76b/tensorflow/lite/kernels/transpose_conv.cc#L485-L487
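For readers who do not want to open the C++ source, the comparison reported in the log can be sketched in Python. This is only an illustration of the size comparison, not the actual kernel code: the helper name `scales_match_channels` and the example weight shape are assumptions, and only the values 1 and 4 come from the log above.

```python
# Illustrative Python sketch of the size comparison reported in the log.
# The real check is C++ in tensorflow/lite/kernels/transpose_conv.cc;
# this helper name is hypothetical.
def scales_match_channels(num_scales, weight_dims, quantized_dimension):
    """Return True when there is one scale per slice along quantized_dimension."""
    return num_scales == weight_dims[quantized_dimension]


# Hypothetical weight shape: only the value 4 along the quantized dimension
# is taken from the log, the other dimensions are made up.
weight_dims = (4, 3, 3, 8)

per_tensor_ok = scales_match_channels(1, weight_dims, 0)   # single scale, as in the log
per_channel_ok = scales_match_channels(4, weight_dims, 0)  # one scale per channel
print(per_tensor_ok, per_channel_ok)  # prints: False True
```

With a single scale (per-tensor quantization) the comparison fails, which matches the `(1 != 4)` in the log.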
This check is performed only when the Transpose Conv layer has float input but integral weights. This was not the case in TF 2.16, where the weights were first dequantized.
The check makes it look like this must be a per-channel quantized weight, which is clearly not the case.
So the converter generates a model that the runtime forces back onto the slower CPU delegates.
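One possible way to see which tensors in each of the two TF-Lite models carry a single quantization scale is to walk the output of `tf.lite.Interpreter.get_tensor_details()`. The sketch below is not part of convert.py: the helper names and the `model.tflite` path are assumptions. It lists every tensor with exactly one scale, so in a fully quantized model it would also report activations.

```python
def find_per_tensor_quantized(tensor_details):
    """Return (name, shape) for tensors that carry exactly one quantization scale.

    tensor_details is the list of dicts returned by
    tf.lite.Interpreter.get_tensor_details().
    """
    hits = []
    for detail in tensor_details:
        params = detail.get("quantization_parameters", {})
        scales = params.get("scales", [])
        if len(scales) == 1:
            shape = tuple(int(d) for d in detail["shape"])
            hits.append((detail["name"], shape))
    return hits


def inspect_model(model_path):
    """Hypothetical wrapper: load a .tflite file and list single-scale tensors."""
    import tensorflow as tf  # imported lazily so the helper above has no dependency

    interpreter = tf.lite.Interpreter(model_path=model_path)
    return find_per_tensor_quantized(interpreter.get_tensor_details())


# Example call (the path is a placeholder, not a file name from the zip):
# for name, shape in inspect_model("model.tflite"):
#     print(name, shape)
```

Running this on both models should make it easier to compare how the TransposeConv weights are stored by each converter version.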