Float16 model run on GPU
Describe the problem
We tried to run a post-training-quantized (float16) model on a robot with the GPU delegate, following https://www.tensorflow.org/lite/performance/gpu and https://medium.com/tensorflow/tensorflow-model-optimization-toolkit-float16-quantization-halves-model-size-cc113c75a2fa, but it fails to run on the GPU even after we graph-transformed the operators in it that the GPU does not support. The log is attached below. Interestingly, if we do not quantize the model to float16, all of its operators run on the GPU successfully. Netron shows that many DEQUANTIZE operators are added to the graph after we use the TFLite converter to quantize the model to float16. What should we do to run the quantized float16 model entirely on the GPU?
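For context, our conversion step is roughly the following sketch (using the standard `tf.lite.TFLiteConverter` float16 path; the tiny Keras model and its input shape are placeholders, not our actual model — the real one is in the Drive link below):

```python
# Sketch of float16 post-training quantization. Storing weights as fp16
# is what inserts the DEQUANTIZE ops visible in Netron.
import tensorflow as tf

# Tiny placeholder network standing in for the real model.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, activation='relu', input_shape=(32, 32, 3)),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Restrict quantized types to float16: weights are stored as fp16 and
# DEQUANTIZE ops are added so CPU kernels can still compute in fp32.
converter.target_spec.supported_types = [tf.float16]
tflite_fp16 = converter.convert()

with open('model_fp16.tflite', 'wb') as f:
    f.write(tflite_fp16)
```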
One more question: we found the parameter SetAllowFp16PrecisionForFp32 in the TFLite C++ API. What is the difference between 1) setting it to true with a float32 model, 2) setting it to true with a float16 model, 3) setting it to false with a float32 model, and 4) setting it to false with a float16 model?
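To illustrate what "precision" means in that flag: fp16 has only a 10-bit mantissa, so values that are distinct in fp32 can collapse to the same fp16 value. A quick numpy demonstration (independent of TFLite):

```python
import numpy as np

a = np.float32(1.0) + np.float32(1e-4)  # distinct from 1.0 in fp32
b = np.float16(a)                       # fp16 spacing near 1.0 is 2**-10 (~0.001)
print(a)         # ~1.0001
print(float(b))  # 1.0 -- the 1e-4 difference is rounded away in fp16
```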
The model is uploaded at: https://drive.google.com/drive/folders/18B4Wx4BEPxfptsTmIEZySwILLZNXbE2v?usp=sharing Inputs are images of size 1933213.
Please provide the exact sequence of commands/steps when you ran into the problem

INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Next operations are not supported by GPU delegate:
CONV_2D: Expected 1 input tensor(s), but node has 3 runtime input(s).
DEPTHWISE_CONV_2D: Expected 1 input tensor(s), but node has 3 runtime input(s).
DEQUANTIZE: Operation is not supported.
First 0 operations will run on the GPU, and the remaining 198 on the CPU.
@rxiang040 You are using an older version of the GPU delegate, which isn't maintained anymore. Could you try TfLiteGpuDelegateOptionsV2? It's in a different header. Look at the Delegate page for other languages.
Out of curiosity, have you tried using the 8-bit quantization support w/ GPU? There might be a slight precision dip, but the models turn out 50% smaller compared to fp16.
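If you want to try it, the full-integer path looks roughly like this sketch (stand-in model and random calibration data; swap in your real model and real preprocessed images for the representative dataset):

```python
# Sketch of 8-bit post-training quantization with a representative dataset.
import numpy as np
import tensorflow as tf

# Placeholder network standing in for the real model.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(4, 3, activation='relu', input_shape=(32, 32, 3)),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Calibration samples; in practice, yield real preprocessed input images.
def representative_dataset():
    for _ in range(8):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
tflite_int8 = converter.convert()
```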
Thanks so much for your advice. We will try TfLiteGpuDelegateOptionsV2. As for the 8-bit quantization support w/ GPU, can the GPU perform 8-bit computations? I thought only 32-bit and 16-bit models could use the GPU delegate.