

issue closed tensorflow/tensorflow

FL16 model run on GPU

System information

  • Host OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • Tensorflow version (commit SHA if source): 1.15
  • Target platform (e.g. Arm Mbed OS, Arduino Nano 33 etc.): Android 9 (API 28), Mali-T864 GPU

Describe the problem

We tried to run a post-training quantized (float16) model with the GPU delegate on a robot, following https://www.tensorflow.org/lite/performance/gpu and https://medium.com/tensorflow/tensorflow-model-optimization-toolkit-float16-quantization-halves-model-size-cc113c75a2fa, but it fails to run on the GPU even after we graph-transformed the operators that the GPU delegate does not support. The log is attached below. Interestingly, if we do not quantize the model to float16, all of its operators run on the GPU successfully. Netron shows that many DEQUANTIZE operators are added to the graph after the TFLite converter quantizes the model to float16. What should we do so that the quantized float16 model runs entirely on the GPU?

One more question: we found the parameter SetAllowFp16PrecisionForFp32 in the TFLite C++ API. What is the difference between 1) setting it to true with a float32 model, 2) setting it to true with a float16 model, 3) setting it to false with a float32 model, and 4) setting it to false with a float16 model?
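
For reference, a minimal C++ sketch of where this flag lives, assuming the usual FlatBufferModel/InterpreterBuilder flow; the model path and the allow_fp16 value are placeholders, and the sketch does not by itself answer which of the four combinations is fastest.

#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

// Builds an interpreter for a .tflite model and toggles the fp16 relaxation
// flag. SetAllowFp16PrecisionForFp32 only permits float32 ops to be computed
// at reduced precision; it does not change what is stored in the model file.
std::unique_ptr<tflite::Interpreter> BuildInterpreter(const char* model_path,
                                                      bool allow_fp16) {
  auto model = tflite::FlatBufferModel::BuildFromFile(model_path);
  if (!model) return nullptr;

  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  if (tflite::InterpreterBuilder(*model, resolver)(&interpreter) != kTfLiteOk) {
    return nullptr;
  }

  interpreter->SetAllowFp16PrecisionForFp32(allow_fp16);

  if (interpreter->AllocateTensors() != kTfLiteOk) return nullptr;
  return interpreter;
}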

Many thanks.

The model is uploaded at: https://drive.google.com/drive/folders/18B4Wx4BEPxfptsTmIEZySwILLZNXbE2v?usp=sharing
Inputs are images of size 1933213.

Please provide the exact sequence of commands/steps when you ran into the problem

INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
ERROR: Next operations are not supported by GPU delegate:
CONV_2D: Expected 1 input tensor(s), but node has 3 runtime input(s).
DEPTHWISE_CONV_2D: Expected 1 input tensor(s), but node has 3 runtime input(s).
DEQUANTIZE: Operation is not supported.
First 0 operations will run on the GPU, and the remaining 198 on the CPU.

closed time in a month

rxiang040

issue comment tensorflow/tensorflow

FL16 model run on GPU

Cool! Thanks :) I will close this issue. Thanks for the help :)

rxiang040

comment created time in a month

issue comment tensorflow/tensorflow

FL16 model run on GPU

Sure, we will try that. So on an RK3399 with a GPU, using the C++ implementation, what would in general be the fastest setup to run a TFLite model? That is, what should the options and quantization be, without considering RAM usage?

rxiang040

comment created time in 2 months

issue opened tensorflow/tensorflow

Huge size difference of GPU delegate library between static and dynamic

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: RK3399
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.3
  • Python version: 3.6
  • Bazel version (if compiling from source): 3.1.0
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 10.4
  • GPU model and memory: GTX 1050

Describe the current behavior
So I tried to use bazel to build the GPU delegate library as written in https://www.tensorflow.org/lite/performance/gpu_advanced:

bazel build -c opt --config android_arm64 tensorflow/lite/delegates/gpu:delegate # for static library
bazel build -c opt --config android_arm64 tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so # for dynamic library

The resulting dynamic library (.so) is 106 MB, while the static library (.a) is 1.1 MB. Why are their sizes so different? I need to use the dynamic library, but 106 MB of RAM will be used just to load it. Is there any way to reduce the size of the dynamic library?
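
One hedged suggestion, assuming the large .so is mostly unstripped symbol and debug information: building with Bazel's --strip=always (or running the NDK's llvm-strip on the resulting .so) may shrink the file considerably, e.g.

bazel build -c opt --config android_arm64 --strip=always tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so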

Thanks!

created time in 2 months

issue comment tensorflow/tensorflow

FL16 model run on GPU

We have an RK3399 board running Android and we use C++ for deployment. The time is measured from feeding in an image to obtaining the final segmentation result. E.g., with 77 ops in total: test 1, 76 ops on the GPU + 1 op on the CPU, 162 ms/frame; test 2, all 77 ops on the GPU, 195 ms/frame.
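
As an illustration of the per-frame measurement described above, a minimal sketch assuming an already-built tflite::Interpreter whose input tensor has been filled with the preprocessed image; the function name is made up for the example.

#include <chrono>

#include "tensorflow/lite/interpreter.h"

// Returns the wall-clock time of one inference in milliseconds,
// or -1.0 if Invoke() fails.
double TimeOneFrameMs(tflite::Interpreter* interpreter) {
  const auto start = std::chrono::high_resolution_clock::now();
  if (interpreter->Invoke() != kTfLiteOk) return -1.0;
  const auto end = std::chrono::high_resolution_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}

Averaging over many frames and discarding the first few warm-up runs gives more stable ms/frame numbers, since the first invocations are typically slower while caches and GPU kernels warm up.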

rxiang040

comment created time in 2 months

issue comment tensorflow/tensorflow

FL16 model run on GPU

The model is faster on the GPU. We actually tested the inference time of our float32 model with the last layer running on either the GPU or the CPU, and I would say the difference is large: with the whole model on the GPU the inference time is 195 ms/frame, whereas with the last layer on the CPU and the rest on the GPU it is 162 ms/frame.

So will an int8 model run faster on the GPU?

rxiang040

comment created time in 2 months

issue comment tensorflow/tensorflow

FL16 model run on GPU

@srjoglekar246 Thanks very much for your advice. The model now runs on the GPU with TFLite 2.3, but we still have one problem. I have attached our model structure. The last layer of our model is a ResizeBilinear layer, and we found that this operation is much more efficient when it runs on the CPU. So we modified tensorflow/lite/delegates/utils.cc at line 219 by adding the following code:

if (node_id == 197) {
  // Force node 197 (the final bilinear upsampling) off the GPU delegate.
  if (unsupported_details) {
    *unsupported_details = "197 bilinear upsampling is not run on GPU but CPU";
  }
  return false;
}

However when we run our program on the bot, the log writes:

ERROR: Following operations are not supported by GPU delegate:
DEQUANTIZE:
RESIZE_BILINEAR:
197 operations will run on the GPU, and the remaining 1 operations will run on the CPU.

Ideally, RESIZE_BILINEAR should be the only op that does not run on the GPU, but somehow DEQUANTIZE shows up here as well. Stranger still, the log lists two op names, yet the next sentence says "the remaining 1 operations will run on the CPU". Do you know what is happening here?

Also, I tested the float32 model and the float16 model, and there is almost no difference in inference time between them. So why should we use float16 quantization here? (Is there any advantage of float16 compared to float32?)

Thanks!

model_halfed_V2_fl16.tflite

rxiang040

comment created time in 2 months

issue comment tensorflow/tensorflow

FL16 model run on GPU

@rxiang040 You are using an older version of the GPU delegate, which isn't maintained anymore. Could you try TfLiteGpuDelegateOptionsV2? It's in a different header; look at the Delegate page for the other languages.

Out of curiosity, have you tried using the 8-bit quantization support w/ GPU? There might be a slight precision dip, but the models turn out 50% smaller compared to fp16.

Thanks so much for your advice. We will try TfLiteGpuDelegateOptionsV2. As for the 8-bit quantization support w/ GPU, can the GPU perform 8-bit computations? I thought only 32-bit and 16-bit models could use the GPU delegate.
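
For reference, a minimal sketch of the V2 setup suggested above, assuming the header tensorflow/lite/delegates/gpu/delegate.h and an already-built interpreter; the particular options are illustrative, not a recommendation from this thread.

#include "tensorflow/lite/delegates/gpu/delegate.h"
#include "tensorflow/lite/interpreter.h"

// Applies the V2 GPU delegate to an existing interpreter. Returns true if the
// delegate was applied (possibly to only part of the graph).
bool ApplyGpuDelegateV2(tflite::Interpreter* interpreter) {
  TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
  // Favor latency over strict fp32 precision (the V2 analogue of allowing
  // precision loss in the older options struct).
  options.inference_priority1 = TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY;
  // Allow (8-bit) quantized models to run through the GPU delegate.
  options.experimental_flags |= TFLITE_GPU_EXPERIMENTAL_FLAGS_ENABLE_QUANT;

  TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);
  if (delegate == nullptr) return false;

  // The delegate must stay alive while the interpreter uses it; free it with
  // TfLiteGpuDelegateV2Delete() only after the interpreter is destroyed.
  return interpreter->ModifyGraphWithDelegate(delegate) == kTfLiteOk;
}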

rxiang040

comment created time in 2 months
